Topic pillar · 17 tracked pieces

Topic · AI agent procurement

The contracts, SLAs, and evaluation criteria that distinguish agentic-AI procurement from SaaS procurement.

RFPs, SLAs, contract clauses, and the vendor-evaluation rubrics that survive procurement review.

Agent procurement is procurement with three new variables most contracts don't yet handle: non-deterministic outputs, long-running autonomous workloads, and the question of what counts as a vendor-side defect when the system that fails is a stochastic model.

Standard SaaS procurement clauses don't cover agents. Standard MSAs assume deterministic services delivered to spec. Standard SLAs measure uptime, not output validity. The procurement teams that ship agentic AI without burning a year on contract negotiation are the ones using rubrics built for the new failure modes — and there are very few of those rubrics in public.

This pillar publishes:

  • AI agent SLA templates with measurable thresholds for output validity, time-to-detect failure, and time-to-recovery, with examples from named deployments that work.

  • MSA and DPA clauses specific to agentic AI: non-determinism allowances, reproducibility carve-outs, training-data segregation, and agent-credential rotation policies.

  • RFP question libraries with 60+ procurement questions mapped to the GAUGE rubric, with evidence prompts per question.

  • Vendor-evaluation rubrics: Anthropic versus OpenAI versus Google versus Microsoft for enterprise agent workloads, with the comparator dated and the methodology declared.

  • Build-versus-buy-versus-partner analysis: the three-way decision most enterprises now face on every agent workload, with named-company case studies of each path.
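A minimal contract-side sketch of how those three SLA thresholds could be checked. All field names, units, and numbers here are illustrative assumptions, not taken from any published template:

```python
from dataclasses import dataclass

@dataclass
class AgentSlaThresholds:
    """Hypothetical per-contract thresholds; names and units are illustrative."""
    min_output_validity: float  # fraction of sampled outputs passing validation
    max_ttd_minutes: float      # time-to-detect a failure
    max_ttr_minutes: float      # time-to-recovery after detection

def evaluate_sla(t: AgentSlaThresholds,
                 observed_validity: float,
                 observed_ttd: float,
                 observed_ttr: float) -> dict:
    """Return per-metric pass/fail against the contracted thresholds."""
    return {
        "output_validity": observed_validity >= t.min_output_validity,
        "time_to_detect": observed_ttd <= t.max_ttd_minutes,
        "time_to_recovery": observed_ttr <= t.max_ttr_minutes,
    }
```

The point of the shape is that each metric gets its own measurable threshold and its own pass/fail, so a breach of one clause is auditable independently of the others.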

Pieces here cite real procurement contracts where parties have given permission, anonymised case studies where they haven't, and named vendor documentation in every comparison.

Pillar last refreshed 2026-05-01


Spoke articles

  • The agentic AI pilot-to-production gap: what vendor 'successful pilot' references do not tell procurement

    Vendor 'successful pilot' references are the most common evidence presented to enterprise procurement committees evaluating agentic AI. McKinsey State of AI 2025 (Nov 2025, n=1,491) reports 23% of enterprises scaling and 39% still experimenting; the documented walk-backs (Klarna 700-agent reversal, Salesforce Agentforce 200-customer reality, GitHub Copilot April 2026 token-counting bug) describe what those references typically obscure. The gap between vendor-reference pilot success and procuring-enterprise scaled production is operational, and it is the procurement committee's job to make the regime-translation question explicit before the contract closes.

  • Vendor MSA renewal in the post-EU-AI-Act-enforcement window: what changes in the AI MSA red-team checklist after 2 August 2026

    The 38-item AI MSA red-team checklist (RES-005) covered the seven clause families where 2025-2026 enterprise AI MSAs cluster their failure modes. The 2 August 2026 EU AI Act deployer-obligations enforcement window adds three new procurement-defensible asks that were not load-bearing in pre-enforcement contracts: Article 11 technical-file pass-through, Article 16 post-market-monitoring support, and Article 26 deployer-documentation supply. The piece also carries the asymmetric-instrument observation, a 600-word insert arguing that procurement teams at enterprise and operator scales face the same vendor-citation-chain manipulation pattern with different audit instruments.

  • How vendor case studies travel between enterprise and operator AI buyers — and what each cohort gets wrong from the other's evidence

    Enterprise AI buyers and operator AI buyers consume vendor case studies aimed at the other cohort and produce mirror-image misreads. The Fortune-500-bank case lands in operator decks as 'this works at SMB scale too' (it usually does not, in the way the case study describes). The IndieHacker testimonial lands in enterprise decks as 'even small teams ship it' (the small team's operational substrate is structurally different from the enterprise's). The mechanism is the same — vendor citation chains travel cohort-to-cohort with applicability mismatches the readers do not catch — and the procurement cost is paid in both registers. This is the bridge piece between AM-* and OPS-* registers that the four expert reviewers said earned its slot.

  • Foundation-model uptime in 2026: the 24-month outage record across Anthropic, OpenAI, Google, AWS Bedrock, and Azure OpenAI

    Foundation-model providers publish status pages that report on the model API as if it were one service. The 24-month operational record across Anthropic, OpenAI, Google, AWS Bedrock, and Azure OpenAI does not support that framing. The procurement-defensible posture in 2026 is multi-provider routing with documented failover, and the SLA gap between what vendors publish and what enterprise contracts actually need is now wide enough to be the primary procurement signal in foundation-model selection.
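The multi-provider routing posture described above can be sketched as an ordered failover loop. The `call(provider, request)` wrapper, the provider names, and the error handling are all illustrative assumptions, not any vendor's API:

```python
def route_with_failover(providers, request, call):
    """Try providers in contract-priority order; fall back on failure.

    `providers` is an ordered list of provider identifiers; `call` is whatever
    client wrapper the deployment uses (hypothetical signature). Returns
    (provider_used, response); raises only if every provider fails.
    """
    errors = {}
    for provider in providers:
        try:
            return provider, call(provider, request)
        except Exception as exc:  # production code would catch provider-specific errors
            errors[provider] = exc
    raise RuntimeError(f"All providers failed: {errors}")
```

The documented part of "documented failover" is the `errors` dict: an incident record of which providers were tried and why each failed, which is exactly the evidence an SLA dispute turns on.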

  • Agent evaluation in production: eval-set design, drift detection, and regression budgets for the deployed agent

    The four 2026 agent-evaluation platforms (DeepEval, Braintrust, LangSmith, Patronus) covered at AM-122 are the procurement decision. The evaluation discipline that decides whether the chosen platform produces useful signal is the eval-set design, the drift-detection cadence, and the regression-budget framework — the three operational disciplines most enterprises buy a platform for and then under-invest in. This piece walks the in-production cut that sits between the eval-tooling decision and the MTTD-for-Agents observability framework.
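The regression-budget discipline reduces to a single release-gate check: did the eval pass rate drop by more than the agreed budget? The function name and the budget value below are illustrative assumptions, not a published framework:

```python
def regression_budget_check(baseline_pass_rate, current_pass_rate, budget):
    """Flag a release when the eval pass rate drops by more than the budget.

    `budget` is the maximum tolerated absolute drop in pass rate (e.g. 0.02
    for two percentage points); the value itself is a contract/engineering
    choice, not a fixed standard.
    """
    drop = baseline_pass_rate - current_pass_rate
    return {"drop": drop, "within_budget": drop <= budget}
```

Run against a frozen eval set at a fixed cadence, the same check doubles as drift detection: a drop with no code change is model or data drift rather than a regression.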

  • Agentic AI 2024-2025 retrospective: what actually shipped, what walked back, and what 2026 procurement should learn from each

    Read against audited primary sources rather than vendor decks, agentic AI 2024-2025 produced four classes of evidence the 2026 procurement reader should distinguish: vendor-published wins inside vendor-controlled environments, audited customer pilots with active human oversight, the public walk-backs (Klarna, GitHub Copilot rate-limit, EchoLeak), and the structural failure modes (multi-step reliability, prompt-injection class). Each class produces a different procurement lesson; treating them as one 'AI is working' narrative is the most common 2026 enterprise mistake.

  • Agent observability in 2026: Langfuse, Arize, Helicone, and LangSmith — and the procurement decision that is not the eval decision

    Evaluation tells you whether the agent is right. Observability tells you what the agent did. Production deployments need both, the procurement decisions are different, and conflating them produces SLA architecture that fails its first incident. The four credible 2026 observability platforms (Langfuse, Arize, Helicone, LangSmith) split cleanly on one structural axis: open-source-first vs SaaS-first. Helicone has just gone into maintenance mode.

  • Agent evaluation frameworks in 2026: DeepEval, Braintrust, LangSmith, and Patronus map to four deployment shapes

    The four credible agent-evaluation platforms in 2026 don't compete on capability rank. They fit four distinct deployment shapes. DeepEval is the open-source pytest-native option. Braintrust is the SaaS eval primitive. LangSmith is the LangChain-stack observability and eval bundle. Patronus has pivoted from hallucination specialist to digital-world-model frontier lab. Choosing from a generic feature matrix produces the wrong answer for most enterprises.

  • Reinsurance and the catastrophic AI tail: why your cyber renewal is tightening

    Primary cyber-insurance carriers are not the source of 2026 cyber-renewal tightening; the reinsurance market behind them is. Lloyd's of London, Munich Re, and Swiss Re have been recalibrating their assumptions about cascading agent-failure scenarios, and the rate signal travels downstream to the policy your General Counsel is renewing this quarter.

  • AI Bill of Materials in 2026: when AI-BOM becomes a procurement requirement

    AI-BOM is moving from optional security artefact to enforceable procurement requirement, driven by EU AI Act Article 11 documentation and the CycloneDX ML-BOM specification. Enterprises tracking SBOM compliance are blindsided when AI procurement requires a different inventory shape.

  • Agentic-AI vendor contracts: the six gotchas in 2026 enterprise MSAs that procurement teams routinely miss

    2026 agentic-AI MSAs hide six contract patterns that transfer risk from vendor to enterprise. CIOs signing without redlines on all six are absorbing exposure their boards have not approved.

  • Anthropic vs OpenAI vs Google vs Microsoft for enterprise agents in 2026

    The four credible enterprise agentic AI platform plays in 2026 are Anthropic, OpenAI, Google, and Microsoft. The procurement decision between them is no longer primarily about model capability. It is about pricing model, governance and BAA posture, and ecosystem distribution. Treating it as a model-quality bake-off is the most common 2026 procurement mistake.

  • The 2026 Enterprise Agentic AI Procurement Playbook

    A six-stage procurement track integrating build-vs-buy-vs-partner, the 60-question RFP, GAUGE governance scoring, four-vendor comparison, and EU AI Act compliance into one operational sequence. Ships in 8 to 10 weeks for standard enterprise environments. Produces an audit-defensible procurement artefact that satisfies EU AI Act Article 9 by construction.

  • AI agent ROI calculator: the 2026 enterprise framework

    Eight-input ROI calculation framework for enterprise AI agent deployments. Covers what standard SaaS calculators miss: per-session-hour cost, HITL labour, instrumentation, compliance, productivity uplift, avoided incidents, revenue net of regression risk, strategic-option value.
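At its simplest the eight-input framework is benefits minus costs, over costs. The sketch below uses hypothetical figures and a flat summation purely to show the arithmetic shape; a real model would discount over time and weight the strategic-option term separately:

```python
def agent_roi(costs, benefits):
    """Net ROI = (total benefits - total costs) / total costs.

    The eight category names follow the framework summary above; the flat
    summation and the figures used below are illustrative simplifications.
    """
    total_cost = sum(costs.values())
    total_benefit = sum(benefits.values())
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (total_benefit - total_cost) / total_cost

# Hypothetical annual figures, for shape only
costs = {
    "per_session_hour": 120_000,
    "hitl_labour": 80_000,
    "instrumentation": 30_000,
    "compliance": 40_000,
}
benefits = {
    "productivity_uplift": 250_000,
    "avoided_incidents": 60_000,
    "revenue_net_of_regression": 90_000,
    "strategic_option_value": 20_000,
}
```

What the structure makes visible is the framework's core claim: four of the eight inputs are costs a standard SaaS calculator never itemises, so omitting them inflates ROI.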

  • AI agent contract exit clauses: 8 provisions for 2026

    Eight contract exit-clause provisions that standard SaaS templates do not cover but enterprise agentic AI procurement requires: audit-log export, trained-state extraction, prompt portability, connector reconfiguration, named handoff, regulatory-evidence preservation, data-residency continuity, liability-tail.

  • AI assistant vs AI agent: the procurement distinction

    AI assistants and AI agents are not the same product class. One suggests; the other acts. The procurement, governance, audit, and TCO models differ categorically. Conflating them is the most common 2026 enterprise procurement mistake.

  • The enterprise agentic AI RFP: 60 vendor questions

    Generic SaaS RFPs miss six dimensions that decide whether an agentic deployment survives 18 months. Here's the GAUGE-mapped 60-question version.

What we're watching next

  • Major frontier-vendor agent-class SLAs going public with action-correctness commitments. As of Q2 2026, no vendor publishes a contractual SLA on output validity for agent-mode workloads. The pillar's contract-gotchas piece predicts this gap closes through 2026; the first vendor to ship one resets the procurement bar.
  • Lloyd's of London or Munich Re publishing a stand-alone agent E&O wording. Existing cyber and tech-E&O policies cover agent risk under stretched interpretations of legacy policy language. A purpose-built agent E&O wording, when it lands, changes the insurance-and-underwriting cluster's verdicts and the procurement contract's risk-allocation clauses.
  • DORA enforcement actions against critical-third-party AI providers. DORA's critical-ICT-provider designation is now active in the EU. The first AI-specific enforcement action will surface what the 'audit rights' and 'sub-outsourcing disclosure' clauses actually require in practice.
  • Industry-standard agent procurement RFP language emerging from analyst firms or industry consortia. The 60-question RFP this pillar publishes is a working draft. Convergence with Gartner, Forrester, or industry-consortium drafts would consolidate procurement vocabulary; divergence would force a clearer position on what the gaps are.

Primary sources we trust for this topic

A curated list of primary research, regulator guidance, and vendor documentation for AI agent procurement. Populated on the quarterly refresh: not a link dump, not a list of competitors.


This pillar page is refreshed quarterly. Last refresh: 19 Apr 2026. Next refresh: 18 Jul 2026.
