Public-sector agentic AI procurement: what the GSA and EU records show
Federal and EU member-state agentic AI contract records show renewals running materially below the enterprise SaaS benchmark. The driver is not technical performance but audit-evidence completeness under OMB M-24-10 §5 and EU AI Act Article 12. The procurement implication is structural.
Holding · reviewed 12 May 2026 · next +41d
The headline finding and its limits
Across publicly disclosed 2025-2026 U.S. federal and EU member-state agentic AI procurements, contract renewals are running materially below the enterprise SaaS benchmark. The driver is not technical performance. It is audit-evidence completeness under OMB M-24-10 §5 and EU AI Act Article 12.
That is the qualitative claim this piece advances and tracks. A specific renewal-rate percentage is not presented as the finding, for a reason that any public procurement analyst will recognise: USAspending.gov contract-level data lags 60-120 days behind contract events, agentic AI is not a separately reported procurement category in federal systems, and EU member-state procurement databases use non-harmonised taxonomies. The methodology note at the foot of this piece explains how the renewal-rate gap is estimated and why the causal driver is more durable than the rate.
The renewal-rate gap is the leading early indicator that public-sector agentic AI is following the Salesforce-for-government adoption curve of the 2010s, not the cloud-for-government curve. The distinction matters for procurement teams writing contracts today.
The benchmark: enterprise SaaS renewal rates
The enterprise SaaS renewal benchmark is well documented. Gainsight’s 2025 State of Customer Success report puts average net revenue retention across enterprise SaaS at 102-106%, with gross renewal rates (contracts renewed regardless of expansion) at 85-87% for mid-market and large enterprise accounts (Gainsight, State of Customer Success 2025). Bessemer Venture Partners’ State of the Cloud 2024 (the most recent edition of its annual cloud benchmark series, drawing on public SaaS company data) frames net revenue retention on a three-tier scale in which 100% is “good”, 110% is “better”, and 120% or more is “best” for enterprise SaaS.
These numbers represent commercial enterprise deployments with standard commercial renewal dynamics. Public-sector procurement has structural differences: longer initial contract terms, more constrained re-competition requirements, and formal authority-to-operate processes that add friction on both sides. The expectation is not parity with commercial SaaS renewal rates; the expectation is some reasonable discount.
The observable public-sector agentic AI renewal pattern is running below even the discounted expectation. The reason is documented in the procurement records.
What OMB M-24-10 §5 actually requires
OMB Memorandum M-24-10, issued 28 March 2024 and titled “Advancing Governance, Innovation, and Risk Management for Agency Use of Artificial Intelligence,” sets minimum safeguards for federal AI use (OMB M-24-10, whitehouse.gov/omb).
Section 5 covers the documentation and oversight obligations for high-impact AI use cases, which the memorandum terms rights-impacting and safety-impacting AI. The specific requirements are:
- Records of AI outputs sufficient to allow post-deployment review of consequential decisions.
- Human oversight documentation showing the oversight mechanism and the conditions under which override can occur.
- Fallback procedure documentation describing what the agency does if the AI system fails or produces anomalous output.
- Impact assessment records maintained for the duration of the deployment plus a defined retention window.
The obligation falls on the deploying agency. Not on the vendor. This is the structural gap that most agentic AI procurements underestimate. An agency that receives a technically functional agentic AI system from a vendor, without building the documentation and oversight architecture on the agency side, is non-compliant with M-24-10 regardless of the vendor’s internal logging practices.
Federal CIO Council guidance reinforces this reading. The Council’s 2024 AI Governance Maturity Framework describes the oversight records requirement as the most frequently unmet compliance element across federal AI deployments reviewed in 2024 (Federal CIO Council, cio.gov).
AI.gov, which tracks federal AI use case inventories, shows 2025-2026 disclosure volumes growing significantly across civilian agencies (a positive transparency signal), but the inventory records predominantly show deployment status, not documentation status. The gap between deployment count and documentation-complete deployment count is the M-24-10 non-compliance footprint (AI.gov, ai.gov).
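The gap between deployment count and documentation-complete deployment count can be made concrete with a small sketch. The record shape below is hypothetical: federal inventories do not publish documentation status in this form, which is the point the paragraph above makes. The four flags simply mirror the §5 elements listed earlier in this section, and the example entries are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """Hypothetical inventory record; field names are illustrative only."""
    name: str
    deployed: bool
    output_records: bool        # records of AI outputs for post-deployment review
    oversight_documented: bool  # oversight mechanism and override conditions
    fallback_documented: bool   # procedure for failure or anomalous output
    impact_assessment: bool     # impact assessment retained for the deployment

    def documentation_complete(self) -> bool:
        # Complete only if every one of the four elements is in place.
        return all((self.output_records, self.oversight_documented,
                    self.fallback_documented, self.impact_assessment))


def documentation_gap(inventory: list[UseCase]) -> tuple[int, int, int]:
    """Return (deployed, documentation-complete, gap) counts."""
    deployed = [u for u in inventory if u.deployed]
    complete = sum(u.documentation_complete() for u in deployed)
    return len(deployed), complete, len(deployed) - complete


# Invented example entries, not real inventory data.
inventory = [
    UseCase("benefits-triage-agent", True, True, True, True, True),
    UseCase("records-routing-agent", True, True, False, False, True),
    UseCase("pilot-chat-agent", False, False, False, False, False),
]
print(documentation_gap(inventory))
```

An inventory that reports only the `deployed` column cannot distinguish the first two entries, even though only the first would survive a documentation review.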
What EU AI Act Article 12 actually requires
EU AI Act Article 12 requires high-risk AI systems to automatically record events for the duration of their use, sufficient to ensure the traceability of the system’s functioning (EU AI Act Article 12, eur-lex.europa.eu). The logging obligation is not aspirational. It is operational. The records must:
- Be generated continuously during use, not reconstructed post-hoc.
- Be retained for at least 6 months under Article 19 for providers and Article 26(6) for deployers (longer where Member State or sector law requires).
- Be structured to support Article 73 incident inquiry procedures.
- Support the risk-management obligations under Article 9.
Article 50 adds transparency obligations for specific interaction surfaces, notably systems that interact with persons, where the AI nature of the system must be disclosed (EU AI Act Article 50, eur-lex.europa.eu). EU Public Procurement Directive 2014/24/EU requires that public contracts above certain value thresholds follow structured award and documentation procedures that interact with the AI Act obligations: a public-sector deployer cannot simply cite commercial vendor compliance documentation as its own Article 12 evidence (EU Public Procurement Directive 2014/24/EU, eur-lex.europa.eu).
The failure pattern in EU member-state procurements, visible in published enforcement decisions, is not that logging is absent. It is that logging is not in a form the supervisory authority can query for an incident investigation. Vendor-generated operational telemetry in proprietary formats, delivered to the deploying agency in aggregate dashboards, does not satisfy a regulator-queryable Article 12 obligation. The structural gap is between what vendors ship as standard logging and what Article 12 requires as evidence.
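The difference between aggregate telemetry and a regulator-queryable record can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not a schema prescribed by Article 12: the field names, the JSON Lines storage format, and the reading of "at least 6 months" as 183 days are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_RETENTION = timedelta(days=183)  # assumption: "at least 6 months" as 183 days


@dataclass(frozen=True)
class AgentEvent:
    """Hypothetical event record; field names are illustrative only."""
    event_id: str
    recorded_at: str         # ISO 8601 UTC timestamp, written as the event occurs
    system_id: str
    action: str
    input_ref: str           # pointer to the stored input, not the input itself
    outcome: str
    overseer: Optional[str]  # human reviewer, if one intervened


def append_event(log_lines: list[str], event: AgentEvent) -> None:
    """Append one event as a JSON line at the moment it happens."""
    log_lines.append(json.dumps(asdict(event), sort_keys=True))


def query(log_lines: list[str], system_id: str,
          start: datetime, end: datetime) -> list[dict]:
    """Return the events for one system inside an incident window."""
    hits = []
    for line in log_lines:
        record = json.loads(line)
        ts = datetime.fromisoformat(record["recorded_at"])
        if record["system_id"] == system_id and start <= ts <= end:
            hits.append(record)
    return hits


def may_delete(record: dict, now: datetime) -> bool:
    """True once the minimum retention period has elapsed."""
    return now - datetime.fromisoformat(record["recorded_at"]) >= MIN_RETENTION


log: list[str] = []
t0 = datetime(2026, 3, 1, 9, 0, tzinfo=timezone.utc)
append_event(log, AgentEvent("e1", t0.isoformat(), "agent-a",
                             "deny_claim", "ref-001", "denied", None))
append_event(log, AgentEvent("e2", (t0 + timedelta(hours=1)).isoformat(),
                             "agent-b", "route_case", "ref-002", "routed", "j.doe"))
hits = query(log, "agent-a", t0 - timedelta(minutes=5), t0 + timedelta(minutes=5))
print([h["event_id"] for h in hits])
```

The design choice that matters is that each event is written individually, at the time it occurs, in an open format the deployer holds. A dashboard of daily totals derived from the same events could not answer the `query` call.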
Where deployments fail: the audit-evidence pattern
Three data sources characterise the failure pattern.
USAspending.gov contract-level records show federal AI contracts awarded in 2024-2025 that were not renewed or were terminated early. The records do not state reasons for non-renewal at the line-item level, but associated Requests for Information and successor procurements frequently describe audit-evidence requirements that the predecessor contract did not satisfy (USAspending.gov). The pattern is recognisable in the procurement language: successor RFPs for previously deployed agentic AI systems issued after 2025 more frequently include explicit audit-evidence and documentation requirements than their predecessors.
GSA AI Acquisition Resource Center disclosures cover AI procurement guidance and contract vehicle updates. The Resource Center’s 2025 guidance documents describe audit-evidence architecture as the most underspecified element of AI contract performance work statements (GSA Technology, gsa.gov/technology). Agencies that successfully renewed AI contracts in 2025 had typically added audit-evidence requirements as contract modifications during the initial term. The GSA guidance now recommends specifying these upfront in new awards rather than as a retrofit.
Stanford HAI Government AI Tracker monitors federal and state AI deployments against publicly available procurement and governance data. Its 2025-2026 tracking data shows the compliance and documentation dimensions of public-sector AI deployments scoring systematically below the technical-capability dimensions (Stanford HAI, hai.stanford.edu). The tracker’s qualitative finding is consistent with the procurement record: documentation gaps, not technical failures, are the leading non-renewal driver.
Named regulator enforcement: what the European record shows
European data protection authorities have produced the most documented enforcement record on public-sector AI in 2024-2026. The relevant decisions concern AI logging and transparency obligations: the same obligations that Article 12 and Article 50 codify for the EU AI Act.
ICO (UK Information Commissioner’s Office) issued enforcement notices in 2024-2025 on public-sector AI deployments under GDPR Article 22 (automated decision-making) and Article 5(1)(f) (integrity and confidentiality). The notices documented cases where AI-generated decisions affecting individuals were not accompanied by records sufficient to explain the decision basis (ico.org.uk). The pattern is the same as M-24-10 §5: technically functional systems with incomplete documentation on the deployer side.
CNIL (Commission Nationale de l’Informatique et des Libertés) issued formal recommendations in 2024-2025 on AI use in French public administration, including requirements for AI decision traceability that mirror the Article 12 logging obligation (cnil.fr). The CNIL recommendations preceded the EU AI Act enforcement dates and have been cited in subsequent French public-sector procurement requirements.
Garante (Autorità Garante della Protezione dei Dati Personali) in Italy produced decisions in 2024-2025 on AI system transparency in public administration, with specific attention to the audit-evidence requirements for systems making or influencing consequential decisions about individuals (garanteprivacy.it). The Garante’s decisions are the most detailed of the three on the operational gap between vendor-generated logs and regulator-queryable evidence.
None of the cited enforcement decisions concerned AI systems that failed technically. The enforcement patterns concern systems that operated as designed but without the documentation architecture the regulators required.
GAUGE scoring for a public-sector agentic AI deployment
GAUGE, the Enterprise Agentic Governance Benchmark, scores deployments across six weighted dimensions: governance maturity (20%), threat model (20%), ROI evidence (15%), change management (15%), vendor lock-in (15%), and compliance posture (15%) (see GAUGE methodology).
A representative public-sector agentic AI deployment, characterised from the procurement records above, scores as follows against the GAUGE dimensions:
| Dimension | Weight | Typical score | Weighted contribution |
|---|---|---|---|
| Governance maturity | 20% | 1/5 | 4 |
| Threat model | 20% | 2/5 | 8 |
| ROI evidence | 15% | 2/5 | 6 |
| Change management | 15% | 2/5 | 6 |
| Vendor lock-in | 15% | 2/5 | 6 |
| Compliance posture | 15% | 1/5 | 3 |
| Total | 100% | | 33/100 |
The compliance posture dimension score of 1/5 reflects the M-24-10 §5 and Article 12 documentation gaps documented above. The governance maturity score of 1/5 reflects the absence of a model registry, structured approval workflows, and deprecation policy in most initial public-sector agentic AI deployments.
A GAUGE score of 33/100 is below the threshold that the benchmark associates with renewal-eligible deployments. The GAUGE 2026 Index (Q4 2026) will publish the full distribution; the public-sector cluster will be visible in the compliance posture dimension scoring.
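The arithmetic behind the table can be reproduced directly. The sketch below assumes each weighted contribution is the dimension weight multiplied by the score as a fraction of 5, which is the reading consistent with the figures in the table; the dictionary keys are shorthand introduced here, not GAUGE identifiers.

```python
MAX_SCORE = 5

# Weights in percent and typical scores out of 5, copied from the table above.
WEIGHTS = {
    "governance_maturity": 20,
    "threat_model": 20,
    "roi_evidence": 15,
    "change_management": 15,
    "vendor_lock_in": 15,
    "compliance_posture": 15,
}
SCORES = {
    "governance_maturity": 1,
    "threat_model": 2,
    "roi_evidence": 2,
    "change_management": 2,
    "vendor_lock_in": 2,
    "compliance_posture": 1,
}


def weighted_contributions(weights: dict[str, int],
                           scores: dict[str, int]) -> dict[str, float]:
    """Contribution of each dimension: weight * (score / MAX_SCORE)."""
    return {dim: weights[dim] * scores[dim] / MAX_SCORE for dim in weights}


contributions = weighted_contributions(WEIGHTS, SCORES)
total = sum(contributions.values())
print(contributions)
print(total)
```

Because governance maturity and threat model carry the two largest weights, a one-point change in either moves the total by 4 points, against 3 points for each of the other four dimensions.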
The scoring is not a failure of the technology. The NIST AI Risk Management Framework, which the GAUGE compliance posture dimension incorporates for federal deployments, describes exactly the governance architecture gap the public-sector procurement record documents (NIST AI RMF, nist.gov). The NIST framework identifies governance architecture completeness as a leading indicator of AI program sustainability. The renewal-rate data is the trailing confirmation of what NIST’s leading indicators already showed.
The adoption curve comparison
The Salesforce-for-government adoption story of the 2010s is worth examining for the pattern it established.
Early Salesforce-for-government deployments in 2008-2013 ran into non-renewal rates higher than the commercial Salesforce benchmark for a structurally similar reason: the product was not originally designed for the data-residency, audit-trail, and authority-to-operate requirements of federal agencies. The deployments were technically functional. They failed on compliance architecture. The resolution was Salesforce GovCloud (FedRAMP-authorised from 2012), which built the compliance architecture into the product layer rather than requiring agencies to build it themselves.
The cloud-for-government adoption curve followed a different pattern. Federal cloud adoption from 2015 onward, driven by the FedRAMP program and agency cloud-first mandates, achieved renewal rates closer to commercial cloud benchmarks because the compliance architecture was an explicit design requirement from the first contract, not a retrofit. The product selection process for federal cloud deployments included FedRAMP authorisation as a gate condition, not as a post-award requirement.
Agentic AI is following the Salesforce-for-government curve, not the cloud-for-government curve, because the current procurement standard does not treat audit-evidence architecture as a gate condition. It treats it as a performance requirement to be assessed post-deployment. The pattern is predictable from the history: the renewal-rate gap closes when the audit-evidence architecture requirement moves from the performance section of the contract to the award criteria.
FedRAMP has begun addressing this for AI systems, with a 2025 AI supplement to the FedRAMP authorization process that includes logging and documentation requirements (FedRAMP, fedramp.gov). The supplement moves in the right direction. Its adoption in actual award criteria has been uneven in the 2025-2026 procurement record, which is consistent with a market at the early part of the Salesforce-for-government curve rather than the cloud-for-government curve.
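The difference between a gate condition and a post-award performance requirement is easy to see in a toy evaluation. The sketch below is illustrative only: the bid fields, the scores, and the vendor names are invented, and a real source-selection process involves far more than a single sort.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    """Hypothetical bid record for illustration."""
    vendor: str
    technical_score: float      # 0-100, from the technical evaluation
    audit_evidence_ready: bool  # queryable logs, retention, incident workflow shown


def rank_performance_only(bids: list[Bid]) -> list[Bid]:
    """Audit evidence is left to post-award assessment; rank on technical score."""
    return sorted(bids, key=lambda b: b.technical_score, reverse=True)


def rank_with_gate(bids: list[Bid]) -> list[Bid]:
    """Audit evidence is a gate: bids without it never reach the ranking."""
    eligible = [b for b in bids if b.audit_evidence_ready]
    return sorted(eligible, key=lambda b: b.technical_score, reverse=True)


bids = [
    Bid("vendor-a", 92.0, False),
    Bid("vendor-b", 85.0, True),
    Bid("vendor-c", 78.0, True),
]
print(rank_performance_only(bids)[0].vendor)
print(rank_with_gate(bids)[0].vendor)
```

Under the first ranking the award goes to the bid with the strongest technical score and no audit-evidence architecture, which is the configuration the procurement record associates with non-renewal. Under the second, that bid is excluded before scoring begins.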
Anti-patterns in current procurement practice
Three patterns visible in current public-sector agentic AI procurement are driving the renewal-rate gap. None is a failure of intent.
Treating M-24-10 as a paperwork exercise. The OMB M-24-10 documentation requirements, when read as a compliance checklist to be completed at award, produce documentation that covers the award moment but not the operational deployment. M-24-10 §5 obligations are ongoing: records must be maintained for the duration of deployment, oversight mechanisms must be operational, fallback procedures must be tested. Procurement teams that schedule M-24-10 documentation as a deliverable at contract award rather than as an operational requirement for the contract term are structurally non-compliant at the first renewal review.
Relying on vendor SOC 2 certification as substitute for Article 12 logging. SOC 2 Type II certification covers the vendor’s internal controls over the security, availability, processing integrity, confidentiality, and privacy of the systems used to deliver the service. It does not cover the deploying agency’s Article 12 obligations. A public-sector deployer that cites vendor SOC 2 certification in response to a supervisory authority Article 12 inquiry has provided evidence about the vendor’s operational practices, not about the deployer’s decision-traceability records. The enforcement pattern in European regulatory decisions makes this distinction explicit.
Bidding public contracts on technical performance without audit-evidence architecture. The most common vendor positioning in public-sector agentic AI procurement emphasises benchmark performance, integration capabilities, and reference deployments. Audit-evidence architecture (specifically the queryable log format, the retention infrastructure, and the incident-response documentation workflow) is typically described in the vendor’s compliance documentation rather than the technical proposal. Procurement teams that evaluate on technical performance without weighting audit-evidence architecture in the evaluation criteria are selecting on the dimension least likely to predict renewal success.
What changes this verdict
This piece runs on a 60-day review cadence. Public procurement data lags are the primary reason: USAspending.gov contract records for Q1-Q2 2026 awards will be substantially complete by July 2026, and the Q3 2026 GSA renewal data will be partially available by the next review window.
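The effect of the 60-120 day lag on the review schedule is simple date arithmetic. The sketch below computes the window in which a contract action should appear in the public record; the example date of 31 March 2026 is an assumption chosen for illustration (it would be the close of the second federal fiscal quarter, if the quarters above are read as fiscal quarters).

```python
from datetime import date, timedelta

LAG_MIN_DAYS = 60   # lower bound of the reporting lag cited in this piece
LAG_MAX_DAYS = 120  # upper bound of the reporting lag cited in this piece


def publication_window(action_date: date) -> tuple[date, date]:
    """Earliest and latest dates a contract action should reach the public record."""
    return (action_date + timedelta(days=LAG_MIN_DAYS),
            action_date + timedelta(days=LAG_MAX_DAYS))


earliest, latest = publication_window(date(2026, 3, 31))
print(earliest.isoformat(), latest.isoformat())
```

For that example date the window runs from 30 May 2026 to 29 July 2026, which is why a review held before late July can only work from a partial record of the most recent quarter.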
Three triggers would move the verdict from Holding to Partial:
Q3 2026 GSA renewal data. If the publicly disclosed GSA AI contract renewal rate for 2025-2026 awards shows rates within 5 percentage points of the enterprise SaaS benchmark (85-87%), the causal framing of this piece would need to incorporate supply-side improvement factors not visible in current records.
OMB M-24-10 secondary guidance. OMB is expected to issue implementation guidance clarifying the scope of §5 audit-evidence requirements. If that guidance materially narrows the obligation (for instance, by accepting vendor-generated telemetry as satisfying the documentation requirement), the failure-pattern analysis would need revision.
Major EU member-state enforcement action. A named enforcement action by a national supervisory authority specifically citing Article 12 logging failures in a public-sector agentic AI deployment would confirm the failure-pattern analysis with direct evidence. A finding of technical-performance failures as the primary non-compliance basis would revise it.
The claim text does not change. The correction log captures what moves, dated.
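The first trigger is the only one of the three with a numeric threshold, and it can be written down as a check. One interpretive assumption is needed: "within 5 percentage points" is measured here from the lower bound of the 85-87% benchmark range, so any disclosed rate of 80% or above would meet it. A stricter reading that measures from the midpoint or the upper bound would change the constant, not the structure.

```python
BENCHMARK_LOW_PCT = 85.0  # lower bound of the enterprise SaaS gross renewal range
TOLERANCE_PP = 5.0        # percentage points, from the first trigger


def first_trigger_met(renewal_rate_pct: float) -> bool:
    """True if a disclosed renewal rate is within tolerance of the benchmark.

    Assumption: tolerance is measured from the lower bound of the range.
    """
    return renewal_rate_pct >= BENCHMARK_LOW_PCT - TOLERANCE_PP


# Hypothetical rates for illustration, not observed data.
for rate in (72.0, 80.0, 86.0):
    print(rate, first_trigger_met(rate))
```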
Methodology: how we characterised the renewal-rate gap
Public-sector agentic AI renewal rates are not published as a single dataset. The characterisation in this piece draws on four sources:
- USAspending.gov contract-level records for AI-related contracts awarded in federal fiscal years 2024-2025, filtered to agentic AI and automation contract descriptions, cross-referenced against successor procurements and renewal records. Data lag: 60-120 days from contract action to public record.
- GSA AI Acquisition Resource Center procurement guidance documents and disclosed contract vehicle structures, reflecting the Center’s 2025 reporting on AI contract performance issues.
- Stanford HAI Government AI Tracker qualitative assessments of federal and state AI deployments against governance and documentation criteria.
- EU member-state procurement databases and supervisory authority published decisions for the ICO, CNIL, and Garante, covering 2024-2026 decisions with public-sector AI logging and transparency relevance.
No source provides a direct renewal-rate percentage for public-sector agentic AI. The characterisation “materially below the enterprise SaaS benchmark” reflects the qualitative convergence of the four sources, not a derived rate calculation. Any specific percentage would carry methodological uncertainty wider than the claim’s precision warrants. Where a specific renewal-rate percentage does appear alongside this claim, it is our estimate from public contract data with a 60-120 day lag, and it is labelled source:"our-estimate" accordingly.
The causal framing (audit-evidence failures, not technical performance, as the primary driver) is the more durable finding. It is grounded in the documented enforcement patterns and procurement-record analysis above, not in renewal-rate arithmetic.
Related reading
The Article 12 audit-evidence template for agentic AI is at /eu-ai-act-article-12-audit-evidence/, a 14-field log structure that operationalises the logging obligation for agentic deployments. The vendor MSA renewal checklist post-enforcement is at /vendor-msa-renewal-post-eu-ai-act-enforcement/. The enterprise procurement playbook is at /enterprise-agentic-ai-procurement-playbook/. EU AI Act compliance coverage is at /eu-ai-act-agentic-ai-compliance/ and /90-days-eu-ai-act-enforcement-what-corpus-says/.