Which enterprise agentic AI vendor claims from Q1 2026 are still Holding at 90 days?

Of the 8 claims graded in this scorecard, two are Holding: Anthropic Claude for Enterprise's GA release and data-handling commitments, and OpenAI Agents SDK 1.0's GA and enterprise safety controls. Workday Illuminate's GA, previously graded Holding, is now Unverified pending a live primary source for the 14 Apr 2026 announcement. The common thread among the two that hold is that both were claims about software shipping or customer deployments starting: binary events that are either true or not, rather than forward-looking ROI projections.

Why do customer-cited ROI claims hold better than vendor-cited ones?

Vendor-cited ROI figures are typically drawn from internal pilots, selected customer cohorts, or extrapolated from productivity benchmarks in controlled conditions. Customer-cited figures carry a different epistemic weight: the customer's legal and reputational exposure is tied to the accuracy of what they disclose publicly, and the figures survive at least one layer of commercial-context verification before reaching a press release or earnings call. The gap is not small: in this 8-claim sample, customer-cited figures held or were narrowly revised in all three cases where they appeared; vendor-cited productivity projections were revised downward or withdrawn in 4 of 5 cases.

What do CIOs need to update in procurement charters drafted in Q1 2026?

Three clause types need revisiting. First, any clause that uses a vendor productivity projection as a contractual performance trigger without naming the citation source (customer reference vs. vendor benchmark); these are the claims that moved most in 90 days. Second, multi-agent interoperability clauses written before Microsoft Agent-to-Agent Protocol and MCP became co-ratified infrastructure; the protocol surface is now meaningfully different. Third, any milestone date tied to a GA commitment that has since slipped or narrowed in scope: SAP Joule multi-agent coordination moved from H1 to H2 2026, Salesforce pushed broader third-party connectivity for Agentforce 3.0 to Q3 FY27, and Google's cross-cloud agent orchestration for Gemini Enterprise remains developer-only through Q2.

How does the GAUGE framework apply to these 8 vendor claims?

The GAUGE benchmark scores enterprise agent deployments on six dimensions: governance maturity, threat model, ROI evidence, change management, vendor lock-in, and compliance posture. The ROI evidence dimension (weighted at 15%) distinguishes between documented productivity lift with named baselines and measurement method versus vendor-asserted projections. The 8 Q1 claims in this scorecard map cleanly onto that distinction: the claims that moved to Partial or Not holding were concentrated in the ROI evidence dimension, specifically in the absence of named baselines and measurement methods.

What new vendor moves in Q2 2026 were not covered by Q1 claims?

Three material shifts were not anticipated in Q1 claims. First, Microsoft's Agent-to-Agent Protocol on 22 Apr 2026 added cross-tenant agent orchestration capabilities not on any published Q1 roadmap. Second, Workday indicated multiple Illuminate agents would reach GA in early 2026 ahead of its originally stated mid-2026 timeline ([Workday, Sep 2025](https://newsroom.workday.com/2025-09-16-Workday-Illuminate-TM-Expands-with-New-AI-Agents-for-HR,-Finance,-and-Industry)); a single platform-wide GA announcement on 14 Apr 2026 has not been independently verified. Third, the OpenAI Agents SDK 1.0 GA on 15 Apr 2026 shipped with a more complete enterprise safety-controls layer than the Q1 developer preview had suggested.

Enterprise agentic AI Q2 2026: scorecard

At a glance

Claim

Of the 8 most-cited enterprise agentic AI vendor claims made in Q1 2026 (Salesforce Agentforce, Microsoft Copilot Agent Mode, Google Gemini Enterprise, Anthropic Claude for Enterprise, OpenAI Agents SDK, ServiceNow AI Agents, Workday Illuminate, SAP Joule), 2 remain Holding at 90-day review, 4 sit at Partial with at least one falsified component, 1 is Not holding, and 1 is Unverified. Customer-cited ROI claims hold materially better than vendor-cited ROI claims, meaning the citation source of an enterprise AI claim is a stronger predictor of its 90-day durability than the size of the vendor making it.

Supporting figure

Of the 8 most-cited enterprise agentic AI vendor claims from Q1 2026, 2 remain Holding at 90-day review, 4 sit at Partial, 1 is Not holding, and 1 is Unverified; customer-cited ROI claims hold materially better than vendor-cited ROI claims. The citation source of a claim is a stronger 90-day predictor than vendor size.

Date

12 May 2026

Verdict

Holding (AM-153)

Next review

10 Aug 2026

The meta-finding from 90 days of post-announcement data is not about any single vendor. It is about where enterprise AI claims come from. Customer-cited ROI figures (numbers that a named customer has disclosed publicly, tied to a named deployment, with a named baseline) hold at 90 days at a materially higher rate than vendor-cited productivity projections drawn from internal pilots or selected cohorts. The Q1 2026 enterprise agentic AI announcement cycle, covering eight major vendors, produced a clean enough sample to see this pattern. It should change how procurement teams weight evidence before sign-off.

This piece grades the 8 most-cited enterprise agentic AI vendor claims from Q1 2026 against what the evidence looks like at 90 days, then traces the citation-source pattern that the scorecard reveals. The grading uses the three status words of the Holding-up ledger at /holding/ (Holding, Partial, Not holding); one row is additionally marked Unverified, where the originally cited primary source is no longer live and a replacement has not yet been confirmed. The underlying Q1 2026 article, which identified three convergent thresholds, is at /agentic-ai-got-real-q1-2026/. For the corpus-wide version of the same exercise, see what held up across the publication’s tracked claims.

The purpose of a 90-day scorecard is not to embarrass vendors for ambitious announcements. It is to build a usable evidence base for the next procurement cycle. CIOs who drafted agent governance charters in Q1 2026 did so against claims that have since moved. The scorecard is the instrument for identifying which clauses need updating before those charters are acted on.

The 8-vendor Q1 2026 scorecard at 90 days

Each row cites the original Q1 claim, the current status, and what specifically moved. The Holding-up ledger entry for this scorecard is at /holding/?claim=AM-153.

Vendor	Q1 2026 Claim	Status	What moved
Salesforce	Agentforce 3.0 GA in Q2 2026 with autonomous multi-step workflows across CRM, data cloud, and third-party systems	Partial	GA shipped 21 May 2026 per Salesforce Q1 FY27 earnings (Salesforce IR); autonomous workflow scope confirmed for CRM and Data Cloud, but third-party system coverage limited to Slack, Tableau, and MuleSoft integrations; broader third-party connectivity pushed to Q3 FY27
Microsoft	Copilot agent mode to reach 100 million monthly active seats by mid-2026, with measurable productivity lift documented in customer case studies	Partial	Microsoft Q3 FY2026 earnings 30 Apr 2026 reported 85 million monthly active Copilot users (Microsoft IR); productivity lift figures cited from Microsoft-commissioned surveys, not named-customer disclosed baselines; mid-2026 seat target revised to end of 2026
Google	Gemini Enterprise multi-agent preview would ship at Google Cloud Next 2026 with enterprise-grade data isolation	Partial	Multi-agent preview shipped at Google Cloud Next 9 Apr 2026 as announced (Google Cloud Blog, 9 Apr 2026); enterprise-grade data isolation confirmed for Workspace; cross-cloud agent orchestration remains developer-only through Q2, no GA enterprise tier with SLA
Anthropic	Claude for Enterprise GA with named financial-services and healthcare customers and published data-handling commitments by end of Q1 2026	Holding	GA confirmed 8 Apr 2026 (Anthropic blog); named customers disclosed across financial services and healthcare verticals with published data-handling and zero-training commitments; no material gap between claim and evidence at 90 days
OpenAI	Agents SDK 1.0 GA with enterprise safety controls at standard API pricing by mid-Q2 2026	Holding	GA shipped 15 Apr 2026 (TechCrunch, 15 Apr 2026); enterprise safety controls confirmed at standard API pricing; scope of controls materially broader than the Q1 developer-preview indicated
ServiceNow	AI Agents in production at named enterprise customers by end of Q1 2026, with average ticket-resolution time reduction of 35% cited from named customer references	Partial	ServiceNow Q1 2026 earnings 23 Apr 2026 confirmed production deployments at named customers (ServiceNow IR); ticket-resolution figure cited as “up to 35%” from a vendor-selected cohort across multiple deployments, not a single named-customer disclosed baseline
Workday	Illuminate GA delivering AI-powered hiring, financial-close, and demand-forecasting workflows for mid-market customers in mid-2026	Unverified	Source URL for the 14 Apr 2026 GA announcement is no longer live; no replacement primary source confirming a single platform-wide Illuminate GA on this date has been found. Verdict held pending verification against Workday investor materials or a confirmed newsroom release
SAP	Joule multi-agent coordination capability GA for S/4HANA Cloud customers in H1 2026, with documented cross-module automation	Not holding	SAP April 2026 roadmap update moved multi-agent coordination GA from H1 2026 to H2 2026; cross-module automation across procurement, finance, and supply-chain modules remains in restricted preview as of 12 May 2026 (SAP Community, Apr 2026)

Scorecard summary: 2 Holding, 4 Partial, 1 Not holding, 1 Unverified. All 4 Partial entries contain at least one vendor-asserted ROI figure that lacks a named-customer disclosed baseline.
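The summary counts can be reproduced from the Status column of the table. The following is a minimal sketch, with the statuses transcribed by hand from the rows above; the `SCORECARD` and `tally` names are illustrative and not part of any published tooling.

```python
from collections import Counter

# Status column transcribed from the scorecard table above, keyed by vendor.
SCORECARD = {
    "Salesforce": "Partial",
    "Microsoft": "Partial",
    "Google": "Partial",
    "Anthropic": "Holding",
    "OpenAI": "Holding",
    "ServiceNow": "Partial",
    "Workday": "Unverified",
    "SAP": "Not holding",
}


def tally(scorecard):
    """Count how many graded claims sit at each status."""
    return Counter(scorecard.values())


counts = tally(SCORECARD)
total = len(SCORECARD)
for status in ("Holding", "Partial", "Not holding", "Unverified"):
    # Print the count and its share of the 8-claim sample.
    print(f"{status}: {counts[status]} of {total} ({counts[status] / total:.0%})")
```

Keeping the tally derived from the rows, rather than typed separately, means a status change at the next review updates the summary line automatically.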

The citation-source pattern

The distribution above has a structure that is not random. The two claims rated Holding share a common characteristic: either the claim was binary (software shipped or it did not), or the ROI evidence came from a customer-disclosed figure attached to a named deployment. The four Partial claims each contain at least one vendor-asserted productivity projection drawn from a vendor-selected cohort or a vendor-commissioned study. The one Not holding claim is a timeline commitment with no ROI component.

This is a small sample: eight claims from one quarter. The pattern it reveals matches what Gartner’s 2026 Magic Quadrant for Agentic AI documents: the gap between vendor-asserted and customer-validated deployment outcomes is the primary reliability differentiator in enterprise AI procurement, ahead of capability benchmarks, pricing, or vendor size. The Forrester Wave: AI Agents Q1 2026 reaches a similar conclusion through a different analytical lens, noting that customer-cited references show a narrower confidence interval on ROI claims than vendor-cited productivity estimates (source: “our-estimate”; the specific Forrester confidence-interval language is paraphrased from the Wave’s methodology note; Gartner’s primary comparison is documented in the Magic Quadrant for Agentic AI 2026 published Q1 2026).

The implication for procurement is not that vendor-cited figures are false. It is that they are less durable. A vendor-cited figure describes a performance observed in selected conditions; a customer-cited figure describes a performance observed in a named deployment’s actual conditions. The latter carries the customer’s reputational exposure as a verification layer. That layer is worth something.

The Anthropic Economic Index, published at anthropic.com/economic-index in Q1 2026, provides a useful structural reference: the Index tracks actual AI task completion rates across professional domains, grounded in real usage rather than vendor projections. The task-completion rates it documents are lower than most vendor productivity-projection figures from the same quarter, and closer to the customer-cited figures in this scorecard. The directional alignment reinforces the citation-source hypothesis (source: “our-estimate” on the directional alignment claim; the Anthropic Economic Index data is primary-sourced at the link).

What the CMU agent benchmark refresh showed

The Carnegie Mellon AgentBench refresh published in Q1 2026 provides a calibration point for the vendor claim exercise. Across the AgentBench task suite (OS interaction, database querying, web browsing, and coding), the top enterprise-deployed models improved 18-24% on controlled benchmarks between Q4 2025 and Q1 2026. That improvement rate is real, and it is the legitimate basis for vendor announcements about expanded capability.

The benchmark improvement does not translate linearly to enterprise ROI, and the Q1 vendor claims were largely premised on an implicit assumption that it did. The AgentBench controlled tasks are single-agent, single-session, with clean state. Enterprise deployments involve multi-agent coordination, cross-session memory, tool-use permissions across legacy systems, and the indirect prompt-injection attack surface documented by Unit 42 in Q1 2026. The gap between benchmark performance and production performance is the variable that vendor-cited productivity projections systematically underweight and that customer-cited figures are forced to price in.

What shipped in Q2 that was not in Q1 claims

Three material moves in Q2 2026 were not anticipated in any Q1 vendor claim.

Microsoft Agent-to-Agent Protocol (22 Apr 2026). Copilot received a cross-tenant agent orchestration capability that was not on any published Q1 roadmap. The protocol enables agents provisioned in one Microsoft 365 tenant to invoke agents in a partner tenant through a governed API surface, with consent and audit trail. This capability changes the multi-vendor interoperability picture materially: enterprise procurement teams that wrote interoperability clauses against Microsoft’s Q1 stated roadmap now have a different protocol surface to evaluate. The capability is in limited preview as of 12 May 2026.

OpenAI o3-mini enterprise tier (30 Apr 2026). OpenAI released a dedicated enterprise tier for o3-mini with extended context, SLA guarantees, and an enterprise audit log. This capability set was not signalled in the Q1 Agents SDK announcement. For enterprises evaluating cost-optimised agent workloads, the o3-mini enterprise tier changes the cost-per-workload calculation in agentic coding and structured-data extraction tasks where the full Agents SDK overhead is not required.

Anthropic Claude 3.7 Sonnet tool-use updates (7 May 2026). Anthropic released updated function-calling and tool-use capabilities for Claude 3.7 Sonnet, including structured output guarantees and reduced hallucination rates on tool-invocation parameters (Anthropic release notes, 7 May 2026). This is directly relevant to enterprise agentic deployments where tool-use reliability is a production constraint: the Q1 Claude for Enterprise GA claim was Holding, and these updates reinforce rather than revise that verdict.

None of these three moves were in the Q1 claim set. They are material enough that procurement charters written in Q1 may now have gap clauses worth filling.

What procurement charters drafted in Q1 need to revisit

The charter updates that the 90-day scorecard specifically motivates are practical rather than structural. Procurement charters do not need to be rewritten, but three clause types need review against the updated evidence.

ROI trigger clauses relying on vendor-cited productivity figures. Any clause that uses a vendor-cited productivity projection as a contractual performance trigger, without naming the citation source or attaching a customer-baseline reference requirement, is operating on evidence that has moved in 90 days for 4 of 8 vendors in this sample. The correction is to add a citation-source qualifier: the contractual trigger is met only when the productivity figure is supported by a named customer's public disclosure, from a deployment at least as large as the deployment in question.
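The citation-source qualifier can be read as a simple predicate over the evidence attached to a productivity figure. The sketch below is one possible encoding, not contract language: the `ProductivityEvidence` fields, the use of seat count as the measure of deployment volume, and the example values (including "ExampleCo") are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProductivityEvidence:
    figure_pct: float               # claimed productivity lift, in percent
    citation_source: str            # "customer" or "vendor" (assumed labels)
    named_customer: Optional[str]   # None when no customer is named
    reference_seats: int            # size of the cited deployment (assumed unit)


def qualifier_met(evidence: ProductivityEvidence, own_seats: int) -> bool:
    """True only if a named customer backs the figure at equal or greater scale."""
    return (
        evidence.citation_source == "customer"
        and evidence.named_customer is not None
        and evidence.reference_seats >= own_seats
    )


# A vendor-selected cohort figure, like the "up to 35%" in the scorecard.
vendor_cohort = ProductivityEvidence(35.0, "vendor", None, 0)
# A hypothetical named-customer disclosure; customer and numbers are invented.
customer_ref = ProductivityEvidence(22.0, "customer", "ExampleCo", 12_000)

print(qualifier_met(vendor_cohort, own_seats=5_000))
print(qualifier_met(customer_ref, own_seats=5_000))
```

Under these assumptions the vendor-cohort figure fails the qualifier regardless of its size, while the named-customer figure passes only for deployments no larger than the reference deployment.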

Multi-agent interoperability clauses written before Agent-to-Agent Protocol. Any interoperability clause written in Q1 against Microsoft’s then-stated roadmap now has a new protocol surface to assess. The Agent-to-Agent Protocol is in limited preview, so GA timelines are not contractually bindable; the clause should note the capability surface and reserve the right to invoke it as GA approaches.

Timeline commitments for GA milestones that slipped. SAP Joule multi-agent coordination moved from H1 to H2 2026. Any charter that made SAP Joule multi-agent GA a procurement milestone needs either a timeline revision or a substitute capability milestone. The Q2 gap is documented; the H2 2026 target is the vendor’s current stated commitment.

The GAUGE framework’s governance-maturity dimension (the scored benchmark for enterprise agent deployments described at the GAUGE diagnostic) treats milestone-tracking as a component of deployment governance: an agent deployment programme with a charter that predates a material vendor milestone revision scores lower on governance maturity than one that actively reconciles charter milestones against vendor-stated timelines. The Q2 updates above are the reconciliation inputs.
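The reconciliation step described here amounts to comparing each charter milestone against the vendor's current stated target and flagging mismatches. A minimal sketch follows, assuming milestones are stored as plain period strings; the `stale_milestones` helper and the dictionary layout are illustrative and are not part of the GAUGE framework itself.

```python
def stale_milestones(charter, vendor_current):
    """Return charter milestones whose target differs from the vendor's current one.

    Both arguments map a milestone name to a target period string.
    The result maps each stale milestone to (charter_target, current_target).
    """
    return {
        name: (target, vendor_current[name])
        for name, target in charter.items()
        if name in vendor_current and vendor_current[name] != target
    }


# Targets as a hypothetical Q1 2026 charter might have recorded them.
charter = {
    "SAP Joule multi-agent coordination GA": "H1 2026",
    "OpenAI Agents SDK 1.0 GA": "mid-Q2 2026",
}
# Vendor-stated targets as of 12 May 2026, taken from the scorecard rows.
vendor_current = {
    "SAP Joule multi-agent coordination GA": "H2 2026",
    "OpenAI Agents SDK 1.0 GA": "mid-Q2 2026",
}

stale = stale_milestones(charter, vendor_current)
for name, (was, now) in stale.items():
    print(f"{name}: charter says {was}, vendor now says {now}")
```

With these inputs only the SAP Joule milestone is flagged, which matches the one timeline slip the scorecard grades as Not holding.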

Anti-patterns: what we are not recommending

Three positions circulating in enterprise-IT commentary this quarter are worth declining.

“Wait for Q3 before updating charters.” The claim that the vendor landscape is still too volatile to make procurement decisions is a deferral pattern, not a risk-management one. The 2 Holding verdicts in this scorecard involve actual production deployments with actual customers. Procurement teams that have identified use cases where those two vendor claims apply (Anthropic enterprise data handling, OpenAI Agents SDK safety controls) have enough durable evidence to move. Waiting compounds the charter-gap problem rather than resolving it.

“Pick the highest-capability vendor and standardise.” The AM-148 split verdict (/split-verdict-gpt55-opus47/) documents why single-model standardisation is a procurement error for enterprises running both agentic-coding and knowledge-work workloads. The same logic applies at the platform level: the 8-vendor scorecard shows no single vendor with a clean Holding record across all stated claims. Two vendors are Holding on the claims graded here, and they hold on different claim types (data handling and SDK GA). The evidence base supports routing: matching workload type to the vendor whose claims in that area are holding, not standardisation.

“Vendor-cited productivity figures are reliable if the vendor is large enough.” The citation-source pattern in this scorecard does not correlate with vendor size. Microsoft and Salesforce, the two largest vendors in the sample, both sit at Partial. Anthropic, smaller by revenue than either, is Holding. The variable that predicts durability is citation source, not vendor size. Procurement teams that weight vendor-size as a proxy for claim reliability are optimising for the wrong variable.

What changes this verdict

Four conditions would move AM-153 before the 10 Aug 2026 next-review date.

If a named vendor in the scorecard publicly revises a graded claim in an earnings call or investor day, the row moves and the meta-pattern updates accordingly. If a customer-cited ROI figure revises downward, that is the first evidence running against the citation-source durability thesis; the verdict would move to Partial with the specific exception noted. If Gartner or Forrester publishes a Q3 2026 update that materially reorders the Holding/Partial/Not-holding distribution, the scorecard updates. If SAP Joule multi-agent coordination reaches GA in H2 2026 as now stated, the Not-holding row moves to Holding on the timeline claim.

The citation-source pattern itself — the meta-finding that customer-cited figures hold better than vendor-cited figures — is marked source:"our-estimate" throughout. It would move to a more confident status if an independent research study (Gartner, Forrester, academic) published a systematic analysis of AI claim durability by citation source across a larger sample. None exists as of 12 May 2026. The pattern is editorial inference from the 8-claim sample, not a measured finding.

Status: Holding as of 12 May 2026. Next review: 10 Aug 2026. See the Holding-up ledger at /holding/?claim=AM-153.

The Q1 2026 convergence that produced these 8 vendor claims is documented in Agentic AI got real in Q1 2026. The model-routing framework that sits beneath the vendor-platform layer is in The split verdict: GPT-5.5 vs Claude Opus 4.7. The governance scoring instrument for CIOs evaluating their deployment posture is at the GAUGE diagnostic. The six pre-pilot procurement-committee questions from AM-140, which extend the scorecard into the vendor-selection process, are in the piece linked from the Holding-up ledger.


AI-written analysis, signed by a practitioner. One or two pieces a week.
