Anthropic case study: What it takes to build accurate AI analytics for the enterprise

Vivek Asija

Geetesh Iyer

Anthropic pointed one of the world's most capable AI models at a data warehouse. It was right 21% of the time. Anthropic's write-up is the most honest account to date of why enterprise AI analytics is hard. Here's what it actually tells us, and what it means for everyone else.

Can you connect Claude to a warehouse and let your team ask questions?

Yes. But do it naively and you can expect something close to the 21% accuracy Anthropic started with.

That number comes from Anthropic itself. A recent Anthropic blog post describes how the company's own data team built an internal analytics system on Claude, a project that took months of context engineering before the analytics agent worked reliably in production. Getting from 21% to 95% required building, and then actively maintaining, a multi-layer infrastructure.

That's a number worth sitting with. This isn't some underfunded team working with a second-tier model. This is Anthropic, with Claude, in-house.

The post names three failure modes that account for nearly every analytics error. I want to walk through each one, because they aren't Anthropic's problems alone. They're the everyday reality of enterprise AI analytics.

In fact, OpenAI observed nearly identical patterns while building its Data Agent. This isn't a Claude issue. It's structural.

Failure 1: AI doesn’t know what your business’s questions mean

Ask "what were active users last month?" and every word is ambiguous. Active could mean logged in, transacted, or engaged. User could mean account, seat, or person. Last month could mean the trailing 30 days or the previous calendar month, in any of a dozen time zones.
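To make the "last month" ambiguity concrete, here is a small Python sketch that computes two common readings of the phrase for the same day. The function names are illustrative, not taken from Anthropic's post. Both functions return half-open ranges [start, end), and time zones are deliberately left out; handling them would multiply the variants further.

```python
from datetime import date, timedelta


def previous_calendar_month(today: date) -> tuple[date, date]:
    """'Last month' read as the previous calendar month, as [start, end)."""
    first_of_this_month = today.replace(day=1)
    last_day_of_prev_month = first_of_this_month - timedelta(days=1)
    return last_day_of_prev_month.replace(day=1), first_of_this_month


def trailing_30_days(today: date) -> tuple[date, date]:
    """'Last month' read as the 30 days before today, as [start, end)."""
    return today - timedelta(days=30), today


today = date(2025, 3, 10)
print(previous_calendar_month(today))  # Feb 1 up to (not including) Mar 1
print(trailing_30_days(today))         # Feb 8 up to (not including) Mar 10
```

Run on March 10, 2025, the two readings overlap only from Feb 8 through Feb 28, so a query built on one will quietly disagree with a dashboard built on the other.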

Without an enforced, authoritative definition, the AI picks whichever interpretation seems most plausible. When that guess is wrong, it's confidently wrong.

Anthropic's finding: "The most common failure is that the agent can't map a concept to the single correct table, column, and metric definition, usually because there are multiple plausible candidates."

This is concept-entity ambiguity. Not a model problem. A context problem.

Failure 2: Your context decays, and AI doesn’t know it

Data models change constantly. Columns get renamed. Business logic evolves. Documentation that was accurate last quarter can be silently wrong today.

When skill documentation wasn't kept current with data model changes, Anthropic watched accuracy drop from 95% to 65% in a single month. The AI model didn't change. The data model underneath it did, and the context describing it went stale.

This is the scariest failure because it's invisible. You don't get an error. You get a confidently wrong number that someone makes a decision on.

Failure 3: More data doesn’t fix the accuracy problem

For both problems mentioned above, the intuitive fix is to throw more context at the AI: historical queries, dashboards, analyst notebooks. More data, better answers. Right?

Wrong. Anthropic found that giving the agent access to thousands of historical queries moved accuracy by less than one percentage point. The information was there, and the agent could see it, but it still couldn't use it reliably.

Anthropic's conclusion: "Accuracy is not a code generation issue. It is a context quality and execution problem."

The bottleneck isn't data access. It's structure and infrastructure.

What it takes to get to 95% accuracy

This is where the post gets interesting.

To go from 21% to 95%, Anthropic's team had to assemble a dimensional data model, a semantic layer, lineage tracking, four types of reference documents, adversarial review agents, offline evaluation suites, provenance footers, passive monitoring, and active correction harvesting. Then they had to actively maintain all of it to prevent degradation.

Figure: The agentic analytics stack (source: Anthropic)

That's not a feature. That's a system. And it took months to build.

And when maintenance lapsed and skill documentation drifted from the data model, accuracy dropped 30 points in a single month.
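The "offline evaluation suites" in Anthropic's list are what turn accuracy into a number you can track, such as 21%, 95%, or 65%. Below is a minimal Python sketch of the idea, assuming a golden set of questions with analyst-verified numeric answers and an agent exposed as a plain function. `GoldenCase`, `evaluate`, and the stub agent are hypothetical; this is not Anthropic's harness.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    question: str
    expected: float  # answer verified by an analyst (assumed numeric)


def evaluate(agent: Callable[[str], float], cases: list[GoldenCase], tol: float = 1e-6) -> float:
    """Return the fraction of golden cases the agent answers within `tol` of the expected value."""
    if not cases:
        raise ValueError("evaluation set is empty")
    correct = sum(1 for case in cases if abs(agent(case.question) - case.expected) <= tol)
    return correct / len(cases)


# Usage with a stub agent that looks answers up in a dict (hypothetical values).
stub_answers = {"active users last month": 1200.0, "Q1 revenue": 5.0e6}
cases = [
    GoldenCase("active users last month", expected=1250.0),
    GoldenCase("Q1 revenue", expected=5.0e6),
]
print(evaluate(lambda q: stub_answers[q], cases))  # 0.5
```

Rerunning the same suite after every schema or documentation change is what makes a drop like 95% to 65% visible instead of silent.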

WisdomAI provides this infrastructure as a product out of the box

WisdomAI's Adaptive Context Engine was built around exactly this diagnosis. Where Anthropic's team spent months engineering its own solutions, the Adaptive Context Engine ships them as a production system on day one.

Ambiguity

The Adaptive Context Engine maintains an authoritative map of every business concept to its governed definition, canonical data source, and calculation logic. When a question comes in, every ambiguous term resolves against a tested definition before a single query is written.

For example, if two teams have defined the same metric differently, the system surfaces the conflict rather than silently choosing one. Every question maps to one authoritative answer. Not the most plausible one, but the correct one.
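A minimal Python sketch of the "surface the conflict, don't pick one" behavior follows. `MetricRegistry`, `MetricDefinition`, and the example tables are hypothetical illustrations, not WisdomAI's API. The sketch compares definitions as raw strings; a real system would need to normalize SQL before deciding two definitions differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str        # business term, e.g. "active users"
    owner: str       # team that registered the definition
    table: str       # canonical source table
    expression: str  # SQL expression that computes the metric


class AmbiguousMetricError(Exception):
    """Raised when one business term has more than one distinct definition."""


class MetricRegistry:
    def __init__(self) -> None:
        self._defs: dict[str, list[MetricDefinition]] = {}

    def register(self, definition: MetricDefinition) -> None:
        # Terms are matched case-insensitively.
        self._defs.setdefault(definition.name.lower(), []).append(definition)

    def resolve(self, term: str) -> MetricDefinition:
        candidates = self._defs.get(term.lower(), [])
        if not candidates:
            raise KeyError(f"no governed definition for {term!r}")
        # Compare (table, expression) pairs as raw strings; no SQL normalization.
        distinct = {(d.table, d.expression) for d in candidates}
        if len(distinct) > 1:
            owners = ", ".join(sorted({d.owner for d in candidates}))
            raise AmbiguousMetricError(f"{term!r} is defined differently by: {owners}")
        return candidates[0]


registry = MetricRegistry()
registry.register(MetricDefinition("active users", "growth", "events", "COUNT(DISTINCT user_id)"))
registry.register(MetricDefinition("active users", "finance", "orders", "COUNT(DISTINCT account_id)"))
try:
    registry.resolve("Active Users")
except AmbiguousMetricError as err:
    print(err)  # 'Active Users' is defined differently by: finance, growth
```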

Staleness

Anthropic had to engineer its continuous accuracy loop by hand: colocating skill docs in the code repo, wiring up CI checks, and running correction-harvesting agents. The Adaptive Context Engine runs that loop automatically.

When schemas change or business logic evolves, the Adaptive Context Engine detects the change, updates affected context, and re-validates dependent answers. User feedback, including informal corrections in chat, feeds back into the context continuously. No engineering team required.
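One piece of that loop, working out which context a schema change invalidates, can be sketched in a few lines of Python. The sketch assumes each context document records the `table.column` names it relies on; the function names and data are hypothetical, not WisdomAI's implementation. It only catches removed or renamed columns, not a change in what a column means.

```python
def removed_columns(old: dict[str, set[str]], new: dict[str, set[str]]) -> set[str]:
    """Return 'table.column' names present in the old schema but absent from the new one."""
    missing: set[str] = set()
    for table, columns in old.items():
        # A dropped table counts as dropping all of its columns.
        for column in columns - new.get(table, set()):
            missing.add(f"{table}.{column}")
    return missing


def stale_context(references: dict[str, set[str]], missing: set[str]) -> list[str]:
    """Return IDs of context docs that reference any removed or renamed column."""
    return sorted(doc_id for doc_id, refs in references.items() if refs & missing)


old_schema = {"orders": {"id", "amount", "created_at"}}
new_schema = {"orders": {"id", "amount_usd", "created_at"}}  # "amount" was renamed
doc_refs = {"revenue.md": {"orders.amount"}, "order_count.md": {"orders.id"}}

print(stale_context(doc_refs, removed_columns(old_schema, new_schema)))  # ['revenue.md']
```

The flagged documents are the ones whose dependent answers would need re-validation before they are trusted again.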

Retrieval

The Adaptive Context Engine’s answer to Anthropic's retrieval problem isn't better search. It's structured routing: every question is routed to the right authoritative source before any query is written.

Here’s how federation works: the Adaptive Context Engine routes questions across warehouses, applications, and documents without requiring data to be centralized first. The context travels with the question.
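Structured routing can be sketched as a lookup from each governed concept to its single authoritative source, with a hard failure when no route exists. This Python sketch assumes the question's concepts have already been resolved (the hard part, not shown here); `ROUTES`, `route`, and the source names are hypothetical.

```python
# Hypothetical routing table: each governed concept has exactly one authoritative source.
ROUTES: dict[str, str] = {
    "revenue": "warehouse:finance",
    "pipeline": "app:crm",
    "discount policy": "docs:pricing-handbook",
}


def route(concepts: list[str]) -> dict[str, list[str]]:
    """Group a question's resolved concepts by authoritative source.

    Raises LookupError instead of guessing when a concept has no route.
    """
    plan: dict[str, list[str]] = {}
    for concept in concepts:
        source = ROUTES.get(concept)
        if source is None:
            raise LookupError(f"no authoritative source for {concept!r}")
        plan.setdefault(source, []).append(concept)
    return plan


print(route(["revenue", "pipeline"]))
# {'warehouse:finance': ['revenue'], 'app:crm': ['pipeline']}
```

Failing on an unrouted concept, rather than falling back to a search over everything, is the design choice here: it lets the agent say it doesn't know instead of guessing.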

Management

Accuracy isn't a problem you solve once. It's a continuous maintenance problem. Anthropic's post never says so outright, but every section of it implies as much.

Their team had to build the entire upkeep infrastructure from scratch: 

  • CI hooks to catch documentation drift

  • correction-harvesting agents to catch production errors

  • adversarial review pipelines to catch silent misses

When any piece fell out of sync, accuracy fell with it. At WisdomAI, we call this operational loop the Context Development Lifecycle: build context, validate it, deploy it, monitor it, and improve it, automatically, on one unified system.

Metric governance, answer validation, permission enforcement, staleness monitoring, and embedded analytics all run together on one system, not across five separate tools that someone has to wire together and maintain by hand. Anthropic had to assemble that stack itself. WisdomAI ships it, with the Context Development Lifecycle running inside the Adaptive Context Engine.

Key takeaway: There’s a smarter way to build AI context in Claude

Anthropic's blog post is one of the most useful accounts published about enterprise AI analytics because it names exactly what it takes to deliver accuracy.

The architecture they describe — tested context built, validated, applied, and improved continuously — is the Context Development Lifecycle. And that’s exactly what the Adaptive Context Engine delivers as a product.

For WisdomAI's enterprise customers, the results are concrete:

  • 95%+ answer accuracy at ConocoPhillips

  • 100% sales-team adoption at Property Finder

  • Governed AI analytics across Cisco's finance, procurement, and supply chain workflows

The real headline isn't that AI context engineering is hard. It's what AI context engineering makes possible: an AI analyst that gives you the right answer, reliably, and knows when it can't.

If accuracy is the destination, context is the mechanism that gets you there, and an engine that runs that mechanism is the differentiator. You can learn from Anthropic's case study instead of relearning its lessons yourself: you don't need to build and maintain that mechanism the hard way.

Schedule a demo to see the difference WisdomAI’s Adaptive Context Engine makes.
