Enterprises are measuring the wrong part of RAG

Enterprises have moved quickly to adopt RAG to ground LLMs in proprietary data. In practice, however, many organizations are discovering that retrieval is no longer a feature bolted onto model inference — it has become a foundational system dependency….

Most RAG systems don’t understand sophisticated documents — they shred them

By now, many enterprises have deployed some form of RAG. The promise is seductive: index your PDFs, connect an LLM and instantly democratize your corporate knowledge.

But for industries dependent on heavy engineering, the reality has been underwhelming. Engineers ask specific questions about infrastructure, and the bot hallucinates.

The failure isn’t in the LLM. The failure is in the preprocessing.

Standard RAG pipelines treat documents as flat strings of text. They use “fixed-size chunking” (cutting a document every 500 characters). This works for prose, but it destroys the logic of technical manuals. It slices tables in half, severs captions from images, and ignores the visual hierarchy of the page.

Improving RAG reliability isn’t about buying a bigger model; it’s about fixing the “dark data” problem through semantic chunking and multimodal textualization.

Here is the architectural framework for building a RAG system that can actually read a manual.

The fallacy of fixed-size chunking

In a standard Python RAG tutorial, you split text by character count. In an enterprise PDF, this is disastrous.

If a safety specification table spans 1,000 tokens and your chunk size is 500 tokens, you have just split the “voltage limit” header from the “240V” value. The vector database stores them separately. When a user asks, “What is the voltage limit?”, the retrieval system finds the header but not the value. The LLM, forced to answer, often guesses.
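The failure is easy to reproduce at small scale. The sketch below is illustrative only: the table text, the `fixed_size_chunks` helper and the 32-character chunk size are assumptions chosen to make the split visible, not values from any real pipeline.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Cut text every `size` characters, ignoring any document structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical spec table, flattened to plain text as a naive PDF extractor might.
table_text = "Parameter | Limit\nVoltage limit | 240V\n"

chunks = fixed_size_chunks(table_text, size=32)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))

# The label and its value now live in different chunks,
# so a retriever that fetches only the best match sees one without the other.
label_chunk = next(c for c in chunks if "Voltage limit" in c)
value_chunk = next(c for c in chunks if "240V" in c)
print("same chunk:", label_chunk is value_chunk)
```

The same thing happens at 500 tokens with a 1,000-token table; only the scale differs.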

The solution: Semantic chunking

The first step to fixing production RAG is abandoning arbitrary character counts in favor of document intelligence.

Using layout-aware parsing tools (such as Azure Document Intelligence), we can segment data based on document structure such as chapters, sections and paragraphs, rather than token count.

  • Logical cohesion: A section describing a specific machine part is kept as a single vector, even if it varies in length.

  • Table preservation: The parser identifies a table boundary and forces the entire grid into a single chunk, preserving the row-column relationships that are vital for accurate retrieval.
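A minimal sketch of the two rules above, assuming a layout-aware parser has already tagged each element as a heading, paragraph or table. The `Element` and `Chunk` types and the sample content are hypothetical; they do not mirror the output schema of Azure Document Intelligence or any other tool.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str  # "heading", "paragraph", or "table"
    text: str


@dataclass
class Chunk:
    section: str  # heading the chunk belongs to
    kind: str     # "text" or "table"
    text: str


def semantic_chunks(elements: list[Element]) -> list[Chunk]:
    """Group paragraphs by section and keep every table as one whole chunk."""
    chunks: list[Chunk] = []
    section = ""
    buffer: list[str] = []

    def flush() -> None:
        # Emit the paragraphs collected so far as a single chunk.
        if buffer:
            chunks.append(Chunk(section, "text", "\n".join(buffer)))
            buffer.clear()

    for el in elements:
        if el.kind == "heading":
            flush()
            section = el.text
        elif el.kind == "table":
            flush()
            chunks.append(Chunk(section, "table", el.text))  # never split a table
        else:
            buffer.append(el.text)
    flush()
    return chunks


parsed = [
    Element("heading", "Pump P-101"),
    Element("paragraph", "The pump feeds the cooling loop."),
    Element("paragraph", "Inspect the seals every quarter."),
    Element("table", "Parameter | Limit\nVoltage limit | 240V"),
    Element("heading", "Valve V-7"),
    Element("paragraph", "The valve isolates the loop during maintenance."),
]
result = semantic_chunks(parsed)
for chunk in result:
    print(chunk.section, "|", chunk.kind, "|", len(chunk.text), "chars")
```

Chunk length varies with the document, which is the point: the boundaries come from structure, not from a counter.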

In our internal qualitative benchmarks, moving from fixed to semantic chunking significantly improved the retrieval accuracy of tabular data, effectively stopping the fragmentation of technical specs.

Unlocking visual dark data

The second failure mode of enterprise RAG is blindness. A massive amount of corporate IP exists not in text, but in flowcharts, schematics and system architecture diagrams. Standard embedding models (like text-embedding-3-small) cannot “see” these images. They are skipped during indexing.

If your answer lies in a flowchart, your RAG system will say, “I don’t know.”

The solution: Multimodal textualization

To make diagrams searchable, we implemented a multimodal preprocessing step using vision-capable models (specifically GPT-4o) before the data ever hits the vector store.

  1. OCR extraction: High-precision optical character recognition pulls text labels from within the image.

  2. Generative captioning: The vision model analyzes the image and generates a detailed natural language description (“A flowchart showing that process A leads to process B if the temperature exceeds 50 degrees”).

  3. Hybrid embedding: This generated description is embedded and stored as metadata linked to the original image.

Now, when a user searches for “temperature process flow,” the vector search matches the description, even though the original source was a PNG file.
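The three steps can be sketched as follows. Everything here is an assumption made for illustration: the OCR and captioning calls are stubbed with dictionaries, the file names are invented, and word overlap stands in for real vector similarity so the example runs without any external service.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class ImageRecord:
    image_path: str   # link back to the original image
    ocr_text: str     # step 1: labels pulled out of the image
    description: str  # step 2: generated natural-language caption

    @property
    def searchable_text(self) -> str:
        # Step 3: this is the text that would be embedded.
        return f"{self.description} {self.ocr_text}"


def textualize(image_path: str,
               ocr: Callable[[str], str],
               caption: Callable[[str], str]) -> ImageRecord:
    return ImageRecord(image_path, ocr(image_path), caption(image_path))


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, records: list[ImageRecord]) -> ImageRecord:
    # Word overlap is a stand-in for vector similarity.
    return max(records, key=lambda r: len(words(query) & words(r.searchable_text)))


# Stubs standing in for real OCR and vision-model calls.
FAKE_OCR = {
    "flowchart.png": "Process A Process B 50 degrees",
    "org_chart.png": "Maintenance Team",
}
FAKE_CAPTIONS = {
    "flowchart.png": "A flowchart showing that process A leads to process B "
                     "if the temperature exceeds 50 degrees",
    "org_chart.png": "An organization chart of the maintenance team",
}

records = [textualize(p, FAKE_OCR.__getitem__, FAKE_CAPTIONS.__getitem__)
           for p in FAKE_OCR]
best = search("temperature process flow", records)
print(best.image_path)
```

Because each record keeps `image_path`, a match on the description can always be traced back to the picture it describes.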

The trust layer: Evidence-based UI

For enterprise adoption, accuracy is only half the battle. The other half is verifiability.

In a standard RAG interface, the chatbot gives a text answer and cites a filename. This forces the user to download the PDF and hunt for the page to verify the claim. For high-stakes queries (“Is this chemical flammable?”), users simply won’t trust the bot.

The architecture should implement visual citation. Because we preserved the link between the text chunk and its parent image during the preprocessing phase, the UI can display the exact chart or table used to generate the answer alongside the text response.

This “show your work” mechanism allows humans to verify the AI’s reasoning instantly, bridging the trust gap that kills so many internal AI projects.
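One way to carry that link to the front end is to return the evidence with the answer. The payload below is a sketch under stated assumptions: the field names, file name, page number and crop path are all hypothetical, and a real UI contract would differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetrievedChunk:
    text: str
    source_file: str
    page: int
    image_path: Optional[str] = None  # crop of the source table or chart, if any


def build_response(answer: str, chunks: list[RetrievedChunk]) -> dict:
    """Bundle the answer with the evidence the UI should render beside it."""
    return {
        "answer": answer,
        "citations": [
            {"file": c.source_file, "page": c.page,
             "image": c.image_path, "excerpt": c.text}
            for c in chunks
        ],
    }


evidence = RetrievedChunk(
    text="Voltage limit | 240V",
    source_file="pump_manual.pdf",                  # hypothetical file
    page=12,                                        # hypothetical page
    image_path="crops/pump_manual_p12_table1.png",  # hypothetical crop
)
response = build_response("The voltage limit is 240V.", [evidence])
print(response["citations"][0]["image"])
```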

Future-proofing: Native multimodal embeddings

While the “textualization” method (converting images to text descriptions) is the practical solution for today, the architecture is rapidly evolving.

We are already seeing the emergence of native multimodal embeddings (such as Cohere’s Embed 4). These models can map text and images into the same vector space without the intermediate step of captioning. While we currently use a multi-stage pipeline for maximum control, the future of data infrastructure will likely involve “end-to-end” vectorization where the layout of a page is embedded directly.

Furthermore, as long context LLMs become cost-effective, the need for chunking may diminish. We may soon pass entire manuals into the context window. However, until latency and cost for million-token calls drop significantly, semantic preprocessing remains the most economically viable strategy for real-time systems.

Conclusion

The difference between a RAG demo and a production system is how it handles the messy reality of enterprise data.

Stop treating your documents as simple strings of text. If you want your AI to understand your business, you must respect the structure of your documents. By implementing semantic chunking and unlocking the visual data within your charts, you transform your RAG system from a “keyword searcher” into a true “knowledge assistant.”

Dippu Kumar Singh is an AI architect and data engineer.

This tree search framework hits 98.7% on documents where vector search fails

A new open-source framework called PageIndex solves one of the old problems of retrieval-augmented generation (RAG): handling very long documents.

The classic RAG workflow (chunk documents, calculate embeddings, store them in a vector database, and retrieve the top matches based on semantic similarity) works well for basic tasks such as Q&A over small documents.

But as enterprises try to move RAG into high-stakes workflows (auditing financial statements, analyzing legal contracts, navigating pharmaceutical protocols), they’re hitting an accuracy barrier that chunk optimization can’t solve.

PageIndex abandons the standard “chunk-and-embed” method entirely and treats document retrieval not as a search problem, but as a navigation problem.

AlphaGo for documents

PageIndex addresses these limitations by borrowing a concept from game-playing AI rather than search engines: tree search.

When humans need to find specific information in a dense textbook or a long annual report, they do not scan every paragraph linearly. They consult the table of contents to identify the relevant chapter, then the section, and finally the specific page. PageIndex forces the LLM to replicate this human behavior.

Instead of pre-calculating vectors, the framework builds a “Global Index” of the document’s structure, creating a tree where nodes represent chapters, sections, and subsections. When a query arrives, the LLM performs a tree search, explicitly classifying each node as relevant or irrelevant based on the full context of the user’s request.

“In computer science terms, a table of contents is a tree-structured representation of a document, and navigating it corresponds to tree search,” PageIndex co-founder Mingtian Zhang said. “PageIndex applies the same core idea, tree search, to document retrieval, and can be thought of as an AlphaGo-style system for retrieval rather than for games.”

This shifts the architectural paradigm from passive retrieval, where the system simply fetches matching text, to active navigation, where an agentic model decides where to look.
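The navigation idea can be sketched in a few lines. This is not PageIndex’s implementation: the `Node` type, the sample report and the `keyword_judge` function are assumptions, and the judge is a keyword stub standing in for the LLM call that would classify each node in a real system.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    title: str
    summary: str = ""
    children: list["Node"] = field(default_factory=list)


Judge = Callable[[str, Node], bool]
STOP = {"how", "is", "the", "a", "of", "and", "for"}


def keyword_judge(query: str, node: Node) -> bool:
    """Stub relevance check; a real system would ask an LLM instead."""
    q = set(re.findall(r"[a-z]+", query.lower())) - STOP
    text = set(re.findall(r"[a-z]+", f"{node.title} {node.summary}".lower()))
    return bool(q & text)


def tree_search(query: str, node: Node, is_relevant: Judge,
                path: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Descend only into children judged relevant; return paths to the leaves reached."""
    path = path + (node.title,)
    if not node.children:
        return [path]
    hits: list[tuple[str, ...]] = []
    for child in node.children:
        if is_relevant(query, child):
            hits.extend(tree_search(query, child, is_relevant, path))
    return hits


report = Node("Annual Report", children=[
    Node("Financial Statements", "income, EBITDA, balance sheet", [
        Node("EBITDA reconciliation", "definition and adjustments for EBITDA"),
        Node("Balance sheet", "assets and liabilities"),
    ]),
    Node("Risk Factors", "market and regulatory risks"),
])

paths = tree_search("How is EBITDA calculated?", report, keyword_judge)
print(paths)
```

Each returned path doubles as a record of where the search looked, which is useful when the answer has to be audited later.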

The limits of semantic similarity

There is a fundamental flaw in how traditional RAG handles complex data. Vector retrieval assumes that the text most semantically similar to a user’s query is also the most relevant. In professional domains, this assumption frequently breaks down.

Mingtian Zhang, co-founder of PageIndex, points to financial reporting as a prime example of this failure mode. If a financial analyst asks an AI about “EBITDA” (earnings before interest, taxes, depreciation, and amortization), a standard vector database will retrieve every chunk where that acronym or a similar term appears.

“Multiple sections may mention EBITDA with similar wording, yet only one section defines the precise calculation, adjustments, or reporting scope relevant to the question,” Zhang told VentureBeat. “A similarity based retriever struggles to distinguish these cases because the semantic signals are nearly indistinguishable.”

This is the “intent vs. content” gap. The user does not want to find the word “EBITDA”; they want to understand the “logic” behind it for that specific quarter.

Furthermore, traditional embeddings strip the query of its context. Because embedding models have strict input-length limits, the retrieval system usually only sees the specific question being asked, ignoring the previous turns of the conversation. This detaches the retrieval step from the user’s reasoning process. The system matches documents against a short, decontextualized query rather than the full history of the problem the user is trying to solve.

Solving the multi-hop reasoning problem

The real-world impact of this structural approach is most visible in “multi-hop” queries that require the AI to follow a trail of breadcrumbs across different parts of a document.

In a recent benchmark test known as FinanceBench, a system built on PageIndex called “Mafin 2.5” achieved a state-of-the-art accuracy score of 98.7%. The performance gap between this approach and vector-based systems becomes clear when analyzing how they handle internal references.

Zhang offers the example of a query regarding the total value of deferred assets in a Federal Reserve annual report. The main section of the report describes the “change” in value but does not list the total. However, the text contains a footnote: “See Appendix G of this report … for more detailed information.”

A vector-based system typically fails here. The text in Appendix G looks nothing like the user’s query about deferred assets; it is likely just a table of numbers. Because there is no semantic match, the vector database ignores it.

The reasoning-based retriever, however, reads the cue in the main text, follows the structural link to Appendix G, locates the correct table, and returns the accurate figure.
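A toy version of that hop is sketched below. It assumes the document has already been split into titled sections, and it uses a regular expression where the reasoning-based retriever would rely on the model reading the cue. The section text and the `collect_with_references` helper are hypothetical.

```python
import re

# Matches cues such as "See Appendix G"; a real system would let the LLM spot these.
REF = re.compile(r"[Ss]ee (Appendix [A-Z])")


def collect_with_references(start: str, sections: dict[str, str],
                            max_hops: int = 3) -> list[str]:
    """Return the starting section plus any sections it points to, in visit order."""
    visited: list[str] = []
    queue = [start]
    while queue and len(visited) <= max_hops:
        title = queue.pop(0)
        if title in visited or title not in sections:
            continue
        visited.append(title)
        queue.extend(REF.findall(sections[title]))
    return visited


sections = {
    "Deferred assets": "This section describes the change in value. "
                       "See Appendix G of this report for more detailed information.",
    "Appendix G": "Table of deferred asset balances by year.",
    "Appendix H": "Unrelated staffing tables.",
}

trail = collect_with_references("Deferred assets", sections)
print(trail)
```

Appendix G is reached because the main text names it, not because its contents resemble the query.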

The latency trade-off and infrastructure shift

For enterprise architects, the immediate concern with an LLM-driven search process is latency. Vector lookups occur in milliseconds; having an LLM “read” a table of contents implies a significantly slower user experience.

However, Zhang explains that the perceived latency for the end-user may be negligible due to how the retrieval is integrated into the generation process. In a classic RAG setup, retrieval is a blocking step: the system must search the database before it can begin generating an answer. With PageIndex, retrieval happens inline, during the model’s reasoning process.

“The system can start streaming immediately, and retrieve as it generates,” Zhang said. “That means PageIndex does not add an extra ‘retrieval gate’ before the first token, and Time to First Token (TTFT) is comparable to a normal LLM call.”

This architectural shift also simplifies the data infrastructure. By removing reliance on embeddings, enterprises no longer need to maintain a dedicated vector database. The tree-structured index is lightweight enough to sit in a traditional relational database like PostgreSQL.

This addresses a growing pain point in LLM systems with retrieval components: the complexity of keeping vector stores in sync with living documents. PageIndex separates structure indexing from text extraction. If a contract is amended or a policy updated, the system can handle small edits by re-indexing only the affected subtree rather than reprocessing the entire document corpus.
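How PageIndex detects the affected subtree is not described here, so the following is only one plausible approach: hash each section together with its children and compare the old and new trees from the root down. The `Section` type, the sample contract and both helpers are assumptions.

```python
import copy
import hashlib
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    text: str
    children: list["Section"] = field(default_factory=list)


def digest(section: Section) -> str:
    """Hash a section together with everything beneath it."""
    h = hashlib.sha256(f"{section.title}\n{section.text}".encode("utf-8"))
    for child in section.children:
        h.update(digest(child).encode("ascii"))
    return h.hexdigest()


def changed_subtrees(old: Section, new: Section,
                     path: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return paths to the smallest subtrees that need re-indexing."""
    path = path + (new.title,)
    if digest(old) == digest(new):
        return []
    same_shape = [c.title for c in old.children] == [c.title for c in new.children]
    if old.title != new.title or old.text != new.text or not same_shape:
        return [path]  # this node itself changed: re-index from here down
    hits: list[tuple[str, ...]] = []
    for old_child, new_child in zip(old.children, new.children):
        hits.extend(changed_subtrees(old_child, new_child, path))
    return hits


contract_v1 = Section("Contract", "", [
    Section("1 Definitions", "Terms used in this agreement."),
    Section("2 Payment", "", [
        Section("2.1 Fees", "Fees are due within 30 days."),
        Section("2.2 Late interest", "Interest accrues at the agreed rate."),
    ]),
])
contract_v2 = copy.deepcopy(contract_v1)
contract_v2.children[1].children[1].text = "Interest accrues at the amended rate."

to_reindex = changed_subtrees(contract_v1, contract_v2)
print(to_reindex)
```

This sketch recomputes hashes on every comparison; a production index would more likely store them alongside each node.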

A decision matrix for the enterprise

While the accuracy gains are compelling, tree-search retrieval is not a universal replacement for vector search. The technology is best viewed as a specialized tool for “deep work” rather than a catch-all for every retrieval task.

For short documents, such as emails or chat logs, the entire context often fits within a modern LLM’s context window, making any retrieval system unnecessary. Conversely, for tasks purely based on semantic discovery, such as recommending similar products or finding content with a similar “vibe,” vector embeddings remain the superior choice because the goal is proximity, not reasoning.

PageIndex fits squarely in the middle: long, highly structured documents where the cost of error is high. This includes technical manuals, FDA filings, and merger agreements. In these scenarios, the requirement is auditability. An enterprise system needs to be able to explain not just the answer, but the path it took to find it (e.g., confirming that it checked Section 4.1, followed the reference to Appendix B, and synthesized the data found there).

The future of agentic retrieval

The rise of frameworks like PageIndex signals a broader trend in the AI stack: the move toward “Agentic RAG.” As models become more capable of planning and reasoning, the responsibility for finding data is moving from the database layer to the model layer.

We are already seeing this in the coding space, where agents like Claude Code and Cursor are moving away from simple vector lookups in favor of active codebase exploration. Zhang believes generic document retrieval will follow the same trajectory.

“Vector databases still have suitable use cases,” Zhang said. “But their historical role as the default database for LLMs and AI will become less clear over time.”

AI agents can talk to each other — they just can’t think together yet

AI agents can talk to each other now — they just can’t understand what the other one is trying to do. That’s the problem Cisco’s Outshift is trying to solve with a new architectural approach it calls the Internet of Cognition.

The gap is practical: protocols like MCP and A2A let agents exchange messages and identify tools, but they don’t share intent or context. Without that, multi-agent systems burn cycles on coordination and can’t compound what they learn.

“The bottom line is, we can send messages, but agents do not understand each other, so there is no grounding, negotiation or coordination or common intent,” Vijoy Pandey, general manager and senior vice president of Outshift, told VentureBeat. 

The practical impact:

Consider a patient scheduling a specialist appointment. With MCP alone, a symptom assessment agent passes a diagnosis code to a scheduling agent, which finds available appointments. An insurance agent verifies coverage. A pharmacy agent checks drug availability.

Each agent completes its task, but none of them reasons together about the patient’s needs. The pharmacy agent might recommend a drug that conflicts with the patient’s history — information the symptom agent has but didn’t pass along because “potential drug interactions” wasn’t in its scope. The scheduling agent books the nearest available appointment without knowing the insurance agent found better coverage at a different facility.

They’re connected, but they’re not aligned on the goal: Find the right care for this patient’s specific situation.

Current protocols handle the mechanics of agent communication — MCP, A2A, and Outshift’s AGNTCY, which it donated to the Linux Foundation, let agents discover tools and exchange messages. But these operate at what Pandey calls the “connectivity and identification layer.” They handle syntax, not semantics.

The missing piece is shared context and intent. An agent completing a task knows what it’s doing and why, but that reasoning isn’t transmitted when it hands off to another agent. Each agent interprets goals independently, which means coordination requires constant clarification and learned insights stay siloed.
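The gap can be illustrated with a toy handoff. This is not Outshift’s protocol or any published schema: the `Handoff` fields, the agent function and the placeholder drug name are hypothetical, and serve only to contrast a data-only message with one that also carries goal and context.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    payload: dict                                # what message passing already carries
    goal: str = ""                               # shared intent (hypothetical field)
    context: dict = field(default_factory=dict)  # shared context (hypothetical field)


def pharmacy_agent(handoff: Handoff) -> str:
    """Recommend the requested drug unless shared context says it conflicts."""
    drug = handoff.payload["drug"]
    conflicts = handoff.context.get("known_interactions", [])
    if drug in conflicts:
        return f"flag {drug}: conflicts with patient history"
    return f"recommend {drug}"


# Data-only handoff: the pharmacy agent never learns about the conflict.
data_only = Handoff(payload={"drug": "drug_x"})

# Same request with the goal and the symptom agent's context attached.
with_context = Handoff(
    payload={"drug": "drug_x"},
    goal="find the right care for this patient's specific situation",
    context={"known_interactions": ["drug_x"]},
)

print(pharmacy_agent(data_only))
print(pharmacy_agent(with_context))
```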

For agents to move from communication to collaboration, they need to share three things, according to Outshift: pattern recognition across datasets, causal relationships between actions, and explicit goal states.

“Without shared intent and shared context, AI agents remain semantically isolated. They are capable individually, but goals get interpreted differently; coordination burns cycles, and nothing compounds. One agent learns something valuable, but the rest of the multi-agent-human organization still starts from scratch,” Outshift said in a paper.

Outshift said the industry needs “open, interoperable, enterprise-grade agentic systems that semantically collaborate” and proposes a new architecture it calls the “Internet of Cognition,” where multi-agent environments work within a shared system.

The proposed architecture introduces three layers:

Cognition State Protocols: A semantic layer that sits above message-passing protocols. Agents share not just data but intent — what they’re trying to accomplish and why. This lets agents align on goals before acting, rather than clarifying after the fact.

Cognition Fabric: Infrastructure for building and maintaining shared context. Think of it as distributed working memory: context graphs that persist across agent interactions, with policy controls for what gets shared and who can access it. System designers can define what “common understanding” looks like for their use case.

Cognition Engines: Two types of capability. Accelerators let agents pool insights and compound learning — one agent’s discovery becomes available to others solving related problems. Guardrails enforce compliance boundaries so shared reasoning doesn’t violate regulatory or policy constraints.

Outshift positioned the framework as a call to action rather than a finished product. The company is working on implementation but emphasized that semantic agent collaboration will require industry-wide coordination — much like early internet protocols needed buy-in to become standards. Outshift is in the process of writing the code, publishing the specs and releasing research around the Internet of Cognition. It hopes to have a demo of the protocols soon.

Noah Goodman, co-founder of frontier AI company Humans& and a professor of computer science at Stanford, said during VentureBeat’s AI Impact event held in San Francisco that innovation happens when “other humans figure out which humans to pay attention to.” The same dynamic applies to agent systems: as individual agents learn, the value multiplies when other agents can identify and leverage that knowledge.

The practical question for teams deploying multi-agent systems now: Are your agents just connected, or are they actually working toward the same goal?

Factify wants to move past PDFs and .docx by giving digital documents their own brain

Tel Aviv-based startup Factify emerged from stealth today with a $73 million seed round for an ambitious, if quixotic, mission: to bring digital documents beyond the standard formats most businesses use (.PDF, .docx, collaborative cloud files like Google Docs) and into the intelligence era.

For Matan Gavish, Factify’s Founder and CEO, this isn’t just a software upgrade—it is an inevitability he has been obsessed with for years.

“The PDF was developed when I was in elementary school,” Gavish told VentureBeat. “The bedrock of the software ecosystem hasn’t really evolved… someone has to redesign the digital document itself.”

Gavish, a tenured professor of computer science and Stanford PhD, admits that his fixation on administrative file formats is an anomaly for someone with his credentials.

“It’s a very uncool problem to be obsessed with,” he says. “Given the fact that my academic background is AI and machine learning, my mom wanted me to start an AI company because it’s cool. I’m not sure why I’m obsessed and then possessed by documents.”

But that obsession has now attracted a sizeable seed round led by Valley Capital Partners and backed by AI heavyweights like former Google AI chief John Giannandrea.

The bet is simple: the static rigidity of most digital files has limited their utility, and a better, more intelligent document that actually shares its edit history and ownership with users as intended is not only possible; it’s a multi-billion-dollar opportunity.

The history of digital documents

To understand why a seed round would balloon to $73 million, you have to understand the scale of the trap businesses are in. There are currently an estimated three trillion PDFs in circulation. “Some people see the PDF more than they see their kids,” Gavish jokes.

The history of the digital document is not a linear progression where one format replaces another. Instead, it is a story of “speciation,” where different formats evolved to fill distinct ecological niches: creation, distribution, and collaboration.

The era of files: Microsoft Word (1980s–1990s)

Digital documents began as isolated artifacts. In the 1980s, the “document” was inextricably linked to the hardware that created it. A file created in WordPerfect on a DOS machine was effectively gibberish to a Macintosh user.

Microsoft Word, which traces its lineage to the pioneering WYSIWYG editors at Xerox PARC, changed this by leveraging the dominance of the Windows operating system. By the 1990s, the binary .doc format became the default container for editable professional documents. However, these files were structurally complex “memory dumps” designed for the limited hardware of the time, often leading to corruption or privacy leaks where deleted text remained hidden in the file’s binary data.

The era of digital ‘stone’: the PDF (1990s–2006)

The PDF did not originate as a tool for writing; it was a tool for viewing. In 1991, Adobe co-founder John Warnock penned the “Camelot Project” white paper, envisioning a “digital envelope” that would look identical on any display or printer.

Unlike Word files, which were malleable, PDFs were designed to be immutable. They used the PostScript imaging model to place characters at precise coordinates, ensuring visual fidelity. While adoption was initially slow, Adobe’s 1994 decision to release the Acrobat Reader for free established PDF as the global standard for “digital concrete”—the format of finality used for contracts, government forms, and archives.

The collaborative cloud docs era (2006–present)

In 2006, Google disrupted the model again by moving the document from the hard drive to the browser. Using “Operational Transformation” algorithms, Google Docs allowed multiple users to edit the same stream of text simultaneously.

This shifted the paradigm from “sending a file” to “sharing a link.” Google Workspace now claims over 3 billion users (mostly consumers and education), and the shift fundamentally changed how we work, turning documents into living, collaborative processes rather than static artifacts.

The status quo: fragmentation

Despite these advances, the business world remains fragmented. We draft in Google Docs (the “Digital Stream”), format in Word (the “Digital Clay”), and sign in PDF (the “Digital Stone”).

But this fragmentation has a cost. “The problem is not the document. It is everything around it,” the company notes. “Once a PDF leaves your system, control is gone. Versions drift. Access is unclear. Nothing is visible.”

Turning digital documents into intelligent infrastructure

Factify’s wager is that in the age of AI, this fragmentation is no longer just annoying—it is a critical failure. AI models need structured, verifiable data to function.

When an AI “reads” a PDF, it is essentially guessing, using optical character recognition to scrape text from what is effectively a digital photo.

“What we’re dealing with here is a megalomaniac vision, but it’s at the same time probably something that is inevitable,” Gavish says.

Factify’s solution is to treat documents not as static files, but as intelligent infrastructure. In the “Factified” standard, a document carries its own brain. It possesses a unique identity, a live permission system, and an immutable audit log that travels with it.

“We wrote a new document format that supplants the PostScript,” Gavish explains. “We created a new data layer that supports the document as a first class citizen… and it’s always available inside the organization and potentially outside.”

This distinction, between a File and an API, is the core of the company’s pitch:

  • Files are liabilities: They accumulate, get lost, and can be stolen. “It goes back to a brick status,” Gavish says. “Files are liabilities, if anything, because they just accumulate there, you have to guard them.”

  • APIs are assets: A Factify document is an active object. You can ask it questions: “Who has seen you? When do you expire? Are you the most up-to-date version?”
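To make the contrast concrete, here is a toy “document as object” that can answer those three questions. It is purely illustrative: Factify’s format and API are not described in this article, so the `LiveDocument` class, its fields and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class LiveDocument:
    doc_id: str
    version: int
    latest_version: int
    expires: date
    allowed_readers: set[str] = field(default_factory=set)
    audit_log: list[tuple[str, str]] = field(default_factory=list)  # (user, action)

    def open(self, user: str) -> bool:
        """Check permission and record the attempt either way."""
        allowed = user in self.allowed_readers
        self.audit_log.append((user, "viewed" if allowed else "denied"))
        return allowed

    def who_has_seen(self) -> list[str]:
        return [user for user, action in self.audit_log if action == "viewed"]

    def is_current(self) -> bool:
        return self.version == self.latest_version


memo = LiveDocument(
    doc_id="memo-001",          # hypothetical identifier
    version=2,
    latest_version=3,
    expires=date(2030, 1, 1),   # hypothetical expiry
    allowed_readers={"alice"},
)
memo.open("alice")
memo.open("mallory")

print(memo.who_has_seen())  # who has seen you?
print(memo.expires)         # when do you expire?
print(memo.is_current())    # are you the most up-to-date version?
```

A plain file can answer none of these; the state lives in whatever system happens to be holding it.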

‘People don’t change’, but formats do

History is littered with formats that tried to replace the PDF (like Microsoft’s XPS). They failed because they demanded too much behavioral change from users. Gavish is keenly aware of this trap.

“When I talk to enterprise software entrepreneurs, I tell them the two laws to know about starting a company in enterprise software is that people don’t care, and no one changes,” he says.

To skirt this, Factify has built deep backwards compatibility. A Factified document can look exactly like a PDF, complete with page breaks and margins. Users don’t need to learn a new interface to get value; they just need to solve a specific pain point—like an executive who wants to ensure an investment memo can’t be forwarded.

“All they have to tell their team is, ‘Dear Chief of Staff, employment agreements and investment memoranda… are going to be Factified. The rest carry on,'” Gavish says. “They see immediate benefit… but then they discover that they’ve crossed the Rubicon.”

What’s next for Factify?

The capital from this round will be used to deepen the platform’s core engineering—which Gavish describes as a “heavy engineering lift” requiring them to rebuild the document format, data layer, and application layer from scratch. The company is also establishing a major operational hub in Pittsburgh to support its U.S. expansion.

Ultimately, Factify isn’t trying to build another collaboration tool like Google Docs. They are trying to build the immutable record of the future—the standard for “truth” in a digital world.

“The PDF… became a standard meaning I cannot file my taxes using any other format. This is how victory looks like,” Gavish says. “We are creating a document standard that is not specific for health care or for insurance, but is just document as such.”

For the three trillion static files currently sitting in cloud storage, the writing may finally be on the wall.

Adaptive6 emerges from stealth to reduce enterprise cloud waste (and it’s already optimizing Ticketmaster)

The generative AI era has sped everything up for most enterprises we talk to, especially development cycles (thanks to “vibe coding” and “agentic swarming”).

But even as they seek to leverage the power of new AI-assisted programming tools and coding agents like Claude Code to generate code, enterprises must contend with a looming concern — no, not safety (although that’s another one!): cloud spend.

According to Gartner, public cloud spend will rise 21.3% in 2026 and yet, according to Flexera’s last State of the Cloud report, up to 32% of enterprise cloud spend is actually just wasted resources — duplicated code, non-functional code, outdated code, needless scaffolding, inefficient processes, etc.

Today, a new firm, Adaptive6, emerged from stealth to reduce this cloud waste in real time, automatically. The company, which also announced $44 million in total funding including a $28 million Series A led by U.S. Venture Partners (USVP), aims to treat cloud waste not as a financial discrepancy, but as a code vulnerability that must be detected and patched.

Adaptive6 was co-founded by CEO Aviv Revach, an experienced founder, former Head of Strategy at Taboola, and a former security research team leader for the Israeli Military Intelligence Unit 8200. The idea behind the venture came directly from his experience working in cybersecurity.

“We realized this is not a financial problem; it’s an engineering problem,” Revach told VentureBeat in an exclusive video call interview conducted recently. “We drew on our background in cybersecurity, where to find vulnerabilities, you scan the cloud, identify the issues, map them back to the relevant code, find the responsible developer or engineer, and remediate—or, in some cases, shift left and prevent them altogether… it was obvious that this is exactly what we need to do.”

Adaptive6’s platform introduces a radical shift in how enterprises govern infrastructure: instead of asking finance teams to spot inefficiencies they can’t fix, it empowers engineers to resolve waste directly in their workflow.

By applying the rigor of cybersecurity—scanning, tracing, and remediation—Adaptive6 automates the cleanup of “Shadow Waste” across complex multi-cloud environments.

The shift: from billing to engineering

For years, the industry standard for managing cloud costs has been “visibility”—dashboards that tell you yesterday’s news. Revach argues that visibility without action is just noise.

“The first generation of tools are sort of trying to help on the financial side of the cloud,” Revach told VentureBeat. “They typically deal with the financial aspects of cloud cost… showing you costs going up, costs going down, forecasting, budgeting. But what they don’t really focus on is one of the biggest problems, which is the waste problem.”

According to Revach, the disconnect lies in ownership.

“Just like you have the CISO in cybersecurity trying to get everybody to be thinking about security, you now have the FinOps person trying to get everybody to be thinking about cloud cost.”

Technology: hunting “shadow waste”

The core of Adaptive6’s offering is its “Cloud Cost Governance and Optimization” (CCGO) platform. It doesn’t just look for idle servers; it hunts for what the company calls Shadow Waste—hidden inefficiencies in architecture and application workloads that traditional cost tools often miss.

The system operates without agents, using standard cloud APIs to gain read-only access to environments.

Revach explained to VentureBeat that the platform scans across AWS, GCP, and Azure, as well as PaaS layers like Databricks and Snowflake, and even deep into Kubernetes clusters.

“We have unique technology that basically allows us to match each resource in the cloud [where] we found a problem to the relevant line of code that actually created that problem,” Revach explained.

This “Cloud to Code” technology allows the system to identify the specific engineer who made the change and serve them a fix directly in their workflow (Jira, Slack, or ServiceNow).

Beyond basic resource sizing, the platform analyzes complex configurations, including those for emerging AI workloads.

Revach highlighted a specific technical nuance regarding “provisioned throughput” for Large Language Models (LLMs) on AWS.

He noted that engineers often struggle to balance commitment levels—committing too little risks performance, while committing too much wastes capital. Adaptive6’s engine analyzes these specific usage patterns to recommend the precise throughput commitment needed, a level of granularity that general finance tools lack.

Revach also provided a specific example of “Shadow Waste” involving application-level inefficiencies:

“If you’re using Python… and you’re not using the latest version—right now, version 3.12 made a major change that made it far more efficient,” he said. “Most folks, when they think about cloud cost, they don’t necessarily think of the Python version, so they only think about the size of the machine. By moving to that version, you gain the efficiency so your code just runs faster, and you reduce the cost.”

The AI paradox: both problem and solution

While Adaptive6 uses AI to generate remediation scripts and “1-Click Fixes,” Revach was careful to distinguish their deep-tech approach from generic AI coding agents. In fact, he noted that AI-generated code is often a source of waste itself.

“The code that is produced by AI is many times not that efficient because it was trained on a lot of code that other people wrote that didn’t necessarily take cloud cost optimization and governance into account,” Revach warned.

This is why Adaptive6 relies on a research team of experts rather than just generative models to identify inefficiencies. “Just like with vulnerability research, you see cyber companies getting the best of the best security researchers to find things… we are doing the exact same thing for cost inefficiencies,” Revach said.

Impact and adoption

The platform is already in use by major enterprises, including Ticketmaster, Bayer, and Norstella, with customers reporting 15–35% reductions in total cloud spend.

For global organizations, the ability to decentralize cost management is critical. “As complex as it gets with a big organization, that’s exactly our sweet spot,” Revach noted. He cited one dramatic instance of the tool’s efficacy: “We’ve had a case where one misconfiguration that basically an organization solved actually resulted in more than a million dollars of savings.”

Looking ahead

The system also includes “shift left” prevention capabilities, integrating directly into CI/CD pipelines. This allows the platform to scan code for cost inefficiencies before it ever goes live, effectively blocking expensive architectural mistakes before they are deployed—much like a security scanner blocks vulnerable code.

“We detect what’s already wasting money, prevent new inefficiencies before they deploy, and remediate at scale,” Revach said. By shifting the responsibility left to developers, Adaptive6 suggests the future of cloud cost management won’t be found in a spreadsheet, but in a pull request.

How SAP Cloud ERP enabled Western Sugar’s move to AI-driven automation

Presented by SAP


Ten years ago, Western Sugar made a decision that would prove prescient: move from on-premise SAP ECC to SAP S/4HANA Cloud Public Edition. At the time, artificial intelligence wasn’t a priority on most roadmaps. The company was simply trying to escape what Director of Corporate Controlling Richard Caluori calls “a trainwreck”: a heavily customized ERP system so laden with custom ABAP code that it had become unupgradable.

Today, that early cloud adoption is proving to be the foundation for Western Sugar’s AI transformation. As SAP accelerates its rollout of business AI capabilities across finance, supply chain, HR, and more, Western Sugar finds itself uniquely positioned to take advantage of the technology.

“We didn’t move to the cloud thinking about AI,” Caluori says. “But that decision to embrace clean core principles and standardized processes turned out to be exactly what we needed when AI capabilities became available. The clean data, the standardized workflows, the disciplined processes, all of that groundwork we laid for basic operational reasons is now the foundation that makes AI work. We were ready without even knowing it.”

Building AI readiness with a clean core ERP foundation

Western Sugar’s journey began with a familiar enterprise problem: technical debt. Years of on-premise customization had created a system that was nearly impossible to maintain or upgrade.

“Because we were on premise, we could do our own coding in ABAP, and over the years we created such a mess with our internal coding that the software was no longer upgradable,” Caluori explains. “The immediate benefits of moving to public cloud were clear: reduced infrastructure burden and access to standard processes refined by SAP. They’ve been in this business for 40 to 50 years, and they put all their experience into this one solution. Now upgrades just work.”

But the most significant advantage proved to be the clean core philosophy inherent in public cloud deployments, which means the software is maintained and upgraded by SAP. This approach, combined with robust API connectivity, created an environment where Western Sugar’s IT department could easily integrate systems.

In SAP S/4HANA Cloud Public Edition, this model keeps core ERP logic standardized and upgradeable, while extensions and integrations are handled through SAP-supported APIs and services.

“In the end, we have lower total cost of ownership, a better product, and the data quality and process discipline that’s essential for AI adoption,” Caluori says.

This clean core foundation — standardized, continuously updated, and API-connected — is what makes embedded AI viable inside SAP Cloud ERP.

How clean core processes in SAP enabled AI automation

When SAP began rolling out SAP Business AI capabilities inside SAP S/4HANA Cloud Public Edition, Caluori was eager to improve processes through automation and standardization in ways the company had never previously imagined. The company’s first major AI implementation focused on central invoice management.

Today, invoices arrive from external sources and pass through the firewall. If they meet predefined AI confidence thresholds, SAP Business AI automatically posts them with zero human keyboard input. Each transaction is continuously evaluated using a traffic-light model: green items are processed automatically, yellow items are routed for review, and red items are flagged for immediate attention.
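
As a rough illustration of how a traffic-light model can sit on top of a confidence score, consider the sketch below. It is not SAP’s implementation. The function name, the cutoff values of 0.95 and 0.70, and the assumption that the three colors map directly onto a single confidence number are all hypothetical.

```python
def route_invoice(confidence: float,
                  green_min: float = 0.95,
                  yellow_min: float = 0.70) -> str:
    """Map a confidence score in [0, 1] to a traffic-light bucket (hypothetical cutoffs)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= green_min:
        return "green"    # post automatically, no keyboard input
    if confidence >= yellow_min:
        return "yellow"   # route to a person for review
    return "red"          # flag for immediate attention


for score in (0.99, 0.80, 0.40):
    print(score, route_invoice(score))
```

Under these assumed cutoffs, a score of 0.99 is posted automatically, 0.80 goes to review and 0.40 is flagged.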

“Because the invoice just flows through the system automatically, we’re held to a high standard,” he says. “The AI-driven functionality only works if the whole process chain, from purchasing requisition to purchase order to receiving to issuing is clean, so we’re constantly improving those upstream processes to meet the demands of our AI innovations.”

Quantifying the operational impact of AI automation

Caluori estimates Western Sugar has achieved six-figure direct cost savings through AI automation, not accounting for improved visibility and control.

“When I log into my computer now, I can see immediately in real time what’s going on across the procurement side,” he says. “I have a comprehensive cockpit view that I didn’t have before. Because I have much more visibility, I have much more control over operations.”

The company is now expanding AI adoption into new areas. With Western Sugar’s recent transition to SAP’s three-speed landscape, Caluori is targeting month-end closing processes for AI automation.

“My goal is that AI handles the vast majority of the month-end close,” he says. “Over time, as AI learns what we’re doing and how we close the books, the goal is to automate over 50 percent of the month’s end closing activities. We’re also looking forward to AI-managed procurement networks, and proactive reporting and intelligence, all of which will soon be possible.”

Western Sugar is also developing predictive maintenance AI for its manufacturing equipment — a critical capability for its large-scale facilities, where equipment failures can halt production and lead to losses in the hundreds of thousands of dollars. These efforts build on SAP’s AI and analytics capabilities across asset management and manufacturing systems.

“We’ve started an internal team working on predictive analytics with AI, where the system can tell us in advance if we need to be on alert for specific equipment — that a particular machine could break down in the next two or three days or weeks,” Caluori explains. “If we can proactively address these issues before they cause production stoppages, this will save us millions of dollars.”

Managing organizational change alongside AI adoption

While the technology itself has delivered clear benefits, the organizational impact has been more complex. For Western Sugar, modernizing its core systems early — by moving to SAP’s cloud and embracing standardized, upgrade-driven processes — required not just new workflows, but a fundamental shift in how employees thought about change itself.

For Caluori, that readiness is non-negotiable. “Change management is the number-one key to success,” he says. “We had to do a lot of change management, not only around business processes, but around employee behavior as well.”

That work paid off over time, in part because cloud adoption normalized continuous change. As upgrades became routine rather than disruptive, employees grew more comfortable with evolution as an operating condition.

“Now, when SAP comes with a new upgrade, they know change is coming,” Caluori explains. “The mindset has shifted to being eager to see what improvements the next update will bring.”

And that cultural shift has proven critical as Western Sugar moves beyond system upgrades and into more advanced initiatives.

“Now people are even eager to move into AI — the bigger projects,” he adds.

However, that cultural readiness must be driven from the top. At Western Sugar, executive leadership — many of whom came from large international organizations — understands the competitive necessity of staying current with technology. That top-down commitment has helped normalize continuous change and created the foundation required to pursue AI strategically.

Lessons in AI readiness from Western Sugar

For companies considering their own AI journeys, Western Sugar’s experience offers a clear lesson: AI readiness begins long before AI adoption. The clean core, standardized processes, and strong data quality that Western Sugar established a decade ago, driven purely by the need to escape technical debt, proved to be exactly what AI required. And while Caluori acknowledges the advantage an early start gave them, he says the second-best time to start is now.

“You have to embrace these changes, otherwise you’re left behind,” Caluori says. “That continuous improvement is what SAP provides us, and now with AI capabilities integrated throughout, we’re seeing benefits we couldn’t have imagined when we started this journey.”


Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

A European AI challenger goes after GitHub Copilot: Mistral launches Vibe 2.0

Mistral AI, the French artificial intelligence company that has positioned itself as Europe’s leading challenger to American AI giants, announced on Tuesday the general availability of Mistral Vibe 2.0, a significant upgrade to its terminal-based coding agent that’s the startup’s most aggressive push yet into the competitive AI-assisted software development market.

The release is a pivotal moment for the Paris-based company, which is transitioning its developer tools from a free testing phase to a commercial product integrated with its paid subscription plans. The move comes just days after Mistral CEO Arthur Mensch told Bloomberg Television at the World Economic Forum in Davos that the company expects to cross €1 billion in revenue by the end of 2026 — a projection that would still leave it far behind American competitors but would cement its position as Europe’s preeminent AI firm.

“The announcement is more of an upgrade and general availability,” Timothée Lacroix, cofounder of Mistral, said in an exclusive interview with VentureBeat. “We produced Devstral 2 in December, and we released at the time a first version of Vibe. Everything was free and in testing. Now we have finalized and improved the CLI, and we are moving Mistral Vibe to a paid plan that’s bundled with our Le Chat plans.”

Why legacy enterprise code is AI’s blind spot

Mistral Vibe 2.0 arrives as technology executives across industries grapple with a fundamental tension: the promise of AI-powered coding tools is immense, but the most capable models are controlled by a handful of American companies — OpenAI, Anthropic, and Google — whose closed-source approaches leave enterprises with limited control over their most sensitive intellectual property.

Mistral is betting that its open-source approach, combined with deep customization capabilities, will appeal to organizations wary of sending proprietary code to third-party providers. The strategy targets a specific pain point that Lacroix says plagues enterprises with legacy systems.

“The code bases that large enterprise work with are large and have been built upon years and years, and they haven’t seen the web,” Lacroix explained. “They potentially rely on large libraries or large domain-specific languages that are unknown to typical language models. And so what we’re able to do with the Vibe CLI and our models is to go and customize them to a customer’s code base and its specific IP to get an improved experience.”

This customization capability addresses a limitation that has frustrated many enterprise technology leaders: general-purpose AI coding assistants trained on public code repositories often struggle with proprietary frameworks, internal coding conventions, and domain-specific languages that exist only within corporate walls. A bank’s internal trading system, a manufacturer’s proprietary control software, or a pharmaceutical company’s research pipeline may rely on decades of accumulated code written in conventions that no public AI model has ever encountered.

Custom subagents and clarification prompts give developers more control

The updated Vibe CLI introduces several features designed to give developers more granular control over how the AI agent operates. Custom subagents allow organizations to build specialized AI agents for targeted tasks—such as deployment scripts, pull request reviews, or test generation—that can be invoked on demand rather than relying on a single general-purpose assistant.

Multi-choice clarifications are a departure from the behavior of many AI coding tools that attempt to infer developer intent when instructions are ambiguous. Instead, Vibe 2.0 prompts users with options before taking action, reducing the risk of unwanted code changes. Slash-command skills enable developers to load preconfigured workflows for common tasks like deploying, linting, or generating documentation through simple commands. Unified agent modes allow teams to configure custom operational modes that combine specific tools, permissions, and behaviors, enabling developers to switch contexts without switching between different applications. The tool also now ships with continuous updates through the command line, eliminating the need for manual version management.

Mistral Vibe 2.0 is available through two subscription tiers. The Le Chat Pro plan costs $14.99 per month and provides full access to the Vibe CLI and Devstral 2, the underlying model that powers the agent, with students receiving a 50 percent discount. The Le Chat Team plan, priced at $24.99 per seat per month, adds unified billing, administrative controls, and priority support for organizations. 

Both plans include generous usage allowances for sustained development work, with the option to continue beyond limits through pay-as-you-go pricing at API rates. The underlying Devstral 2 model, which previously was offered free through Mistral’s API during a testing period, now moves to paid access with input pricing of $0.40 per million tokens and output pricing of $2.00 per million tokens.
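
To make the per-token rates concrete, the arithmetic can be sketched as follows. The two prices come from the figures above. The function name and the sample token counts are illustrative assumptions, and the calculation ignores any plan allowances or discounts.

```python
INPUT_USD_PER_MILLION = 0.40    # Devstral 2 input price cited above
OUTPUT_USD_PER_MILLION = 2.00   # Devstral 2 output price cited above


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated API cost in US dollars for a given number of tokens."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical workload: 3 million input tokens and 500,000 output tokens.
print(f"${api_cost_usd(3_000_000, 500_000):.2f}")
```

For that hypothetical workload the input side costs $1.20 and the output side $1.00, or $2.20 in total. Output tokens are five times as expensive as input tokens at these rates, so agents that generate a lot of code will see costs driven mainly by output volume.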

Smaller, denser models challenge the bigger-is-better assumption

The Devstral 2 model family that powers Vibe CLI is Mistral’s bet that smaller, more efficient models can compete with — and in some cases outperform — the massive systems built by better-funded American rivals. Devstral 2, a 123-billion-parameter dense transformer, achieves 72.2 percent on SWE-bench Verified, a widely used benchmark for evaluating AI systems’ ability to solve real-world software engineering problems.

Perhaps more significant for enterprise deployment, the model is roughly five times smaller than DeepSeek V3.2 and eight times smaller than Kimi K2 — Chinese models that have drawn attention for matching American AI systems at a fraction of the cost. The smaller Devstral 2 Small, at 24 billion parameters, can run on consumer hardware including laptops.

“Those two models are dense, which makes it also—I mean, the small one is something that can run on a laptop, really, which is great if you’re working on the train,” Lacroix noted. “But the fact that the larger one is also dense is interesting for on-prem or more resource-constrained usage, where it’s easier to get efficient use of a dense model rather than large mixture of experts, and it requires smaller hardware to start.”

The distinction between dense and mixture-of-experts architectures is technically significant. While mixture-of-experts models can theoretically offer more capability per compute dollar by activating only portions of their parameters for any given task, they require more complex infrastructure to deploy efficiently. Dense models, by contrast, activate all parameters for every computation but are more straightforward to run on conventional hardware — a meaningful consideration for enterprises that want to deploy AI systems on their own infrastructure rather than relying on cloud providers.
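
A back-of-the-envelope estimate shows why parameter count matters for on-premises hardware. The sketch below computes only the memory needed to hold a dense model’s weights at a given numeric precision. It deliberately ignores the key-value cache, activations and runtime overhead, so real requirements are higher. The function name and the choice of precisions are illustrative assumptions, while the parameter counts are the ones cited above.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Gigabytes (10^9 bytes) needed to store the weights alone."""
    bytes_per_param = bits_per_param / 8
    return params_billions * bytes_per_param   # billions of params x bytes each = GB


for name, size in (("Devstral 2", 123), ("Devstral 2 Small", 24)):
    for bits in (16, 8, 4):
        print(f"{name}: {bits}-bit weights = {weight_memory_gb(size, bits):.1f} GB")
```

At 16-bit precision the 123-billion-parameter model needs about 246 GB for weights alone, which implies multiple accelerators, while the 24-billion-parameter model needs about 48 GB, or roughly 12 GB if quantized to 4 bits.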

Banks and defense contractors want AI that never leaves their walls

For regulated industries — particularly financial services, healthcare, and defense — the question of where AI models run and who has access to the data they process is not merely technical but existential. Banks cannot send proprietary trading algorithms to external AI providers. Healthcare organizations face strict regulations about patient data. Defense contractors operate under security clearances that prohibit sharing sensitive information with foreign entities.

Lacroix suggests that the on-premises deployment capability, while important, is secondary to a more fundamental concern about ownership and control. “The fact that it’s on-prem, I think, is less relevant than the fact that it’s owned by the company and that it’s on wherever they feel safe moving that data — like they’re not shipping the entire code base to a third party,” he said. “I think that’s important.”

This framing positions Mistral not merely as a vendor of AI tools but as a partner in building proprietary AI capabilities that become strategic assets for client organizations. “When we work with a company to then customize them and potentially fine-tune them or continue pre-training them, then they become assets to that company, and they are their own competitive advantage, really,” Lacroix explained.

Mistral has actively cultivated relationships with governments to underscore this positioning. The company serves defense ministries in Europe and Southeast Asia, both directly and through defense contractors. At Davos, Mensch described AI as critical not only to economic sovereignty but to “strategic sovereignty,” noting that autonomous systems like drones require AI capabilities and that deterrence in this domain is increasingly important.

Mistral’s CEO dismisses the idea that China lags in artificial intelligence

Mistral’s positioning as a European alternative to American AI giants takes on added significance amid rising geopolitical tensions. At the World Economic Forum, Mensch was characteristically blunt about the competitive landscape, dismissing claims that Chinese AI development lags the United States as a “fairy tale.”

“China is not behind the West,” Mensch said in his Bloomberg Television interview. The capabilities of China’s open-source technology, he added, are “probably stressing the CEOs in the U.S.”

The comments reflect a broader anxiety in the AI industry about the durability of American technological leadership. Chinese companies including DeepSeek and Alibaba have released open-source models that match or exceed many American systems, often at dramatically lower costs. For Mistral, this competitive pressure validates its strategy of focusing on efficiency and customization rather than attempting to match the massive training runs of better-capitalized American rivals.

European Commission digital chief Henna Virkkunen, also speaking at Davos, underscored the strategic importance of technological sovereignty. “It’s so important that we are not dependent on one country or one company when it comes to some very critical fields of our economy or society,” she said.

For American enterprise customers, Lacroix suggests that Mistral’s European identity and government relationships need not be a concern — and may even be an advantage. “One of the benefits when working as we do, like with open weights, and especially when deploying on customers’ premises and giving them control, is that the wider geopolitics don’t necessarily matter that much,” he said. “I think the benefits of the open-source scene is that it gives you confidence that you know what you’re using, and you’re in total control of it.”

From model maker to enterprise platform signals a strategic pivot

Mistral’s transition from a pure model company to what Lacroix describes as “a full enterprise platform around developing AI applications” reflects a broader maturation in the AI industry. The realization that model weights alone do not capture the full value of AI systems has pushed companies across the sector toward more integrated offerings.

“We don’t think the only value we provide is in the model,” Lacroix said. “We started as a models company. We are now building a full enterprise platform around developing AI applications. We have a part of our company that provides services to integrate deeply. And so the way we make money, and I guess the question behind this is the value that is core to Mistral, is that full-stack solution to getting to the ROI of AI.”

This full-stack approach includes fine-tuning on internal languages and domain-specific languages, reinforcement learning with customer-specific environments, and end-to-end code modernization services that can migrate entire codebases to modern technology stacks. Mistral says it already delivers these solutions to some of the world’s largest organizations in finance, defense, and infrastructure.

The revenue milestone Mensch projected at Davos — crossing €1 billion by year’s end — would represent remarkable growth for a company founded in 2023. But it would still leave Mistral far behind American competitors whose valuations stretch into the hundreds of billions. OpenAI, now reportedly valued at more than $150 billion, and Anthropic, valued at approximately $60 billion, operate at a scale that Mistral cannot match through organic growth alone. To close the gap, Mistral is looking at acquisitions. “We are in the process of looking at a few opportunities,” Mensch said at Davos, though he declined to specify target business areas or geographic regions. The company’s September fundraise brought in €1.7 billion, with Dutch semiconductor equipment giant ASML joining as a key investor, valuing Mistral at €11.7 billion.

The coding assistant wars are just getting started

Looking beyond the immediate product announcement, Lacroix sees the current generation of AI coding tools as a transitional phase toward more autonomous software development. “For a few tasks, it’s already becoming the default entry point — like if I want to prototype something, or if I want to quickly iterate on an idea. I think it’s already faster,” he said. “What I see today is there is still some story that needs to happen on how you do the work asynchronously and in a way where it’s easy to orchestrate several tasks and several improvements on the same code base in a flow that feels natural.”

The current experience, he suggests, does not yet feel like having “your own team of developers that can really 10x yourself.” But he expects rapid improvement, driven by abundant training data and intense industry interest. Perhaps more ambitiously, Lacroix sees the file-manipulation and tool-calling capabilities built for coding as applicable far beyond software development. “What I’m really excited about is the use of these tools outside of coding,” he said. “The really strong realization is you now have an agent that is great at working with a file system, that can edit information and that expands its context a lot, and it’s really great at using all sorts of tools. Those tools don’t need to be necessarily related to coding, really.”

For chief technology officers and engineering leaders evaluating AI coding tools, Mistral’s announcement crystallizes the strategic choice now facing enterprises: accept the convenience and raw capability of closed-source American models, or bet on the flexibility and control of open-source alternatives that can be customized and deployed behind corporate firewalls. Human evaluations comparing Devstral 2 against Claude Sonnet 4.5 showed that Anthropic’s model was “significantly preferred,” according to Mistral’s own benchmarking — an acknowledgment that closed-source leaders retain advantages that efficiency and customization cannot fully offset.

But Lacroix is betting that for enterprises with proprietary code, legacy systems, and regulatory constraints, customization will matter more than raw performance on public benchmarks. “The point is that you can now get all of this vibe coding disruption and goodness in an environment where customization is needed, which was difficult before,” he said. “And that’s, I think, the main point that we’re making with this announcement.”

The AI coding wars, in other words, are no longer just about which model writes the best code. They’re about who gets to own the model that understands yours.

Anthropic embeds Slack, Figma and Asana inside Claude, turning AI chat into a workplace command center

Anthropic announced Monday that users can now open and interact with popular business applications directly inside Claude, the company’s AI assistant—a significant expansion that transforms the chatbot from a conversational tool into an integrated workspace where employees can build project timelines, draft Slack messages, create presentations, and visualize data without switching browser tabs.

The rollout, which goes live today, includes integrations with Amplitude, Asana, Box, Canva, Clay, Figma, Hex, Monday.com, and Slack. Salesforce integration is coming soon. The feature marks a new chapter in Anthropic’s aggressive push to dominate enterprise AI, arriving just days after the company’s CEO made headlines at Davos with bold predictions about AI replacing white-collar workers.

“MCP Apps are an extension to the core MCP protocol and are part of the open source MCP ecosystem,” Sean Strong, Anthropic’s product manager for MCP Apps, told VentureBeat in an exclusive interview. “Within Claude.ai, connectors require a paid Claude plan — Pro, Max, Team, or Enterprise — but there is no additional charge associated with using connectors.”

That pricing decision is notable. Rather than monetizing integrations separately or charging partners for distribution, Anthropic is bundling interactive tools into existing subscription tiers — a strategy designed to accelerate adoption and deepen Claude’s foothold in corporate environments where the company reportedly already leads OpenAI.

Inside MCP Apps, the open-source technology that lets Claude control your favorite work tools

The technical foundation is what Anthropic calls “MCP Apps,” a new extension to the Model Context Protocol, the open standard for connecting external tools to AI applications that Anthropic open-sourced last year. MCP Apps allow any MCP server to deliver an interactive user interface within any supporting AI product—meaning the technology isn’t limited to Claude.

In practice, the integrations allow for surprisingly granular control. Users can build analytics charts in Amplitude and adjust parameters interactively to explore trends. They can turn conversations into Asana projects with tasks and timelines that sync automatically. They can prompt Claude to generate flowcharts or Gantt charts in Figma’s collaborative whiteboard tool, FigJam. They can draft Slack messages, preview formatting, and review before posting.

The Hex integration may prove particularly valuable for data teams: users can ask data questions in natural language and receive answers complete with interactive charts, tables, and citations — effectively turning Claude into a business intelligence interface.

“We open sourced MCP to give the ecosystem a universal way to connect tools to AI,” the company said in its announcement blog. “Now we’re extending MCP further so developers can build interactive UI on top of it, wherever their users are.”

What happens when AI can send messages and create projects on your behalf

With AI systems increasingly capable of taking real-world actions — sending messages, creating projects, publishing content — the question of guardrails becomes critical. Can an employee accidentally send an unreviewed Slack message or publish an incomplete Canva presentation?

Strong addressed this directly. “Most major MCP clients, including Claude, provide consent prompts that help users determine if they want to take an action via a MCP server,” he said.

For enterprise deployments, IT administrators retain control. “Team and Enterprise admins have the ability to control which MCP servers users in their organizations have the ability to use,” Strong explained.

The consent-prompt approach is a middle ground between full autonomy and cumbersome approval workflows. But it also places significant responsibility on individual users to review actions before confirming them — a design choice that may draw scrutiny as AI agents become capable of more consequential decisions.
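
The two layers of control described here, an administrator allowlist and a per-action consent prompt, can be sketched in a few lines. This is a conceptual illustration only. It does not use the MCP SDK or any Claude API, and the class, method and server names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ActionGate:
    """Checks an admin allowlist first, then asks the user before acting."""
    allowed_servers: set[str]            # servers the admin has enabled
    ask_user: Callable[[str], bool]      # returns True if the user confirms

    def run(self, server: str, description: str, action: Callable[[], str]) -> str:
        if server not in self.allowed_servers:
            return "blocked: server not enabled by admin"
        if not self.ask_user(f"Allow {server} to {description}?"):
            return "cancelled: user declined"
        return action()                  # only reached after both checks pass


# Hypothetical usage with a stand-in prompt that always confirms.
gate = ActionGate(allowed_servers={"slack"}, ask_user=lambda prompt: True)
print(gate.run("slack", "post a message", lambda: "message posted"))
print(gate.run("canva", "publish a deck", lambda: "deck published"))
```

In this sketch the action callback never runs unless the server is on the allowlist and the user confirms, which mirrors the division of responsibility between administrators and individual users.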

The security concerns are not hypothetical. As Fortune reported last week, Anthropic’s Claude Code product faces vulnerabilities including “prompt injections,” where attackers hide malicious instructions in web content to manipulate AI behavior. The company has implemented multiple security layers, including running some features in virtual machines and adding deletion protection after users accidentally removed files. “Agent safety—that is, the task of securing Claude’s real-world actions—is still an active area of development in the industry,” Anthropic has acknowledged.

Claude Code’s viral success set the stage for Anthropic’s enterprise ambitions

The interactive tools announcement arrives at a moment of unusual momentum for Anthropic. Claude Code, the company’s coding assistant released in February 2025, has become a viral hit.

Originally built for software developers, Claude Code has captured attention far beyond its intended audience. Non-programmers have deployed it to book theater tickets, file taxes, and monitor tomato plants. Nvidia CEO Jensen Huang called it “incredible.” Even Microsoft, which sells the competing GitHub Copilot, has widely adopted Claude Code internally, with non-developers reportedly encouraged to use it.

Boris Cherny, Anthropic’s head of Claude Code, told Fortune that his team built Cowork — a user-friendly version of the coding product for non-programmers — in approximately a week and a half, largely using Claude Code itself. “Engineers just feel unshackled, that they don’t have to work on all the tedious stuff anymore,” Cherny said.

Claude Code is now used by Uber, Netflix, Spotify, Salesforce, Accenture, and Snowflake, according to Anthropic. Claude’s total web audience has more than doubled since December 2024, and daily unique visitors on desktop are up 12% globally year-to-date, according to data from Similarweb and Sensor Tower published by The Wall Street Journal.

The company is also reportedly planning a $10 billion fundraising round that would value Anthropic at $350 billion — a staggering figure that reflects investor confidence in the company’s enterprise traction.

Anthropic’s CEO stirred controversy at Davos with predictions about AI replacing workers

The interactive tools launch also arrives against a backdrop of intense debate about AI’s impact on employment — a debate that Anthropic’s own CEO helped intensify at the World Economic Forum in Davos last week.

Dario Amodei told a Davos audience that AI models would replace the work of all software developers within a year and would reach “Nobel-level” scientific research in multiple fields within two years. He predicted that 50% of white-collar jobs would disappear within five years.

“I have engineers within Anthropic who say ‘I don’t write any code anymore. I just let the model write the code, I edit it,'” Amodei said. “We might be six to 12 months away from when the model is doing most, maybe all of what software engineers do end-to-end.”

Not everyone agrees with that timeline. Demis Hassabis, the Nobel Prize-winning CEO of Google DeepMind, said at the same conference that today’s AI systems are “nowhere near” human-level artificial general intelligence. Yann LeCun, the Turing Award-winning AI pioneer who recently left Meta to found Advanced Machine Intelligence Labs, went further, arguing that large language models “will never be able to achieve humanlike intelligence” and that a completely different approach is needed.

Why embedding AI into daily workflows could create powerful lock-in for enterprises

Anthropic’s integration strategy reflects a broader shift in enterprise AI competition. The battleground is moving from model benchmarks and capability demonstrations toward workflow integration — the degree to which AI systems become embedded in how companies actually operate.

By making Claude the interface through which employees interact with Asana, Slack, Figma, and other daily tools, Anthropic is positioning itself not merely as an AI provider but as a workflow orchestration layer. The more actions that flow through Claude, the harder it becomes for enterprises to switch to a competitor.

This approach mirrors strategies that proved successful for earlier generations of enterprise software. Salesforce built its dominance partly by becoming the system of record for customer data. Slack grew by centralizing workplace communication. Anthropic appears to be betting that AI assistants can occupy a similar position — the default starting point for work itself.

The open-source foundation of MCP may accelerate this strategy. By making the protocol available to any developer, Anthropic encourages a broad ecosystem of integrations that all funnel through MCP-compatible clients, of which Claude is the most prominent. The company benefits from network effects even as it maintains that the standard is open.

The race to become the operating system for AI-powered work is just getting started

The launch notably excludes some major enterprise platforms. Salesforce integration is listed as “coming soon,” and there’s no mention of Microsoft 365, Google Workspace, or other productivity suites that dominate corporate environments. Those gaps may limit initial adoption in organizations heavily invested in those ecosystems.

The feature is available on web and desktop for paid Claude plans, with support for Claude Cowork — the file management agent launched last week — coming later. Mobile support was not mentioned in the announcement.

For enterprises evaluating Claude against OpenAI’s offerings and other competitors, the interactive integrations represent a tangible differentiator. The ability to take action within business tools — rather than simply generating text that users must copy elsewhere — addresses a persistent friction point in AI adoption.

Whether that advantage proves durable depends on how quickly competitors respond. OpenAI has its own enterprise ambitions and partnerships. Google is integrating Gemini across its productivity suite. Microsoft continues to deepen Copilot’s presence in Office applications.

But the larger significance may be what today’s announcement signals about where enterprise software is headed. For decades, the default unit of work has been the application — the spreadsheet, the project tracker, the messaging platform. Anthropic is wagering that the future belongs to the AI layer that sits above them all.

If the company is right, the question for every enterprise software vendor becomes uncomfortably simple: Do you want to be the tool, or the thing that controls the tools?

The era of agentic AI demands a data constitution, not better prompts

The industry consensus is that 2026 will be the year of “agentic AI.” We are rapidly moving past chatbots that simply summarize text. We are entering the era of autonomous agents that execute tasks. We expect them to book flights, diagnose system outages, manage cloud infrastructure and personalize media streams in real-time.

As a technology executive overseeing platforms that serve 30 million concurrent users during massive global events like the Olympics and the Super Bowl, I have seen the unsexy reality behind the hype: Agents are incredibly fragile.

Executives and VCs obsess over model benchmarks. They debate Llama 3 versus GPT-4. They focus on maximizing context window sizes. Yet they are ignoring the actual failure point. The primary reason autonomous agents fail in production is often poor data hygiene.

In the previous era of “human-in-the-loop” analytics, data quality was a manageable nuisance. If an ETL pipeline ran into an issue, a dashboard might display an incorrect revenue number. A human analyst would spot the anomaly, flag it and fix it. The blast radius was contained.

In the new world of autonomous agents, that safety net is gone.

If a data pipeline drifts today, an agent doesn’t just report the wrong number. It takes the wrong action. It provisions the wrong server type. It recommends a horror movie to a user watching cartoons. It hallucinates a customer service answer based on corrupted vector embeddings.

To run AI at the scale of the NFL or the Olympics, I realized that standard data cleaning is insufficient. We cannot just “monitor” data. We must legislate it.

One solution to this problem is a “data quality creed” framework that functions as a “data constitution.” It enforces thousands of automated rules before a single byte of data is allowed to touch an AI model. While I applied this specifically to the streaming architecture at NBCUniversal, the methodology is universal for any enterprise looking to operationalize AI agents.

Here is why “defensive data engineering” and the Creed philosophy are the only ways to survive the Agentic era.

The vector database trap

The core problem with AI Agents is that they implicitly trust the context you give them. If you are using RAG, your vector database is the agent’s long-term memory.

Standard data quality issues are catastrophic for vector databases. In traditional SQL databases, a null value is just a null value. In a vector database, a null value or a schema mismatch can warp the semantic meaning of the entire embedding.

Consider a scenario where metadata drifts. Suppose your pipeline ingests video metadata, but a race condition causes the “genre” tag to slip. Your metadata might tag a video as “live sports,” but the embedding was generated from a “news clip.” When an agent queries the database for “touchdown highlights,” it retrieves the news clip because the vector similarity search is operating on a corrupted signal. The agent then serves that clip to millions of users.

At scale, you cannot rely on downstream monitoring to catch this. By the time an anomaly alarm goes off, the agent has already made thousands of bad decisions. Quality controls must shift to the absolute “left” of the pipeline.

The “Creed” framework: 3 principles for survival

The Creed framework is meant to act as a gatekeeper. It is a multi-tenant quality architecture that sits between ingestion sources and AI models.

For technology leaders looking to build their own “constitution,” here are the three non-negotiable principles I recommend.

1. The “quarantine” pattern is mandatory: In many modern data organizations, engineers favor the “ELT” approach. They dump raw data into a lake and clean it up later. For AI Agents, this is unacceptable. You cannot let an agent drink from a polluted lake.

The Creed methodology enforces a strict “dead letter queue.” If a data packet violates a contract, it is immediately quarantined. It never reaches the vector database. It is far better for an agent to say “I don’t know” due to missing data than to confidently lie due to bad data. This “circuit breaker” pattern is essential for preventing high-profile hallucinations.
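The quarantine pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the Creed implementation: the QuarantineGate class, the contract names and the in-memory lists standing in for the vector database and the dead letter queue are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A contract is a named predicate over a record (hypothetical shape).
Contract = Callable[[dict], bool]


@dataclass
class QuarantineGate:
    contracts: Dict[str, Contract]
    accepted: List[dict] = field(default_factory=list)      # stand-in for the vector DB write path
    dead_letters: List[dict] = field(default_factory=list)  # stand-in for a dead letter queue

    def ingest(self, record: dict) -> bool:
        """Accept a record only if every contract passes; otherwise quarantine it."""
        violations = [name for name, check in self.contracts.items() if not check(record)]
        if violations:
            # Keep the record and the reasons so it can be inspected and replayed later.
            self.dead_letters.append({"record": record, "violations": violations})
            return False
        self.accepted.append(record)
        return True


gate = QuarantineGate(contracts={
    "has_text": lambda r: isinstance(r.get("text"), str) and r["text"].strip() != "",
    "has_genre": lambda r: r.get("genre") in {"live sports", "news", "kids"},
})

gate.ingest({"text": "Fourth-quarter touchdown recap", "genre": "live sports"})
gate.ingest({"text": "Evening headlines", "genre": None})  # violates "has_genre"
```

The second record never reaches the accepted list. It is held with the name of the contract it violated, which is what makes later triage and replay possible.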

2. Schema is law: For years, the industry moved toward “schemaless” flexibility to move fast. We must reverse that trend for core AI pipelines. We must enforce strict typing and referential integrity.

In my experience, a robust system requires scale. The implementation I oversee currently enforces more than 1,000 active rules running across real-time streams. These aren’t just checking for nulls. They check for business logic consistency.

  • Example: Does the “user_segment” in the event stream match the active taxonomy in the feature store? If not, block it.

  • Example: Is the timestamp within the acceptable latency window for real-time inference? If not, drop it.
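The two example rules above might look like the following sketch. The segment names, the five-second latency window and the check_event function are assumptions made for illustration; a real deployment would load the taxonomy from the feature store rather than hard-coding it.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values for illustration only.
ACTIVE_SEGMENTS = {"sports_fan", "news_reader", "kids_profile"}
MAX_EVENT_AGE = timedelta(seconds=5)


def check_event(event: dict, now: datetime) -> str:
    """Return 'pass', 'block' or 'drop' for a single stream event."""
    if event.get("user_segment") not in ACTIVE_SEGMENTS:
        return "block"  # segment is not in the active taxonomy
    age = now - event["timestamp"]
    if age < timedelta(0) or age > MAX_EVENT_AGE:
        return "drop"   # too stale, or timestamped in the future, for real-time inference
    return "pass"


now = datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
fresh = {"user_segment": "sports_fan", "timestamp": now - timedelta(seconds=1)}
stale = {"user_segment": "sports_fan", "timestamp": now - timedelta(seconds=30)}
unknown = {"user_segment": "legacy_tier", "timestamp": now}

results = [check_event(e, now) for e in (fresh, stale, unknown)]
# results == ["pass", "drop", "block"]
```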

3. Vector consistency checks: This is the new frontier for SREs. We must implement automated checks to ensure that the text chunks stored in a vector database actually match the embedding vectors associated with them. “Silent” failures in an embedding model API often leave you with vectors that point to nothing. This causes agents to retrieve pure noise.
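One way to approximate such a check is to re-embed each stored chunk and compare the result with the stored vector. The sketch below assumes a deterministic embedding function; toy_embed is a hashed bag-of-words stand-in for a real model, and the 64-dimension size and 0.99 similarity threshold are illustrative assumptions, not recommended values.

```python
import hashlib
import math
from typing import List

DIM = 64  # hypothetical embedding size


def toy_embed(text: str) -> List[float]:
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def is_consistent(chunk: str, stored: List[float], min_similarity: float = 0.99) -> bool:
    """Check that a stored vector is well-formed and still matches its text chunk."""
    if len(stored) != DIM:
        return False  # wrong dimensionality, e.g. a model version mismatch
    if not all(math.isfinite(x) for x in stored):
        return False  # NaN or infinity from a failed API call
    if not any(stored):
        return False  # all-zero vector that points to nothing
    return cosine(toy_embed(chunk), stored) >= min_similarity


text = "touchdown highlights fourth quarter"
checks = {
    "matching": is_consistent(text, toy_embed(text)),
    "all_zero": is_consistent(text, [0.0] * DIM),
    "wrong_text": is_consistent(text, toy_embed("evening news economy headlines")),
}
```

Re-embedding every chunk is expensive, so in practice a check like this would more plausibly run on a sample of records, with the cheap structural tests (dimension, finiteness, non-zero norm) applied to every write.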

The culture war: Engineers vs. governance

Implementing a framework like Creed is not just a technical challenge. It is a cultural one.

Engineers generally hate guardrails. They view strict schemas and data contracts as bureaucratic hurdles that slow down deployment velocity. When introducing a data constitution, leaders often face pushback. Teams feel they are returning to the “waterfall” era of rigid database administration.

To succeed, you must flip the incentive structure. We demonstrated that Creed was actually an accelerator. By guaranteeing the purity of the input data, we eliminated the weeks data scientists used to spend debugging model hallucinations. We turned data governance from a compliance task into a “quality of service” guarantee.

The lesson for data decision makers

If you are building an AI strategy for 2026, stop buying more GPUs. Stop worrying about which foundation model is slightly higher on the leaderboard this week.

Start auditing your data contracts.

An AI Agent is only as autonomous as its data is reliable. Without a strict, automated data constitution like the Creed framework, your agents will eventually go rogue. In an SRE’s world, a rogue agent is far worse than a broken dashboard. It is a silent killer of trust, revenue, and customer experience.

Manoj Yerrasani is a senior technology executive.