Factify wants to move past PDFs and .docx by giving digital documents their own brain

Tel Aviv-based startup Factify emerged from stealth today with a $73 million seed round for an ambitious, even quixotic-sounding mission: to bring digital documents beyond the standard formats most businesses use — .pdf, .docx, collaborative cloud files like Google Docs — and into the intelligence era.

For Matan Gavish, Factify’s Founder and CEO, this isn’t just a software upgrade—it is an inevitability he has been obsessed with for years.

“The PDF was developed when I was in elementary school,” Gavish told VentureBeat. “The bedrock of the software ecosystem hasn’t really evolved… someone has to redesign the digital document itself.”

Gavish, a tenured professor of computer science and Stanford PhD, admits that his fixation on administrative file formats is an anomaly for someone with his credentials.

“It’s a very uncool problem to be obsessed with,” he says. “Given the fact that my academic background is AI and machine learning, my mom wanted me to start an AI company because it’s cool. I’m not sure why I’m obsessed and then possessed by documents.”

But that obsession has now attracted a sizeable seed round led by Valley Capital Partners and backed by AI heavyweights like former Google AI chief John Giannandrea.

The bet is simple: the static rigidity of most digital files has limited their utility, and a better, more intelligent document, one that actually shares its edit history and ownership with users as intended, is not only possible — it’s a multi-billion-dollar opportunity.

The history of digital documents

To understand why a seed round would balloon to $73 million, you have to understand the scale of the trap businesses are in. There are currently an estimated three trillion PDFs in circulation. “Some people see the PDF more than they see their kids,” Gavish jokes.

The history of the digital document is not a linear progression where one format replaces another. Instead, it is a story of “speciation,” where different formats evolved to fill distinct ecological niches: creation, distribution, and collaboration.

The era of files: Microsoft Word (1980s–1990s)

Digital documents began as isolated artifacts. In the 1980s, the “document” was inextricably linked to the hardware that created it. A file created in WordPerfect on a DOS machine was effectively gibberish to a Macintosh user.

Microsoft Word, which traces its lineage to the pioneering WYSIWYG editors at Xerox PARC, changed this by leveraging the dominance of the Windows operating system. By the 1990s, the binary .doc format became the default container for editable professional documents. However, these files were structurally complex “memory dumps” designed for the limited hardware of the time, often leading to corruption or privacy leaks where deleted text remained hidden in the file’s binary data.

The era of digital ‘stone’: the PDF (1990s–2006)

The PDF did not originate as a tool for writing; it was a tool for viewing. In 1991, Adobe co-founder John Warnock penned the “Camelot Project” white paper, envisioning a “digital envelope” that would look identical on any display or printer.

Unlike Word files, which were malleable, PDFs were designed to be immutable. They used the PostScript imaging model to place characters at precise coordinates, ensuring visual fidelity. While adoption was initially slow, Adobe’s 1994 decision to release the Acrobat Reader for free established PDF as the global standard for “digital concrete”—the format of finality used for contracts, government forms, and archives.

The collaborative cloud docs era (2006–present)

In 2006, Google disrupted the model again by moving the document from the hard drive to the browser. Using “Operational Transformation” algorithms, Google Docs allowed multiple users to edit the same stream of text simultaneously.

This shifted the paradigm from “sending a file” to “sharing a link.” While Google Workspace now claims over 3 billion users (mostly consumers and education), it fundamentally changed how we work—turning documents into living, collaborative processes rather than static artifacts.

The status quo: fragmentation

Despite these advances, the business world remains fragmented. We draft in Google Docs (the “Digital Stream”), format in Word (the “Digital Clay”), and sign in PDF (the “Digital Stone”).

But this fragmentation has a cost. “The problem is not the document. It is everything around it,” the company notes. “Once a PDF leaves your system, control is gone. Versions drift. Access is unclear. Nothing is visible.”

Turning digital documents into intelligent infrastructure

Factify’s wager is that in the age of AI, this fragmentation is no longer just annoying—it is a critical failure. AI models need structured, verifiable data to function.

When an AI “reads” a PDF, it is essentially guessing, using optical character recognition to scrape text from what is effectively a digital photo.

“What we’re dealing with here is a megalomaniac vision, but it’s at the same time probably something that is inevitable,” Gavish says.

Factify’s solution is to treat documents not as static files, but as intelligent infrastructure. In the “Factified” standard, a document carries its own brain. It possesses a unique identity, a live permission system, and an immutable audit log that travels with it.

“We wrote a new document format that supplants the PostScript,” Gavish explains. “We created a new data layer that supports the document as a first class citizen… and it’s always available inside the organization and potentially outside.”

This distinction—between a File and an API—is the core of the company’s pitch (a short illustrative sketch follows the two points below):

  • Files are liabilities: They accumulate, get lost, and can be stolen. “It goes back to a brick status,” Gavish says. “Files are liabilities, if anything, because they just accumulate there, you have to guard them.”

  • APIs are assets: A Factify document is an active object. You can ask it questions: “Who has seen you? When do you expire? Are you the most up-to-date version?”
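To make the pitch concrete, here is a minimal, purely hypothetical sketch of what a document-as-API might look like in code. Factify has not published an interface; the class name, fields and methods below are illustrative assumptions, not the company’s actual format.

```python
# Hypothetical sketch only: Factify has not published an API, so FactifiedDocument,
# its fields and its methods are illustrative, not the real thing.
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class FactifiedDocument:
    doc_id: str                                        # unique identity that travels with the document
    version: int = 1
    permissions: dict = field(default_factory=dict)    # live, revocable access map
    audit_log: list = field(default_factory=list)      # append-only view/edit history
    expires_at: datetime | None = None

    def record_view(self, user: str) -> None:
        self.audit_log.append((datetime.utcnow(), "view", user))

    def who_has_seen_me(self) -> set[str]:
        return {user for _, event, user in self.audit_log if event == "view"}

    def is_expired(self) -> bool:
        return self.expires_at is not None and datetime.utcnow() > self.expires_at

memo = FactifiedDocument("inv-memo-001",
                         permissions={"cfo@acme.example": "read"},
                         expires_at=datetime.utcnow() + timedelta(days=30))
memo.record_view("cfo@acme.example")
print(memo.who_has_seen_me(), memo.is_expired())
```

The point of the sketch is that answers to “who has seen you” and “are you expired” live with the object itself rather than in a separate system of record.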

‘People don’t change’, but formats do

History is littered with formats that tried to replace the PDF (like Microsoft’s XPS). They failed because they demanded too much behavioral change from users. Gavish is keenly aware of this trap.

“When I talk to enterprise software entrepreneurs, I tell them the two laws to know about starting a company in enterprise software is that people don’t care, and no one changes,” he says.

To skirt this, Factify has built deep backwards compatibility. A Factified document can look exactly like a PDF, complete with page breaks and margins. Users don’t need to learn a new interface to get value; they just need to solve a specific pain point—like an executive who wants to ensure an investment memo can’t be forwarded.

“All they have to tell their team is, ‘Dear Chief of Staff, employment agreements and investment memoranda… are going to be Factified. The rest carry on,'” Gavish says. “They see immediate benefit… but then they discover that they’ve crossed the Rubicon.”

What’s next for Factify?

The capital from this round will be used to deepen the platform’s core engineering—which Gavish describes as a “heavy engineering lift” requiring them to rebuild the document format, data layer, and application layer from scratch. The company is also establishing a major operational hub in Pittsburgh to support its U.S. expansion.

Ultimately, Factify isn’t trying to build another collaboration tool like Google Docs. They are trying to build the immutable record of the future—the standard for “truth” in a digital world.

“The PDF… became a standard meaning I cannot file my taxes using any other format. This is how victory looks like,” Gavish says. “We are creating a document standard that is not specific for health care or for insurance, but is just document as such.”

For the three trillion static files currently sitting in cloud storage, the writing may finally be on the wall.

Adaptive6 emerges from stealth to reduce enterprise cloud waste (and it’s already optimizing Ticketmaster)

The generative AI era has sped everything up for most enterprises we talk to, especially development cycles (thanks to “vibe coding” and “agentic swarming”).

But even as they seek to leverage the power of new AI-assisted programming tools and coding agents like Claude Code to generate code, enterprises must contend with a looming concern — no, not safety (although that’s another one!): cloud spend.

According to Gartner, public cloud spend will rise 21.3% in 2026. And yet, according to Flexera’s latest State of the Cloud report, up to 32% of enterprise cloud spend is actually just wasted resources — duplicated code, non-functional code, outdated code, needless scaffolding, inefficient processes, etc.

Today, a new firm, Adaptive6, emerged from stealth to reduce this cloud waste in real time — automatically. The company, which also announced $44 million in total funding, including a $28 million Series A led by U.S. Venture Partners (USVP), aims to treat cloud waste not as a financial discrepancy, but as a code vulnerability that must be detected and patched.

Adaptive6 was co-founded by CEO Aviv Revach, an experienced founder, former Head of Strategy at Taboola and former security research team leader in the Israeli Military Intelligence Unit 8200. The idea behind the venture came directly from his experience working in cybersecurity.

“We realized this is not a financial problem; it’s an engineering problem,” Revach told VentureBeat in an exclusive video call interview conducted recently. “We drew on our background in cybersecurity, where to find vulnerabilities, you scan the cloud, identify the issues, map them back to the relevant code, find the responsible developer or engineer, and remediate—or, in some cases, shift left and prevent them altogether… it was obvious that this is exactly what we need to do.”

Adaptive6’s platform introduces a radical shift in how enterprises govern infrastructure: instead of asking finance teams to spot inefficiencies they can’t fix, it empowers engineers to resolve waste directly in their workflow.

By applying the rigor of cybersecurity—scanning, tracing, and remediation—Adaptive6 automates the cleanup of “Shadow Waste” across complex multi-cloud environments.

The shift: from billing to engineering

For years, the industry standard for managing cloud costs has been “visibility”—dashboards that tell you yesterday’s news. Revach argues that visibility without action is just noise.

“The first generation of tools are sort of trying to help on the financial side of the cloud,” Revach told VentureBeat. “They typically deal with the financial aspects of cloud cost… showing you costs going up, costs going down, forecasting, budgeting. But what they don’t really focus on is one of the biggest problems, which is the waste problem.”

According to Revach, the disconnect lies in ownership.

“Just like you have the CISO in cybersecurity trying to get everybody to be thinking about security, you now have the FinOps person trying to get everybody to be thinking about cloud cost.”

Technology: hunting “shadow waste”

The core of Adaptive6’s offering is its “Cloud Cost Governance and Optimization” (CCGO) platform. It doesn’t just look for idle servers; it hunts for what the company calls Shadow Waste—hidden inefficiencies in architecture and application workloads that traditional cost tools often miss.

The system operates without agents, using standard cloud APIs to gain read-only access to environments.

Revach explained to VentureBeat that the platform scans across AWS, GCP, and Azure, as well as PaaS layers like Databricks and Snowflake, and even deep into Kubernetes clusters.

“We have unique technology that basically allows us to match each resource in the cloud [where] we found a problem to the relevant line of code that actually created that problem,” Revach explained.

This “Cloud to Code” technology allows the system to identify the specific engineer who made the change and serve them a fix directly in their workflow (Jira, Slack, or ServiceNow).
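For illustration only, here is one generic way such a resource-to-code mapping can work: write the source file and line into the resource’s tags at provision time, then join those tags against version-control history. This is a sketch of the general pattern, not Adaptive6’s proprietary implementation; the tag names and the use of git blame are assumptions.

```python
# Illustrative only: a generic "cloud to code" join between resource tags written at
# provision time and version-control metadata. Not Adaptive6's actual technology.
import subprocess

def find_owner(resource_tags: dict) -> dict:
    """Map a flagged cloud resource back to the IaC file, line and author that created it."""
    source_file = resource_tags["iac_file"]        # tag written by the CI pipeline (assumed)
    source_line = int(resource_tags["iac_line"])
    blame = subprocess.run(
        ["git", "blame", "-L", f"{source_line},{source_line}", "--porcelain", source_file],
        capture_output=True, text=True, check=True,
    ).stdout
    author = next(l.split(" ", 1)[1] for l in blame.splitlines() if l.startswith("author "))
    return {"file": source_file, "line": source_line, "author": author}

# Example: a wasteful instance was tagged at provision time with its source location.
print(find_owner({"iac_file": "infra/compute.tf", "iac_line": "42"}))
```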

Beyond basic resource sizing, the platform analyzes complex configurations, including those for emerging AI workloads.

Revach highlighted a specific technical nuance regarding “provisioned throughput” for Large Language Models (LLMs) on AWS.

He noted that engineers often struggle to balance commitment levels—committing too little risks performance, while committing too much wastes capital. Adaptive6’s engine analyzes these specific usage patterns to recommend the precise throughput commitment needed, a level of granularity that general finance tools lack.
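As a back-of-the-envelope sketch of that tradeoff, the snippet below compares blended costs for different commitment levels against sampled hourly token usage. The prices and the simplified per-token billing model are made up for illustration; real provisioned-throughput pricing is structured differently.

```python
# Toy model with invented prices: committed capacity covers steady load, overflow
# falls to a pricier on-demand rate. Real cloud pricing differs; this only shows
# why over- and under-committing both cost money.
import statistics

hourly_tokens = [4.1e6, 3.8e6, 5.0e6, 4.4e6, 9.7e6, 4.0e6, 4.3e6]  # sampled usage
COMMIT_PRICE_PER_M = 0.30      # hypothetical $ per 1M tokens at the committed rate
ON_DEMAND_PRICE_PER_M = 0.60   # hypothetical $ per 1M tokens on demand

def blended_cost(commit_level: float) -> float:
    cost = 0.0
    for used in hourly_tokens:
        cost += commit_level / 1e6 * COMMIT_PRICE_PER_M                       # pay for the commitment
        cost += max(0.0, used - commit_level) / 1e6 * ON_DEMAND_PRICE_PER_M   # pay for overflow
    return cost

# Committing to the median shifts spiky hours to on-demand; committing to the peak
# pays for idle capacity most of the time.
for level in (statistics.median(hourly_tokens), max(hourly_tokens)):
    print(f"commit {level/1e6:.1f}M tokens/h -> ${blended_cost(level):.2f} over the sample")
```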

Revach also provided a specific example of “Shadow Waste” involving application-level inefficiencies:

“If you’re using Python… and you’re not using the latest version—right now, version 3.12 made a major change that made it far more efficient,” he said. “Most folks, when they think about cloud cost, they don’t necessarily think of the Python version, so they only think about the size of the machine. By moving to that version, you gain the efficiency so your code just runs faster, and you reduce the cost.”

The AI paradox: both problem and solution

While Adaptive6 uses AI to generate remediation scripts and “1-Click Fixes,” Revach was careful to distinguish their deep-tech approach from generic AI coding agents. In fact, he noted that AI-generated code is often a source of waste itself.

“The code that is produced by AI is many times not that efficient because it was trained on a lot of code that other people wrote that didn’t necessarily take cloud cost optimization and governance into account,” Revach warned.

This is why Adaptive6 relies on a research team of experts rather than just generative models to identify inefficiencies. “Just like with vulnerability research, you see cyber companies getting the best of the best security researchers to find things… we are doing the exact same thing for cost inefficiencies,” Revach said.

Impact and adoption

The platform is already in use by major enterprises, including Ticketmaster, Bayer, and Norstella, with customers reporting 15–35% reductions in total cloud spend.

For global organizations, the ability to decentralize cost management is critical. “As complex as it gets with a big organization, that’s exactly our sweet spot,” Revach noted. He cited one dramatic instance of the tool’s efficacy: “We’ve had a case where one misconfiguration that basically an organization solved actually resulted in more than a million dollars of savings.”

Looking ahead

The system also includes “shift left” prevention capabilities, integrating directly into CI/CD pipelines. This allows the platform to scan code for cost inefficiencies before it ever goes live, effectively blocking expensive architectural mistakes before they are deployed—much like a security scanner blocks vulnerable code.
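A generic illustration of what such a shift-left check can look like, under the assumption that infrastructure changes arrive as a Terraform plan exported to JSON: scan the plan for configurations on a cost deny-list and fail the build before deployment. This is not Adaptive6’s product, just the pattern in miniature.

```python
# Generic "shift left" cost gate for CI (illustrative, not Adaptive6's implementation):
# scan a Terraform plan JSON for known-wasteful configurations and block the pipeline,
# the same way a security scanner blocks vulnerable code.
import json, sys

EXPENSIVE_INSTANCE_TYPES = {"p4d.24xlarge", "x2iedn.32xlarge"}   # example deny-list

def check_plan(plan_path: str) -> list[str]:
    plan = json.load(open(plan_path))
    findings = []
    for change in plan.get("resource_changes", []):
        after = (change.get("change") or {}).get("after") or {}
        if after.get("instance_type") in EXPENSIVE_INSTANCE_TYPES:
            findings.append(f"{change['address']}: instance_type {after['instance_type']} "
                            "requires a cost-review exception")
    return findings

if __name__ == "__main__":
    problems = check_plan(sys.argv[1])   # e.g. produced by `terraform show -json plan.out > plan.json`
    for p in problems:
        print("COST POLICY VIOLATION:", p)
    sys.exit(1 if problems else 0)       # non-zero exit blocks the deployment
```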

“We detect what’s already wasting money, prevent new inefficiencies before they deploy, and remediate at scale,” Revach said. By shifting the responsibility left to developers, Adaptive6 suggests the future of cloud cost management won’t be found in a spreadsheet, but in a pull request.

How SAP Cloud ERP enabled Western Sugar’s move to AI-driven automation

Presented by SAP


Ten years ago, Western Sugar made a decision that would prove prescient: move from on-premise SAP ECC to SAP S/4HANA Cloud Public Edition. At the time, artificial intelligence wasn’t a priority on most roadmaps. The company was simply trying to escape what Director of Corporate Controlling Richard Caluori calls “a trainwreck”: a heavily customized ERP system so laden with custom ABAP code that it had become unupgradable.

Today, that early cloud adoption is proving to be the foundation for Western Sugar’s AI transformation. As SAP accelerates its rollout of business AI capabilities across finance, supply chain, HR, and more, Western Sugar finds itself uniquely positioned to take advantage of the technology.

“We didn’t move to the cloud thinking about AI,” Caluori says. “But that decision to embrace clean core principles and standardized processes turned out to be exactly what we needed when AI capabilities became available. The clean data, the standardized workflows, the disciplined processes, all of that groundwork we laid for basic operational reasons is now the foundation that makes AI work. We were ready without even knowing it.”

Building AI readiness with a clean core ERP foundation

Western Sugar’s journey began with a familiar enterprise problem: technical debt. Years of on-premise customization had created a system that was nearly impossible to maintain or upgrade.

“Because we were on premise, we could do our own coding in ABAP, and over the years we created such a mess with our internal coding that the software was no longer upgradable,” Caluori explains. “The immediate benefits of moving to public cloud were clear: reduced infrastructure burden and access to standard processes refined by SAP. They’ve been in this business for 40 to 50 years, and they put all their experience into this one solution. Now upgrades just work.”

But the most significant advantage proved to be the clean core philosophy inherent in public cloud deployments, which means the software is maintained and upgraded by SAP. This approach, combined with robust API connectivity, created an environment where Western Sugar’s IT department could easily integrate systems.

In SAP S/4HANA Cloud Public Edition, this model keeps core ERP logic standardized and upgradeable, while extensions and integrations are handled through SAP-supported APIs and services.

“In the end, we have lower total cost of ownership, a better product, and the data quality and process discipline that’s essential for AI adoption,” Caluori says.

This clean core foundation — standardized, continuously updated, and API-connected — is what makes embedded AI viable inside SAP Cloud ERP.

How clean core processes in SAP enabled AI automation

When SAP began rolling out SAP Business AI capabilities inside SAP S/4HANA Cloud Public Edition, Caluori was amped to improve processes through automation and standardization in ways they’d never previously imagined. The company’s first major AI implementation focused on central invoice management.

Today, invoices arrive from external sources and pass through the firewall. If they meet predefined AI confidence thresholds, SAP Business AI automatically posts them with zero human keyboard input. Each transaction is continuously evaluated using a traffic-light model: green items are processed automatically, yellow items are routed for review, and red items are flagged for immediate attention.
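The routing logic behind a traffic-light model can be sketched in a few lines; the thresholds and checks below are hypothetical, not SAP’s actual confidence rules.

```python
# Minimal sketch of the traffic-light idea, with hypothetical thresholds and checks.
def route_invoice(extraction_confidence: float, amount_matches_po: bool) -> str:
    """Return 'green' (auto-post), 'yellow' (human review) or 'red' (immediate attention)."""
    if extraction_confidence >= 0.95 and amount_matches_po:
        return "green"    # posted automatically, zero keyboard input
    if extraction_confidence >= 0.70:
        return "yellow"   # routed to an accounts-payable queue for review
    return "red"          # flagged: likely missing PO, bad scan or suspicious amount

for invoice in [(0.98, True), (0.82, True), (0.40, False)]:
    print(invoice, "->", route_invoice(*invoice))
```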

“Because the invoice just flows through the system automatically, we’re held to a high standard,” he says. “The AI-driven functionality only works if the whole process chain, from purchasing requisition to purchase order to receiving to issuing is clean, so we’re constantly improving those upstream processes to meet the demands of our AI innovations.”

Quantifying the operational impact of AI automation

Caluori estimates Western Sugar has achieved six-figure direct cost savings through AI automation, not accounting for improved visibility and control.

“When I log into my computer now, I can see immediately in real time what’s going on across the procurement side,” he says. “I have a comprehensive cockpit view that I didn’t have before. Because I have much more visibility, I have much more control over operations.”

The company is now expanding AI adoption into new areas. With Western Sugar’s recent transition to SAP’s three-speed landscape, Caluori is targeting month-end closing processes for AI automation.

“My goal is that AI handles the vast majority of the month-end close,” he says. “Over time, as AI learns what we’re doing and how we close the books, the goal is to automate over 50 percent of the month’s end closing activities. We’re also looking forward to AI-managed procurement networks, and proactive reporting and intelligence, all of which will soon be possible.”

Western Sugar is also developing predictive maintenance AI for its manufacturing equipment — a critical capability for its large-scale facilities, where equipment failures can halt production and lead to losses in the hundreds of thousands of dollars. These efforts build on SAP’s AI and analytics capabilities across asset management and manufacturing systems.

“We’ve started an internal team working on predictive analytics with AI, where the system can tell us in advance if we need to be on alert for specific equipment — that a particular machine could break down in the next two or three days or weeks,” Caluori explains. “If we can proactively address these issues before they cause production stoppages, this will save us millions of dollars.”

Managing organizational change alongside AI adoption

While the technology itself has delivered clear benefits, the organizational impact has been more complex. For Western Sugar, modernizing its core systems early — by moving to SAP’s cloud and embracing standardized, upgrade-driven processes — required not just new workflows, but a fundamental shift in how employees thought about change itself.

For Caluori, that readiness is non-negotiable. “Change management is the number-one key to success,” he says. “We had to do a lot of change management, not only around business processes, but around employee behavior as well.”

That work paid off over time, in part because cloud adoption normalized continuous change. As upgrades became routine rather than disruptive, employees grew more comfortable with evolution as an operating condition.

“Now, when SAP comes with a new upgrade, they know change is coming,” Caluori explains. “The mindset has shifted to being eager to see what improvements the next update will bring.”

And that cultural shift has proven critical as Western Sugar moves beyond system upgrades and into more advanced initiatives.

“Now people are even eager to move into AI — the bigger projects,” he adds.

However, that cultural readiness must be driven from the top. At Western Sugar, executive leadership — many of whom came from large international organizations — understands the competitive necessity of staying current with technology. That top-down commitment has helped normalize continuous change and created the foundation required to pursue AI strategically.

Lessons in AI readiness from Western Sugar

For companies considering their own AI journeys, Western Sugar’s experience offers a clear lesson: AI readiness begins long before AI adoption. The clean core, standardized processes, and strong data quality that Western Sugar established a decade ago, driven purely by the need to escape technical debt, proved to be exactly what AI required. And while Caluori acknowledges the advantage an early start gave them, he says the second-best time to start is now.

“You have to embrace these changes, otherwise you’re left behind,” Caluori says. “That continuous improvement is what SAP provides us, and now with AI capabilities integrated throughout, we’re seeing benefits we couldn’t have imagined when we started this journey.”


Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

A European AI challenger goes after GitHub Copilot: Mistral launches Vibe 2.0

Mistral AI, the French artificial intelligence company that has positioned itself as Europe’s leading challenger to American AI giants, announced on Tuesday the general availability of Mistral Vibe 2.0, a significant upgrade to its terminal-based coding agent and the startup’s most aggressive push yet into the competitive AI-assisted software development market.

The release is a pivotal moment for the Paris-based company, which is transitioning its developer tools from a free testing phase to a commercial product integrated with its paid subscription plans. The move comes just days after Mistral CEO Arthur Mensch told Bloomberg Television at the World Economic Forum in Davos that the company expects to cross €1 billion in revenue by the end of 2026 — a projection that would still leave it far behind American competitors but would cement its position as Europe’s preeminent AI firm.

“The announcement is more of an upgrade and general availability,” Timothée Lacroix, cofounder of Mistral, said in an exclusive interview with VentureBeat. “We produced Devstral 2 in December, and we released at the time a first version of Vibe. Everything was free and in testing. Now we have finalized and improved the CLI, and we are moving Mistral Vibe to a paid plan that’s bundled with our Le Chat plans.”

Why legacy enterprise code is AI’s blind spot

Mistral Vibe 2.0 arrives as technology executives across industries grapple with a fundamental tension: the promise of AI-powered coding tools is immense, but the most capable models are controlled by a handful of American companies — OpenAI, Anthropic, and Google — whose closed-source approaches leave enterprises with limited control over their most sensitive intellectual property.

Mistral is betting that its open-source approach, combined with deep customization capabilities, will appeal to organizations wary of sending proprietary code to third-party providers. The strategy targets a specific pain point that Lacroix says plagues enterprises with legacy systems.

“The code bases that large enterprise work with are large and have been built upon years and years, and they haven’t seen the web,” Lacroix explained. “They potentially rely on large libraries or large domain-specific languages that are unknown to typical language models. And so what we’re able to do with the Vibe CLI and our models is to go and customize them to a customer’s code base and its specific IP to get an improved experience.”

This customization capability addresses a limitation that has frustrated many enterprise technology leaders: general-purpose AI coding assistants trained on public code repositories often struggle with proprietary frameworks, internal coding conventions, and domain-specific languages that exist only within corporate walls. A bank’s internal trading system, a manufacturer’s proprietary control software, or a pharmaceutical company’s research pipeline may rely on decades of accumulated code written in conventions that no public AI model has ever encountered.

Custom subagents and clarification prompts give developers more control

The updated Vibe CLI introduces several features designed to give developers more granular control over how the AI agent operates. Custom subagents allow organizations to build specialized AI agents for targeted tasks—such as deployment scripts, pull request reviews, or test generation—that can be invoked on demand rather than relying on a single general-purpose assistant.

Multi-choice clarifications are a departure from the behavior of many AI coding tools that attempt to infer developer intent when instructions are ambiguous. Instead, Vibe 2.0 prompts users with options before taking action, reducing the risk of unwanted code changes. Slash-command skills enable developers to load preconfigured workflows for common tasks like deploying, linting, or generating documentation through simple commands. Unified agent modes allow teams to configure custom operational modes that combine specific tools, permissions, and behaviors, enabling developers to switch contexts without switching between different applications. The tool also now ships with continuous updates through the command line, eliminating the need for manual version management.

Mistral Vibe 2.0 is available through two subscription tiers. The Le Chat Pro plan costs $14.99 per month and provides full access to the Vibe CLI and Devstral 2, the underlying model that powers the agent, with students receiving a 50 percent discount. The Le Chat Team plan, priced at $24.99 per seat per month, adds unified billing, administrative controls, and priority support for organizations. 

Both plans include generous usage allowances for sustained development work, with the option to continue beyond limits through pay-as-you-go pricing at API rates. The underlying Devstral 2 model, which previously was offered free through Mistral’s API during a testing period, now moves to paid access with input pricing of $0.40 per million tokens and output pricing of $2.00 per million tokens.

Smaller, denser models challenge the bigger-is-better assumption

The Devstral 2 model family that powers Vibe CLI is Mistral’s bet that smaller, more efficient models can compete with — and in some cases outperform — the massive systems built by better-funded American rivals. Devstral 2, a 123-billion-parameter dense transformer, achieves 72.2 percent on SWE-bench Verified, a widely used benchmark for evaluating AI systems’ ability to solve real-world software engineering problems.

Perhaps more significant for enterprise deployment, the model is roughly five times smaller than DeepSeek V3.2 and eight times smaller than Kimi K2 — Chinese models that have drawn attention for matching American AI systems at a fraction of the cost. The smaller Devstral 2 Small, at 24 billion parameters, can run on consumer hardware including laptops.

“Those two models are dense, which makes it also—I mean, the small one is something that can run on a laptop, really, which is great if you’re working on the train,” Lacroix noted. “But the fact that the larger one is also dense is interesting for on-prem or more resource-constrained usage, where it’s easier to get efficient use of a dense model rather than large mixture of experts, and it requires smaller hardware to start.”

The distinction between dense and mixture-of-experts architectures is technically significant. While mixture-of-experts models can theoretically offer more capability per compute dollar by activating only portions of their parameters for any given task, they require more complex infrastructure to deploy efficiently. Dense models, by contrast, activate all parameters for every computation but are more straightforward to run on conventional hardware — a meaningful consideration for enterprises that want to deploy AI systems on their own infrastructure rather than relying on cloud providers.
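A rough back-of-the-envelope comparison makes the serving tradeoff clearer. The mixture-of-experts configuration below is invented for illustration and does not describe any specific model.

```python
# Illustrative numbers only: a dense model activates every parameter per token, while
# an MoE model stores far more parameters but activates only a few experts per token.
def active_params_dense(total_params: float) -> float:
    return total_params   # every weight participates in each forward pass

def active_params_moe(total_params: float, num_experts: int, experts_per_token: int,
                      expert_share: float = 0.9) -> float:
    # expert_share = fraction of weights living in expert layers (assumed; varies by model)
    shared = total_params * (1 - expert_share)
    per_expert = total_params * expert_share / num_experts
    return shared + per_expert * experts_per_token

dense_active = active_params_dense(123e9)
moe_active = active_params_moe(600e9, num_experts=128, experts_per_token=8)
print(f"dense 123B: {dense_active/1e9:.0f}B active per token; "
      f"hypothetical 600B MoE: {moe_active/1e9:.0f}B active per token")
# The dense model holds 123B weights in memory; the MoE must hold all 600B even though
# far fewer are active per token — one reason dense models can be simpler to serve on-prem.
```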

Banks and defense contractors want AI that never leaves their walls

For regulated industries — particularly financial services, healthcare, and defense — the question of where AI models run and who has access to the data they process is not merely technical but existential. Banks cannot send proprietary trading algorithms to external AI providers. Healthcare organizations face strict regulations about patient data. Defense contractors operate under security clearances that prohibit sharing sensitive information with foreign entities.

Lacroix suggests that the on-premises deployment capability, while important, is secondary to a more fundamental concern about ownership and control. “The fact that it’s on-prem, I think, is less relevant than the fact that it’s owned by the company and that it’s on wherever they feel safe moving that data — like they’re not shipping the entire code base to a third party,” he said. “I think that’s important.”

This framing positions Mistral not merely as a vendor of AI tools but as a partner in building proprietary AI capabilities that become strategic assets for client organizations. “When we work with a company to then customize them and potentially fine-tune them or continue pre-training them, then they become assets to that company, and they are their own competitive advantage, really,” Lacroix explained.

Mistral has actively cultivated relationships with governments to underscore this positioning. The company serves defense ministries in Europe and Southeast Asia, both directly and through defense contractors. At Davos, Mensch described AI as critical not only to economic sovereignty but to “strategic sovereignty,” noting that autonomous systems like drones require AI capabilities and that deterrence in this domain is increasingly important.

Mistral’s CEO dismisses the idea that China lags in artificial intelligence

Mistral’s positioning as a European alternative to American AI giants takes on added significance amid rising geopolitical tensions. At the World Economic Forum, Mensch was characteristically blunt about the competitive landscape, dismissing claims that Chinese AI development lags the United States as a “fairy tale.”

“China is not behind the West,” Mensch said in his Bloomberg Television interview. The capabilities of China’s open-source technology, he added, are “probably stressing the CEOs in the U.S.”

The comments reflect a broader anxiety in the AI industry about the durability of American technological leadership. Chinese companies including DeepSeek and Alibaba have released open-source models that match or exceed many American systems, often at dramatically lower costs. For Mistral, this competitive pressure validates its strategy of focusing on efficiency and customization rather than attempting to match the massive training runs of better-capitalized American rivals.

European Commission digital chief Henna Virkkunen, also speaking at Davos, underscored the strategic importance of technological sovereignty. “It’s so important that we are not dependent on one country or one company when it comes to some very critical fields of our economy or society,” she said.

For American enterprise customers, Lacroix suggests that Mistral’s European identity and government relationships need not be a concern — and may even be an advantage. “One of the benefits when working as we do, like with open weights, and especially when deploying on customers’ premises and giving them control, is that the wider geopolitics don’t necessarily matter that much,” he said. “I think the benefits of the open-source scene is that it gives you confidence that you know what you’re using, and you’re in total control of it.”

From model maker to enterprise platform signals a strategic pivot

Mistral’s transition from a pure model company to what Lacroix describes as “a full enterprise platform around developing AI applications” reflects a broader maturation in the AI industry. The realization that model weights alone do not capture the full value of AI systems has pushed companies across the sector toward more integrated offerings.

“We don’t think the only value we provide is in the model,” Lacroix said. “We started as a models company. We are now building a full enterprise platform around developing AI applications. We have a part of our company that provides services to integrate deeply. And so the way we make money, and I guess the question behind this is the value that is core to Mistral, is that full-stack solution to getting to the ROI of AI.”

This full-stack approach includes fine-tuning on internal languages and domain-specific languages, reinforcement learning with customer-specific environments, and end-to-end code modernization services that can migrate entire codebases to modern technology stacks. Mistral says it already delivers these solutions to some of the world’s largest organizations in finance, defense, and infrastructure.

The revenue milestone Mensch projected at Davos — crossing €1 billion by year’s end — would represent remarkable growth for a company founded in 2023. But it would still leave Mistral far behind American competitors whose valuations stretch into the hundreds of billions. OpenAI, now reportedly valued at more than $150 billion, and Anthropic, valued at approximately $60 billion, operate at a scale that Mistral cannot match through organic growth alone. To close the gap, Mistral is looking at acquisitions. “We are in the process of looking at a few opportunities,” Mensch said at Davos, though he declined to specify target business areas or geographic regions. The company’s September fundraise brought in €1.7 billion, with Dutch semiconductor equipment giant ASML joining as a key investor, valuing Mistral at €11.7 billion.

The coding assistant wars are just getting started

Looking beyond the immediate product announcement, Lacroix sees the current generation of AI coding tools as a transitional phase toward more autonomous software development. “For a few tasks, it’s already becoming the default entry point — like if I want to prototype something, or if I want to quickly iterate on an idea. I think it’s already faster,” he said. “What I see today is there is still some story that needs to happen on how you do the work asynchronously and in a way where it’s easy to orchestrate several tasks and several improvements on the same code base in a flow that feels natural.”

The current experience, he suggests, does not yet feel like having “your own team of developers that can really 10x yourself.” But he expects rapid improvement, driven by abundant training data and intense industry interest. Perhaps more ambitiously, Lacroix sees the file-manipulation and tool-calling capabilities built for coding as applicable far beyond software development. “What I’m really excited about is the use of these tools outside of coding,” he said. “The really strong realization is you now have an agent that is great at working with a file system, that can edit information and that expands its context a lot, and it’s really great at using all sorts of tools. Those tools don’t need to be necessarily related to coding, really.”

For chief technology officers and engineering leaders evaluating AI coding tools, Mistral’s announcement crystallizes the strategic choice now facing enterprises: accept the convenience and raw capability of closed-source American models, or bet on the flexibility and control of open-source alternatives that can be customized and deployed behind corporate firewalls. Human evaluations comparing Devstral 2 against Claude Sonnet 4.5 showed that Anthropic’s model was “significantly preferred,” according to Mistral’s own benchmarking — an acknowledgment that closed-source leaders retain advantages that efficiency and customization cannot fully offset.

But Lacroix is betting that for enterprises with proprietary code, legacy systems, and regulatory constraints, customization will matter more than raw performance on public benchmarks. “The point is that you can now get all of this vibe coding disruption and goodness in an environment where customization is needed, which was difficult before,” he said. “And that’s, I think, the main point that we’re making with this announcement.”

The AI coding wars, in other words, are no longer just about which model writes the best code. They’re about who gets to own the model that understands yours.

Anthropic embeds Slack, Figma and Asana inside Claude, turning AI chat into a workplace command center

Anthropic announced Monday that users can now open and interact with popular business applications directly inside Claude, the company’s AI assistant—a significant expansion that transforms the chatbot from a conversational tool into an integrated workspace where employees can build project timelines, draft Slack messages, create presentations, and visualize data without switching browser tabs.

The rollout, which goes live today, includes integrations with Amplitude, Asana, Box, Canva, Clay, Figma, Hex, Monday.com, and Slack. Salesforce integration is coming soon. The feature marks a new chapter in Anthropic’s aggressive push to dominate enterprise AI, arriving just days after the company’s CEO made headlines at Davos with bold predictions about AI replacing white-collar workers.

“MCP Apps are an extension to the core MCP protocol and are part of the open source MCP ecosystem,” Sean Strong, Anthropic’s product manager for MCP Apps, told VentureBeat in an exclusive interview. “Within Claude.ai, connectors require a paid Claude plan — Pro, Max, Team, or Enterprise — but there is no additional charge associated with using connectors.”

That pricing decision is notable. Rather than monetizing integrations separately or charging partners for distribution, Anthropic is bundling interactive tools into existing subscription tiers — a strategy designed to accelerate adoption and deepen Claude’s foothold in corporate environments where the company reportedly already leads OpenAI.

Inside MCP Apps, the open-source technology that lets Claude control your favorite work tools

The technical foundation is what Anthropic calls “MCP Apps,” a new extension to the Model Context Protocol, the open standard for connecting external tools to AI applications that Anthropic open-sourced last year. MCP Apps allow any MCP server to deliver an interactive user interface within any supporting AI product—meaning the technology isn’t limited to Claude.

In practice, the integrations allow for surprisingly granular control. Users can build analytics charts in Amplitude and adjust parameters interactively to explore trends. They can turn conversations into Asana projects with tasks and timelines that sync automatically. They can prompt Claude to generate flowcharts or Gantt charts in Figma’s collaborative whiteboard tool, FigJam. They can draft Slack messages, preview formatting, and review before posting.

The Hex integration may prove particularly valuable for data teams: users can ask data questions in natural language and receive answers complete with interactive charts, tables, and citations — effectively turning Claude into a business intelligence interface.

“We open sourced MCP to give the ecosystem a universal way to connect tools to AI,” the company said in its announcement blog. “Now we’re extending MCP further so developers can build interactive UI on top of it, wherever their users are.”
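For developers curious what the base protocol looks like, here is a minimal MCP tool server using the open-source Python SDK (the `mcp` package). The server name and tool are hypothetical, and this shows only the core protocol that MCP Apps extend with interactive UI, not the UI extension itself or any of the partner integrations named above.

```python
# Minimal MCP server sketch using the open-source Python SDK (`pip install mcp`).
# The "project-tracker" server and its tool are hypothetical examples.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("project-tracker")

@mcp.tool()
def create_task(title: str, assignee: str, due_date: str) -> str:
    """Create a task in a (hypothetical) project tracker and return its ID."""
    # A real server would call the tracker's API here; this just echoes a fake ID.
    return f"TASK-123: '{title}' assigned to {assignee}, due {due_date}"

if __name__ == "__main__":
    # An MCP client such as Claude connects to this process and can call create_task
    # after the user consents; MCP Apps layer interactive UI on top of servers like this.
    mcp.run()
```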

What happens when AI can send messages and create projects on your behalf

With AI systems increasingly capable of taking real-world actions — sending messages, creating projects, publishing content — the question of guardrails becomes critical. Can an employee accidentally send an unreviewed Slack message or publish an incomplete Canva presentation?

Strong addressed this directly. “Most major MCP clients, including Claude, provide consent prompts that help users determine if they want to take an action via a MCP server,” he said.

For enterprise deployments, IT administrators retain control. “Team and Enterprise admins have the ability to control which MCP servers users in their organizations have the ability to use,” Strong explained.

The consent-prompt approach is a middle ground between full autonomy and cumbersome approval workflows. But it also places significant responsibility on individual users to review actions before confirming them — a design choice that may draw scrutiny as AI agents become capable of more consequential decisions.

The security concerns are not hypothetical. As Fortune reported last week, Anthropic’s Claude Code product faces vulnerabilities including “prompt injections,” where attackers hide malicious instructions in web content to manipulate AI behavior. The company has implemented multiple security layers, including running some features in virtual machines and adding deletion protection after users accidentally removed files. “Agent safety—that is, the task of securing Claude’s real-world actions—is still an active area of development in the industry,” Anthropic has acknowledged.

Claude Code’s viral success set the stage for Anthropic’s enterprise ambitions

The interactive tools announcement arrives at a moment of unusual momentum for Anthropic. Claude Code, the company’s coding assistant released in February 2025, has become a viral hit.

Originally built for software developers, Claude Code has captured attention far beyond its intended audience. Non-programmers have deployed it to book theater tickets, file taxes, and monitor tomato plants. Nvidia CEO Jensen Huang called it “incredible.” Even Microsoft, which sells the competing GitHub Copilot, has widely adopted Claude Code internally, with non-developers reportedly encouraged to use it.

Boris Cherny, Anthropic’s head of Claude Code, told Fortune that his team built Cowork — a user-friendly version of the coding product for non-programmers — in approximately a week and a half, largely using Claude Code itself. “Engineers just feel unshackled, that they don’t have to work on all the tedious stuff anymore,” Cherny said.

Claude Code is now used by Uber, Netflix, Spotify, Salesforce, Accenture, and Snowflake, according to Anthropic. Claude’s total web audience has more than doubled since December 2024, and daily unique visitors on desktop are up 12% globally year-to-date, according to data from Similarweb and Sensor Tower published by The Wall Street Journal.

The company is also reportedly planning a $10 billion fundraising round that would value Anthropic at $350 billion — a staggering figure that reflects investor confidence in the company’s enterprise traction.

Anthropic’s CEO stirred controversy at Davos with predictions about AI replacing workers

The interactive tools launch also arrives against a backdrop of intense debate about AI’s impact on employment — a debate that Anthropic’s own CEO helped intensify at the World Economic Forum in Davos last week.

Dario Amodei told a Davos audience that AI models would replace the work of all software developers within a year and would reach “Nobel-level” scientific research in multiple fields within two years. He predicted that 50% of white-collar jobs would disappear within five years.

“I have engineers within Anthropic who say ‘I don’t write any code anymore. I just let the model write the code, I edit it,'” Amodei said. “We might be six to 12 months away from when the model is doing most, maybe all of what software engineers do end-to-end.”

Not everyone agrees with that timeline. Demis Hassabis, the Nobel Prize-winning CEO of Google DeepMind, said at the same conference that today’s AI systems are “nowhere near” human-level artificial general intelligence. Yann LeCun, the Turing Award-winning AI pioneer who recently left Meta to found Advanced Machine Intelligence Labs, went further, arguing that large language models “will never be able to achieve humanlike intelligence” and that a completely different approach is needed.

Why embedding AI into daily workflows could create powerful lock-in for enterprises

Anthropic’s integration strategy reflects a broader shift in enterprise AI competition. The battleground is moving from model benchmarks and capability demonstrations toward workflow integration — the degree to which AI systems become embedded in how companies actually operate.

By making Claude the interface through which employees interact with Asana, Slack, Figma, and other daily tools, Anthropic is positioning itself not merely as an AI provider but as a workflow orchestration layer. The more actions that flow through Claude, the harder it becomes for enterprises to switch to a competitor.

This approach mirrors strategies that proved successful for earlier generations of enterprise software. Salesforce built its dominance partly by becoming the system of record for customer data. Slack grew by centralizing workplace communication. Anthropic appears to be betting that AI assistants can occupy a similar position — the default starting point for work itself.

The open-source foundation of MCP may accelerate this strategy. By making the protocol available to any developer, Anthropic encourages a broad ecosystem of integrations that all funnel through MCP-compatible clients — of which Claude is the most prominent. The company benefits from network effects even as the standard itself remains open.

The race to become the operating system for AI-powered work is just getting started

The launch notably excludes some major enterprise platforms. Salesforce integration is listed as “coming soon,” and there’s no mention of Microsoft 365, Google Workspace, or other productivity suites that dominate corporate environments. Those gaps may limit initial adoption in organizations heavily invested in those ecosystems.

The feature is available on web and desktop for paid Claude plans, with support for Claude Cowork — the file management agent launched last week — coming later. Mobile support was not mentioned in the announcement.

For enterprises evaluating Claude against OpenAI’s offerings and other competitors, the interactive integrations represent a tangible differentiator. The ability to take action within business tools — rather than simply generating text that users must copy elsewhere — addresses a persistent friction point in AI adoption.

Whether that advantage proves durable depends on how quickly competitors respond. OpenAI has its own enterprise ambitions and partnerships. Google is integrating Gemini across its productivity suite. Microsoft continues to deepen Copilot’s presence in Office applications.

But the larger significance may be what today’s announcement signals about where enterprise software is headed. For decades, the default unit of work has been the application — the spreadsheet, the project tracker, the messaging platform. Anthropic is wagering that the future belongs to the AI layer that sits above them all.

If the company is right, the question for every enterprise software vendor becomes uncomfortably simple: Do you want to be the tool, or the thing that controls the tools?

The era of agentic AI demands a data constitution, not better prompts

The industry consensus is that 2026 will be the year of “agentic AI.” We are rapidly moving past chatbots that simply summarize text. We are entering the era of autonomous agents that execute tasks. We expect them to book flights, diagnose system outages, manage cloud infrastructure and personalize media streams in real-time.

As a technology executive overseeing platforms that serve 30 million concurrent users during massive global events like the Olympics and the Super Bowl, I have seen the unsexy reality behind the hype: Agents are incredibly fragile.

Executives and VCs obsess over model benchmarks. They debate Llama 3 versus GPT-4. They focus on maximizing context window sizes. Yet they are ignoring the actual failure point. The primary reason autonomous agents fail in production is often due to data hygiene issues.

In the previous era of “human-in-the-loop” analytics, data quality was a manageable nuisance. If an ETL pipeline broke, a dashboard displayed an incorrect revenue number. A human analyst would spot the anomaly, flag it and fix it. The blast radius was contained.

In the new world of autonomous agents, that safety net is gone.

If a data pipeline drifts today, an agent doesn’t just report the wrong number. It takes the wrong action. It provisions the wrong server type. It recommends a horror movie to a user watching cartoons. It hallucinates a customer service answer based on corrupted vector embeddings.

To run AI at the scale of the NFL or the Olympics, I realized that standard data cleaning is insufficient. We cannot just “monitor” data. We must legislate it.

One solution to this specific problem is a data quality framework I call ‘Creed.’ It functions as a ‘data constitution’: it enforces thousands of automated rules before a single byte of data is allowed to touch an AI model. While I applied this specifically to the streaming architecture at NBCUniversal, the methodology is universal for any enterprise looking to operationalize AI agents.

Here is why “defensive data engineering” and the Creed philosophy are the only ways to survive the agentic era.

The vector database trap

The core problem with AI Agents is that they trust the context you give them implicitly. If you are using RAG, your vector database is the agent’s long-term memory.

Standard data quality issues are catastrophic for vector databases. In traditional SQL databases, a null value is just a null value. In a vector database, a null value or a schema mismatch can warp the semantic meaning of the entire embedding.

Consider a scenario where metadata drifts. Suppose your pipeline ingests video metadata, but a race condition causes the “genre” tag to slip. Your metadata might tag a video as “live sports,” but the embedding was generated from a “news clip.” When an agent queries the database for “touchdown highlights,” it retrieves the news clip because the vector similarity search is operating on a corrupted signal. The agent then serves that clip to millions of users.

At scale, you cannot rely on downstream monitoring to catch this. By the time an anomaly alarm goes off, the agent has already made thousands of bad decisions. Quality controls must shift to the absolute “left” of the pipeline.

The “Creed” framework: 3 principles for survival

The Creed framework is designed to act as a gatekeeper. It is a multi-tenant quality architecture that sits between ingestion sources and AI models.

For technology leaders looking to build their own “constitution,” here are the three non-negotiable principles I recommend.

1. The “quarantine” pattern is mandatory: In many modern data organizations, engineers favor the “ELT” approach. They dump raw data into a lake and clean it up later. For AI Agents, this is unacceptable. You cannot let an agent drink from a polluted lake.

The Creed methodology enforces a strict “dead letter queue.” If a data packet violates a contract, it is immediately quarantined. It never reaches the vector database. It is far better for an agent to say “I don’t know” due to missing data than to confidently lie due to bad data. This “circuit breaker” pattern is essential for preventing high-profile hallucinations.
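A minimal sketch of the quarantine pattern follows, with illustrative field names rather than the Creed framework’s actual contracts: validate every record against a contract before it can reach the vector store, and divert violations to a dead-letter queue.

```python
# Quarantine pattern in miniature: only records that satisfy the data contract may
# continue toward embedding; violations are quarantined with a reason attached.
# Field names and rules are illustrative, not the actual Creed contracts.
from typing import Any

REQUIRED_FIELDS = {"doc_id": str, "text": str, "genre": str, "event_ts": int}

def violates_contract(record: dict[str, Any]) -> str | None:
    for field_name, expected_type in REQUIRED_FIELDS.items():
        if field_name not in record or record[field_name] is None:
            return f"missing field: {field_name}"
        if not isinstance(record[field_name], expected_type):
            return f"wrong type for {field_name}: {type(record[field_name]).__name__}"
    return None

def ingest(records: list[dict], vector_queue: list, dead_letter_queue: list) -> None:
    for record in records:
        reason = violates_contract(record)
        if reason:
            dead_letter_queue.append({"record": record, "reason": reason})  # quarantined
        else:
            vector_queue.append(record)   # only clean records may be embedded

clean, quarantined = [], []
ingest([{"doc_id": "a1", "text": "touchdown highlights", "genre": "live sports", "event_ts": 1700000000},
        {"doc_id": "a2", "text": "news clip", "genre": None, "event_ts": 1700000100}],
       clean, quarantined)
print(len(clean), "embedded;", len(quarantined), "quarantined:", quarantined[0]["reason"])
```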

2. Schema is law: For years, the industry moved toward “schemaless” flexibility to move fast. We must reverse that trend for core AI pipelines. We must enforce strict typing and referential integrity.

In my experience, a robust system requires scale. The implementation I oversee currently enforces more than 1,000 active rules running across real-time streams. These aren’t just checking for nulls; they check for business logic consistency, as in the examples below (and the sketch that follows them).

  • Example: Does the “user_segment” in the event stream match the active taxonomy in the feature store? If not, block it.

  • Example: Is the timestamp within the acceptable latency window for real-time inference? If not, drop it.
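Here is roughly how those two example rules might look in code. The taxonomy, threshold and field names are assumptions for illustration, not the actual production rules.

```python
# Illustrative versions of the two example rules above; values are assumptions.
import time

ACTIVE_USER_SEGMENTS = {"sports_fan", "news_junkie", "casual_viewer"}   # from the feature store
MAX_EVENT_AGE_SECONDS = 30   # acceptable latency window for real-time inference

def segment_rule(event: dict) -> bool:
    """Block events whose user_segment is not in the active taxonomy."""
    return event.get("user_segment") in ACTIVE_USER_SEGMENTS

def freshness_rule(event: dict, now: float | None = None) -> bool:
    """Drop events that arrive outside the real-time latency window."""
    now = time.time() if now is None else now
    return (now - event.get("event_ts", 0)) <= MAX_EVENT_AGE_SECONDS

event = {"user_segment": "sports_fan", "event_ts": time.time() - 5}
print(segment_rule(event), freshness_rule(event))   # True True -> allowed downstream
```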

3. Vector consistency checks: This is the new frontier for SREs. We must implement automated checks to ensure that the text chunks stored in a vector database actually match the embedding vectors associated with them. “Silent” failures in an embedding model API often leave you with vectors that point to nothing. This causes agents to retrieve pure noise.
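One way to implement such a check, sketched with a placeholder `embed()` function and store layout: periodically re-embed a sample of stored chunks, compare against the stored vectors, and flag anything that has drifted.

```python
# Sketch of an automated vector-consistency audit. `embed` and the store layout are
# placeholders for your own stack; the similarity threshold is an assumption.
import math, random

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def audit_vector_store(store: list[dict], embed, sample_size: int = 100,
                       min_similarity: float = 0.98) -> list[str]:
    """Return the IDs of chunks whose stored vector no longer matches their text."""
    suspects = []
    for row in random.sample(store, min(sample_size, len(store))):
        fresh = embed(row["text"])                   # call the same embedding model
        if cosine(fresh, row["vector"]) < min_similarity:
            suspects.append(row["id"])               # stored vector points at "nothing"
    return suspects

# Usage (with your own embedding function and rows from the vector store):
# suspects = audit_vector_store(rows, embed_fn)
```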

The culture war: Engineers vs. governance

Implementing a framework like Creed is not just a technical challenge. It is a cultural one.

Engineers generally hate guardrails. They view strict schemas and data contracts as bureaucratic hurdles that slow down deployment velocity. When introducing a data constitution, leaders often face pushback. Teams feel they are returning to the “waterfall” era of rigid database administration.

To succeed, you must flip the incentive structure. We demonstrated that Creed was actually an accelerator. By guaranteeing the purity of the input data, we eliminated the weeks data scientists used to spend debugging model hallucinations. We turned data governance from a compliance task into a “quality of service” guarantee.

The lesson for data decision makers

If you are building an AI strategy for 2026, stop buying more GPUs. Stop worrying about which foundation model is slightly higher on the leaderboard this week.

Start auditing your data contracts.

An AI Agent is only as autonomous as its data is reliable. Without a strict, automated data constitution like the Creed framework, your agents will eventually go rogue. In an SRE’s world, a rogue agent is far worse than a broken dashboard. It is a silent killer of trust, revenue, and customer experience.

Manoj Yerrasani is a senior technology executive.

Conversational AI doesn’t understand users — ‘Intent First’ architecture does

The modern customer has just one need that matters: getting the thing they want when they want it. The old standard RAG pipeline (embed, retrieve, LLM) misunderstands intent, overloads context and misses freshness, repeatedly sending customers down the wrong paths.

Instead, intent-first architecture uses a lightweight language model to parse the query for intent and context before routing it to the most relevant content sources (documents, APIs, people).
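A minimal sketch of that pattern, with invented intents, sources and a stand-in classifier: resolve intent first, then retrieve from only the matching source.

```python
# Intent-first routing in miniature. The intents, sources and classifier logic are
# placeholders; a production system would use a small, fast model plus account context.
INTENT_TO_SOURCE = {
    "cancel_order": "order_api",
    "cancel_appointment": "scheduling_api",
    "cancel_service": "retention_docs",
    "activate_device": "device_setup_docs",
}

def classify_intent(query: str, user_context: dict) -> str:
    """Stand-in for a lightweight LLM or classifier that returns one known intent."""
    if "cancel" in query.lower():
        # Context, not just wording, disambiguates a vague "cancel".
        if user_context.get("open_orders"):
            return "cancel_order"
        if user_context.get("upcoming_appointments"):
            return "cancel_appointment"
        return "cancel_service"
    return "activate_device"

def answer(query: str, user_context: dict) -> str:
    intent = classify_intent(query, user_context)
    source = INTENT_TO_SOURCE[intent]
    # Retrieval and generation now run against one relevant source instead of everything.
    return f"intent={intent} -> retrieve from {source}"

print(answer("I want to cancel", {"open_orders": ["#8841"]}))
```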

Enterprise AI is a speeding train headed for a cliff. Organizations are deploying LLM-powered search applications at a record pace, while a fundamental architectural issue is setting most up for failure.

A recent Coveo study revealed that 72% of enterprise search queries fail to deliver meaningful results on the first attempt, and Gartner predicts that the majority of conversational AI deployments will fall short of enterprise expectations.

The problem isn’t the underlying models. It’s the architecture around them.

After designing and running live AI-driven customer interaction platforms at scale, serving millions of customer and citizen users at some of the world’s largest telecommunications and healthcare organizations, I’ve come to see the pattern that separates successful AI-powered interaction deployments from multi-million-dollar failures.

It’s a cloud-native architecture pattern that I call Intent-First. And it’s reshaping the way enterprises build AI-powered experiences.

The $36 billion problem

Gartner projects the global conversational AI market will balloon to $36 billion by 2032. Enterprises are scrambling to get a slice. The demos are irresistible. Plug your LLM into your knowledge base, and suddenly it can answer customer questions in natural language. Magic.

Then production happens. 

A major telecommunications provider I work with rolled out a RAG system with the expectation of driving down the support call rate. Instead, the rate increased. Callers tried AI-powered search, were provided incorrect answers with a high degree of confidence and called customer support angrier than before.

This pattern is repeated over and over. In healthcare, customer-facing AI assistants are providing patients with formulary information that’s outdated by weeks or months. Financial services chatbots are spitting out answers from both retail and institutional product content. Retailers are seeing discontinued products surface in product searches.

The issue isn’t a failure of AI technology. It’s a failure of architecture.

Why standard RAG architectures fail 

The standard RAG pattern — embedding the query, retrieving semantically similar content, passing it to an LLM — works beautifully in demos and proofs of concept. But it falls apart in production use cases for three systematic reasons:

1. The intent gap

Intent is not context. But standard RAG architectures don’t account for this.

Say a customer types “I want to cancel.” What does that mean? Cancel a service? Cancel an order? Cancel an appointment? During our telecommunications deployment, we found that 65% of queries for “cancel” were actually about orders or appointments, not service cancellation. The RAG system had no way of understanding this intent, so it consistently returned service cancellation documents.

Intent matters even more in healthcare. If a patient types “I need to cancel” to cancel an appointment, a prescription refill or a procedure, routing them to medication content instead of scheduling content is not only frustrating — it’s also dangerous.

2. Context flood 

Enterprise knowledge is vast, spanning dozens of sources such as product catalogs, billing, support articles, policies, promotions and account data. Standard RAG models treat all of it the same, searching every source for every query.

When a customer asks “How do I activate my new phone,” they don’t care about billing FAQs, store locations or network status updates. But a standard RAG model retrieves semantically similar content from every source, returning search results that are a half-step off the mark.

3. Freshness blind spot

Vector space is time-blind. Semantically, last quarter’s promotion is identical to this quarter’s. But presenting customers with outdated offers shatters trust. We linked a significant percentage of customer complaints to search results that surfaced expired products, offers, or features.

The Intent-First architecture pattern 

The Intent-First architecture pattern is the mirror image of the standard RAG deployment. In the RAG model, you retrieve, then route. In the Intent-First model, you classify before you route or retrieve.

Intent-First architectures use a lightweight language model to parse a query for intent and context, before dispatching to the most relevant content sources (documents, APIs, agents).

Comparison: Intent-first vs standard RAG

Cloud-native implementation

The Intent-First pattern is designed for cloud-native deployment, leveraging microservices, containerization and elastic scaling to handle enterprise traffic patterns.

Intent classification service

The classifier determines user intent before any retrieval occurs:

ALGORITHM: Intent Classification
INPUT: user_query (string)
OUTPUT: intent_result (object)

1. PREPROCESS query (normalize, expand contractions)
2. CLASSIFY using transformer model:
   – primary_intent ← model.predict(query)
   – confidence ← model.confidence_score()
3. IF confidence < 0.70 THEN
   – RETURN {
       requires_clarification: true,
       suggested_question: generate_clarifying_question(query)
     }
4. EXTRACT sub_intent based on primary_intent:
   – IF primary = “ACCOUNT” → check for ORDER_STATUS, PROFILE, etc.
   – IF primary = “SUPPORT” → check for DEVICE_ISSUE, NETWORK, etc.
   – IF primary = “BILLING” → check for PAYMENT, DISPUTE, etc.
5. DETERMINE target_sources based on intent mapping:
   – ORDER_STATUS → [orders_db, order_faq]
   – DEVICE_ISSUE → [troubleshooting_kb, device_guides]
   – MEDICATION → [formulary, clinical_docs] (healthcare)
6. RETURN {
     primary_intent,
     sub_intent,
     confidence,
     target_sources,
     requires_personalization: true/false
   }
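For readers who prefer code to pseudocode, here is a minimal Python sketch of the same flow; the classifier callable, the intent-to-source map and the personalization rule are illustrative assumptions:

INTENT_SOURCES = {
    "ORDER_STATUS": ["orders_db", "order_faq"],
    "DEVICE_ISSUE": ["troubleshooting_kb", "device_guides"],
    "MEDICATION": ["formulary", "clinical_docs"],  # healthcare deployments only
}

def classify_intent(query: str, classifier, threshold: float = 0.70) -> dict:
    """classifier(query) is assumed to return an (intent_label, confidence) pair."""
    label, confidence = classifier(query.strip().lower())
    if confidence < threshold:
        return {
            "requires_clarification": True,
            "suggested_question": f"Could you tell me more about what you want to do: '{query}'?",
        }
    return {
        "primary_intent": label,
        "confidence": confidence,
        "target_sources": INTENT_SOURCES.get(label, ["general_kb"]),
        "requires_personalization": label == "ORDER_STATUS",  # illustrative rule
    }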

Context-aware retrieval service

Once intent is classified, retrieval becomes targeted:

ALGORITHM: Context-Aware Retrieval
INPUT: query, intent_result, user_context
OUTPUT: ranked_documents

1. GET source_config for intent_result.sub_intent:
   – primary_sources ← sources to search
   – excluded_sources ← sources to skip
   – freshness_days ← max content age
2. IF intent requires personalization AND user is authenticated:
   – FETCH account_context from Account Service
   – IF intent = ORDER_STATUS:
       – FETCH recent_orders (last 60 days)
       – ADD to results
3. BUILD search filters:
   – content_types ← primary_sources only
   – max_age ← freshness_days
   – user_context ← account_context (if available)
4. FOR EACH source IN primary_sources:
   – documents ← vector_search(query, source, filters)
   – ADD documents to results
5. SCORE each document:
   – relevance_score ← vector_similarity × 0.40
   – recency_score ← freshness_weight × 0.20
   – personalization_score ← user_match × 0.25
   – intent_match_score ← type_match × 0.15
   – total_score ← SUM of above
6. RANK by total_score descending
7. RETURN top 10 documents
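The scoring step reduces to a weighted sum whose weights total 1.0; a minimal sketch, assuming each component has already been normalized to a 0-1 range:

WEIGHTS = {"relevance": 0.40, "recency": 0.20, "personalization": 0.25, "intent_match": 0.15}

def total_score(doc: dict) -> float:
    return sum(doc[component] * weight for component, weight in WEIGHTS.items())

def rank(documents: list[dict], top_k: int = 10) -> list[dict]:
    # Highest combined score first, truncated to the top 10 by default.
    return sorted(documents, key=total_score, reverse=True)[:top_k]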

Healthcare-specific considerations

In healthcare deployments, the Intent-First pattern includes additional safeguards:

Healthcare intent categories:

  • Clinical: Medication questions, symptoms, care instructions

  • Coverage: Benefits, prior authorization, formulary

  • Scheduling: Appointments, provider availability

  • Billing: Claims, payments, statements

  • Account: Profile, dependents, ID cards

Critical safeguard: Clinical queries always include disclaimers and never replace professional medical advice. The system routes complex clinical questions to human support.

Handling edge cases

The edge cases are where systems fail. The Intent-First pattern includes specific handlers:

Frustration detection keywords:

  • Anger: “terrible,” “worst,” “hate,” “ridiculous”

  • Time: “hours,” “days,” “still waiting”

  • Failure: “useless,” “no help,” “doesn’t work”

  • Escalation: “speak to human,” “real person,” “manager”

When frustration is detected, skip search entirely and route to human support.
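A bare-bones version of that handler might look like the following, with the keyword lists taken from above and the search and escalation functions left as caller-supplied stand-ins:

FRUSTRATION_KEYWORDS = {
    "anger": ["terrible", "worst", "hate", "ridiculous"],
    "time": ["hours", "days", "still waiting"],
    "failure": ["useless", "no help", "doesn't work"],
    "escalation": ["speak to human", "real person", "manager"],
}

def is_frustrated(query: str) -> bool:
    q = query.lower()
    return any(phrase in q for phrases in FRUSTRATION_KEYWORDS.values() for phrase in phrases)

def handle(query: str, search, escalate_to_human):
    # Frustrated users bypass retrieval entirely and go straight to a person.
    return escalate_to_human(query) if is_frustrated(query) else search(query)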

Cross-industry applications

The Intent-First pattern applies wherever enterprises deploy conversational AI over heterogeneous content:

Industry | Intent categories | Key benefit
Telecommunications | Sales, Support, Billing, Account, Retention | Prevents “cancel” misclassification
Healthcare | Clinical, Coverage, Scheduling, Billing | Separates clinical from administrative
Financial services | Retail, Institutional, Lending, Insurance | Prevents context mixing
Retail | Product, Orders, Returns, Loyalty | Ensures promotional freshness

Results

After implementing Intent-First architecture across telecommunications and healthcare platforms:

Metric | Impact
Query success rate | Nearly doubled
Support escalations | Reduced by more than half
Time to resolution | Reduced approximately 70%
User satisfaction | Improved roughly 50%
Return user rate | More than doubled

The return user rate proved most significant. When search works, users come back. When it fails, they abandon the channel entirely, increasing costs across all other support channels.

The strategic imperative

The conversational AI market will continue to experience hypergrowth.

But enterprises that build and deploy typical RAG architectures will continue to fail … repeatedly.

AI will confidently give wrong answers, users will abandon digital channels out of frustration and support costs will go up instead of down.

Intent-First is a fundamental shift in how enterprises need to architect and build AI-powered customer conversations. It’s not about better models or more data. It’s about understanding what a user wants before you try to help them.

The sooner an organization realizes this as an architectural imperative, the sooner they will be able to capture the efficiency gains this technology is supposed to enable. Those that don’t will be debugging why their AI investments haven’t been producing expected business outcomes for many years to come.

The demo is easy. Production is hard. But the pattern for production success is clear: Intent First.

Sreenivasa Reddy Hulebeedu Reddy is a lead software engineer and enterprise architect.

Railway secures $100 million to challenge AWS with AI-native cloud infrastructure

Railway, a San Francisco-based cloud platform that has quietly amassed two million developers without spending a dollar on marketing, announced Thursday that it raised $100 million in a Series B funding round, as surging demand for artificial intelligence applications exposes the limitations of legacy cloud infrastructure.

TQ Ventures led the round, with participation from FPV Ventures, Redpoint, and Unusual Ventures. The investment positions Railway as one of the most significant infrastructure startups to emerge during the AI boom, capitalizing on developer frustration with the complexity and cost of traditional platforms like Amazon Web Services and Google Cloud.

“As AI models get better at writing code, more and more people are asking the age-old question: where, and how, do I run my applications?” said Jake Cooper, Railway’s 28-year-old founder and chief executive, in an exclusive interview with VentureBeat. “The last generation of cloud primitives were slow and outdated, and now with AI moving everything faster, teams simply can’t keep up.”

The funding is a dramatic acceleration for a company that has charted an unconventional path through the cloud computing industry. Railway raised just $24 million in total before this round, including a $20 million Series A from Redpoint in 2022. The company now processes more than 10 million deployments monthly and handles over one trillion requests through its edge network — metrics that rival far larger and better-funded competitors.

Why three-minute deploy times have become unacceptable in the age of AI coding assistants

Railway’s pitch rests on a simple observation: the tools developers use to deploy and manage software were designed for a slower era. A standard build-and-deploy cycle using Terraform, the industry-standard infrastructure tool, takes two to three minutes. That delay, once tolerable, has become a critical bottleneck as AI coding assistants like Claude, ChatGPT, and Cursor can generate working code in seconds.

“When godly intelligence is on tap and can solve any problem in three seconds, those amalgamations of systems become bottlenecks,” Cooper told VentureBeat. “What was really cool for humans to deploy in 10 seconds or less is now table stakes for agents.”

The company claims its platform delivers deployments in under one second — fast enough to keep pace with AI-generated code. Customers report a tenfold increase in developer velocity and up to 65 percent cost savings compared to traditional cloud providers.

These numbers come directly from enterprise clients, not internal benchmarks. Daniel Lobaton, chief technology officer at G2X, a platform serving 100,000 federal contractors, measured deployments running seven times faster and an 87 percent cost reduction after migrating to Railway. His infrastructure bill dropped from $15,000 per month to approximately $1,000.

“The work that used to take me a week on our previous infrastructure, I can do in Railway in like a day,” Lobaton said. “If I want to spin up a new service and test different architectures, it would take so long on our old setup. In Railway I can launch six services in two minutes.”

Inside the controversial decision to abandon Google Cloud and build data centers from scratch

What distinguishes Railway from competitors like Render and Fly.io is the depth of its vertical integration. In 2024, the company made the unusual decision to abandon Google Cloud entirely and build its own data centers, a move that echoes the famous Alan Kay maxim: “People who are really serious about software should make their own hardware.”

“We wanted to design hardware in a way where we could build a differentiated experience,” Cooper said. “Having full control over the network, compute, and storage layers lets us do really fast build and deploy loops, the kind that allows us to move at ‘agentic speed’ while staying 100 percent the smoothest ride in town.”

The approach paid dividends during recent widespread outages that affected major cloud providers — Railway remained online throughout.

This soup-to-nuts control enables pricing that undercuts the hyperscalers by roughly 50 percent and newer cloud startups by three to four times. Railway charges by the second for actual compute usage: $0.00000386 per gigabyte-second of memory, $0.00000772 per vCPU-second, and $0.00000006 per gigabyte-second of storage. There are no charges for idle virtual machines — a stark contrast to the traditional cloud model where customers pay for provisioned capacity whether they use it or not.
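As a rough back-of-the-envelope check of those rates (a sketch only, assuming continuous usage over a 30-day month, not Railway’s billing logic):

SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # 2,592,000 seconds in a 30-day month

memory_rate = 0.00000386   # dollars per GB-second of memory
vcpu_rate = 0.00000772     # dollars per vCPU-second
storage_rate = 0.00000006  # dollars per GB-second of storage

print(round(memory_rate * SECONDS_PER_MONTH, 2))   # ~10.01: about $10 per GB of RAM per month
print(round(vcpu_rate * SECONDS_PER_MONTH, 2))     # ~20.01: about $20 per vCPU per month
print(round(storage_rate * SECONDS_PER_MONTH, 2))  # ~0.16: about $0.16 per GB of storage per month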

“The conventional wisdom is that the big guys have economies of scale to offer better pricing,” Cooper noted. “But when they’re charging for VMs that usually sit idle in the cloud, and we’ve purpose-built everything to fit much more density on these machines, you have a big opportunity.”

How 30 employees built a platform generating tens of millions in annual revenue

Railway has achieved its scale with a team of just 30 employees generating tens of millions in annual revenue — a ratio of revenue per employee that would be exceptional even for established software companies. The company grew revenue 3.5 times last year and continues to expand at 15 percent month-over-month.

Cooper emphasized that the fundraise was strategic rather than necessary. “We’re default alive; there’s no reason for us to raise money,” he said. “We raised because we see a massive opportunity to accelerate, not because we needed to survive.”

The company hired its first salesperson only last year and employs just two solutions engineers. Nearly all of Railway’s two million users discovered the platform through word of mouth — developers telling other developers about a tool that actually works.

“We basically did the standard engineering thing: if you build it, they will come,” Cooper recalled. “And to some degree, they came.”

From side projects to Fortune 500 deployments: Railway’s unlikely corporate expansion

Despite its grassroots developer community, Railway has made significant inroads into large organizations. The company claims that 31 percent of Fortune 500 companies now use its platform, though deployments range from company-wide infrastructure to individual team projects.

Notable customers include Bilt, the loyalty program company; Intuit’s GoCo subsidiary; TripAdvisor’s Cruise Critic; and MGM Resorts. Kernel, a Y Combinator-backed startup providing AI infrastructure to over 1,000 companies, runs its entire customer-facing system on Railway for $444 per month.

“At my previous company Clever, which sold for $500 million, I had six full-time engineers just managing AWS,” said Rafael Garcia, Kernel’s chief technology officer. “Now I have six engineers total, and they all focus on product. Railway is exactly the tool I wish I had in 2012.”

For enterprise customers, Railway offers security certifications including SOC 2 Type 2 compliance and HIPAA readiness, with business associate agreements available upon request. The platform provides single sign-on authentication, comprehensive audit logs, and the option to deploy within a customer’s existing cloud environment through a “bring your own cloud” configuration.

Enterprise pricing starts at custom levels, with specific add-ons for extended log retention ($200 monthly), HIPAA BAAs ($1,000), enterprise support with SLOs ($2,000), and dedicated virtual machines ($10,000).

The startup’s bold strategy to take on Amazon, Google, and a new generation of cloud rivals

Railway enters a crowded market that includes not only the hyperscale cloud providers—Amazon Web Services, Microsoft Azure, and Google Cloud Platform—but also a growing cohort of developer-focused platforms like Vercel, Render, Fly.io, and Heroku.

Cooper argues that Railway’s competitors fall into two camps, neither of which has fully committed to the new infrastructure model that AI demands.

“The hyperscalers have two competing systems, and they haven’t gone all-in on the new model because their legacy revenue stream is still printing money,” he observed. “They have this mammoth pool of cash coming from people who provision a VM, use maybe 10 percent of it, and still pay for the whole thing. To what end are they actually interested in going all the way in on a new experience if they don’t really need to?”

Against startup competitors, Railway differentiates by covering the full infrastructure stack. “We’re not just containers; we’ve got VM primitives, stateful storage, virtual private networking, automated load balancing,” Cooper said. “And we wrap all of this in an absurdly easy-to-use UI, with agentic primitives so agents can move 1,000 times faster.”

The platform supports databases including PostgreSQL, MySQL, MongoDB, and Redis; provides up to 256 terabytes of persistent storage with over 100,000 input/output operations per second; and enables deployment to four global regions spanning the United States, Europe, and Southeast Asia. Enterprise customers can scale to 112 vCPUs and 2 terabytes of RAM per service.

Why investors are betting that AI will create a thousand times more software than exists today

Railway’s fundraise reflects broader investor enthusiasm for companies positioned to benefit from the AI coding revolution. As tools like GitHub Copilot, Cursor, and Claude become standard fixtures in developer workflows, the volume of code being written — and the infrastructure needed to run it — is expanding dramatically.

“The amount of software that’s going to come online over the next five years is unfathomable compared to what existed before — we’re talking a thousand times more software,” Cooper predicted. “All of that has to run somewhere.”

The company has already integrated directly with AI systems, building what Cooper calls “loops where Claude can hook in, call deployments, and analyze infrastructure automatically.” Railway released a Model Context Protocol server in August 2025 that allows AI coding agents to deploy applications and manage infrastructure directly from code editors.

“The notion of a developer is melting before our eyes,” Cooper said. “You don’t have to be an engineer to engineer things anymore — you just need critical thinking and the ability to analyze things in a systems capacity.”

What Railway plans to do with $100 million and zero marketing experience

Railway plans to use the new capital to expand its global data center footprint, grow its team beyond 30 employees, and build what Cooper described as a proper go-to-market operation for the first time in the company’s five-year history.

“One of my mentors said you raise money when you can change the trajectory of the business,” Cooper explained. “We’ve built all the required substrate to scale indefinitely; what’s been holding us back is simply talking about it. 2026 is the year we play on the world stage.”

The company’s investor roster reads like a who’s who of developer infrastructure. Angel investors include Tom Preston-Werner, co-founder of GitHub; Guillermo Rauch, chief executive of Vercel; Spencer Kimball, chief executive of Cockroach Labs; Olivier Pomel, chief executive of Datadog; and Jori Lallo, co-founder of Linear.

The timing of Railway’s expansion coincides with what many in Silicon Valley view as a fundamental shift in how software gets made. Coding assistants are no longer experimental curiosities — they have become essential tools that millions of developers rely on daily. Each line of AI-generated code needs somewhere to run, and the incumbents, by Cooper’s telling, are too wedded to their existing business models to fully capitalize on the moment.

Whether Railway can translate developer enthusiasm into sustained enterprise adoption remains an open question. The cloud infrastructure market is littered with promising startups that failed to break the grip of Amazon, Microsoft, and Google. But Cooper, who previously worked as a software engineer at Wolfram Alpha, Bloomberg, and Uber before founding Railway in 2020, seems unfazed by the scale of his ambition.

“In five years, Railway [will be] the place where software gets created and evolved, period,” he said. “Deploy instantly, scale infinitely, with zero friction. That’s the prize worth playing for, and there’s no bigger one on offer.”

For a company that built a $100 million business by doing the opposite of what conventional startup wisdom dictates — no marketing, no sales team, no venture hype—the real test begins now. Railway spent five years proving that developers would find a better mousetrap on their own. The next five will determine whether the rest of the world is ready to get on board.

Why LinkedIn says prompting was a non-starter — and small models were the breakthrough

LinkedIn is a leader in AI recommender systems, having developed them over the last 15-plus years. But getting to a next-gen recommendation stack for the job-seekers of tomorrow required a whole new technique. The company had to look beyond off-the-shelf models to achieve next-level accuracy, latency, and efficiency.

“There was just no way we were gonna be able to do that through prompting,” Erran Berger, VP of product engineering at LinkedIn, says in a new Beyond the Pilot podcast. “We didn’t even try that for next-gen recommender systems because we realized it was a non-starter.”

Instead, his team set out to develop a highly detailed product policy document to fine-tune an initially massive 7-billion-parameter model; that was then further distilled into additional teacher and student models optimized down to hundreds of millions of parameters.

The technique has created a repeatable cookbook now reused across LinkedIn’s AI products. 

“Adopting this eval process end to end will drive substantial quality improvement of the likes we probably haven’t seen in years here at LinkedIn,” Berger says. 

Why multi-teacher distillation was a ‘breakthrough’ for LinkedIn 

Berger and his team set out to build an LLM that could interpret individual job queries, candidate profiles and job descriptions in real time, and in a way that mirrored LinkedIn’s product policy as accurately as possible. 

Working with the company’s product management team, engineers eventually built out a 20-to-30-page document scoring job description and profile pairs “across many dimensions.” 

“We did many, many iterations on this,” Berger says. That product policy document was then paired with a “golden dataset” comprising thousands of pairs of queries and profiles; the team fed this into ChatGPT during data generation and experimentation, prompting the model over time to learn to score pairs and eventually generate a much larger synthetic dataset used to train a 7-billion-parameter teacher model.

However, Berger says, it’s not enough to have an LLM running in production just on product policy. “At the end of the day, it’s a recommender system, and we need to do some amount of click prediction and personalization.” 

So, his team used that initial product policy-focused teacher model to develop a second teacher model oriented toward click prediction. Using the two, they further distilled a 1.7-billion-parameter model for training purposes. That eventual student model was run through “many, many training runs,” and was optimized “at every point” to minimize quality loss, Berger says.

This multi-teacher distillation technique allowed the team to “achieve a lot of affinity” to the original product policy and “land” click prediction, he says. They were also able to “modularize and componentize” the training process for the student.

Consider it in the context of a chat agent with two different teacher models: One is training the agent on accuracy in responses, the other on tone and how it should communicate. Those two things are very different, yet critical, objectives, Berger notes. 

“By not mixing them, you get better outcomes, but can also iterate on them independently,” he says. “That was a breakthrough for us.”
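Conceptually, multi-teacher distillation blends soft targets from each teacher into the student’s loss while keeping the teachers separate; a minimal PyTorch-style sketch (the weights and temperature are illustrative, and this is not LinkedIn’s training code):

import torch.nn.functional as F

def multi_teacher_loss(student_logits, policy_teacher_logits, click_teacher_logits,
                       w_policy=0.5, w_click=0.5, temperature=2.0):
    """Weighted KL divergence against a product-policy teacher and a click-prediction teacher."""
    log_student = F.log_softmax(student_logits / temperature, dim=-1)

    def kl_to(teacher_logits):
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
        return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2

    # Each teacher keeps its own weight, so the two objectives can be tuned independently.
    return w_policy * kl_to(policy_teacher_logits) + w_click * kl_to(click_teacher_logits)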

Changing how teams work together

Berger says he can’t overstate the importance of anchoring on a product policy and an iterative eval process.

Getting a “really, really good product policy” requires translating product manager domain expertise into a unified document. Historically, Berger notes, the product management team was laser focused on strategy and user experience, leaving modeling iteration approaches to ML engineers. Now, though, the two teams work together to “dial in” and create an aligned teacher model. 

“How product managers work with machine learning engineers now is very different from anything we’ve done previously,” he says. “It’s now a blueprint for basically any AI products we do at LinkedIn.”

Watch the full podcast to hear more about: 

  • How LinkedIn optimized every step of the R&D process to support velocity, leading to real results within days or hours rather than weeks; 

  • Why teams should develop pipelines for pluggability and experimentation, and try out different models to support flexibility; 

  • The continued importance of traditional engineering debugging.

You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.

TrueFoundry launches TrueFailover to automatically reroute enterprise AI traffic during model outages

When OpenAI went down in December, one of TrueFoundry’s customers faced a crisis that had nothing to do with chatbots or content generation. The company uses large language models to help refill prescriptions. Every second of downtime meant thousands of dollars in lost revenue — and patients who could not access their medications on time.

TrueFoundry, an enterprise AI infrastructure company, announced Wednesday a new product called TrueFailover designed to prevent exactly that scenario. The system automatically detects when AI providers experience outages, slowdowns, or quality degradation, then seamlessly reroutes traffic to backup models and regions before users notice anything went wrong.

“The challenge is that in the AI world, failover is no longer that simple,” said Nikunj Bajaj, co-founder and chief executive of TrueFoundry, in an exclusive interview with VentureBeat. “When you move from one model to another, you also have to consider things like output quality, latency, and whether the prompt even works the same way. In many cases, the prompt needs to be adjusted in real-time to prevent results from degrading. That is not something most teams are set up to manage manually.”

The announcement arrives at a pivotal moment for enterprise AI adoption. Companies have moved far beyond experimentation. AI now powers prescription refills at pharmacies, generates sales proposals, assists software developers, and handles customer support inquiries. When these systems fail, the consequences ripple through entire organizations.

Why enterprise AI systems remain dangerously dependent on single providers

Large language models from OpenAI, Anthropic, Google, and other providers have become essential infrastructure for thousands of businesses. But unlike traditional cloud services from Amazon Web Services or Microsoft Azure — which offer robust uptime guarantees backed by decades of operational experience — AI providers operate complex, resource-intensive systems that remain prone to unexpected failures.

“Major LLM providers experience outages, slowdowns, or latency spikes every few weeks or months, and we regularly see the downstream impact on businesses that rely on a single provider,” Bajaj told VentureBeat.

The December OpenAI outage that affected TrueFoundry’s pharmacy customer illustrates the stakes. “At their scale, even seconds of downtime can translate into thousands of dollars in lost revenue,” Bajaj explained. “Beyond the economic impact, there is also a human consequence when patients cannot access prescriptions on time. Because this customer had our failover solution in place, they were able to reroute requests to another model provider within minutes of detecting the outage. Without that setup, recovery would likely have taken hours.”

The problem extends beyond complete outages. Partial failures — where a model slows down or produces lower-quality responses without going fully offline — can quietly destroy user experience and violate service-level agreements. These “slow but technically up” scenarios often prove more damaging than dramatic crashes because they evade traditional monitoring systems while steadily eroding performance.

Inside the technology that keeps AI applications online when providers fail

TrueFailover operates as a resilience layer on top of TrueFoundry’s AI Gateway, which already processes more than 10 billion requests per month for Fortune 1000 companies. The system weaves together several interconnected capabilities into a unified safety net for enterprise AI.

At its core, the product enables multi-model failover by allowing enterprises to define primary and backup models across providers. If OpenAI becomes unavailable, traffic automatically shifts to Anthropic, Google’s Gemini, Mistral, or self-hosted alternatives. The routing happens transparently, without requiring application teams to rewrite code or manually intervene.

The system extends this protection across geographic boundaries through multi-region and multi-cloud resilience. By distributing AI endpoints across zones and cloud providers, health-based routing can detect problems in specific regions and divert traffic to healthy alternatives. What would otherwise become a global incident transforms into an invisible infrastructure adjustment that users never perceive.

Perhaps most critically, TrueFailover employs degradation-aware routing that continuously monitors latency, error rates, and quality signals. “We look at a combination of signals that together indicate when a model’s performance is starting to degrade,” Bajaj explained. “Large language models are shared resources. Providers run the same model instance across many customers, so when demand spikes for one user or workload, it can affect everyone else using that model.”

The system watches for rising response times, increasing error rates, and patterns suggesting instability. “Individually, none of these signals tell the full story,” Bajaj said. “But taken together, they allow us to detect early signs that a model is slowing down or becoming unreliable. Those signals feed into an AI-driven system that can decide when and how to reroute traffic before users experience a noticeable drop in quality.”
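A toy illustration of degradation-aware routing, combining the signals Bajaj describes into a single health gate (the thresholds and structure are assumptions, not TrueFoundry’s implementation):

from dataclasses import dataclass

@dataclass
class ProviderHealth:
    p95_latency_ms: float   # sliding-window 95th-percentile latency
    error_rate: float       # fraction of failed requests in the window
    quality_score: float    # 0-1 signal from automated output checks

def pick_provider(health: dict[str, ProviderHealth], preference_order: list[str],
                  max_latency_ms: float = 2000.0, max_error_rate: float = 0.02,
                  min_quality: float = 0.8) -> str | None:
    """Walk the approved providers in order and return the first one that looks healthy."""
    for name in preference_order:
        h = health.get(name)
        if (h and h.p95_latency_ms <= max_latency_ms
                and h.error_rate <= max_error_rate
                and h.quality_score >= min_quality):
            return name
    return None  # nothing healthy: raise an incident rather than serve degraded answers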

Strategic caching rounds out the protection by shielding providers from sudden traffic spikes and preventing rate-limit cascades during high-demand periods. This allows systems to absorb demand surges and provider limits without brownouts or throttling surprises.

The approach represents a fundamental shift in how enterprises should think about AI reliability. “TrueFailover is designed to handle that complexity automatically,” Bajaj said. “It continuously monitors how models behave across many customers and use cases, looks for early warning signs like rising latency, and takes action before things break. Most individual enterprises do not have that kind of visibility because they are only able to see their own systems.”

The engineering challenge of switching models without sacrificing output quality

One of the thorniest challenges in AI failover involves maintaining consistent output quality when switching between models. A prompt optimized for GPT-5 may produce different results on Claude or Gemini. TrueFoundry addresses this through several mechanisms that balance speed against precision.

“Some teams rely on the fact that large models have become good enough that small differences in prompts do not materially affect the output,” Bajaj explained. “In those cases, switching from one provider to another can happen with some visible impact — that’s not ideal, but some teams choose to do it.”

More sophisticated implementations maintain provider-specific prompts for the same application. “When traffic shifts from one model to another, the prompt shifts with it,” Bajaj said. “In that case, failover is not just switching models. It is switching to a configuration that has already been tested.”

TrueFailover automates this process. The system dynamically routes requests and adjusts prompts based on which model handles the query, keeping quality within acceptable ranges without manual intervention. The key, Bajaj emphasized, is that “failover is planned, not reactive. The logic, prompts, and guardrails are defined ahead of time, which is why end users typically do not notice when a switch happens.”

Importantly, many failover scenarios do not require changing providers at all. “It can be routing traffic from the same model in one region to another region, such as from the East Coast to the West Coast, where no prompt changes are required,” Bajaj noted. This geographic flexibility provides a first line of defense before more complex cross-provider switches become necessary.

How regulated industries can use AI failover without compromising compliance

For enterprises in healthcare, financial services, and other regulated sectors, the prospect of AI traffic automatically routing to different providers raises immediate compliance concerns. Patient data cannot simply flow to whichever model happens to be available. Financial records require strict controls over where they travel. TrueFoundry built explicit guardrails to address these constraints.

“TrueFailover will never route data to a model or provider that an enterprise has not explicitly approved,” Bajaj said. “Everything is controlled through an admin configuration layer where teams set clear guardrails upfront.”

Enterprises define exactly which models qualify for failover, which providers can receive traffic, and even which regions or model categories — such as closed-source versus open-source — are acceptable. Once those rules take effect, TrueFailover operates only within them.

“If a model is not on the approved list, it is simply not an option for routing,” Bajaj emphasized. “There is no scenario where traffic is automatically sent somewhere unexpected. The idea is to give teams full control over compliance and data boundaries, while still allowing the system to respond quickly when something goes wrong. That way, reliability improves without compromising security or regulatory requirements.”
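In practice, guardrails like these often reduce to an allow-list consulted before any reroute; a hypothetical configuration check might look like this (illustrative only):

FAILOVER_POLICY = {
    "approved_providers": ["openai", "anthropic", "self_hosted_mistral"],  # hypothetical list
    "approved_regions": ["us-east", "us-west"],
    "allow_open_source": True,
}

def is_allowed_destination(provider: str, region: str, is_open_source: bool,
                           policy: dict = FAILOVER_POLICY) -> bool:
    # Traffic never routes to a destination the enterprise has not explicitly approved.
    return (provider in policy["approved_providers"]
            and region in policy["approved_regions"]
            and (policy["allow_open_source"] or not is_open_source))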

This design reflects lessons learned from TrueFoundry’s existing enterprise deployments. A Fortune 50 healthcare company already uses the platform to handle more than 500 million IVR calls annually through an agentic AI system. That customer required the ability to run workloads across both cloud and on-premise infrastructure while maintaining strict data residency controls — exactly the kind of hybrid environment where failover policies must be precisely defined.

Where automatic failover cannot help and what enterprises must plan for

TrueFoundry acknowledges that TrueFailover cannot solve every reliability problem. The system operates within the guardrails enterprises configure, and those configurations determine what protection is possible.

“If a team allows failover from a large, high-capacity model to a much smaller model without adjusting prompts or expectations, TrueFailover cannot guarantee the same output quality,” Bajaj explained. “The system can route traffic, but it cannot make a smaller model behave like a larger one without appropriate configuration.”

Infrastructure constraints also limit protection. If an enterprise hosts its own models and all of them run on the same GPU cluster, TrueFailover cannot help when that infrastructure fails. “When there is no alternate infrastructure available, there is nothing to fail over to,” Bajaj said.

The question of simultaneous multi-provider failures occasionally surfaces in enterprise risk discussions. Bajaj argues this scenario, while theoretically possible, rarely matches reality. “In practice, ‘going down’ usually does not mean an entire provider is offline across all models and regions,” he explained. “What happens far more often is a slowdown or disruption in a specific model or region because of traffic spikes or capacity issues.”

When that occurs, failover can happen at multiple levels — from on-premise to cloud, cloud to on-premise, one region to another, one model to another, or even within the same provider before switching providers entirely. “That alone makes it very unlikely that everything fails at once,” Bajaj said. “The key point is that reliability is built on layers of redundancy. The more providers, regions, and models that are included in the guardrails, the smaller the chance that users experience a complete outage.”

A startup that built its platform inside Fortune 500 AI deployments

TrueFoundry has established itself as infrastructure for some of the world’s largest AI deployments, providing crucial context for its failover ambitions. The company raised $19 million in Series A funding in February 2025, led by Intel Capital with participation from Eniac Ventures, Peak XV Partners, and Jump Capital. Angel investors including Gokul Rajaram and Mohit Aron also joined the round, bringing total funding to $21 million.

The San Francisco-based company was founded in 2021 by Bajaj and co-founders Abhishek Choudhary and Anuraag Gutgutia, all former Meta engineers who met as classmates at IIT Kharagpur. Initially focused on accelerating machine learning deployments, TrueFoundry pivoted to support generative AI capabilities as the technology went mainstream in 2023.

The company’s customer roster demonstrates enterprise-scale adoption that few AI infrastructure startups can match. Nvidia employs TrueFoundry to build multi-agent systems that optimize GPU cluster utilization across data centers worldwide — a use case where even small improvements in utilization translate into substantial business impact given the insatiable demand for GPU capacity. Adopt AI routes more than 15 million requests and 40 billion input tokens through TrueFoundry’s AI Gateway to power its enterprise agentic workflows.

Gaming company Games 24×7 serves machine learning models to more than 100 million users through the platform at scales exceeding 200 requests per second. Digital adoption platform Whatfix migrated to a microservices architecture on TrueFoundry, reducing its release cycle sixfold and cutting testing time by 40 percent.

TrueFoundry currently reports more than 30 paid customers worldwide and has indicated it exceeded $1.5 million in annual recurring revenue last year while quadrupling its customer base. The company manages more than 1,000 clusters for machine learning workloads across its client base.

TrueFailover will be offered as an add-on module on top of the existing TrueFoundry AI Gateway and platform, with pricing following a usage-based model tied to traffic volume along with the number of users, models, providers, and regions involved. An early access program for design partners opens in the coming weeks.

Why traditional cloud uptime guarantees may never apply to AI providers

Enterprise technology buyers have long demanded uptime commitments from infrastructure providers. Amazon Web Services, Microsoft Azure, and Google Cloud all offer service-level agreements with financial penalties for failures. Will AI providers eventually face similar expectations?

Bajaj sees fundamental constraints that make traditional SLAs difficult to achieve in the current generation of AI infrastructure. “Most foundational LLMs today operate as shared resources, which is what enables the standard pricing you see publicly advertised,” he explained. “Providers do offer higher uptime commitments, but that usually means dedicated capacity or reserved infrastructure, and the cost increases significantly.”

Even with substantial budgets, enterprises face usage quotas that create unexpected exposure. “If traffic spikes beyond those limits, requests can still spill back into shared infrastructure,” Bajaj said. “That makes it hard to achieve the kind of hard guarantees enterprises are used to with cloud providers.”

The economics of running large language models create additional barriers that may persist for years. “LLMs are still extremely complex and expensive to run. They require massive infrastructure and energy, and we do not expect a near-term future where most companies run multiple, fully dedicated model instances just to guarantee uptime.”

This reality drives demand for solutions like TrueFailover that provide resilience regardless of what individual providers can promise. “Enterprises are realizing that reliability cannot come from the model provider alone,” Bajaj said. “It requires additional layers of protection to handle the realities of how these systems operate today.”

The new calculus for companies that built AI into critical business processes

The timing of TrueFoundry’s announcement reflects a fundamental shift in how enterprises use AI — and what they stand to lose when it fails. What began as internal experimentation has evolved into customer-facing applications where disruptions directly affect revenue and reputation.

“Many enterprises experimented with Gen AI and agentic systems in the past, and production use cases were largely internal-facing,” Bajaj observed. “There was no immediate impact on their top line or the public perception of the enterprise.”

That era has ended. “Now that these enterprises have launched public-facing applications, where both the top line and public perception can be impacted if an outage occurs, the stakes are much higher than they were even six months ago. That’s why we are seeing more and more attention on this now.”

For companies that have woven AI into critical business processes — from prescription refills to customer support to sales operations — the calculus has changed entirely. The question is no longer which model performs best on benchmarks or which provider offers the most compelling features. The question that now keeps technology leaders awake is far simpler and far more urgent: what happens when the AI disappears at the worst possible moment?

Somewhere, a pharmacist is filling a prescription. A customer support agent is resolving a complaint. A sales team is generating a proposal for a deal that closes tomorrow. All of them depend on AI systems that depend on providers that, despite their scale and sophistication, still go dark without warning.

TrueFoundry is betting that enterprises will pay handsomely to ensure those moments of darkness never reach the people who matter most — their customers.