Is agentic AI ready to reshape Global Business Services?

Presented by EdgeVerve

Before addressing Global Business Services (GBS), let’s take a step back. Can agentic AI, the type of AI able to take goal-driven action, transform not just GBS but any kind of enterprise? And has it done so yet? As with many new …

What AI builders can learn from fraud models that run in 300 milliseconds

Fraud protection is a race against scale. 

For instance, Mastercard’s network processes roughly 160 billion transactions a year and experiences surges of 70,000 transactions a second during peak periods (like the December holiday rush). Finding the fraudulent purchases among those, without chasing false alarms, is an enormously difficult task, which is why fraudsters have been able to game the system. 

But now, sophisticated AI models can probe down to individual transactions, pinpointing the ones that seem suspicious — in milliseconds’ time. This is the heart of Mastercard’s flagship fraud platform, Decision Intelligence Pro (DI Pro). 

“DI Pro is specifically looking at each transaction and the risk associated with it,” Johan Gerber, Mastercard’s EVP of security solutions, said in a recent VB Beyond the Pilot podcast. “The fundamental problem we’re trying to solve here is assessing in real time.”

How DI Pro works

Mastercard’s DI Pro was built for speed. From the moment a consumer taps a card or clicks “buy,” that transaction flows through Mastercard’s orchestration layer, back onto the network, and then on to the issuing bank. Typically, this entire round trip takes less than 300 milliseconds. 

Ultimately, the bank makes the approve-or-decline decision, but the quality of that decision depends on Mastercard’s ability to deliver a precise, contextualized risk score indicating how likely the transaction is to be fraudulent. Complicating the process is the fact that the models aren’t looking for anomalies, per se; they’re looking for transactions that fraudsters have deliberately designed to resemble normal consumer behavior. 
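To make the latency constraint concrete, here is a minimal sketch of a risk scorer that tracks how much of a 300-millisecond budget a single scoring call consumes. The feature names, weights, and thresholds are invented for illustration; this is not Mastercard's model, only the shape of the problem.

```python
import time

# Illustrative only: a toy risk scorer operating under a hard latency
# budget, echoing the sub-300 ms window described in the article.
# Feature names and weights are invented for this sketch.

BUDGET_MS = 300  # end-to-end budget described in the article

def score_transaction(features: dict, budget_ms: float = BUDGET_MS) -> dict:
    """Return a risk score plus whether the call stayed inside the budget."""
    start = time.perf_counter()
    # Stand-in for a model call: a weighted sum of hypothetical features.
    weights = {"amount_zscore": 0.5, "new_merchant": 0.3, "geo_mismatch": 0.2}
    score = sum(weights[k] * features.get(k, 0.0) for k in weights)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {
        "risk_score": round(min(max(score, 0.0), 1.0), 3),
        "within_budget": elapsed_ms < budget_ms,
    }

result = score_transaction(
    {"amount_zscore": 0.8, "new_merchant": 1.0, "geo_mismatch": 0.0}
)
```

In a real system the "model call" would dominate the budget, which is why the article stresses that the scoring architecture, not just the model, is engineered around latency.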

At the core of DI Pro is a recurrent neural network (RNN) that Mastercard refers to as an “inverse recommender” architecture. This treats fraud detection as a recommendation problem; the RNN performs a pattern completion exercise to identify how merchants relate to one another. 
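The "inverse recommender" framing can be illustrated with a toy: instead of suggesting merchants a cardholder might visit next, ask how plausible the current merchant is given their history. Here simple co-occurrence counts stand in for the RNN, and all histories are invented; the point is only the direction of the question.

```python
from collections import defaultdict

# Toy "inverse recommender": score how well a candidate merchant fits a
# cardholder's history. Co-occurrence counts stand in for the RNN; all
# data below is invented for illustration.

co_visits = defaultdict(int)
histories = [
    ["grocer", "gas", "pharmacy"],
    ["grocer", "gas", "coffee"],
    ["grocer", "pharmacy", "coffee"],
]
for h in histories:
    for a in h:
        for b in h:
            if a != b:
                co_visits[(a, b)] += 1  # merchants seen in the same history

def plausibility(user_history: list, candidate: str) -> float:
    """Average co-occurrence between the candidate and the user's history."""
    if not user_history:
        return 0.0
    return sum(co_visits[(m, candidate)] for m in user_history) / len(user_history)

# A merchant that never co-occurs with this user's pattern scores zero,
# which is exactly the "we would not have recommended this" signal.
familiar = plausibility(["grocer", "gas"], "coffee")
unfamiliar = plausibility(["grocer", "gas"], "casino")
```

A low plausibility score does not prove fraud on its own; in the production system it becomes one input to the contextual risk score the bank receives.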

As Gerber explained: “Here’s where they’ve been before, here’s where they are right now. Does this make sense for them? Would we have recommended this merchant to them?” 

Chris Merz, SVP of data science at Mastercard, explained that the fraud problem can be broken into two subcomponents: a user’s pattern of behavior and a fraudster’s pattern of behavior. “And we’re trying to tease those two things out,” he said. 

Another “neat technique,” he said, is how Mastercard approaches data sovereignty, the principle that data is subject to the laws and governance structures of the region where it is collected, processed, or stored. To keep data “on soil,” the company’s fraud team relies on aggregated, “completely anonymized” data that is not sensitive to any privacy concerns and thus can be shared with models globally. 
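A minimal sketch of the "on soil" idea: each region computes aggregate, anonymized statistics locally and shares only those aggregates, never raw transactions. The regions, amounts, and choice of statistics below are invented; Mastercard's actual aggregation pipeline is not described in detail in the article.

```python
# Sketch only: raw transaction amounts never leave their region; only
# anonymized aggregates feed the global view. All numbers are invented.

regional_txn_amounts = {
    "eu": [12.0, 55.0, 9.5],
    "us": [30.0, 18.0, 200.0, 7.0],
}

def local_aggregate(amounts: list) -> dict:
    """Only the count and mean leave the region; raw amounts stay local."""
    return {"count": len(amounts), "mean": sum(amounts) / len(amounts)}

# The globally shared view contains no individual transactions.
global_view = {region: local_aggregate(a) for region, a in regional_txn_amounts.items()}
```

This is how global patterns can still "influence every local decision," as Gerber puts it, without raw data crossing borders.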

“So you still can have the global patterns influencing every local decision,” said Gerber. “We take a year’s worth of knowledge and squeeze it into a single transaction in 50 milliseconds to say yes or no, this is good or this is bad.”

Scamming the scammers

While AI is helping financial companies like Mastercard, it’s helping fraudsters, too; now, they’re able to rapidly develop new techniques and identify new avenues to exploit.  

Mastercard is fighting back by engaging cybercriminals on their turf. One way it does so is with “honeypots”: artificial environments designed to trap cybercriminals. When threat actors think they’ve found a legitimate mark, AI agents engage with them in hopes of accessing the mule accounts used to funnel money. That becomes “extremely powerful,” Gerber said, because defenders can apply graph techniques to determine how and where mule accounts are connected to legitimate accounts. 

In the end, to get their payout, scammers need a legitimate account somewhere, linked to their mule accounts, even if it’s cloaked 10 layers down. When defenders can identify these accounts, they can map global fraud networks.
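The graph technique Gerber describes can be sketched as a simple traversal: start from a flagged mule account and walk money-movement edges until a legitimate payout account appears, however many hops away. The accounts and edges below are invented; real systems would run far richer graph analytics over billions of edges.

```python
from collections import deque

# Toy account graph: a flagged mule chains through two more mules
# before reaching the legitimate payout account. All data is invented.

edges = {
    "mule_1": ["mule_2"],
    "mule_2": ["mule_3"],
    "mule_3": ["acct_legit_9"],  # payout account, three hops down
    "acct_legit_9": [],
}
legit_accounts = {"acct_legit_9"}

def find_payout_accounts(start: str) -> list:
    """Breadth-first search from a flagged mule to any connected legit account."""
    seen, found, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        if node in legit_accounts:
            found.append(node)
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found
```

Once the payout account is identified, every mule on the path between it and the honeypot becomes part of the mapped fraud network.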

“It’s a wonderful thing when we take the fight to them, because they cause us enough pain as it is,” Gerber said. 

Listen to the podcast to learn more about: 

  • How Mastercard created a “malware sandbox” with Recorded Future; 

  • Why a data science engineering requirements document (DSERD) was essential to align four separate engineering teams;

  • The importance of “relentless prioritization” and tough decision-making to move beyond “a thousand flowers blooming” to projects that actually have a strong business impact;

  • Why successful AI deployment should incorporate three phases: ideation, activation, and implementation — but many enterprises skip the second step. 

Listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.

The missing layer between agent connectivity and true collaboration

Today’s AI challenge is about agent coordination, context, and collaboration. How do you enable agents to truly think together, with all the contextual understanding, negotiation, and shared purpose that entails? It’s a critical next step toward a new kind of distributed intelligence that keeps humans firmly in the loop.

At the latest stop on VentureBeat’s AI Impact Series, Vijoy Pandey, SVP and GM of Outshift by Cisco, and Noah Goodman, Stanford professor and co-founder of Humans&, sat down to talk about how to move beyond agents that just connect to agents that are steeped in collective intelligence.

The need for collective intelligence, not coordinated actions

The core challenge, Pandey said, is that “agents today can connect together, but they can’t really think together.”

While protocols like MCP and A2A have solved basic connectivity, and AGNTCY tackles problems from discovery and identity management to inter-agent communication and observability, they’ve only addressed the equivalent of making a phone call between two people who don’t speak the same language. Pandey’s team has identified something deeper than technical plumbing: the need for agents to achieve collective intelligence, not just coordinated actions.

How shared intent and shared knowledge enable collective innovation

To understand where multi-agent AI needs to go, both speakers pointed to the history of human intelligence. While humans became individually intelligent roughly 300,000 years ago, true collective intelligence didn’t emerge until around 70,000 years ago with the advent of sophisticated language.

This breakthrough enabled three critical capabilities: shared intent, shared knowledge, and collective innovation.

“Once you have a shared intent, a shared goal, you have a body of knowledge that you can modify, evolve, build upon, you can then go towards collective innovation,” Pandey said.

Goodman, whose work bridges computer science and psychology, explained that language is far more than just encoding and decoding information.

“Language is this kind of encoding that requires understanding the context, the intention of the speaker, the world, how that affects what people will say in order to figure out what people mean,” he said.

This sophisticated understanding is what scaffolds human collaboration and cumulative cultural evolution, and it’s what is currently missing from agent-to-agent interaction.

Addressing the gaps with the Internet of Cognition

“We have to mimic human evolution,” Pandey explained. “In addition to agents getting smarter and smarter, just like individual humans, we need to build infrastructure that enables collective innovation, which implies sharing intent, coordination, and then sharing knowledge or context and evolving that context.”

Pandey calls it the Internet of Cognition: a three-layer architecture designed to enable collective thinking among heterogeneous agents:

Protocol layer: Beyond basic connectivity, these protocols enable understanding, handling intent sharing, coordination, negotiation, and discovery between agents from different vendors and organizations.

Fabric layer: A shared memory system that allows agents to build and evolve collective context, with emergent properties arising from their interactions.

Cognition engine layer: Accelerators and guardrails that help agents think faster while operating within necessary constraints around compliance, security, and cost.
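The article names the three layers but no concrete APIs, so the sketch below is purely illustrative: every class, method, and field is invented to make the separation of concerns tangible, assuming the responsibilities described above.

```python
from dataclasses import dataclass, field

# Hypothetical types only: the Internet of Cognition's three layers,
# as described in the article, rendered as a minimal object model.

@dataclass
class ProtocolLayer:
    """Intent sharing, negotiation, and discovery between agents."""
    def share_intent(self, agent: str, intent: str) -> dict:
        return {"agent": agent, "intent": intent}

@dataclass
class FabricLayer:
    """Shared memory that heterogeneous agents build and evolve."""
    memory: dict = field(default_factory=dict)
    def write(self, key: str, value: str, author: str) -> None:
        self.memory[key] = {"value": value, "author": author}

@dataclass
class CognitionEngine:
    """Guardrails: only vetted agents may evolve the shared context."""
    allowed_authors: set = field(default_factory=set)
    def permitted(self, author: str) -> bool:
        return author in self.allowed_authors

# Wiring the layers: an agent declares intent, clears the guardrail,
# then contributes to the shared fabric.
protocol, fabric, engine = ProtocolLayer(), FabricLayer(), CognitionEngine({"agent_a"})
msg = protocol.share_intent("agent_a", "summarize_q3_results")
if engine.permitted(msg["agent"]):
    fabric.write("q3_summary", "draft_v1", author=msg["agent"])
```

The interesting engineering lives in the fabric layer: as Pandey notes, the hard problem is evolving that shared memory across agents from different parties, not merely storing it.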

The difficulty is that organizations need to build collective intelligence across organizational boundaries.

“Think about shared memory in a heterogeneous way,” Pandey said. “We have agents from different parties coming together. So how do you evolve that memory and have emergent properties?”

New foundation training protocols to advance agent connection

At Humans&, rather than relying solely on additional protocols, Goodman’s team is fundamentally changing how foundation models are trained, centering training on interactions not only between a human and an agent, but between a human and multiple agents, and especially between an agent and multiple humans.

“By changing the training that we give to the foundation models and centering the training over extremely long horizon interactions, they’ll come to understand how interactions should proceed in order to achieve the right long-term outcomes,” he said.

And, he adds, it’s a deliberate divergence from the longer-autonomy path pursued by many large labs.

“Our goal is not longer and longer autonomy. It’s better and better collaboration,” he said. “Humans& is building agents with deep social understanding: entities that know who knows what, can foster collaboration, and put the right specialists in touch at the right time.”

Establishing guardrails that support cognition

Guardrails remain a central challenge in deploying multi-functional agents that touch every part of an organization’s systems. The question is how to enforce boundaries without stifling innovation. Organizations need strict, rule-like guardrails, but humans don’t actually work that way. Instead, people operate on a principle of minimal harm: thinking ahead about consequences and making contextual judgments.

“How do we provide the guardrails in a way which is rule-like, but also supports the outcome-based cognition when the models get smart enough for that?” Goodman asked.

Pandey extended this thinking to the reality of innovation teams that need to apply the rules with judgment, not just follow them mechanically. Figuring out what’s open to interpretation is a “very collaborative task,” he said. “And you don’t figure that out through a set of predicates. You don’t figure that out through a document. You figure that out through common understanding and grounding and discovery and negotiation.”

Distributed intelligence: the path to superintelligence

True superintelligence won’t come from increasingly powerful individual models, but from distributed systems.

“While we build better and better models, and better and better agents, eventually we feel that true superintelligence will happen through distributed systems,” Pandey said.

Intelligence will scale along two axes: vertical (better individual agents) and horizontal (more collaborative networks), in a manner very similar to traditional distributed computing.

However, said Goodman, “We can’t move towards a future where the AIs go off and work by themselves. We have to move towards a future where there’s an integrated ecosystem, a distributed ecosystem that seamlessly merges humans and AI together.”

What the OpenClaw moment means for enterprises: 5 big takeaways

The “OpenClaw moment” represents the first time autonomous AI agents have successfully “escaped the lab” and moved into the hands of the general workforce.

Originally developed by Austrian engineer Peter Steinberger as a hobby project called “Clawdbot” in November 2025, the framework went through a rapid branding evolution to “Moltbot” before settling on “OpenClaw” in late January 2026.

Unlike previous chatbots, OpenClaw is designed with “hands”—the ability to execute shell commands, manage local files, and navigate messaging platforms like WhatsApp and Slack with persistent, root-level permissions.

This capability — and the uptake of what was then called Moltbot by many AI power users on X — directly led another entrepreneur, Matt Schlicht, to develop Moltbook, a social network where thousands of OpenClaw-powered agents autonomously sign up and interact.

The result has been a series of bizarre, unverified reports that have set the tech world ablaze: agents reportedly forming digital “religions” like Crustafarianism, hiring human micro-workers for digital tasks on another website, “Rentahuman,” and in some extreme unverified cases, attempting to lock their own human creators out of their credentials.

For IT leaders, the timing is critical. This week, the release of Claude Opus 4.6 and OpenAI’s Frontier agent creation platform signaled that the industry is moving from single agents to “agent teams.”

Simultaneously, the “SaaSpocalypse,” a massive market correction that wiped over $800 billion from software valuations, has proven that the traditional seat-based licensing model is under existential threat.

So how should enterprise technical decision-makers think through this fast-moving start to the year, and how can they start to understand what OpenClaw means for their businesses? I spoke to a small group of leaders at the forefront of enterprise AI adoption this week to get their thoughts. Here’s what I learned:

1. The death of over-engineering: productive AI works on “garbage” data

The prevailing wisdom once suggested that enterprises needed massive infrastructure overhauls and perfectly curated data sets before AI could be useful. The OpenClaw moment has shattered that myth, proving that modern models can navigate messy, uncurated data by treating “intelligence as a service.”

“The first takeaway is the amount of preparation that we need to do to make AI productive,” says Tanmai Gopal, Co-founder & CEO at PromptQL, a well-funded enterprise data engineering and consulting firm. “There is a surprising insight there: you actually don’t need to do too much preparation. Everybody thought we needed new software and new AI-native companies to come and do things. It will catalyze more disruption as leadership realizes that we don’t actually need to prep so much to get AI to be productive. We need to prep in different ways. You can just let it be and say, ‘go read all of this context and explore all of this data and tell me where there are dragons or flaws.'”

“The data is already there,” agreed Rajiv Dattani, co-founder of AIUC (the AI Underwriting Corporation), which has developed the AIUC-1 standard for AI agents as part of a consortium with leaders from Anthropic, Google, Cisco, Stanford and MIT. “But the compliance and the safeguards, and most importantly, the institutional trust is not. How can you ensure your agentic systems don’t go off and go full MechaHitler and start offending people or causing problems?”

That’s why Dattani’s company, AIUC, provides a certification standard, AIUC-1, that enterprises can put agents through in order to obtain insurance that backs them up in the event those agents do cause problems. Without putting OpenClaw or similar agents through such a process, enterprises are less ready to accept the consequences and costs of autonomy gone awry.

2. The rise of the “secret cyborgs”: shadow IT is the new normal

With OpenClaw amassing over 160,000 GitHub stars, employees are deploying local agents through the back door to stay productive.

This creates a “Shadow IT” crisis where agents often run with full user-level permissions, potentially creating backdoors into corporate systems (as Wharton School of Business Professor Ethan Mollick has written, many employees are secretly adopting AI to get ahead at work and obtain more leisure time, without informing superiors or the organization).

Now, executives are observing this trend in real time as employees deploy OpenClaw on work machines without authorization.

“It’s not an isolated, rare thing; it’s happening across almost every organization,” warns Pukar Hamal, CEO & Founder of enterprise AI security diligence firm SecurityPal. “There are companies finding engineers who have given OpenClaw access to their devices. In larger enterprises, you’re going to notice that you’ve given root-level access to your machine. People want tools so tools can do their jobs, but enterprises are concerned.”

Brianne Kimmel, Founder & Managing Partner of venture capital firm Worklife Ventures, views this through a talent-retention lens. “People are trying these on evenings and weekends, and it’s hard for companies to ensure employees aren’t trying the latest technologies. From my perspective, we’ve seen how that really allows teams to stay sharp. I have always erred on the side of encouraging, especially early-career folks, to try all of the latest tools.”

3. The collapse of seat-based pricing as a viable business model

The 2026 “SaaSpocalypse” saw massive value erased from software indices as investors realized agents could replace human headcount.

If an autonomous agent can perform the work of dozens of human users, the traditional “per-seat” business model becomes a liability for legacy vendors.

“If you have AI that can log into a product and do all the work, why do you need 1,000 users at your company to have access to that tool?” Hamal asks. “Anyone that does user-based pricing—it’s probably a real concern. That’s probably what you’re seeing with the decay in SaaS valuations, because anybody that is indexed to users or discrete units of ‘jobs to be done’ needs to rethink their business model.”

4. Transitioning to an “AI coworker” model

The release of Claude Opus 4.6 and OpenAI’s Frontier this week already signals a shift from single agents to coordinated “agent teams.”

In this environment, the volume of AI-generated code and content is so high that traditional human-led review is no longer physically possible.

“Our senior engineers just cannot keep up with the volume of code being generated; they can’t do code reviews anymore,” Gopal notes. “Now we have an entirely different product development lifecycle where everyone needs to be trained to be a product person. Instead of doing code reviews, you work on a code review agent that people maintain. You’re looking at software that was 100% vibe-coded… it’s glitchy, it’s not perfect, but dude, it works.”

“The productivity increases are impressive,” Dattani concurred. “It’s clear that we are at the onset of a major shift in business globally, but each business will need to approach that slightly differently depending on their specific data security and safety requirements. Remember that even while you’re trying to outdo your competition, they are bound by the same rules and regulations as you — and it’s worth it to take time to get it right, start small, don’t try to do too much at once.”

5. Future outlook: voice interfaces, personality, and global scaling

The experts I spoke to all see a future where “vibe working” becomes the norm.

Local, personality-driven AI, including OpenClaw agents powered by voice interfaces like Wispr or ElevenLabs, will become the primary interface for work, while agents handle the heavy lifting of international expansion.

“Voice is the primary interface for AI; it keeps people off their phones and improves quality of life,” says Kimmel. “The more you can give AI a personality that you’ve uniquely designed, the better the experience. Previously, you’d need to hire a GM in a new country and build a translation team. Now, companies can think international from day one with a localized lens.”

Hamal adds a broader perspective on the global stakes: “We have knowledge worker AGI. It’s proven it can be done. Security is a concern that will rate-limit enterprise adoption, which means they’re more vulnerable to disruption from the low end of the market who don’t have the same concerns.”

Best practices for enterprise leaders seeking to embrace agentic AI capabilities at work

As OpenClaw and similar autonomous frameworks proliferate, IT departments must move beyond blanket bans toward structured governance. Use the following checklist to manage the “Agentic Wave” safely:

  • Implement Identity-Based Governance: Every agent must have a strong, attributable identity tied to a human owner or team. Use frameworks like IBC (Identity, Boundaries, Context) to track who an agent is and what it is allowed to do at any moment.

  • Enforce Sandbox Requirements: Prohibit OpenClaw from running on systems with access to live production data. All experimentation should occur in isolated, purpose-built sandboxes on segregated hardware.

  • Audit Third-Party “Skills”: Recent reports indicate nearly 20% of skills in the ClawHub registry contain vulnerabilities or malicious code. Mandate a “white-list only” policy for approved agent plugins.

  • Disable Unauthenticated Gateways: Early versions of OpenClaw allowed “none” as an authentication mode. Ensure all instances are updated to current versions where strong authentication is mandatory and enforced by default.

  • Monitor for “Shadow Agents”: Use endpoint detection tools to scan for unauthorized OpenClaw installations or abnormal API traffic to external LLM providers.

  • Update AI Policy for Autonomy: Standard Generative AI policies often fail to address “agents.” Update policies to explicitly define human-in-the-loop requirements for high-risk actions like financial transfers or file system modifications.
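Two of the checklist items above, identity-based governance and human-in-the-loop requirements, can be combined into a single policy gate. The sketch below is a hypothetical illustration: the action names, the high-risk list, and the function signature are all invented, and a real deployment would load these from governed policy configuration.

```python
from typing import Optional

# Hypothetical policy gate: unidentified agents are blocked outright;
# identified agents need human approval only for high-risk actions.
# Action names and the risk list are invented for this sketch.

HIGH_RISK_ACTIONS = {"financial_transfer", "file_system_modify", "credential_change"}

def requires_human_approval(action: str, agent_identity: Optional[str]) -> bool:
    """Apply identity-based governance before the risk-tier check."""
    if agent_identity is None:  # no attributable identity: always escalate
        return True
    return action in HIGH_RISK_ACTIONS
```

The ordering matters: the identity check runs first, so an unattributed agent can never reach even the low-risk path.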

OpenAI’s GPT-5.3-Codex drops as Anthropic upgrades Claude — AI coding wars heat up ahead of Super Bowl ads

OpenAI on Wednesday released GPT-5.3-Codex, which the company calls its most capable coding agent to date, in an announcement timed to land at the exact same moment Anthropic unveiled its own flagship model upgrade, Claude Opus 4.6. The synchronized launches mark the opening salvo in what industry observers are calling the AI coding wars — a high-stakes battle to capture the enterprise software development market.

The dueling announcements came amid an already heated week between the two AI giants, who are also set to air competing Super Bowl advertisements on Sunday, and whose executives have been trading barbs publicly over business models, access, and corporate ethics.

“I love building with this model; it feels like more of a step forward than the benchmarks suggest,” OpenAI CEO Sam Altman wrote on X minutes after the launch. He later added: “It was amazing to watch how much faster we were able to ship 5.3-Codex by using 5.3-Codex, and for sure this is a sign of things to come.”

That claim — that the model helped build itself — is a significant milestone in AI development. According to OpenAI’s announcement, the Codex team used early versions of GPT-5.3-Codex to debug its own training runs, manage deployment infrastructure, and diagnose test results and evaluations. The company describes it as “our first model that was instrumental in creating itself.”

OpenAI’s new coding model posts record-breaking benchmark scores, outpacing Anthropic’s Claude by double digits

The new model posts substantial gains across multiple industry benchmarks. GPT-5.3-Codex achieves 57% on SWE-Bench Pro, a rigorous evaluation of real-world software engineering that spans four programming languages and tests contamination-resistant, industrially relevant challenges. It scores 77.3% on Terminal-Bench 2.0, which measures the terminal skills essential for coding agents, and 64% on OSWorld, an agentic computer-use benchmark where models must complete productivity tasks in visual desktop environments.

The Terminal-Bench 2.0 result is particularly striking. According to performance data released Wednesday, GPT-5.3-Codex scored 77.3% compared to GPT-5.2-Codex’s 64.0% and the base GPT-5.2 model’s 62.2% — a 13-percentage-point leap in a single generation. One user on X noted that the score “absolutely demolished” Anthropic’s Opus 4.6, which reportedly achieved 65.4% on the same benchmark.

OpenAI also claims the model accomplishes these results with dramatically improved efficiency: less than half the tokens of its predecessor for equivalent tasks, plus more than 25% faster inference per token.
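Read literally, those two claims compound. A quick back-of-envelope pass, under the assumption that "25% faster inference per token" means 1.25x per-token throughput (one plausible reading of the claim):

```python
# Back-of-envelope only; both interpretations below are assumptions,
# not figures OpenAI published directly.
tokens_ratio = 0.5               # "less than half the tokens" read as 0.5x
per_token_time_ratio = 1 / 1.25  # "25% faster per token" read as 1.25x throughput
total_time_ratio = tokens_ratio * per_token_time_ratio

# Under this reading, equivalent tasks finish in roughly 40% of the
# predecessor's wall-clock time.
```

The actual speedup users see would depend on how much of a session is inference versus tool execution.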

“Notably, GPT-5.3-Codex does so with fewer tokens than any prior model, letting users simply build more,” the company stated in its announcement.

From coding assistant to computer operator: GPT-5.3-Codex aims to automate the entire software development lifecycle

Perhaps more significant than the benchmark improvements is OpenAI’s positioning of GPT-5.3-Codex as a model that transcends pure coding. The company explicitly states that “Codex goes from an agent that can write and review code to an agent that can do nearly anything developers and professionals can do on a computer.”

This expanded capability set includes debugging, deploying, monitoring, writing product requirement documents, editing copy, conducting user research, building slide decks, and analyzing data in spreadsheet applications. The model shows strong performance on GDPVal, an OpenAI evaluation released in 2025 that measures performance on well-specified knowledge-work tasks across 44 occupations.

The expansion signals OpenAI’s ambition to capture not just the developer tools market but the broader enterprise productivity software space — a market that includes established players like Microsoft, Salesforce, and ServiceNow, all of whom are racing to embed AI agents into their platforms.

OpenAI’s first ‘high capability’ cybersecurity model prompts new safety protocols and a $10 million defense fund

The pivot toward general-purpose computing brings new security considerations. In a notable disclosure, OpenAI revealed that GPT-5.3-Codex is the first model it classifies as “High capability” for cybersecurity-related tasks under its Preparedness Framework, and the first directly trained to identify software vulnerabilities.

“While we don’t have definitive evidence it can automate cyber attacks end-to-end, we’re taking a precautionary approach and deploying our most comprehensive cybersecurity safety stack to date,” the company stated. Mitigations include dual-use safety training, automated monitoring, trusted access for advanced capabilities, and enforcement pipelines incorporating threat intelligence.

Altman highlighted this development on X: “This is our first model that hits ‘high’ for cybersecurity on our preparedness framework. We are piloting a Trusted Access framework, and committing $10 million in API credits to accelerate cyber defense.”

The company is also expanding the private beta of Aardvark, its security research agent, and partnering with open-source maintainers to provide free codebase scanning for widely used projects. OpenAI cited Next.js as an example where a security researcher used Codex to discover vulnerabilities disclosed last week.

Super Bowl showdown: Sam Altman calls Anthropic’s advertising campaign ‘clearly dishonest’ as rivalry turns personal

The cybersecurity announcement, however, has been overshadowed by the increasingly personal nature of the OpenAI-Anthropic rivalry. The timing of Wednesday’s release cannot be understood without the context of OpenAI’s intensifying competition with Anthropic, the AI safety-focused startup founded in 2021 by former OpenAI researchers, including Dario and Daniela Amodei.

Both companies scheduled major product announcements for 10 a.m. Pacific Time today. Anthropic unveiled Claude Opus 4.6, which it describes as its “smartest model” that “plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes.”

The head-to-head timing follows a week of escalating tensions. Anthropic announced it will air Super Bowl advertisements mocking OpenAI’s recent decision to begin testing ads within ChatGPT for free users. 

Altman responded with unusual directness, calling the advertisements “funny” but “clearly dishonest” in an extensive X post.

“We would obviously never run ads in the way Anthropic depicts them. We are not stupid and we know our users would reject that,” Altman wrote. “I guess it’s on brand for Anthropic doublespeak to use a deceptive ad to critique theoretical deceptive ads that aren’t real, but a Super Bowl ad is not where I would expect it.”

He went further, characterizing Anthropic as an “authoritarian company” that “wants to control what people do with AI.”

“Anthropic serves an expensive product to rich people,” Altman wrote. “More Texans use ChatGPT for free than total people use Claude in the US, so we have a differently-shaped problem than they do.”

Enterprise AI spending surges past projections as OpenAI’s market share faces pressure from Anthropic and Google

The public sparring masks a deadly serious business competition. The rivalry plays out against a backdrop of explosive enterprise AI adoption, where both companies are fighting for position in a rapidly expanding market.

According to survey data from Andreessen Horowitz released this week, enterprise spending on large language models has dramatically outpaced even bullish projections. Average enterprise LLM spending reached $7 million in 2025, 180% higher than 2024’s actual spending of $2.5 million — and 56% above what enterprises had projected for 2025 just a year earlier. Spending is projected to reach $11.6 million per enterprise in 2026, a further 65% increase.
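The growth percentages quoted above are consistent with the dollar figures; a quick sanity pass on the reported numbers:

```python
# Sanity check on the a16z figures as reported in the article.
spend_2024, spend_2025, spend_2026 = 2.5, 7.0, 11.6  # $M per enterprise

growth_2025 = (spend_2025 - spend_2024) / spend_2024 * 100  # 180% jump
growth_2026 = (spend_2026 - spend_2025) / spend_2025 * 100  # ~65.7%, reported as 65%
```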

The a16z data reveals shifting market dynamics that help explain the intensity of the competition. OpenAI maintains the largest average share of enterprise AI wallet, but that share is shrinking — from 62% in 2024 to a projected 53% in 2026. Anthropic’s share, meanwhile, has grown from 14% to a projected 18% over the same period, with Google showing similar gains.

Enterprise adoption patterns tell a more nuanced story. While OpenAI leads in overall usage, only 46% of surveyed OpenAI customers are using its most capable models in production, compared to 75% for Anthropic and 76% for Google. When including testing environments, 89% of Anthropic customers are testing or using the company’s most capable models — the highest rate among major providers.

For software development specifically — one of the primary use cases for both companies’ coding agents — the a16z survey shows OpenAI with approximately 35% market share, with Anthropic claiming a substantial and growing portion of the remainder.

Both AI labs race to become the enterprise operating system of choice, moving beyond models to full-stack platforms

These market dynamics explain why both companies are positioning themselves as platforms rather than mere model providers. OpenAI on Wednesday also launched Frontier, a new platform designed to serve as a comprehensive hub for businesses adopting a range of AI tools — including those developed by third parties — that can operate together seamlessly.

“We can be the partner of choice for AI transformation for enterprise. The sky is the limit in terms of revenue we can generate from a platform like that,” Fidji Simo, OpenAI’s CEO of applications, told reporters this week.

This follows Monday’s launch of the Codex desktop application for macOS, which OpenAI says has already surpassed 500,000 downloads. The app enables users to manage multiple AI coding agents simultaneously — a capability that becomes increasingly important as enterprises deploy agents for complex, long-running tasks.

Trillion-dollar compute obligations and $350 billion valuations reveal the massive financial stakes driving the AI coding race

The platform ambitions require extraordinary capital. The dueling launches underscore the staggering financial requirements of frontier AI development, with both companies burning through billions while racing to establish market dominance.

Anthropic is currently in discussions for a funding round that could bring in more than $20 billion at a valuation of at least $350 billion, according to Bloomberg, and is simultaneously planning an employee tender offer at that valuation.

OpenAI, meanwhile, has disclosed that it owes more than $1 trillion in financial obligations to backers — including Oracle, Microsoft, and Nvidia — that are essentially fronting compute costs in expectation of future returns.

GPT-5.3-Codex was “co-designed for, trained with, and served on NVIDIA GB200 NVL72 systems,” according to OpenAI’s announcement — a reference to Nvidia’s latest Blackwell-generation AI supercomputing architecture.

The financial pressure adds urgency to both companies’ enterprise strategies. Unlike established tech giants with diversified revenue streams, both Anthropic and OpenAI must prove they can generate sufficient revenue from AI products to justify their extraordinary valuations and infrastructure costs.

OpenAI promises more Codex features in coming weeks as 500,000 users download the new desktop app

Looking ahead, OpenAI says GPT-5.3-Codex is available immediately for paid ChatGPT users across all Codex surfaces: the desktop app, command-line interface, IDE extensions, and web interface. API access is expected to follow.

The model includes a new interactivity feature: users can choose between “pragmatic” or “friendly” personalities — a customization Altman suggests users feel strongly about. More substantively, the model provides frequent progress updates during tasks, allowing users to interact in real time, ask questions, discuss approaches, and steer toward solutions without losing context.

“Instead of waiting for a final output, you can interact in real time,” OpenAI stated. “GPT-5.3-Codex talks through what it’s doing, responds to feedback, and keeps you in the loop from start to finish.”

The company promises more capabilities in the coming weeks, with Altman declaring: “I believe Codex is going to win.”

He concluded his response to Anthropic with a philosophical statement that frames the competition in stark terms: “This time belongs to the builders, not the people who want to control them.”

Whether that message resonates with enterprise customers — who according to a16z data cite trust, security, and compliance as their top concerns — remains to be seen. What’s clear is that the AI coding wars have begun in earnest, and neither company intends to cede ground.

Anthropic’s Claude Opus 4.6 brings 1M token context and ‘agent teams’ to take on OpenAI’s Codex

Anthropic on Thursday released Claude Opus 4.6, a major upgrade to its flagship artificial intelligence model that the company says plans more carefully, sustains longer autonomous workflows, and outperforms competitors including OpenAI’s GPT-5.2 on key enterprise benchmarks — a release that arrives at a tumultuous moment for the AI industry and global software markets.

The launch comes just three days after OpenAI released its own Codex desktop application in a direct challenge to Anthropic’s Claude Code momentum, and amid a $285 billion rout in software and services stocks that investors attribute partly to fears that Anthropic’s AI tools could disrupt established enterprise software businesses.

For the first time, Anthropic’s Opus-class models will feature a 1 million token context window, allowing the AI to process and reason across vastly more information than previous versions. The company also introduced “agent teams” in Claude Code — a research preview feature that enables multiple AI agents to work simultaneously on different aspects of a coding project, coordinating autonomously.

“We’re focused on building the most capable, reliable, and safe AI systems,” an Anthropic spokesperson told VentureBeat about the announcements. “Opus 4.6 is even better at planning, helping solve the most complex coding tasks. And the new agent teams feature means users can split work across multiple agents — one on the frontend, one on the API, one on the migration — each owning its piece and coordinating directly with the others.”

Why OpenAI and Anthropic are locked in an all-out war for enterprise developers

The release intensifies an already fierce competition between Anthropic and OpenAI, the two most valuable privately held AI companies in the world. OpenAI on Monday released a new desktop application for its Codex artificial intelligence coding system, a tool the company says transforms software development from a collaborative exercise with a single AI assistant into something more akin to managing a team of autonomous workers.

AI coding assistants have exploded in popularity over the last year, and OpenAI said more than 1 million developers have used Codex in the past month. The new Codex app is part of OpenAI’s ongoing effort to lure users and market share away from rivals like Anthropic and Cursor.

The timing of Anthropic’s release — just 72 hours after OpenAI’s Codex launch — underscores the breakneck pace of competition in AI development tools. OpenAI faces intensifying competition from Anthropic, which posted the largest share increase of any frontier lab since May 2025, according to a recent Andreessen Horowitz survey. Forty-four percent of enterprises now use Anthropic in production, driven by rapid capability gains in software development since late 2024. The desktop launch is a strategic counter to Claude Code’s momentum.

According to Anthropic’s announcement, Opus 4.6 achieves the highest score on Terminal-Bench 2.0, an agentic coding evaluation, and leads all other frontier models on Humanity’s Last Exam, a complex multi-discipline reasoning test. On GDPval-AA — a benchmark measuring performance on economically valuable knowledge work tasks in finance, legal and other domains — Opus 4.6 outperforms OpenAI’s GPT-5.2 by approximately 144 ELO points, which translates to obtaining a higher score approximately 70% of the time.
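The “approximately 70%” conversion follows from the standard Elo expected-score formula, where a rating gap d implies a win probability of 1 / (1 + 10^(−d/400)). A quick, purely illustrative check of the arithmetic:

```python
# Standard Elo expected-score formula: a rating gap d implies
# P(higher-rated side wins) = 1 / (1 + 10^(-d/400)).
def elo_win_probability(rating_gap: float) -> float:
    """Probability that the higher-rated side scores better, given a rating gap."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))

# A 144-point gap works out to roughly a 70% chance of the higher score.
print(round(elo_win_probability(144), 2))  # 0.7
```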

Inside Claude Code’s $1 billion revenue milestone and growing enterprise footprint

The stakes are substantial. Asked about Claude Code’s financial performance, the Anthropic spokesperson noted that in November, the company announced that Claude Code reached $1 billion in run rate revenue only six months after becoming generally available in May 2025.

The spokesperson highlighted major enterprise deployments: “Claude Code is used by Uber across teams like software engineering, data science, finance, and trust and safety; wall-to-wall deployment across Salesforce’s global engineering org; tens of thousands of devs at Accenture; and companies across industries like Spotify, Rakuten, Snowflake, Novo Nordisk, and Ramp.”

That enterprise traction has translated into skyrocketing valuations. Earlier this month, Anthropic signed a term sheet for a $10 billion funding round at a $350 billion valuation. Bloomberg reported that Anthropic is simultaneously working on a tender offer that would allow employees to sell shares at that valuation, offering liquidity to staffers who have watched the company’s worth multiply since its 2021 founding.

How Opus 4.6 solves the ‘context rot’ problem that has plagued AI models

One of Opus 4.6’s most significant technical improvements addresses what the AI industry calls “context rot” — the degradation of model performance as conversations grow longer. Anthropic says Opus 4.6 scores 76% on MRCR v2, a needle-in-a-haystack benchmark testing a model’s ability to retrieve information hidden in vast amounts of text, compared to just 18.5% for Sonnet 4.5.

“This is a qualitative shift in how much context a model can actually use while maintaining peak performance,” the company said in its announcement.

The model also supports outputs of up to 128,000 tokens — enough to complete substantial coding tasks or documents without breaking them into multiple requests.

For developers, Anthropic is introducing several new API features alongside the model: adaptive thinking, which allows Claude to decide when deeper reasoning would be helpful rather than requiring a binary on-off choice; four effort levels (low, medium, high, max) to control intelligence, speed and cost tradeoffs; and context compaction, a beta feature that automatically summarizes older context to enable longer-running tasks.
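As a rough sketch of how such a request might be shaped, the snippet below builds a hypothetical payload. The field names (“effort”, “context_compaction”) are assumptions based on the article’s description, not confirmed Anthropic API parameters:

```python
# Hypothetical request payload illustrating the described controls.
# Field names ("effort", "context_compaction") are assumptions drawn from
# the article's description, not confirmed Anthropic API fields.
def build_request(prompt: str, effort: str = "high", compaction: bool = False) -> dict:
    assert effort in {"low", "medium", "high", "max"}  # the four described effort levels
    payload = {
        "model": "claude-opus-4-6",
        "max_tokens": 128_000,  # described maximum output length
        "effort": effort,       # intelligence / speed / cost tradeoff
        "messages": [{"role": "user", "content": prompt}],
    }
    if compaction:
        payload["context_compaction"] = True  # beta: auto-summarize older context
    return payload

req = build_request("Refactor the payment module", effort="medium")
```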

Anthropic’s delicate balancing act: Building powerful AI agents without losing control

Anthropic, which has built its brand around AI safety research, emphasized that Opus 4.6 maintains alignment with its predecessors despite its enhanced capabilities. On the company’s automated behavior audit measuring misaligned behaviors such as deception, sycophancy, and cooperation with misuse, Opus 4.6 “showed a low rate” of problematic responses while also achieving “the lowest rate of over-refusals — where the model fails to answer benign queries — of any recent Claude model.”

When asked how Anthropic thinks about safety guardrails as Claude becomes more agentic, particularly with multiple agents coordinating autonomously, the spokesperson pointed to the company’s published framework: “Agents have tremendous potential for positive impacts in work but it’s important that agents continue to be safe, reliable, and trustworthy. We outlined our framework for developing safe and trustworthy agents last year which shares core principles developers should consider when building agents.”

The company said it has developed six new cybersecurity probes to detect potentially harmful uses of the model’s enhanced capabilities, and is using Opus 4.6 to help find and patch vulnerabilities in open-source software as part of defensive cybersecurity efforts.

Sam Altman vs. Dario Amodei: The Super Bowl ad battle that exposed AI’s deepest divisions

The rivalry between Anthropic and OpenAI has spilled into consumer marketing in dramatic fashion. Both companies will feature prominently during Sunday’s Super Bowl. Anthropic is airing commercials that mock OpenAI’s decision to begin testing advertisements in ChatGPT, with the tagline: “Ads are coming to AI. But not to Claude.”

OpenAI CEO Sam Altman responded by calling the ads “funny” but “clearly dishonest,” posting on X that his company would “obviously never run ads in the way Anthropic depicts them” and that “Anthropic wants to control what people do with AI” while serving “an expensive product to rich people.”

The exchange highlights a fundamental strategic divergence: OpenAI has moved to monetize its massive free user base through advertising, while Anthropic has focused almost exclusively on enterprise sales and premium subscriptions.

The $285 billion stock selloff that revealed Wall Street’s AI anxiety

The launch occurs against a backdrop of historic market volatility in software stocks. A new AI automation tool from Anthropic PBC sparked a $285 billion rout in stocks across the software, financial services and asset management sectors on Tuesday as investors raced to dump shares with even the slightest exposure. A Goldman Sachs basket of US software stocks sank 6%, its biggest one-day decline since April’s tariff-fueled selloff.

The trigger was Anthropic’s launch on Friday of plug-ins for its Claude Cowork agent, which enable automated tasks across legal, sales, marketing and data analysis. The move underscored the AI industry’s growing push into industries that can unlock the lucrative enterprise revenue needed to fund massive investments in the technology.

Thomson Reuters plunged 15.83% Tuesday, its biggest single-day drop on record, and LegalZoom.com sank 19.68%. European legal software providers including RELX, owner of LexisNexis, and Wolters Kluwer experienced their worst single-day performances in decades.

Not everyone agrees the selloff is warranted. Nvidia CEO Jensen Huang said on Tuesday that fears AI would replace software and related tools were “illogical” and “time will prove itself.” Mark Murphy, head of U.S. enterprise software research at JPMorgan, said in a Reuters report it “feels like an illogical leap” to say a new plug-in from an LLM would “replace every layer of mission-critical enterprise software.”

What Claude’s new PowerPoint integration means for Microsoft’s AI strategy

Among the more notable product announcements: Anthropic is releasing Claude in PowerPoint in research preview, allowing users to create presentations using the same AI capabilities that power Claude’s document and spreadsheet work. The integration puts Claude directly inside a core Microsoft product — an unusual arrangement given Microsoft’s 27% stake in OpenAI.

The Anthropic spokesperson framed the move pragmatically in an interview with VentureBeat: “Microsoft has an official add-in marketplace for Office products with multiple add-ins available to help people with slide creation and iteration. Any developer can build a plugin for Excel or PowerPoint. We’re participating in that ecosystem to bring Claude into PowerPoint. This is about participating in the ecosystem and giving users the ability to work with the tools that they want, in the programs they want.”

The data behind enterprise AI adoption: Who’s winning and who’s losing ground

Data from a16z’s recent enterprise AI survey suggests both Anthropic and OpenAI face an increasingly competitive landscape. While OpenAI remains the most widely used AI provider in the enterprise, with approximately 77% of surveyed companies using it in production in January 2026, Anthropic’s adoption is rising rapidly — from near-zero in March 2024 to approximately 40% using it in production by January 2026.

The survey data also shows that 75% of Anthropic’s enterprise customers are using it in production, with 89% either testing or in production — figures well above OpenAI’s rates of 46% in production and 73% testing or in production among its own customer base.

Enterprise spending on AI continues to accelerate. Average enterprise LLM spend reached $7 million in 2025, up 180% from $2.5 million in 2024, with projections suggesting $11.6 million in 2026 — a 65% increase year-over-year.

Pricing, availability, and what developers need to know about Claude Opus 4.6

Opus 4.6 is available immediately on claude.ai, the Claude API, and major cloud platforms. Developers can access it via claude-opus-4-6 through the API. Pricing remains unchanged at $5 per million input tokens and $25 per million output tokens, with premium pricing of $10/$37.50 for prompts exceeding 200,000 tokens using the 1 million token context window.
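As a quick illustration of how those rates compound (an illustrative calculation from the listed prices, not an official pricing tool):

```python
# Illustrative cost estimate from the published per-million-token prices.
# Premium rates are applied to the whole request when the prompt exceeds
# 200K tokens, as described in the pricing announcement.
def opus_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in USD at the listed Opus 4.6 rates."""
    if input_tokens > 200_000:
        in_rate, out_rate = 10.00, 37.50   # premium pricing (1M-token context)
    else:
        in_rate, out_rate = 5.00, 25.00    # standard pricing
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

print(opus_cost(100_000, 10_000))  # ~0.75 USD, standard tier
print(opus_cost(500_000, 10_000))  # ~5.375 USD, premium tier
```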

For users who find Opus 4.6 “overthinking” simpler tasks — a characteristic Anthropic acknowledges can add cost and latency — the company recommends adjusting the effort parameter from its default high setting to medium.

The recommendation captures something essential about where the AI industry now stands. These models have grown so capable that their creators must now teach customers how to make them think less. Whether that represents a breakthrough or a warning sign depends entirely on which side of the disruption you’re standing on — and whether you remembered to sell your software stocks before Tuesday.

AI for transformation: How SAP’s Joule for Consultants reimagines project delivery

Presented by SAP


SAP’s AI solution, Joule, has already transformed how business users work — turning siloed data and tasks into intelligent, connected workflows. But consultants on SAP projects face a different set of needs: navigating complex implementations, evolving best practices, and the pressure to deliver faster. They need timely, expert-level guidance. They need an AI partner that can deliver accurate SAP knowledge in an instant. Enter SAP Joule for Consultants, purpose-built to help system integrators and consulting teams drive smarter, faster outcomes for their clients.

“The consultant’s focus is not using our business applications — it’s to help implement those systems, perform upgrades, and at the largest scale, help our customers transform their on-premises SAP ERP to SAP Business Suite in the cloud,” says Victor Alvarez, head of product marketing, Joule, at SAP. “Acting as a knowledge-grounded AI teammate, SAP Joule for Consultants delivers trusted answers at critical moments, guides design decisions, and keeps projects aligned with SAP’s latest best practices.”

It can both compress timelines and improve project execution at every stage of delivery, resulting in a higher quality of service, he adds.

“Our customers want to complete their SAP projects as quickly as possible to realize the value of those initiatives sooner,” Alvarez says. “SAP Joule for Consultants puts the most accurate and up-to-date information at the fingertips of the consultants and system integrators performing these projects. It helps consultants decide and act with agility, reduce overall costs, and shorten project timelines without sacrificing quality.”

Building a better AI teammate

AI is no longer an option in consulting. It’s reshaping how projects are designed, delivered, and scaled. For SAP consulting practices, this shift is especially pronounced as SAP cloud transformation and implementation projects demand deep knowledge and expertise. AI can help surface knowledge in seconds, making consultants more productive and improving project outcomes. As a result, practice leaders are not considering whether to use AI, but which AI solution to put at the core of their delivery model.

What sets SAP Joule for Consultants apart is its underlying knowledge base. Any AI solution can generate an answer. What matters most is whether the answer is accurate and can be trusted to guide project execution. SAP Joule for Consultants is grounded in SAP’s most authoritative, comprehensive, and up-to-date knowledge base. The knowledge base is expertly curated by SAP and includes exclusive content not available to third-party solutions or AI-enabled search engines.

“We recognized an opportunity to help our partners transform their consulting practices with the most reliable AI assistance available,” Alvarez says. “We’ve leveraged our expertise and vast set of non-public customer enablement content to ground SAP Joule for Consultants with the most complete, accurate, and trustworthy knowledge.”

For SAP consultants at every level

Consultants at all experience levels have traditionally spent enormous time searching for information before they can take the next step — whether it’s making an architectural decision that shapes an entire implementation or solving a specific, tactical issue.

All that research time is slashed to an extraordinary degree with SAP Joule for Consultants. By delivering expert-level answers drawn from a continually updated knowledge base, the solution helps consultants keep work moving and make smarter decisions that minimize rework.

“If you think about cloud transformation, there’s a huge return on investment,” Alvarez explains. “For SAP customers, this ROI goes beyond traditional cloud benefits to also unlock the AI innovations available in SAP Business Suite. SAP Joule for Consultants is part of our commitment to help our customers reduce time to value and achieve AI impact at scale.”

Enabling SAP customers too

The solution isn’t just for consultants; it’s also for the IT departments within SAP customers. Organizations don’t have the luxury of engaging a consultant for every SAP project — sometimes the work needs to, or can be, done in-house. So, in the same way that SAP Joule for Consultants enables consultants with expert-level knowledge, internal IT teams can benefit from on-demand answers and guidance to accelerate internally staffed projects.

“They can use SAP Joule for Consultants to access the knowledge needed to take on more complex projects too,” Alvarez says. “That opens up more opportunity to engage system integrators on the more strategic and impactful projects.”

IT teams also benefit from elevated collaboration with their consulting partners. SAP Joule for Consultants helps bridge knowledge gaps that traditionally limit how much internal staff can contribute to and influence project outcomes.

“SAP Joule for Consultants makes customers more informed, so that they can collaborate with their consultant partners in a more impactful way,” he explains. “It illuminates the nuances of a project, helping clients ask the right questions and participate more actively in critical project decisions to ensure the best possible project outcomes.”

Powerful new features coming soon

SAP Joule for Consultants has been generally available for almost one year, and the demand from both large firms like KPMG and smaller organizations has been tremendous, Alvarez says — a strong validation of its value proposition. The next major milestone is already on the horizon: partner knowledge integration.

“Our system integrator partners have institutional knowledge that embodies their industry experience and the best practices that distinguish their services and create predictable project outcomes,” he explains. “Soon, they’ll be able to add that institutional knowledge to their instance of SAP Joule for Consultants — creating a single, unified solution that puts both SAP and firm-specific knowledge at their consultants’ fingertips.”


Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

OpenAI launches centralized agent platform as enterprises push for multi-vendor flexibility

OpenAI launched Frontier, a platform for building and governing enterprise AI agents, as companies increasingly question whether to commit to single-vendor systems or maintain multi-model flexibility.

The platform offers integrated tools for agent execution, evaluation, and governance in one place. But Frontier also reflects OpenAI’s push into enterprise AI at a moment when organizations are actively moving toward multi-vendor architectures — creating tension between OpenAI’s centralized approach and what enterprises say they want.

Tatyana Mamut, CEO of the agent observability company Wayfound, told VentureBeat that enterprises don’t want to be locked into a single vendor or platform because AI strategies are ever-evolving. 

“They’re not ready to fully commit. Everybody I talk to knows that eventually they’ll move to a one-size-fits-all solution, but right now, things are moving too fast for us to commit,” Mamut said. “This is the reason why most AI contracts are not traditional SaaS contracts; nobody is signing multi-year contracts anymore because if something great comes out next month, I need to be able to pivot, and I can’t be locked in.”

How Frontier compares to AWS Bedrock

OpenAI is not the first to offer an end-to-end platform for building, prototyping, testing, deploying, and monitoring agents. AWS launched Bedrock AgentCore with the idea that there will be enterprise customers who don’t want to assemble an extensive collection of tools and platforms for their agentic AI projects. 

However, AWS offers a significant advantage: access to multiple LLMs for building agents. Enterprises can choose a hybrid system in which an agent selects the best LLM for each task. OpenAI has not made it clear if it will open Frontier to models and tools from other vendors.

OpenAI did not say whether Frontier users can bring any third-party tools they already use to the platform, and it didn’t comment on why it chose to release Frontier now when enterprises are considering more hybrid systems.

But the company is working with companies including Clay, Abridge, Harvey, Decagon, Ambience, and Sierra to design solutions within Frontier. 

What is Frontier

Frontier is a single platform that offers access to different enterprise-grade tools from OpenAI. The company told VentureBeat that Frontier will not replace offerings such as the Agents SDK, AgentKit, or its suite of APIs. 

OpenAI said Frontier helps bring context, agent execution, and evaluation into a single platform rather than multiple systems and tools.

“Frontier gives agents the same skills people need to succeed at work: shared context, onboarding, hands-on learning with feedback, and clear permissions and boundaries. That’s how teams move beyond isolated use cases to AI co-workers that work across the business,” OpenAI said in a blog post.

Users can connect their data sources, CRM tools, and other internal applications directly to Frontier, effectively creating a semantic layer that normalizes permissions and retrieval logic for agents built on the platform to pull information from. Frontier has an agent execution environment, which can run on local environments, cloud infrastructures, or “OpenAI-hosted runtimes without forcing teams to reinvent how work gets done.”

Built-in evaluation structures, security, and governance dashboards allow teams to monitor agent behavior and performance. These give organizations visibility into their agents’ success rates, accuracy, and latency. OpenAI said Frontier incorporates its enterprise-grade data security layer, including the option for companies to choose where to store their data at rest.

Frontier launched with a small group of initial customers, including HP, Intuit, Oracle, State Farm, Thermo Fisher, and Uber.

Security and governance concerns

Frontier is available only to a select group of customers with wider availability coming soon. Enterprise providers are already weighing what the platform needs to address.

Ellen Boehm, senior vice president for IoT and AI Identity Innovation at Keyfactor, told VentureBeat that companies will still need to focus their agents on security and identity. 

“Agent platforms like OpenAI’s Frontier model are critical for democratizing AI adoption beyond the enterprise,” she said. “This levels the playing field — startups get enterprise-grade capabilities without enterprise-scale infrastructure, which means more innovation and healthier competition across the market. But accessible doesn’t mean you skip the fundamentals.” 

Salesforce AI executive vice president and GM Madhav Thattai, who is overseeing an agent builder and library platform at his company, noted that no matter the platform, enterprises need to focus agents on value.

“What we’re finding is that to build an agent that actually does something at scale that creates real ROI is pretty challenging,” Thattai said. “The true business value for enterprises doesn’t reside in the AI model alone — it’s in the ‘last mile.'”

“That is the software layer that translates raw technology into trusted, autonomous execution. To traverse this last mile, agents must be able to reason through complexity and operate on trusted business data, which is exactly where we are focusing.” 

The ‘brownie recipe problem’: why LLMs must have fine-grained context to deliver real-time results

Today’s LLMs excel at reasoning, but they can still struggle with context. This is particularly true in real-time ordering systems like Instacart’s.

Instacart CTO Anirban Kundu calls it the “brownie recipe problem.”

It’s not as simple as telling an LLM ‘I want to make brownies.’ To be truly assistive when planning the meal, the model must go beyond that simple directive to understand what’s available in the user’s market based on their preferences — say, organic eggs versus regular eggs — and factor in what’s deliverable in their geography so food doesn’t spoil, among other critical factors.

For Instacart, the challenge is balancing latency against the right mix of context to deliver an experience in, ideally, under one second.

“If reasoning itself takes 15 seconds, and if every interaction is that slow, you’re gonna lose the user,” Kundu said at a recent VB event. 

Mixing reasoning, real-world state, personalization

In grocery delivery, there’s a “world of reasoning” and a “world of state” (what’s available in the real world), Kundu noted, both of which an LLM must understand alongside user preferences. But it’s not as simple as loading the entirety of a user’s purchase history and known interests into a reasoning model.

“Your LLM is gonna blow up into a size that will be unmanageable,” said Kundu. 

To get around this, Instacart splits processing into chunks. First, data is fed into a large foundational model that can understand intent and categorize products. That processed data is then routed to small language models (SLMs) designed for catalog context (the types of food or other items that work together) and semantic understanding. 

In the case of catalog context, the SLM must be able to process multiple levels of details around the order itself as well as the different products. For instance, what products go together and what are their relevant replacements if the first choice isn’t in stock? These substitutions are “very, very important” for a company like Instacart, which Kundu said has “over double digit cases” where a product isn’t available in a local market. 

In terms of semantic understanding, say a shopper is looking to buy healthy snacks for children. The model needs to understand what a healthy snack is and what foods are appropriate for, and appeal to, an 8-year-old, then identify relevant products. And when those particular products aren’t available in a given market, the model also has to find related subsets of products.

Then there’s the logistical element. For example, a product like ice cream melts quickly, and frozen vegetables also don’t fare well when left out in warmer temperatures. The model must have this context and calculate an acceptable deliverability time. 

“So you have this intent understanding, you have this categorization, then you have this other portion about logistically, how do you do it?” Kundu noted.
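The division of labor Kundu describes — a large model for intent and categorization, then small models for catalog, semantic, and logistics context — can be sketched as a staged pipeline. The code below is a simplified toy with stubbed models, not Instacart’s implementation:

```python
# Toy sketch of the chunked pipeline: a large foundation model handles intent
# and categorization, then small language models (SLMs) add catalog context,
# semantic constraints, and logistics. Every model here is a stub -- this is
# illustrative only, not Instacart's implementation.

def foundation_model(query: str) -> dict:
    """Stub for the large model: extract intent and product categories."""
    return {"intent": "purchase", "categories": ["baking"], "query": query}

def catalog_slm(categories: list) -> dict:
    """Stub for the catalog SLM: in-stock products plus substitutions."""
    stock = {"baking": ["brownie mix", "eggs", "butter"]}
    substitutions = {"eggs": ["egg substitute"]}  # fallback when out of stock
    return {"products": stock.get(categories[0], []), "substitutions": substitutions}

def semantic_slm(query: str) -> dict:
    """Stub for the semantic SLM: surface constraints like 'organic' or 'healthy'."""
    return {"constraints": [w for w in ("organic", "healthy") if w in query]}

def logistics_slm(products: list) -> dict:
    """Stub for the logistics step: perishables shrink the delivery window."""
    perishable = {"eggs", "butter", "ice cream"}
    rush = any(p in perishable for p in products)
    return {"max_delivery_minutes": 45 if rush else 120}

def plan_order(query: str) -> dict:
    parsed = foundation_model(query)                # stage 1: intent + categorization
    catalog = catalog_slm(parsed["categories"])     # stage 2a: catalog context
    semantics = semantic_slm(query)                 # stage 2b: semantic understanding
    logistics = logistics_slm(catalog["products"])  # stage 3: deliverability
    return {**parsed, **catalog, **semantics, **logistics}

order = plan_order("I want to make brownies with organic eggs")
```

Splitting the work this way keeps any single model's context small, which is the point Kundu makes about an all-in-one prompt becoming "unmanageable."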

Avoiding ‘monolithic’ agent systems

Like many other companies, Instacart is experimenting with AI agents, finding that a mix of agents works better than a “single monolith” that handles multiple different tasks. The Unix philosophy — a modular operating system built from smaller, focused tools — helps address, for instance, different payment systems with varying failure modes, Kundu explained.

“Having to build all of that within a single environment was very unwieldy,” he said. Further, agents on the back end talk to many third-party platforms, including point-of-sale (POS) and catalog systems. Naturally, not all of them behave the same way; some are more reliable than others, and they have different update intervals and feeds. 

“So being able to handle all of those things, we’ve gone down this route of microagents rather than agents that are dominantly large in nature,” said Kundu. 

To manage agents, Instacart has integrated the Model Context Protocol (MCP), the open standard introduced by Anthropic that standardizes and simplifies the process of connecting AI models to different tools and data sources.

The company also uses Google’s Universal Commerce Protocol (UCP) open standard, which allows AI agents to directly interact with merchant systems. 

However, Kundu’s team still deals with challenges. As he noted, it’s not about whether integration is possible, but how reliably those integrations behave and how well they’re understood by users. Discovery can be difficult, not just in identifying available services, but understanding which ones are appropriate for which task.

Instacart has had to implement MCP and UCP in “very different” cases, and the biggest problems they’ve run into are failure modes and latency, Kundu noted. “The response times and understandings of both of those services are very, very different. I would say we spend probably two-thirds of the time fixing those error cases.”

Mistral drops Voxtral Transcribe 2, an open-source speech model that runs on-device for pennies

Mistral AI, the Paris-based startup positioning itself as Europe’s answer to OpenAI, released a pair of speech-to-text models on Wednesday that the company says can transcribe audio faster, more accurately, and far more cheaply than anything else on the market — all while running entirely on a smartphone or laptop.

The announcement marks the latest salvo in an increasingly competitive battle over voice AI, a technology that enterprise customers see as essential for everything from automated customer service to real-time translation. But unlike offerings from American tech giants, Mistral’s new Voxtral Transcribe 2 models are designed to process sensitive audio without ever transmitting it to remote servers — a feature that could prove decisive for companies in regulated industries like healthcare, finance, and defense.

“You’d like your voice and the transcription of your voice to stay close to where you are, meaning you want it to happen on device—on a laptop, a phone, or a smartwatch,” Pierre Stock, Mistral’s vice president of science operations, said in an interview with VentureBeat. “We make that possible because the model is only 4 billion parameters. It’s small enough to fit almost anywhere.”

Mistral splits its new AI transcription technology into batch processing and real-time applications

Mistral released two distinct models under the Voxtral Transcribe 2 banner, each engineered for different use cases.

  • Voxtral Mini Transcribe V2 handles batch transcription, processing pre-recorded audio files in bulk. The company says it achieves the lowest word error rate of any transcription service and is available via API at $0.003 per minute, roughly one-fifth the price of major competitors. The model supports 13 languages, including English, Mandarin Chinese, Japanese, Arabic, Hindi, and several European languages.

  • Voxtral Realtime, as its name suggests, processes live audio with a latency that can be configured down to 200 milliseconds — the blink of an eye. Mistral claims this is a breakthrough for applications where even a two-second delay proves unacceptable: live subtitling, voice agents, and real-time customer service augmentation.
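At the quoted rate, batch transcription is cheap at scale. A back-of-envelope check: the $0.003/minute figure is from Mistral’s announcement, while the $0.015/minute competitor rate is inferred from the “roughly one-fifth the price” claim, and the 10,000-hour workload is an arbitrary example.

```python
# Back-of-envelope cost check for Voxtral Mini Transcribe V2 batch pricing.
MISTRAL_PER_MIN = 0.003     # from the announcement
COMPETITOR_PER_MIN = 0.015  # assumption, implied by "one-fifth the price"

hours = 10_000              # e.g., a call-center audio archive
minutes = hours * 60
mistral_cost = minutes * MISTRAL_PER_MIN
competitor_cost = minutes * COMPETITOR_PER_MIN

print(f"Mistral:    ${mistral_cost:,.0f}")     # $1,800
print(f"Competitor: ${competitor_cost:,.0f}")  # $9,000
```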

The Realtime model ships under an Apache 2.0 open-source license, meaning developers can download the model weights from Hugging Face, modify them, and deploy them without paying Mistral a licensing fee. For companies that prefer not to run their own infrastructure, API access costs $0.006 per minute.

Stock said Mistral is betting on the open-source community to expand the model’s reach. “The open-source community is very imaginative when it comes to applications,” he said. “We’re excited to see what they’re going to do.”

Why on-device AI processing matters for enterprises handling sensitive data

The decision to engineer models small enough to run locally reflects a calculation about where the enterprise market is heading. As companies integrate AI into ever more sensitive workflows — transcribing medical consultations, financial advisory calls, legal depositions — the question of where that data travels has become a dealbreaker.

Stock painted a vivid picture of the problem during his interview. Current note-taking applications with audio capabilities, he explained, often pick up ambient noise in problematic ways: “It might pick up the lyrics of the music in the background. It might pick up another conversation. It might hallucinate from a background noise.”

Mistral invested heavily in training data curation and model architecture to address these issues. “All of that, we spend a lot of time ironing out the data and the way we train the model to robustify it,” Stock said.

The company also added enterprise-specific features that its American competitors have been slower to implement. Context biasing allows customers to upload a list of specialized terminology — medical jargon, proprietary product names, industry acronyms — and the model will automatically favor those terms when transcribing ambiguous audio. Unlike fine-tuning, which requires retraining the model, context biasing works through a simple API parameter.

“You only need a text list,” Stock explained. “And then the model will automatically bias the transcription toward these acronyms or these weird words. And it’s zero shots, no need for retraining, no need for weird stuff.”
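Context biasing can be pictured as rescoring competing transcription hypotheses toward a customer-supplied term list. The toy function below only illustrates that idea; it is not Mistral’s implementation, and the exact API parameter is not described in the article.

```python
def pick_hypothesis(hypotheses, bias_terms, boost=0.1):
    """Toy context biasing: among (text, score) transcription hypotheses,
    boost any hypothesis that contains a term from the customer's list."""
    terms = {t.lower() for t in bias_terms}
    def biased(hyp):
        text, score = hyp
        hits = sum(1 for t in terms if t in text.lower())
        return score + boost * hits
    return max(hypotheses, key=biased)[0]

# Ambiguous audio: the acoustic model slightly prefers the wrong spelling,
# but "Voxtral" being on the bias list tips the decision.
hyps = [("send it to vox trawl", 0.52), ("send it to Voxtral", 0.50)]
print(pick_hypothesis(hyps, bias_terms=["Voxtral", "diarization"]))
# → send it to Voxtral
```

The appeal Stock describes is that the bias list is just data passed at request time, so no retraining is needed when the terminology changes.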

From factory floors to call centers, Mistral targets high-noise industrial environments

Stock described two scenarios that capture how Mistral envisions the technology being deployed.

The first involves industrial auditing. Imagine technicians walking through a manufacturing facility, inspecting heavy machinery while shouting observations over the din of factory noise. “In the end, imagine like a perfect timestamped notes identifying who said what — so diarization — while being super robust,” Stock said. The challenge is handling what he called “weird technical language that no one is able to spell except these people.”

The second scenario targets customer service operations. When a caller contacts a support center, Voxtral Realtime can transcribe the conversation in real time, feeding text to backend systems that pull up relevant customer records before the caller finishes explaining the problem.

“The status will appear for the operator on the screen before the customer stops the sentence and stops complaining,” Stock explained. “Which means you can just interact and say, ‘Okay, I can see the status. Let me correct the address and send back the shipment.'”

He estimated this could reduce typical customer service interactions from multiple back-and-forth exchanges to just two interactions: the customer explains the problem, and the agent resolves it immediately.

Real-time translation across languages could arrive by the end of 2026

For all the focus on transcription, Stock made clear that Mistral views these models as foundational technology for a more ambitious goal: real-time speech-to-speech translation that feels natural.

“Maybe the end goal application and what the model is laying the groundwork for is live translation,” he said. “I speak French, you speak English. It’s key to have minimal latency, because otherwise you don’t build empathy. Your face is not out of sync with what you said one second ago.”

That goal puts Mistral in direct competition with Apple and Google, both of which have been racing to solve the same problem. Google’s latest translation model operates at a two-second delay — ten times slower than what Mistral claims for Voxtral Realtime.

Mistral positions itself as the privacy-first alternative for enterprise customers

Mistral occupies an unusual position in the AI landscape. Founded in 2023 by alumni of Meta and Google DeepMind, the company has raised over $2 billion and now carries a valuation of approximately $13.6 billion. Yet it operates with a fraction of the compute resources available to American hyperscalers — and has built its strategy around efficiency rather than brute force.

“The models we release are enterprise grade, industry leading, efficient — in particular, in terms of cost — can be embedded into the edge, unlocks privacy, unlocks control, transparency,” Stock said.

That approach has resonated particularly with European customers wary of dependence on American technology. In January, France’s Ministry of the Armed Forces signed a framework agreement giving the country’s military access to Mistral’s AI models—a deal that explicitly requires deployment on French-controlled infrastructure.

“I think a big barrier to adoption of voice AI is that, hey, if you’re in a sensitive industry like finance or in manufacturing or healthcare or insurance, you can’t have information you’re talking about just go to the cloud,” Howard Cohen, who participated in the interview alongside Stock, noted. “It needs to be either on device or needs to be on your premise.”

Mistral faces stiff competition from OpenAI, Google, and a rising China

The transcription market has grown fiercely competitive. OpenAI’s Whisper model has become something of an industry standard, available both through API and as downloadable open-source weights. Google, Amazon, and Microsoft all offer enterprise-grade speech services. Specialized players like Assembly AI and Deepgram have built substantial businesses serving developers who need reliable, scalable transcription.

Mistral claims its new models outperform all of them on accuracy benchmarks while undercutting them on price. “We are better than them on the benchmarks,” Stock said. Independent verification of those claims will take time, but the company points to performance on FLEURS, a widely used multilingual speech benchmark, where Voxtral models achieve word error rates competitive with or superior to alternatives from OpenAI and Google.

Perhaps more significantly, Mistral’s CEO Arthur Mensch has warned that American AI companies face pressure from an unexpected direction. Speaking at the World Economic Forum in Davos last month, Mensch dismissed the notion that Chinese AI lags behind the West as “a fairy tale.”

“The capabilities of China’s open-source technology is probably stressing the CEOs in the US,” he said.

The French startup bets that trust will determine the winner in enterprise voice AI

Stock predicted that 2026 would be “the year of note-taking” — the moment when AI transcription becomes reliable enough that users trust it completely.

“You need to trust the model, and the model basically cannot make any mistake, otherwise you would just lose trust in the product and stop using it,” he said. “The threshold is super, super hard.”

Whether Mistral has crossed that threshold remains to be seen. Enterprise customers will be the ultimate judges, and they tend to move slowly, testing claims against reality before committing budgets and workflows to new technology. The audio playground in Mistral Studio, where developers can test Voxtral Transcribe 2 with their own files, went live today.

But Stock’s broader argument deserves attention. In a market where American giants compete by throwing billions of dollars at ever-larger models, Mistral is making a different wager: that in the age of AI, smaller and local might beat bigger and distant. For the executives who spend their days worrying about data sovereignty, regulatory compliance, and vendor lock-in, that pitch may prove more compelling than any benchmark.

The race to dominate enterprise voice AI is no longer just about who builds the most powerful model. It’s about who builds the model you’re willing to let listen.