When an enterprise LLM retrieves a product name, technical specification, or standard contract clause, it’s using expensive GPU computation designed for complex reasoning — just to access static information. This happens millions of times per day. Each lookup wastes cycles and inflates infrastructure costs.
DeepSeek’s newly released research on “conditional memory” addresses this architectural limitation directly. The work introduces Engram, a module that separates static pattern retrieval from dynamic reasoning. It delivers results that challenge assumptions about what memory is actually for in neural networks. The paper was co-authored by DeepSeek founder Liang Wenfeng.
Through systematic experiments, DeepSeek found the optimal balance between computation and memory: 75% of sparse model capacity allocated to dynamic reasoning and 25% to static lookups. Counterintuitively, the memory system improved reasoning more than it improved knowledge retrieval.
Complex reasoning benchmarks jumped from 70% to 74% accuracy, while knowledge-focused tests improved from 57% to 61%. The gains were measured on evaluations including Big-Bench Hard, ARC-Challenge, and MMLU.
The research arrives as enterprises face mounting pressure to deploy more capable AI systems while navigating GPU memory constraints and infrastructure costs. DeepSeek’s approach offers a potential path forward by fundamentally rethinking how models should be structured.
Agentic memory systems, sometimes referred to as contextual memory — like Hindsight, MemOS, or Memp — focus on episodic memory. They store records of past conversations, user preferences, and interaction history. These systems help agents maintain context across sessions and learn from experience. But they’re external to the model’s forward pass and don’t optimize how the model internally processes static linguistic patterns.
For Chris Latimer, founder and CEO of Vectorize, which developed Hindsight, the conditional memory approach used in Engram solves a different problem than agentic AI memory.
“It’s not solving the problem of connecting agents to external memory like conversation histories and knowledge stores,” Latimer told VentureBeat. “It’s more geared towards squeezing performance out of smaller models and getting more mileage out of scarce GPU resources.”
Conditional memory tackles a fundamental issue: Transformers lack a native knowledge lookup primitive. When processing text, they must simulate retrieval of static patterns through expensive neural computation across multiple layers. These patterns include named entities, technical terminology, and common phrases.
The DeepSeek paper illustrates this with a concrete example. Recognizing “Diana, Princess of Wales” requires consuming multiple layers of attention and feed-forward networks to progressively compose features. The model essentially uses deep, dynamic logic circuits to perform what should be a simple hash table lookup. It’s like using a calculator to remember your phone number rather than just looking it up.
“The problem is that Transformer lacks a ‘native knowledge lookup’ ability,” the researchers write. “Many tasks that should be solved in O(1) time like retrieval have to be ‘simulated for retrieval’ through a large amount of computation, which is very inefficient.”
Engram introduces “conditional memory” to work alongside the conditional computation of mixture-of-experts (MoE) models.
The mechanism is straightforward. The module takes sequences of two to three tokens and uses hash functions to look them up in a massive embedding table. Retrieval happens in constant time, regardless of table size.
But retrieved patterns need filtering. A hash lookup for “Apple” might collide with unrelated content, or the word might mean the fruit rather than the company. Engram solves this with a gating mechanism. The model’s current understanding of context (accumulated through earlier attention layers) acts as a filter. If retrieved memory contradicts the current context, the gate suppresses it. If it fits, the gate lets it through.
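To make the mechanism concrete, here is a minimal PyTorch-style sketch of the idea as the paper describes it: hash short token n-grams into a fixed embedding table, then let the model's contextual state gate what gets admitted. This is not DeepSeek's code; the class name, hashing scheme, and dimensions are all illustrative.

```python
import torch
import torch.nn as nn

class ConditionalMemorySketch(nn.Module):
    """Illustrative hashed n-gram memory with context gating (not DeepSeek's implementation)."""

    def __init__(self, num_slots: int = 1_000_000, d_model: int = 1024):
        super().__init__()
        self.num_slots = num_slots
        self.table = nn.Embedding(num_slots, d_model)   # static "memory" lookup table
        self.gate = nn.Linear(2 * d_model, d_model)     # context-conditioned gate

    def hash_bigrams(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq). Mix each token with its predecessor as a stand-in
        # for hashing the last 2-3 tokens; one lookup index per position, in O(1).
        prev = torch.roll(token_ids, shifts=1, dims=1)
        return (token_ids * 1_000_003 + prev * 97) % self.num_slots

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model), the model's current contextual understanding.
        retrieved = self.table(self.hash_bigrams(token_ids))  # constant-time retrieval
        gate = torch.sigmoid(self.gate(torch.cat([hidden, retrieved], dim=-1)))
        return hidden + gate * retrieved  # context suppresses or admits the memory

# Toy usage: 2 sequences of 8 tokens with 1024-dim hidden states.
mem = ConditionalMemorySketch()
tokens = torch.randint(0, 50_000, (2, 8))
out = mem(tokens, torch.randn(2, 8, 1024))  # same shape as the hidden states
```

The gate is what keeps a hash collision, or a context-inappropriate sense of a word, from polluting the hidden state.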
The module isn’t applied at every layer. Strategic placement balances performance gains against system latency.
This dual-system design raises a critical question: How much capacity should each get? DeepSeek’s key finding: the optimal split is 75-80% for computation and 20-25% for memory. In testing, pure MoE (100% computation) proved suboptimal. Too much computation wastes depth reconstructing static patterns; too much memory loses reasoning capacity.
Perhaps Engram’s most pragmatic contribution is its infrastructure-aware design. Unlike MoE’s dynamic routing, which depends on runtime hidden states, Engram’s retrieval indices depend solely on input token sequences. This deterministic nature enables a prefetch-and-overlap strategy.
“The challenge is that GPU memory is limited and expensive, so using bigger models gets costly and harder to deploy,” Latimer said. “The clever idea behind Engram is to keep the main model on the GPU, but offload a big chunk of the model’s stored information into a separate memory on regular RAM, which the model can use on a just-in-time basis.”
During inference, the system can asynchronously retrieve embeddings from host CPU memory via PCIe while the GPU computes the preceding transformer blocks. Strategic layer placement uses the computation of early layers as a buffer to mask communication latency.
The researchers demonstrated this with a 100B-parameter embedding table entirely offloaded to host DRAM. They achieved throughput penalties below 3%. This decoupling of storage from compute addresses a critical enterprise constraint as GPU high-bandwidth memory remains expensive and scarce.
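A deliberately simplified sketch of the prefetch-and-overlap idea, assuming a background thread stands in for the asynchronous PCIe transfer: because the lookup indices depend only on the input tokens, the fetch from host RAM can start before the early layers finish. The table size and helper functions here are placeholders, not the paper's system.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Placeholder host-RAM table (the real one holds ~100B parameters in DRAM, not GPU memory).
HOST_TABLE = np.random.rand(100_000, 256).astype(np.float32)

def lookup_indices(token_ids: list[int]) -> list[int]:
    # Deterministic: depends only on input tokens, so it can run before the
    # forward pass ever reaches the memory layer.
    return [(t * 1_000_003) % HOST_TABLE.shape[0] for t in token_ids]

def fetch_embeddings(indices: list[int]) -> np.ndarray:
    # Stand-in for the host-to-GPU copy over PCIe.
    return HOST_TABLE[indices]

def early_layers(token_ids: list[int]) -> np.ndarray:
    # Stand-in for the transformer blocks placed *before* the memory layer.
    return np.random.rand(len(token_ids), 256).astype(np.float32)

token_ids = [101, 2043, 318, 262, 3139, 286]

with ThreadPoolExecutor(max_workers=1) as pool:
    # Start the memory fetch immediately...
    future = pool.submit(fetch_embeddings, lookup_indices(token_ids))
    # ...and overlap it with the early-layer computation.
    hidden = early_layers(token_ids)
    retrieved = future.result()  # ideally already finished by the time it's needed

output = hidden + retrieved  # gating omitted; see the earlier sketch
```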
For enterprises evaluating AI infrastructure strategies, DeepSeek’s findings suggest several actionable insights:
1. Hybrid architectures outperform pure approaches. The 75/25 allocation law indicates that optimal models should split sparse capacity between computation and memory.
2. Infrastructure costs may shift from GPU to memory. If Engram-style architectures prove viable in production, infrastructure investment patterns could change. The ability to store 100B+ parameters in CPU memory with minimal overhead suggests that memory-rich, compute-moderate configurations may offer better performance-per-dollar than pure GPU scaling.
3. Reasoning improvements exceed knowledge gains. The surprising finding that reasoning benefits more than knowledge retrieval suggests that memory’s value extends beyond obvious use cases.
For enterprises leading AI adoption, Engram demonstrates that the next frontier may not be simply bigger models. It’s smarter architectural choices that respect the fundamental distinction between static knowledge and dynamic reasoning. The research suggests that optimal AI systems will increasingly resemble hybrid architectures.
Organizations waiting to adopt AI later in the cycle should monitor whether major model providers incorporate conditional memory principles into their architectures. If the 75/25 allocation law holds across scales and domains, the next generation of foundation models may deliver substantially better reasoning performance at lower infrastructure costs.
A core element of any data retrieval operation is the retriever, the component whose job is to fetch the relevant content for a given query.
In the AI era, retrievers have been used as part of retrieval-augmented generation (RAG) pipelines. The approach is straightforward: retrieve relevant documents, feed them to an LLM, and let the model generate an answer based on that context.
While retrieval might have seemed like a solved problem, it actually wasn’t solved for modern agentic AI workflows.
In research published this week, Databricks introduced Instructed Retriever, a new architecture that the company claims delivers up to 70% improvement over traditional RAG on complex, instruction-heavy enterprise question-answering tasks. The difference comes down to how the system understands and uses metadata.
“A lot of the systems that were built for retrieval before the age of large language models were really built for humans to use, not for agents to use,” Michael Bendersky, a research director at Databricks, told VentureBeat. “What we found is that in a lot of cases, the errors that are coming from the agent are not because the agent is not able to reason about the data. It’s because the agent is not able to retrieve the right data in the first place.”
The core problem stems from how traditional RAG handles what Bendersky calls “system-level specifications.” These include the full context of user instructions, metadata schemas, and examples that define what a successful retrieval should look like.
In a typical RAG pipeline, a user query gets converted into an embedding, similar documents are retrieved from a vector database, and those results feed into a language model for generation. The system might incorporate basic filtering, but it fundamentally treats each query as an isolated text-matching exercise.
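For reference, here is a minimal sketch of that traditional pipeline. The corpus, embedding model choice, and prompt format are illustrative, and the final LLM call is omitted.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # model choice is illustrative

docs = [
    "FooBrand Lite launched in 2023 with a 2-year warranty.",      # hypothetical corpus
    "FooBrand Pro reviews average 4.8 stars over the past year.",
    "Standard contract clause 7 covers data retention for 90 days.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    # 1. Embed the query, 2. rank documents by cosine similarity, 3. return top-k.
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity (vectors are normalized)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

query = "How good are FooBrand Pro reviews?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# The prompt would then be sent to an LLM for generation (call omitted here).
print(prompt)
```

Note that nothing in this loop looks at metadata: every query is reduced to a text-similarity match.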
This approach breaks down with real enterprise data. Enterprise documents often include rich metadata like timestamps, author information, product ratings, document types, and domain-specific attributes. When a user asks a question that requires reasoning over these metadata fields, traditional RAG struggles.
Consider this example: “Show me five-star product reviews from the past six months, but exclude anything from Brand X.” Traditional RAG cannot reliably translate that natural language constraint into the appropriate database filters and structured queries.
“If you just use a traditional RAG system, there’s no way to make use of all these different signals about the data that are encapsulated in metadata,” Bendersky said. “They need to be passed on to the agent itself to do the right job in retrieval.”
The issue becomes more acute as enterprises move beyond simple document search to agentic workflows. A human using a search system can reformulate queries and apply filters manually when initial results miss the mark. An AI agent operating autonomously needs the retrieval system itself to understand and execute complex, multi-faceted instructions.
Databricks’ approach fundamentally redesigns the retrieval pipeline. The system propagates complete system specifications through every stage of both retrieval and generation. These specifications include user instructions, labeled examples, and index schemas.
The architecture adds three key capabilities:
Query decomposition: The system breaks complex, multi-part requests into a search plan containing multiple keyword searches and filter instructions. A request for “recent FooBrand products excluding lite models” gets decomposed into structured queries with appropriate metadata filters. Traditional systems would attempt a single semantic search.
Metadata reasoning: Natural language instructions get translated into database filters. “From last year” becomes a date filter, “five-star reviews” becomes a rating filter. The system understands both what metadata is available and how to match it to user intent.
Contextual relevance: The reranking stage uses the full context of user instructions to boost documents that match intent, even when keywords are a weaker match. The system can prioritize recency or specific document types based on specifications rather than just text similarity.
“The magic is in how we construct the queries,” Bendersky said. “We kind of try to use the tool as an agent would, not as a human would. It has all the intricacies of the API and uses them to the best possible ability.”
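Databricks has not released the implementation, so the following is only a schematic sketch of the behavior described above: decompose a request into keyword queries plus metadata filters, apply the filters, and only then rank semantically. The document schema, plan format, and the "Brand X" stand-in are invented for illustration.

```python
from datetime import date, timedelta

# Hypothetical document store with metadata alongside text.
DOCS = [
    {"text": "Great battery life on this phone.", "brand": "FooBrand",
     "rating": 5, "date": date(2025, 11, 2)},
    {"text": "Lite model feels cheap.", "brand": "FooBrand Lite",
     "rating": 2, "date": date(2025, 9, 15)},
    {"text": "Solid camera, average battery.", "brand": "BarBrand",
     "rating": 5, "date": date(2024, 1, 5)},
]

# A search plan an instructed retriever might produce for:
# "Show me five-star product reviews from the past six months, but exclude anything from Brand X."
plan = {
    "keyword_queries": ["product review"],
    "filters": {
        "rating": 5,
        "date_after": date.today() - timedelta(days=182),
        "exclude_brands": ["BarBrand"],  # stand-in for "Brand X"
    },
}

def apply_filters(doc: dict, filters: dict) -> bool:
    # Translate the plan's metadata constraints into concrete predicates.
    return (
        doc["rating"] == filters["rating"]
        and doc["date"] >= filters["date_after"]
        and doc["brand"] not in filters["exclude_brands"]
    )

candidates = [d for d in DOCS if apply_filters(d, plan["filters"])]
# Semantic search and reranking over `candidates` would happen here, informed by
# the full system specification rather than text similarity alone.
print([d["text"] for d in candidates])
```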
Over the latter half of 2025, there was an industry shift away from RAG toward agentic AI memory, sometimes referred to as contextual memory. Approaches including Hindsight and A-MEM emerged, offering the promise of a RAG-free future.
Bendersky argues that contextual memory and sophisticated retrieval serve different purposes. Both are necessary for enterprise AI systems.
“There’s no way you can put everything in your enterprise into your contextual memory,” Bendersky noted. “You kind of need both. You need contextual memory to provide specifications, to provide schemas, but still you need access to the data, which may be distributed across multiple tables and documents.”
Contextual memory excels at maintaining task specifications, user preferences, and metadata schemas within a session. It keeps the “rules of the game” readily available. But the actual enterprise data corpus exists outside this context window. Most enterprises have data volumes that exceed even generous context windows by orders of magnitude.
Instructed Retriever leverages contextual memory for system-level specifications while using retrieval to access the broader data estate. The specifications in context inform how the retriever constructs queries and interprets results. The retrieval system then pulls specific documents from potentially billions of candidates.
This division of labor matters for practical deployment. Loading millions of documents into context is neither feasible nor efficient. The metadata alone can be substantial when dealing with heterogeneous systems across an enterprise. Instructed Retriever solves this by making metadata immediately usable without requiring it all to fit in context.
Instructed Retriever is available now as part of Databricks Agent Bricks; it’s built into the Knowledge Assistant product. Enterprises using Knowledge Assistant to build question-answering systems over their documents automatically leverage the Instructed Retriever architecture without building custom RAG pipelines.
The system is not available as open source, though Bendersky indicated Databricks is considering broader availability. For now, the company’s strategy is to release benchmarks like StaRK-Instruct to the research community while keeping the implementation proprietary to its enterprise products.
The technology shows particular promise for enterprises with complex, highly structured data that includes rich metadata. Bendersky cited use cases across finance, e-commerce, and healthcare. Essentially any domain where documents have meaningful attributes beyond raw text can benefit.
“What we’ve seen in some cases kind of unlocks things that the customer cannot do without it,” Bendersky said.
He explained that without Instructed Retriever, users have to do more data management work to put content into the right structure and tables before an LLM can reliably retrieve the correct information.
“Here you can just create an index with the right metadata, point your retriever to that, and it will just work out of the box,” he said.
For enterprises building RAG-based systems today, the research surfaces a critical question: Is your retrieval pipeline actually capable of the instruction-following and metadata reasoning your use case requires?
The 70% improvement Databricks demonstrates isn’t achievable through incremental optimization. It represents an architectural difference in how system specifications flow through the retrieval and generation process. Organizations that have invested in carefully structuring their data with detailed metadata may find that traditional RAG is leaving much of that structure’s value on the table.
For enterprises looking to implement AI systems that can reliably follow complex, multi-part instructions over heterogeneous data sources, the research indicates that retrieval architecture may be the critical differentiator.
Those still relying on basic RAG for production use cases involving rich metadata should evaluate whether their current approach can fundamentally meet their requirements. The performance gap Databricks demonstrates suggests that a more sophisticated retrieval architecture is now table stakes for enterprises with complex data estates.
For decades the data landscape was relatively static. Relational databases (hello, Oracle!) dominated as the default, organizing information into familiar rows and columns.
That stability eroded as successive waves introduced NoSQL document stores, graph databases, and most recently vector-based systems. In the era of agentic AI, data infrastructure is once again in flux — and evolving faster than at any point in recent memory.
As 2026 dawns, one lesson has become unavoidable: data matters more than ever.
Perhaps the most consequential trend out of 2025 that will continue to be debated into 2026 (and maybe beyond) is the role of RAG.
The problem is that the original RAG pipeline architecture is much like a basic search: retrieval returns the result of a specific query at a specific point in time. It is also often limited to a single data source, or at least that’s the way RAG pipelines were built in the past (the past being anytime prior to June 2025).
Those limitations have led to a growing conga line of vendors all claiming that RAG is dying, on the way out, or already dead.
What is emerging, though, is a mix of alternative approaches (like contextual memory) and nuanced, improved takes on RAG. For example, Snowflake recently announced its agentic document analytics technology, which expands the traditional RAG data pipeline to enable analysis across thousands of sources without needing structured data first. Numerous other RAG-like approaches are also emerging, including GraphRAG, which will likely only grow in usage and capability in 2026.
So now RAG isn’t (entirely) dead, at least not yet. Organizations will still find use cases in 2026 where data retrieval is needed and some enhanced version of RAG will likely still fit the bill.
Enterprises in 2026 should evaluate use cases individually. Traditional RAG works for static knowledge retrieval, whereas enhanced approaches like GraphRAG suit complex, multi-source queries.
While RAG won’t entirely disappear in 2026, one approach that will likely surpass it in terms of usage for agentic AI is contextual memory, also known as agentic or long-context memory. This technology enables LLMs to store and access pertinent information over extended periods.
Multiple such systems emerged over the course of 2025, including Hindsight, the A-MEM framework, General Agentic Memory (GAM), LangMem, and Memobase.
RAG will remain useful for static data, but agentic memory is critical for adaptive assistants and agentic AI workflows that must learn from feedback, maintain state, and adapt over time.
In 2026, contextual memory will no longer be a novel technique; it will become table stakes for many operational agentic AI deployments.
At the beginning of the modern generative AI era, purpose-built vector databases (like Pinecone and Milvus, among others) were all the rage.
For an LLM to gain access to new information (generally, but not exclusively, via RAG), that data first has to be made searchable. The best way to do that is by encoding the data in vectors — that is, numerical representations of what the data represents.
In 2025, what became painfully obvious was that vectors were no longer a specific database type but rather a specific data type that could be integrated into an existing multimodel database. So instead of an organization being required to use a purpose-built system, it could just use an existing database that supports vectors. For example, Oracle supports vectors, and so does every database offered by Google.
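PostgreSQL with the pgvector extension is one concrete example of vectors living as a column type inside a general-purpose database (the article cites Oracle and Google; pgvector is used here purely because its syntax is compact). Connection details and table contents are placeholders.

```python
import psycopg2  # assumes PostgreSQL with the pgvector extension installed

conn = psycopg2.connect("dbname=appdb user=app password=secret host=localhost")  # placeholder DSN
cur = conn.cursor()

# Vectors live as an ordinary column type, alongside relational data.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        body text,
        embedding vector(3)   -- toy dimension; real embeddings run to hundreds or thousands
    );
""")
cur.execute(
    "INSERT INTO documents (body, embedding) VALUES (%s, %s::vector)",
    ("example document", "[0.1, 0.9, 0.2]"),
)

# Nearest-neighbor search with pgvector's cosine-distance operator <=>.
cur.execute(
    "SELECT body FROM documents ORDER BY embedding <=> %s::vector LIMIT 5",
    ("[0.1, 0.8, 0.3]",),
)
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()
```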
Oh, and it gets better. Amazon S3, long the de facto leader in cloud-based object storage, now allows users to store vectors, further reducing the need for a dedicated vector database. That doesn’t mean object storage replaces vector search engines — performance, indexing, and filtering still matter — but it does narrow the set of use cases where specialized systems are required.
No, that doesn’t mean purpose-built vector databases are dead. Much like with RAG, there will continue to be use cases for purpose-built vector databases in 2026. What will change is that those use cases will likely narrow to organizations that need the highest levels of performance or a specific optimization that a general-purpose solution doesn’t support.
As 2026 starts, what’s old is new again. The open-source PostgreSQL database will be 40 years old in 2026, yet it will be more relevant than it has ever been before.
Over the course of 2025, the supremacy of PostgreSQL as the go-to database for building any type of GenAI solution became apparent. Snowflake spent $250 million to acquire PostgreSQL database vendor Crunchy Data; Databricks spent $1 billion on Neon; and Supabase raised a $100 million Series E, giving it a $5 billion valuation.
All that money serves as a clear signal that enterprises are defaulting to PostgreSQL. The reasons are many, including its open-source base, flexibility, and performance. For vibe coding (a core use case for Supabase and Neon in particular), PostgreSQL is the standard.
Expect to see more growth and adoption of PostgreSQL in 2026 as more organizations come to the same conclusions as Snowflake and Databricks.
It’s likely that there will be more innovation addressing problems that many organizations assume are already solved.
In 2025, we saw numerous innovations in areas like AI parsing of unstructured data sources such as PDFs. That capability has existed for several years, but it proved harder to operationalize at scale than many assumed. Databricks now has an advanced parser, and other vendors, including Mistral, have emerged with their own improvements.
The same is true with natural language to SQL translation. While some might have assumed that was a solved problem, it’s one that continued to see innovation in 2025 and will see more in 2026.
It’s critical for enterprises to stay vigilant in 2026. Don’t assume foundational capabilities like parsing or natural language to SQL are fully solved. Keep evaluating new approaches that may significantly outperform existing tools.
2025 was a big year for big money going into data vendors.
Meta invested $14.3 billion in data labeling vendor Scale AI; IBM said it plans to acquire data streaming vendor Confluent for $11 billion; and Salesforce picked up Informatica for $8 billion.
Organizations should expect the pace of acquisitions of all sizes to continue in 2026, as big vendors realize the foundational importance of data to the success of agentic AI.
The impact of acquisitions and consolidation on enterprises in 2026 is hard to predict. Consolidation can lead to vendor lock-in, but it can also expand platform capabilities.
In 2026, the question won’t be whether enterprises are using AI — it will be whether their data systems are capable of sustaining it. As agentic AI matures, durable data infrastructure — not clever prompts or short-lived architectures — will determine which deployments scale and which quietly stall out.