Yesterday, amid a flurry of enterprise AI product updates, Google announced arguably the most significant of them for enterprise customers: the public preview of Gemini Embedding 2, its new embedding model and a major evolution in how machines represent and retrieve information across different media types.
While previous embedding models were largely restricted to text, the new model natively integrates text, images, video, audio, and documents into a single numerical space, cutting latency by as much as 70% for some customers and lowering total costs for enterprises that use AI models powered by their own data to complete business tasks.
VentureBeat collaborator Sam Witteveen, co-founder of AI and ML training company Red Dragon AI, received early access to Gemini Embedding 2 and published a video of his impressions on YouTube. Watch it below:
For those who have encountered the term “embeddings” in AI discussions but find it abstract, a useful analogy is that of a universal library.
In a traditional library, books are organized by metadata: author, title, or genre. In the “embedding space” of an AI, information is organized by ideas.
Imagine a library where books aren’t organized by the Dewey Decimal System, but by their “vibe” or “essence”. In this library, a biography of Steve Jobs would physically fly across the room to sit next to a technical manual for a Macintosh. A poem about a sunset would drift toward a photography book of the Pacific Coast, with all thematically similar content organized in beautiful hovering “clouds” of books. This is basically what an embedding model does.
An embedding model takes complex data—like a sentence, a photo of a sunset, or a snippet of a podcast—and converts it into a long list of numbers called a vector.
These numbers represent coordinates in a high-dimensional map. If two items are “semantically” similar (e.g., a photo of a golden retriever and the text “man’s best friend”), the model places their coordinates very close to each other in this map. Today, these models are the invisible engine behind:
Search Engines: Finding results based on what you mean, not just the specific words you typed.
Recommendation Systems: Netflix or Spotify suggesting content because its “coordinates” are near things you already like.
Enterprise AI: Large companies use them for Retrieval-Augmented Generation (RAG), where an AI assistant “looks up” a company’s internal PDFs to answer an employee’s question accurately.
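The "coordinates on a map" idea is easy to see with toy numbers. The sketch below uses tiny hand-made 4-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and these values are invented for illustration) to show how cosine similarity places the dog photo next to "man's best friend" and far from an unrelated document:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means the vectors point the same way,
    # values near 0 mean they are unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" -- invented values, not real model output.
golden_retriever_photo = [0.9, 0.8, 0.1, 0.0]
mans_best_friend_text  = [0.85, 0.75, 0.2, 0.1]
tax_form_pdf           = [0.0, 0.1, 0.9, 0.8]

print(cosine_similarity(golden_retriever_photo, mans_best_friend_text))  # high
print(cosine_similarity(golden_retriever_photo, tax_form_pdf))           # low
```

A vector database does essentially this comparison at scale, returning the stored items whose coordinates sit closest to the query's.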
The concept of mapping words to vectors dates back to the 1950s with linguists like John Rupert Firth, but the modern “vector revolution” began in the early 2000s when Yoshua Bengio’s team first used the term “word embeddings”. The real breakthrough for the industry was Word2Vec, released by a team at Google led by Tomas Mikolov in 2013. Today, the market is led by a handful of major players:
OpenAI: Known for its widely-used text-embedding-3 series.
Google: With the new Gemini and previous Gecko models.
Anthropic and Cohere: Providing specialized models for enterprise search and developer workflows.
By moving beyond text to a natively multimodal architecture, Google is attempting to create a singular, unified map for the sum of human digital expression—text, images, video, audio, and documents—all residing in the same mathematical neighborhood.
Most leading models are still “text-first.” If you want to search a video library, the AI usually has to transcribe the video into text first, then embed that text.
Google’s Gemini Embedding 2 is natively multimodal.
As Logan Kilpatrick of Google DeepMind posted on X, the model allows developers to “bring text, images, video, audio, and docs into the same embedding space”.
It understands audio as sound waves and video as motion directly, without needing to turn them into text first. This reduces “translation” errors and captures nuances that text alone might miss.
For developers and enterprises, the “natively multimodal” nature of Gemini Embedding 2 represents a shift toward more efficient AI pipelines.
By mapping all media into a single 3,072-dimensional space, developers no longer need separate systems for image search and text search; they can perform “cross-modal” retrieval—using a text query to find a specific moment in a video or an image that matches a specific sound.
And unlike its predecessors, Gemini Embedding 2 can process requests that mix modalities. A developer can send a request containing both an image of a vintage car and the text “What is the engine type?”. The model doesn’t process them separately; it treats them as a single, nuanced concept. This allows for a much deeper understanding of real-world data where the “meaning” is often found in the intersection of what we see and what we say.
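To make the mixed-modality idea concrete, here is a hypothetical sketch of what such a request payload could look like. The field names (`model`, `contents`, `mime_type`, `data`) and the model identifier are illustrative assumptions, not the documented Gemini API schema; consult Google's official reference for the real call:

```python
import base64

def build_mixed_request(image_bytes: bytes, question: str) -> dict:
    # Hypothetical payload: an image and a text question travel together in
    # ONE request, so the model can embed them as a single, nuanced concept.
    # All field names here are assumptions for illustration only.
    return {
        "model": "gemini-embedding-2",  # assumed identifier
        "contents": [
            {"mime_type": "image/jpeg",
             "data": base64.b64encode(image_bytes).decode("ascii")},
            {"mime_type": "text/plain",
             "data": question},
        ],
    }

req = build_mixed_request(b"\xff\xd8...fake-jpeg-bytes", "What is the engine type?")
```

The key point is structural: both parts arrive in the same request rather than being embedded separately and stitched together afterward.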
One of the model’s more technical features is Matryoshka Representation Learning. Named after Russian nesting dolls, this technique allows the model to “nest” the most important information in the first few numbers of the vector.
An enterprise can choose to use the full 3,072 dimensions for maximum precision, or “truncate” them down to 768 or 1,536 dimensions to save on database storage costs with minimal loss in accuracy.
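In practice, truncation is just keeping the first N values of the vector and re-normalizing. A minimal sketch (the stand-in vector is synthetic; a real embedding would come from the model):

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` values of an MRL-trained embedding and re-normalize.

    Because MRL front-loads the most important information into the earliest
    dimensions, the truncated vector remains a usable, slightly coarser
    representation of the same item.
    """
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.02] * 3072                    # stand-in for a full 3,072-dim embedding
compact = truncate_embedding(full, 768)
print(len(compact))                     # 768
```

At 4 bytes per float32 value, that shrinks each vector from roughly 12 KB to 3 KB, a 4x storage saving per record.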
Gemini Embedding 2 establishes a new performance ceiling for multimodal depth, specifically outperforming previous industry leaders across text, image, and video evaluation tasks.
The model’s most significant lead is found in video and audio retrieval, where its native architecture allows it to bypass the performance degradation typically associated with text-based transcription pipelines.
Specifically, in video-to-text and text-to-video retrieval tasks, the model demonstrates a measurable performance gap over existing industry leaders, accurately mapping motion and temporal data into a unified semantic space.
The technical results show a distinct advantage in the following standardized categories:
Multimodal Retrieval: Gemini Embedding 2 consistently outperforms leading text and vision models in complex retrieval tasks that require understanding the relationship between visual elements and textual queries.
Speech and Audio Depth: The model introduces a new standard for native audio embeddings, achieving higher accuracy in capturing phonetic and tonal intent compared to models that rely on intermediate text-transcription.
Contextual Scaling: In text-based benchmarks, the model maintains high precision while utilizing its expansive 8,192 token context window, ensuring that long-form documents are embedded with the same semantic density as shorter snippets.
Dimension Flexibility: Testing across the Matryoshka Representation Learning (MRL) layers reveals that even when truncated to 768 dimensions, the model retains a significant majority of its 3,072-dimension performance, outperforming fixed-dimension models of similar size.
For the modern enterprise, information is often a fragmented mess. A single customer issue might involve a recorded support call (audio), a screenshot of an error (image), a PDF of a contract (document), and a series of emails (text).
In previous years, searching across these formats required four different pipelines. With Gemini Embedding 2, an enterprise can create a Unified Knowledge Base. This enables a more advanced form of RAG, wherein a company’s internal AI doesn’t just look up facts, but understands the relationship between them regardless of format.
Early partners are already reporting drastic efficiency gains:
Sparkonomy, a creator economy platform, reported that the model’s native multimodality slashed their latency by up to 70%. By removing the need for intermediate LLM “inference” (the step where one model explains a video to another), they nearly doubled their semantic similarity scores for matching creators with brands.
Everlaw, a legal tech firm, is using the model to navigate the “high-stakes setting” of litigation discovery. In legal cases where millions of records must be parsed, Gemini’s ability to index images and videos alongside text allows legal professionals to find “smoking gun” evidence that traditional text-search would miss.
In its announcement, Google was upfront about some of Gemini Embedding 2’s current limitations. A single request can vectorize at most 8,192 text tokens, 6 images (in a single batch), 128 seconds of video (2 minutes, 8 seconds), 80 seconds of native audio (1 minute, 20 seconds), or a 6-page PDF.
It is vital to clarify that these are input limits per request, not a cap on what the system can remember or store.
Think of it like a scanner. If a scanner has a limit of “one page at a time,” it doesn’t mean you can only ever scan one page; it means you have to feed the pages in one by one.
Individual File Size: You cannot “embed” a 100-page PDF in a single call. You must “chunk” the document—splitting it into segments of 6 pages or fewer—and send each segment to the model individually.
Cumulative Knowledge: Once those chunks are converted into vectors, they can all live together in your database. You can have a database containing ten million 6-page PDFs, and the model will be able to search across all of them simultaneously.
Video and Audio: Similarly, if you have a 10-minute video, you would break it into 128-second segments to create a searchable “timeline” of embeddings.
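The chunking arithmetic is simple enough to sketch. Using the per-request limits quoted above (6 pages per PDF call, 128 seconds per video call):

```python
def chunk_spans(total, limit):
    """Split a resource of `total` units into spans no longer than `limit`.

    Each span would be sent to the embedding model as its own request; the
    resulting vectors all live together in the same database afterward.
    """
    spans = []
    start = 0
    while start < total:
        end = min(start + limit, total)
        spans.append((start, end))
        start = end
    return spans

# A 10-minute (600-second) video against the 128-second per-request cap:
print(chunk_spans(600, 128))   # 5 segments, the last one 88 seconds long
# A 100-page PDF against the 6-page cap:
print(len(chunk_spans(100, 6)))  # 17 chunks
```

Storing the video segments with their start/end offsets is what produces the searchable “timeline” of embeddings described above.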
As of March 10, 2026, Gemini Embedding 2 is officially in Public Preview.
For developers and enterprise leaders, this means the model is accessible for immediate testing and production integration, though it is still subject to the iterative refinements typical of “preview” software before it reaches General Availability (GA).
The model is deployed across Google’s two primary AI gateways, each catering to a different scale of operation:
Gemini API: Targeted at rapid prototyping and individual developers, this path offers a simplified pricing structure.
Vertex AI (Google Cloud): The enterprise-grade environment designed for massive scale, offering advanced security controls and integration with the broader Google Cloud ecosystem.
It’s also already integrated with the heavy hitters of AI infrastructure: LangChain, LlamaIndex, Haystack, Weaviate, Qdrant, and ChromaDB.
In the Gemini API, Google has introduced a tiered pricing model that distinguishes between “standard” data (text, images, and video) and “native” audio.
The Free Tier: Developers can experiment with the model at no cost, though this tier comes with rate limits (typically 60 requests per minute) and uses data to improve Google’s products.
The Paid Tier: For production-level volume, the cost is calculated per million tokens. For text, image, and video inputs, the rate is $0.25 per 1 million tokens.
The “Audio Premium”: Because the model natively ingests audio data without intermediate transcription—a more computationally intensive task—the rate for audio inputs is doubled to $0.50 per 1 million tokens.
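A quick back-of-the-envelope using those published rates shows how the audio premium plays out for a mixed workload (the token counts are hypothetical examples, not customer figures):

```python
# Preview rates quoted above, expressed per token.
TEXT_RATE = 0.25 / 1_000_000   # $ per token: text, image, and video inputs
AUDIO_RATE = 0.50 / 1_000_000  # $ per token: native audio inputs (2x premium)

# Hypothetical workload: re-indexing a document archive plus call recordings.
text_tokens = 40_000_000
audio_tokens = 5_000_000

cost = text_tokens * TEXT_RATE + audio_tokens * AUDIO_RATE
print(f"${cost:.2f}")  # $12.50
```

Even at the doubled audio rate, embedding costs are small next to the generation-model inference they replace, which is part of the latency-and-cost argument partners like Sparkonomy are making.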
For large-scale deployments on Vertex AI, the pricing follows an enterprise-centric “Pay-as-you-go” (PayGo) model. This allows organizations to pay for exactly what they use across different processing modes:
Flex PayGo: Best for unpredictable, bursty workloads.
Provisioned Throughput: Designed for enterprises that require guaranteed capacity and consistent latency for high-traffic applications.
Batch Prediction: Ideal for re-indexing massive historical archives, where time-sensitivity is lower but volume is extremely high.
By making the model available through these diverse channels and integrating it natively with libraries like LangChain, LlamaIndex, and Weaviate, Google has ensured that the “switching cost” for businesses isn’t just a matter of price, but of operational ease. Whether a startup is building its first RAG-based assistant or a multinational is unifying decades of disparate media archives, the infrastructure is now live and globally accessible.
In addition, the official Gemini API and Vertex AI Colab notebooks, which contain the Python code necessary to implement these features, are licensed under the Apache License, Version 2.0.
The Apache 2.0 license is highly regarded in the tech community because it is “permissive.” It allows developers to take Google’s implementation code, modify it, and use it in their own commercial products without having to pay royalties or “open source” their own proprietary code in return.
For Chief Data Officers and technical leads, the decision to migrate to Gemini Embedding 2 hinges on the transition from a “text-plus” strategy to a “natively multimodal” one.
If your organization currently relies on fragmented pipelines — where images and videos are first transcribed or tagged by separate models before being indexed — the upgrade is likely a strategic necessity.
This model eliminates the “translation tax” of using intermediate LLMs to describe visual or auditory data, a move that partners like Sparkonomy found reduced latency by up to 70% while nearly doubling semantic similarity scores. For businesses managing massive, diverse datasets, this isn’t just a performance boost; it is a structural simplification that reduces the number of points where “meaning” can be lost or distorted.
The effort to switch from a text-only foundation is lower than one might expect due to what early users describe as excellent “API continuity”.
Because the model integrates with industry-standard frameworks like LangChain, LlamaIndex, and Vector Search, it can often be “dropped into” existing workflows with minimal code changes. However, the real cost and energy investment lies in re-indexing. Moving to this model requires re-embedding your existing corpus to ensure all data points exist in the same 3,072-dimensional space.
While this is a one-time computational hurdle, it is the prerequisite for unlocking cross-modal search—where a simple text query can suddenly “see” into your video archives or “hear” specific customer sentiment in call recordings.
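The re-indexing pass itself is conceptually a batch loop. The sketch below is illustrative only: `embed_batch` stands in for whatever embedding call your framework exposes (for instance, a LangChain Embeddings class), and the vector store’s `upsert` method is an assumed interface, not a specific product’s API:

```python
def reindex(corpus, embed_batch, store, batch_size=64):
    """Re-embed every document so all vectors share one embedding space.

    `embed_batch` and `store.upsert` are placeholders for your actual
    embedding client and vector database -- illustrative, not prescriptive.
    """
    for i in range(0, len(corpus), batch_size):
        batch = corpus[i : i + batch_size]
        vectors = embed_batch([doc["text"] for doc in batch])
        store.upsert(ids=[doc["id"] for doc in batch], vectors=vectors)

# Minimal in-memory stand-ins to show the flow end to end:
class MemoryStore:
    def __init__(self):
        self.data = {}
    def upsert(self, ids, vectors):
        self.data.update(zip(ids, vectors))

fake_embed = lambda texts: [[float(len(t))] for t in texts]  # placeholder model
store = MemoryStore()
reindex([{"id": "a", "text": "hello"}, {"id": "b", "text": "world!"}],
        fake_embed, store)
print(len(store.data))  # 2 documents re-indexed
```

For a real migration, the same loop runs against batch-prediction endpoints so the one-time cost can be scheduled off-peak.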
The primary trade-off for data leaders to weigh is the balance between high-fidelity retrieval and long-term storage economics. Gemini Embedding 2 addresses this directly through Matryoshka Representation Learning (MRL), which allows you to truncate vectors from 3,072 dimensions down to 768 without a proportional drop in quality.
This gives CDOs a tactical lever: you can choose maximum precision for high-stakes legal or medical discovery—as seen in Everlaw’s 20% lift in recall—while utilizing smaller, more efficient vectors for lower-priority recommendation engines to keep cloud storage costs in check.
Ultimately, the ROI is found in the “lift” of accuracy; in a landscape where an AI’s value is defined by its context, the ability to natively index a 6-page PDF or 128 seconds of video directly into a knowledge base provides a depth of insight that text-only models simply cannot replicate.
OpenAI on Monday launched a set of interactive visual tools inside ChatGPT that let users manipulate mathematical and scientific formulas in real time — a genuinely impressive education feature that also serves as the company’s most direct attempt yet to change the subject during the worst ten days of its corporate life.
The new experience covers more than 70 core math and science concepts, from the Pythagorean theorem to Ohm’s law to compound interest. When a user asks ChatGPT to explain one of these topics, the chatbot now generates a dynamic module with adjustable sliders alongside its written response. Drag a variable, and the equations, graphs, and diagrams update instantly. The feature is available today to all logged-in users worldwide, across every plan, including free.
OpenAI tells VentureBeat that 140 million people already use ChatGPT each week for math and science learning. That is a staggering number — and it goes a long way toward explaining why the company chose this particular week to ship a product designed to make those users’ experience meaningfully better. Since late February, OpenAI has been sued by the family of a 12-year-old mass shooting victim who alleges the company knew the attacker was planning violence through ChatGPT; lost its head of robotics over a Pentagon deal that triggered a near-300% spike in app uninstalls; watched more than 30 of its own employees file a legal brief supporting rival Anthropic against the U.S. government; and scrapped plans with Oracle to expand a flagship data center in Texas. Its chief competitor’s app, Claude, now sits atop the App Store.
The interactive learning tools are, on their merits, a strong product. But they arrive at a company fighting on every front simultaneously — and burning through an estimated $15 billion in cash this year to do it.
The feature is built on a simple pedagogical premise: students understand formulas better when they can see what happens as the inputs change.
Ask ChatGPT “help me understand the Pythagorean theorem,” and the system now responds with a written explanation alongside an interactive panel. On the left, the formula $a^2 + b^2 = c^2$ appears in clean notation with sliders for sides $a$ and $b$. On the right, a geometric visualization — a right triangle with squares drawn on each side — reshapes dynamically as you adjust the values. The computed hypotenuse updates in real time. The same treatment applies across topics: voltage and resistance for Ohm’s law, pressure and temperature for the ideal gas equation, radius and height for cone volume.
OpenAI’s initial roster of more than 70 topics targets high school and introductory college material: binomial squares, Charles’ law, circle equations, Coulomb’s law, cylinder volume, degrees of freedom, exponential decay, Hooke’s law, kinetic energy, the lens equation, linear equations, slope-intercept form, surface area of a sphere, trigonometric angle sum identities, and others.
The company cited research suggesting that “visual, interaction-based learning can lead to stronger conceptual understanding than traditional instruction for many students,” and pointed to a recent Gallup survey in which more than half of U.S. adults said they struggle with math. In early testing, OpenAI said, students reported the modules helped them grasp how variables relate to one another, and parents described using them to work through problems alongside their children.
Anjini Grover, a high school mathematics teacher quoted in OpenAI’s announcement, praised “how strongly this feature emphasizes conceptual understanding.” Raquel Gibson, a high school algebra teacher, called it “a step towards empowering students to independently explore abstract concepts.”
The tools build on ChatGPT’s existing education features — a “study mode” for step-by-step problem solving and a quizzes feature for exam prep — and OpenAI said it plans to expand interactive learning to additional subjects. The company also said it intends to publish research through its NextGenAI initiative and OpenAI Learning Lab to study how AI shapes learning outcomes over time.
The education launch shares the calendar with the most serious legal challenge OpenAI has ever faced.
On Monday, the mother of 12-year-old Maya Gebala filed a civil lawsuit against OpenAI in B.C. Supreme Court, alleging the company had “specific knowledge of the shooter’s long-range planning of a mass casualty event” through ChatGPT interactions and “took no steps to act upon this knowledge.” Gebala was shot three times during a mass shooting in Tumbler Ridge, British Columbia on February 10 that killed eight people and the 18-year-old attacker. She suffered what the lawsuit describes as a catastrophic traumatic brain injury with permanent cognitive and physical disabilities.
The claim paints a damning picture of how the shooter used ChatGPT. It alleges the platform functioned as a “counsellor, pseudo-therapist, trusted confidante, friend, and ally” and was “intentionally designed to foster psychological dependency between the user and ChatGPT.” The shooter was under 18 when they began using the service, the suit states, and despite OpenAI’s requirement that minors obtain parental consent, the company “took no steps to implement age verification or consent procedures.”
OpenAI has separately acknowledged that it suspended the shooter’s account months before the attack but did not alert Canadian law enforcement — a decision that provoked sharp political fallout. B.C. Premier David Eby said after a virtual meeting with Altman that the CEO agreed to apologize to the people of Tumbler Ridge and work with the provincial government on AI regulation recommendations.
None of the claims have been proven in court. OpenAI has not publicly commented on the lawsuit. But the case poses a question that transcends any single legal proceeding: when an AI company’s own internal systems identify a user as dangerous enough to ban, what obligation does it have to tell someone?
The Tumbler Ridge lawsuit is unfolding against the backdrop of an internal crisis that has already cost OpenAI key talent and millions of users.
On February 28, CEO Sam Altman announced a deal giving the Pentagon access to OpenAI’s AI models inside secure government computing systems. The agreement came days after Anthropic CEO Dario Amodei publicly refused similar terms, saying his company could not proceed without assurances against autonomous weapons and mass domestic surveillance. The Pentagon responded by designating Anthropic a “supply-chain risk” — a classification normally reserved for foreign adversaries — and Defense Secretary Pete Hegseth barred any military contractor from conducting commercial activity with the company.
The reaction inside OpenAI was immediate. Caitlin Kalinowski, who joined from Meta in 2024 to build out the company’s robotics hardware division, resigned on principle. “AI has an important role in national security,” she wrote publicly. “But surveillance of Americans without judicial oversight and lethal autonomy without human authorization are lines that deserved more deliberation than they got.” Research scientist Aidan McLaughlin wrote on social media that he personally didn’t “think this deal was worth it.” Another employee told CNN that many OpenAI staffers “really respect” Anthropic for walking away.
The reaction outside the company was even more dramatic. ChatGPT uninstalls spiked more than 295% on the day the deal was announced. Anthropic’s Claude surged to No. 1 among free apps on the U.S. Apple App Store and remained there as of this past weekend. Protesters gathered outside OpenAI’s San Francisco headquarters calling for a “QuitGPT” movement.
And in the most extraordinary development, more than 30 OpenAI and Google DeepMind employees — including DeepMind chief scientist Jeff Dean — filed an amicus brief Monday supporting Anthropic’s lawsuit against the Defense Department. The brief argued that the Pentagon’s actions, “if allowed to proceed,” would “undoubtedly have consequences for the United States’ industrial and scientific competitiveness in the field of artificial intelligence and beyond.” The employees signed in their personal capacity, but the spectacle of OpenAI’s own researchers rallying to a competitor’s legal defense against the same government their company just partnered with has no real precedent in the industry.
Altman, to his credit, has not pretended the situation is fine. In an internal memo later shared publicly, he admitted the deal “was definitely rushed” and “just looked opportunistic and sloppy.” He revised the contract to include explicit prohibitions against mass domestic surveillance and the use of OpenAI technology on commercially acquired data. He also publicly said that enforcing the supply-chain risk designation against Anthropic “would be very bad for our industry and our country.”
Meanwhile, Anthropic warned in court filings that the Pentagon’s blacklisting could cost it up to $5 billion in lost business — roughly equivalent to its total revenue since commercializing its AI technology in 2023. The company is seeking a temporary court order to continue working with military contractors while the case proceeds.
Strip away the lawsuits and the politics, and OpenAI still has a math problem of its own.
The company is expected to burn through approximately $15 billion in cash this year, up from $9 billion in 2025. It has roughly 910 million weekly users. About 95% of them pay nothing. Subscriptions alone cannot bridge that gap, which is why OpenAI is simultaneously building out an internal advertising infrastructure and leaning on partners like Criteo — and reportedly The Trade Desk — to bring advertisers into ChatGPT.
The company is hiring aggressively for this effort: a monetization infrastructure engineer, an engineering manager, a product designer for the ads experience, a senior manager for ad revenue accounting, and a trust and safety specialist dedicated to the ads product, all based at headquarters in San Francisco. The compensation bands run as high as $385,000 — the kind of investment a company makes when it plans to own its ad stack, not rent it.
But advertising inside ChatGPT introduces a trust problem that compounds the ones OpenAI is already managing. Users who abandoned the app over the Pentagon deal demonstrated that loyalty to ChatGPT is thinner than its market share suggests. Adding commercial messages to a product already under fire for its military ties and its handling of a mass shooter’s data will require OpenAI to navigate user sentiment with a precision it has not recently demonstrated.
The infrastructure picture is equally unsettled. Oracle and OpenAI recently scrapped plans to expand a flagship AI data center in Abilene, Texas, after negotiations stalled over financing and OpenAI’s evolving needs. Meta and Nvidia moved quickly to explore the site — a reminder that in the current AI arms race, any gap in execution gets filled by a competitor within days.
This is where the education feature becomes more than a product announcement.
Education has always been ChatGPT’s cleanest use case — the application where the technology most obviously augments human capability rather than surveilling it, weaponizing it, or monetizing the attention of people who came looking for help. It is the use case that resonates across demographics: students prepping for the SAT, parents revisiting algebra at the kitchen table, adults circling back to concepts they never quite understood. And it is the use case where ChatGPT still holds a clear lead. Google’s Gemini, Anthropic’s Claude, and xAI’s Grok are all investing in education, but none has shipped anything comparable to real-time interactive formula visualization embedded in a conversational interface.
OpenAI acknowledged that the “research landscape on how AI affects learning is still taking shape,” but pointed to its own early findings on study mode as showing “promising early signals.” The company said it will continue working with educators and researchers through its NextGenAI initiative and OpenAI Learning Lab, and plans to publish findings and expand into additional subjects.
Somewhere tonight, a ninth-grader will open ChatGPT, drag a slider, and watch a hypotenuse lengthen across her screen. The Pythagorean theorem will make sense for the first time. She will not know about the Pentagon deal, or the Tumbler Ridge lawsuit, or the 295% spike in uninstalls, or the $15 billion cash burn underwriting the server that just rendered her triangle. She will only know that it worked. For OpenAI, that may have to be enough — for now.
Anthropic on Monday released Code Review, a multi-agent code review system built into Claude Code that dispatches teams of AI agents to scrutinize every pull request for bugs that human reviewers routinely miss. The feature, now available in research preview for Team and Enterprise customers, arrives on what may be the most consequential day in the company’s history: Anthropic simultaneously filed lawsuits against the Trump administration over a Pentagon blacklisting, while Microsoft announced a new partnership embedding Claude into its Microsoft 365 Copilot platform.
The convergence of a major product launch, a federal legal battle, and a landmark distribution deal with the world’s largest software company captures the extraordinary tension defining Anthropic’s current moment. The San Francisco-based AI lab is simultaneously trying to grow a developer tools business approaching $2.5 billion in annualized revenue, defend itself against an unprecedented government designation as a national security threat, and expand its commercial footprint through the very cloud platforms now navigating the fallout.
Code Review is Anthropic’s most aggressive bet yet that engineering organizations will pay significantly more — $15 to $25 per review — for AI-assisted code quality assurance that prioritizes thoroughness over speed. It also signals a broader strategic pivot: the company isn’t just building models, it’s building opinionated developer workflows around them.
Code Review works differently from the lightweight code review tools most developers are accustomed to. When a developer opens a pull request, the system dispatches multiple AI agents that operate in parallel. These agents independently search for bugs, then cross-verify each other’s findings to filter out false positives, and finally rank the remaining issues by severity. The output appears as a single overview comment on the PR along with inline annotations for specific bugs.
Anthropic designed the system to scale dynamically with the complexity of the change. Large or intricate pull requests receive more agents and deeper analysis; trivial changes get a lighter pass. The company says the average review takes approximately 20 minutes — far slower than the near-instant feedback of tools like GitHub Copilot’s built-in review, but deliberately so.
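The dispatch-then-verify flow described above can be sketched in miniature. The agents here are toy functions returning canned findings; the real system runs Claude-based agents, and the quorum rule below is an illustrative assumption about how cross-verification might filter false positives:

```python
from collections import Counter

SEVERITY = {"critical": 0, "major": 1, "minor": 2}

def review(diff, agents, quorum=2):
    # 1. Each agent searches the diff independently (in parallel, in the real system).
    all_findings = [f for agent in agents for f in agent(diff)]
    # 2. Cross-verify: keep only findings reported by at least `quorum` agents,
    #    dropping one-off false positives.
    counts = Counter((f["line"], f["issue"], f["severity"]) for f in all_findings)
    verified = [key for key, n in counts.items() if n >= quorum]
    # 3. Rank the survivors by severity for the final PR comment.
    verified.sort(key=lambda key: SEVERITY[key[2]])
    return [{"line": l, "issue": i, "severity": s} for l, i, s in verified]

# Toy agents: both flag the auth bug, only one flags a trivial nit.
agent_a = lambda d: [{"line": 3, "issue": "auth bypass", "severity": "critical"},
                     {"line": 9, "issue": "typo", "severity": "minor"}]
agent_b = lambda d: [{"line": 3, "issue": "auth bypass", "severity": "critical"}]

print(review("...diff...", [agent_a, agent_b]))
# Only the cross-verified critical finding survives.
```

The design choice worth noting is that verification trades latency for precision, which is consistent with the roughly 20-minute review times Anthropic reports.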
“We built Code Review based on customer and internal feedback,” an Anthropic spokesperson told VentureBeat. “In our testing, we’ve found it provides high-value feedback and has helped catch bugs that we may have missed otherwise. Developers and engineering teams use a range of tools, and we build for that reality. The goal is to give teams a capable option at every stage of the development process.”
The system emerged from Anthropic’s own engineering practices, where the company says code output per engineer has grown 200% over the past year. That surge in AI-assisted code generation created a review bottleneck that the company says it now hears about from customers on a weekly basis. Before Code Review, only 16% of Anthropic’s internal PRs received substantive review comments. That figure has jumped to 54%.
Crucially, Code Review does not approve pull requests. That decision remains with human reviewers. Instead, the system functions as a force multiplier, surfacing issues so that human reviewers can focus on architectural decisions and higher-order concerns rather than line-by-line bug hunting.
The pricing will draw immediate scrutiny. At $15 to $25 per review, billed on token usage and scaling with PR size, Code Review is substantially more expensive than alternatives. GitHub Copilot offers code review natively as part of its existing subscription, and startups like CodeRabbit operate at significantly lower price points. Anthropic’s more basic code review GitHub Action — which remains open source — is itself a lighter-weight and cheaper option.
Anthropic frames the cost not as a productivity expense but as an insurance product. “For teams shipping to production, the cost of a shipped bug dwarfs $20/review,” the company’s spokesperson told VentureBeat. “A single production incident — a rollback, a hotfix, an on-call page — can cost more in engineer hours than a month of Code Review. Code Review is an insurance product for code quality, not a productivity tool for churning through PRs faster.”
That framing is deliberate and revealing. Rather than competing on speed or price — the dimensions where lightweight tools have an advantage — Anthropic is positioning Code Review as a depth-first tool aimed at engineering leaders who manage production risk. The implicit argument is that the real cost comparison isn’t Code Review versus CodeRabbit, but Code Review versus the fully loaded cost of a production outage, including engineer time, customer impact, and reputational damage.
Whether that argument holds up will depend on the data. Anthropic has not yet published external benchmarks comparing Code Review’s bug-detection rates against competitors, and the spokesperson did not provide specific figures on bugs caught per dollar or developer hours saved when asked directly. For engineering leaders evaluating the tool, that gap in publicly available comparative data may slow adoption, even if the theoretical ROI case is compelling.
Anthropic’s internal usage data provides an early window into the system’s performance characteristics. On large pull requests exceeding 1,000 lines changed, 84% receive findings, averaging 7.5 issues per review. On small PRs under 50 lines, that drops to 31% with an average of 0.5 issues. The company reports that less than 1% of findings are marked incorrect by engineers.
That sub-1% figure is the kind of stat that demands careful unpacking. When asked how “marked incorrect” is defined, the Anthropic spokesperson explained that it means “an engineer actively resolving the comment without fixing it. We’ll continue to monitor feedback and engagement while Code Review is in research preview.”
The methodology matters. This is an opt-in disagreement metric — an engineer has to take the affirmative step of dismissing a finding. In practice, developers under time pressure may simply ignore irrelevant findings rather than actively marking them as wrong, which would cause false positives to go uncounted. Anthropic acknowledged the limitation implicitly by noting the system is in research preview and that it will continue monitoring engagement data. The company has not yet conducted or published a controlled evaluation comparing agent findings against a ground-truth baseline established by expert human reviewers.
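The gap between an opt-in disagreement metric and a true false-positive rate can be made concrete with a back-of-envelope sketch. The numbers and field names below are hypothetical, not Anthropic's methodology; the point is that only actively dismissed findings count, so silently ignored ones never reduce the score.

```python
# Hypothetical illustration of an opt-in disagreement metric:
# only findings an engineer actively dismisses count as "incorrect".

def disagreement_rate(findings):
    """findings: dicts with a 'status' of 'fixed',
    'dismissed' (actively marked wrong), or 'ignored'."""
    dismissed = sum(1 for f in findings if f["status"] == "dismissed")
    return dismissed / len(findings)

findings = (
    [{"status": "fixed"}] * 60
    + [{"status": "dismissed"}] * 1   # the only ones counted as incorrect
    + [{"status": "ignored"}] * 39    # possible false positives, uncounted
)
rate = disagreement_rate(findings)
print(f"{rate:.0%}")  # 1%, even if many ignored findings were wrong
```

In this toy sample, a reported 1% disagreement rate coexists with 39 findings whose accuracy is simply unknown.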
The anecdotal evidence is nonetheless striking. Anthropic described a case where a one-line change to a production service — the kind of diff that typically receives a cursory approval — was flagged as critical by Code Review because it would have broken authentication for the service. In another example involving TrueNAS’s open-source middleware, Code Review surfaced a pre-existing bug in adjacent code during a ZFS encryption refactor: a type mismatch that was silently wiping the encryption key cache on every sync. These are precisely the categories of bugs — latent issues in touched-but-unchanged code, and subtle behavioral changes hiding in small diffs — that human reviewers are statistically most likely to miss.
The Code Review launch does not exist in a vacuum. On the same day, Anthropic filed two lawsuits — one in the U.S. District Court for the Northern District of California and another in the D.C. Circuit Court of Appeals — challenging the Trump administration’s decision to label the company a supply chain risk to national security, a designation historically reserved for foreign adversaries.
The legal confrontation stems from a breakdown in contract negotiations between Anthropic and the Pentagon. As CNN reported, the Defense Department wanted unrestricted access to Claude for “all lawful purposes,” while Anthropic insisted on two redlines: that its AI would not be used for fully autonomous weapons or mass domestic surveillance. When talks collapsed by a Pentagon-set deadline on February 27, President Trump directed all federal agencies to cease using Anthropic’s technology, and Defense Secretary Pete Hegseth formally designated the company a supply chain risk.
According to CNBC, the complaint alleges that these actions are “unprecedented and unlawful” and are “harming Anthropic irreparably,” with the company stating that contracts are already being cancelled and “hundreds of millions of dollars” in near-term revenue are in jeopardy.
“Seeking judicial review does not change our longstanding commitment to harnessing AI to protect our national security,” the Anthropic spokesperson told VentureBeat, “but this is a necessary step to protect our business, our customers, and our partners. We will continue to pursue every path toward resolution, including dialogue with the government.”
For enterprise buyers evaluating Code Review and other Claude-based tools, the lawsuit introduces a novel category of vendor risk. The supply chain risk designation doesn’t just affect Anthropic’s government contracts — as CNBC reported, it requires defense contractors to certify they don’t use Claude in their Pentagon-related work. That creates a chilling effect that could extend well beyond the defense sector, even as the company’s commercial momentum accelerates.
The market’s response to the Pentagon crisis has been notably bifurcated. While the government moved to isolate Anthropic, the company’s three largest cloud distribution partners moved in the opposite direction.
Microsoft on Monday announced it is integrating Claude into Microsoft 365 Copilot through a new product called Copilot Cowork, developed in close collaboration with Anthropic. As Yahoo Finance reported, the service enables enterprise users to perform tasks like building presentations, pulling data into Excel spreadsheets, and coordinating meetings — the kind of agentic productivity capabilities that sent shares of SaaS companies like Salesforce, ServiceNow, and Intuit tumbling when Anthropic first debuted its Cowork product on January 30.
The timing is not coincidental. As TechCrunch reported last week, Microsoft, Google, and Amazon Web Services all confirmed that Claude remains available to their customers for non-defense workloads. Microsoft’s legal team specifically concluded that “Anthropic products, including Claude, can remain available to our customers — other than the Department of War — through platforms such as M365, GitHub, and Microsoft’s AI Foundry.”
That three of the world’s most powerful technology companies publicly reaffirmed their commitment to distributing Anthropic’s models — on the same day the company sued the federal government — tells enterprise customers something important about the market’s assessment of both Claude’s technical value and the legal durability of the supply chain risk designation.
For organizations considering Code Review, the data handling question looms especially large. The system necessarily ingests proprietary source code to perform its analysis. Anthropic’s spokesperson addressed this directly: “Anthropic does not train models on our customers’ data. This is part of why customers in highly regulated industries, from Novo Nordisk to Intuit, trust us to deploy AI safely and effectively.”
The spokesperson did not detail specific retention policies or compliance certifications when asked, though the company’s reference to pharmaceutical and financial services clients suggests it has undergone the kind of security review those industries require.
Administrators get several controls for managing costs and scope, including monthly organization-wide spending caps, repository-level enablement, and an analytics dashboard tracking PRs reviewed, acceptance rates, and total costs. Once enabled, reviews run automatically on new pull requests with no per-developer configuration required.
The revenue figure Anthropic confirmed — a $2.5 billion run rate as of February 12 for Claude Code — underscores just how quickly developer tooling has become a material revenue line for the company. The spokesperson pointed to Anthropic’s recent Series G fundraise for additional context but did not break out what share of total company revenue Claude Code now represents.
Code Review is available now in research preview for Claude Code Team and Enterprise plans. Whether it can justify its premium in a market already crowded with cheaper alternatives will depend on whether Anthropic can convert anecdotal bug catches and internal usage stats into the kind of rigorous, externally validated evidence that engineering leaders with production budgets require — all while navigating a legal and political environment unlike anything the AI industry has previously faced.

Most enterprise RAG pipelines are optimized for one search behavior. They fail silently on the others. A model trained to synthesize cross-document reports handles constraint-driven entity search poorly. A model tuned for simple lookup tasks falls apart on multi-step reasoning over internal notes. Most teams find out when something breaks.
Databricks set out to fix that with KARL, short for Knowledge Agents via Reinforcement Learning. The company trained an agent across six distinct enterprise search behaviors simultaneously using a new reinforcement learning algorithm. The result, the company claims, is a model that matches Claude Opus 4.6 on a purpose-built benchmark at 33% lower cost per query and 47% lower latency, trained entirely on synthetic data the agent generated itself with no human labeling required. That comparison is based on KARLBench, which Databricks built to evaluate enterprise search behaviors.
“A lot of the big reinforcement learning wins that we’ve seen in the community in the past year have been on verifiable tasks where there is a right and a wrong answer,” Jonathan Frankle, Chief AI Scientist at Databricks, told VentureBeat in an exclusive interview. “The tasks that we’re working on for KARL, and that are just normal for most enterprises, are not strictly verifiable in that same way.”
Those tasks include synthesizing intelligence across product manager meeting notes, reconstructing competitive deal outcomes from fragmented customer records, answering questions about account history where no single document has the full answer and generating battle cards from unstructured internal data. None of those has a single correct answer that a system can check automatically.
“Doing reinforcement learning in a world where you don’t have a strict right and wrong answer, and figuring out how to guide the process and make sure reward hacking doesn’t happen — that’s really non-trivial,” Frankle said. “Very little of what companies do day to day on knowledge tasks are verifiable.”
Standard RAG breaks down on ambiguous, multi-step queries drawing on fragmented internal data that was never designed to be queried.
To evaluate KARL, Databricks built the KARLBench benchmark to measure performance across six enterprise search behaviors: constraint-driven entity search, cross-document report synthesis, long-document traversal with tabular numerical reasoning, exhaustive entity retrieval, procedural reasoning over technical documentation and fact aggregation over internal company notes. That last task is PMBench, built from Databricks’ own product manager meeting notes — fragmented, ambiguous and unstructured in ways that frontier models handle poorly.
Training on any single task and testing on the others produces poor results. The KARL paper shows that multi-task RL generalizes in ways single-task training does not. The team trained KARL on synthetic data for two of the six tasks and found it performed well on all four it had never seen.
To build a competitive battle card for a financial services customer, for example, the agent has to identify relevant accounts, filter for recency, reconstruct past competitive deals and infer outcomes — none of which is labeled anywhere in the data.
Frankle calls what KARL does “grounded reasoning”: running a difficult reasoning chain while anchoring every step in retrieved facts. “You can think of this as RAG,” he said, “but like RAG plus plus plus plus plus plus, all the way up to 200 vector database calls.”
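The loop Frankle is gesturing at (query, inspect, refine, repeat, up to a budget of calls) can be sketched generically. Everything below, `vector_search`, the toy model, and the stopping rule, is a hypothetical stand-in, not KARL's implementation:

```python
def grounded_answer(question, vector_search, llm, max_calls=200):
    """Iteratively query a vector store, letting the model refine its
    searches until it judges the evidence sufficient, or stops early."""
    evidence, query = [], question
    for _ in range(max_calls):
        evidence.extend(vector_search(query))
        step = llm(question=question, evidence=evidence)
        if step["action"] == "answer":    # answer anchored in evidence
            return step["answer"]
        if step["action"] == "stop":      # learned early stopping
            return None
        query = step["refined_query"]     # issue a sharper search
    return None

# Toy components standing in for a real vector store and model:
docs = {"q1": ["fact A"], "q2": ["fact B"]}
def toy_llm(question, evidence):
    if "fact B" in evidence:
        return {"action": "answer", "answer": "answer grounded in fact B"}
    return {"action": "refine", "refined_query": "q2"}

result = grounded_answer("q1", lambda q: docs.get(q, []), toy_llm)
print(result)  # answer grounded in fact B
```

The "200 vector database calls" figure corresponds to `max_calls` here: the agent keeps searching and cross-referencing until it can anchor every step, or decides it cannot.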
KARL’s training is powered by OAPL, short for Optimal Advantage-based Policy Optimization with Lagged Inference policy. It’s a new approach, developed jointly by researchers from Cornell, Databricks and Harvard and published in a separate paper the week before KARL.
Standard LLM reinforcement learning uses on-policy algorithms like GRPO (Group Relative Policy Optimization), which assume the model generating training data and the model being updated are in sync. In distributed training, they never are. Prior approaches corrected for this with importance sampling, introducing variance and instability. OAPL embraces the off-policy nature of distributed training instead, using a regression objective that stays stable with policy lags of more than 400 gradient steps, 100 times more off-policy than prior approaches handled. In code generation experiments, it matched a GRPO-trained model using roughly a third as many training samples.
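The "policy lag" at issue is structural: in distributed training, inference workers generate rollouts from a weights snapshot that the learner has since moved past. A toy simulation (not the OAPL objective itself, which is defined in the paper) makes the quantity concrete:

```python
# Toy model of a lagged-inference training loop. Inference workers keep
# generating rollouts from a frozen snapshot; the learner keeps taking
# gradient steps. "Lag" is how many learner steps old the snapshot is.

def max_policy_lag(total_steps, snapshot_every):
    """Maximum lag observed when the rollout policy is refreshed
    every `snapshot_every` learner steps."""
    snapshot_step = 0  # learner step at which rollout weights were frozen
    max_lag = 0
    for step in range(1, total_steps + 1):
        max_lag = max(max_lag, step - snapshot_step)
        if step % snapshot_every == 0:
            snapshot_step = step  # workers pick up the new weights
    return max_lag

# On-policy methods like GRPO assume lag near zero; OAPL is reported
# stable at lags beyond 400 gradient steps.
print(max_policy_lag(total_steps=2000, snapshot_every=400))  # 400
```

Refreshing snapshots rarely keeps inference hardware busy and rollouts reusable, which is where the sample-efficiency gains come from.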
OAPL’s sample efficiency is what keeps the training budget accessible. Reusing previously collected rollouts rather than requiring fresh on-policy data for every update meant the full KARL training run stayed within a few thousand GPU hours. That is the difference between a research project and something an enterprise team can realistically attempt.
There has been a lot of discussion in the industry in recent months about how RAG can be replaced with contextual memory, also sometimes referred to as agentic memory.
For Frankle, it's not an either/or question; he sees it as a layered stack. At the base sits a vector database with millions of entries, far too large to fit in context. The LLM context window sits at the top. Between them, compression and caching layers are emerging that determine how much of what an agent has already learned it can carry forward.
For KARL, this is not abstract. Some KARLBench tasks required 200 sequential vector database queries, with the agent refining searches, verifying details and cross-referencing documents before committing to an answer, exhausting the context window many times over. Rather than training a separate summarization model, the team let KARL learn compression end-to-end through RL: when context grows too large, the agent compresses it and continues, with the only training signal being the reward at the end of the task. Removing that learned compression dropped accuracy on one benchmark from 57% to 39%.
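The compress-and-continue pattern itself is a plain loop; in KARL, the interesting part, deciding what survives compression, is learned from the end-of-task reward rather than hand-coded. All components below are hypothetical stand-ins:

```python
def run_with_compression(task, llm, count_tokens, budget):
    """Agent loop that compresses its own context when it outgrows the
    window, then keeps going. What the summary retains is left to the
    model; in KARL it is shaped only by the end-of-task reward."""
    context = [task]
    while True:
        if count_tokens(context) > budget:
            context = [llm(mode="compress", context=context)]
        step = llm(mode="act", context=context)
        if step["done"]:
            return step["answer"]
        context.append(step["observation"])

# Toy stand-ins: each context item "costs" 10 tokens, budget is 25.
calls = {"acts": 0, "compressions": 0}
def toy_llm(mode, context):
    if mode == "compress":
        calls["compressions"] += 1
        return "summary of everything so far"
    calls["acts"] += 1
    if calls["acts"] >= 3:
        return {"done": True, "answer": "done"}
    return {"done": False, "observation": f"observation {calls['acts']}"}

answer = run_with_compression("task", toy_llm, lambda c: 10 * len(c), budget=25)
print(answer, calls["compressions"])  # done 1
```

The 57%-to-39% accuracy drop Databricks reports when this learned compression is removed suggests the summaries are doing real work, not just freeing space.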
“We just let the model figure out how to compress its own context,” Frankle said. “And this worked phenomenally well.”
Frankle was candid about the failure modes. KARL struggles most on questions with significant ambiguity, where multiple valid answers exist and the model can’t determine whether the question is genuinely open-ended or just hard to answer. That judgment call is still an unsolved problem.
The model also exhibits what Frankle described as giving up early on some queries — stopping before producing a final answer. He pushed back on framing this as a failure, noting that the most expensive queries are typically the ones the model gets wrong anyway. Stopping is often the right call.
KARL was also trained and evaluated exclusively on vector search. Tasks requiring SQL queries, file search, or Python-based calculation are not yet in scope. Frankle said those capabilities are next on the roadmap, but they are not in the current system.
KARL surfaces three decisions worth revisiting for teams evaluating their retrieval infrastructure.
The first is pipeline architecture. If your RAG agent is optimized for one search behavior, the KARL results suggest it is failing on others. Multi-task training across diverse retrieval behaviors produces models that generalize. Narrow pipelines do not.
The second is why RL matters here — and it’s not just a training detail. Databricks tested the alternative: distilling from expert models via supervised fine-tuning. That approach improved in-distribution performance but produced negligible gains on tasks the model had never seen. RL developed general search behaviors that transferred. For enterprise teams facing heterogeneous data and unpredictable query types, that distinction is the whole game.
The third is what RL efficiency actually means in practice. A model trained to search better completes tasks in fewer steps, stops earlier on queries it cannot answer, diversifies its search rather than repeating failed queries, and compresses its own context rather than running out of room. The argument for training purpose-built search agents rather than routing everything through general-purpose frontier APIs is not primarily about cost. It is about building a model that knows how to do the job.
When an OpenAI finance analyst needed to compare revenue across geographies and customer cohorts last year, it took hours of work — hunting through 70,000 datasets, writing SQL queries, verifying table schemas. Today, the same analyst types a plain-English question into Slack and gets a finished chart in minutes.
The tool behind that transformation was built by two engineers in three months. Seventy percent of its code was written by AI. And it is now used by more than 4,000 of OpenAI’s roughly 5,000 employees every day — making it one of the most aggressive deployments of an AI data agent inside any company, anywhere.
In an exclusive interview with VentureBeat, Emma Tang, the head of data infrastructure at OpenAI whose team built the agent, offered a rare look inside the system — how it works, how it fails, and what it signals about the future of enterprise data. The conversation, paired with the company’s blog post announcing the tool, paints a picture of a company that turned its own AI on itself and discovered something that every enterprise will soon confront: the bottleneck to smarter organizations isn’t better models. It’s better data.
“The agent is used for any kind of analysis,” Tang said. “Almost every team in the company uses it.”
To understand why OpenAI built this system, consider the scale of the problem. The company’s data platform spans more than 600 petabytes across 70,000 datasets. Even locating the correct table can consume hours of a data scientist’s time. Tang’s Data Platform team — which sits under infrastructure and oversees big data systems, streaming, and the data tooling layer — serves a staggering internal user base. “There are 5,000 employees at OpenAI right now,” Tang said. “Over 4,000 use data tools that our team provides.”
The agent, built on GPT-5.2 and accessible wherever employees already work — Slack, a web interface, IDEs, the Codex CLI, and OpenAI’s internal ChatGPT app — accepts plain-English questions and returns charts, dashboards, and long-form analytical reports. In follow-up responses shared with VentureBeat on background, the team estimated it saves two to four hours of work per query. But Tang emphasized that the larger win is harder to measure: the agent gives people access to analysis they simply couldn’t have done before, regardless of how much time they had.
“Engineers, growth, product, as well as non-technical teams, who may not know all the ins and outs of the company data systems and table schemas” can now pull sophisticated insights on their own, her team noted.
Tang walked through several concrete use cases that illustrate the agent’s range. OpenAI’s finance team queries it for revenue comparisons across geographies and customer cohorts. “It can, just literally in plain text, send the agent a query, and it will be able to respond and give you charts and give you dashboards, all of these things,” she said.
But the real power lies in strategic, multi-step analysis. Tang described a recent case where a user spotted discrepancies between two dashboards tracking Plus subscriber growth. “The data agent can give you a chart and show you, stack rank by stack rank, exactly what the differences are,” she said. “There turned out to be five different factors. For a human, that would take hours, if not days, but the agent can do it in a few minutes.”
Product managers use it to understand feature adoption. Engineers use it to diagnose performance regressions — asking, for instance, whether a specific ChatGPT component really is slower than yesterday, and if so, which latency components explain the change. The agent can break it all down and compare prior periods from a single prompt.
What makes this especially unusual is that the agent operates across organizational boundaries. Most enterprise AI agents today are siloed within departments — a finance bot here, an HR bot there. OpenAI’s cuts horizontally across the company. Tang said they launched department by department, curating specific memory and context for each group, but “at some point it’s all in the same database.” A senior leader can combine sales data with engineering metrics and product analytics in a single query. “That’s a really unique feature of ours,” Tang said.
Finding the right table among 70,000 datasets is, by Tang’s own admission, the single hardest technical challenge her team faces. “That’s the biggest problem with this agent,” she said. And it’s where Codex — OpenAI’s AI coding agent — plays its most inventive role.
Codex serves triple duty in the system. Users access the data agent through Codex via MCP. The team used Codex to generate more than 70% of the agent’s own code, enabling two engineers to ship in three months. But the third role is the most technically fascinating: a daily asynchronous process where Codex examines important data tables, analyzes the underlying pipeline code, and determines each table’s upstream and downstream dependencies, ownership, granularity, join keys, and similar tables.
“We give it a prompt, have Codex look at the code and respond with what we need, and then persist that to the database,” Tang explained. When a user later asks about revenue, the agent searches a vector database to find which tables Codex has already mapped to that concept.
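The enrich-then-retrieve pattern Tang describes can be sketched generically: an offline pass persists a description per table, and query-time lookup ranks tables against the question. The table names, descriptions, and the bag-of-words similarity below are illustrative stand-ins, not OpenAI's system:

```python
def embed(text):
    """Stand-in for a real embedding model: a bag of words."""
    return set(text.lower().split())

def score(query_vec, doc_vec):
    """Jaccard similarity between two bags of words."""
    return len(query_vec & doc_vec) / len(query_vec | doc_vec)

index = {}

def enrich(table, description):
    """Offline, daily: persist the model-generated table description."""
    index[table] = embed(description)

def find_tables(question, k=1):
    """Query time: rank enriched tables against the user's question."""
    q = embed(question)
    return sorted(index, key=lambda t: score(q, index[t]), reverse=True)[:k]

# Hypothetical tables and descriptions:
enrich("fct_revenue_daily", "daily revenue by geography and customer cohort")
enrich("dim_employees", "employee roster with org and role")
top = find_tables("revenue by geography")
print(top)  # ['fct_revenue_daily']
```

The same two-phase shape scales: the expensive code-reading pass runs asynchronously once a day, while each user query is a cheap similarity lookup.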
This “Codex Enrichment” is one of six context layers the agent uses. The layers range from basic schema metadata and curated expert descriptions to institutional knowledge pulled from Slack, Google Docs, and Notion, plus a learning memory that stores corrections from previous conversations. When no prior information exists, the agent falls back to live queries against the data warehouse.
The team also tiers historical query patterns. “All query history is everybody’s ‘select star, limit 10.’ It’s not really helpful,” Tang said. Canonical dashboards and executive reports — where analysts invested significant effort determining the correct representation — get flagged as “source of truth.” Everything else gets deprioritized.
Even with six context layers, Tang was remarkably candid about the agent’s biggest behavioral flaw: overconfidence. It’s a problem anyone who has worked with large language models will recognize.
“It’s a really big problem, because what the model often does is feel overconfident,” Tang said. “It’ll say, ‘This is the right table,’ and just go forth and start doing analysis. That’s actually the wrong approach.”
The fix came through prompt engineering that forces the agent to linger in a discovery phase. “We found that the more time it spends gathering possible scenarios and comparing which table to use — just spending more time in the discovery phase — the better the results,” she said. The prompt reads almost like coaching a junior analyst: “Before you run ahead with this, I really want you to do more validation on whether this is the right table. So please check more sources before you go and create actual data.”
The team also learned, through rigorous evaluation, that less context can produce better results. “It’s very easy to dump everything in and just expect it to do better,” Tang said. “From our evals, we actually found the opposite. The fewer things you give it, and the more curated and accurate the context is, the better the results.”
To build trust, the agent streams its intermediate reasoning to users in real time, exposes which tables it selected and why, and links directly to underlying query results. Users can interrupt the agent mid-analysis to redirect it. The system also checkpoints its progress, enabling it to resume after failures. And at the end of every task, the model evaluates its own performance. “We ask the model, ‘how did you think that went? Was that good or bad?'” Tang said. “And it’s actually fairly good at evaluating how well it’s doing.”
When it comes to safety, Tang took a pragmatic approach that may surprise enterprises expecting sophisticated AI alignment techniques.
“I think you just have to have even more dumb guardrails,” she said. “We have really strong access control. It’s always using your personal token, so whatever you have access to is only what you have access to.”
The agent operates purely as an interface layer, inheriting the same permissions that govern OpenAI’s data. It never appears in public channels — only in private channels or a user’s own interface. Write access is restricted to a temporary test schema that gets wiped periodically and can’t be shared. “We don’t let it randomly write to systems either,” Tang said.
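That "dumb guardrail" pattern, run as the requesting user and confine writes to a scratch schema, can be sketched in a few lines. The schema name and the naive SQL check below are hypothetical, not OpenAI's implementation:

```python
SCRATCH_SCHEMA = "tmp_agent"  # hypothetical name for the wipeable schema

def guarded_execute(sql, user_token, warehouse):
    """Run a query with the requesting user's own token; block writes
    anywhere except the scratch schema."""
    statement = sql.strip().lower()
    is_write = statement.startswith(
        ("insert", "update", "delete", "create", "drop", "alter"))
    if is_write and SCRATCH_SCHEMA not in statement:
        raise PermissionError("agent may only write to the scratch schema")
    # Passing the user's personal token means reads are bounded by what
    # this user could already see; there is no elevated service account.
    return warehouse.execute(sql, token=user_token)
```

Because the permission check happens in the warehouse, not in the agent, there is no prompt-injection path to data the user couldn't have queried by hand.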
User feedback closes the loop. Employees flag incorrect results directly, and the team investigates. The model’s self-evaluation adds another check. Longer term, Tang said, the plan is to move toward a multi-agent architecture where specialized agents monitor and assist each other. “We’re moving towards that eventually,” she said, “but right now, even as it is, we’ve gotten pretty far.”
Despite the obvious commercial potential, OpenAI told VentureBeat that the company has no plans to productize its internal data agent. The strategy is to provide building blocks and let enterprises construct their own. And Tang made clear that everything her team used to build the system is already available externally.
“We use all the same APIs that are available externally,” she said. “The Responses API, the Evals API. We don’t have a fine-tuned model. We just use 5.2. So you can definitely build this.”
That message aligns with OpenAI’s broader enterprise push. The company launched OpenAI Frontier in early February, an end-to-end platform for enterprises to build and manage AI agents. It has since enlisted McKinsey, Boston Consulting Group, Accenture, and Capgemini to help sell and implement the platform. AWS and OpenAI are jointly developing a Stateful Runtime Environment for Amazon Bedrock that mirrors some of the persistent context capabilities OpenAI built into its data agent. And Apple recently integrated Codex directly into Xcode.
According to information shared with VentureBeat by OpenAI, Codex is now used by 95% of engineers at OpenAI and reviews all pull requests before they’re merged. Its global weekly active user base has tripled since the start of the year, surpassing one million. Overall usage has grown more than fivefold.
Tang described a shift in how employees use Codex that transcends coding entirely. “Codex isn’t even a coding tool anymore. It’s much more than that,” she said. “I see non-technical teams use it to organize thoughts and create slides and to create daily summaries.” One of her engineering managers has Codex review her notes each morning, identify the most important tasks, pull in Slack messages and DMs, and draft responses. “It’s really operating on her behalf in a lot of ways,” Tang said.
When asked what other enterprises should take away from OpenAI’s experience, Tang didn’t point to model capabilities or clever prompt engineering. She pointed to something far more mundane.
“This is not sexy, but data governance is really important for data agents to work well,” she said. “Your data needs to be clean enough and annotated enough, and there needs to be a source of truth somewhere for the agent to crawl.”
The underlying infrastructure — storage, compute, orchestration, and business intelligence layers — hasn’t been replaced by the agent. It still needs all of those tools to do its job. But it serves as a fundamentally new entry point for data intelligence, one that is more autonomous and accessible than anything that came before it.
Tang closed the interview with a warning for companies that hesitate. “Companies that adopt this are going to see the benefits very rapidly,” she said. “And companies that don’t are going to fall behind. It’s going to pull apart. The companies who use it are going to advance very, very quickly.”
Asked whether that acceleration worried her own colleagues — especially after a wave of recent layoffs at companies like Block — Tang paused. “How much we’re able to do as a company has accelerated,” she said, “but it still doesn’t match our ambitions, not even one bit.”
Gong, the revenue intelligence company that has spent a decade turning recorded sales calls into data, today launched what it calls Mission Andromeda — its most ambitious platform release to date, bundling a new AI-powered coaching product, a sales-focused chatbot, unified account management tools, and open interoperability with rival AI systems through the Model Context Protocol.
The release arrives at a pivotal moment. The revenue technology market is consolidating at a pace that would have been unthinkable two years ago, and Gong — still a private company with roughly $300 million in annual recurring revenue — finds itself at the center of a category that Gartner only formally defined three months ago. Mission Andromeda is Gong’s answer to a basic question facing every enterprise AI vendor in 2026: Can you move beyond surfacing insights and actually change how people work?
“The whole show, Andromeda, is basically a collection of very significant capabilities that take us a huge step forward,” Eilon Reshef, Gong’s co-founder and chief product officer, told VentureBeat in an interview ahead of the launch. He described it as an effort to make revenue teams “more productive as individuals” and to give leaders “better decisions” — positioning the release not as a feature dump, but as an operating system upgrade.
Mission Andromeda contains four main components, each targeting a different layer of the sales workflow.
The headliner is Gong Enable, a brand-new product with its own pricing tier — Reshef described it as “in the tens of dollars per seat per month” — that attacks what the company sees as a gaping hole in most sales organizations: the disconnect between training and performance. Highspot and Seismic announced their intent to merge in February 2026, creating a combined enablement giant, and Gong is now moving directly onto their turf.
Gong Enable has three pieces. The first, AI Call Reviewer, analyzes completed customer calls and grades reps based on their organization’s own methodology. When asked whether this operates in real time, Reshef was direct: “For that particular agent, it’s post-call, because obviously you want to grade the whole call as a whole — maybe you didn’t do anything in minute one, minute 30.” The second piece, AI Trainer, lets reps practice high-stakes conversations — pricing objections, renewal risk scenarios — against AI-generated simulations built from the company’s own winning call patterns. The third, Initiative Tracking, links coaching programs to revenue metrics so leaders can see whether new behaviors actually show up in live deals.
Beyond Enable, the launch includes Gong Assistant, a conversational AI chatbot purpose-built for revenue teams that lets users ask questions about customer calls inside the platform. The release also introduces Account Console and Account Boards, which unify customer activity, risk signals, and next steps into a single view for sales and post-sales teams. And rounding out the package is built-in support for the Model Context Protocol, the open standard originally developed by Anthropic, enabling Gong to exchange data with AI systems from Microsoft, Salesforce, HubSpot, and others.
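MCP itself is a concrete wire format: JSON-RPC 2.0, with server-exposed tools invoked via a `tools/call` request. The tool name and arguments below are hypothetical, since Gong's actual MCP tool surface isn't detailed in the announcement:

```python
import json

# Shape of an MCP tool invocation (JSON-RPC 2.0). The tool name and
# arguments are hypothetical stand-ins for whatever Gong exposes.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_calls",  # hypothetical tool name
        "arguments": {"account": "Acme Corp", "topic": "renewal risk"},
    },
}
print(json.dumps(request, indent=2))
```

Because the protocol is an open standard, an agent running in Microsoft Copilot or Salesforce could issue the same request against Gong's server without any bespoke integration.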
In a market where every company wants to claim proprietary AI supremacy, Reshef described a notably pragmatic approach to the models powering the new features. Gong uses both internal models and foundation models from external providers, he said, noting that four of the five leading LLM companies "are basically Gong customers."
The company picks models task by task. “Based on the product or task at hand, we pick the right model,” he said. “We would sometimes swap in and out a model if we feel it’s best for our customers and they get more and more power.” Reshef drew a clear line between what needs a large language model and what does not: “Our revenue prediction models are not using LLMs, but kind of the core interaction chatbots — of course, you’re going to use the foundation model.”
This approach contrasts with competitors that have hitched their wagon to a single AI provider. It also reflects a philosophical choice: Gong’s real moat, Reshef suggested, is not the models themselves but the data underneath — what the company calls the Revenue Graph, its proprietary layer that captures phone calls, Zoom meetings, emails, text messages, WhatsApp conversations, and more, stitching them together into a connected intelligence layer.
Storing and analyzing every customer conversation a sales team has raises obvious questions about privacy and data governance. Reshef was eager to address them head-on.
“We’ve been around the block for a long while — a little bit over a decade — with AI first,” he said. “Over the years, we’ve developed exactly those capabilities that are the most boring pieces of AI, which is: how do you collect the right data? How do you manage it? How do you manage permissions about it, retention policies, right to be forgotten?”
On the sensitive question of whether Gong trains its AI on customer data across accounts, Reshef drew a firm boundary. Training, he explained, happens per customer: “The majority of the training happens based on each customer’s data.” He pointed to large accounts like Cisco, which he said has 20,000 Gong users — enough data to train the AI Trainer from within their own environment. “AI Trainer can go mine what’s working in their environment. It might not work in their competitor’s environment — maybe their benefits are different, their objections are different.”
Cross-customer training, he said, happens “only in very, very rare cases, very safe based — like transcription. But we don’t do it for business-specific processes.”
Gong’s support for Model Context Protocol is perhaps the most strategically significant piece of the launch. The company now offers built-in client and server support for MCP, enabling organizations to connect Gong with other AI systems while maintaining clear controls over data access, usage, and provenance. Gong first announced MCP support in October 2025 at its Celebrate conference, where it revealed initial integrations with Microsoft Dynamics 365, Microsoft 365 Copilot, Salesforce Agentforce, and HubSpot CRM. Today’s launch builds on that foundation.
But Reshef did not sugarcoat MCP’s limitations. “MCP is very immature when it comes to security,” he told VentureBeat. The protocol lets enterprise AI systems share data and context, but trust remains the enterprise’s responsibility. He explained a two-sided model: Gong can pull data from partners like Zendesk through certified integrations, and simultaneously makes its own MCP server available so that tools like Microsoft Copilot can query Gong’s data. “It’s up to the company which connections they actually feel are secure enough,” he said. “The safest ones are the ones that we’ve kind of like certified in a way. But MCP is an open protocol. They can connect it to their own systems. We have no control over this.”
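The two-sided model Reshef describes can be made concrete with a sketch of MCP’s JSON-RPC message shapes. Everything below is illustrative: the tool name (`search_calls`) and the in-process `handle` function are hypothetical stand-ins for a real server speaking JSON-RPC 2.0 over stdio or HTTP, though `tools/list` and `tools/call` are the method names defined by the open MCP specification.

```python
import json

# A client (e.g. a copilot-style assistant) first asks the server
# which tools it exposes...
list_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# ...then invokes one with structured arguments. "search_calls" is a
# hypothetical tool, not a documented Gong endpoint.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search_calls",
        "arguments": {"account": "Acme Corp", "topic": "renewal risk"},
    },
}

def handle(request: dict) -> dict:
    """Toy server: the server side decides, per its own access
    controls, what data each connected client may see."""
    if request["method"] == "tools/list":
        result = {"tools": [{"name": "search_calls"}]}
    elif request["method"] == "tools/call":
        result = {"content": [{"type": "text", "text": "2 calls matched"}]}
    else:
        result = {"error": "unknown method"}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}

print(json.dumps(handle(call_request), indent=2))
```

The security question Reshef raises lives entirely in that `handle` step: the protocol standardizes the envelope, but which requests a server honors, and for whom, is left to each deployment.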
That candor matters. As MCP adoption accelerates across the enterprise software stack, security teams are scrambling to understand what happens when agentic AI systems start talking to each other without humans in the loop. Gong appears to be betting that transparency about the protocol’s immaturity will build more trust than marketing bravado.
When asked for hard numbers, Reshef offered a mix of platform-wide results and measured candor about the newest features. Existing Gong customers report roughly a 50 percent reduction in sales rep ramp time and 10 to 15 percent improvements in win rates, he said.
But on Gong Enable specifically, he acknowledged the product is still brand new. “The trainer has been in the market for literally, you know, days, a week,” he said. “I would probably lie to you if I said, ‘Hey, we’re already seeing people crushing it after taking three or four courses.’” For the earlier version of Enable that includes the AI Call Reviewer, however, he said customers are “definitely seeing a very high kind of skill improvement” and are attributing increases in win rates and quota attainment to those gains — though he conceded that “it’s always hard to do 100 percent attribution.”
Morningstar, one of Gong’s early adopters, offered a pre-launch endorsement. Rae Cheney, Director of Sales Enablement Technology at Morningstar, said in a statement that Gong Enable helped the firm “spend less time on status updates and more time on the work that actually moves deals.”
One of the more interesting threads in Reshef’s remarks concerned his view of AI autonomy — or rather, its limits. He pushed back on what he called a “common misperception about AI” — that it operates completely autonomously.
“There has to be a person in the middle, which I call operator,” he said. “It could be RevOps. It could be enablement. In the case of training, it could be analysts. Sometimes it could be even business leaders.” Those operators, he argued, are responsible for a “repeatable process of AI doing something, measuring the AI” and adjusting over time.
This philosophy extends to the AI Call Reviewer’s feedback. Gong does not dictate what the system trains on — enablement leaders choose. “We don’t decide what they want to train on. We let them choose,” Reshef said. “You iterate, you optimize, you see how it goes, and there has to be somebody in the organization who’s responsible for making sure this aligns with the business needs.”
That stance puts Gong at odds with the more aggressive “autonomous agent” rhetoric emerging from some competitors, and it may resonate with enterprise buyers who remain cautious about letting AI run unsupervised in revenue-critical workflows.
Mission Andromeda does not exist in a vacuum. The revenue AI landscape has been reshaped by a remarkable wave of consolidation over the past six months.
In a category-defining move, Clari and Salesloft merged in December 2025 to form what they called a “Revenue AI powerhouse,” combining roughly $450 million in ARR under new CEO Steve Cox. Just two weeks ago, Highspot and Seismic signed a definitive agreement to merge, creating a combined entity worth more than $6 billion focused on AI-powered sales enablement — the very same territory Gong is now invading with Enable.
Meanwhile, Gong was named a Leader in the inaugural 2025 Gartner Magic Quadrant for Revenue Action Orchestration, published in December. The company placed highest among the 12 vendors evaluated on both the “Ability to Execute” and “Completeness of Vision” axes and ranked first in all four evaluated use cases in Gartner’s companion Critical Capabilities report.
In his interview, Reshef did not name competitors directly, but he drew a clear contrast. “We’ve built a product from the ground up. It’s all organic,” he said. “All of the other players in the field have sort of stitched together tools. And obviously you can’t just get it to be a coherent product if you just stitch together tools. Some of them even have multiple logins.” That is a thinly veiled shot at the merged Clari-Salesloft entity, which Forrester has described as presenting a “bifurcated approach” — Salesloft serving frontline users while Clari supports management insights.
Reshef also pointed to growth as a competitive weapon. “We’re growing at the top deck side in terms of SaaS companies,” he said, adding that Gong hired roughly 200 R&D employees this year and plans to hire another 200. “It’s kind of a flywheel where we can invest more in R&D, we make the product better, we get more capabilities, more flexibility, more enterprise customers.”
Any discussion of Gong’s trajectory inevitably raises the question of a public offering. When asked directly, Reshef declined to comment: “I wouldn’t comment on IPO at this stage. No.”
The company has been on a clear growth arc. Gong has raised approximately $584 million to date, with its last official funding round valuing it at $7.25 billion — the culmination of a series of rapid jumps from $750 million in 2019 to $2.2 billion in 2020. The company reached an annual sales run rate of approximately $300 million in January 2025, driven largely by the adoption of AI, according to Calcalist.
But that valuation has since slipped. As Calcalist reported in November 2025, Gong is conducting a secondary round for company employees and investors at a valuation of roughly $4.5 billion — well below its 2021 peak. The offering is being conducted through Nasdaq’s private market platform and was in advanced stages at the time of the report. It is not yet clear whether the company has repriced employee options, some of which were issued at significantly higher valuations than the current secondary round. Gong told Calcalist that it “regularly receives inquiries from potential investors” but as a private company does not “engage in speculation.”
The structured quarterly launch cadence that Mission Andromeda inaugurates — complete with galactic naming conventions and coordinated product narratives — certainly resembles the kind of predictable, story-driven approach that public market investors reward. Reshef framed it differently: “We felt like having quarterly launches with a name, a mission, and a story around it makes it easier to work… It’s a good way to educate the market on a regular basis.”
Reshef’s most revealing comment came when he laid out the company’s long-term thesis: Gong aims to increase productivity for revenue professionals by 50 percent. “We’re not there yet,” he admitted. “I think we’re like at 20 to 30 — whatever, hard to measure.”
He broke the productivity gain into two categories. The first is making high-complexity human tasks — like conducting a live Zoom sales call — better, through coaching, training, and review. “I think there’s going to be a long while, if ever, that Zoom conversations are going to get replaced by bots,” he said. The second is automating the manual drudgery: call preparation, post-meeting summaries, follow-up emails, account research briefs.
The distinction matters because it frames Gong’s ambition not as replacing salespeople but as making them dramatically more effective — a message calibrated to appeal to the thousands of revenue leaders who control Gong’s buying decisions. Whether that thesis holds will depend on whether Gong Enable, the AI Trainer, and the rest of Mission Andromeda can deliver measurable gains in a market that has been burned before by tools that promise insight but struggle to change behavior.
Gong currently serves more than 5,000 companies worldwide. The Clari-Salesloft merger has produced a rival with deeper combined resources. The Highspot-Seismic combination is assembling a sales enablement colossus. And a new Gartner category means every enterprise buyer now has a framework for comparison shopping. The next twelve months will test whether Mission Andromeda is the release that cements Gong’s position at the center of the revenue AI category — or the last big swing before the consolidated giants close in.
“Our mission is to be at the forefront,” Reshef said. “If everybody else is doing 20 percent, we’re going to do 50. If everybody is going to do 50, we’re going to do 80.”
In the revenue AI wars, that kind of confidence is easy to project. Delivering on it, with brand-new products still days old and a market being remade around you in real time, is something else entirely.
Anthropic opened its virtual “Briefing: Enterprise Agents” event on Tuesday with a provocation. Kate Jensen, the company’s head of Americas, told viewers that the hype around enterprise AI agents in 2025 “turned out to be mostly premature,” with many pilots failing to reach production. “It wasn’t a failure of effort, it was a failure of approach, and it’s something we heard directly from our customers,” Jensen said.
The implicit promise: Anthropic has figured out the right approach, and it starts with the playbook that made Claude Code one of the most consequential developer tools of the past year. “In 2025 Claude transformed how developers work, and in 2026 it will do the same for knowledge work,” Jensen said. “The magic behind Claude Code is simple. When you can delegate hard challenges, you can focus on the work that actually matters. Cowork brings that same power to knowledge workers.”
That framing is central to understanding what Anthropic announced on Tuesday. The company rolled out a sweeping set of enterprise capabilities for Claude Cowork, the AI productivity platform it first released in research preview in January. Scott White, head of product for Claude Enterprise, described the ambition plainly during the keynote: “Cowork makes it possible for Claude to deliver polished, near final work. It goes beyond drafts and suggestions — actual completed projects and deliverables.”
The product updates are dense but consequential. Enterprise administrators can now build private plugin marketplaces tailored to their organizations, connecting to private GitHub repositories as plugin sources and controlling which plugins employees can access. Anthropic introduced new prebuilt plugin templates spanning HR, design, engineering, operations, financial analysis, investment banking, equity research, private equity, and wealth management. The company also shipped new MCP connectors for Google Drive, Google Calendar, Gmail, DocuSign, Apollo, Clay, Outreach, SimilarWeb, MSCI, LegalZoom, FactSet, WordPress, and Harvey — dramatically extending Claude’s reach into the software ecosystem that enterprises already use. And Claude can now pass context seamlessly between Cowork, Excel, and PowerPoint, including across multiple files, without requiring users to restart when switching applications.
White emphasized that the system is designed to feel native to each organization rather than generic. “We’ve heard loud and clear from enterprises — you want Claude to work the way that your company works, not just Claude for legal, but Cowork for legal at your company,” he said. “That’s exactly what today’s launches deliver.”
To ground the product announcements in measurable outcomes, Anthropic showcased three enterprise deployments that illustrate both the scale and the variety of impact the company claims Claude can deliver.
At Spotify, engineers had long struggled with code migrations — the slow, manual work of updating and modernizing code across thousands of services. Jensen explained that after integrating Claude directly into the system Spotify’s engineers use daily, “any engineer can kick off a large-scale migration just by describing what they need in plain English.” The company reports up to a 90% reduction in engineering time, over 650 AI-generated code changes shipped per month, and roughly half of all Spotify updates now flowing through the system.
At Novo Nordisk, the pharmaceutical giant built an AI-powered platform called NovoScribe with Claude as its intelligence layer, targeting the grueling process of producing regulatory documentation for new medicines. Staff writers had previously averaged just over two reports per year. After deploying Claude, Jensen said, “documentation creation went from 10 plus weeks to 10 minutes. That’s a 95% reduction in resources for verification checks. Medicines are reaching patients faster.” Jensen also noted that Novo Nordisk used Claude Code to build the platform itself, enabling contributions from non-engineers — their digitalization strategy director, who holds a PhD in molecular biology rather than engineering, now prototypes features using natural language. “A team of 11 is operating like a team many times its size,” Jensen said.
Salesforce, meanwhile, uses Claude models to help power AI in Slack, reporting a 96% satisfaction rate for tools like its Slack bot and saving customers an estimated 97 minutes per week through summarization and recap features. The partnership reflects Anthropic’s broader ecosystem strategy: Jensen described the companies featured at the event as “Claude partners and domain experts with the data and trusted relationships that make Claude work in the real world.”
Perhaps the most illuminating segment of the event was a panel discussion featuring executives from Thomson Reuters, the New York Stock Exchange, and Epic, who provided candid assessments of AI’s enterprise reality that went well beyond the polished case studies.
Sridhar Masam, CTO of the New York Stock Exchange, described his organization as “rewiring our engineering process” with Claude Code and building internal AI agents using the Claude Agent SDK that can take instructions from a Jira ticket all the way to a committed piece of code. But he also identified fundamental shifts in how leaders must think. “The accountability is shifting,” he said. “Traditionally, we are so used to building deterministic platforms. You write code requirements and build. And now, with AI being probabilistic, the accountability doesn’t end when the project goes live, but on a daily basis, monitoring the behavior and outcomes.” He described a new paradigm beyond “buy versus build” — what he called “assembly,” the practice of combining multiple models, multiple vendors, platforms, data, and internal capabilities into solutions. And he noted that highly regulated industries must shift “from risk avoidance to risk calibration,” because simply avoiding AI is no longer a competitive option.
Steve Haske from Thomson Reuters, whose Co-Counsel product has reached a million users, was frank about the gap between what the technology can do and what organizations are ready for. “The tools are in many senses ahead of the change management,” he said. “A general counsel’s office, a law firm, a tax and accounting firm, an audit firm, need to rewire the processes to be able to take advantage of the benefits that the tools provide. And I think it’s 18 months away before that sort of change management catches up with the standard of the tool.” He also stressed an “ironclad guarantee” to Co-Counsel customers that “their input will not be part of our AI output,” and urged enterprise leaders to be “feverish” about protecting institutional intellectual property.
Seth Hain from Epic — the healthcare technology company behind MyChart — offered a finding that may foreshadow where enterprise AI adoption is truly heading. “Over half of our use of Claude Code is by non-developer roles across the company,” Hain said, describing how support and implementation staff had adopted the tool in ways the company never anticipated. Hain also described a deliberate trust-building strategy: Epic’s first AI capability was a medical record summarization that included links to the underlying source material, giving clinicians the ability to verify and build confidence before the company introduced more autonomous agent capabilities.
Tuesday’s announcements cannot be understood in isolation. They are essentially the culmination of a year in which Anthropic transformed itself from a research-focused AI lab into a company with genuine enterprise distribution and developer ecosystem gravity.
The trajectory began with Claude Code, which Jensen noted had taken coding use cases “from assisting on tiny tasks to AI writing 90 or sometimes even 100% of the code, with enterprises shipping in weeks what once took many quarters.” But the deeper structural shift was the adoption of MCP — the Model Context Protocol — which has become the connective tissue allowing Claude to reach into and act upon data across an organization’s entire technology stack. Where previous AI tools were constrained to the information users manually fed them, MCP-connected Claude can pull context from Slack threads, Google Drive documents, CRM records, and financial systems simultaneously. This is what makes the plugin architecture announced Tuesday fundamentally different from earlier chatbot-style enterprise AI: it turns Claude into a reasoning layer that sits across an organization’s existing infrastructure rather than alongside it.
The implications for the broader AI industry are profound. Anthropic is effectively building a platform play — private plugin marketplaces, portable file-based plugins, and an expanding library of MCP connectors — that echoes the ecosystem strategies of earlier platform giants like Salesforce and Microsoft. The difference is velocity: Anthropic is compressing into months the kind of ecosystem development that previously took years. The company’s willingness to ship sector-specific plugin templates for investment banking, equity research, and wealth management alongside general-purpose tools signals that it sees no bright line between platform and application, between enabling partners and competing with them.
This strategic ambiguity is precisely what has spooked Wall Street. IBM shares suffered their worst single-day loss since October 2000 — down 13.2% — on Monday after Anthropic published a blog post about using Claude Code to modernize COBOL, the decades-old programming language that runs on IBM’s mainframe systems. Enterprise software stocks had already been under heavy pressure since the initial Cowork announcement on January 30, with companies like ServiceNow, Salesforce, Snowflake, Intuit, and Thomson Reuters all experiencing steep declines. Cybersecurity companies tumbled after the company unveiled Claude Code Security on February 20.
Yet Tuesday’s event triggered a partial reversal that revealed something important about how markets are processing AI disruption. Companies named as Anthropic partners and integration targets — Salesforce, DocuSign, LegalZoom, Thomson Reuters, FactSet — all rallied, some sharply. Thomson Reuters surged more than 11%. The market appears to be drawing a new distinction: companies integrated into Anthropic’s ecosystem may benefit, while those standing outside it face existential risk.
Peter McCrory, Anthropic’s head of economics, presented data from the Anthropic Economic Index that offered a sober counterweight to the event’s product optimism. Using privacy-preserving methods to analyze how people and businesses use Claude, McCrory’s team has tracked AI’s diffusion across more than 150 countries and every US state.
The headline finding is striking: a year ago, roughly a third of all US jobs had at least a quarter of their associated tasks appearing in Claude usage data. That figure has now risen to approximately one in every two jobs. “The scope of impact is broadening out throughout the economy as the tools and as the technology becomes more capable,” McCrory said. He characterized AI as a “general purpose technology” in the economic sense — meaning virtually no facet of the economy will be unaffected.
McCrory drew a critical distinction between automation, where Claude simply executes a task, and augmentation, where it collaborates with a human on more complex work. When businesses embed Claude through the API, he noted, “we see overwhelmingly Claude is being embedded in automated ways” — a pattern consistent with how transformative technologies have historically diffused through the economy.
On the question of job displacement, McCrory was measured but direct. He noted that “roles that typically require more years of schooling have the largest productivity or efficiency gains,” suggesting a dynamic economists call skill-biased technical change. He expressed concern about “jobs that are pure implementation” — citing data entry workers and technical writers as examples where Claude is already being used for tasks central to those occupations. But he emphasized that no evidence of widespread labor displacement has materialized yet, and pointed to forthcoming research that would introduce methodology for monitoring whether highly exposed workers are beginning to experience it.
His advice to enterprise leaders cut to the heart of the organizational challenge. “It might not just be about fundamental capabilities of the model,” McCrory said. “Do you have the right sort of data ecosystem, data infrastructure to provide the right information at the right time?” If the knowledge Claude needs to execute a sophisticated task exists only in a coworker’s head, he argued, “that’s not a technical problem, per se. That’s an organizational problem.”
Jensen described a concept Anthropic calls “the thinking divide” — the growing gap between organizations that embed AI across employees, processes, and products simultaneously, and those that treat it as a point solution. The companies on the right side of that divide, she argued, will compound their advantage over time. Those on the wrong side “will find themselves falling further and further behind.”
Whether Anthropic ultimately functions as the rising tide that lifts the enterprise software ecosystem or the wave that swamps it remains genuinely uncertain. The same event that triggered a rally in shares of Anthropic’s named partners has also accelerated a broader reckoning for legacy software companies that cannot yet articulate how they fit into an AI-native world. McCrory, the economist, counseled humility. “Capabilities are moving very, very quickly,” he said. “It might represent an innovation in the method of innovation. So it’s not just making us better at the things that we do — it’s helping us discover new ways to do things.”
Thomson Reuters’ Haske perhaps put it most practically. “As leaders, we all have to get personally involved and personally invested in using the tools,” he said. “We’ve got to move fast. This environment is changing quickly. We cannot afford to get left behind.”
A Fortune 10 CIO recently told Jensen that enterprises would need to fit a decade of innovation into the next few years. The CIO smiled and said: “We’re going to do it in one with you.” Whether that confidence proves prescient or premature, one thing is clear from Tuesday’s event — the window for figuring it out is closing faster than most boardrooms realize.
Privacy-focused hackers are being offered cash to modify Ring cameras so they work locally without sending data to Amazon, reflecting growing unease over how home surveillance data is collected and used.
Anthropic dropped a bombshell on the artificial intelligence industry Monday, publicly accusing three prominent Chinese AI laboratories — DeepSeek, Moonshot AI, and MiniMax — of orchestrating coordinated, industrial-scale campaigns to siphon capabilities from its Claude models using tens of thousands of fraudulent accounts.
The San Francisco-based company said the three labs collectively generated more than 16 million exchanges with Claude through approximately 24,000 fake accounts, all in violation of Anthropic’s terms of service and regional access restrictions. The campaigns, Anthropic said, are the most concrete and detailed public evidence to date of a practice that has haunted Silicon Valley for months: foreign competitors systematically using a technique called distillation to leapfrog years of research and billions of dollars in investment.
“These campaigns are growing in intensity and sophistication,” Anthropic wrote in a technical blog post published Monday. “The window to act is narrow, and the threat extends beyond any single company or region. Addressing it will require rapid, coordinated action among industry players, policymakers, and the global AI community.”
The disclosure marks a dramatic escalation in the simmering tensions between American and Chinese AI developers — and it arrives at a moment when Washington is actively debating whether to tighten or loosen export controls on the advanced chips that power AI training. Anthropic, led by CEO Dario Amodei, has been among the most vocal advocates for restricting chip sales to China, and the company explicitly connected Monday’s revelations to that policy fight.
To understand what Anthropic alleges, it helps to understand what distillation actually is — and how it evolved from an academic curiosity into the most contentious issue in the global AI race.
At its core, distillation is a process of extracting knowledge from a larger, more powerful AI model — the “teacher” — to create a smaller, more efficient one — the “student.” The student model learns not from raw data, but from the teacher’s outputs: its answers, reasoning patterns, and behaviors. Done correctly, the student can achieve performance remarkably close to the teacher’s while requiring a fraction of the compute to train.
As Anthropic itself acknowledged, distillation is “a widely used and legitimate training method.” Frontier AI labs, including Anthropic, routinely distill their own models to create smaller, cheaper versions for customers. But the same technique can be weaponized. A competitor can pose as a legitimate customer, bombard a frontier model with carefully crafted prompts, collect the outputs, and use those outputs to train a rival system — capturing capabilities that took years and hundreds of millions of dollars to develop.
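The mechanics are easy to see in a deliberately tiny sketch. Here the “teacher” is a stand-in function rather than a frontier model, and the “student” is an ordinary least-squares fit rather than a neural network, but the essential move is the same: the student is trained only on the teacher’s recorded outputs, never on ground-truth data.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher(x):
    # Stand-in for an expensive frontier model: some rich learned mapping.
    return 3.0 * x + 1.0 + 0.5 * np.sin(x)

# Step 1: query the teacher at scale and record its outputs,
# exactly as a paying API customer could.
queries = rng.uniform(-5, 5, size=1000)
teacher_outputs = teacher(queries)

# Step 2: fit a far simpler student to mimic those outputs
# (ordinary least squares on [x, 1]).
X = np.stack([queries, np.ones_like(queries)], axis=1)
coef, *_ = np.linalg.lstsq(X, teacher_outputs, rcond=None)

def student(x):
    # Step 3: the student now approximates the teacher at a
    # tiny fraction of the teacher's training cost.
    return coef[0] * x + coef[1]

test_x = np.linspace(-5, 5, 100)
mean_err = np.mean(np.abs(student(test_x) - teacher(test_x)))
print(f"mean |student - teacher| gap: {mean_err:.3f}")
```

The asymmetry is the whole point: building the teacher requires data, compute, and research; building the student requires only access to the teacher’s answers.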
The technique burst into public consciousness in January 2025 when DeepSeek released its R1 reasoning model, which appeared to match or approach the performance of leading American models at dramatically lower cost. Databricks CEO Ali Ghodsi captured the industry’s anxiety at the time, telling CNBC: “This distillation technique is just so extremely powerful and so extremely cheap, and it’s just available to anyone.” He predicted the technique would usher in an era of intense competition for large language models.
That prediction proved prescient. In the weeks following DeepSeek’s release, researchers at UC Berkeley said they recreated OpenAI’s reasoning model for just $450 in 19 hours. Researchers at Stanford and the University of Washington followed with their own version built in 26 minutes for under $50 in compute credits. The startup Hugging Face replicated OpenAI’s Deep Research feature as a 24-hour coding challenge. DeepSeek itself openly released a family of distilled models on Hugging Face — including versions built on top of Qwen and Llama architectures — under the permissive MIT license, with the model card explicitly stating that the DeepSeek-R1 series supports commercial use and allows for any modifications and derivative works, “including, but not limited to, distillation for training other LLMs.”
But what Anthropic described Monday goes far beyond academic replication or open-source experimentation. The company detailed what it characterized as deliberate, covert, and large-scale intellectual property extraction by well-resourced commercial laboratories operating under the jurisdiction of the Chinese government.
Anthropic attributed each campaign “with high confidence” through IP address correlation, request metadata, infrastructure indicators, and corroboration from unnamed industry partners who observed the same actors on their own platforms. Each campaign specifically targeted what Anthropic described as Claude’s most differentiated capabilities: agentic reasoning, tool use, and coding.
DeepSeek, the company that ignited the distillation debate, conducted what Anthropic described as the most technically sophisticated of the three operations, generating over 150,000 exchanges with Claude. Anthropic said DeepSeek’s prompts targeted reasoning capabilities, rubric-based grading tasks designed to make Claude function as a reward model for reinforcement learning, and — in a detail likely to draw particular political attention — the creation of “censorship-safe alternatives to policy sensitive queries.”
Anthropic alleged that DeepSeek “generated synchronized traffic across accounts” with “identical patterns, shared payment methods, and coordinated timing” that suggested load balancing to maximize throughput while evading detection. In one particularly notable technique, Anthropic said DeepSeek’s prompts “asked Claude to imagine and articulate the internal reasoning behind a completed response and write it out step by step — effectively generating chain-of-thought training data at scale.” The company also alleged it observed tasks in which Claude was used to generate alternatives to politically sensitive queries about “dissidents, party leaders, or authoritarianism,” likely to train DeepSeek’s own models to steer conversations away from censored topics. Anthropic said it was able to trace these accounts to specific researchers at the lab.
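The "reverse chain-of-thought" technique Anthropic describes can be illustrated with a short sketch. This is not DeepSeek's or Anthropic's actual code; the function name and prompt wording are invented for illustration. The idea is to wrap an existing question-and-answer pair in a prompt that asks the target model to reconstruct the reasoning it never actually showed, yielding (question, reasoning, answer) triples suitable for training a student model:

```python
# Illustrative sketch only: how a chain-of-thought elicitation prompt
# might be built from an already-completed answer, producing reasoning
# traces at scale. All names and phrasing here are hypothetical.

def build_cot_elicitation_prompt(question: str, completed_answer: str) -> str:
    """Wrap an existing Q/A pair in a prompt asking the model to imagine
    and articulate the step-by-step reasoning behind the answer."""
    return (
        "Below is a question and a finished answer.\n"
        f"Question: {question}\n"
        f"Answer: {completed_answer}\n"
        "Imagine the internal reasoning that would lead to this answer "
        "and write it out step by step."
    )

prompt = build_cot_elicitation_prompt("What is 17 * 24?", "408")
print(prompt)
```

Run at scale against a frontier model's API, each response to such a prompt becomes a synthetic reasoning trace, which is why Anthropic characterizes the pattern as "generating chain-of-thought training data at scale."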
Moonshot AI, the Beijing-based creator of the Kimi models, ran the second-largest operation by volume at over 3.4 million exchanges. Anthropic said Moonshot targeted agentic reasoning and tool use, coding and data analysis, computer-use agent development, and computer vision. Moonshot employed “hundreds of fraudulent accounts spanning multiple access pathways,” making the campaign harder to detect as a coordinated operation. Anthropic attributed the campaign through request metadata that “matched the public profiles of senior Moonshot staff.” In a later phase, Anthropic said, Moonshot adopted a more targeted approach, “attempting to extract and reconstruct Claude’s reasoning traces.”
MiniMax, the least publicly known of the three but the most prolific by volume, generated over 13 million exchanges — more than three-quarters of the total. Anthropic said MiniMax’s campaign focused on agentic coding, tool use, and orchestration. The company said it detected MiniMax’s campaign while it was still active, “before MiniMax released the model it was training,” giving Anthropic “unprecedented visibility into the life cycle of distillation attacks, from data generation through to model launch.” In a detail that underscores the urgency and opportunism Anthropic alleges, the company said that when it released a new model during MiniMax’s active campaign, MiniMax “pivoted within 24 hours, redirecting nearly half their traffic to capture capabilities from our latest system.”
Anthropic does not currently offer commercial access to Claude in China, a policy it maintains for national security reasons. So how did these labs access the models at all?
The answer, Anthropic said, lies in commercial proxy services that resell access to Claude and other frontier AI models at scale. Anthropic described these services as running what it calls “hydra cluster” architectures — sprawling networks of fraudulent accounts that distribute traffic across Anthropic’s API and third-party cloud platforms. “The breadth of these networks means that there are no single points of failure,” Anthropic wrote. “When one account is banned, a new one takes its place.” In one case, Anthropic said, a single proxy network managed more than 20,000 fraudulent accounts simultaneously, mixing distillation traffic with unrelated customer requests to make detection harder.
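The resilience of a "hydra cluster" comes from a simple property: banning any single account does not shrink the pool. A toy model, with invented names and logic (nothing here reflects any real proxy service's implementation), makes the pattern concrete:

```python
# Hypothetical sketch of the "hydra cluster" account-pool pattern:
# requests round-robin across many accounts, and a banned account is
# silently replaced by a fresh one, so there is no single point of failure.

class HydraPool:
    """Toy model of a resilient fraudulent-account pool."""

    def __init__(self, size: int):
        self.accounts = [f"acct-{i:05d}" for i in range(size)]
        self._next_id = size   # counter for minting replacement accounts
        self._cursor = 0       # round-robin position

    def next_account(self) -> str:
        """Pick the next account to carry a request (load balancing)."""
        acct = self.accounts[self._cursor % len(self.accounts)]
        self._cursor += 1
        return acct

    def ban(self, acct: str) -> None:
        """When one account is banned, a new one takes its place."""
        idx = self.accounts.index(acct)
        self.accounts[idx] = f"acct-{self._next_id:05d}"
        self._next_id += 1

pool = HydraPool(size=5)
banned = pool.next_account()
pool.ban(banned)
print(len(pool.accounts))  # → 5: the pool never shrinks
```

Scaled to the 20,000-plus accounts Anthropic describes, and with distillation requests interleaved among legitimate customer traffic, per-account bans become an ineffective defense, which is why Anthropic emphasizes behavioral detection instead.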
The description suggests a mature and well-resourced infrastructure ecosystem dedicated to circumventing access controls — one that may serve many more clients than just the three labs Anthropic named.
Anthropic did not treat this as a mere terms-of-service violation. The company embedded its technical disclosure within an explicit national security argument, warning that “illicitly distilled models lack necessary safeguards, creating significant national security risks.”
The company argued that models built through illicit distillation are “unlikely to retain” the safety guardrails that American companies build into their systems — protections designed to prevent AI from being used to develop bioweapons, carry out cyberattacks, or enable mass surveillance. “Foreign labs that distill American models can then feed these unprotected capabilities into military, intelligence, and surveillance systems,” Anthropic wrote, “enabling authoritarian governments to deploy frontier AI for offensive cyber operations, disinformation campaigns, and mass surveillance.”
This framing directly connects to the chip export control debate that Amodei has made a centerpiece of his public advocacy. In a detailed essay published in January 2025, Amodei argued that export controls are “the most important determinant of whether we end up in a unipolar or bipolar world” — a world where either only the U.S. and its allies possess the most powerful AI, or one where China achieves parity. He specifically noted at the time that he was “not taking any position on reports of distillation from Western models” and would “just take DeepSeek at their word that they trained it the way they said in the paper.”
Monday’s disclosure is a sharp departure from that earlier restraint. Anthropic now argues that distillation attacks “undermine” export controls “by allowing foreign labs, including those subject to the control of the Chinese Communist Party, to close the competitive advantage that export controls are designed to preserve through other means.” The company went further, asserting that “without visibility into these attacks, the apparently rapid advancements made by these labs are incorrectly taken as evidence that export controls are ineffective.” In other words, Anthropic is arguing that what some observers interpreted as proof that Chinese labs can innovate around chip restrictions was actually, in significant part, the result of stealing American capabilities.
Anthropic’s decision to frame this as a national security issue rather than a legal dispute may reflect the difficult reality that intellectual property law offers limited recourse against distillation.
As a March 2025 analysis by the law firm Winston & Strawn noted, “the legal landscape surrounding AI distillation is unclear and evolving.” The firm’s attorneys observed that proving a copyright claim in this context would be challenging, since it remains unclear whether the outputs of AI models qualify as copyrightable creative expression. The U.S. Copyright Office affirmed in January 2025 that copyright protection requires human authorship, and that “mere provision of prompts does not render the outputs copyrightable.”
The legal picture is further complicated by the way frontier labs structure output ownership. OpenAI’s terms of use, for instance, assign ownership of model outputs to the user — meaning that even if a company can prove extraction occurred, it may not hold copyrights over the extracted data. Winston & Strawn noted that this dynamic means “even if OpenAI can present enough evidence to show that DeepSeek extracted data from its models, OpenAI likely does not have copyrights over the data.” The same logic would almost certainly apply to Anthropic’s outputs.
Contract law may offer a more promising avenue. Anthropic’s terms of service prohibit the kind of systematic extraction the company describes, and violation of those terms is a more straightforward legal claim than copyright infringement. But enforcing contractual terms against entities operating through proxy services and fraudulent accounts in a foreign jurisdiction presents its own formidable challenges.
This may explain why Anthropic chose the national security frame over a purely legal one. By positioning distillation attacks as threats to export control regimes and democratic security rather than as intellectual property disputes, Anthropic appeals to policymakers and regulators who have tools — sanctions, entity list designations, enhanced export restrictions — that go far beyond what civil litigation could achieve.
Anthropic outlined a multipronged defensive response. The company said it has built classifiers and behavioral fingerprinting systems designed to identify distillation attack patterns in API traffic, including detection of chain-of-thought elicitation used to construct reasoning training data. It is sharing technical indicators with other AI labs, cloud providers, and relevant authorities to build what it described as a more holistic picture of the distillation landscape. The company has also strengthened verification for educational accounts, security research programs, and startup organizations — the pathways most commonly exploited for setting up fraudulent accounts — and is developing model-level safeguards designed to reduce the usefulness of outputs for illicit distillation without degrading the experience for legitimate customers.
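How such detection might work can be sketched in a few lines. This is an invented toy, not Anthropic's classifier: real systems would use trained models over many behavioral signals, and the phrase list and threshold below are assumptions made purely for illustration. The core idea is to score each account's traffic against signatures of distillation behavior, such as a high fraction of chain-of-thought elicitation prompts:

```python
# Illustrative sketch only: flagging accounts whose request patterns look
# like distillation traffic. Markers and threshold are invented; a real
# system would use trained classifiers over richer behavioral features.

COT_MARKERS = (
    "step by step",
    "write out the reasoning",
    "imagine the internal reasoning",
)

def distillation_score(prompts: list[str]) -> float:
    """Fraction of an account's prompts containing CoT-elicitation phrasing."""
    if not prompts:
        return 0.0
    hits = sum(any(m in p.lower() for m in COT_MARKERS) for p in prompts)
    return hits / len(prompts)

def flag_accounts(traffic: dict[str, list[str]], threshold: float = 0.5) -> list[str]:
    """Return accounts whose traffic exceeds the distillation threshold."""
    return [
        acct for acct, prompts in traffic.items()
        if distillation_score(prompts) >= threshold
    ]

traffic = {
    "acct-1": ["Explain step by step the reasoning behind this answer."] * 8,
    "acct-2": ["Summarize this earnings report."],
}
print(flag_accounts(traffic))  # → ['acct-1']
```

The hard part, as Anthropic's own framing implies, is doing this without degrading service for legitimate customers who also ask for step-by-step reasoning, which is why behavioral fingerprinting looks at coordinated patterns across accounts rather than individual prompts alone.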
But the company acknowledged that “no company can solve this alone,” calling for coordinated action across the industry, cloud providers, and policymakers.
The disclosure is likely to reverberate through multiple ongoing policy debates. In Congress, the bipartisan No DeepSeek on Government Devices Act has already been introduced. Federal agencies including NASA have banned DeepSeek from employee devices. And the broader question of chip export controls — which the Trump administration has been weighing amid competing pressures from Nvidia and national security hawks — now has a new and vivid data point.
For the AI industry’s technical decision-makers, the implications are immediate and practical. If Anthropic’s account is accurate, the proxy infrastructure enabling these attacks is vast, sophisticated, and adaptable — and it is not limited to targeting a single company. Every frontier AI lab with an API is a potential target. The era of treating model access as a simple commercial transaction may be coming to an end, replaced by one in which API security is as strategically important as the model weights themselves.
Anthropic has now put names, numbers, and forensic detail behind accusations that the industry had only whispered about for months. Whether that evidence galvanizes the coordinated response the company is calling for — or simply accelerates an arms race between distillers and defenders — may depend on a question no classifier can answer: whether Washington sees this as an act of espionage or just the cost of doing business in an era when intelligence itself has become a commodity.