Intuit has lost more than 40% of its market cap since the beginning of the year. It’s not alone. Many established SaaS players have seen their stock prices fall in recent months, including Adobe and IBM — the latter experiencing its most significant one-day drop (roughly $40 billion) following Anthropic’s announcement that Claude could now read, analyze and translate legacy COBOL into modern languages like Java and Python. The market has a name for it: the SaaSpocalypse.
The argument from investors and market watchers: AI agents can now do bookkeeping, file taxes and reconcile accounts — without a human ever touching software. For instance, instead of a human using QuickBooks to categorize transactions, Claude Cowork can access financial data, apply tax logic and autonomously prepare documents. Rather than using TurboTax, agentic AI tools can handle complex tax logic and even file taxes. In lieu of QuickBooks, automated agents can handle multi-step bookkeeping tasks (like lining up receipts).
Intuit has been among the hardest-hit, with its market capitalization now sitting at around $106 billion.
The catalyst has been the emergence of fully agentic, no-code AI assistants like Claude Cowork and open-source tools like OpenClaw, whose founder was recently acqui-hired by OpenAI. The fear is that these cheaper service-as-a-service offerings (or service-as-software, or results-as-a-service, depending on who you ask) will upend pay-per-seat subscriptions: Whereas traditional SaaS delivers a tool (software) for users to complete a task, service-as-a-service delivers a fully automated outcome.
For instance, Anthropic’s Cowork platform includes finance capabilities that allow the agent to read financial files and turn them into structured models, tables and reports.
“The advantage is that I am abstracting away the complexity of my business operations,” said Brian Jackson, principal research director at Info-Tech Research Group (who prefers to call it “service-as-software”). “To hear about a model where you only pay when you get the outcome that you want, that’s very appealing.”
This emerging capability is in line with past technological advancements, he pointed out: IT departments used to be in charge of running infrastructure, but cloud computing came along to abstract away that management. Then, SaaS tools emerged to orchestrate the application layer. Now users manage their work — inputting data, filling out forms, creating analytics dashboards — within SaaS apps.
“So the next step is automated intelligence,” Jackson said. “Instead of having people do those things, we’ll just have AI do them.” Essentially, it could become a headless system without a UI; users simply let it run and don’t think about it.
This new concept comes at a time when enterprises are becoming fed up with the SaaS business model, he noted. Lock-in is frustrating, fees continue to go up, seats expand, and “it becomes this unwieldy operating cost,” Jackson said. “And it’s not always guaranteed to drive value, it doesn’t guarantee ROI at all.”
Intuit, which was founded in 1983, now serves around 100 million customers with a suite of products that, in addition to QuickBooks and TurboTax, include Mailchimp and Credit Karma. But these core offerings are now considered low-hanging fruit for AI, potentially endangering the company whose revenue model relies heavily on per-seat/per-user subscriptions.
Intuit’s CEO Sasan Goodarzi has recently shrugged off SaaSpocalypse claims, calling data the “most important moat” in a Semafor interview.
Marianna Tessel, EVP and GM for Intuit’s small business group, takes the same stance. Yes, Claude Cowork and similar agentic tools are “robust,” she noted, but Intuit has “persistent” and “durable” advantages.
Notably: First-party data. Customers generate many types of data on Intuit’s systems, whether by creating an invoice, importing ledgers or carrying out other finance tasks. Then there’s third-party data, generated through Intuit’s connections with 24,000-plus banks, e-commerce sites and other entities, Tessel pointed out.
AI agents simply do not have access to this “vastness” of data, she contended. Further, Intuit knows how to organize and use data, such as stitching together information across customer segments to provide market snapshots. “We understand this data, we know how to turn it into action,” Tessel argued.
She also doubled down on Intuit’s deep understanding of its customers. Rather than a chatbot that can process and act on numbers and figures, “we know what small businesses face,” she said, whether it’s their concerns around bookkeeping and payroll, or their struggles with hiring.
“We’ve been in business for over 40 years,” Tessel noted. “We have a lot of know-how that is very specific.”
Other SaaS companies stand staunchly behind this argument. Jon Aniano, Zendesk’s SVP of product and CRM applications, pointed out that his company serves 80,000 customers and deeply understands their needs. “We actually see [general purpose agentic tools] at a disadvantage because they’ve gotta go customer by customer and learn things that we’ve learned over the course of 20 years,” he said at a recent VentureBeat event.
The data moat argument does hold up, noted Info-Tech’s Jackson. He also pointed out that, realistically, the SaaS market is projected to grow at a “pretty good clip” in the years ahead. “Could that change very quickly? It’s possible, but it’s unlikely,” he said.
SaaS is also deeply entrenched in modern business, and pivoting to something entirely new is a challenge. Even disruptive and compelling technologies like AI can take time to deploy at scale because enterprises have to recraft their workflows, Jackson noted.
“You have workers in place. You have departments in place. It just takes effort and time to change the processes and the expectations around these things,” he said, although “the appetite will definitely be there.”
To get ahead of this, Intuit recently signed a multi-year partnership with Anthropic to bring AI agents to mid-market businesses. Using Anthropic’s Claude Agent SDK on the Intuit platform, enterprises will be able to build and customize agents. On the other end, Intuit’s tools can be surfaced directly inside Anthropic products such as Cowork, Claude for Enterprise, and Claude.ai through Model Context Protocol (MCP) integrations with TurboTax, Credit Karma, QuickBooks and Mailchimp.
This builds on Intuit’s previous rollout of Intuit Intelligence, which features specialized AI agents for sales, tax, payroll, accounting and project management. Users can query and interact with their financial data in natural language, automate tasks and generate dynamic reports or KPI scorecards.
“They have the data, they have the interface, and now they’re introducing themselves as an orchestration layer,” Jackson said of moves like this by large SaaS players. “We can be the place where you build your agents and manage them.”
To this point, Tessel calls Intuit “a well-run company” that can react with speed. Her team keeps up with orchestration advancements, reads academic papers and is “constantly learning” about new technologies. “We’re on it,” she said.
Ultimately, companies must be “awake and aware right now,” she emphasized. As she put it: “What’s the pivot of the day? How many times did you pivot? Are you experimenting?”
Zendesk’s Aniano agreed that there are “cool new ways of developing software,” acknowledging that he “lives” inside Claude Code for 90 to 120 minutes of his day. Companies that can make the “mental shift” to building software in new ways can create a level playing field between incumbents and startups.
One thing that’ll be interesting to see is how quickly SaaS providers offer MCP plugins or build their own within their software suites, Jackson noted. “How good will these SaaS providers be at supporting AI interoperability?” he said. “And what ways will they try to create friction or make it harder for enterprises to abandon their interface?”
Traditional software governance often relies on static compliance checklists, quarterly audits and after-the-fact reviews. But this approach can’t keep up with AI systems that change in real time. A machine learning (ML) model might retrain or drift between quarterly operational syncs, meaning that by the time an issue is discovered, hundreds of bad decisions could already have been made, a tangle that can be almost impossible to unwind.
In the fast-paced world of AI, governance must be inline, not an after-the-fact compliance review. In other words, organizations must adopt what I call an “audit loop”: A continuous, integrated compliance process that operates in real-time alongside AI development and deployment, without halting innovation.
This article explains how to implement such continuous AI compliance through shadow-mode rollouts, drift and misuse monitoring, and audit logs engineered for legal defensibility.
When systems moved at the speed of people, it made sense to do compliance checks every so often. But AI doesn’t wait for the next review meeting. The shift to an inline audit loop means audits no longer happen once in a while; they happen all the time. Compliance and risk management should be “baked in” to the AI lifecycle from development through production, rather than tacked on post-deployment. This means establishing live metrics and guardrails that monitor AI behavior as it occurs and raise red flags as soon as something seems off.
For instance, teams can set up drift detectors that automatically alert when a model’s predictions go off course from the training distribution, or when confidence scores fall below acceptable levels. Governance is no longer just a set of quarterly snapshots; it’s a streaming process with alerts that go off in real time when a system goes outside of its defined confidence bands.
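To make this concrete, here is a minimal confidence-band monitor sketched in TypeScript. Every name in it (DriftMonitor, ConfidenceBand, onAlert) is invented for illustration rather than drawn from any particular product, and a production detector would track per-segment distributional statistics rather than a single rolling mean:

```typescript
// Streaming confidence-band monitor sketch. All names are illustrative.
interface ConfidenceBand {
  min: number; // lowest acceptable rolling-mean confidence
  max: number; // highest plausible value (sudden jumps can also signal trouble)
}

class DriftMonitor {
  private scores: number[] = [];

  constructor(
    private band: ConfidenceBand,
    private windowSize: number,
    private onAlert: (msg: string) => void,
  ) {}

  // Call once per prediction with the model's confidence score.
  record(confidence: number): void {
    this.scores.push(confidence);
    if (this.scores.length > this.windowSize) this.scores.shift();
    if (this.scores.length < this.windowSize) return; // warm-up period

    const mean = this.scores.reduce((sum, s) => sum + s, 0) / this.scores.length;
    if (mean < this.band.min || mean > this.band.max) {
      this.onAlert(
        `Rolling mean confidence ${mean.toFixed(3)} left band ` +
          `[${this.band.min}, ${this.band.max}] over last ${this.windowSize} predictions`,
      );
    }
  }
}

// Usage: alert when the rolling mean over 500 predictions leaves [0.70, 0.99].
const monitor = new DriftMonitor({ min: 0.7, max: 0.99 }, 500, (msg) =>
  console.warn(`[governance] ${msg}`),
);
// In the serving path: monitor.record(prediction.confidence);
```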
Cultural shift is equally important: Compliance teams must act less like after-the-fact auditors and more like AI co-pilots. In practice, this might mean compliance and AI engineers working together to define policy guardrails and continuously monitor key indicators. With the right tools and mindset, real-time AI governance can “nudge” and intervene early, helping teams course-correct without slowing down innovation.
In fact, when done well, continuous governance builds trust rather than friction, providing shared visibility into AI operations for both builders and regulators, instead of unpleasant surprises after deployment. The following strategies illustrate how to achieve this balance.
One effective framework for continuous AI compliance is deploying new models or agent features in “shadow mode.” This means a new AI system runs in parallel with the existing system, receiving real production inputs but not influencing real decisions or user-facing outputs. The legacy model or process continues to handle decisions, while the new AI’s outputs are captured only for analysis. This provides a safe sandbox to vet the AI’s behavior under real conditions.
According to global law firm Morgan Lewis: “Shadow-mode operation requires the AI to run in parallel without influencing live decisions until its performance is validated,” giving organizations a safe environment to test changes.
Teams can discover problems early by comparing the shadow model’s decisions to expectations (the current model’s decisions). For instance, when a model is running in shadow mode, they can check to see if its inputs and predictions differ from those of the current production model or the patterns seen in training. Sudden changes could indicate bugs in the data pipeline, unexpected bias or drops in performance.
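A minimal request-handling harness for this pattern might look like the following TypeScript sketch. The Model and Decision types and the logDisagreement callback are invented placeholders; the essential property is that the shadow call never blocks or alters the user-facing result:

```typescript
// Shadow-mode harness sketch: the production model still decides; the
// candidate ("shadow") model sees the same inputs, and disagreements are
// logged for offline review. Types and names are illustrative.
type Decision = { label: string; confidence: number };
type Model = (input: Record<string, unknown>) => Promise<Decision>;

async function handleRequest(
  input: Record<string, unknown>,
  production: Model,
  shadow: Model,
  logDisagreement: (entry: object) => void,
): Promise<Decision> {
  const prodDecision = await production(input);

  // Fire-and-forget: the shadow call must never block or influence
  // the user-facing decision.
  shadow(input)
    .then((shadowDecision) => {
      if (shadowDecision.label !== prodDecision.label) {
        logDisagreement({
          timestamp: new Date().toISOString(),
          input,
          production: prodDecision,
          shadow: shadowDecision,
        });
      }
    })
    .catch((err) => logDisagreement({ shadowError: String(err), input }));

  return prodDecision; // only the legacy/production output is ever used
}
```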
In short, shadow mode is a way to check compliance in real time: It ensures that the model handles inputs correctly and meets policy standards (accuracy, fairness) before it is fully released. One AI security framework showed how this works: Teams first ran the AI in shadow mode (making suggestions but never acting on its own), then compared its suggestions against human decisions to establish trust. Only once the AI proved reliable did they let it suggest actions for human approval.
For instance, Prophet Security eventually let the AI make low-risk decisions on its own. Using phased rollouts gives people confidence that an AI system meets requirements and works as expected, without putting production or customers at risk during testing.
Even after an AI model is fully deployed, the compliance job is never “done.” Over time, AI systems can drift, meaning that their performance or outputs change due to new data patterns, model retraining or bad inputs. They can also be misused or lead to results that go against policy (for example, inappropriate content or biased decisions) in unexpected ways.
To remain compliant, teams must set up monitoring signals and processes to catch these issues as they happen. Traditional SLA monitoring may check only for uptime or latency. In AI monitoring, however, the system must be able to tell when outputs are not what they should be: for example, when a model suddenly starts giving biased or harmful results. This means setting “confidence bands” or quantitative limits for how a model should behave, with automatic alerts when those limits are crossed.
Some signals to monitor include:
Data or concept drift: When input data distributions change significantly or model predictions diverge from training-time patterns. For example, a model’s accuracy on certain segments might drop as the incoming data shifts, a sign to investigate and possibly retrain.
Anomalous or harmful outputs: When outputs trigger policy violations or ethical red flags. An AI content filter might flag if a generative model produces disallowed content, or a bias monitor might detect if decisions for a protected group begin to skew negatively. Contracts for AI services now often require vendors to detect and address such noncompliant results promptly.
User misuse patterns: When unusual usage behavior suggests someone is trying to manipulate or misuse the AI. For instance, rapid-fire queries attempting prompt injection or adversarial inputs could be automatically flagged by the system’s telemetry as potential misuse (a minimal sketch of such a check follows this list).
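That last signal can be approximated with a simple sliding-window rate check. Here is a minimal TypeScript sketch; the class name, thresholds and callback are invented for illustration, and real telemetry would feed richer signals than raw query counts:

```typescript
// Sliding-window misuse detector sketch: flags clients issuing
// rapid-fire queries (a common prompt-injection probing pattern).
// All names and thresholds here are illustrative.
class MisuseDetector {
  private timestamps = new Map<string, number[]>();

  constructor(
    private maxQueries: number, // e.g. 30 queries...
    private windowMs: number,   // ...per 10-second window
    private flag: (clientId: string, count: number) => void,
  ) {}

  recordQuery(clientId: string, now: number = Date.now()): void {
    // Keep only timestamps inside the current window, then add this query.
    const recent = (this.timestamps.get(clientId) ?? []).filter(
      (t) => now - t < this.windowMs,
    );
    recent.push(now);
    this.timestamps.set(clientId, recent);

    if (recent.length > this.maxQueries) {
      this.flag(clientId, recent.length); // escalate: throttle or review
    }
  }
}

// Usage: flag any client exceeding 30 queries in a 10-second window.
const detector = new MisuseDetector(30, 10_000, (id, n) =>
  console.warn(`[misuse] client ${id}: ${n} queries in window`),
);
```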
When a drift or misuse signal crosses a critical threshold, the system should support “intelligent escalation” rather than waiting for a quarterly review. In practice, this could mean triggering an automated mitigation or immediately alerting a human overseer. Leading organizations build in fail-safes like kill-switches, or the ability to suspend an AI’s actions the moment it behaves unpredictably or unsafely.
For example, a service contract might allow a company to instantly pause an AI agent if it’s outputting suspect results, even if the AI provider hasn’t acknowledged a problem. Likewise, teams should have playbooks for rapid model rollback or retraining windows: If drift or errors are detected, there’s a plan to retrain the model (or revert to a safe state) within a defined timeframe. This kind of agile response is crucial; it recognizes that AI behavior may drift or degrade in ways that cannot be fixed with a simple patch, so swift retraining or tuning is part of the compliance loop.
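In code, a kill-switch can be as simple as a guarded state flag that the monitoring pipeline can trip. Below is a minimal sketch with invented names (KillSwitch, notifyHuman); the point is the ordering: suspend first, page a human second:

```typescript
// Kill-switch sketch: crossing a critical threshold suspends the agent
// immediately, then notifies a human. Names and thresholds are illustrative.
type AgentState = "running" | "suspended";

class KillSwitch {
  state: AgentState = "running";

  constructor(
    private criticalThreshold: number,
    private notifyHuman: (msg: string) => void,
  ) {}

  // Called by the monitoring pipeline with, e.g., a drift score or a
  // policy-violation rate.
  check(metricName: string, value: number): void {
    if (this.state === "running" && value >= this.criticalThreshold) {
      this.state = "suspended"; // stop serving before anyone acknowledges
      this.notifyHuman(
        `${metricName}=${value} crossed ${this.criticalThreshold}; ` +
          `agent suspended pending rollback or retraining`,
      );
    }
  }
}

// Usage: suspend and page on-call if the violation rate hits 5%.
const guard = new KillSwitch(0.05, (msg) => console.error(`[PAGE ON-CALL] ${msg}`));
guard.check("policy_violation_rate", 0.08); // trips the switch
```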
By continuously monitoring and reacting to drift and misuse signals, companies transform compliance from a periodic audit to an ongoing safety net. Issues are caught and addressed in hours or days, not months. The AI stays within acceptable bounds, and governance keeps pace with the AI’s own learning and adaptation, rather than trailing behind it. This not only protects users and stakeholders; it gives regulators and executives peace of mind that the AI is under constant watchful oversight, even as it evolves.
Continuous compliance also means continuously documenting what your AI is doing and why. Robust audit logs demonstrate compliance, both for internal accountability and external legal defensibility. However, AI demands more than simplistic logging. Imagine an auditor or regulator asking: “Why did the AI make this decision, and did it follow approved policy?” Your logs should be able to answer that.
A good AI audit log keeps a permanent, detailed record of every important action and decision AI makes, along with the reasons and context. Legal experts say these logs “provide detailed, unchangeable records of AI system actions with exact timestamps and written reasons for decisions.” They are important evidence in court. This means that every important inference, suggestion or independent action taken by AI should be recorded with metadata, such as timestamps, the model/version used, the input received, the output produced and (if possible) the reasoning or confidence behind that output.
Modern compliance platforms stress logging not only the result (“X action taken”) but also the rationale (“X action taken because conditions Y and Z were met according to policy”). These enhanced logs let an auditor see, for example, not just that an AI approved a user’s access, but that it was approved “based on continuous usage and alignment with the user’s peer group,” according to Attorney Aaron Hall.
Audit logs must also be well-organized and tamper-resistant if they are to be legally sound. Techniques like immutable storage or cryptographic hashing ensure that records can’t be altered after the fact. Log data should be protected by access controls and encryption so that sensitive information, such as security keys and personal data, stays hidden while the logs themselves remain reviewable.
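One well-known way to make logs tamper-evident is a hash chain: Each record stores a hash of its own contents plus the previous record's hash, so editing any entry invalidates every record after it. Below is a self-contained TypeScript (Node.js) sketch; the field names are illustrative, and a real system would write to append-only, access-controlled storage rather than an in-memory array:

```typescript
import { createHash } from "node:crypto";

// Hash-chained audit log sketch. Field names are illustrative.
interface AuditRecord {
  timestamp: string;
  modelVersion: string;
  input: string;
  output: string;
  rationale: string; // why the action was taken, per policy
  prevHash: string;  // hash of the previous record ("GENESIS" for the first)
  hash: string;      // hash of this record's contents plus prevHash
}

function appendRecord(
  log: AuditRecord[],
  entry: Omit<AuditRecord, "prevHash" | "hash">,
): AuditRecord {
  const prevHash = log.length ? log[log.length - 1].hash : "GENESIS";
  const hash = createHash("sha256")
    .update(JSON.stringify({ ...entry, prevHash }))
    .digest("hex");
  const record = { ...entry, prevHash, hash };
  log.push(record);
  return record;
}

// Verification walks the chain; a single tampered field breaks every
// hash from that point forward.
function verify(log: AuditRecord[]): boolean {
  return log.every((rec, i) => {
    const expectedPrev = i === 0 ? "GENESIS" : log[i - 1].hash;
    const { hash, prevHash, ...contents } = rec;
    const expectedHash = createHash("sha256")
      .update(JSON.stringify({ ...contents, prevHash: expectedPrev }))
      .digest("hex");
    return prevHash === expectedPrev && hash === expectedHash;
  });
}
```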
In regulated industries, keeping these logs shows examiners that you are not only tracking the AI’s outputs but retaining records for review. Regulators increasingly expect companies to show more than a pre-release check; they want to see that the AI is monitored continuously and that there is a forensic trail to analyze its behavior over time. This evidentiary backbone comes from complete audit trails that include data inputs, model versions and decision outputs. They make AI less of a “black box” and more of a system that can be tracked and held accountable.
If there is a dispute or an incident (for example, an AI made a biased choice that hurt a customer), these logs are your legal lifeline. They help you figure out what went wrong: Was it a data problem, model drift or misuse? Who was in charge of the process? Did we stick to the rules we set?
Well-kept AI audit logs show that the company did its homework and had controls in place. This not only lowers legal risk but builds trust: Teams and executives can stand behind each AI decision because it is documented, transparent and accountable.
Implementing an “audit loop” of continuous AI compliance might sound like extra work, but in reality, it enables faster and safer AI delivery. By integrating governance into each stage of the AI lifecycle, from shadow mode trial runs to real-time monitoring to immutable logging, organizations can move quickly and responsibly. Issues are caught early, so they don’t snowball into major failures that require project-halting fixes later. Developers and data scientists can iterate on models without endless back-and-forth with compliance reviewers, because many compliance checks are automated and happen in parallel.
Rather than slowing down delivery, this approach often accelerates it: Teams spend less time on reactive damage control or lengthy audits, and more time on innovation because they are confident that compliance is under control in the background.
There are bigger benefits to continuous AI compliance, too. It gives end-users, business leaders and regulators a reason to believe that AI systems are being handled responsibly. When every AI decision is clearly recorded, watched and checked for quality, stakeholders are much more likely to accept AI solutions. This trust benefits the whole industry and society, not just individual businesses.
An audit-loop governance model can stop AI failures before they compound and keep AI behavior in line with ethical and legal standards. Strong AI governance benefits the economy and the public because it encourages both innovation and protection, unlocking AI’s potential in critical areas like finance, healthcare and infrastructure without putting safety or values at risk. And as national and international AI standards evolve quickly, U.S. companies that lead with continuous compliance will be at the forefront of trustworthy AI.
People say that if your AI governance isn’t keeping up with your AI, it’s not really governance; it’s “archaeology.” Forward-thinking companies are realizing this and adopting audit loops. By doing so, they not only avoid problems but make compliance a competitive advantage, ensuring that faster delivery and better oversight go hand in hand.
Dhyey Mavani is working to accelerate gen AI and computational mathematics.
Editor’s note: The opinions expressed in this article are the authors’ personal opinions and do not reflect the opinions of their employers.
Typically, when building, training and deploying AI, enterprises prioritize accuracy. And that, no doubt, is important; but in highly complex, nuanced industries like law, accuracy alone isn’t enough. Higher stakes mean higher standards: Model outputs must be assessed for relevancy, authority, citation accuracy and hallucination rates.
To tackle this immense task, LexisNexis has evolved beyond standard retrieval-augmented generation (RAG) to graph RAG and agentic graphs; it has also built out “planner” and “reflection” AI agents that parse requests and critique their own outputs.
“There’s no such [thing] as ‘perfect AI’ because you never get 100% accuracy or 100% relevancy, especially in complex, high-stakes domains like legal,” Min Chen, LexisNexis’ SVP and chief AI officer, acknowledged in a new VentureBeat Beyond the Pilot podcast.
The goal is to manage that uncertainty as much as possible and translate it into consistent customer value. “At the end of the day, what matters most for us is the quality of the AI outcome, and that is a continuous journey of experimentation, iteration and improvement,” Chen said.
To evaluate models and their outputs, Chen’s team has established more than a half-dozen “sub metrics” to measure “usefulness” based on several factors — authority, citation accuracy, hallucination rates — as well as “comprehensiveness.” This particular metric is designed to evaluate whether a gen AI response fully addresses all aspects of a user’s legal question.
“So it’s not just about relevancy,” Chen said. “Completeness speaks directly to legal reliability.”
For instance, a user may ask a question that requires an answer covering five distinct legal considerations. Gen AI may provide a response that accurately addresses three of these. But, while relevant, this partial answer is incomplete and, from a user perspective, insufficient. This can be misleading and pose real-life risks.
Or, for example, some citations may be semantically relevant to a user’s question, but they may point to arguments or instances that were ultimately overruled in court. “Our lawyers will consider them not citable,” Chen said. “If they’re not citable, they’re not useful.”
LexisNexis launched its flagship gen AI product, Lexis+ AI — a legal AI tool for drafting, research and analysis — in 2023. It was built on a standard RAG framework and hybrid vector search that grounds responses in LexisNexis’ trusted, authoritative knowledge base.
The company then released its personal legal assistant, Protégé, in 2024. This agent incorporates a knowledge graph layer on top of vector search to overcome a “key limitation” of pure semantic search. Although “very good” at retrieving contextually relevant content, semantic search “doesn’t always guarantee authoritative answers,” Chen said.
Initial semantic search returns what it deems relevant content; Chen’s team then traverses those returns across a “point of law” graph to further filter the most highly authoritative documents.
Going beyond this, Chen’s team is developing agentic graphs and accelerating automation so agents can plan and execute complex multi-step tasks.
For instance, self-directed “planner agents” for research Q&A break user questions into multiple sub-questions. Human users can review and edit these to further refine and personalize final answers. Meanwhile, a “reflection agent” handles transactional document drafting. It can “automatically, dynamically” critique its initial draft, then incorporate that feedback and refine in real time.
However, Chen said that all of this is not to cut humans out of the mix; human experts and AI agents can “learn, reason and grow together.” “I see the future [as] a deeper collaboration between humans and AI.”
Watch the podcast to hear more about:
How LexisNexis’ acquisition of Henchman helped ground AI models with proprietary LexisNexis data and customer data;
The difference between deterministic and non-deterministic evaluation;
Why enterprises should identify KPIs and definitions of success before rushing to experimentation;
The importance of focusing on a “triangle” of key components: Cost, speed and quality.
You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.
From miles away across the desert, the Great Pyramid looks like perfect, smooth geometry — a sleek triangle pointing to the stars. Stand at the base, however, and the illusion of smoothness vanishes. You see massive, jagged blocks of limestone. It is not a slope; it is a staircase.
Remember this the next time you hear futurists talking about exponential growth.
Intel co-founder Gordon Moore famously predicted in 1965 that the transistor count on a microchip would double every year, the observation now known as Moore’s Law. Another Intel executive, David House, later revised this to “compute power doubling every 18 months.” For a while, Intel’s CPUs were the poster child of this law. That is, until growth in CPU performance flattened out like a block of limestone.
If you zoom out, though, the next limestone block was already there — the growth in compute merely shifted from CPUs to the world of GPUs. Jensen Huang, Nvidia’s CEO, played a long game and came out a strong winner, building his own stepping stones initially with gaming, then computer vision and, recently, generative AI.
Technology growth is full of sprints and plateaus, and gen AI is not immune. The current wave is driven by the transformer architecture. To quote Anthropic CEO and co-founder Dario Amodei: “The exponential continues until it doesn’t. And every year we’ve been like, ‘Well, this can’t possibly be the case that things will continue on the exponential’ — and then every year it has.”
But just as the CPU plateaued and GPUs took the lead, we are seeing signs that LLM growth is shifting paradigms again. For example, late in 2024, DeepSeek surprised the world by training a world-class model on an impossibly small budget, in part by using a mixture-of-experts (MoE) architecture.
Do you remember where you recently saw this technique mentioned? Nvidia’s Rubin press release: The technology includes “…the latest generations of Nvidia NVLink interconnect technology… to accelerate agentic AI, advanced reasoning and massive-scale MoE model inference at up to 10x lower cost per token.”
Jensen knows that achieving that coveted exponential growth in compute doesn’t come from pure brute force anymore. Sometimes you need to shift the architecture entirely to place the next stepping stone.
This long introduction brings us to Groq.
The biggest gains in AI reasoning capabilities in 2025 were driven by “inference time compute” — or, in lay terms, “letting the model think for a longer period of time.” But time is money. Consumers and businesses do not like waiting.
Groq comes into play here with its lightning-speed inference. If you bring together the architectural efficiency of models like DeepSeek and the sheer throughput of Groq, you get frontier intelligence at your fingertips. By executing inference faster, you can “out-reason” competitive models, offering a “smarter” system to customers without the penalty of lag.
For the last decade, the GPU has been the universal hammer for every AI nail. You use H100s to train the model; you use H100s (or trimmed-down versions) to run the model. But as models shift toward “System 2” thinking — where the AI reasons, self-corrects and iterates before answering — the computational workload changes.
Training requires massive parallel brute force. Inference, especially for reasoning models, requires fast sequential processing: The system must generate tokens almost instantly to sustain complex chains of thought without the user waiting minutes for an answer. Groq’s LPU (Language Processing Unit) architecture removes the memory-bandwidth bottleneck that plagues GPUs during small-batch inference.
For the C-Suite, this potential convergence solves the “thinking time” latency crisis. Consider the expectations from AI agents: We want them to autonomously book flights, code entire apps and research legal precedent. To do this reliably, a model might need to generate 10,000 internal “thought tokens” to verify its own work before it outputs a single word to the user.
On a standard GPU: 10,000 thought tokens might take 20 to 40 seconds. The user gets bored and leaves.
On Groq: That same chain of thought happens in less than 2 seconds.
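The back-of-envelope arithmetic behind those illustrative figures: 10,000 tokens in 20 to 40 seconds works out to roughly 250 to 500 tokens per second, while finishing the same chain in under 2 seconds implies a sustained rate above 5,000 tokens per second, an order-of-magnitude gap in sequential generation speed.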
If Nvidia integrates Groq’s technology, they solve the “waiting for the robot to think” problem. They preserve the magic of AI. Just as they moved from rendering pixels (gaming) to rendering intelligence (gen AI), they would now move to rendering reasoning in real-time.
Furthermore, this creates a formidable software moat. Groq’s biggest hurdle has always been the software stack; Nvidia’s biggest asset is CUDA. If Nvidia wraps its ecosystem around Groq’s hardware, they effectively dig a moat so wide that competitors cannot cross it. They would offer the universal platform: The best environment to train and the most efficient environment to run (Groq/LPU).
Consider what happens when you couple that raw inference power with a next-generation open-source model (like the rumored DeepSeek 4): You get an offering that would rival today’s frontier models in cost, performance and speed. That opens up opportunities for Nvidia, from directly entering the inference business with its own cloud offering to powering a fast-expanding base of customers whose own workloads are growing exponentially.
Returning to our opening metaphor: The “exponential” growth of AI is not a smooth line of raw FLOPs; it is a staircase of bottlenecks being smashed.
Block 1: We couldn’t calculate fast enough. Solution: The GPU.
Block 2: We couldn’t train deep enough. Solution: Transformer architecture.
Block 3: We can’t “think” fast enough. Solution: Groq’s LPU.
Jensen Huang has never been afraid to cannibalize his own product lines to own the future. By validating Groq, Nvidia wouldn’t just be buying a faster chip; they would be bringing next-generation intelligence to the masses.
Andrew Filev, founder and CEO of Zencoder
When an AI agent visits a website, it’s essentially a tourist who doesn’t speak the local language. Whether built on LangChain, Claude Code, or the increasingly popular OpenClaw framework, the agent is reduced to guessing which buttons to press: scraping raw HTML, firing off screenshots to multimodal models, and burning through thousands of tokens just to figure out where a search bar is.
That era may be ending. Earlier this week, the Google Chrome team launched WebMCP — Web Model Context Protocol — as an early preview in Chrome 146 Canary. WebMCP, which was developed jointly by engineers at Google and Microsoft and incubated through the W3C’s Web Machine Learning community group, is a proposed web standard that lets any website expose structured, callable tools directly to AI agents through a new browser API: navigator.modelContext.
The implications for enterprise IT are significant. Instead of building and maintaining separate back-end MCP servers in Python or Node.js to connect their web applications to AI platforms, development teams can now wrap their existing client-side JavaScript logic into agent-readable tools — without re-architecting a single page.
The cost and reliability issues with current approaches to web-agent interaction (browser agents) are well understood by anyone who has deployed them at scale. The two dominant methods — visual screen-scraping and DOM parsing — both suffer from fundamental inefficiencies that directly affect enterprise budgets.
With screenshot-based approaches, agents pass images into multimodal models (like Claude and Gemini) and hope the model can identify not only what is on the screen, but where buttons, form fields, and interactive elements are located. Each image consumes thousands of tokens and adds significant latency. With DOM-based approaches, agents ingest raw HTML and JavaScript — a foreign language full of tags, CSS rules, and structural markup that is irrelevant to the task at hand but still consumes context-window space and inference cost.
In both cases, the agent is translating between what the website was designed for (human eyes) and what the model needs (structured data about available actions). A single product search that a human completes in seconds can require dozens of sequential agent interactions — clicking filters, scrolling pages, parsing results — each one an inference call that adds latency and cost.
WebMCP proposes two complementary APIs that serve as a bridge between websites and AI agents.
The Declarative API handles standard actions that can be defined directly in existing HTML forms. For organizations with well-structured forms already in production, this pathway requires minimal additional work; by adding tool names and descriptions to existing form markup, developers can make those forms callable by agents. If your HTML forms are already clean and well-structured, you are probably already 80% of the way there.
The Imperative API handles more complex, dynamic interactions that require JavaScript execution. This is where developers define richer tool schemas — conceptually similar to the tool definitions sent to the OpenAI or Anthropic API endpoints, but running entirely client-side in the browser. Through the registerTool() method, a website can expose functions like searchProducts(query, filters) or orderPrints(copies, page_size) with full parameter schemas and natural language descriptions.
The key insight is that a single tool call through WebMCP can replace what might have been dozens of browser-use interactions. An e-commerce site that registers a searchProducts tool lets the agent make one structured function call and receive structured JSON results, rather than having the agent click through filter dropdowns, scroll through paginated results, and screenshot each page.
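As a rough illustration, registering such a tool might look like the TypeScript sketch below. The field names and callback shape are assumptions based on the preview's description (the API is still behind a flag in Chrome 146 Canary), and siteSearch() stands in for whatever client-side search logic the page already has; consult the Chrome Early Preview Program documentation for the authoritative signature:

```typescript
// Hypothetical WebMCP tool registration. The exact API surface (field
// names, execute() callback) is an assumption, not confirmed spec.

// Stand-in for the page's existing client-side search logic.
declare function siteSearch(
  query: string,
  filters?: Record<string, unknown>,
): Promise<object[]>;

// navigator.modelContext is not yet in TypeScript's DOM typings.
const modelContext = (navigator as any).modelContext;

modelContext?.registerTool({
  name: "searchProducts",
  description:
    "Search the product catalog and return structured JSON results, " +
    "so agents don't have to drive filter dropdowns or parse pages.",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Free-text search terms" },
      filters: {
        type: "object",
        description: "Optional facets, e.g. { color: 'green', maxPrice: 120 }",
      },
    },
    required: ["query"],
  },
  async execute({ query, filters }: { query: string; filters?: Record<string, unknown> }) {
    // Reuse the page's existing JavaScript instead of standing up a backend.
    const products = await siteSearch(query, filters);
    return { products };
  },
});
```

One structured call like this replaces the click-scroll-screenshot loop described above, which is exactly the token math that makes WebMCP attractive.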
For IT decision makers evaluating agentic AI deployments, WebMCP addresses three persistent pain points simultaneously.
Cost reduction is the most immediately quantifiable benefit. By replacing sequences of screenshot captures, multimodal inference calls, and iterative DOM parsing with single structured tool calls, organizations can expect significant reductions in token consumption.
Reliability improves because agents are no longer guessing about page structure. When a website explicitly publishes a tool contract — “here are the functions I support, here are their parameters, here is what they return” — the agent operates with certainty rather than inference. Failed interactions due to UI changes, dynamic content loading, or ambiguous element identification are largely eliminated for any interaction covered by a registered tool.
Development velocity accelerates because web teams can leverage their existing front-end JavaScript rather than standing up separate backend infrastructure. The specification emphasizes that any task a user can accomplish through a page’s UI can be made into a tool by reusing much of the page’s existing JavaScript code. Teams do not need to learn new server frameworks or maintain separate API surfaces for agent consumers.
A critical architectural decision separates WebMCP from the fully autonomous agent paradigm that has dominated recent headlines. The standard is explicitly designed around cooperative, human-in-the-loop workflows — not unsupervised automation.
According to Khushal Sagar, a staff software engineer for Chrome, the WebMCP specification identifies three pillars that underpin this philosophy.
Context: All the data agents need to understand what the user is doing, including content that is often not currently visible on screen.
Capabilities: Actions the agent can take on the user’s behalf, from answering questions to filling out forms.
Coordination: Controlling the handoff between user and agent when the agent encounters situations it cannot resolve autonomously.
The specification’s authors at Google and Microsoft illustrate this with a shopping scenario: a user named Maya asks her AI assistant to help find an eco-friendly dress for a wedding. The agent suggests vendors, opens a browser to a dress site, and discovers the page exposes WebMCP tools like getDresses() and showDresses(). When Maya’s criteria go beyond the site’s basic filters, the agent calls those tools to fetch product data, uses its own reasoning to filter for “cocktail-attire appropriate,” and then calls showDresses() to update the page with only the relevant results. It’s a fluid loop of human taste and agent capability, exactly the kind of collaborative browsing that WebMCP is designed to enable.
This is not a headless browsing standard. The specification explicitly states that headless and fully autonomous scenarios are non-goals. For those use cases, the authors point to existing protocols like Google’s Agent-to-Agent (A2A) protocol. WebMCP is about the browser — where the user is present, watching, and collaborating.
WebMCP is not a replacement for Anthropic’s Model Context Protocol, despite sharing a conceptual lineage and a portion of its name. It does not follow the JSON-RPC specification that MCP uses for client-server communication. Where MCP operates as a back-end protocol connecting AI platforms to service providers through hosted servers, WebMCP operates entirely client-side within the browser.
The relationship is complementary. A travel company might maintain a back-end MCP server for direct API integrations with AI platforms like ChatGPT or Claude, while simultaneously implementing WebMCP tools on its consumer-facing website so that browser-based agents can interact with its booking flow in the context of a user’s active session. The two standards serve different interaction patterns without conflict.
The distinction matters for enterprise architects. Back-end MCP integrations are appropriate for service-to-service automation where no browser UI is needed. WebMCP is appropriate when the user is present and the interaction benefits from shared visual context — which describes the majority of consumer-facing web interactions that enterprises care about.
WebMCP is currently available in Chrome 146 Canary behind the “WebMCP for testing” flag at chrome://flags. Developers can join the Chrome Early Preview Program for access to documentation and demos. Other browsers have not yet announced implementation timelines, though Microsoft’s active co-authorship of the specification suggests Edge support is likely.
Industry observers expect formal browser announcements by mid-to-late 2026, with Google Cloud Next and Google I/O as probable venues for broader rollout announcements. The specification is transitioning from community incubation within the W3C to a formal draft — a process that historically takes months but signals serious institutional commitment.
The comparison that Sagar has drawn is instructive: WebMCP aims to become the USB-C of AI agent interactions with the web. A single, standardized interface that any agent can plug into, replacing the current tangle of bespoke scraping strategies and fragile automation scripts.
Whether that vision is realized depends on adoption — by both browser vendors and web developers. But with Google and Microsoft jointly shipping code, the W3C providing institutional scaffolding, and Chrome 146 already running the implementation behind a flag, WebMCP has cleared the most difficult hurdle any web standard faces: getting from proposal to working software.
Lowering the cost of inference typically takes a combination of hardware and software. A new analysis released Thursday by Nvidia details how four leading inference providers are reporting 4x to 10x reductions in cost per token.
The dramatic cost reductions were achieved using Nvidia’s Blackwell platform with open-source models. Production deployment data from Baseten, DeepInfra, Fireworks AI and Together AI shows significant cost improvements across healthcare, gaming, agentic chat, and customer service as enterprises scale AI from pilot projects to millions of users.
The 4x to 10x cost reductions reported by inference providers required combining Blackwell hardware with two other elements: optimized software stacks and switching from proprietary to open-source models that now match frontier-level intelligence. Hardware improvements alone delivered 2x gains in some deployments, according to the analysis. Reaching larger cost reductions required adopting low-precision formats like NVFP4 and moving away from closed source APIs that charge premium rates.
The economics are counterintuitive: Reducing inference costs requires investing in higher-performance infrastructure, because throughput improvements translate directly into lower per-token costs.
“Performance is what drives down the cost of inference,” Dion Harris, senior director of HPC and AI hyperscaler solutions at Nvidia, told VentureBeat in an exclusive interview. “What we’re seeing in inference is that throughput literally translates into real dollar value and driving down the cost.”
Nvidia detailed four customer deployments in a blog post showing how the combination of Blackwell infrastructure, optimized software stacks and open-source models delivers cost reductions across different industry workloads. The case studies span high-volume applications where inference economics directly determines business viability.
Sully.ai cut healthcare AI inference costs by 90% (a 10x reduction) while improving response times 65% by switching from proprietary models to open-source models running on Baseten’s Blackwell-powered platform, according to Nvidia. The company returned over 30 million minutes to physicians by automating medical coding and note-taking tasks that previously required manual data entry.
Nvidia also reported that Latitude reduced gaming inference costs 4x for its AI Dungeon platform by running large mixture-of-experts (MoE) models on DeepInfra’s Blackwell deployment. Cost per million tokens dropped from 20 cents on Nvidia’s previous Hopper platform to 10 cents on Blackwell, then to 5 cents after adopting Blackwell’s native NVFP4 low-precision format. Hardware alone delivered 2x improvement, but reaching 4x required the precision format change.
Sentient Foundation achieved 25% to 50% better cost efficiency for its agentic chat platform using Fireworks AI’s Blackwell-optimized inference stack, according to Nvidia. The platform orchestrates complex multi-agent workflows and processed 5.6 million queries in a single week during its viral launch while maintaining low latency.
Nvidia said Decagon saw a 6x cost reduction per query for AI-powered voice customer support by running its multimodel stack on Together AI’s Blackwell infrastructure. Response times stayed under 400 milliseconds even when processing thousands of tokens per query, which is critical for voice interactions where delays cause users to hang up or lose trust.
The range from 4x to 10x cost reductions across deployments reflects different combinations of technical optimizations rather than just hardware differences. Three factors emerge as primary drivers: precision format adoption, model architecture choices, and software stack integration.
Precision formats show the clearest impact. Latitude’s case demonstrates this directly. Moving from Hopper to Blackwell delivered 2x cost reduction through hardware improvements. Adopting NVFP4, Blackwell’s native low-precision format, doubled that improvement to 4x total. NVFP4 reduces the number of bits required to represent model weights and activations, allowing more computation per GPU cycle while maintaining accuracy. The format works particularly well for MoE models where only a subset of the model activates for each inference request.
Model architecture matters. MoE models, which activate different specialized sub-models based on input, benefit from Blackwell’s NVLink fabric that enables rapid communication between experts. “Having those experts communicate across that NVLink fabric allows you to reason very quickly,” Harris said. Dense models that activate all parameters for every inference don’t leverage this architecture as effectively.
Software stack integration creates additional performance deltas. Harris said that Nvidia’s co-design approach — where Blackwell hardware, NVL72 scale-up architecture, and software like Dynamo and TensorRT-LLM are optimized together — also makes a difference. Baseten’s deployment for Sully.ai used this integrated stack, combining NVFP4, TensorRT-LLM and Dynamo to achieve the 10x cost reduction. Providers running alternative frameworks like vLLM may see lower gains.
Workload characteristics matter. Reasoning models show particular advantages on Blackwell because they generate significantly more tokens to reach better answers. The platform’s ability to process these extended token sequences efficiently through disaggregated serving, where context prefill and token generation are handled separately, makes reasoning workloads cost-effective.
Teams evaluating potential cost reductions should examine their workload profiles against these factors. High token generation workloads using mixture-of-experts models with the integrated Blackwell software stack will approach the 10x range. Lower token volumes using dense models on alternative frameworks will land closer to 4x.
While these case studies focus on Nvidia Blackwell deployments, enterprises have multiple paths to reducing inference costs. AMD’s MI300 series, Google TPUs, and specialized inference accelerators from Groq and Cerebras offer alternative architectures. Cloud providers also continue optimizing their inference services. The question isn’t whether Blackwell is the only option but whether the specific combination of hardware, software and models fits particular workload requirements.
Enterprises considering Blackwell-based inference should start by calculating whether their workloads justify infrastructure changes.
“Enterprises need to work back from their workloads and use case and cost constraints,” Shruti Koparkar, AI product marketing at Nvidia, told VentureBeat.
The deployments achieving 6x to 10x improvements all involved high-volume, latency-sensitive applications processing millions of requests monthly. Teams running lower volumes or applications with latency budgets exceeding one second should explore software optimization or model switching before considering infrastructure upgrades.
Testing matters more than provider specifications. Koparkar emphasized that providers publish throughput and latency metrics, but these represent ideal conditions.
“If it’s a highly latency-sensitive workload, they might want to test a couple of providers and see who meets the minimum they need while keeping the cost down,” she said. Teams should run actual production workloads across multiple Blackwell providers to measure real performance under their specific usage patterns and traffic spikes rather than relying on published benchmarks.
The staged approach Latitude used provides a model for evaluation. The company first moved to Blackwell hardware and measured 2x improvement, then adopted NVFP4 format to reach 4x total reduction. Teams currently on Hopper or other infrastructure can test whether precision format changes and software optimization on existing hardware capture meaningful savings before committing to full infrastructure migrations. Running open source models on current infrastructure might deliver half the potential cost reduction without new hardware investments.
Provider selection requires understanding software stack differences. While multiple providers offer Blackwell infrastructure, their software implementations vary. Some run Nvidia’s integrated stack using Dynamo and TensorRT-LLM, while others use frameworks like vLLM. Harris acknowledges performance deltas exist between these configurations. Teams should evaluate what each provider actually runs and how it matches their workload requirements rather than assuming all Blackwell deployments perform identically.
The economic equation extends beyond cost per token. Specialized inference providers like Baseten, DeepInfra, Fireworks and Together offer optimized deployments but require managing additional vendor relationships. Managed services from AWS, Azure or Google Cloud may have higher per-token costs but lower operational complexity. Teams should calculate total cost including operational overhead, not just inference pricing, to determine which approach delivers better economics for their specific situation.
A team of researchers led by Nvidia has released DreamDojo, a new AI system designed to teach robots how to interact with the physical world by watching tens of thousands of hours of human video — a development that could significantly reduce the time and cost required to train the next generation of humanoid machines.
The research, published this month and involving collaborators from UC Berkeley, Stanford, the University of Texas at Austin, and several other institutions, introduces what the team calls “the first robot world model of its kind that demonstrates strong generalization to diverse objects and environments after post-training.”
At the core of DreamDojo is what the researchers describe as “a large-scale video dataset” comprising “44k hours of diverse human egocentric videos, the largest dataset to date for world model pretraining.” The dataset, called DreamDojo-HV, is a dramatic leap in scale — “15x longer duration, 96x more skills, and 2,000x more scenes than the previously largest dataset for world model training,” according to the project documentation.
The system operates in two distinct phases. First, DreamDojo “acquires comprehensive physical knowledge from large-scale human datasets by pre-training with latent actions.” Then it undergoes “post-training on the target embodiment with continuous robot actions” — essentially learning general physics from watching humans, then fine-tuning that knowledge for specific robot hardware.
For enterprises considering humanoid robots, this approach addresses a stubborn bottleneck. Teaching a robot to manipulate objects in unstructured environments traditionally requires massive amounts of robot-specific demonstration data — expensive and time-consuming to collect. DreamDojo sidesteps this problem by leveraging existing human video, allowing robots to learn from observation before ever touching a physical object.
One of the technical breakthroughs is speed. Through a distillation process, the researchers achieved “real-time interactions at 10 FPS for over 1 minute” — a capability that enables practical applications like live teleoperation and on-the-fly planning. The team demonstrated the system working across multiple robot platforms, including the GR-1, G1, AgiBot, and YAM humanoid robots, showing what they call “realistic action-conditioned rollouts” across “a wide range of environments and object interactions.”
The release comes at a pivotal moment for Nvidia’s robotics ambitions — and for the broader AI industry. At the World Economic Forum in Davos last month, CEO Jensen Huang declared that AI robotics represents a “once-in-a-generation” opportunity, particularly for regions with strong manufacturing bases. According to Digitimes, Huang has also stated that the next decade will be “a critical period of accelerated development for robotics technology.”
The financial stakes are enormous. Huang told CNBC’s “Halftime Report” on February 6 that the tech industry’s capital expenditures — potentially reaching $660 billion this year from major hyperscalers — are “justified, appropriate and sustainable.” He characterized the current moment as “the largest infrastructure buildout in human history,” with companies like Meta, Amazon, Google, and Microsoft dramatically increasing their AI spending.
That infrastructure push is already reshaping the robotics landscape. Robotics startups raised a record $26.5 billion in 2025, according to data from Dealroom. European industrial giants including Siemens, Mercedes-Benz, and Volvo have announced robotics partnerships in the past year, while Tesla CEO Elon Musk has claimed that 80 percent of his company’s future value will come from its Optimus humanoid robots.
For technical decision-makers evaluating humanoid robots, DreamDojo’s most immediate value may lie in its simulation capabilities. The researchers highlight downstream applications including “reliable policy evaluation without real-world deployment and model-based planning for test-time improvement” — capabilities that could let companies simulate robot behavior extensively before committing to costly physical trials.
This matters because the gap between laboratory demonstrations and factory floors remains significant. A robot that performs flawlessly in controlled conditions often struggles with the unpredictable variations of real-world environments — different lighting, unfamiliar objects, unexpected obstacles. By training on 44,000 hours of diverse human video spanning thousands of scenes and nearly 100 distinct skills, DreamDojo aims to build the kind of general physical intuition that makes robots adaptable rather than brittle.
The research team, led by Linxi “Jim” Fan, Joel Jang, and Yuke Zhu, with Shenyuan Gao and William Liang as co-first authors, has indicated that code will be released publicly, though a timeline was not specified.
Whether DreamDojo translates into commercial robotics products remains to be seen. But the research signals where Nvidia’s ambitions are heading as the company increasingly positions itself beyond its gaming roots. As Kyle Barr observed at Gizmodo earlier this month, Nvidia now views “anything related to gaming and the ‘personal computer'” as “outliers on Nvidia’s quarterly spreadsheets.”
The shift reflects a calculated bet: that the future of computing is physical, not just digital. Nvidia has already invested $10 billion in Anthropic and signaled plans to invest heavily in OpenAI’s next funding round. DreamDojo suggests the company sees humanoid robots as the next frontier where its AI expertise and chip dominance can converge.
For now, the 44,000 hours of human video at the heart of DreamDojo represent something more fundamental than a technical benchmark. They represent a theory — that robots can learn to navigate our world by watching us live in it. The machines, it turns out, have been taking notes.