NanoClaw and Docker partner to make sandboxes the safest way for enterprises to deploy AI agents

NanoClaw, the open-source AI agent platform created by Gavriel Cohen, is partnering with the containerized development platform Docker to let teams run agents inside Docker Sandboxes, a move aimed at one of the biggest obstacles to enterprise adoption:…

The team behind continuous batching says your idle GPUs should be running inference, not sitting dark

Every GPU cluster has dead time. Training jobs finish, workloads shift and hardware sits dark while power and cooling costs keep running. For neocloud operators, those empty cycles are lost margin.

The obvious workaround is spot GPU markets — renting spare capacity to whoever needs it. But spot instances mean the cloud vendor is still the one doing the renting, and engineers buying that capacity are still paying for raw compute with no inference stack attached.

FriendliAI’s answer is different: run inference directly on the unused hardware, optimize for token throughput, and split the revenue with the operator. FriendliAI was founded by Byung-Gon Chun, the researcher whose paper on continuous batching became foundational to vLLM, the open source inference engine used across most production deployments today.

Chun spent over a decade as a professor at Seoul National University studying efficient execution of machine learning models at scale. That research produced a paper called Orca, which introduced continuous batching. The technique processes inference requests dynamically rather than waiting to fill a fixed batch before executing. It is now an industry standard and is the core mechanism inside vLLM.
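To make the difference concrete, here is a minimal toy simulation, not Orca's or vLLM's actual scheduler. It assumes each request needs a known number of decode steps and that the GPU can hold a fixed number of requests at once; the function names and the request lengths are hypothetical. Real schedulers also deal with prefill, memory limits and request arrival times, which this sketch ignores.

```python
from collections import deque

def static_batching_steps(lengths, batch_size):
    """Decode steps when each fixed batch runs until its longest request ends."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batching_steps(lengths, batch_size):
    """Decode steps when finished requests are replaced at every iteration."""
    queue = deque(lengths)
    active = []  # remaining tokens for each in-flight request
    steps = 0
    while queue or active:
        # Admit waiting requests into any free slots before the next step.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        # One iteration emits one token per active request; drop finished ones.
        active = [r - 1 for r in active if r - 1 > 0]
        steps += 1
    return steps

# Hypothetical output lengths (in tokens) for eight requests.
lengths = [2, 8, 3, 8, 2, 2, 3, 2]
print(static_batching_steps(lengths, batch_size=4))      # 11
print(continuous_batching_steps(lengths, batch_size=4))  # 8
```

In this toy case the short requests no longer wait for the long ones in their batch, so the same work finishes in fewer iterations.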

This week, FriendliAI is launching a new platform called InferenceSense. Just as publishers use Google AdSense to monetize unsold ad inventory, neocloud operators can use InferenceSense to fill unused GPU cycles with paid AI inference workloads and collect a share of the token revenue. The operator’s own jobs always take priority — the moment a scheduler reclaims a GPU, InferenceSense yields.

“What we are providing is that instead of letting GPUs be idle, by running inferences they can monetize those idle GPUs,” Chun told VentureBeat.

How a Seoul National University lab built the engine inside vLLM

Chun founded FriendliAI in 2021, before most of the industry had shifted attention from training to inference. The company’s primary product is a dedicated inference endpoint service for AI startups and enterprises running open-weight models. FriendliAI also appears as a deployment option on Hugging Face alongside Azure, AWS and GCP, and currently supports more than 500,000 open-weight models from the platform.

InferenceSense now extends that inference engine to the capacity problem GPU operators face between workloads.

How it works

InferenceSense runs on top of Kubernetes, which most neocloud operators are already using for resource orchestration. An operator allocates a pool of GPUs to a Kubernetes cluster managed by FriendliAI — declaring which nodes are available and under what conditions they can be reclaimed. Idle detection runs through Kubernetes itself.

“We have our own orchestrator that runs on the GPUs of these neocloud — or just cloud — vendors,” Chun said. “We definitely take advantage of Kubernetes, but the software running on top is a really highly optimized inference stack.”

When GPUs are unused, InferenceSense spins up isolated containers serving paid inference workloads on open-weight models including DeepSeek, Qwen, Kimi, GLM and MiniMax. When the operator’s scheduler needs hardware back, the inference workloads are preempted and GPUs are returned. FriendliAI says the handoff happens within seconds.
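The fill-and-yield behavior described above can be sketched in a few lines. This is an illustration only, not FriendliAI's orchestrator or a Kubernetes API: the `GpuPool` class, its state labels and the GPU names are all hypothetical, and a real system would drain or checkpoint in-flight requests before handing hardware back.

```python
from dataclasses import dataclass, field

@dataclass
class GpuPool:
    # Maps a GPU id to its current tenant: "idle", "operator" or "inference".
    state: dict = field(default_factory=dict)

    def fill_idle(self):
        """Start paid inference on every GPU the operator is not using."""
        started = [g for g, s in self.state.items() if s == "idle"]
        for g in started:
            self.state[g] = "inference"
        return started

    def reclaim(self, count):
        """Hand `count` GPUs to the operator, preempting inference if needed."""
        # Idle GPUs sort first, so inference is preempted only as a last resort.
        order = sorted(self.state, key=lambda g: self.state[g] != "idle")
        granted = [g for g in order if self.state[g] != "operator"][:count]
        for g in granted:
            self.state[g] = "operator"
        return granted

pool = GpuPool({"gpu0": "operator", "gpu1": "idle", "gpu2": "idle"})
pool.fill_idle()   # gpu1 and gpu2 now serve inference
pool.reclaim(1)    # the operator's job takes gpu1 back; gpu2 keeps serving
```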

Demand is aggregated through FriendliAI’s direct clients and through inference aggregators like OpenRouter. The operator supplies the capacity; FriendliAI handles the demand pipeline, model optimization and serving stack. There are no upfront fees and no minimum commitments. A real-time dashboard shows operators which models are running, tokens being processed and revenue accrued.

Why token throughput beats raw capacity rental

Spot GPU markets from providers like CoreWeave, Lambda Labs and RunPod involve the cloud vendor renting out its own hardware to a third party. InferenceSense runs on hardware the neocloud operator already owns, with the operator defining which nodes participate and setting scheduling agreements with FriendliAI in advance. The distinction matters: spot markets monetize capacity, while InferenceSense monetizes tokens.

Token throughput per GPU-hour determines how much InferenceSense can actually earn during unused windows. FriendliAI claims its engine delivers two to three times the throughput of a standard vLLM deployment, though Chun notes the figure varies by workload type.

Most competing inference stacks are built on Python-based open source frameworks. FriendliAI’s engine is written in C++ and uses custom GPU kernels rather than Nvidia’s cuDNN library. The company has built its own model representation layer for partitioning and executing models across hardware, with its own implementations of speculative decoding, quantization and KV-cache management.

Since FriendliAI’s engine processes more tokens per GPU-hour than a standard vLLM stack, operators should generate more revenue per unused cycle than they could by standing up their own inference service. 
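A back-of-the-envelope calculation shows why throughput is the lever. Every number below is a hypothetical placeholder, since the article does not disclose token prices, baseline throughput or the revenue split; only the 2.5x multiplier is taken from the middle of the company's claimed two-to-three-times range.

```python
def idle_revenue(idle_gpu_hours, tokens_per_gpu_hour,
                 usd_per_million_tokens, operator_share):
    """Operator's revenue from inference served during idle GPU hours."""
    tokens = idle_gpu_hours * tokens_per_gpu_hour
    return tokens / 1_000_000 * usd_per_million_tokens * operator_share

# Hypothetical inputs: 100 idle GPU-hours, $0.50 per million tokens, 50% share.
baseline = idle_revenue(100, 2_000_000, 0.50, 0.5)        # assumed baseline engine
faster = idle_revenue(100, int(2_000_000 * 2.5), 0.50, 0.5)  # 2.5x the throughput
print(baseline, faster)  # 50.0 125.0
```

With price and idle time held fixed, revenue scales linearly with tokens per GPU-hour, which is the sense in which the platform monetizes tokens rather than capacity.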

What AI engineers evaluating inference costs should watch

For AI engineers evaluating where to run inference workloads, the neocloud versus hyperscaler decision has typically come down to price and availability.

InferenceSense adds a new consideration: if neoclouds can monetize idle capacity through inference, they have more economic incentive to keep token prices competitive.

That is not a reason to change infrastructure decisions today — it is still early. But engineers tracking total inference cost should watch whether neocloud adoption of platforms like InferenceSense puts downward pressure on API pricing for models like DeepSeek and Qwen over the next 12 months.

“When we have more efficient suppliers, the overall cost will go down,” Chun said. “With InferenceSense we can contribute to making those models cheaper.”

Manufact raises $6.3M as MCP becomes the ‘USB-C for AI’ powering ChatGPT and Claude apps

For decades, software companies designed their products for a single type of customer: a human being staring at a screen. Every button, menu, and dashboard existed to translate a person’s intention into a machine’s action. But a small startup based in …

How to make your e-commerce product visible to AI agents? Use this new system trusted by L’Oréal, Unilever, Mars & Beiersdorf

For future-focused e-commerce brands, the primary customer is rapidly changing from a person behind a screen to the AI agents that person deploys to research and, if projections are correct, purchase products on their behalf.

Investment banking and financial services giant Morgan Stanley, for instance, has published research suggesting 10-20% of the entire U.S. commerce spend could be agentic by 2030 — amounting to $190 billion to $385 billion.

In response to this seismic shift, the four-year-old agentic AI e-commerce startup Azoma has unveiled the Agentic Merchant Protocol (AMP).

This new framework is designed to provide high-volume retailers — such as grocery brands, electronics manufacturers, and fashion labels — with a “brand-friendly” anchor in an ecosystem increasingly dominated by autonomous shoppers.

The idea is compelling and deceptively simple at its core. Under the current status quo, merchants selling physical products online have to manually enter information about each product, such as SKUs and materials, on different online marketplaces and product listing aggregators (e.g. Walmart, Amazon, Google Shopping). With Azoma, brands can put all that information into one platform and push it out everywhere it needs to go, including pages optimized for AI agents, which can search and retrieve the information and recommend the products that fit a user’s specific query.

Using technology to end the ‘black box’ era of early agentic AI e-commerce

Modern AI integration typically relies on siloed systems like OpenAI’s ACP or Google’s UCP. While these protocols manage the technical handshakes required for discovery and payment, they offer minimal oversight regarding brand integrity.

When an AI agent deployed by a customer “reasons” about their human consumer’s product query, it often synthesizes data from unverified corners of the web, such as Reddit or outdated affiliate sites, creating a “black box” effect where the brand’s intended messaging is lost.

AMP functions as a high-level “system of record” that bridges these disparate platforms. It allows companies to centralize their product intelligence—including legal guardrails and brand books—into a single, machine-native format.

“AMP breaks the foundations of traditional ecommerce,” states Max Sinclair, CEO of Azoma, in a press release shared with VentureBeat ahead of the official announcement set for March 12 in London. “For decades, marketplaces like Amazon and Walmart acted as gatekeepers by controlling product detail pages, rankings, and distribution. Brands optimized a finite set of endpoints: PDPs, ads, search results. In an agentic world, those fixed pages no longer exist.”

The Azoma platform is specifically engineered for high-volume retailers and manufacturers of physical goods, with a primary concentration on the Consumer Packaged Goods (CPG) and fast-moving consumer goods (FMCG) sectors.

In an interview with VentureBeat, Sinclair explicitly distinguished the protocol’s utility from digital-only assets or services, noting that Azoma does not currently support NFTs, SaaS, or financial sectors like banking and insurance.

Whether facilitating the automated reordering of household staples like dishwasher soap or providing the “reasoning” data for high-consideration purchases like specialized supplements and ski hardware, the protocol serves as the digital connective tissue for brands whose value is rooted in the physical world.

Sovereignty in a multi-agent world

The protocol has already seen rapid adoption by a coalition of consumer goods giants, including L’Oréal, Unilever, Mars, Beiersdorf, and Reckitt. For these organizations, maintaining a consistent identity across various AI surfaces is an urgent priority.

“The fact that businesses like L’Oréal, Unilever, Mars & Beiersdorf have moved so quickly to adopt AMP tells you everything about the urgency they feel,” Sinclair remarked during a recent interview with VentureBeat. “These are companies that have spent decades building brand equity—they’re not about to hand control of how their products are represented to an AI black box.”

The AMP suite provides several critical levers for technical leaders:

  • Canonical Machine-Native Catalogues: Data structures designed specifically for LLM ingestion, enriched with persona-level signaling.

  • Programmatic Open Web Distribution: Ensuring that the data agents find on the open web matches the brand’s official documentation.

  • Agent-Agnostic Infrastructure: A design that prevents vendor lock-in by allowing brands to interface with any AI assistant or marketplace agent.

  • Performance Visibility: Tools to measure how agents “weigh” specific product attributes and verify compliance across the ecosystem.

Intelligence as a competitive moat

Beyond simple data distribution, Azoma provides an end-to-end workflow designed to secure market share in an AI-first economy.

The platform includes a proprietary “RegGuard™ Compliance” engine that automatically audits all generated content against strict brand guidelines and regulatory rules, such as FDA/DSHEA standards.

This automated oversight is paired with advanced citation tracking, allowing brands to see exactly which sources—ranging from Reddit and Quora to Wikipedia and YouTube—AI agents are citing when they make a recommendation.

This granular visibility has already yielded significant performance gains for early partners. The company reports that for the brand Ruroc, site traffic from ChatGPT has increased 14x, positioning them as the #1 recommended ski helmet brand in target geographies.

Similarly, clients have seen their share of mentions within specific retail agents like Amazon Rufus increase by 5x, while optimized content has demonstrated conversion lifts of up to 32% in split-testing.

By addressing technical “GEO blockers”—such as schema errors, crawlability gaps, and JavaScript-only content that traditional scrapers might miss—Azoma enables brands to transition from passive observation to active optimization of the AI conversation.

For rapidly growing firms like Perfect Ted, this visibility contributed to a +532% year-over-year revenue increase.

Fusing marketplace DNA with AI research

Azoma’s leadership team mirrors the intersection of high-scale retail and advanced computation.

Sinclair spent six years at Amazon, where he spearheaded the customer browse experience for the Singapore launch and managed the expansion of Amazon Grocery throughout the European Union.

This tenure at the world’s largest retailer highlighted the limitations of static listings in a dynamic, AI-driven market. “In the traditional e-commerce world… you’d write a product listing, publish it, and that would be that,” Sinclair observed. “In this new world, the product detail pages are generative… our customers lose all of the control.”

The technical backbone of the protocol is led by CTO Timur Luguev, a Fulbright Scholar and ERCIM Fellow with over a decade in multimodal deep learning.

Luguev views AMP as a way to indirectly influence the broader “online footprint” that informs AI reasoning. “We want to feed agents through, basically, indirectly, through open online footprint,” Luguev explained.

“That’s the focus: basically first define this kind of a standard, so centralize this information about the product and the brand in one place, then syndicated across the open surfaces, and then quantify and measure the impact.”

Licensing and market implications

Azoma is positioning its protocol as a neutral alternative to the walled-garden approaches of major tech providers. While search engines prioritize the consumer’s user experience, AMP focuses exclusively on the merchant’s requirement for predictability and accuracy.

| Feature | Platform Protocols (ACP/UCP) | Azoma AMP |
|---|---|---|
| Primary Focus | Transaction execution | Brand control & multi-agent syndication |
| Data Reach | Internal ecosystem only | Cross-platform & Open Web |
| Brand Governance | No / Partial oversight | Full enterprise-defined control |
| Integration | Developer-centric APIs | Marketing & Commerce team-friendly |

This shift effectively replaces traditional Search Engine Optimization (SEO) with Agentic Commerce Optimization (ACO).

Sinclair argues that this transition is driven by a shift in consumer trust. “You’re going to trust ChatGPT acting on over your data [more] than just putting into Google, ‘what mattress should I use’ and just clicking on whoever paid for that top link,” he says.

Pricing structure

Azoma’s commercial strategy is designed to bridge the gap between traditional enterprise software procurement and the performance-driven metrics of the AI era. Currently, the company utilizes a standard enterprise model, engaging with its global partners through annual contracts that typically fall within the six-to-seven-figure range. This structure is intended to align with the existing budgetary frameworks of large-scale organizations, providing the predictability required for multi-national department planning.

However, the company’s long-term vision involves a fundamental pivot toward an outcome-based pricing model. By integrating directly into a brand’s data and revenue flows, Azoma can measure the specific financial impact of every syndicated intervention across the agentic ecosystem.

“Our ambition is the future is kind of… taking a cut when they [agents] deliver value,” explained Sinclair.

This goal would effectively transition the protocol from a SaaS expense into a performance-based asset, mirroring how modern advertising platforms operate by tying costs directly to incremental revenue growth.

Outcome-based agentic e-commerce

Beyond mere data distribution, Azoma is looking toward a model where revenue is tied directly to successful agentic interactions. While current enterprise clients typically engage via traditional six-to-seven-figure annual contracts, the company’s long-term goal is outcome-based pricing.

Luguev noted that by accessing a brand’s data flows, the company can provide rigorous ROI forecasting. “We have access to our actions, and then we measure what actions actually made the biggest impact… provide them ability to forecast which campaigns which actions and where to syndicate based on this understanding.”

As the market prepares for the official unveiling of the protocol at the Agentic Commerce Optimization event in London on March 12th, the message to the C-suite is clear: the “fixed” product page is dead. “When L’Oréal, Unilever and Mars move together in the same direction, the rest of the market pays attention,” Sinclair concluded.

The limits of bubble thinking: How AI breaks every historical analogy

It’s always the same story: A new technology appears and everyone starts talking about how it’ll change everything. Then capital rushes in, companies form overnight, and valuations climb faster than anyone can justify. Then, many many months later, the…

Intuit is betting its 40 years of small business data can outlast the SaaSpocalypse

Intuit has lost more than 40% of its market cap since the beginning of the year. It’s not alone. Many established SaaS players have seen their stock prices fall in recent months, including Adobe and IBM — the latter experiencing its most significant one-day drop (roughly $40 billion) with Anthropic’s announcement that Claude could now read, analyze and translate legacy COBOL into modern languages like Java and Python. The market has a name for it: the SaaSpocalypse.

The argument from investors and market watchers: AI agents can now do bookkeeping, file taxes and reconcile accounts — without a human ever touching software. For instance, instead of a human using QuickBooks to categorize transactions, Claude Cowork can access financial data, apply tax logic and autonomously prepare documents. Rather than using TurboTax, agentic AI tools can handle complex tax logic and even file taxes. In lieu of QuickBooks, automated agents can handle multi-step bookkeeping tasks (like lining up receipts).

Why investors are repricing SaaS

Intuit has been among the hardest-hit, with its market capitalization now sitting at around $106 billion.

The catalyst has been the emergence of fully agentic, no-code AI assistants like Claude Cowork and open-source tools like OpenClaw, whose founder was recently acqui-hired by OpenAI. Fears are that these cheaper service-as-a-service offerings (or service-as-software, or results-as-a-service, depending on who you ask) will upend pay-per-seat subscriptions; whereas traditional SaaS delivers a tool (software) for users to complete a task, service-as-a-service delivers a fully-automated outcome.

For instance, Anthropic’s Cowork platform includes finance capabilities that allow the agent to read financial files and turn them into structured models, tables and reports. 

“The advantage is that I am abstracting away the complexity of my business operations,” said Brian Jackson, principal research director at Info-Tech Research Group (who prefers to call it “service-as-software”). “To hear about a model where you only pay when you get the outcome that you want, that’s very appealing.”

This emerging capability is in line with past technological advancements, he pointed out: IT departments used to be in charge of running infrastructure, but cloud computing came along to abstract away that management. Then, SaaS tools emerged to orchestrate the application layer. Now users manage their work — inputting data, filling out forms, creating analytics dashboards — within SaaS apps. 

“So the next step is automated intelligence,” Jackson said. “Instead of having people do those things, we’ll just have AI do them.” Essentially, it could become a headless system without a UI; users simply let it run and don’t think about it. 

This new concept comes at a time when enterprises are becoming fed up with the SaaS business model, he noted. Lock-in is frustrating, fees continue to go up, seats expand, and “it becomes this unwieldy operating cost,” Jackson said. “And it’s not always guaranteed to drive value, it doesn’t guarantee ROI at all.”

Why Intuit got hit the hardest

Intuit, which was founded in 1983, now serves around 100 million customers with a suite of products that, in addition to QuickBooks and TurboTax, include Mailchimp and Credit Karma. But these core offerings are now considered low-hanging fruit for AI, potentially endangering the company whose revenue model relies heavily on per-seat/per-user subscriptions.

Intuit’s CEO Sasan Goodarzi has recently shrugged off SaaSpocalypse claims, calling data the “most important moat” in a Semafor interview. 

Marianna Tessel, EVP and GM for Intuit’s small business group, takes the same stance. Yes, Claude Cowork and similar agentic tools are “robust,” she noted, but Intuit has “persistent” and “durable” advantages.

Notably: First-party data. Customers generate various types of data on Intuit’s systems, whether it’s by creating an invoice, importing ledgers or performing various finance projects. Then there’s third-party data, which is generated through Intuit’s connections with 24,000-plus banks, e-commerce sites and other entities, Tessel pointed out. 

AI agents simply do not have access to this “vastness” of data, she contended. Further, Intuit knows how to organize and use data, such as stitching together information across customer segments to provide market snapshots. “We understand this data, we know how to turn it into action,” Tessel argued. 

She also doubled down on Intuit’s deep understanding of its customers. Rather than a chatbot that can process and act on numbers and figures, “we know what small businesses face,” she said, whether it’s their concerns around bookkeeping and payroll, or their struggles with hiring. 

“We’ve been in business for over 40 years,” Tessel noted. “We have a lot of know-how that is very specific.”

Other SaaS companies stand staunchly behind this argument. Jon Aniano, Zendesk’s SVP of product and CRM applications, pointed out that his company serves 80,000 customers and deeply understands their needs. “We actually see [general purpose agentic tools] at a disadvantage because they’ve gotta go customer by customer and learn things that we’ve learned over the course of 20 years,” he said at a recent VentureBeat event.

The data moat argument does hold up, noted Info-Tech’s Jackson. He also pointed out that, realistically, the SaaS market is projected to grow at a “pretty good clip” in the years ahead. “Could that change very quickly? It’s possible, but it’s unlikely,” he said. 

Also, SaaS is so entrenched in modern business, and pivoting to something entirely new can be a challenge. Even disruptive and compelling technologies like AI can take time to deploy at scale because enterprises have to recraft their workflows, Jackson noted.

“You have workers in place. You have departments in place. It just takes effort and time to change the processes and the expectations around these things,” he said, although “the appetite will definitely be there.”

How Intuit is betting on what agents can’t replicate 

To get ahead of this, Intuit recently signed a multi-year partnership with Anthropic to bring AI agents to mid-market businesses. Using Anthropic’s Claude Agent SDK on the Intuit platform, enterprises will be able to build and customize agents. On the other end, Intuit’s tools can be surfaced directly inside Anthropic products such as Cowork, Claude for Enterprise, and Claude.ai through Model Context Protocol (MCP) integrations with TurboTax, Credit Karma, QuickBooks and Mailchimp.

This builds on Intuit’s previous rollout of Intuit Intelligence, which features specialized AI agents for sales, tax, payroll, accounting and project management. Users can query and interact with their financial data in natural language, automate tasks and generate dynamic reports or KPI scorecards. 

“They have the data, they have the interface, and now they’re introducing themselves as an orchestration layer,” Jackson said of moves like this by large SaaS players. “We can be the place where you build your agents and manage them.”

To this point, Tessel calls Intuit “a well-run company” that can react with speed. Her team keeps up with orchestration advancements, reads academic papers and is “constantly learning” about new technologies. “We’re on it,” she said.

Ultimately, companies must be “awake and aware right now,” she emphasized. As she put it: “What’s the pivot of the day? How many times did you pivot? Are you experimenting?”

Zendesk’s Aniano agreed that there are “cool new ways of developing software,” and acknowledged that he “lives” 90 to 120 minutes of his day inside Claude Code. Companies that can make the “mental shift” to building software in new ways can create a level playing field between incumbents and startups. 

One thing that’ll be interesting to see is how quickly SaaS providers offer MCP plugins or build their own within their software suites, Jackson noted. “How good will these SaaS providers be at supporting AI interoperability?” he said. “And what ways will they try to create friction or make it harder for enterprises to abandon their interface?”

Shadow mode, drift alerts and audit logs: Inside the modern audit loop

Traditional software governance often uses static compliance checklists, quarterly audits and after-the-fact reviews. But this method can’t keep up with AI systems that change in real time. A machine learning (ML) model might retrain or drift between quarterly operational syncs. This means that, by the time an issue is discovered, hundreds of bad decisions could already have been made. This can be almost impossible to untangle. 

In the fast-paced world of AI, governance must be inline rather than an after-the-fact compliance review. In other words, organizations must adopt what I call an “audit loop”: A continuous, integrated compliance process that operates in real time alongside AI development and deployment, without halting innovation.

This article explains how to implement such continuous AI compliance through shadow mode rollouts, drift and misuse monitoring and audit logs engineered for direct legal defensibility.

From reactive checks to an inline “audit loop”

When systems moved at the speed of people, it made sense to do compliance checks every so often. But AI doesn’t wait for the next review meeting. The change to an inline audit loop means audits will no longer occur just once in a while; they happen all the time. Compliance and risk management should be “baked in” to the AI lifecycle from development to production, rather than just post-deployment. This means establishing live metrics and guardrails that monitor AI behavior as it occurs and raise red flags as soon as something seems off.

For instance, teams can set up drift detectors that automatically alert when a model’s predictions go off course from the training distribution, or when confidence scores fall below acceptable levels. Governance is no longer just a set of quarterly snapshots; it’s a streaming process with alerts that go off in real time when a system goes outside of its defined confidence bands.
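One way such a drift detector could look is sketched below, using the Population Stability Index (PSI) to compare live scores against a training-time baseline. The article does not name a specific metric, so PSI is a swapped-in choice here, and the bin count, the 0.2 threshold and the sample data are assumptions to be tuned per model.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant baselines

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range live values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def drift_alert(expected, actual, threshold=0.2):
    """True when the live distribution has moved past the assumed threshold."""
    return psi(expected, actual) > threshold

baseline = [i / 100 for i in range(100)]        # hypothetical training scores
shifted = [0.9 + i / 1000 for i in range(100)]  # hypothetical live scores
print(drift_alert(baseline, baseline), drift_alert(baseline, shifted))  # False True
```

The same pattern applies to confidence scores: feed the model's recent confidences in as `actual` and alert when they move away from the validated range.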

Cultural shift is equally important: Compliance teams must act less like after-the-fact auditors and more like AI co-pilots. In practice, this might mean compliance and AI engineers working together to define policy guardrails and continuously monitor key indicators. With the right tools and mindset, real-time AI governance can “nudge” and intervene early, helping teams course-correct without slowing down innovation.

In fact, when done well, continuous governance builds trust rather than friction, providing shared visibility into AI operations for both builders and regulators, instead of unpleasant surprises after deployment. The following strategies illustrate how to achieve this balance.

Shadow mode rollouts: Testing compliance safely

One effective framework for continuous AI compliance is “shadow mode” deployments with new models or agent features. This means a new AI system is deployed in parallel with the existing system, receiving real production inputs but not influencing real decisions or user-facing outputs. The legacy model or process continues to handle decisions, while the new AI’s outputs are captured only for analysis. This provides a safe sandbox to vet the AI’s behavior under real conditions.

According to global law firm Morgan Lewis: “Shadow-mode operation requires the AI to run in parallel without influencing live decisions until its performance is validated,” giving organizations a safe environment to test changes.

Teams can discover problems early by comparing the shadow model’s decisions to expectations (the current model’s decisions). For instance, when a model is running in shadow mode, they can check to see if its inputs and predictions differ from those of the current production model or the patterns seen in training. Sudden changes could indicate bugs in the data pipeline, unexpected bias or drops in performance.
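A minimal sketch of that comparison follows, assuming both models are plain callables and that a simple disagreement rate is the metric of interest. The function name, the record fields and the toy threshold models are all hypothetical.

```python
def shadow_compare(inputs, production_model, shadow_model):
    """Serve production decisions while recording the shadow model's output."""
    served, disagreements = [], []
    for x in inputs:
        live = production_model(x)       # the only decision users ever see
        try:
            candidate = shadow_model(x)  # captured for analysis, never served
        except Exception as exc:         # a shadow failure must not hit production
            candidate = f"error: {exc}"
        served.append(live)
        if candidate != live:
            disagreements.append(
                {"input": x, "production": live, "shadow": candidate}
            )
    rate = len(disagreements) / len(inputs) if inputs else 0.0
    return served, disagreements, rate

# Hypothetical models: approve when a score clears a threshold.
production = lambda score: score >= 50
candidate = lambda score: score >= 60
served, diffs, rate = shadow_compare([10, 55, 70, 59], production, candidate)
print(rate)  # 0.5
```

The disagreement records are what a reviewer would inspect before promoting the shadow model; a sudden jump in the rate is the kind of change the paragraph above describes.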

In short, shadow mode is a way to check compliance in real time: It ensures that the model handles inputs correctly and meets policy standards (accuracy, fairness) before it is fully released. One AI security framework showed how this method worked: Teams first ran AI in shadow mode (the AI makes suggestions but doesn’t act on its own), then compared AI and human inputs to determine trust. Only after the AI proved reliable did they let it suggest actions, subject to human approval.

For instance, Prophet Security eventually let the AI make low-risk decisions on its own. Using phased rollouts gives people confidence that an AI system meets requirements and works as expected, without putting production or customers at risk during testing.

Real-time drift and misuse detection

Even after an AI model is fully deployed, the compliance job is never “done.” Over time, AI systems can drift, meaning that their performance or outputs change due to new data patterns, model retraining or bad inputs. They can also be misused or lead to results that go against policy (for example, inappropriate content or biased decisions) in unexpected ways.

To remain compliant, teams must set up monitoring signals and processes to catch these issues as they happen. Traditional SLA monitoring may only check for uptime or latency. AI monitoring, however, must be able to tell when outputs are not what they should be, for example when a model suddenly starts giving biased or harmful results. This means setting “confidence bands” or quantitative limits for how a model should behave and setting automatic alerts when those limits are crossed.

Some signals to monitor include:

  • Data or concept drift: When input data distributions change significantly or model predictions diverge from training-time patterns. For example, a model’s accuracy on certain segments might drop as the incoming data shifts, a sign to investigate and possibly retrain.

  • Anomalous or harmful outputs: When outputs trigger policy violations or ethical red flags. An AI content filter might flag if a generative model produces disallowed content, or a bias monitor might detect if decisions for a protected group begin to skew negatively. Contracts for AI services now often require vendors to detect and address such noncompliant results promptly.

  • User misuse patterns: When unusual usage behavior suggests someone is trying to manipulate or misuse the AI. For instance, rapid-fire queries attempting prompt injection or adversarial inputs could be automatically flagged by the system’s telemetry as potential misuse.
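The data-drift signal and the idea of a quantitative limit with an automatic alert can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the choice of the Population Stability Index (PSI) as the drift measure, the 0.2 alert threshold, and the function names `psi` and `drift_alert` are all assumptions made for the example.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a live sample.

    PSI is one common drift measure; it is an assumption here, not something
    the article specifies.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(expected: Sequence[float], actual: Sequence[float],
                threshold: float = 0.2) -> bool:
    """Return True when drift exceeds the (hypothetical) alert threshold."""
    return psi(expected, actual) > threshold
```

In use, `expected` would hold a feature or score sample captured at training time and `actual` a recent window of production traffic; a `True` result is the cue to investigate and possibly retrain.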

When a drift or misuse signal crosses a critical threshold, the system should support “intelligent escalation” rather than waiting for a quarterly review. In practice, this could mean triggering an automated mitigation or immediately alerting a human overseer. Leading organizations build in fail-safes like kill-switches, or the ability to suspend an AI’s actions the moment it behaves unpredictably or unsafely.

For example, a service contract might allow a company to instantly pause an AI agent if it’s outputting suspect results, even if the AI provider hasn’t acknowledged a problem. Likewise, teams should have playbooks for rapid model rollback or retraining windows: If drift or errors are detected, there’s a plan to retrain the model (or revert to a safe state) within a defined timeframe. This kind of agile response is crucial; it recognizes that AI behavior may drift or degrade in ways that cannot be fixed with a simple patch, so swift retraining or tuning is part of the compliance loop.
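One way to picture intelligent escalation with a kill switch is a small gate that maps a monitored signal to an action. The sketch below is illustrative only: the class names `EscalationPolicy` and `AgentGate`, the two-tier thresholds, and the rule that a suspension persists until a human resets it are assumptions, not details from any particular contract or product.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    ALERT_HUMAN = "alert_human"
    SUSPEND = "suspend"


@dataclass
class EscalationPolicy:
    warn_threshold: float      # hypothetical: alert a human overseer at or above this
    critical_threshold: float  # hypothetical: suspend the agent at or above this

    def decide(self, signal: float) -> Action:
        if signal >= self.critical_threshold:
            return Action.SUSPEND
        if signal >= self.warn_threshold:
            return Action.ALERT_HUMAN
        return Action.CONTINUE


@dataclass
class AgentGate:
    policy: EscalationPolicy
    suspended: bool = False
    events: list = field(default_factory=list)

    def observe(self, name: str, signal: float) -> Action:
        """Record a monitoring signal and apply the escalation policy."""
        action = self.policy.decide(signal)
        self.events.append((name, signal, action))
        if action is Action.SUSPEND:
            # Kill switch: stays engaged until a human clears it.
            self.suspended = True
        return action

    def allow(self) -> bool:
        """The agent may act only while it is not suspended."""
        return not self.suspended
```

The design choice worth noting is that a later healthy reading does not lift the suspension automatically; resuming is left to a person, which matches the idea of pausing first and investigating afterward.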

By continuously monitoring and reacting to drift and misuse signals, companies transform compliance from a periodic audit to an ongoing safety net. Issues are caught and addressed in hours or days, not months. The AI stays within acceptable bounds, and governance keeps pace with the AI’s own learning and adaptation, rather than trailing behind it. This not only protects users and stakeholders; it gives regulators and executives peace of mind that the AI is under constant watchful oversight, even as it evolves.

Audit logs designed for legal defensibility

Continuous compliance also means continuously documenting what your AI is doing and why. Robust audit logs demonstrate compliance, both for internal accountability and external legal defensibility. However, logging for AI requires more than simplistic logs. Imagine an auditor or regulator asking: “Why did the AI make this decision, and did it follow approved policy?” Your logs should be able to answer that.

A good AI audit log keeps a permanent, detailed record of every important action and decision AI makes, along with the reasons and context. Legal experts say these logs “provide detailed, unchangeable records of AI system actions with exact timestamps and written reasons for decisions.” They are important evidence in court. This means that every important inference, suggestion or independent action taken by AI should be recorded with metadata, such as timestamps, the model/version used, the input received, the output produced and (if possible) the reasoning or confidence behind that output.

Modern compliance platforms stress logging not only the result (“X action taken”) but also the rationale (“X action taken because conditions Y and Z were met according to policy”). These enhanced logs let an auditor see, for example, not just that an AI approved a user’s access, but that it was approved “based on continuous usage and alignment with the user’s peer group,” according to Attorney Aaron Hall.

Audit logs should also be well-organized and difficult to change if they are to be legally sound. Techniques like immutable storage or cryptographic hashing of logs ensure that records can’t be altered without detection. Log data should be protected by access controls and encryption so that sensitive information, such as security keys and personal data, stays hidden or protected while the logs themselves remain available for review.
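The combination of rich metadata and cryptographic hashing can be illustrated with a hash-chained log, where each entry stores the hash of the one before it. This is a minimal sketch under stated assumptions: the field names, the `append_entry` and `verify_chain` helpers, and the use of SHA-256 over canonical JSON are choices made for the example, and a real deployment would add access controls, encryption and write-once storage on top.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, *, model_version: str, input_text: str,
                 output_text: str, rationale: str, timestamp: str = None) -> dict:
    """Append one decision record, chained to the previous entry's hash."""
    body = {
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "input": input_text,
        "output": output_text,
        "rationale": rationale,
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Return False if any entry was edited, removed or reordered."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

Hash chaining makes tampering detectable rather than impossible: editing an old record changes its hash, which breaks every later link, so `verify_chain` fails until the whole tail is recomputed.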

In regulated industries, keeping these logs can show examiners that you are not only keeping track of AI’s outputs, but you are retaining records for review. Regulators are expecting companies to show more than that an AI was checked before it was released. They want to see that it is being monitored continuously and there is a forensic trail to analyze its behavior over time. This evidentiary backbone comes from complete audit trails that include data inputs, model versions and decision outputs. They make AI less of a “black box” and more of a system that can be tracked and held accountable.

If there is a disagreement or an event (for example, an AI made a biased choice that hurt a customer), these logs are your legal lifeline. They help you figure out what went wrong. Was it a problem with the data, a model drift or misuse? Who was in charge of the process? Did we stick to the rules we set?

Well-kept AI audit logs show that the company did its homework and had controls in place. This not only lowers the risk of legal problems but makes people more trusting of AI systems. With such logs, teams and executives can be confident that every AI decision is open to review and accountable.

Inline governance as an enabler, not a roadblock

Implementing an “audit loop” of continuous AI compliance might sound like extra work, but in reality, it enables faster and safer AI delivery. By integrating governance into each stage of the AI lifecycle, from shadow mode trial runs to real-time monitoring to immutable logging, organizations can move quickly and responsibly. Issues are caught early, so they don’t snowball into major failures that require project-halting fixes later. Developers and data scientists can iterate on models without endless back-and-forth with compliance reviewers, because many compliance checks are automated and happen in parallel.

Rather than slowing down delivery, this approach often accelerates it: Teams spend less time on reactive damage control or lengthy audits, and more time on innovation because they are confident that compliance is under control in the background.

There are bigger benefits to continuous AI compliance, too. It gives end-users, business leaders and regulators a reason to believe that AI systems are being handled responsibly. When every AI decision is clearly recorded, watched and checked for quality, stakeholders are much more likely to accept AI solutions. This trust benefits the whole industry and society, not just individual businesses.

An audit-loop governance model can stop AI failures and ensure AI behavior is in line with moral and legal standards. In fact, strong AI governance benefits the economy and the public because it encourages innovation and protection. It can unlock AI’s potential in important areas like finance, healthcare and infrastructure without putting safety or values at risk. As national and international standards for AI change quickly, U.S. companies that set a good example by always following the rules are at the forefront of trustworthy AI.

People say that if your AI governance isn’t keeping up with your AI, it’s not really governance; it’s “archaeology.” Forward-thinking companies are realizing this and adopting audit loops. By doing so, they not only avoid problems but make compliance a competitive advantage, ensuring that faster delivery and better oversight go hand in hand.

Dhyey Mavani is working to accelerate gen AI and computational mathematics.

Editor’s note: The opinions expressed in this article are the authors’ personal opinions and do not reflect the opinions of their employers.

When accurate AI is still dangerously incomplete

Typically, when building, training and deploying AI, enterprises prioritize accuracy. And that, no doubt, is important; but in highly complex, nuanced industries like law, accuracy alone isn’t enough. Higher stakes mean higher standards: Model outputs must be assessed for relevancy, authority, citation accuracy and hallucination rates.

To tackle this immense task, LexisNexis has evolved beyond standard retrieval-augmented generation (RAG) to graph RAG and agentic graphs; it has also built out “planner” and “reflection” AI agents that parse requests and critique their own outputs.

“There’s no such [thing] as ‘perfect AI’ because you never get 100% accuracy or 100% relevancy, especially in complex, high stake domains like legal,” Min Chen, LexisNexis’ SVP and chief AI officer, acknowledges in a new VentureBeat Beyond the Pilot podcast. 

The goal is to manage that uncertainty as much as possible and translate it into consistent customer value. “At the end of the day, what matters most for us is the quality of the AI outcome, and that is a continuous journey of experimentation, iteration and improvement,” Chen said. 

Getting ‘complete’ answers to multi-faceted questions

To evaluate models and their outputs, Chen’s team has established more than a half-dozen “sub metrics” to measure “usefulness” based on several factors (authority, citation accuracy, hallucination rates) as well as “comprehensiveness.” This particular metric is designed to evaluate whether a gen AI response fully addresses all aspects of a user’s legal question.

“So it’s not just about relevancy,” Chen said. “Completeness speaks directly to legal reliability.”

For instance, a user may ask a question that requires an answer covering five distinct legal considerations. Gen AI may provide a response that accurately addresses three of these. But, while relevant, this partial answer is incomplete and, from a user perspective, insufficient. This can be misleading and pose real-life risks.

Or, for example, some citations may be semantically relevant to a user’s question, but they may point to arguments or instances that were ultimately overruled in court. “Our lawyers will consider them not citable,” Chen said. “If they’re not citable, they’re not useful.”
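The two examples above, a partial answer and a citation that was later overruled, can be expressed as simple checks. This sketch is purely illustrative and is not LexisNexis’ actual metric: the `completeness` ratio, the `citable_only` filter and the `overruled` field are hypothetical names chosen for the example.

```python
def completeness(required: set, addressed: set) -> float:
    """Share of required legal considerations that the answer covers."""
    if not required:
        return 1.0
    return len(required & addressed) / len(required)


def citable_only(citations: list) -> list:
    """Drop citations flagged as overruled, however relevant they look."""
    return [c for c in citations if not c["overruled"]]


# The article's example: five considerations required, three addressed.
required = {"c1", "c2", "c3", "c4", "c5"}
addressed = {"c1", "c2", "c3"}
score = completeness(required, addressed)  # 0.6
```

An answer can score well on relevancy and still land at 0.6 here, which is the gap between “relevant” and “complete” that the comprehensiveness metric is meant to expose.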

Moving beyond standard RAG

LexisNexis launched its flagship gen AI product, Lexis+ AI — a legal AI tool for drafting, research and analysis — in 2023. It was built on a standard RAG framework and hybrid vector search that grounds responses in LexisNexis’ trusted, authoritative knowledge base. 

The company then released its personal legal assistant, Protégé, in 2024. This agent incorporates a knowledge graph layer on top of vector search to overcome a “key limitation” of pure semantic search. Although “very good” at retrieving contextually relevant content, semantic search “doesn’t always guarantee authoritative answers,” Chen said.

Initial semantic search returns what it deems relevant content; Chen’s team then traverses those returns across a “point of law” graph to further filter the most highly authoritative documents.

Going beyond this, Chen’s team is developing agentic graphs and accelerating automation so agents can plan and execute complex multi-step tasks. 

For instance, self-directed “planner agents” for research Q&A break user questions into multiple sub-questions. Human users can review and edit these to further refine and personalize final answers. Meanwhile, a “reflection agent” handles transactional document drafting. It can “automatically, dynamically” critique its initial draft, then incorporate that feedback and refine in real time.

However, Chen said that all of this is not to cut humans out of the mix; human experts and AI agents can “learn, reason and grow together.” “I see the future [as] a deeper collaboration between humans and AI.”

Watch the podcast to hear more about: 

  • How LexisNexis’ acquisition of Henchman helped ground AI models with proprietary LexisNexis data and customer data; 

  • The difference between deterministic and non-deterministic evaluation; 

  • Why enterprises should identify KPIs and definitions of success before rushing to experimentation;

  • The importance of focusing on a “triangle” of key components: Cost, speed and quality.

You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.

Nvidia, Groq and the limestone race to real-time AI: Why enterprises win or lose here

​From miles away across the desert, the Great Pyramid looks like a perfect, smooth geometry — a sleek triangle pointing to the stars. Stand at the base, however, and the illusion of smoothness vanishes. You see massive, jagged blocks of limestone. It is not a slope; it is a staircase.

Remember this the next time you hear futurists talking about exponential growth.

Intel’s co-founder Gordon Moore (Moore’s Law) is famously quoted as saying in 1965 that the transistor count on a microchip would double every year. Another Intel executive, David House, later revised this statement to “compute power doubling every 18 months.” For a while, Intel’s CPUs were the poster child of this law. That is, until the growth in CPU performance flattened out like a block of limestone.

If you zoom out, though, the next limestone block was already there: the growth in compute merely shifted from CPUs to the world of GPUs. Jensen Huang, Nvidia’s CEO, played a long game and came out a strong winner, building his own stepping stones initially with gaming, then computer vision and, recently, generative AI.

The illusion of smooth growth

Technology growth is full of sprints and plateaus, and gen AI is not immune. The current wave is driven by transformer architecture. To quote Anthropic’s CEO and co-founder Dario Amodei: “The exponential continues until it doesn’t. And every year we’ve been like, ‘Well, this can’t possibly be the case that things will continue on the exponential’ — and then every year it has.”

But just as the CPU plateaued and GPUs took the lead, we are seeing signs that LLM growth is shifting paradigms again. For example, late in 2024, DeepSeek surprised the world by training a world-class model on an impossibly small budget, in part by using the mixture-of-experts (MoE) technique.

​Do you remember where you recently saw this technique mentioned? Nvidia’s Rubin press release: The technology includes “…the latest generations of Nvidia NVLink interconnect technology… to accelerate agentic AI, advanced reasoning and massive-scale MoE model inference at up to 10x lower cost per token.”

​Jensen knows that achieving that coveted exponential growth in compute doesn’t come from pure brute force anymore. Sometimes you need to shift the architecture entirely to place the next stepping stone.

The latency crisis: Where Groq fits in

This long introduction brings us to Groq.

​The biggest gains in AI reasoning capabilities in 2025 were driven by “inference time compute” — or, in lay terms, “letting the model think for a longer period of time.” But time is money. Consumers and businesses do not like waiting.

​Groq comes into play here with its lightning-speed inference. If you bring together the architectural efficiency of models like DeepSeek and the sheer throughput of Groq, you get frontier intelligence at your fingertips. By executing inference faster, you can “out-reason” competitive models, offering a “smarter” system to customers without the penalty of lag.

From universal chip to inference optimization

​For the last decade, the GPU has been the universal hammer for every AI nail. You use H100s to train the model; you use H100s (or trimmed-down versions) to run the model. But as models shift toward “System 2” thinking — where the AI reasons, self-corrects and iterates before answering — the computational workload changes.

Training requires massive parallel brute force. Inference, especially for reasoning models, requires faster sequential processing. It must generate tokens instantly to facilitate complex chains of thought without the user waiting minutes for an answer. Groq’s LPU (Language Processing Unit) architecture removes the memory bandwidth bottleneck that plagues GPUs during small-batch inference, delivering lightning-fast inference.

The engine for the next wave of growth

​For the C-Suite, this potential convergence solves the “thinking time” latency crisis. Consider the expectations from AI agents: We want them to autonomously book flights, code entire apps and research legal precedent. To do this reliably, a model might need to generate 10,000 internal “thought tokens” to verify its own work before it outputs a single word to the user.

  • On a standard GPU: 10,000 thought tokens might take 20 to 40 seconds. The user gets bored and leaves.

  • On Groq: That same chain of thought happens in less than 2 seconds.
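The comparison above implies specific generation rates, which a few lines of arithmetic make explicit. The inputs are the article’s illustrative figures (10,000 thought tokens, 20 to 40 seconds, under 2 seconds), not measured benchmarks, and the variable names are chosen only for this sketch.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average generation rate needed to emit `tokens` in `seconds`."""
    return tokens / seconds


THOUGHT_TOKENS = 10_000  # internal "thought tokens" from the example above

# Standard GPU scenario: 20 to 40 seconds for the full chain of thought.
gpu_slow = tokens_per_second(THOUGHT_TOKENS, 40)  # 250.0 tokens/s
gpu_fast = tokens_per_second(THOUGHT_TOKENS, 20)  # 500.0 tokens/s

# Groq scenario: under 2 seconds, so 2 seconds gives a lower bound on the rate.
lpu_floor = tokens_per_second(THOUGHT_TOKENS, 2)  # 5000.0 tokens/s

# Implied speedup range, using the 2-second bound.
speedup_min = lpu_floor / gpu_fast  # 10.0
speedup_max = lpu_floor / gpu_slow  # 20.0
```

Read this way, the claim is a 10x to 20x gap in sustained tokens per second, which is the difference between a user waiting through the reasoning and not noticing it.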

​If Nvidia integrates Groq’s technology, they solve the “waiting for the robot to think” problem. They preserve the magic of AI. Just as they moved from rendering pixels (gaming) to rendering intelligence (gen AI), they would now move to rendering reasoning in real-time.

​Furthermore, this creates a formidable software moat. Groq’s biggest hurdle has always been the software stack; Nvidia’s biggest asset is CUDA. If Nvidia wraps its ecosystem around Groq’s hardware, they effectively dig a moat so wide that competitors cannot cross it. They would offer the universal platform: The best environment to train and the most efficient environment to run (Groq/LPU).

Consider what happens when you couple that raw inference power with a next-generation open source model (like the rumored DeepSeek 4): You get an offering that would rival today’s frontier models in cost, performance and speed. That opens up opportunities for Nvidia, from directly entering the inference business with its own cloud offering, to continuing to power an expanding base of exponentially growing customers.

The next step on the pyramid

​Returning to our opening metaphor: The “exponential” growth of AI is not a smooth line of raw FLOPs; it is a staircase of bottlenecks being smashed.

  • Block 1: We couldn’t calculate fast enough. Solution: The GPU.

  • Block 2: We couldn’t train deep enough. Solution: Transformer architecture.

  • Block 3: We can’t “think” fast enough. Solution: Groq’s LPU.

​Jensen Huang has never been afraid to cannibalize his own product lines to own the future. By validating Groq, Nvidia wouldn’t just be buying a faster chip; they would be bringing next-generation intelligence to the masses.

Andrew Filev, founder and CEO of Zencoder