admin.codes » Category » Infrastructure

Google’s new TurboQuant algorithm speeds up AI memory 8x, cutting costs by 50% or more

As Large Language Models (LLMs) expand their context windows to process massive documents and intricate conversations, they encounter a brutal hardware reality known as the “Key-Value (KV) cache bottleneck.”

Every word a model processes must be stored as a high-dimensional vector in high-speed memory. For long-form tasks, this “digital cheat sheet” swells rapidly, devouring the graphics processing unit (GPU) video random access memory (VRAM) system used during inference, and slowing the model performance down rapidly over time.

But have no fear, Google Research is here: yesterday, the unit within the search giant released its TurboQuant algorithm suite — a software-only breakthrough that provides the mathematical blueprint for extreme KV cache compression, enabling a 6x reduction on average in the amount of KV memory a given model uses, and 8x performance increase in computing attention logits, which could reduce costs for enterprises that implement it on their models by more than 50%.

The theoretically grounded algorithms and associated research papers are available now publicly for free, including for enterprise usage, offering a training-free solution to reduce model size without sacrificing intelligence.

The arrival of TurboQuant is the culmination of a multi-year research arc that began in 2024. While the underlying mathematical frameworks—including PolarQuant and Quantized Johnson-Lindenstrauss (QJL)—were documented in early 2025, their formal unveiling today marks a transition from academic theory to large-scale production reality.

The timing is strategic, coinciding with the upcoming presentations of these findings at the upcoming conferences International Conference on Learning Representations (ICLR 2026) in Rio de Janeiro, Brazil, and Annual Conference on Artificial Intelligence and Statistics (AISTATS 2026) in Tangier, Morocco.

By releasing these methodologies under an open research framework, Google is providing the essential “plumbing” for the burgeoning “Agentic AI” era: the need for massive, efficient, and searchable vectorized memory that can finally run on the hardware users already own. Already, it is believed to have an effect on the stock market, lowering the price of memory providers as traders look to the release as a sign that less memory will be needed (perhaps incorrect, given Jevons’ Paradox).

The Architecture of Memory: Solving the Efficiency Tax

To understand why TurboQuant matters, one must first understand the “memory tax” of modern AI. Traditional vector quantization has historically been a “leaky” process.

When high-precision decimals are compressed into simple integers, the resulting “quantization error” accumulates, eventually causing models to hallucinate or lose semantic coherence.

Furthermore, most existing methods require “quantization constants”—meta-data stored alongside the compressed bits to tell the model how to decompress them. In many cases, these constants add so much overhead—sometimes 1 to 2 bits per number—that they negate the gains of compression entirely.

TurboQuant resolves this paradox through a two-stage mathematical shield. The first stage utilizes PolarQuant, which reimagines how we map high-dimensional space.

Rather than using standard Cartesian coordinates (X, Y, Z), PolarQuant converts vectors into polar coordinates consisting of a radius and a set of angles.

The breakthrough lies in the geometry: after a random rotation, the distribution of these angles becomes highly predictable and concentrated. Because the “shape” of the data is now known, the system no longer needs to store expensive normalization constants for every data block. It simply maps the data onto a fixed, circular grid, eliminating the overhead that traditional methods must carry.

The second stage acts as a mathematical error-checker. Even with the efficiency of PolarQuant, a residual amount of error remains. TurboQuant applies a 1-bit Quantized Johnson-Lindenstrauss (QJL) transform to this leftover data. By reducing each error number to a simple sign bit (+1 or -1), QJL serves as a zero-bias estimator. This ensures that when the model calculates an “attention score”—the vital process of deciding which words in a prompt are most relevant—the compressed version remains statistically identical to the high-precision original.

Performance benchmarks and real-world reliability

The true test of any compression algorithm is the “Needle-in-a-Haystack” benchmark, which evaluates whether an AI can find a single specific sentence hidden within 100,000 words.

In testing across open-source models like Llama-3.1-8B and Mistral-7B, TurboQuant achieved perfect recall scores, mirroring the performance of uncompressed models while reducing the KV cache memory footprint by a factor of at least 6x.

This “quality neutrality” is rare in the world of extreme quantization, where 3-bit systems usually suffer from significant logic degradation.

Beyond chatbots, TurboQuant is transformative for high-dimensional search. Modern search engines increasingly rely on “semantic search,” comparing the meanings of billions of vectors rather than just matching keywords. TurboQuant consistently achieves superior recall ratios compared to existing state-of-the-art methods like RabbiQ and Product Quantization (PQ), all while requiring virtually zero indexing time.

This makes it an ideal candidate for real-time applications where data is constantly being added to a database and must be searchable immediately. Furthermore, on hardware like NVIDIA H100 accelerators, TurboQuant’s 4-bit implementation achieved an 8x performance boost in computing attention logs, a critical speedup for real-world deployments.

Rapt community reaction

The reaction on X, obtained via a Grok search, included a mixture of technical awe and immediate practical experimentation.

The original announcement from @GoogleResearch generated massive engagement, with over 7.7 million views, signaling that the industry was hungry for a solution to the memory crisis.

Within 24 hours of the release, community members began porting the algorithm to popular local AI libraries like MLX for Apple Silicon and llama.cpp.

Technical analyst @Prince_Canuma shared one of the most compelling early benchmarks, implementing TurboQuant in MLX to test the Qwen3.5-35B model.

Across context lengths ranging from 8.5K to 64K tokens, he reported a 100% exact match at every quantization level, noting that 2.5-bit TurboQuant reduced the KV cache by nearly 5x with zero accuracy loss. This real-world validation echoed Google’s internal research, proving that the algorithm’s benefits translate seamlessly to third-party models.

Other users focused on the democratization of high-performance AI. @NoahEpstein_ provided a plain-English breakdown, arguing that TurboQuant significantly narrows the gap between free local AI and expensive cloud subscriptions.

He noted that models running locally on consumer hardware like a Mac Mini “just got dramatically better,” enabling 100,000-token conversations without the typical quality degradation.

Similarly, @PrajwalTomar_ highlighted the security and speed benefits of running “insane AI models locally for free,” expressing “huge respect” for Google’s decision to share the research rather than keeping it proprietary.

Market impact and the future of hardware

The release of TurboQuant has already begun to ripple through the broader tech economy. Following the announcement on Tuesday, analysts observed a downward trend in the stock prices of major memory suppliers, including Micron and Western Digital.

The market’s reaction reflects a realization that if AI giants can compress their memory requirements by a factor of six through software alone, the insatiable demand for High Bandwidth Memory (HBM) may be tempered by algorithmic efficiency.

As we move deeper into 2026, the arrival of TurboQuant suggests that the next era of AI progress will be defined as much by mathematical elegance as by brute force. By redefining efficiency through extreme compression, Google is enabling “smarter memory movement” for multi-step agents and dense retrieval pipelines. The industry is shifting from a focus on “bigger models” to “better memory,” a change that could lower AI serving costs globally.

Strategic considerations for enterprise decision-makers

For enterprises currently using or fine-tuning their own AI models, the release of TurboQuant offers a rare opportunity for immediate operational improvement.

Unlike many AI breakthroughs that require costly retraining or specialized datasets, TurboQuant is training-free and data-oblivious.

This means organizations can apply these quantization techniques to their existing fine-tuned models—whether they are based on Llama, Mistral, or Google’s own Gemma—to realize immediate memory savings and speedups without risking the specialized performance they have worked to build.

From a practical standpoint, enterprise IT and DevOps teams should consider the following steps to integrate this research into their operations:

Optimize Inference Pipelines: Integrating TurboQuant into production inference servers can reduce the number of GPUs required to serve long-context applications, potentially slashing cloud compute costs by 50% or more.

Expand Context Capabilities: Enterprises working with massive internal documentation can now offer much longer context windows for retrieval-augmented generation (RAG) tasks without the massive VRAM overhead that previously made such features cost-prohibitive.

Enhance Local Deployments: For organizations with strict data privacy requirements, TurboQuant makes it feasible to run highly capable, large-scale models on on-premise hardware or edge devices that were previously insufficient for 32-bit or even 8-bit model weights.

Re-evaluate Hardware Procurement: Before investing in massive HBM-heavy GPU clusters, operations leaders should assess how much of their bottleneck can be resolved through these software-driven efficiency gains.

Ultimately, TurboQuant proves that the limit of AI isn’t just how many transistors we can cram onto a chip, but how elegantly we can translate the infinite complexity of information into the finite space of a digital bit. For the enterprise, this is more than just a research paper; it is a tactical unlock that turns existing hardware into a significantly more powerful asset.

Infrastructure

Cloudflare’s new Dynamic Workers ditch containers to run AI agent code 100x faster

Web infrastructure giant Cloudflare is seeking to transform the way enterprises deploy AI agents with the open beta release of Dynamic Workers, a new lightweight, isolate-based sandboxing system that it says starts in milliseconds, uses only a few mega…

Infrastructure

Liquid-cooled AI systems expose the limits of traditional storage architecture

Presented by Solidigm

Liquid cooling is rewriting the rules of AI infrastructure, but most deployments have not fully crossed the line. GPUs and CPUs have moved to liquid cooling, while storage has depended on airflow, creating an operationally inefficient hybrid architecture.

What appears to be a pragmatic transition strategy is, in practice, a structural liability.

“A hybrid cooling approach is an operationally inefficient situation,” explains Hardeep Singh, thermal-mechanical hardware team manager at Solidigm. “You’re paying for and maintaining two entirely separate, expensive cooling infrastructures, and could be exposed to the worst-of-both-world’s problems.”

While liquid cooling requires pumps, fluid manifolds, and coolant distribution units (CDUs), air-cooled components require CRAC units, cold aisles, and evaporative cooling towers. Organizations moving to a hybrid solution by just adding some liquid cooling are absorbing the cost premium without capturing the full TCO benefit.

The thermal physics makes things worse. Bulky liquid-cooling cold plates, thick hoses, and manifolds physically obstruct airflow inside the GPU server chassis. This concentrates thermal stress on the remaining air-cooled components, including storage drives, memory, and network cards, because server fans cannot push adequate airflow around the liquid plumbing. The components most reliant on fans end up in the worst possible thermal environment.

Water consumption is an all-but ignored, equally serious problem. Traditional air-cooled components rely on server fans to move heat into ambient air, which is then absorbed by a water loop and pumped to evaporative cooling towers. These systems can consume millions of gallons of water over time. As rack power densities continue to climb to support modern AI workloads, the evaporative water penalty becomes, as Singh puts it, “environmentally and economically indefensible.”

As AI infrastructure evolves toward liquid-cooled and fanless GPU systems, the true constraints on scale are shifting from compute performance to system-level thermal design. Modern AI platforms are no longer built server by server; they are engineered as tightly integrated rack- and pod-level systems where power delivery, cooling distribution, and component placement are inseparable.

In this environment, storage architectures designed for airflow-dependent data centers are becoming a limiting factor. As GPU platforms move fully into shared liquid-cooling domains, anchored by rack-level CDUs, every component in the system must operate natively within the same thermal and mechanical design. Storage can no longer rely on isolated cooling paths or bespoke thermal assumptions without introducing inefficiency, complexity, or density trade-offs at the system level.

Why storage is no longer a passive subsystem

For infrastructure leaders, this marks a fundamental transition. Storage is no longer a passive subsystem attached to compute, but instead an active participant in system-level cooling, serviceability, and GPU utilization. The ability to scale AI now depends on whether storage can integrate cleanly into liquid-cooled GPU systems, without fragmenting cooling architectures or constraining rack-level design.

And the race to scale AI is no longer just about who has the most GPUs, but instead about who can keep them cool, says Scott Shadley, director of leadership narrative and evangelist at Solidigm.

“Finding a way to enable liquid-cooled storage while still making it user serviceable has been one of the biggest challenges in designing fanless system solutions,” Shadley says. “As AI workloads evolve, the pressure on storage will only intensify.”

Techniques like KV cache offload, which move data between GPU memory and high-speed storage during inference, make storage latency and thermal performance directly relevant to model serving efficiency. In these architectures, a storage subsystem that throttles due to poor traditional airflow under thermal load slows down both reads and the model itself.

Moving to integrated liquid cooling

Moving from traditional air-cooled GPU servers to integrated liquid-cooled racks improves power usage efficiency (PUE) and reduces the operational cost for the datacenter. It also replaces the noisy computer room air handler (CRAH) and introduces a modern, efficient liquid CDU with potential scope to eliminate chillers if racks can be cooled to a liquid temperature of 45° Celsius.

When storage is cooled through liquid in absence of fans, it must also support serviceability with no liquid leakage. It also creates a new requirement that many infrastructure teams are only beginning to grapple with: every component in the rack must operate natively within the same cooling architecture.

Storage as an active participant in system design

Storage design is no longer an isolated engineering problem. It is a direct variable in GPU utilization, system reliability, and operational efficiency. The solution is to redesign storage from the ground up for liquid-cooled, fanless environments. This is harder than it sounds. Traditional SSD design assumes airflow for thermal management and places components on both sides of a thermally insulated PCB. Neither assumption holds in a CDU-anchored architecture.

“SSDs need to be designed with a best-in-class thermal solution to specifically conduct heat from internal components efficiently and transfer it to fluid,” says Singh. “The design must include a low-resistance path for heat to transfer to a single cold plate attached on one side.”

At the same time, drives must support serviceability without liquid leakage during insertion and removal, and without degrading the thermal interface between the drive and the cold plate.

Solidigm has worked with NVIDIA to address SSD liquid-cooling challenges, such as hot swap-ability and single-side cooling, reducing the thermal footprint of storage within the shared liquid loop, and ensuring GPUs receive their proportional share of coolant.

“If storage is not designed for a liquid-cooled environment efficiently, it will either throttle to lower performance or require more liquid volume,” he says. “Which directly and indirectly leads to under-utilization of GPU capability.”

Alignment on standards and the path to interoperability

Solidigm is not working on this in isolation. The broader industry is coalescing around standards to ensure liquid-cooled AI systems are interoperable rather than a patchwork of custom solutions. The SNIA and the Open Compute Project (OCP) are the primary bodies driving this work.

Solidigm led the industry standard for liquid cooling in SFF-TA-1006 for the E1.S form factor and is an active participant in OCP work streams covering rack design, thermal management, and sustainability. Custom, bespoke cooling solutions for storage are giving way to standards-aligned, production-ready designs that integrate cleanly into liquid-cooled GPU platforms.

“There are several organizations involved in this work,” says Shadley, who is also a SNIA board member. “They started with component-level solutions, driven heavily by SNIA and the SFF TA TWG. The next level is solution-level work, which is currently being heavily driven by OCP.”

Solidigm’s roadmap is leading the way

The design rules for system level architectures have changed due to the advent of liquid and immersion cooling technologies that allow for more unique design rules and removal of some barriers. The ability for systems to drive NVMe SSD-only platforms also allows for the removal of the platter-based box constraint that exists with HDD solutions, Shadley says.

“Solidigm customers have an active and lead role in roadmap decisions for our products due to their deep technical alignment with the ecosystem,” he says. “We do not simply make and sell products, we integrate, co-design, co-develop, and innovate with and alongside our partners, customers, and their customers.”

Adds Singh: “Solidigm’s key strength is innovation and customer-inspired system level engineering. This will continue to aggressively lead the way for liquid cooling adoption for storage.”

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

Infrastructure

The three disciplines separating AI agent demos from real-world deployment

Getting AI agents to perform reliably in production — not just in demos — is turning out to be harder than enterprises anticipated. Fragmented data, unclear workflows, and runaway escalation rates are slowing deployments across industries.

“The technology itself often works well in demonstrations,” said Sanchit Vir Gogia, chief analyst with Greyhound Research. “The challenge begins when it is asked to operate inside the complexity of a real organization.”

Burley Kawasaki, who oversees agent deployment at Creatio, and team have developed a methodology built around three disciplines: data virtualization to work around data lake delays; agent dashboards and KPIs as a management layer; and tightly bounded use-case loops to drive toward high autonomy.

In simpler use cases, Kawasaki says these practices have enabled agents to handle up to 80-90% of tasks on their own. With further tuning, he estimates they could support autonomous resolution in at least half of use cases, even in more complex deployments.

“People have been experimenting a lot with proof of concepts, they’ve been putting a lot of tests out there,” Kawasaki told VentureBeat. “But now in 2026, we’re starting to focus on mission-critical workflows that drive either operational efficiencies or additional revenue.”

Why agents keep failing in production

Enterprises are eager to adopt agentic AI in some form or another — often because they’re afraid to be left out, even before they even identify real-world tangible use cases — but run into significant bottlenecks around data architecture, integration, monitoring, security, and workflow design.

The first obstacle almost always has to do with data, Gogia said. Enterprise information rarely exists in a neat or unified form; it is spread across SaaS platforms, apps, internal databases, and other data stores. Some are structured, some are not.

But even when enterprises overcome the data retrieval problem, integration is a big challenge. Agents rely on APIs and automation hooks to interact with applications, but many enterprise systems were designed long before this kind of autonomous interaction was a reality, Gogia pointed out.

This can result in incomplete or inconsistent APIs, and systems can respond unpredictably when accessed programmatically. Organizations also run into snags when they attempt to automate processes that were never formally defined, Gogia said.

“Many business workflows depend on tacit knowledge,” he said. That is, employees know how to resolve exceptions they’ve seen before without explicit instructions — but, those missing rules and instructions become startlingly obvious when workflows are translated into automation logic.

The tuning loop

Creatio deploys agents in a “bounded scope with clear guardrails,” followed by an “explicit” tuning and validation phase, Kawasaki explained. Teams review initial outcomes, adjust as needed, then re-test until they’ve reached an acceptable level of accuracy.

That loop typically follows this pattern:

Design-time tuning (before go-live): Performance is improved through prompt engineering, context wrapping, role definitions, workflow design, and grounding in data and documents.
Human-in-the-loop correction (during execution): Devs approve, edit, or resolve exceptions. In instances where humans have to intervene the most (escalation or approval), users establish stronger rules, provide more context, and update workflow steps; or, they’ll narrow tool access.
Ongoing optimization (after go-live): Devs continue to monitor exception rates and outcomes, then tune repeatedly as needed, helping to improve accuracy and autonomy over time.

Kawasaki’s team applies retrieval-augmented generation to ground agents in enterprise knowledge bases, CRM data, and other proprietary sources.

Once agents are deployed in the wild, they are monitored with a dashboard providing performance analytics, conversion insights, and auditability. Essentially, agents are treated like digital workers. They have their own management layer with dashboards and KPIs.

For instance, an onboarding agent will be incorporated as a standard dashboard interface providing agent monitoring and telemetry. This is part of the platform layer — orchestration, governance, security, workflow execution, monitoring, and UI embedding — that sits “above the LLM,” Kawasaki said.

Users see a dashboard of agents in use and each of their processes, workflows, and executed results. They can “drill down” into an individual record (like a referral or renewal) that shows a step-by-step execution log and related communications to support traceability, debugging, and agent tweaking. The most common adjustments involve logic and incentives, business rules, prompt context, and tool access, Kawasaki said.

The biggest issues that come up post-deployment:

Exception handling volume can be high: Early spikes in edge cases often occur until guardrails and workflows are tuned.
Data quality and completeness: Missing or inconsistent fields and documents can cause escalations; teams can identify which data to prioritize for grounding and which checks to automate.
Auditability and trust: Regulated customers, particularly, require clear logs, approvals, role-based access control (RBAC), and audit trails.

“We always explain that you have to allocate time to train agents,” Creatio’s CEO Katherine Kostereva told VentureBeat. “It doesn’t happen immediately when you switch on the agent, it needs time to understand fully, then the number of mistakes will decrease.”

“Data readiness” doesn’t always require an overhaul

When looking to deploy agents, “Is my data ready?,” is a common early question. Enterprises know data access is important, but can be turned off by a massive data consolidation project.

But virtual connections can allow agents access to underlying systems and get around typical data lake/lakehouse/warehouse delays. Kawasaki’s team built a platform that integrates with data, and is now working on an approach that will pull data into a virtual object, process it, and use it like a standard object for UIs and workflows. This way, they don’t have to “persist or duplicate” large volumes of data in their database.

This technique can be helpful in areas like banking, where transaction volumes are simply too large to copy into CRM, but are “still valuable for AI analysis and triggers,” Kawasaki said.

Once integrations and virtual objects are established, teams can evaluate data completeness, consistency, and availability, and identify low-friction starting points (like document-heavy or unstructured workflows).

Kawasaki emphasized the importance of “really using the data in the underlying systems, which tends to actually be the cleanest or the source of truth anyway.”

Matching agents to the work

The best fit for autonomous (or near-autonomous) agents are high-volume workflows with “clear structure and controllable risk,” Kawasaki said. For instance, document intake and validation in onboarding or loan preparation, or standardized outreach like renewals and referrals.

“Especially when you can link them to very specific processes inside an industry — that’s where you can really measure and deliver hard ROI,” he said.

For instance, financial institutions are often siloed by nature. Commercial lending teams perform in their own environment, wealth management in another. But an autonomous agent can look across departments and separate data stores to identify, for instance, commercial customers who might be good candidates for wealth management or advisory services.

“You think it would be an obvious opportunity, but no one is looking across all the silos,” Kawasaki said. Some banks that have applied agents to this very scenario have seen “benefits of millions of dollars of incremental revenue,” he claimed, without naming specific institutions.

However, in other cases — particularly in regulated industries — longer-context agents are not only preferable, but necessary. For instance, in multi-step tasks like gathering evidence across systems, summarizing, comparing, drafting communications, and producing auditable rationales.

“The agent isn’t giving you a response immediately,” Kawasaki said. “It may take hours, days, to complete full end-to-end tasks.”

This requires orchestrated agentic execution rather than a “single giant prompt,” he said. This approach breaks work down into deterministic steps to be performed by sub-agents. Memory and context management can be maintained across various steps and time intervals. Grounding with RAG can help keep outputs tied to approved sources, and users have the ability to dictate expansion to file shares and other document repositories.

This model typically doesn’t require custom retraining or a new foundation model. Whatever model enterprises use (GPT, Claude, Gemini), performance improves through prompts, role definitions, controlled tools, workflows, and data grounding, Kawasaki said.

The feedback loop puts “extra emphasis” on intermediate checkpoints, he said. Humans review intermediate artifacts (such as summaries, extracted facts, or draft recommendations) and correct errors. Those can then be converted into better rules and retrieval sources, narrower tool scopes, and improved templates.

“What is important for this style of autonomous agent, is you mix the best of both worlds: The dynamic reasoning of AI, with the control and power of true orchestration,” Kawasaki said.

Ultimately, agents require coordinated changes across enterprise architecture, new orchestration frameworks, and explicit access controls, Gogia said. Agents must be assigned identities to restrict their privileges and keep them within bounds. Observability is critical; monitoring tools can record task completion rates, escalation events, system interactions, and error patterns. This kind of evaluation must be a permanent practice, and agents should be tested to see how they react when encountering new scenarios and unusual inputs.

“The moment an AI system can take action, enterprises have to answer several questions that rarely appear during copilot deployments,” Gogia said. Such as: What systems is the agent allowed to access? What types of actions can it perform without approval? Which activities must always require a human decision? How will every action be recorded and reviewed?

“Those [enterprises] that underestimate the challenge often find themselves stuck in demonstrations that look impressive but cannot survive real operational complexity,” Gogia said.

Data, Infrastructure, Orchestration

Testing autonomous agents (Or: how I learned to stop worrying and embrace chaos)

Look, we’ve spent the last 18 months building production AI systems, and we’ll tell you what keeps us up at night — and it’s not whether the model can answer questions. That’s table stakes now. What haunts us is the mental image of an agent autonomously approving a six-figure vendor contract at 2 a.m. because someone typo’d a config file.

We’ve moved past the era of “ChatGPT wrappers” (thank God), but the industry still treats autonomous agents like they’re just chatbots with API access. They’re not. When you give an AI system the ability to take actions without human confirmation, you’re crossing a fundamental threshold. You’re not building a helpful assistant anymore — you’re building something closer to an employee. And that changes everything about how we need to engineer these systems.

The autonomy problem nobody talks about

Here’s what’s wild: We’ve gotten really good at making models that *sound* confident. But confidence and reliability aren’t the same thing, and the gap between them is where production systems go to die.

We learned this the hard way during a pilot program where we let an AI agent manage calendar scheduling across executive teams. Seems simple, right? The agent could check availability, send invites, handle conflicts. Except, one Monday morning, it rescheduled a board meeting because it interpreted “let’s push this if we need to” in a Slack message as an actual directive. The model wasn’t wrong in its interpretation — it was plausible. But plausible isn’t good enough when you’re dealing with autonomy.

That incident taught us something crucial: The challenge isn’t building agents that work most of the time. It’s building agents that fail gracefully, know their limitations, and have the circuit breakers to prevent catastrophic mistakes.

What reliability actually means for autonomous systems

Layered reliability architecture

When we talk about reliability in traditional software engineering, we’ve got decades of patterns: Redundancy, retries, idempotency, graceful degradation. But AI agents break a lot of our assumptions.

Traditional software fails in predictable ways. You can write unit tests. You can trace execution paths. With AI agents, you’re dealing with probabilistic systems making judgment calls. A bug isn’t just a logic error—it’s the model hallucinating a plausible-sounding but completely fabricated API endpoint, or misinterpreting context in a way that technically parses but completely misses the human intent.

So what does reliability look like here? In our experience, it’s a layered approach.

Layer 1: Model selection and prompt engineering

This is foundational but insufficient. Yes, use the best model you can afford. Yes, craft your prompts carefully with examples and constraints. But don’t fool yourself into thinking that a great prompt is enough. I’ve seen too many teams ship “GPT-4 with a really good system prompt” and call it enterprise-ready.

Layer 2: Deterministic guardrails

Before the model does anything irreversible, run it through hard checks. Is it trying to access a resource it shouldn’t? Is the action within acceptable parameters? We’re talking old-school validation logic — regex, schema validation, allowlists. It’s not sexy, but it’s effective.

One pattern that’s worked well for us: Maintain a formal action schema. Every action an agent can take has a defined structure, required fields, and validation rules. The agent proposes actions in this schema, and we validate before execution. If validation fails, we don’t just block it — we feed the validation errors back to the agent and let it try again with context about what went wrong.

Layer 3: Confidence and uncertainty quantification

Here’s where it gets interesting. We need agents that know what they don’t know. We’ve been experimenting with agents that can explicitly reason about their confidence before taking actions. Not just a probability score, but actual articulated uncertainty: “I’m interpreting this email as a request to delay the project, but the phrasing is ambiguous and could also mean…”

This doesn’t prevent all mistakes, but it creates natural breakpoints where you can inject human oversight. High-confidence actions go through automatically. Medium-confidence actions get flagged for review. Low-confidence actions get blocked with an explanation.

Layer 4: Observability and auditability

Action Validation Pipeline

If you can’t debug it, you can’t trust it. Every decision the agent makes needs to be loggable, traceable, and explainable. Not just “what action did it take” but “what was it thinking, what data did it consider, what was the reasoning chain?”

We’ve built a custom logging system that captures the full large language model (LLM) interaction — the prompt, the response, the context window, even the model temperature settings. It’s verbose as hell, but when something goes wrong (and it will), you need to be able to reconstruct exactly what happened. Plus, this becomes your dataset for fine-tuning and improvement.

Guardrails: The art of saying no

Let’s talk about guardrails, because this is where engineering discipline really matters. A lot of teams approach guardrails as an afterthought — “we’ll add some safety checks if we need them.” That’s backwards. Guardrails should be your starting point.

We think of guardrails in three categories.

Permission boundaries

What is the agent physically allowed to do? This is your blast radius control. Even if the agent hallucinates the worst possible action, what’s the maximum damage it can cause?

We use a principle called “graduated autonomy.” New agents start with read-only access. As they prove reliable, they graduate to low-risk writes (creating calendar events, sending internal messages). High-risk actions (financial transactions, external communications, data deletion) either require explicit human approval or are simply off-limits.

One technique that’s worked well: Action cost budgets. Each agent has a daily “budget” denominated in some unit of risk or cost. Reading a database record costs 1 unit. Sending an email costs 10. Initiating a vendor payment costs 1,000. The agent can operate autonomously until it exhausts its budget; then, it needs human intervention. This creates a natural throttle on potentially problematic behavior.

Graduated Autonomy and Action Cost Budget

Semantic Houndaries

What should the agent understand as in-scope vs out-of-scope? This is trickier because it’s conceptual, not just technical.

I’ve found that explicit domain definitions help a lot. Our customer service agent has a clear mandate: handle product questions, process returns, escalate complaints. Anything outside that domain — someone asking for investment advice, technical support for third-party products, personal favors — gets a polite deflection and escalation.

The challenge is making these boundaries robust to prompt injection and jailbreaking attempts. Users will try to convince the agent to help with out-of-scope requests. Other parts of the system might inadvertently pass instructions that override the agent’s boundaries. You need multiple layers of defense here.

Operational boundaries

How much can the agent do, and how fast? This is your rate limiting and resource control.

We’ve implemented hard limits on everything: API calls per minute, maximum tokens per interaction, maximum cost per day, maximum number of retries before human escalation. These might seem like artificial constraints, but they’re essential for preventing runaway behavior.

We once saw an agent get stuck in a loop trying to resolve a scheduling conflict. It kept proposing times, getting rejections, and trying again. Without rate limits, it sent 300 calendar invites in an hour. With proper operational boundaries, it would’ve hit a threshold and escalated to a human after attempt number 5.

Agents need their own style of testing

Traditional software testing doesn’t cut it for autonomous agents. You can’t just write test cases that cover all the edge cases, because with LLMs, everything is an edge case.

What’s worked for us:

Simulation environments

Build a sandbox that mirrors production but with fake data and mock services. Let the agent run wild. See what breaks. We do this continuously — every code change goes through 100 simulated scenarios before it touches production.

The key is making scenarios realistic. Don’t just test happy paths. Simulate angry customers, ambiguous requests, contradictory information, system outages. Throw in some adversarial examples. If your agent can’t handle a test environment where things go wrong, it definitely can’t handle production.

Red teaming

Get creative people to try to break your agent. Not just security researchers, but domain experts who understand the business logic. Some of our best improvements came from sales team members who tried to “trick” the agent into doing things it shouldn’t.

Shadow mode

Before you go live, run the agent in shadow mode alongside humans. The agent makes decisions, but humans actually execute the actions. You log both the agent’s choices and the human’s choices, and you analyze the delta.

This is painful and slow, but it’s worth it. You’ll find all kinds of subtle misalignments you’d never catch in testing. Maybe the agent technically gets the right answer, but with phrasing that violates company tone guidelines. Maybe it makes legally correct but ethically questionable decisions. Shadow mode surfaces these issues before they become real problems.

The human-in-the-loop pattern

Three Human-in-the-Loop Patterns

Despite all the automation, humans remain essential. The question is: Where in the loop?

We’re increasingly convinced that “human-in-the-loop” is actually several distinct patterns:

Human-on-the-loop: The agent operates autonomously, but humans monitor dashboards and can intervene. This is your steady-state for well-understood, low-risk operations.

Human-in-the-loop: The agent proposes actions, humans approve them. This is your training wheels mode while the agent proves itself, and your permanent mode for high-risk operations.

Human-with-the-loop: Agent and human collaborate in real-time, each handling the parts they’re better at. The agent does the grunt work, the human does the judgment calls.

The trick is making these transitions smooth. An agent shouldn’t feel like a completely different system when you move from autonomous to supervised mode. Interfaces, logging, and escalation paths should all be consistent.

Failure modes and recovery

Let’s be honest: Your agent will fail. The question is whether it fails gracefully or catastrophically.

We classify failures into three categories:

Recoverable errors: The agent tries to do something, it doesn’t work, the agent realizes it didn’t work and tries something else. This is fine. This is how complex systems operate. As long as the agent isn’t making things worse, let it retry with exponential backoff.

Detectable failures: The agent does something wrong, but monitoring systems catch it before significant damage occurs. This is where your guardrails and observability pay off. The agent gets rolled back, humans investigate, you patch the issue.

Undetectable failures: The agent does something wrong, and nobody notices until much later. These are the scary ones. Maybe it’s been misinterpreting customer requests for weeks. Maybe it’s been making subtly incorrect data entries. These accumulate into systemic issues.

The defense against undetectable failures is regular auditing. We randomly sample agent actions and have humans review them. Not just pass/fail, but detailed analysis. Is the agent showing any drift in behavior? Are there patterns in its mistakes? Is it developing any concerning tendencies?

The cost-performance tradeoff

Here’s something nobody talks about enough: reliability is expensive.

Every guardrail adds latency. Every validation step costs compute. Multiple model calls for confidence checking multiply your API costs. Comprehensive logging generates massive data volumes.

You have to be strategic about where you invest. Not every agent needs the same level of reliability. A marketing copy generator can be looser than a financial transaction processor. A scheduling assistant can retry more liberally than a code deployment system.

We use a risk-based approach. High-risk agents get all the safeguards, multiple validation layers, extensive monitoring. Lower-risk agents get lighter-weight protections. The key is being explicit about these trade-offs and documenting why each agent has the guardrails it does.

Organizational challenges

We’d be remiss if we didn’t mention that the hardest parts aren’t technical — they’re organizational.

Who owns the agent when it makes a mistake? Is it the engineering team that built it? The business unit that deployed it? The person who was supposed to be supervising it?

How do you handle edge cases where the agent’s logic is technically correct but contextually inappropriate? If the agent follows its rules but violates an unwritten norm, who’s at fault?

What’s your incident response process when an agent goes rogue? Traditional runbooks assume human operators making mistakes. How do you adapt these for autonomous systems?

These questions don’t have universal answers, but they need to be addressed before you deploy. Clear ownership, documented escalation paths, and well-defined success metrics are just as important as the technical architecture.

Where we go from here

The industry is still figuring this out. There’s no established playbook for building reliable autonomous agents. We’re all learning in production, and that’s both exciting and terrifying.

What we know for sure: The teams that succeed will be the ones who treat this as an engineering discipline, not just an AI problem. You need traditional software engineering rigor — testing, monitoring, incident response — combined with new techniques specific to probabilistic systems.

You need to be paranoid but not paralyzed. Yes, autonomous agents can fail in spectacular ways. But with proper guardrails, they can also handle enormous workloads with superhuman consistency. The key is respecting the risks while embracing the possibilities.

We’ll leave you with this: Every time we deploy a new autonomous capability, we run a pre-mortem. We imagine it’s six months from now and the agent has caused a significant incident. What happened? What warning signs did we miss? What guardrails failed?

This exercise has saved us more times than we can count. It forces you to think through failure modes before they occur, to build defenses before you need them, to question assumptions before they bite you.

Because in the end, building enterprise-grade autonomous AI agents isn’t about making systems that work perfectly. It’s about making systems that fail safely, recover gracefully, and learn continuously.

And that’s the kind of engineering that actually matters.

Madhvesh Kumar is a principal engineer. Deepika Singh is a senior software engineer.

Views expressed are based on hands-on experience building and deploying autonomous agents, along with the occasional 3 AM incident response that makes you question your career choices.

DataDecisionMakers, Infrastructure, Orchestration

Why enterprises are replacing generic AI with tools that know their users

The future of AI isn’t just agentic; it’s deep personalization.

Rather than simple recommender systems that correlate user behavior to identify patterns and apply those to individual workflows, large language models (LLMs) and AI agents can analyze users directly to create deeply personalized experiences.

It’s this kind of aggressive customization users are increasingly demanding — and the savviest enterprises who provide it (and soon) will win.

The goal is: “Don’t try to randomize, or guess who I am. I tell you, this is what I care about,” Lijuan Qin, head of product, at Zoom AI, explains in a new Beyond the Pilot podcast.

How Zoom is incorporating personalization

Zoom is one company that has adapted to this trend: Its generative assistant, AI Companion, goes beyond basic summarization, smart recordings, and after-meeting action items to opinion divergence and user alignment tracking.

Users can customize meeting summaries based on their specific interests, and create targeted templates for follow-up emails to different personas (whether it be a salesperson or account executive). The AI assistant can then automatically populate these documents post-call. Meanwhile, a custom dictionary in Zoom AI Studio can process unique enterprise terminology and vocabulary for more relevant AI outputs, and a deep research mode can quickly deliver comprehensive analyses based on “internal expertise and external insights.”

Control is key here; the human can be “very specific [and] nail down” agent permissioning, Qin explained. They have “very clear controls” on follow-up actions, such as: Can the agent automatically send emails to specific recipients? Or will it trigger a verification step when it recognizes transcripts contain sensitive information (as dictated by the user)?

Knowing that AI can go off the rails at times, human users can track agent behavior in Zoom, enable and disable features, and control data access. This can help prevent outputs that are inaccurate or off-target.

“The most important thing is we do not assume AI is smart enough to get everything right,” Qin emphasized.

Getting context right

In this new agentic AI age, there is essentially a “land grab for context,” Sam Witteveen, co-founder of Red Dragon AI and Beyond the Pilot host, explains in the podcast.

“Definitely knowing your users is the big thing, right? Knowing what apps they are living in, what day-to-day tasks are they constantly doing?,” he said. “Companies realize the more they have about you, the better the [AI] memory can get, the better they can customize.”

Claude Cowork is one app that is “really shining” at this, Witteveen says; OpenClaw is another. Models are good enough that they can begin to make decisions for users and respond to directions like: “You know a bunch of things about me. You’ve got all this context. Go and generate the skills that are going to help me do a better job.”

“With something like OpenClaw, you can customize it in any way you want, right? You can chat with it, you can tell it, ‘Hey, at 4 o’clock I want you to do this,’” Witteveen said.

However, token usage and security must always be taken into account, he advised. OpenClaw has been plagued by security issues since its launch. This has prompted many enterprises to uninstall the autonomous agent or outright ban its use; however, these uninstalls must be done correctly so that IT leaders don’t inadvertently delete their entire enterprise stack.

Meanwhile, in terms of token budget, personalization can run up costs. “You need to think about the metrics you are tracking,” Witteveen said. “This is very different from product to product, but metrics around these things are gonna be key.”

Watch the podcast to hear more about:

Why the companies that don’t experiment with AI skills right now “may be toast”
How Zoom built an AI companion that tracks opinion divergence — not just action items — in your meetings
Why the build vs. buy question just got a lot more urgent for enterprise software
Why “skills” may matter more than MCP for the future of enterprise AI

You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.

Infrastructure

Mistral AI launches Forge to help companies build proprietary AI models, challenging cloud giants

Mistral AI on Monday launched Forge, an enterprise model training platform that allows organizations to build, customize, and continuously improve AI models using their own proprietary data — a move that positions the French AI lab squarely against the hyperscale cloud providers in one of the most consequential and least understood markets in enterprise technology.

The announcement caps a remarkably aggressive week for Mistral, which also released its Mistral Small 4 model, unveiled Leanstral — an open-source code agent for formal verification — and joined the newly formed Nvidia Nemotron Coalition as a co-developer of the coalition’s first open frontier base model. Together, these moves paint the picture of a company that is no longer content to compete on model benchmarks alone and is instead racing to become the infrastructure backbone for organizations that want to own their AI rather than rent it.

Forge goes significantly beyond the fine-tuning APIs that Mistral and its competitors have offered for the past year. The platform supports the full model training lifecycle: pre-training on large internal datasets, post-training through supervised fine-tuning, DPO, and ODPO, and — critically — reinforcement learning pipelines designed to align models with internal policies, evaluation criteria, and operational objectives over time.

“Forge is Mistral’s model training platform,” said Maliena Guy, head of product at Mistral AI, in an exclusive interview with VentureBeat ahead of the launch. “We’ve been building this out behind the scenes with our AI scientists. What Forge actually brings to the table is that it lets enterprises and governments customize AI models for their specific needs.”

Why Mistral says fine-tuning APIs are no longer enough for serious enterprise AI

The distinction Mistral is drawing — between lightweight fine-tuning and full-cycle model training — is central to understanding why Forge exists and whom it serves.

For the past two years, most enterprise AI adoption has followed a familiar pattern: companies select a general-purpose model from OpenAI, Anthropic, Google, or an open-source provider, then apply fine-tuning through a cloud API to adjust the model’s behavior for a narrow set of tasks. This approach works well for proof-of-concept deployments and many production use cases. But Guy argues that it fundamentally plateaus when organizations try to solve their hardest problems.

“We had a fine-tuning API relying on supervised fine-tuning. I think it was kind of what was the standard a couple of months ago,” Guy told VentureBeat. “It gets you to a proof-of-concept state. Whenever you actually want to have the performance that you’re targeting, you need to go beyond. AI scientists today are not using fine-tuning APIs. They’re using much more advanced tools, and that’s what Forge is bringing to the table.”

What Forge packages, in Guy’s telling, is the training methodology that Mistral’s own AI scientists use internally to build the company’s flagship models — including data mixing strategies, data generation pipelines, distributed computing optimizations, and battle-tested training recipes. She drew a sharp line between Forge and the open-source tools and community tutorials that are freely available today.

“There’s no platform out there that provides you real-world training recipes that work,” Guy said. “Other open-source repositories or other tools can give you generic configurations or community tutorials, but they don’t give you the recipe that’s been validated — that we’ve been doing for all of our flagship models today.”

From ancient manuscripts to hedge fund quant languages, early customers reveal what off-the-shelf AI can’t do

The obvious question facing any product like Forge is demand. In a market where GPT-5, Claude, Gemini, and a growing fleet of open-source models can handle an enormous range of tasks, why would an enterprise invest the time, compute, and expertise required to train its own model from scratch?

Guy acknowledged the question head-on but argued that the need emerges quickly once companies move beyond generic use cases. “A lot of the existing models can get you very far,” she said. “But when you’re looking at what’s going to make you competitive compared to your competition — everyone can adopt and use the models that are out there. When you want to go a step beyond that, you actually need to create your own models. You need to leverage your proprietary information.”

The real-world examples she cited illustrate the edges of the current model ecosystem. In one case, Mistral worked with a public institution that had ancient manuscripts with missing text from damaged sections. “The models that were available were not able to do this because they’ve never seen the data,” Guy explained. “Digitization was not very good. There were some unique patterns and characters, and so we actually created a model for them to fill in the spans. This is now used by their researchers, and it’s accelerating their publication and understanding of these documents.”

In another engagement, Mistral partnered with Ericsson to customize its Codestral model for legacy-to-modern code translation. Ericsson, Guy said, has built up half a decade of proprietary knowledge around an internal calling language — a codebase so specialized that no off-the-shelf model has ever encountered it. “The concrete impact is like turning a year-long manual migration process, where each engineer needs six months of onboarding, to something that’s really more scalable and faster,” she said.

Perhaps the most telling example involves hedge funds. Guy described working with financial firms to customize models for proprietary quantitative languages — the kind of deeply guarded intellectual property that these firms keep on-premises and never expose to cloud-hosted AI services. Using Forge’s reinforcement learning capabilities, Mistral helped one hedge fund develop custom benchmarks and then trained the model to outperform on them, producing what Guy called “a unique model that was able to give them the competitive edge that was needed.”

How Forge makes money: license fees, data pipelines, and embedded AI scientists

Forge’s business model reflects the complexity of enterprise model training. According to Guy, it operates across several revenue streams. For customers who run training jobs on their own GPU clusters — a common requirement in highly regulated or IP-sensitive industries — Mistral does not charge for compute. Instead, the company charges a license fee for the Forge platform itself, along with optional fees for data pipeline services and what Mistral calls “forward-deployed scientists” — embedded AI researchers who work alongside the customer’s team.

“No competitor out there today is kind of selling this embedded scientist as part of their training platform offering,” Guy said.

This model has clear echoes of Palantir’s early playbook, where forward-deployed engineers served as the critical bridge between powerful software and the messy reality of enterprise data. It also suggests that Mistral recognizes a fundamental truth about the current state of enterprise AI: the technology alone is not enough. Most organizations lack the internal expertise to design effective training recipes, curate data at scale, or navigate the treacherous optimization landscape of distributed GPU training.

The infrastructure itself is flexible. Training can happen on Mistral’s own clusters, on Mistral Compute (the company’s dedicated infrastructure offering), or entirely on-premises within the customer’s own data centers. “We have all these different cases, and we support everything,” Guy said.

Keeping proprietary data off the cloud is Forge’s sharpest selling point

One of the sharpest points of differentiation Mistral is pressing with Forge is data privacy. When customers train on their own infrastructure, Guy emphasized that Mistral never sees the data at all.

“It’s on their clusters, it’s with their data — we don’t see anything of it, and so it’s completely under their control,” she said. “I think this is something that sets us apart from the competition, where you actually need to upload your data, and you have a black box effect.”

This matters enormously in sectors like defense, intelligence, financial services, and healthcare, where the legal and reputational risks of exposing proprietary data to a third-party cloud service can be deal-breakers. Mistral has already partnered with organizations including ASML, DSO National Laboratories Singapore, the European Space Agency, Home Team Science and Technology Agency Singapore, and Reply — a roster that suggests the company is deliberately targeting the most data-sensitive corners of the enterprise market.

Forge also includes data pipeline capabilities that Mistral has developed through its own model training: data acquisition, curation, and synthetic data generation. “Data is a critical piece of any training job today,” Guy said. “You need to have good data. You need to have a good amount of data to make sure that the model is going to be good performing. We’ve acquired, as a company, really great knowledge building out these data pipelines.”

In the age of AI agents, Mistral argues that custom models still matter more than MCP servers

The timing of Forge’s launch raises an important strategic question. The AI industry in 2026 has been consumed by agents — autonomous AI systems that can use tools, navigate multi-step workflows, and take actions on behalf of users. If the future belongs to agents, why does the underlying model matter? Can’t companies simply plug into the best available frontier model through an MCP server or API and focus their energy on orchestration?

Guy pushed back on this framing with conviction. “The customers that we’ve been working on — some of these specific problems are things that no MCP server would ever solve,” she said. “You actually need that intelligence. You actually need to create that model that will help you solve your most critical business problem.”

She also argued that model customization is essential even in purely agentic architectures. “There are some agentic behaviors that you need to bring to the model,” Guy said. “It can be about reasoning patterns, specific types of documentation, making sure that you have the right reasoning traces. Even in these cases where people are going completely agentic, you still need model customization — like reinforcement learning techniques — to actually get the right level of performance.”

Mistral’s press release makes this connection explicit, arguing that custom models make enterprise agents more reliable by providing deeper understanding of internal environments: more precise tool selection, more dependable multi-step workflows, and decisions that reflect internal policies rather than generic assumptions.

The platform also supports an “agent-first” design philosophy. Forge exposes interfaces that allow autonomous agents — including Mistral’s own Vibe coding agent — to launch training experiments, find optimal hyperparameters, schedule jobs, and generate synthetic data. “We’ve actually been building Forge in an AI-native way,” Guy said. “We’re already testing out how autonomous agents can actually launch training experiments.”

Mistral Small 4, Leanstral, and the Nvidia coalition: the week that redefined the company’s ambitions

To fully appreciate Forge’s significance, it helps to view it alongside the other announcements Mistral made in the same week — a barrage of releases that together represent the most ambitious expansion in the company’s short history.

Just yesterday, Mistral released Leanstral, the first open-source code agent for Lean 4, the proof assistant used in formal mathematics and software verification. Leanstral operates with just 6 billion active parameters and is designed for realistic formal repositories — not isolated math competition problems. On the same day, Mistral launched Mistral Small 4, a mixture-of-experts model with 119 billion total parameters but only 6 billion active per query, running 40 percent faster than its predecessor while handling three times more queries per second. Both models ship under the Apache 2.0 license — the most permissive open-source license in wide use.

And then there is the Nvidia Nemotron Coalition. Announced at Nvidia’s GTC conference, the coalition is a first-of-its-kind collaboration between Nvidia and a group of AI labs — including Mistral, Perplexity, LangChain, Cursor, Black Forest Labs, Reflection AI, Sarvam, and Thinking Machines Lab — to co-develop open frontier models. The coalition’s first project is a base model co-developed specifically by Mistral AI and Nvidia, trained on Nvidia DGX Cloud, which will underpin the upcoming Nvidia Nemotron 4 family of open models.

“Open frontier models are how AI becomes a true platform,” said Arthur Mensch, cofounder and CEO of Mistral AI, in Nvidia’s announcement. “Together with Nvidia, we will take a leading role in training and advancing frontier models at scale.”

This coalition role is strategically significant. It positions Mistral not merely as a consumer of Nvidia’s compute infrastructure but as a co-creator of the foundational models that the broader ecosystem will build upon. For a company that is still a fraction of the size of its American competitors, this is an outsized seat at the table.

Forge takes aim at Amazon, Microsoft, and Google — and says they can’t go deep enough

Forge enters a market that is already crowded — at least on the surface. Amazon Bedrock, Microsoft Azure AI Foundry, and Google Cloud Vertex AI all offer model training and customization capabilities. But Guy argued that these offerings are fundamentally limited in two respects.

First, they are cloud-only. “In one set of cases, it’s very easy to answer — they want to run this on their premises, and so all these tools that are available on the cloud are just not available for them,” Guy said. Second, she argued that the hyperscalers’ training tools largely offer simplified API interfaces that don’t provide the depth of control that serious model training requires.

There is also the dependency question. Guy described digital-native companies that had built products on top of closed-source models, only to have a new model release — more verbose than its predecessor — crash their production pipelines. “When you’re relying on closed-source models, you are also super dependent on the updates of the model that have side effects,” she warned.

This argument resonates with the broader sovereignty narrative that has powered Mistral’s rise in Europe and beyond. The company has positioned itself as the alternative for organizations that want to own their AI stack rather than lease it from American hyperscalers. Forge extends that argument from inference to training: not just running models you own, but building them in the first place.

The open-source foundation matters here, too. Mistral has been releasing models under permissive licenses since its founding, and Guy emphasized that the company is building Forge as an open platform. While it currently works with Mistral’s own models, she confirmed that support for other open-source architectures is planned. “We’re deeply rooted into open source. This has been part of our DNA since the beginning, and we have been building Forge to be an open platform — it’s just a question of a matter of time that we’ll be opening this to other open-source models.”

A co-founder’s departure to xAI underscores why Mistral is turning expertise into a product

The timing of Forge’s launch also arrives against a backdrop of fierce talent competition. As FinTech Weekly reported on March 14, Devendra Singh Chaplot — a co-founder of Mistral AI who headed the company’s multimodal group and contributed to training Mistral 7B, Mixtral 8x7B, and Mistral Large — left to join Elon Musk’s xAI, where he will work on Grok model training. Chaplot had previously also been a founding member of Thinking Machines Lab, the AI startup founded by former OpenAI CTO Mira Murati.

The loss of a co-founder is never insignificant, but Mistral appears to be compensating with institutional capability rather than individual brilliance. Forge is, in essence, a productization of the company’s collective training expertise — the recipes, the pipelines, the distributed computing optimizations — in a form that can scale beyond any single researcher. By packaging this knowledge into a platform and pairing it with forward-deployed scientists, Mistral is attempting to build a durable competitive asset that doesn’t walk out the door when a key hire departs.

Mistral’s big bet: the companies that own their AI models will be the ones that win

Forge is a bet on a specific theory of the enterprise AI future: that the most valuable AI systems will be those trained on proprietary knowledge, governed by internal policies, and operated under the organization’s direct control. This stands in contrast to the prevailing paradigm of the past two years, in which enterprises have largely consumed AI as a cloud service — powerful but generic, convenient but uncontrolled.

The question is whether enough enterprises will be willing to make the investment. Model training is expensive, technically demanding, and requires sustained organizational commitment. Forge lowers the barriers — through its infrastructure automation, its battle-tested recipes, and its embedded scientists — but it does not eliminate them.

What Mistral appears to be banking on is that the organizations with the most to gain from AI — the ones sitting on decades of proprietary knowledge in highly specialized domains — are precisely the ones for whom generic models are least sufficient. These are the companies where the gap between what a general-purpose model can do and what the business actually needs is widest, and where the competitive advantage of closing that gap is greatest.

Forge supports both dense and mixture-of-experts architectures, accommodating different trade-offs between performance, cost, and operational constraints. It handles multimodal inputs. It is designed for continuous adaptation rather than one-time training, with built-in evaluation frameworks that let enterprises test models against internal benchmarks before production deployment.

For the past two years, the enterprise AI playbook has been straightforward: pick a model, call an API, ship a feature. Mistral is now asking a harder question — whether the organizations willing to do the difficult, expensive, unglamorous work of training their own models will end up with something the API-callers never get.

An unfair advantage.

Infrastructure

Nvidia’s DGX Station is a desktop supercomputer that runs trillion-parameter AI models without the cloud

Nvidia on Monday unveiled a deskside supercomputer powerful enough to run AI models with up to one trillion parameters — roughly the scale of GPT-4 — without touching the cloud. The machine, called the DGX Station, packs 748 gigabytes of coherent memory and 20 petaflops of compute into a box that sits next to a monitor, and it may be the most significant personal computing product since the original Mac Pro convinced creative professionals to abandon workstations.

The announcement, made at the company’s annual GTC conference in San Jose, lands at a moment when the AI industry is grappling with a fundamental tension: the most powerful models in the world require enormous data center infrastructure, but the developers and enterprises building on those models increasingly want to keep their data, their agents, and their intellectual property local. The DGX Station is Nvidia’s answer — a six-figure machine that collapses the distance between AI’s frontier and a single engineer’s desk.

What 20 petaflops on your desktop actually means

The DGX Station is built around the new GB300 Grace Blackwell Ultra Desktop Superchip, which fuses a 72-core Grace CPU and a Blackwell Ultra GPU through Nvidia’s NVLink-C2C interconnect. That link provides 1.8 terabytes per second of coherent bandwidth between the two processors — seven times the speed of PCIe Gen 6 — which means the CPU and GPU share a single, seamless pool of memory without the bottlenecks that typically cripple desktop AI work.

Twenty petaflops — 20 quadrillion operations per second — would have ranked this machine among the world’s top supercomputers less than a decade ago. The Summit system at Oak Ridge National Laboratory, which held the global No. 1 spot in 2018, delivered roughly ten times that performance but occupied a room the size of two basketball courts. Nvidia is packaging a meaningful fraction of that capability into something that plugs into a wall outlet.

The 748 GB of unified memory is arguably the more important number. Trillion-parameter models are enormous neural networks that must be loaded entirely into memory to run. Without sufficient memory, no amount of processing speed matters — the model simply won’t fit. The DGX Station clears that bar, and it does so with a coherent architecture that eliminates the latency penalties of shuttling data between CPU and GPU memory pools.

Always-on agents need always-on hardware

Nvidia designed the DGX Station explicitly for what it sees as the next phase of AI: autonomous agents that reason, plan, write code, and execute tasks continuously — not just systems that respond to prompts. Every major announcement at GTC 2026 reinforced this “agentic AI” thesis, and the DGX Station is where those agents are meant to be built and run.

The key pairing is NemoClaw, a new open-source stack that Nvidia also announced Monday. NemoClaw bundles Nvidia’s Nemotron open models with OpenShell, a secure runtime that enforces policy-based security, network, and privacy guardrails for autonomous agents. A single command installs the entire stack. Jensen Huang, Nvidia’s founder and CEO, framed the combination in unmistakable terms, calling OpenClaw — the broader agent platform NemoClaw supports — “the operating system for personal AI” and comparing it directly to Mac and Windows.

The argument is straightforward: cloud instances spin up and down on demand, but always-on agents need persistent compute, persistent memory, and persistent state. A machine under your desk, running 24/7 with local data and local models inside a security sandbox, is architecturally better suited to that workload than a rented GPU in someone else’s data center. The DGX Station can operate as a personal supercomputer for a solo developer or as a shared compute node for teams, and it supports air-gapped configurations for classified or regulated environments where data can never leave the building.

From desk prototype to data center production in zero rewrites

One of the cleverest aspects of the DGX Station’s design is what Nvidia calls architectural continuity. Applications built on the machine migrate seamlessly to the company’s GB300 NVL72 data center systems — 72-GPU racks designed for hyperscale AI factories — without rearchitecting a single line of code. Nvidia is selling a vertically integrated pipeline: prototype at your desk, then scale to the cloud when you’re ready.

This matters because the biggest hidden cost in AI development today isn’t compute — it’s the engineering time lost to rewriting code for different hardware configurations. A model fine-tuned on a local GPU cluster often requires substantial rework to deploy on cloud infrastructure with different memory architectures, networking stacks, and software dependencies. The DGX Station eliminates that friction by running the same NVIDIA AI software stack that powers every tier of Nvidia’s infrastructure, from the DGX Spark to the Vera Rubin NVL72.

Nvidia also expanded the DGX Spark, the Station’s smaller sibling, with new clustering support. Up to four Spark units can now operate as a unified system with near-linear performance scaling — a “desktop data center” that fits on a conference table without rack infrastructure or an IT ticket. For teams that need to fine-tune mid-size models or develop smaller-scale agents, clustered Sparks offer a credible departmental AI platform at a fraction of the Station’s cost.

The early buyers reveal where the market is heading

The initial customer roster for DGX Station maps the industries where AI is transitioning fastest from experiment to daily operating tool. Snowflake is using the system to locally test its open-source Arctic training framework. EPRI, the Electric Power Research Institute, is advancing AI-powered weather forecasting to strengthen electrical grid reliability. Medivis is integrating vision language models into surgical workflows. Microsoft Research and Cornell have deployed the systems for hands-on AI training at scale.

Systems are available to order now and will ship in the coming months from ASUS, Dell Technologies, GIGABYTE, MSI, and Supermicro, with HP joining later in the year. Nvidia hasn’t disclosed pricing, but the GB300 components and the company’s historical DGX pricing suggest a six-figure investment — expensive by workstation standards, but remarkably cheap compared to the cloud GPU costs of running trillion-parameter inference at scale.

The list of supported models underscores how open the AI ecosystem has become: developers can run and fine-tune OpenAI’s gpt-oss-120b, Google Gemma 3, Qwen3, Mistral Large 3, DeepSeek V3.2, and Nvidia’s own Nemotron models, among others. The DGX Station is model-agnostic by design — a hardware Switzerland in an industry where model allegiances shift quarterly.

Nvidia’s real strategy: own every layer of the AI stack, from orbit to office

The DGX Station didn’t arrive in a vacuum. It was one piece of a sweeping set of GTC 2026 announcements that collectively map Nvidia’s ambition to supply AI compute at literally every physical scale.

At the top, Nvidia unveiled the Vera Rubin platform — seven new chips in full production — anchored by the Vera Rubin NVL72 rack, which integrates 72 next-generation Rubin GPUs and claims up to 10x higher inference throughput per watt compared to the current Blackwell generation. The Vera CPU, with 88 custom Olympus cores, targets the orchestration layer that agentic workloads increasingly demand. At the far frontier, Nvidia announced the Vera Rubin Space Module for orbital data centers, delivering 25x more AI compute for space-based inference than the H100.

Between orbit and office, Nvidia revealed partnerships spanning Adobe for creative AI, automakers like BYD and Nissan for Level 4 autonomous vehicles, a coalition with Mistral AI and seven other labs to build open frontier models, and Dynamo 1.0, an open-source inference operating system already adopted by AWS, Azure, Google Cloud, and a roster of AI-native companies including Cursor and Perplexity.

The pattern is unmistakable: Nvidia wants to be the computing platform — hardware, software, and models — for every AI workload, everywhere. The DGX Station is the piece that fills the gap between the cloud and the individual.

The cloud isn’t dead, but its monopoly on serious AI work is ending

For the past several years, the default assumption in AI has been that serious work requires cloud GPU instances — renting Nvidia hardware from AWS, Azure, or Google Cloud. That model works, but it carries real costs: data egress fees, latency, security exposure from sending proprietary data to third-party infrastructure, and the fundamental loss of control inherent in renting someone else’s computer.

The DGX Station doesn’t kill the cloud — Nvidia’s data center business dwarfs its desktop revenue and is accelerating. But it creates a credible local alternative for an important and growing category of workloads. Training a frontier model from scratch still demands thousands of GPUs in a warehouse. Fine-tuning a trillion-parameter open model on proprietary data? Running inference for an internal agent that processes sensitive documents? Prototyping before committing to cloud spend? A machine under your desk starts to look like the rational choice.

This is the strategic elegance of the product: it expands Nvidia’s addressable market into personal AI infrastructure while reinforcing the cloud business, because everything built locally is designed to scale up to Nvidia’s data center platforms. It’s not cloud versus desk. It’s cloud and desk, and Nvidia supplies both.

A supercomputer on every desk — and an agent that never sleeps on top of it

The PC revolution’s defining slogan was “a computer on every desk and in every home.” Four decades later, Nvidia is updating the premise with an uncomfortable escalation. The DGX Station puts genuine supercomputing power — the kind that ran national laboratories — beside a keyboard, and NemoClaw puts an autonomous AI agent on top of it that runs around the clock, writing code, calling tools, and completing tasks while its owner sleeps.

Whether that future is exhilarating or unsettling depends on your vantage point. But one thing is no longer debatable: the infrastructure required to build, run, and own frontier AI just moved from the server room to the desk drawer. And the company that sells nearly every serious AI chip on the planet just made sure it sells the desk drawer, too.

Infrastructure

Nvidia introduces Vera Rubin, a seven-chip AI platform with OpenAI, Anthropic and Meta on board

Nvidia on Monday took the wraps off Vera Rubin, a sweeping new computing platform built from seven chips now in full production — and backed by an extraordinary lineup of customers that includes Anthropic, OpenAI, Meta and Mistral AI, along with every major cloud provider.

The message to the AI industry, and to investors, was unmistakable: Nvidia is not slowing down. The Vera Rubin platform claims up to 10x more inference throughput per watt and one-tenth the cost per token compared with the Blackwell systems that only recently began shipping. CEO Jensen Huang, speaking at the company’s annual GTC conference, called it “a generational leap” that would kick off “the greatest infrastructure buildout in history.” Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure will all offer the platform, and more than 80 manufacturing partners are building systems around it.

“Vera Rubin is a generational leap — seven breakthrough chips, five racks, one giant supercomputer — built to power every phase of AI,” Huang declared. “The agentic AI inflection point has arrived with Vera Rubin kicking off the greatest infrastructure buildout in history.”

In any other industry, such rhetoric might be dismissed as keynote theater. But Nvidia occupies a singular position in the global economy — a company whose products have become so essential to the AI boom that its market capitalization now rivals the GDP of mid-sized nations. When Huang says the infrastructure buildout is historic, the CEOs of the companies actually writing the checks are standing behind him, nodding.

Dario Amodei, the chief executive of Anthropic, said Nvidia’s platform “gives us the compute, networking and system design to keep delivering while advancing the safety and reliability our customers depend on.” Sam Altman, the chief executive of OpenAI, said that “with Nvidia Vera Rubin, we’ll run more powerful models and agents at massive scale and deliver faster, more reliable systems to hundreds of millions of people.”

Inside the seven-chip architecture designed to power the age of AI agents

The Vera Rubin platform brings together the Nvidia Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch and the newly integrated Groq 3 LPU — a purpose-built inference accelerator. Nvidia organized these into five interlocking rack-scale systems that function as a unified supercomputer.

The flagship NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6. Nvidia says it can train large mixture-of-experts models using one-quarter the GPUs required on Blackwell, a claim that, if validated in production, would fundamentally alter the economics of building frontier AI systems.

The Vera CPU rack packs 256 liquid-cooled processors into a single rack, sustaining more than 22,500 concurrent CPU environments — the sandboxes where AI agents execute code, validate results and iterate. Nvidia describes the Vera CPU as the first processor purpose-built for agentic AI and reinforcement learning, featuring 88 custom-designed Olympus cores and LPDDR5X memory delivering 1.2 terabytes per second of bandwidth at half the power of conventional server CPUs.

The Groq 3 LPX rack, housing 256 inference processors with 128 gigabytes of on-chip SRAM, targets the low-latency demands of trillion-parameter models with million-token contexts. The BlueField-4 STX storage rack provides what Nvidia calls “context memory” — high-speed storage for the massive key-value caches that agentic systems generate as they reason across long, multi-step tasks. And the Spectrum-6 SPX Ethernet rack ties it all together with co-packaged optics delivering 5x greater optical power efficiency than traditional transceivers.

Why Nvidia is betting the future on autonomous AI agents — and rebuilding its stack around them

The strategic logic binding every announcement Monday into a single narrative is Nvidia’s conviction that the AI industry is crossing a threshold. The era of chatbots — AI that responds to a prompt and stops — is giving way to what Huang calls “agentic AI“: systems that reason autonomously for hours or days, write and execute software, call external tools, and continuously improve.

This isn’t just a branding exercise. It represents a genuine architectural shift in how computing infrastructure must be designed. A chatbot query might consume milliseconds of GPU time. An agentic system orchestrating a drug discovery pipeline or debugging a complex codebase might run continuously, consuming CPU cycles to execute code, GPU cycles to reason, and massive storage to maintain context across thousands of intermediate steps. That demands not just faster chips, but a fundamentally different balance of compute, memory, storage and networking.

Nvidia addressed this with the launch of its Agent Toolkit, which includes OpenShell, a new open-source runtime that enforces security and privacy guardrails for autonomous agents. The enterprise adoption list is remarkable: Adobe, Atlassian, Box, Cadence, Cisco, CrowdStrike, Dassault Systèmes, IQVIA, Red Hat, Salesforce, SAP, ServiceNow, Siemens and Synopsys are all integrating the toolkit into their platforms. Nvidia also launched NemoClaw, an open-source stack that lets users install its Nemotron models and OpenShell runtime in a single command to run secure, always-on AI assistants on everything from RTX laptops to DGX Station supercomputers.

The company separately announced Dynamo 1.0, open-source software it describes as the first “operating system” for AI inference at factory scale. Dynamo orchestrates GPU and memory resources across clusters and has already been adopted by AWS, Azure, Google Cloud, Oracle, Cursor, Perplexity, PayPal and Pinterest. Nvidia says it boosted Blackwell inference performance by up to 7x in recent benchmarks.

The Nemotron coalition and Nvidia’s play to shape the open-source AI landscape

If Vera Rubin represents Nvidia’s hardware ambition, the Nemotron Coalition represents its software ambition. Announced Monday, the coalition is a global collaboration of AI labs that will jointly develop open frontier models trained on Nvidia’s DGX Cloud. The inaugural members — Black Forest Labs, Cursor, LangChain, Mistral AI, Perplexity, Reflection AI, Sarvam and Thinking Machines Lab, the startup led by former OpenAI executive Mira Murati — will contribute data, evaluation frameworks and domain expertise.

The first model will be co-developed by Mistral AI and Nvidia and will underpin the upcoming Nemotron 4 family. “Open models are the lifeblood of innovation and the engine of global participation in the AI revolution,” Huang said.

Nvidia also expanded its own open model portfolio significantly. Nemotron 3 Ultra delivers what the company calls frontier-level intelligence with 5x throughput efficiency on Blackwell. Nemotron 3 Omni integrates audio, vision and language understanding. Nemotron 3 VoiceChat supports real-time, simultaneous conversations. And the company previewed GR00T N2, a next-generation robot foundation model that it says helps robots succeed at new tasks in new environments more than twice as often as leading alternatives, currently ranking first on the MolmoSpaces and RoboArena benchmarks.

The open-model push serves a dual purpose. It cultivates the developer ecosystem that drives demand for Nvidia hardware, and it positions Nvidia as a neutral platform provider rather than a competitor to the AI labs building on its chips — a delicate balancing act that grows more complex as Nvidia’s own models grow more capable.

From operating rooms to orbit: how Vera Rubin’s reach extends far beyond the data center

The vertical breadth of Monday’s announcements was almost disorienting. Roche revealed it is deploying more than 3,500 Blackwell GPUs across hybrid cloud and on-premises environments in the U.S. and Europe — the largest announced GPU footprint in the pharmaceutical industry. The company is using the infrastructure for biological foundation models, drug discovery and digital twins of manufacturing facilities, including its new GLP-1 facility in North Carolina. Nearly 90 percent of Genentech’s eligible small-molecule programs now integrate AI, Roche said, with one oncology molecule designed 25 percent faster and a backup candidate delivered in seven months instead of more than two years.

In autonomous vehicles, BYD, Geely, Isuzu and Nissan are building Level 4-ready vehicles on Nvidia’s Drive Hyperion platform. Nvidia and Uber expanded their partnership to launch autonomous vehicles across 28 cities on four continents by 2028, starting with Los Angeles and San Francisco in the first half of 2027. The company introduced Alpamayo 1.5, a reasoning model for autonomous driving already downloaded by more than 100,000 automotive developers, and Nvidia Halos OS, a safety architecture built on ASIL D-certified foundations for production-grade autonomy.

Nvidia also released the first domain-specific physical AI platform for healthcare robotics, anchored by Open-H — the world’s largest healthcare robotics dataset, with over 700 hours of surgical video. CMR Surgical, Johnson & Johnson MedTech and Medtronic are among the adopters.

And then there was space. The Vera Rubin Space Module delivers up to 25x more AI compute for orbital inferencing compared with the H100 GPU. Aetherflux, Axiom Space, Kepler Communications, Planet Labs and Starcloud are building on it. “Space computing, the final frontier, has arrived,” Huang said, deploying the kind of line that, from another executive, might draw eye-rolls — but from the CEO of a company whose chips already power the majority of the world’s AI workloads, lands differently.

The deskside supercomputer and Nvidia’s quiet push into enterprise hardware

Amid the spectacle of trillion-parameter models and orbital data centers, Nvidia made a quieter but potentially consequential move: it launched the DGX Station, a deskside system powered by the GB300 Grace Blackwell Ultra Desktop Superchip that delivers 748 gigabytes of coherent memory and up to 20 petaflops of AI compute performance. The system can run open models of up to one trillion parameters from a desk.

Snowflake, Microsoft Research, Cornell, EPRI and Sungkyunkwan University are among the early users. DGX Station supports air-gapped configurations for regulated industries, and applications built on it move seamlessly to Nvidia’s data center systems without rearchitecting — a design choice that creates a natural on-ramp from local experimentation to large-scale deployment.

Nvidia also updated DGX Spark, its more compact system, with support for clustering up to four units into a “desktop data center” with linear performance scaling. Both systems ship preconfigured with NemoClaw and the Nvidia AI software stack, and support models including Nemotron 3, Google Gemma 3, Qwen3, DeepSeek V3.2, Mistral Large 3 and others.

Adobe and Nvidia separately announced a strategic partnership to develop the next generation of Firefly models using Nvidia’s computing technology and libraries. Adobe will also build a cloud-native 3D digital twin solution for marketing on Nvidia Omniverse and integrate Nemotron capabilities into Adobe Acrobat. The partnership spans creative tools including Photoshop, Premiere Pro, Frame.io and Adobe Experience Platform.

Building the factories that build intelligence: Nvidia’s AI infrastructure blueprint

Perhaps the most telling indicator of where Nvidia sees the industry heading is the Vera Rubin DSX AI Factory reference design — essentially a blueprint for constructing entire buildings optimized to produce AI. The reference design outlines how to integrate compute, networking, storage, power and cooling into a system that maximizes what Nvidia calls “tokens per watt,” along with an Omniverse DSX Blueprint for creating digital twins of these facilities before they are built.

The software stack includes DSX Max-Q for dynamic power provisioning — which Nvidia says enables 30 percent more AI infrastructure within a fixed-power data center — and DSX Flex, which connects AI factories to power-grid services to unlock what the company estimates is 100 gigawatts of stranded grid capacity. Energy leaders Emerald AI, GE Vernova, Hitachi and Siemens Energy are using the architecture. Nscale and Caterpillar are building one of the world’s largest AI factories in West Virginia using the Vera Rubin reference design.

Industry partners Cadence, Dassault Systèmes, Eaton, Jacobs, Schneider Electric, Siemens, PTC, Switch, Trane Technologies and Vertiv are contributing simulation-ready assets and integrating their platforms. CoreWeave is using Nvidia’s DSX Air to run operational rehearsals of AI factories in the cloud before physical delivery.

“In the age of AI, intelligence tokens are the new currency, and AI factories are the infrastructure that generates them,” Huang said. It is the kind of formulation — tokens as currency, factories as mints — that reveals how Nvidia thinks about its place in the emerging economic order.

What Nvidia’s grand vision gets right — and what remains unproven

The scale and coherence of Monday’s announcements are genuinely impressive. No other company in the semiconductor industry — and arguably no other technology company, period — can present an integrated stack spanning custom silicon, systems architecture, networking, storage, inference software, open models, agent frameworks, safety runtimes, simulation platforms, digital twin infrastructure and vertical applications from drug discovery to autonomous driving to orbital computing.

But scale and coherence are not the same as inevitability. The performance claims for Vera Rubin, while dramatic, remain largely unverified by independent benchmarks. The agentic AI thesis that underpins the entire platform — the idea that autonomous, long-running AI agents will become the dominant computing workload — is a bet on a future that has not yet fully materialized. And Nvidia’s expanding role as a provider of models, software, and reference architectures raises questions about how long its hardware customers will remain comfortable depending so heavily on a single supplier for so many layers of their stack.

Competitors are not standing still. AMD continues to close the gap on data center GPU performance. Google’s TPUs power some of the world’s largest AI training runs. Amazon’s Trainium chips are gaining traction inside AWS. And a growing cohort of startups is attacking various pieces of the AI infrastructure puzzle.

Yet none of them showed up at GTC on Monday with endorsements from the CEOs of Anthropic and OpenAI. None of them announced seven new chips in full production simultaneously. And none of them presented a vision this comprehensive for what comes next.

There is a scene that repeats at every GTC: Huang, in his trademark leather jacket, holds up a chip the way a jeweler holds up a diamond, rotating it slowly under the stage lights. It is part showmanship, part sermon. But the congregation keeps growing, the chips keep getting faster, and the checks keep getting larger. Whether Nvidia is building the greatest infrastructure in history or simply the most profitable one may, in the end, be a distinction without a difference.

Infrastructure

Rethinking AEO when software agents navigate the web on behalf of users

For more than two decades, digital businesses have relied on a simple assumption: When someone interacts with a website, that activity reflects a human making a conscious choice. Clicks are treated as signals of interest. Time on page is assumed to ind…

DataDecisionMakers, Infrastructure, Orchestration, technology