Salesforce launches Headless 360 to turn its entire platform into infrastructure for AI agents

Salesforce on Wednesday unveiled the most ambitious architectural transformation in its 27-year history, introducing “Headless 360” — a sweeping initiative that exposes every capability in its platform as an API, MCP tool, or CLI command so AI agents can operate the entire system without ever opening a browser.

The announcement, made at the company’s annual TDX developer conference in San Francisco, includes more than 100 new tools and skills that are immediately available to developers. It marks a decisive response to the existential question hanging over enterprise software: In a world where AI agents can reason, plan, and execute, does a company still need a CRM with a graphical interface?

Salesforce’s answer: No — and that’s exactly the point.

“We made a decision two and a half years ago: Rebuild Salesforce for agents,” the company said in its announcement. “Instead of burying capabilities behind a UI, expose them so the entire platform will be programmable and accessible from anywhere.”

The timing is anything but coincidental. Salesforce finds itself navigating one of the most turbulent periods in enterprise software history — a sector-wide sell-off that has pushed the iShares Expanded Tech-Software Sector ETF down roughly 28% from its September peak. The fear driving the decline: that AI, particularly large language models from Anthropic, OpenAI, and others, could render traditional SaaS business models obsolete.

Jayesh Govindarjan, EVP of Salesforce and one of the key architects behind the Headless 360 initiative, described the announcement as rooted not in marketing theory but in hard-won lessons from deploying agents with thousands of enterprise customers.

“The problem that emerged is the lifecycle of building an agentic system for every one of our customers on any stack, whether it’s ours or somebody else’s,” Govindarjan told VentureBeat in an exclusive interview. “The challenge that they face is very much the software development challenge. How do I build an agent? That’s only step one.”

More than 100 new tools give coding agents full access to the Salesforce platform for the first time

Salesforce Headless 360 rests on three pillars that collectively represent the company’s attempt to redefine what an enterprise platform looks like in the agentic era.

The first pillar — build any way you want — delivers more than 60 new MCP (Model Context Protocol) tools and 30-plus preconfigured coding skills that give external coding agents like Claude Code, Cursor, Codex, and Windsurf complete, live access to a customer’s entire Salesforce org, including data, workflows, and business logic. Developers no longer need to work inside Salesforce’s own IDE. They can direct AI coding agents from any terminal to build, deploy, and manage Salesforce applications.

Agentforce Vibes 2.0, the company’s own native development environment, now includes what it calls an “open agent harness” supporting both the Anthropic agent SDK and the OpenAI agents SDK. As demonstrated during the keynote, developers can choose between Claude Code and OpenAI agents depending on the task, with the harness dynamically adjusting available capabilities based on the selected agent. The environment also adds multi-model support, including Claude Sonnet and GPT-5, along with full org awareness from the start.

A significant technical addition is native React support on the Salesforce platform. During the keynote demo, presenters built a fully functional partner service application using React — not Salesforce’s own Lightning framework — that connected to org metadata via GraphQL while inheriting all platform security primitives. This opens up dramatically more expressive front-end possibilities for developers who want complete control over the visual layer.

The second pillar — deploy on any surface — centers on the new Agentforce Experience Layer, which separates what an agent does from how it appears, rendering rich interactive components natively across Slack, mobile apps, Microsoft Teams, ChatGPT, Claude, Gemini, and any client supporting MCP apps. During the keynote, presenters defined an experience once and deployed it across six different surfaces without writing surface-specific code. The philosophical shift is significant: rather than pulling customers into a Salesforce UI, enterprises push branded, interactive agent experiences into whatever workspace their customers already inhabit.

The third pillar — build agents you can trust at scale — introduces an entirely new suite of lifecycle management tools spanning testing, evaluation, experimentation, observation, and orchestration. Agent Script, the company’s new domain-specific language for defining agent behavior deterministically, is now generally available and open-sourced. A new Testing Center surfaces logic gaps and policy violations before deployment. Custom Scoring Evals let enterprises define what “good” looks like for their specific use case. And a new A/B Testing API enables running multiple agent versions against real traffic simultaneously.
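The A/B Testing API itself isn't documented in the announcement, but the underlying idea — splitting live traffic deterministically between agent versions and scoring the outcomes — can be sketched. All names below are hypothetical, not the actual Agentforce API:

```python
import hashlib

def assign_variant(request_id: str, split: float = 0.5) -> str:
    """Deterministically bucket a request into an agent variant.

    Hashing the request ID (rather than picking at random) keeps a
    given case pinned to one variant for the life of the experiment.
    """
    digest = hashlib.sha256(request_id.encode()).hexdigest()
    bucket = int(digest, 16) % 10_000 / 10_000  # uniform in [0, 1)
    return "agent_v2" if bucket < split else "agent_v1"

def score(variant: str, resolved: bool) -> dict:
    """A custom scoring eval reduced to its simplest form: did the
    agent resolve the case? Real evals would also weigh policy
    compliance, tone, latency, and cost."""
    return {"variant": variant, "resolved": resolved}

# Route a batch of live traffic: each case lands in a stable bucket.
requests = [f"case-{i}" for i in range(1000)]
assignments = [assign_variant(r) for r in requests]
```

The design choice worth noting is the hash-based split: unlike random assignment, it needs no session store to keep a returning customer on the same agent version.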

Why enterprise customers kept breaking their own AI agents — and how Salesforce redesigned its tooling in response

Perhaps the most technically significant — and candid — portion of VentureBeat’s interview with Govindarjan addressed the fundamental engineering tension at the heart of enterprise AI: agents are probabilistic systems, but enterprises demand deterministic outcomes.

Govindarjan explained that early Agentforce customers, after getting agents into production through “sheer hard work,” discovered a painful reality. “They were afraid to make changes to these agents, because the whole system was brittle,” he said. “You make one change and you don’t know whether it’s going to work 100% of the time. All the testing you did needs to be redone.”

This brittleness problem drove the creation of Agent Script, which Govindarjan described as a programming language that “brings together the determinism that’s in programming languages with the inherent flexibility in probabilistic systems that LLMs provide.” The language functions as a single flat file — versionable, auditable — that defines a state machine governing how an agent behaves. Within that machine, enterprises specify which steps must follow explicit business logic and which can reason freely using LLM capabilities.
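The shape of such a system can be sketched in plain Python. This is a toy in the spirit of Agent Script, not its real syntax: each step is either deterministic business logic or a free-form reasoning step, stubbed here with a canned function where a real system would call an LLM.

```python
def check_entitlement(ctx):
    # Deterministic step: a hard business rule, no model involved.
    ctx["eligible"] = ctx["plan"] == "premium"
    return "draft_reply" if ctx["eligible"] else "escalate"

def draft_reply(ctx):
    # Probabilistic step in a real system: an LLM would draft this.
    ctx["reply"] = f"Hi {ctx['name']}, here is your refund."
    return "done"

def escalate(ctx):
    ctx["reply"] = "Routing you to a human agent."
    return "done"

STATES = {
    "check_entitlement": check_entitlement,
    "draft_reply": draft_reply,
    "escalate": escalate,
}

def run(ctx, start="check_entitlement"):
    """Walk the state machine until the terminal 'done' state.
    Because the transition table is plain data, it can be versioned
    and audited like the single flat file the article describes."""
    state = start
    while state != "done":
        state = STATES[state](ctx)
    return ctx

result = run({"plan": "premium", "name": "Ava"})
```

The point of the structure is that changing one state's logic leaves every other transition untouched, which is exactly the brittleness problem the customers above were describing.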

Salesforce open-sourced Agent Script this week, and Govindarjan noted that Claude Code can already generate it natively because of its clean documentation. The approach stands in sharp contrast to the “vibe coding” movement gaining traction elsewhere in the industry. As the Wall Street Journal recently reported, some companies are now attempting to vibe-code entire CRM replacements — a trend Salesforce’s Headless 360 directly addresses by making its own platform the most agent-friendly substrate available.

Govindarjan described the tooling as a product of Salesforce’s own internal practice. “We needed these tools to make our customers successful. Then our FDEs needed them. We hardened them, and then we gave them to our customers,” he told VentureBeat. In other words, Salesforce productized its own pain.

Inside the two competing AI agent architectures Salesforce says every enterprise will need

Govindarjan drew a revealing distinction between two fundamentally different agentic architectures emerging in the enterprise — one for customer-facing interactions and one he linked to what he called the “Ralph Wiggum loop.”

Customer-facing agents — those deployed to interact with end customers for sales or service — demand tight deterministic control. “Before customers are willing to put these agents in front of their customers, they want to make sure that it follows a certain paradigm — a certain brand set of rules,” Govindarjan told VentureBeat. Agent Script encodes these as a static graph — a defined funnel of steps with LLM reasoning embedded within each step.

The “Ralph Wiggum loop,” by contrast, represents the opposite end of the spectrum: a dynamic graph that unrolls at runtime, where the agent autonomously decides its next step based on what it learned in the previous step, killing dead-end paths and spawning new ones until the task is complete. This architecture, Govindarjan said, manifests primarily in employee-facing scenarios — developers using coding agents, salespeople running deep research loops, marketers generating campaign materials — where an expert human reviews the output before it ships.

“Ralph Wiggum loops are great for employee-facing because employees are, in essence, experts at something,” Govindarjan explained. “Developers are experts at development, salespeople are experts at sales.”

The critical technical insight: both architectures run on the same underlying platform and the same graph engine. “This is a dynamic graph. This is a static graph,” he said. “It’s all a graph underneath.” That unified runtime — spanning the spectrum from tightly controlled customer interactions to free-form autonomous loops — may be Salesforce’s most important technical bet, sparing enterprises from maintaining separate platforms for different agent modalities.
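That "it's all a graph underneath" claim can be made concrete with a minimal sketch (illustrative only): one engine that follows edges, driving both a static funnel and a dynamic loop whose next node is computed at runtime.

```python
# One runtime for both architectures: a node returns the name of the
# next node (or None to stop), and the engine just follows edges.

def run_graph(nodes, start, ctx):
    node = start
    while node is not None:
        node = nodes[node](ctx)
    return ctx

# Static funnel (customer-facing): a fixed order of steps.
def greet(ctx):
    ctx["log"].append("greet")
    return "qualify"

def qualify(ctx):
    ctx["log"].append("qualify")
    return "close"

def close(ctx):
    ctx["log"].append("close")
    return None

static_nodes = {"greet": greet, "qualify": qualify, "close": close}

# Dynamic loop (employee-facing): the next step depends on what the
# previous step learned, so the graph unrolls at runtime.
def research(ctx):
    ctx["findings"] += 1
    # Keep digging until the agent judges the task complete.
    return "research" if ctx["findings"] < 3 else "report"

def report(ctx):
    ctx["log"].append(f"report({ctx['findings']} findings)")
    return None

dynamic_nodes = {"research": research, "report": report}

static_result = run_graph(static_nodes, "greet", {"log": []})
dynamic_result = run_graph(dynamic_nodes, "research",
                           {"findings": 0, "log": []})
```

The only difference between the two is whether the edges are fixed ahead of time or chosen per step; the engine is identical, which is the economy of the unified-runtime bet.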

Salesforce hedges its bets on MCP while opening its ecosystem to every major AI model and tool

Salesforce’s embrace of openness at TDX was striking. The platform now integrates with OpenAI, Anthropic, Google Gemini, Meta’s LLaMA, and Mistral AI models. The open agent harness supports third-party agent SDKs. MCP tools work from any coding environment. And the new AgentExchange marketplace unifies 10,000 Salesforce apps, 2,600-plus Slack apps, and 1,000-plus Agentforce agents, tools, and MCP servers from partners including Google, Docusign, and Notion, backed by a new $50 million AgentExchange Builders Initiative.

Yet Govindarjan offered a surprisingly candid assessment of MCP itself — the protocol Anthropic created that has become a de facto standard for agent-tool communication.

“To be very honest, not at all sure” that MCP will remain the standard, he told VentureBeat. “When MCP first came along as a protocol, a lot of us engineers felt that it was a wrapper on top of a really well-written CLI — which now it is. A lot of people are saying that maybe CLI is just as good, if not better.”

His approach: pragmatic flexibility. “We’re not wedded to one or the other. We just use the best, and often we will offer all three. We offer an API, we offer a CLI, we offer an MCP.” This hedging explains the “Headless 360” naming itself — rather than betting on a single protocol, Salesforce exposes every capability across all three access patterns, insulating itself against protocol shifts.
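The three access patterns can be sketched around a single shared handler. Everything here is illustrative, not the actual Salesforce surface area: a plain API function, the same handler behind a CLI, and an MCP-style tool registry an agent could discover and invoke.

```python
import argparse

def create_case(subject: str, priority: str = "medium") -> dict:
    """The single underlying capability (hypothetical)."""
    return {"id": "case-001", "subject": subject, "priority": priority}

# 1. API: call the function directly (or behind an HTTP route).
api_result = create_case("Billing question")

# 2. CLI: the same handler behind argparse.
def cli(argv):
    parser = argparse.ArgumentParser(prog="sf")
    parser.add_argument("subject")
    parser.add_argument("--priority", default="medium")
    args = parser.parse_args(argv)
    return create_case(args.subject, args.priority)

cli_result = cli(["Login failure", "--priority", "high"])

# 3. MCP-style: a tool registry with descriptions an agent can read.
TOOLS = {
    "create_case": {
        "handler": create_case,
        "description": "Create a support case",
    }
}

def invoke_tool(name, **kwargs):
    return TOOLS[name]["handler"](**kwargs)

mcp_result = invoke_tool("create_case", subject="Refund request")
```

Because all three paths converge on one handler, a protocol shift means swapping a thin wrapper, not reimplementing the capability — which is the insulation the naming strategy is after.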

Engine, the B2B travel management company featured prominently in the keynote demos, offered a real-world proof point for the open ecosystem approach. The company built its customer service agent, Ava, in 12 days using Agentforce and now handles 50% of customer cases autonomously. Engine runs five agents across customer-facing and employee-facing functions, with Data 360 at the heart of its infrastructure and Slack as its primary workspace. “CSAT goes up, costs to deliver go down. Customers are happier. We’re getting them answers faster. What’s the trade off? There’s no trade off,” an Engine executive said during the keynote.

Underpinning all of it is a shift in how Salesforce gets paid. The company is moving from per-seat licensing to consumption-based pricing for Agentforce — a transition Govindarjan described as “a business model change and innovation for us.” It’s a tacit acknowledgment that when agents, not humans, are doing the work, charging per user no longer makes sense.

Salesforce isn’t defending the old model — it’s dismantling it and betting the company on what comes next

Govindarjan framed the company’s evolution in architectural terms. Salesforce has organized its platform around four layers: a system of context (Data 360), a system of work (Customer 360 apps), a system of agency (Agentforce), and a system of engagement (Slack and other surfaces). Headless 360 opens every layer via programmable endpoints.

“What you saw today, what we’re doing now, is we’re opening up every single layer, right, with MCP tools, so we can go build the agentic experiences that are needed,” Govindarjan told VentureBeat. “I think you’re seeing a company transforming itself.”

Whether that transformation succeeds will depend on execution across thousands of customer deployments, the staying power of MCP and related protocols, and the fundamental question of whether incumbent enterprise platforms can move fast enough to remain relevant when AI agents can increasingly build new systems from scratch. The software sector’s bear market, the financial pressures bearing down on the entire industry, and the breathtaking pace of LLM improvement all conspire to make this one of the highest-stakes bets in enterprise technology.

But there is an irony embedded in Salesforce’s predicament that Headless 360 makes explicit. The very AI capabilities that threaten to displace traditional software are the same capabilities that Salesforce now harnesses to rebuild itself. Every coding agent that could theoretically replace a CRM is now, through Headless 360, a coding agent that builds on top of one. The company is not arguing that agents won’t change the game. It’s arguing that decades of accumulated enterprise data, workflows, trust layers, and institutional logic give it something no coding agent can generate from a blank prompt.

As Benioff declared on CNBC’s Mad Money in March: “The software industry is still alive, well and growing.” Headless 360 is his company’s most forceful attempt to prove him right — by tearing down the walls of the very platform that made Salesforce famous and inviting every agent in the world to walk through the front door.

Parker Harris, Salesforce’s co-founder, captured the bet most succinctly in a question he posed last month: “Why should you ever log into Salesforce again?”

If Headless 360 works as designed, the answer is: You shouldn’t have to. And that, Salesforce is wagering, is precisely what will keep you paying for it.

Are we getting what we paid for? How to turn AI momentum into measurable value

Enterprise AI is entering a new phase — one where the central question is no longer what can be built, but how to make the most of our AI investment.

At VentureBeat’s latest AI Impact Tour session, Brian Gracely, director of portfolio strategy at Red Hat, described the operational reality inside large organizations: AI sprawl, rising inference costs, and limited visibility into what those investments are actually returning.

It’s the “Day 2” moment — when pilots give way to production, and cost, governance, and sustainability become harder than building the system in the first place.

“We’ve seen customers who say, ‘I have 50,000 licenses of Copilot. I don’t really know what people are getting out of that. But I do know that I’m paying for the most expensive computing in the world, because it’s GPUs,'” Gracely said. “‘How am I going to get that under control?'”

Why enterprise AI costs are now a board-level problem

For much of the past two years, cost was not the primary concern for organizations evaluating generative AI. The experimental phase gave teams cover to spend freely, and the promise of productivity gains justified aggressive investment, but that dynamic is shifting as enterprises enter their second and third budget cycles with AI. The focus has moved from “can we build something?” to “are we getting what we paid for?”

Enterprises that made large, early bets on managed AI services are conducting hard reviews of whether those investments are delivering measurable value. The issue isn’t just that GPU computing is expensive. It is that many organizations lack the instrumentation to connect spending to outcomes, making it nearly impossible to justify renewals or scale responsibly.

The strategic shift from token consumer to token producer

The dominant AI procurement model of the past few years has been straightforward: pay a vendor per token, per seat, or per API call, and let someone else manage the infrastructure. That model made sense as a starting point but is increasingly being questioned by organizations with enough experience to compare alternatives.

Enterprises that have been through one AI cycle are starting to rethink that model.

“Instead of being purely a token consumer, how can I start being a token generator?” Gracely said. “Are there use cases and workloads that make sense for me to own more? It may mean operating GPUs. It may mean renting GPUs. And then asking, ‘Does that workload need the greatest state-of-the-art model? Are there more capable open models or smaller models that fit?'”

The decision is not binary. The right answer depends on the workload, the organization, and the risk tolerance involved, but the math is getting more complicated as the number of capable open models, from DeepSeek to models now available through cloud marketplaces, grows. Now enterprises actually have real alternatives to the handful of providers that dominated the landscape two years ago.

Falling AI costs and rising usage create a paradox for enterprise budgets

Some enterprise leaders argue that locking into infrastructure investments now could mean significantly overpaying in the long run, pointing to the statement from Anthropic CEO Dario Amodei that AI inference costs are declining roughly 60% per year.

The emergence of open-source models such as DeepSeek over the last three years has meaningfully expanded the strategic options available to enterprises willing to invest in the underlying infrastructure.

But while costs per token are falling, usage is accelerating at a pace that more than offsets efficiency gains. It’s a version of Jevons Paradox, the economic principle that improvements in resource efficiency tend to increase total consumption rather than reduce it, as lower cost enables broader adoption.

For enterprise budget planners, this means declining unit costs do not translate into declining total bills. An organization that triples its AI usage while costs fall by half still ends up spending more than it did before. The consideration becomes which workloads genuinely require the most capable and most expensive models, and which can be handled just fine by smaller, cheaper alternatives.
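The budget math in that example works out plainly (illustrative numbers):

```python
# Unit costs halve while usage triples: total spend still rises 50%.
unit_cost_year1 = 1.00          # $ per million tokens (illustrative)
usage_year1 = 100               # million tokens

unit_cost_year2 = unit_cost_year1 * 0.5   # costs fall by half
unit_usage_year2 = usage_year1 * 3        # usage triples

spend_year1 = unit_cost_year1 * usage_year1        # 100.0
spend_year2 = unit_cost_year2 * unit_usage_year2   # 150.0
growth = spend_year2 / spend_year1                 # 1.5x
```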

The business case for investing in AI infrastructure flexibility

The prescription isn’t to slow down AI investment, but to build with flexibility being top of mind. The organizations that will win aren’t necessarily the ones that move fastest or spend the most; they’re the ones building infrastructure and operating models capable of absorbing the next unexpected development.

“The more you can build some abstractions and give yourself some flexibility, the more you can experiment without running up costs, but also without jeopardizing your business. Those are as important as asking whether you’re doing everything best practice right now,” Gracely explained.

But despite how entrenched AI discussions have become in enterprise planning cycles, the practical experience most organizations have is still measured in years, not decades.

“It feels like we’ve been doing this forever. We’ve been doing this for three years,” Gracely added. “It’s early and it’s moving really fast. You don’t know what’s coming next. But the characteristics of what’s coming next — you should have some sense of what that looks like.”

For enterprise leaders still calibrating their AI investment strategies, that may be the most actionable takeaway: the goal is not to optimize for today’s cost structure, but to build the organizational and technical flexibility to adapt when, not if, it changes again.

AI lowered the cost of building software. Enterprise governance hasn’t caught up

Presented by Retool


The logic used to be: buying software is cheaper, faster, and safer for most use cases. Building was reserved for companies with large engineering teams, deep pockets, and problems so specific that no vendor could address them. But now, the cost to code a piece of software has dropped to zero.

Anyone can build their own software now, but enterprise procurement and governance models have yet to catch up. Retool’s 2026 Build vs. Buy Shift Report, based on a survey of 817 builders, traces exactly how this shift is playing out.

The cost curve changed; SaaS pricing didn’t

Two years ago, a custom internal tool might have taken an engineering team weeks or months and cost six figures. Today, an operations lead with the right platform can have a working prototype in a day or two. This structural shift is driven by AI-assisted development and the maturation of enterprise app-building platforms.

Meanwhile, SaaS pricing hasn’t adjusted, still charging per-seat for generic software that requires customization and integration costs on top. When the cost of building drops by an order of magnitude but the cost of buying stays flat, the math changes for every company, not just the ones with large engineering teams.

The data reflects this. Retool’s report found that 35% of teams have already replaced at least one SaaS tool with a custom build, and 78% plan to build more custom tooling in 2026.

Workflow automations and admin tools are among SaaS tools at risk

The shift isn’t happening uniformly. The top SaaS tools respondents have replaced or considered replacing include workflow automations (35%) and internal admin tools (33%), followed by BI tools (29%) and CRMs (25%).

A purchased workflow automation tool has to serve thousands of customers, so it optimizes for the average case — and the average case is nobody’s actual case. Every company’s internal workflows are different. They reflect org structure, compliance requirements, data systems, and business logic unique to that organization.

Internal admin tools carry the same problem: they’re inherently company-specific. These categories were always the most awkward fit for off-the-shelf software, and there’s now an affordable, accessible alternative (MIT’s State of AI in Business reported $2-10 million in savings annually for customer service and document processing tasks).

The replacement pattern tends to be additive rather than wholesale (nobody is just ripping out Salesforce). Teams are replacing the specific pieces that never quite fit: an approval flow that required three workarounds, the dashboard that couldn’t connect to their actual data … but those narrow replacements add up. Once a team builds one tool that works better than what they bought, the default question shifts from “What should we buy?” to “Can we build this?”

Builders go around IT, signaling broader procurement challenges

The clearest evidence that procurement processes haven’t kept up with building capability is the scale of shadow IT now occurring inside enterprises. Retool’s report found that 60% of builders have created tools, workflows, or automations outside of IT oversight in the past year — and 25% report doing so frequently.

Even experienced, high-judgment people choose speed over process. Two-thirds of total survey respondents (64%) are senior managers and above. Existing procurement cycles weren’t designed for a world where building software takes days rather than months. When people quote the 95% generative AI pilot failure rate, they’re not accounting for the robust grassroots adoption happening under executives’ noses.

Shadow IT at this scale is a demand signal. The people closest to the problems are telling organizations that the existing process can’t keep up — 31% of those going around IT do so simply because they can build faster than IT can provision tools. So, suppression isn’t a productive response. The challenge is that the tools being built in the shadows are also the ones most likely to stall before they become useful.

A vibe-coded prototype running on sample data is impressive. A production tool connected to your actual Salesforce instance, with role-based access and a security review, is useful. The report found that 51% of builders have shipped production software currently in use by their teams, and among those, about half report saving six or more hours per week.

When building happens in an ungoverned environment, organizations get neither outcome reliably. Someone connects an AI-powered tool to production data with no audit trail, no access controls, and no owner. Multiply that by dozens of builders across an organization, and you have an expanding security surface that IT doesn’t even know exists.[1]

The teams whose homebuilt solutions reach production tend to have three things the others don’t: connectivity to real data sources, a security and permissions model they trust, and a review process for what gets deployed. Channeling builder energy into governed environments, where speed and security aren’t in conflict, is how organizations avoid shadow IT becoming a liability.

Governance will define the next era of SaaS

The build vs. buy shift is already underway. The more important question now is who controls the environment where that building happens.

Ungoverned building invites security risks and makes the ROI case difficult to close. You can’t measure time saved by tools IT doesn’t know exist, or that run only in one individual’s workflow. You can’t enforce access controls on a prototype that someone connected to production data last Tuesday. And those aren’t hypothetical risks: in Deloitte’s 2026 State of AI in the Enterprise survey of 3,200+ leaders, data privacy and security ranked as the top AI concern at 73%, with governance capabilities close behind at 46%. The 35% of organizations with no AI productivity metrics are missing more than just a dashboard. They’re missing the accountability infrastructure that justifies building over buying in the first place.

The organizations that treat governed environments as a prerequisite for building at scale will be the ones that can actually prove it’s working. The ones that don’t will find out when something breaks.

For a closer look at the data, including how enterprises are approaching AI-assisted building, read the full 2026 Build vs. Buy Shift Report.

[1] The cost of which can be steep: IBM’s 2025 Cost of Data Breach Report found that AI-associated cases cost organizations more than $650,000 per breach.


David Hsu is CEO at Retool.


Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

AI-RAN is redefining enterprise edge intelligence and autonomy

Presented by Booz Allen


AI-RAN, or artificial intelligence radio access networks, is a reimagining of what wireless infrastructure can do. Rather than treating the network as a passive conduit for data, AI-RAN turns it into an active computational layer. It’s a sensor, a compute fabric, and a control plane for physical operations, all rolled into one. That shift has huge implications for industries from manufacturing and logistics to healthcare and smart infrastructure.

VentureBeat spoke with two leaders at the center of this transformation: Chris Christou, senior vice president at Booz Allen, and Shervin Gerami, managing director at Cerberus Operations Supply Chain Fund.

“AI-RAN can bring the promise of extending 5G and eventually 6G networks into the enterprise,” Christou said. “Proving that a platform can host inference at the edge to enable new types of AI — in particular, physical AI and autonomy-type use cases for things like smart manufacturing and smart warehousing — can make operations more efficient and effective.”

“AI-RAN lets enterprises move from digitizing processes to autonomously operating them,” Gerami added. “The enterprise investment should not look at AI-RAN as a networking upgrade. It’s an operating system for physical industries.”

The difference between AI for RAN, AI on RAN, and AI and RAN

The difference between AI for RAN, AI on RAN, and AI and RAN is critical. AI for RAN applies AI to the network itself, optimizing radio performance and energy use. AI on RAN runs enterprise AI workloads on edge compute infrastructure integrated with the RAN, enabling real-time applications like computer vision, robotics, and localized LLM inference.

AI and RAN represents the deeper convergence — where networks are designed to be AI-native, with AI workloads and radio infrastructure architected together as a coordinated, distributed system. At this stage, RAN evolves from a transport layer into a foundational layer of the AI economy.

“This is the transformational part,” Gerami said. “It’s jointly designing applications with networks. Now the application knows the network state, and the network understands the application’s intent. AI for RAN saves money. AI on RAN adds capability. Then AI and RAN together create entirely new business models.”

It’s this layered framework that makes AI-RAN more than an incremental evolution of existing wireless technology, and instead a platform shift that opens the network to the kind of developer ecosystem and application innovation that has historically been the domain of cloud computing.

How ISAC turns the network into a sensor

Integrated sensing and communications (ISAC) sits at the center of the AI-RAN infrastructure. The network becomes the sensor: a converged infrastructure that communicates and senses its environment simultaneously while hosting algorithms and applications at the edge. It will enable drone detection, pedestrian safety, and automotive sensing, and eventually even more innovative use cases.

The enterprise value proposition of ISAC and a network as the sensor is clear, Gerami says. Today, organizations rely on multiple discrete systems to achieve situational awareness: cameras, radar, asset trackers, motion sensors and more. Each comes with its own maintenance burden, integration overhead and vendor relationship. ISAC has the potential to handle many of those capabilities natively within the network.

“With ISAC you can do asset tracking at sub-meter precision inside factories and hospitals,” he explained. “You can detect movement patterns, perimeter breaches, and anomalies. Smart buildings can have occupancy-aware HVAC and energy optimization.”

How AI-RAN shaves milliseconds off edge AI and inference

With AI-RAN, edge AI and low-latency inference become supercharged in use cases like real-time robotics management, instant quality inspection, and predictive maintenance. These are the applications where the latency gap between cloud and edge is the difference between a system that works and one that doesn’t.

“Where edge AI kicks in is driving operations in milliseconds, not seconds, which is what cloud does,” Gerami explained.

Split inference can also change the game, Christou says.

“You have a lot of different use cases where the processing is done on the device, making that device more expensive and more power-hungry,” he said. “Now there’s the possibility of offloading that to a local AI-RAN stack, even getting into concepts like split inference, so you do some of the inference on the device, some on the edge AI-RAN stack, and some in the cloud, all appropriate to the use cases and the time scale of the processing required.”
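The partitioning logic behind split inference can be sketched as a toy planner that assigns each stage of a pipeline to device, edge (the AI-RAN stack), or cloud based on its latency budget. The tier latencies are illustrative round numbers, not measurements:

```python
# Typical round-trip latency for each tier (illustrative).
TIER_LATENCY_MS = {"device": 5, "edge": 20, "cloud": 200}

def place(stage_budget_ms: float) -> str:
    """Pick the most remote tier that still meets the stage's latency
    budget, since remote compute is cheaper and less power-hungry for
    the device; fall back to on-device processing otherwise."""
    for tier in ("cloud", "edge", "device"):
        if TIER_LATENCY_MS[tier] <= stage_budget_ms:
            return tier
    return "device"

# A hypothetical vision pipeline with per-stage latency budgets (ms).
pipeline = {
    "feature_extraction": 10,    # must run next to the sensor
    "object_detection": 50,      # the edge stack is fast enough
    "trend_analytics": 1000,     # batch work can go to the cloud
}
plan = {stage: place(budget) for stage, budget in pipeline.items()}
```

The split mirrors Christou's description: tight loops stay on the device, mid-latency work lands on the edge AI-RAN stack, and slow analytical stages go to the cloud.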

Why the timing of AI-RAN investment is critical now

AI-RAN investment has a narrow and strategically critical window, both Gerami and Christou said.

“5G infrastructure is already being deployed, almost getting to a point of completion. 6G standards are not yet locked in,” Gerami explained. “This is an architectural moment for AI-RAN to come in. It allows the ability to not make RAN become a telco-centric design only. It allows the enterprise to become the co-creator of the application, the revenue and value generator of that network infrastructure.”

Historically, enterprise IT has consumed wireless standards rather than shaped them. AI-RAN’s open architecture, built on software-defined, cloud-native, containerized components, changes that standardization dynamic.

“Previously in the wireless industry it was a very long cycle. Now we’re seeing a push to get it implemented, get it out there, get early pilots, and then we’ll see how the technology works,” Christou said. “Simultaneously, in parallel, you can start defining the standards. You have real-life implementation experience to help influence how those standards take shape.”

And the entry point is accessible, Gerami added.

“The barrier to entry is very low,” he said. “Right now, it’s all code-based, all software. It’s no different than downloading software. You get yourself an Nvidia box and you can deploy it with a radio.”

Why AI-RAN is the future of innovative AI use cases

“We see AI-RAN as being an open architecture that’s truly driving innovation,” Gerami said. “It’s a flywheel for innovation. We want to create everything to be microservices, open native, cloud native, to allow partners to build vertical AI apps. The future is about owning that physical inference.”

“There’s so much focus right now in the industry around how we can adopt AI effectively — how it will enable autonomy and robotics,” Christou said. “This is one of those foundational pieces that can help realize some of those use cases.”


Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

Claude, OpenClaw and the new reality: AI agents are here — and so is the chaos

The age of agentic AI is upon us — whether we like it or not. What started with an innocent question-answer banter with ChatGPT back in 2022 has become an existential debate on job security and the rise of the machines.

More recently, fears of reaching artificial general intelligence (AGI) have become more concrete with the advent of powerful autonomous agents like Claude Cowork and OpenClaw. Having played with these tools for some time, I offer a comparison here.

First, we have OpenClaw (formerly known as Moltbot and Clawdbot). Surpassing 150,000 GitHub stars in days, OpenClaw is already being deployed on local machines with deep system access. This is like a robot “maid” (Irona, for Richie Rich fans) to which you hand the keys to your house. It’s supposed to clean the house, and you give it the necessary autonomy to take actions and manage your belongings (files and data) as it pleases. The whole purpose is to perform the task at hand — inbox triaging, auto-replies, content curation, travel planning, and more.

Next we have Google’s Antigravity, a coding agent with an IDE that accelerates the path from prompt to production. You can interactively create complete application projects and modify specific details through individual prompts. This is like having a junior developer who can not only code, but build, test, integrate, and fix issues. In the real world, this is like hiring an electrician: They are really good at a specific job, and you only need to give them access to a specific item (your electrical junction box).

Finally, we have the mighty Claude. The release of Anthropic’s Cowork, which featured AI agents for automating legal tasks like contract review and NDA triage, caused a sharp sell-off in legal-tech and software-as-a-service (SaaS) stocks (referred to as the SaaSpocalypse). Claude has long been the go-to chatbot; now with Cowork, it has domain knowledge for specific industries like legal and finance. This is like hiring an accountant: They know the domain inside-out and can complete taxes and manage invoices. Users provide specific access to highly sensitive financial details.

Making these tools work for you

The key to making these tools more impactful is giving them more power, but that increases the risk of misuse. Users must trust providers like Anthropic and Google to ensure that agent prompts will not cause harm, leak data, or provide an unfair (illegal) advantage to certain vendors. OpenClaw is open-source, which complicates things, as there is no central governing authority.

While these technological advancements are amazing and meant for the greater good, all it takes is one or two adverse events to cause panic. Imagine the agentic electrician frying all your house circuits by connecting the wrong wire. In an agent scenario, this could be injecting incorrect code, bringing down a bigger system, or adding hidden flaws that may not be immediately evident. Cowork could miss major savings opportunities when doing a user’s taxes; on the flip side, it could include illegal write-offs. Claude can do unimaginable damage when it has more control and authority.

But in the middle of this chaos, there is an opportunity to really take advantage. With the right guardrails in place, agents can focus on specific actions and avoid making random, unaccounted-for decisions. Principles of responsible AI — accountability, transparency, reproducibility, security, privacy — are extremely important. Logging agent steps and human confirmation are absolutely critical.

Also, when agents deal with so many diverse systems, it’s important they speak the same language. Ontology becomes very important so that events can be tracked, monitored, and accounted for. A shared domain-specific ontology can define a “code of conduct.” These ethics can help control the chaos. When tied together with a shared trust and distributed identity framework, we can build systems that enable agents to do truly useful work.

When done right, an agentic ecosystem can greatly offload the human “cognitive load” and enable our workforce to perform high-value tasks. Humans will benefit when agents handle the mundane.

Dattaraj Rao is innovation and R&D architect at Persistent Systems.

Nvidia-backed ThinkLabs AI raises $28 million to tackle a growing power grid crunch

ThinkLabs AI, a startup building artificial intelligence models that simulate the behavior of the electric grid, announced today that it has closed a $28 million Series A financing round led by Energy Impact Partners (EIP), one of the largest energy transition investment firms in the world. Nvidia’s venture capital arm NVentures and Edison International, the parent company of Southern California Edison, also participated in the round.

The funding marks a significant escalation in the race to apply AI not just to software and content generation, but to the physical infrastructure that powers modern life. While most AI investment headlines have centered on large language models and generative tools, ThinkLabs is pursuing a different and arguably more consequential application: using physics-informed AI to model the behavior of electrical grids in real time, compressing engineering studies that once took weeks or months into minutes.

“We are dead focused on the grid,” ThinkLabs CEO Josh Wong told VentureBeat in an exclusive interview ahead of the announcement. “We do AI models to model the grid, specifically transmission and distribution power flow related modeling. We can calculate things like interconnection of large loads — like data centers or electric vehicle charging — and understand the impact they have on the grid.”

The round drew participation from a deep bench of returning investors, including GE Vernova, Powerhouse Ventures, Active Impact Investments, Blackhorn Ventures, and Amplify Capital, along with an unnamed large North American investor-owned utility. The company initially set out to raise less than $28 million, according to Wong, but strong demand from strategic partners pushed the round higher.

“This was way oversubscribed,” Wong said. “We attracted the right ecosystem partners and the right capital partners to grow with, and that’s how we ended up at $28 million.”

Why surging electricity demand is breaking the grid’s legacy planning tools

The timing of the raise is no coincidence. U.S. electricity demand is projected to grow 25% by 2030, according to consultancy ICF International, driven largely by AI data centers, electrified transportation, and the broader push toward building and vehicle electrification. That surge is crashing into a grid that was engineered decades ago for a fundamentally different set of demands — and utilities are scrambling to keep up.

The core problem is one of computational capacity. When a utility needs to understand what will happen to its grid if a large data center connects to a particular substation, or if a cluster of EV chargers goes live in a residential neighborhood, engineers must run power flow simulations — complex calculations that model how electricity moves through the network. Those studies have traditionally relied on legacy software tools from companies like Siemens, GE, and Schneider Electric, and they can take weeks or months to complete for a single scenario.

ThinkLabs’ approach replaces that bottleneck with physics-informed AI models that learn from the same engineering simulators but can then run orders of magnitude faster. According to the company, its platform can compress a month-long grid study into under three minutes and run 10 million scenarios in 10 minutes, while maintaining greater than 99.7% accuracy on grid power flow calculations.

Wong draws a sharp distinction between what ThinkLabs does and the generative AI models that dominate public discourse. “We’re not hallucinating the heck out of things,” he said. “We are talking about engineering calculations here. I would really compare this to a computation of fluid dynamics, or like F1 cars, or aerospace, or climate models. We do have a source of truth from existing physics-based engineering models.”

That source of truth is crucial. ThinkLabs trains its AI on the outputs of first-principles physics simulators — the same tools utilities already trust — and then validates its models against those simulators. The result, Wong argues, is an AI system that is not only fast but fully explainable and auditable, a critical requirement in an industry where a miscalculation can cause blackouts or damage physical infrastructure.

How ThinkLabs’ three-phase power flow analysis differs from every other grid AI startup

The competitive landscape for AI in grid management has grown crowded over the past two years, with startups and incumbents alike racing to apply machine learning to utility workflows. But Wong contends that ThinkLabs occupies a fundamentally different position from most of its competitors.

“As far as we know, we’re the only ones actually doing AI-native grid simulation analysis,” he said. “Others might be using AI for forecasting, load disaggregation, or local energy management, but fundamentally, they’re not calculating a power flow.”

What ThinkLabs performs is a full three-phase AC power flow analysis — examining every node and bus on the electric grid to determine real and reactive power levels, line flows, and voltages. This is the same type of analysis that utility engineers perform today using legacy tools, but ThinkLabs can deliver it at a speed and scale that those tools simply cannot match.
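For readers unfamiliar with the technique, a heavily simplified illustration helps. The snippet below solves a linearized “DC” power flow on a made-up 3-bus network; ThinkLabs performs full three-phase AC analysis, which is far more involved, but the quantities being solved for (bus angles and line flows) are of the same kind. All numbers are invented for illustration.

```python
import numpy as np

# A linearized "DC" power flow on a toy 3-bus network. ThinkLabs runs full
# three-phase AC analysis; this only illustrates the kind of quantity being
# solved for. All numbers are invented.

# Lines as (from_bus, to_bus, reactance in per unit)
lines = [(0, 1, 0.1), (1, 2, 0.2), (0, 2, 0.25)]
n = 3

# Build the susceptance (weighted Laplacian) matrix B
B = np.zeros((n, n))
for i, j, x in lines:
    b = 1.0 / x
    B[i, i] += b; B[j, j] += b
    B[i, j] -= b; B[j, i] -= b

# Net injections in per unit: bus 1 generates 0.5, bus 2 (say, a new data
# center) draws 0.5; bus 0 is the slack bus with angle fixed at zero.
P = np.array([0.5, -0.5])
theta = np.zeros(n)
theta[1:] = np.linalg.solve(B[1:, 1:], P)

# Line flows follow from the angle differences
for i, j, x in lines:
    print(f"flow {i}->{j}: {(theta[i] - theta[j]) / x:+.3f} pu")
```

A real study repeats a much harder, nonlinear version of this solve for every hour, scenario, and contingency under consideration, which is where the computational bottleneck comes from.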

The distinction matters because utilities make capital investment decisions — worth billions of dollars — based on exactly these types of studies. If a power flow analysis shows that a proposed data center connection will overload a transmission line, the utility may need to build new infrastructure at enormous cost. But if the analysis can also suggest alternative solutions — battery storage placement, load flexibility scheduling, or topology optimization — the utility can potentially avoid or defer those capital expenditures.

“With many utilities, existing tools will basically show them all the problems, but they can only address solutions by trial and error,” Wong explained. “With AI, we can use reinforcement learning to generate more creative solutions, but also very effectively weigh the pros and cons of each of these solutions.”

Inside ThinkLabs’ strategic relationships with NVIDIA, Edison, and Microsoft

The presence of NVentures in the round — Nvidia’s venture arm does not write many checks — signals a deeper strategic relationship that extends well beyond capital. Wong confirmed that ThinkLabs works extensively within the Nvidia ecosystem on the energy and utility side, leveraging CUDA for GPU-accelerated computation and integrating Nvidia’s Earth-2 climate simulation platform into ThinkLabs’ probabilistic forecasting and risk-adjusted analysis pipelines.

“We are what one utility mentioned as the only high-intensity GPU workload for the OT side — the operational technology side — that’s planning and operations,” Wong said. He added that ThinkLabs is also in discussions with Nvidia’s Omniverse team about additional utility use cases, though those efforts are still early.

Edison International’s participation carries a different kind of strategic weight. In January 2026, ThinkLabs publicly announced results from a collaboration with Southern California Edison (SCE), Edison International’s utility subsidiary, that demonstrated the real-world capabilities of its platform. As the Los Angeles Times reported at the time, the collaboration showed that ThinkLabs’ AI could train in minutes per circuit, process a full year of hourly power-flow data in under three minutes across more than 100 circuits, and produce engineering reports with bridging-solution recommendations in under 90 seconds — work that previously required dedicated engineers an average of 30 to 35 days.

In today’s announcement, Edison International’s Sergej Mahnovski, Managing Director of Strategy, Technology and Innovation, reinforced that urgency: “We must rapidly transition from legacy planning tools and processes to meet the growing demands on the electric grid — new AI-native solutions are needed to transform our capabilities.”

ThinkLabs also works closely with Microsoft, which hosted a webinar in mid-2025 featuring Wong alongside representatives from Southern Company, EPRI, and Microsoft’s own energy team. The SCE collaboration was built on Microsoft Azure AI Foundry, situating ThinkLabs within the cloud infrastructure that many large utilities already use.

The 20-year career path that led from Toronto Hydro to an autonomous grid startup

Wong’s biography reads like a deliberate preparation for this exact moment. He has spent more than 20 years in the utility industry, starting his career at Toronto Hydro before founding Opus One Solutions in 2012 — a smart-grid software company that he grew to over 100 employees serving customers across eight countries before selling it to GE in 2022, as previously reported by BetaKit.

After the acquisition, Wong joined what became GE Vernova and was asked to develop the company’s “grid of the future” roadmap. The thesis he developed there — that the grid is the central bottleneck to economic growth, electrification, and national security, and that autonomous grid orchestration powered by AI is the solution — became the intellectual foundation for ThinkLabs.

“I was pulling together the thesis that we need to electrify, but the grid is really at the center of attention,” Wong said. “The conclusion is we need to drive towards greater autonomy. We talk a lot about autonomous cars, but I would argue that autonomous grids is the much more pressing priority.”

ThinkLabs was incubated inside GE Vernova and spun out as an independent company in April 2024, coinciding with a $5 million seed round co-led by Powerhouse Ventures and Active Impact Investments, as reported by GlobeNewswire at the time. GE Vernova remains a shareholder and strategic partner. Wong is the sole founder.

The team composition reflects the company’s dual identity. “Half of our team are power system PhDs, but the other half are the AI folks — people who have been looking at hyper-scalable AI infrastructure platforms and MLOps for other industries,” Wong said. “We have really been blending the two.”

How ThinkLabs doubled its utility customer base in a single quarter

Utilities are famously among the most conservative technology buyers in the world, with procurement cycles that can stretch years and layers of regulatory oversight that slow adoption. Wong acknowledges this reality but says the landscape is shifting faster than many observers realize.

“I have noticed sales cycles really accelerating,” he said. “It’s still long and depends on which utility and how big the deal is, but we have been witnessing firsthand sales cycles going from the traditional one to two years to a shortest two to three months.”

On the commercial side, Wong declined to share specific revenue figures but offered several data points that suggest meaningful traction. ThinkLabs is working with more than 10 utilities on AI-native grid simulation for planning and operations, he said, and the company doubled its customer accounts in the first quarter of 2026 alone.

“So not one or two, but we’re working with 10-plus utilities,” Wong said. “Things have really picked up pace even before this A round.”

The company primarily targets investor-owned utilities and system operators — the organizations that own and operate the grid — though Wong noted that AI is also beginning to democratize grid simulation capabilities for smaller utilities that previously lacked the engineering resources to run sophisticated analyses.

Wong said the primary use of funds will go toward advancing the product to enterprise grade and expanding the range of use cases the platform supports. The company sees a significant land-and-expand opportunity within individual utility accounts — moving from modeling a small region to training AI models across entire states or multi-state territories within a single customer.

EIP’s involvement as lead investor carries particular significance in this market. The firm is backed by more than half of North America’s investor-owned utilities, giving ThinkLabs a direct line into the executive suites of the customers it is trying to reach. “Utilities are being asked to add capacity on timelines the industry has never seen before, and the stakes extend far beyond the energy sector,” Sameer Reddy, Managing Partner at EIP, said in the press release.

What a 99.7% accuracy rate actually means for critical grid infrastructure

Any conversation about applying AI to critical infrastructure inevitably confronts the question of failure modes. A hallucination in a chatbot is an embarrassment; a miscalculation in a grid power flow analysis could contribute to equipment damage or widespread outages.

Wong addressed this head-on. The 99.7% accuracy figure, he explained, is an average across large-volume planning studies — specifically 8,760-hour analyses (every hour of the year) projected across three to 10 years with multiple sensitivity scenarios. For planning purposes, he argued, this level of accuracy is not only sufficient but may actually exceed what traditional methods deliver in practice.
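The scale of those studies is easy to underestimate. A back-of-envelope calculation, with an assumed sensitivity count, shows how many individual power flow solutions a single planning study implies:

```python
# Scale of an 8,760-hour planning study: every hour of the year, projected
# over a multi-year horizon, across several sensitivity scenarios. The
# scenario count is an assumption for illustration.
hours_per_year = 8760      # 365 days x 24 hours
years = 10                 # upper end of the 3-to-10-year horizon
sensitivities = 5          # assumed number of sensitivity scenarios

power_flow_solutions = hours_per_year * years * sensitivities
print(f"{power_flow_solutions:,} individual power flow solutions")  # 438,000
```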

“If you look at a source of truth, the data quality is actually the biggest limiting factor, not the accuracy of these AI models,” he said. “When we bring in traditional engineering analysis and actually snap it with telemetry — metering data, SCADA data — I would actually argue AI is far more accurate because it is data driven on actual measurements, rather than hypothetical planning analysis based on scenarios.”

For more critical real-time applications, ThinkLabs deploys what Wong called “hybrid models” that blend AI computation with traditional physics-based simulation. In the most stringent use cases, the AI handles roughly 99% of the computational workload before handing off to a physics-based engine for final validation — a technique Wong described as using AI to “warm start” the simulation.
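The warm-start pattern is a classic numerical technique: a cheap surrogate supplies a near-solution, and the trusted solver only polishes it. As a toy illustration (not ThinkLabs’ code), plain Newton iteration on a scalar equation converges in far fewer steps from a good initial guess:

```python
# Toy "warm start": a surrogate guess hands a near-solution to a trusted
# solver, which only has to polish it. Illustrative only; a real power flow
# solver iterates on vector equations, not this scalar toy.

def newton_solve(f, df, x0, tol=1e-10, max_iter=100):
    x, iters = x0, 0
    while abs(f(x)) > tol and iters < max_iter:
        x -= f(x) / df(x)
        iters += 1
    return x, iters

f  = lambda x: x**3 - 2 * x - 5        # stand-in for the physics equations
df = lambda x: 3 * x**2 - 2

cold, cold_iters = newton_solve(f, df, x0=10.0)   # naive starting point
warm, warm_iters = newton_solve(f, df, x0=2.09)   # surrogate's near-solution

print(f"cold start: {cold_iters} iterations, warm start: {warm_iters}")
assert abs(cold - warm) < 1e-8   # both land on the same root
```

In ThinkLabs’ framing, the AI plays the role of the near-solution, and the physics engine supplies the final, auditable validation.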

The company also monitors for model drift and maintains strict training boundaries. “We’re not like ChatGPT training the internet here,” Wong said. “We’re training on the possibility of grid conditions. And if we do see a condition where we did not train, or outside of our training boundary, we can always run on-demand training on those certain solution spaces.”

Why ThinkLabs says its value proposition survives even if the data center boom slows down

The bullish case for ThinkLabs — and for grid-focused AI more broadly — rests heavily on the assumption that electricity demand will surge dramatically over the coming decade. But some analysts have begun questioning whether those projections are inflated, particularly if AI investment cycles cool and data center build-outs decelerate.

Wong argued that his company’s value proposition is resilient to that scenario. Even without dramatic load growth, he said, utilities face a fundamental modernization challenge. They have been using tools and processes from the 1990s and 2000s, and the workforce that knows how to operate those tools is retiring at an alarming rate.

“Workforce renewal is a big factor,” he said. “These AI tools not only modernize the tool itself, but also modernize culture and transformation and become major points of retention for the next generation.”

He also pointed to energy affordability as a driver that exists independent of load growth projections. If utilities continue to plan based on worst-case deterministic scenarios — building enough infrastructure to cover every conceivable contingency — consumer rates will become unmanageable. AI-powered probabilistic analysis, Wong argued, allows utilities to make smarter, more cost-effective decisions regardless of whether the most aggressive demand forecasts materialize.

“A large part of this AI is not only enabling workload, but how do we act with intelligence — going from worst-case to time-series analysis, from deterministic to probabilistic and stochastic analysis, and also coming up with solutions,” he said.

Wong frames the broader opportunity with an analogy that captures both the simplicity and the ambition of what ThinkLabs is attempting. For decades, he said, the utility industry’s default response to grid constraints has been the equivalent of building wider highways — more wires, more copper, more steel. ThinkLabs wants to be the navigation system that reroutes traffic instead.

“In the past, when we drive, we always drive with what we are familiar with — just the big roads,” he said. “But with AI, we can optimize the traffic patterns to drive on much more effective routes. In this case, it might be a mix of wires, flexibility, batteries, and operational decisions.”

Whether ThinkLabs can deliver on that vision at the scale the grid demands remains an open question. But Wong, who has spent two decades building and selling grid software companies, is not thinking in terms of incremental improvement. He sees a narrow window — measured in years, not decades — during which the foundational AI infrastructure for the grid will be built, and whoever builds it will shape the energy system for a generation.

“I truly believe the next two years of AI development for the grid will dictate the next decades of what can happen to the grid,” Wong said. “It’s really here now.”

The grid, in other words, is getting a copilot. The question is no longer whether utilities will trust AI with their most critical engineering decisions, but how quickly they can afford not to.

Google’s new TurboQuant algorithm speeds up AI memory 8x, cutting costs by 50% or more

As Large Language Models (LLMs) expand their context windows to process massive documents and intricate conversations, they encounter a brutal hardware reality known as the “Key-Value (KV) cache bottleneck.”

Every word a model processes must be stored as a high-dimensional vector in high-speed memory. For long-form tasks, this “digital cheat sheet” swells rapidly, devouring graphics processing unit (GPU) video random access memory (VRAM) during inference and steadily dragging model performance down.

But have no fear, Google Research is here: yesterday, the unit within the search giant released its TurboQuant algorithm suite — a software-only breakthrough that provides the mathematical blueprint for extreme KV cache compression. It enables a 6x average reduction in the KV memory a given model uses and an 8x performance increase in computing attention logits, which could cut costs by more than 50% for enterprises that implement it on their models.

The theoretically grounded algorithms and associated research papers are available now publicly for free, including for enterprise usage, offering a training-free solution to reduce model size without sacrificing intelligence.

The arrival of TurboQuant is the culmination of a multi-year research arc that began in 2024. While the underlying mathematical frameworks—including PolarQuant and Quantized Johnson-Lindenstrauss (QJL)—were documented in early 2025, their formal unveiling this week marks a transition from academic theory to large-scale production reality.

The timing is strategic, coinciding with upcoming presentations of these findings at the International Conference on Learning Representations (ICLR 2026) in Rio de Janeiro, Brazil, and the Annual Conference on Artificial Intelligence and Statistics (AISTATS 2026) in Tangier, Morocco.

By releasing these methodologies under an open research framework, Google is providing the essential “plumbing” for the burgeoning “Agentic AI” era: the need for massive, efficient, and searchable vectorized memory that can finally run on the hardware users already own. The release is already believed to be affecting the stock market, pushing down shares of memory providers as traders read it as a sign that less memory will be needed (a conclusion that may prove incorrect, given Jevons’ Paradox).

The Architecture of Memory: Solving the Efficiency Tax

To understand why TurboQuant matters, one must first understand the “memory tax” of modern AI. Traditional vector quantization has historically been a “leaky” process.

When high-precision decimals are compressed into simple integers, the resulting “quantization error” accumulates, eventually causing models to hallucinate or lose semantic coherence.

Furthermore, most existing methods require “quantization constants”—meta-data stored alongside the compressed bits to tell the model how to decompress them. In many cases, these constants add so much overhead—sometimes 1 to 2 bits per number—that they negate the gains of compression entirely.
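The overhead is easy to quantify. In a common block-quantization scheme (the block size and constant widths below are typical choices, assumed here for illustration rather than taken from the Google papers), the stored constants alone can add a full bit per value:

```python
# Per-value overhead of block quantization constants. Block size and constant
# widths are common choices, assumed here for illustration.
block_size = 32        # values per quantization block
payload_bits = 2       # extreme 2-bit quantization
scale_bits = 16        # one fp16 scale per block
zero_point_bits = 16   # one fp16 zero point per block

overhead_per_value = (scale_bits + zero_point_bits) / block_size
total_bits = payload_bits + overhead_per_value
print(f"{overhead_per_value} overhead bits per value, {total_bits} bits total")
# 1.0 overhead bit per value -- half again the size of the 2-bit payload
```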

TurboQuant resolves this paradox through a two-stage mathematical shield. The first stage utilizes PolarQuant, which reimagines how we map high-dimensional space.

Rather than using standard Cartesian coordinates (X, Y, Z), PolarQuant converts vectors into polar coordinates consisting of a radius and a set of angles.

The breakthrough lies in the geometry: after a random rotation, the distribution of these angles becomes highly predictable and concentrated. Because the “shape” of the data is now known, the system no longer needs to store expensive normalization constants for every data block. It simply maps the data onto a fixed, circular grid, eliminating the overhead that traditional methods must carry.

The second stage acts as a mathematical error-checker. Even with the efficiency of PolarQuant, a residual amount of error remains. TurboQuant applies a 1-bit Quantized Johnson-Lindenstrauss (QJL) transform to this leftover data. By reducing each error number to a simple sign bit (+1 or -1), QJL serves as a zero-bias estimator. This ensures that when the model calculates an “attention score”—the vital process of deciding which words in a prompt are most relevant—the compressed version remains statistically identical to the high-precision original.
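The sign-bit estimator can be reproduced in a few lines of NumPy. The sketch below is an illustrative reimplementation of the idea, not Google’s code, and the projection dimension is deliberately oversized to keep the demo stable:

```python
import numpy as np

# Illustrative reimplementation of the 1-bit sign-estimator idea: project
# with a shared random Gaussian matrix, store only sign bits (plus the key's
# norm), and recover inner products without bias. Not Google's code; the
# projection dimension m is oversized here to keep the demo stable.

rng = np.random.default_rng(0)
d, m = 64, 20_000
S = rng.standard_normal((m, d))       # shared random projection

q = rng.standard_normal(d)            # query vector (kept in full precision)
k = rng.standard_normal(d)            # key vector to be compressed

# Stored for the key: m sign bits and one scalar norm, instead of d floats
k_bits = np.sign(S @ k)
k_norm = np.linalg.norm(k)

# Unbiased estimate of <q, k> recovered from the sign bits
estimate = k_norm * np.sqrt(np.pi / 2) * np.mean(k_bits * (S @ q))
print(f"exact: {q @ k:.2f}, estimated from sign bits: {estimate:.2f}")
```

Because the expected value of the sign-bit estimate equals the true inner product, errors average out across the many keys in a long context instead of accumulating.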

Performance benchmarks and real-world reliability

The true test of any compression algorithm is the “Needle-in-a-Haystack” benchmark, which evaluates whether an AI can find a single specific sentence hidden within 100,000 words.

In testing across open-source models like Llama-3.1-8B and Mistral-7B, TurboQuant achieved perfect recall scores, mirroring the performance of uncompressed models while reducing the KV cache memory footprint by a factor of at least 6x.
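To put a 6x reduction in concrete terms, consider the KV cache of a model in this size class at a 100,000-token context. The dimensions below are typical for an 8B-parameter model and are used here purely for illustration:

```python
# KV cache footprint at fp16 for dimensions typical of an 8B-class model
# (32 layers, 8 KV heads, head dimension 128), used here for illustration.
layers, kv_heads, head_dim = 32, 8, 128
bytes_fp16 = 2
context_tokens = 100_000

# One key vector and one value vector per token, per layer
kv_bytes = 2 * layers * kv_heads * head_dim * bytes_fp16 * context_tokens
print(f"fp16 cache: {kv_bytes / 1e9:.1f} GB")               # 13.1 GB
print(f"with 6x compression: {kv_bytes / 6 / 1e9:.1f} GB")  # 2.2 GB
```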

This “quality neutrality” is rare in the world of extreme quantization, where 3-bit systems usually suffer from significant logic degradation.

Beyond chatbots, TurboQuant is transformative for high-dimensional search. Modern search engines increasingly rely on “semantic search,” comparing the meanings of billions of vectors rather than just matching keywords. TurboQuant consistently achieves superior recall ratios compared to existing state-of-the-art methods like RaBitQ and Product Quantization (PQ), all while requiring virtually zero indexing time.

This makes it an ideal candidate for real-time applications where data is constantly being added to a database and must be searchable immediately. Furthermore, on hardware like NVIDIA H100 accelerators, TurboQuant’s 4-bit implementation achieved an 8x performance boost in computing attention logits, a critical speedup for real-world deployments.

Rapt community reaction

The reaction on X, obtained via a Grok search, included a mixture of technical awe and immediate practical experimentation.

The original announcement from @GoogleResearch generated massive engagement, with over 7.7 million views, signaling that the industry was hungry for a solution to the memory crisis.

Within 24 hours of the release, community members began porting the algorithm to popular local AI libraries like MLX for Apple Silicon and llama.cpp.

Technical analyst @Prince_Canuma shared one of the most compelling early benchmarks, implementing TurboQuant in MLX to test the Qwen3.5-35B model.

Across context lengths ranging from 8.5K to 64K tokens, he reported a 100% exact match at every quantization level, noting that 2.5-bit TurboQuant reduced the KV cache by nearly 5x with zero accuracy loss. This real-world validation echoed Google’s internal research, proving that the algorithm’s benefits translate seamlessly to third-party models.

Other users focused on the democratization of high-performance AI. @NoahEpstein_ provided a plain-English breakdown, arguing that TurboQuant significantly narrows the gap between free local AI and expensive cloud subscriptions.

He noted that models running locally on consumer hardware like a Mac Mini “just got dramatically better,” enabling 100,000-token conversations without the typical quality degradation.

Similarly, @PrajwalTomar_ highlighted the security and speed benefits of running “insane AI models locally for free,” expressing “huge respect” for Google’s decision to share the research rather than keeping it proprietary.

Market impact and the future of hardware

The release of TurboQuant has already begun to ripple through the broader tech economy. Following the announcement on Tuesday, analysts observed a downward trend in the stock prices of major memory suppliers, including Micron and Western Digital.

The market’s reaction reflects a realization that if AI giants can compress their memory requirements by a factor of six through software alone, the insatiable demand for High Bandwidth Memory (HBM) may be tempered by algorithmic efficiency.

As we move deeper into 2026, the arrival of TurboQuant suggests that the next era of AI progress will be defined as much by mathematical elegance as by brute force. By redefining efficiency through extreme compression, Google is enabling “smarter memory movement” for multi-step agents and dense retrieval pipelines. The industry is shifting from a focus on “bigger models” to “better memory,” a change that could lower AI serving costs globally.

Strategic considerations for enterprise decision-makers

For enterprises currently using or fine-tuning their own AI models, the release of TurboQuant offers a rare opportunity for immediate operational improvement.

Unlike many AI breakthroughs that require costly retraining or specialized datasets, TurboQuant is training-free and data-oblivious.

This means organizations can apply these quantization techniques to their existing fine-tuned models—whether they are based on Llama, Mistral, or Google’s own Gemma—to realize immediate memory savings and speedups without risking the specialized performance they have worked to build.
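To make "training-free and data-oblivious" concrete, here is a minimal round-to-nearest quantizer in pure Python. This is a generic sketch, not TurboQuant's actual algorithm; it only illustrates that such methods need no gradient updates and no calibration set, just the tensor being compressed:

```python
import random

def quantize_dequantize(values, bits=4):
    """Training-free round-to-nearest quantization of one channel.
    Uses only the tensor itself: no retraining, no calibration data."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0   # guard constant channels
    return [round((v - lo) / scale) * scale + lo for v in values]

random.seed(0)
kv_channel = [random.gauss(0.0, 1.0) for _ in range(128)]  # toy KV slice
approx = quantize_dequantize(kv_channel, bits=4)
max_err = max(abs(a - b) for a, b in zip(kv_channel, approx))
print(f"max abs error at 4 bits: {max_err:.4f}")
```

The worst-case error per value is half a quantization step, which is why aggressive bit widths normally degrade quality; TurboQuant's contribution, per the reporting above, is pushing that trade-off far lower than naive rounding can.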

From a practical standpoint, enterprise IT and DevOps teams should consider the following steps to integrate this research into their operations:

Optimize Inference Pipelines: Integrating TurboQuant into production inference servers can reduce the number of GPUs required to serve long-context applications, potentially slashing cloud compute costs by 50% or more.

Expand Context Capabilities: Enterprises working with massive internal documentation can now offer much longer context windows for retrieval-augmented generation (RAG) tasks without the massive VRAM overhead that previously made such features cost-prohibitive.

Enhance Local Deployments: For organizations with strict data privacy requirements, TurboQuant makes it feasible to run highly capable, large-scale models on on-premise hardware or edge devices that were previously insufficient for 32-bit or even 8-bit model weights.

Re-evaluate Hardware Procurement: Before investing in massive HBM-heavy GPU clusters, operations leaders should assess how much of their bottleneck can be resolved through these software-driven efficiency gains.
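A rough capacity model shows why the steps above matter for procurement. All numbers below are assumptions (an 80 GB GPU, 40 GB of weights, a 16 GB fp16 KV cache per 64K-token request); the point is that shrinking the per-request KV cache multiplies how many long-context requests one GPU can serve concurrently:

```python
def concurrent_requests(gpu_mem_gb, weights_gb, kv_per_request_gb):
    """How many long-context requests fit on one GPU once weights are loaded."""
    return int((gpu_mem_gb - weights_gb) // kv_per_request_gb)

GPU_MEM, WEIGHTS = 80, 40          # assumed: 80 GB card, 40 GB of weights
KV_FP16 = 16.0                     # assumed fp16 KV cache per 64K-token request
KV_QUANT = KV_FP16 / 6.4           # ~6.4x smaller at 2.5 bits

baseline = concurrent_requests(GPU_MEM, WEIGHTS, KV_FP16)
compressed = concurrent_requests(GPU_MEM, WEIGHTS, KV_QUANT)
print(baseline, compressed)  # concurrent requests per GPU, before vs. after
```

Under these assumptions, per-GPU concurrency rises from 2 to 16 requests, which is the mechanism behind the "fewer GPUs, lower serving cost" argument.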

Ultimately, TurboQuant proves that the limit of AI isn’t just how many transistors we can cram onto a chip, but how elegantly we can translate the infinite complexity of information into the finite space of a digital bit. For the enterprise, this is more than just a research paper; it is a tactical unlock that turns existing hardware into a significantly more powerful asset.

Cloudflare’s new Dynamic Workers ditch containers to run AI agent code 100x faster

Web infrastructure giant Cloudflare is seeking to transform the way enterprises deploy AI agents with the open beta release of Dynamic Workers, a new lightweight, isolate-based sandboxing system that it says starts in milliseconds, uses only a few mega…

Liquid-cooled AI systems expose the limits of traditional storage architecture

Presented by Solidigm


Liquid cooling is rewriting the rules of AI infrastructure, but most deployments have not fully crossed the line. GPUs and CPUs have moved to liquid cooling, while storage has depended on airflow, creating an operationally inefficient hybrid architecture.

What appears to be a pragmatic transition strategy is, in practice, a structural liability.

“A hybrid cooling approach is an operationally inefficient situation,” explains Hardeep Singh, thermal-mechanical hardware team manager at Solidigm. “You’re paying for and maintaining two entirely separate, expensive cooling infrastructures, and could be exposed to the worst-of-both-worlds problems.”

While liquid cooling requires pumps, fluid manifolds, and coolant distribution units (CDUs), air-cooled components require CRAC units, cold aisles, and evaporative cooling towers. Organizations moving to a hybrid solution by just adding some liquid cooling are absorbing the cost premium without capturing the full TCO benefit.

The thermal physics makes things worse. Bulky liquid-cooling cold plates, thick hoses, and manifolds physically obstruct airflow inside the GPU server chassis. This concentrates thermal stress on the remaining air-cooled components, including storage drives, memory, and network cards, because server fans cannot push adequate airflow around the liquid plumbing. The components most reliant on fans end up in the worst possible thermal environment.

Water consumption is an all-but-ignored, equally serious problem. Traditional air-cooled components rely on server fans to move heat into ambient air, which is then absorbed by a water loop and pumped to evaporative cooling towers. These systems can consume millions of gallons of water over time. As rack power densities continue to climb to support modern AI workloads, the evaporative water penalty becomes, as Singh puts it, “environmentally and economically indefensible.”

As AI infrastructure evolves toward liquid-cooled and fanless GPU systems, the true constraints on scale are shifting from compute performance to system-level thermal design. Modern AI platforms are no longer built server by server; they are engineered as tightly integrated rack- and pod-level systems where power delivery, cooling distribution, and component placement are inseparable.

In this environment, storage architectures designed for airflow-dependent data centers are becoming a limiting factor. As GPU platforms move fully into shared liquid-cooling domains, anchored by rack-level CDUs, every component in the system must operate natively within the same thermal and mechanical design. Storage can no longer rely on isolated cooling paths or bespoke thermal assumptions without introducing inefficiency, complexity, or density trade-offs at the system level.

Why storage is no longer a passive subsystem

For infrastructure leaders, this marks a fundamental transition. Storage is no longer a passive subsystem attached to compute, but instead an active participant in system-level cooling, serviceability, and GPU utilization. The ability to scale AI now depends on whether storage can integrate cleanly into liquid-cooled GPU systems, without fragmenting cooling architectures or constraining rack-level design.

And the race to scale AI is no longer just about who has the most GPUs, but instead about who can keep them cool, says Scott Shadley, director of leadership narrative and evangelist at Solidigm.

“Finding a way to enable liquid-cooled storage while still making it user serviceable has been one of the biggest challenges in designing fanless system solutions,” Shadley says. “As AI workloads evolve, the pressure on storage will only intensify.”

Techniques like KV cache offload, which move data between GPU memory and high-speed storage during inference, make storage latency and thermal performance directly relevant to model serving efficiency. In these architectures, a storage subsystem that throttles under thermal load because of inadequate airflow slows down both reads and the model itself.

Moving to integrated liquid cooling

Moving from traditional air-cooled GPU servers to integrated liquid-cooled racks improves power usage effectiveness (PUE) and reduces the operational cost for the datacenter. It also replaces the noisy computer room air handler (CRAH) and introduces a modern, efficient liquid CDU, with potential scope to eliminate chillers if racks can be cooled to a liquid temperature of 45°C.
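The PUE claim can be made concrete with a quick sketch. PUE is simply total facility power divided by IT power, so cutting cooling overhead moves the ratio toward the ideal of 1.0. The figures below are illustrative assumptions, not measured Solidigm or datacenter data:

```python
def pue(total_facility_kw, it_load_kw):
    """Power Usage Effectiveness: total facility power over IT power.
    1.0 is the theoretical ideal; lower is better."""
    return total_facility_kw / it_load_kw

# Illustrative figures only:
it_load = 1000.0                             # kW consumed by the racks
air_cooled = pue(it_load + 500, it_load)     # CRAH fans + chiller overhead
liquid_cooled = pue(it_load + 150, it_load)  # CDU pumps, chiller-less loop
print(air_cooled, liquid_cooled)
```

With these assumed overheads, the facility drops from a PUE of 1.5 to 1.15, meaning far less power is spent on anything other than compute.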

When storage is cooled by liquid in the absence of fans, it must also support serviceability with no liquid leakage. It also creates a new requirement that many infrastructure teams are only beginning to grapple with: every component in the rack must operate natively within the same cooling architecture.

Storage as an active participant in system design

Storage design is no longer an isolated engineering problem. It is a direct variable in GPU utilization, system reliability, and operational efficiency. The solution is to redesign storage from the ground up for liquid-cooled, fanless environments. This is harder than it sounds. Traditional SSD design assumes airflow for thermal management and places components on both sides of a thermally insulated PCB. Neither assumption holds in a CDU-anchored architecture.

“SSDs need to be designed with a best-in-class thermal solution to specifically conduct heat from internal components efficiently and transfer it to fluid,” says Singh. “The design must include a low-resistance path for heat to transfer to a single cold plate attached on one side.”

At the same time, drives must support serviceability without liquid leakage during insertion and removal, and without degrading the thermal interface between the drive and the cold plate.

Solidigm has worked with NVIDIA to address SSD liquid-cooling challenges, such as hot-swappability and single-side cooling, reducing the thermal footprint of storage within the shared liquid loop, and ensuring GPUs receive their proportional share of coolant.

“If storage is not designed for a liquid-cooled environment efficiently, it will either throttle to lower performance or require more liquid volume,” he says. “Which directly and indirectly leads to under-utilization of GPU capability.”

Alignment on standards and the path to interoperability

Solidigm is not working on this in isolation. The broader industry is coalescing around standards to ensure liquid-cooled AI systems are interoperable rather than a patchwork of custom solutions. The Storage Networking Industry Association (SNIA) and the Open Compute Project (OCP) are the primary bodies driving this work.

Solidigm led the industry standard for liquid cooling in SFF-TA-1006 for the E1.S form factor and is an active participant in OCP work streams covering rack design, thermal management, and sustainability. Custom, bespoke cooling solutions for storage are giving way to standards-aligned, production-ready designs that integrate cleanly into liquid-cooled GPU platforms.

“There are several organizations involved in this work,” says Shadley, who is also a SNIA board member. “They started with component-level solutions, driven heavily by SNIA and the SFF TA TWG. The next level is solution-level work, which is currently being heavily driven by OCP.”

Solidigm’s roadmap is leading the way

The advent of liquid and immersion cooling has changed the design rules for system-level architectures, enabling more unconventional designs and removing some long-standing barriers. The ability for systems to drive NVMe SSD-only platforms also removes the platter-based box constraint that exists with HDD solutions, Shadley says.

“Solidigm customers have an active and lead role in roadmap decisions for our products due to their deep technical alignment with the ecosystem,” he says. “We do not simply make and sell products, we integrate, co-design, co-develop, and innovate with and alongside our partners, customers, and their customers.”

Adds Singh: “Solidigm’s key strength is innovation and customer-inspired system level engineering. This will continue to aggressively lead the way for liquid cooling adoption for storage.”


Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

The three disciplines separating AI agent demos from real-world deployment

Getting AI agents to perform reliably in production — not just in demos — is turning out to be harder than enterprises anticipated. Fragmented data, unclear workflows, and runaway escalation rates are slowing deployments across industries.

“The technology itself often works well in demonstrations,” said Sanchit Vir Gogia, chief analyst with Greyhound Research. “The challenge begins when it is asked to operate inside the complexity of a real organization.” 

Burley Kawasaki, who oversees agent deployment at Creatio, and his team have developed a methodology built around three disciplines: data virtualization to work around data lake delays; agent dashboards and KPIs as a management layer; and tightly bounded use-case loops to drive toward high autonomy.

In simpler use cases, Kawasaki says these practices have enabled agents to handle up to 80-90% of tasks on their own. With further tuning, he estimates they could support autonomous resolution in at least half of use cases, even in more complex deployments.

“People have been experimenting a lot with proof of concepts, they’ve been putting a lot of tests out there,” Kawasaki told VentureBeat. “But now in 2026, we’re starting to focus on mission-critical workflows that drive either operational efficiencies or additional revenue.”

Why agents keep failing in production

Enterprises are eager to adopt agentic AI in some form or another — often because they’re afraid of being left behind, even before they identify tangible real-world use cases — but they run into significant bottlenecks around data architecture, integration, monitoring, security, and workflow design.

The first obstacle almost always has to do with data, Gogia said. Enterprise information rarely exists in a neat or unified form; it is spread across SaaS platforms, apps, internal databases, and other data stores. Some are structured, some are not. 

But even when enterprises overcome the data retrieval problem, integration is a big challenge. Agents rely on APIs and automation hooks to interact with applications, but many enterprise systems were designed long before this kind of autonomous interaction was a reality, Gogia pointed out. 

This can result in incomplete or inconsistent APIs, and systems can respond unpredictably when accessed programmatically. Organizations also run into snags when they attempt to automate processes that were never formally defined, Gogia said. 

“Many business workflows depend on tacit knowledge,” he said. That is, employees know how to resolve exceptions they’ve seen before without explicit instructions — but those missing rules and instructions become startlingly obvious when workflows are translated into automation logic.

The tuning loop

Creatio deploys agents in a “bounded scope with clear guardrails,” followed by an “explicit” tuning and validation phase, Kawasaki explained. Teams review initial outcomes, adjust as needed, then re-test until they’ve reached an acceptable level of accuracy. 

That loop typically follows this pattern: 

  • Design-time tuning (before go-live): Performance is improved through prompt engineering, context wrapping, role definitions, workflow design, and grounding in data and documents. 

  • Human-in-the-loop correction (during execution): Devs approve, edit, or resolve exceptions. Where humans intervene most often (escalations or approvals), users establish stronger rules, provide more context, and update workflow steps, or narrow tool access. 

  • Ongoing optimization (after go-live): Devs continue to monitor exception rates and outcomes, then tune repeatedly as needed, helping to improve accuracy and autonomy over time. 
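The post-go-live portion of that loop can be sketched as a simple exception-rate monitor. The class and threshold below are hypothetical, not Creatio's API; they only illustrate treating escalation rate as the signal that triggers another tuning pass:

```python
from dataclasses import dataclass, field

@dataclass
class AgentMonitor:
    """Toy exception-rate tracker mirroring a tune-and-retest loop.
    Names and thresholds are illustrative, not any vendor's API."""
    escalation_threshold: float = 0.20   # tune when >20% of tasks escalate
    outcomes: list = field(default_factory=list)

    def record(self, escalated: bool):
        self.outcomes.append(escalated)

    def escalation_rate(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def needs_tuning(self) -> bool:
        # Ongoing optimization: flag the agent for another pass of
        # prompt/rule/tool-scope adjustments when escalations spike.
        return self.escalation_rate() > self.escalation_threshold

monitor = AgentMonitor()
for escalated in [False, False, True, False, True, True, False, False]:
    monitor.record(escalated)
print(monitor.escalation_rate(), monitor.needs_tuning())
```

Here 3 of 8 tasks escalated (a 0.375 rate), so the monitor flags the agent for tuning; after guardrail adjustments, the rate would be re-measured on fresh traffic.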

Kawasaki’s team applies retrieval-augmented generation to ground agents in enterprise knowledge bases, CRM data, and other proprietary sources. 
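A minimal sketch of that grounding step, with toy 3-dimensional embeddings standing in for a real embedding model and vector store (all names and data here are invented for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy knowledge base of (text, embedding) pairs. Real systems use a
# vector database and a learned embedding model.
KB = [
    ("Renewal policy: notify 60 days before expiry.", [0.9, 0.1, 0.0]),
    ("Referral bonus paid after first invoice.",      [0.1, 0.9, 0.0]),
    ("Escalate disputes above $10k to legal.",        [0.0, 0.2, 0.9]),
]

def ground_prompt(question, query_vec, k=1):
    """Minimal RAG step: retrieve the k most similar passages and
    prepend them so the agent answers from enterprise data."""
    ranked = sorted(KB, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    context = "\n".join(text for text, _ in ranked[:k])
    return f"Context:\n{context}\n\nQuestion: {question}"

prompt = ground_prompt("When do we notify about renewals?", [0.95, 0.05, 0.0])
print(prompt.splitlines()[1])  # the retrieved renewal-policy passage
```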

Once agents are deployed in the wild, they are monitored with a dashboard providing performance analytics, conversion insights, and auditability. Essentially, agents are treated like digital workers. They have their own management layer with dashboards and KPIs.

For instance, an onboarding agent is surfaced through a standard dashboard interface providing agent monitoring and telemetry. This is part of the platform layer — orchestration, governance, security, workflow execution, monitoring, and UI embedding — that sits “above the LLM,” Kawasaki said.

Users see a dashboard of agents in use and each of their processes, workflows, and executed results. They can “drill down” into an individual record (like a referral or renewal) that shows a step-by-step execution log and related communications to support traceability, debugging, and agent tweaking. The most common adjustments involve logic and incentives, business rules, prompt context, and tool access, Kawasaki said. 

The biggest issues that come up post-deployment: 

  • Exception handling volume can be high: Early spikes in edge cases often occur until guardrails and workflows are tuned. 

  • Data quality and completeness: Missing or inconsistent fields and documents can cause escalations; teams can identify which data to prioritize for grounding and which checks to automate.

  • Auditability and trust: Regulated customers, particularly, require clear logs, approvals, role-based access control (RBAC), and audit trails.

“We always explain that you have to allocate time to train agents,” Creatio’s CEO Katherine Kostereva told VentureBeat. “It doesn’t happen immediately when you switch on the agent, it needs time to understand fully, then the number of mistakes will decrease.” 

“Data readiness” doesn’t always require an overhaul

When looking to deploy agents, “Is my data ready?” is a common early question. Enterprises know data access is important, but can be put off by a massive data consolidation project. 

But virtual connections can give agents access to underlying systems and sidestep typical data lake/lakehouse/warehouse delays. Kawasaki’s team built a platform that integrates with data, and is now working on an approach that pulls data into a virtual object, processes it, and uses it like a standard object for UIs and workflows. This way, they don’t have to “persist or duplicate” large volumes of data in their database. 
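The virtual-object idea can be sketched as a thin read-through wrapper. Everything below is hypothetical (the names are not Creatio's API), but it shows the core move: fetch records from the source system on access instead of persisting them in the CRM database:

```python
class VirtualObject:
    """Hypothetical 'virtual object': records are fetched from the
    source system on access and never persisted in the CRM database."""
    def __init__(self, fetch_fn):
        self._fetch = fetch_fn   # callable hitting the underlying system's API
        self._cache = {}         # short-lived, in-memory only

    def get(self, record_id):
        if record_id not in self._cache:
            self._cache[record_id] = self._fetch(record_id)  # pull on demand
        return self._cache[record_id]

# Stand-in for a core-banking API call:
def fetch_transaction(record_id):
    return {"id": record_id, "amount": 125.00, "type": "wire"}

transactions = VirtualObject(fetch_transaction)
txn = transactions.get("TXN-001")
print(txn["amount"])  # read through; nothing is copied to the CRM store
```

The banking example in the article maps directly onto this: transaction volumes too large to replicate stay in the core system, while agents still read them through the virtual layer.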

This technique can be helpful in areas like banking, where transaction volumes are simply too large to copy into CRM, but are “still valuable for AI analysis and triggers,” Kawasaki said.

Once integrations and virtual objects are established, teams can evaluate data completeness, consistency, and availability, and identify low-friction starting points (like document-heavy or unstructured workflows). 

Kawasaki emphasized the importance of “really using the data in the underlying systems, which tends to actually be the cleanest or the source of truth anyway.” 

Matching agents to the work

The best fit for autonomous (or near-autonomous) agents are high-volume workflows with “clear structure and controllable risk,” Kawasaki said. For instance, document intake and validation in onboarding or loan preparation, or standardized outreach like renewals and referrals.

“Especially when you can link them to very specific processes inside an industry — that’s where you can really measure and deliver hard ROI,” he said. 

For instance, financial institutions are often siloed by nature. Commercial lending teams perform in their own environment, wealth management in another. But an autonomous agent can look across departments and separate data stores to identify, for instance, commercial customers who might be good candidates for wealth management or advisory services.

“You think it would be an obvious opportunity, but no one is looking across all the silos,” Kawasaki said. Some banks that have applied agents to this very scenario have seen “benefits of millions of dollars of incremental revenue,” he claimed, without naming specific institutions. 

However, in other cases — particularly in regulated industries — longer-context agents are not only preferable but necessary: think of multi-step tasks like gathering evidence across systems, summarizing, comparing, drafting communications, and producing auditable rationales.

“The agent isn’t giving you a response immediately,” Kawasaki said. “It may take hours, days, to complete full end-to-end tasks.” 

This requires orchestrated agentic execution rather than a “single giant prompt,” he said. This approach breaks work down into deterministic steps to be performed by sub-agents. Memory and context management can be maintained across various steps and time intervals. Grounding with RAG can help keep outputs tied to approved sources, and users have the ability to dictate expansion to file shares and other document repositories.
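A toy version of that orchestration pattern, with each sub-agent as a deterministic step reading and extending a shared context (all step names and data are invented for illustration):

```python
def gather_evidence(ctx):
    ctx["evidence"] = ["doc-A", "doc-B"]      # stand-in for cross-system search
    return ctx

def summarize(ctx):
    ctx["summary"] = f"{len(ctx['evidence'])} sources reviewed"
    return ctx

def draft_rationale(ctx):
    ctx["rationale"] = f"Decision based on: {ctx['summary']}"
    return ctx

def run_orchestration(steps, ctx):
    """Deterministic pipeline: each sub-agent step reads and extends a
    shared context dict, so memory persists across steps (and, in
    practice, across hours or days of execution)."""
    for step in steps:
        ctx = step(ctx)
        ctx.setdefault("log", []).append(step.__name__)  # audit trail per step
    return ctx

result = run_orchestration(
    [gather_evidence, summarize, draft_rationale], {"case": 42}
)
print(result["log"])  # ordered record of which steps ran
```

Because every step is named and logged, the final context doubles as the auditable rationale the article describes, rather than a single opaque LLM response.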

This model typically doesn’t require custom retraining or a new foundation model. Whatever model enterprises use (GPT, Claude, Gemini), performance improves through prompts, role definitions, controlled tools, workflows, and data grounding, Kawasaki said. 

The feedback loop puts “extra emphasis” on intermediate checkpoints, he said. Humans review intermediate artifacts (such as summaries, extracted facts, or draft recommendations) and correct errors. Those can then be converted into better rules and retrieval sources, narrower tool scopes, and improved templates. 

“What is important for this style of autonomous agent, is you mix the best of both worlds: The dynamic reasoning of AI, with the control and power of true orchestration,” Kawasaki said.

Ultimately, agents require coordinated changes across enterprise architecture, new orchestration frameworks, and explicit access controls, Gogia said. Agents must be assigned identities to restrict their privileges and keep them within bounds. Observability is critical; monitoring tools can record task completion rates, escalation events, system interactions, and error patterns. This kind of evaluation must be a permanent practice, and agents should be tested to see how they react when encountering new scenarios and unusual inputs. 

“The moment an AI system can take action, enterprises have to answer several questions that rarely appear during copilot deployments,” Gogia said. Such as: What systems is the agent allowed to access? What types of actions can it perform without approval? Which activities must always require a human decision? How will every action be recorded and reviewed?
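Those questions translate naturally into a policy table keyed by agent identity. The sketch below is a hypothetical illustration, not any vendor's API: each identity gets an explicit allow list, a review list routed to humans, and a default deny:

```python
# Hypothetical policy table: which actions an agent identity may take
# autonomously, and which always require a human decision.
POLICY = {
    "renewal-agent": {
        "auto":   {"read_crm", "draft_email"},
        "review": {"send_email", "update_record"},
    },
}

def authorize(agent_id: str, action: str) -> str:
    """Return 'allow', 'needs_human', or 'deny' for a requested action."""
    rules = POLICY.get(agent_id)
    if rules is None:
        return "deny"                # unknown identity: no access at all
    if action in rules["auto"]:
        return "allow"
    if action in rules["review"]:
        return "needs_human"         # route to an approval queue
    return "deny"                    # not in scope for this agent

print(authorize("renewal-agent", "read_crm"))    # allow
print(authorize("renewal-agent", "send_email"))  # needs_human
print(authorize("intern-agent", "read_crm"))     # deny
```

Logging every `authorize` decision alongside the action itself would give the recorded, reviewable trail Gogia calls for.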

“Those [enterprises] that underestimate the challenge often find themselves stuck in demonstrations that look impressive but cannot survive real operational complexity,” Gogia said.