Nvidia says it can shrink LLM memory 20x without changing model weights

Nvidia researchers have introduced a new technique that dramatically reduces how much memory large language models need to track conversation history — by as much as 20x — without modifying the model itself. The method, called KV Cache Transform Coding (KVTC), applies ideas from media compression formats like JPEG to shrink the key-value cache behind multi-turn AI systems, lowering GPU memory demands and speeding up time-to-first-token by up to 8x.

For enterprise AI applications that rely on agents and long contexts, this translates to reduced GPU memory costs, better prompt reuse, and up to an 8x reduction in latency by avoiding the need to recompute offloaded KV cache values.

Serving large language models at scale requires managing a massive amount of data, especially for multi-turn conversations and long coding sessions. Every time a user adds to a prompt, the system relies on stored memory to avoid recomputing the entire conversation history from scratch.

However, this memory footprint grows rapidly, creating a severe bottleneck for latency and infrastructure costs.

Why KV cache becomes a bottleneck at scale

To power multi-turn AI applications like coding assistants or chat apps, large language models rely on a mechanism known as the key-value (KV) cache. This cache stores the hidden numerical representations for every previous token in a conversation. Because the model remembers the past conversation, it does not have to redundantly re-process the entire chat history each time the user submits a new prompt.

However, for AI applications with long context tasks, this cache can easily balloon to multiple gigabytes. As models scale up and generate increasingly long reasoning chains, the KV cache becomes a critical bottleneck for system throughput and latency.

This creates a difficult challenge for production environments. Because LLMs are highly memory-bound during inference, serving multiple users simultaneously is constrained by GPU memory exhaustion rather than computation time. “Effective KV cache management becomes critical, as idle caches must be quickly offloaded from GPU memory to accommodate other users, and quickly restored for resumed conversations,” Adrian Lancucki, Senior Deep Learning Engineer at Nvidia, told VentureBeat. “These infrastructure costs are now reflected in commercial pricing (e.g., as ‘prompt caching’) with additional charges for caching.” 

Even compromise solutions, like offloading the cache to lower-tier storage like CPU memory or SSDs, introduce significant data transfer overheads that can saturate network bandwidth and create bottlenecks.

One common solution is to compress the KV cache so that it takes up less memory. However, existing solutions often fall short of solving the problem holistically. Tools designed to compress caches for network transmission achieve low compression rates. Other compression methods require resource-intensive calculations on the fly for every single user prompt. Meanwhile, popular techniques like quantization or sparsification can introduce latency and accuracy drops or require making permanent changes to the model’s weights, which limits their practicality.

In their paper, the Nvidia researchers note that existing approaches “seldom exploit the strong low-rank structure of KV tensors.” This means that despite its huge number of dimensions and gigabytes of size, the actual underlying information in the KV cache is highly correlated and can be accurately represented using far fewer variables. Exploiting this characteristic is what KVTC focuses on.

Borrowing tricks from media codecs

At a high level, KVTC tackles the AI memory bottleneck by borrowing a proven concept from classical media: transform coding, the methodology that powers familiar image and video compression formats like JPEG. The framework shrinks the cache footprint through a fast, multi-step process that executes between inference phases to avoid slowing down the actual token generation. “This ‘media compression’ approach is advantageous for enterprise deployment because it is non-intrusive: it requires no changes to model weights or code and operates close to the transportation layer,” Lancucki said.

First, KVTC uses principal component analysis (PCA) to align the features of the KV cache data based on their importance. PCA is a statistical technique often used in machine learning to make models more efficient by isolating the most critical features of the data and stripping away redundancies. This part of the process is performed only once during an initial calibration phase for each model. Because the PCA alignment matrix is computed offline and reused, it does not slow down the compression process at inference time for individual user prompts.

Next, the system uses a dynamic programming algorithm to automatically budget how much memory each specific data dimension actually needs. The most critical principal components get high precision, while the trailing, less important components receive fewer bits or are assigned zero bits and dropped entirely.

Finally, the pipeline takes this optimized, quantized data and packs it into a byte array, running it through an entropy coder called DEFLATE. Because this step is executed in parallel directly on the GPU using Nvidia’s nvCOMP library, it operates at very high speeds.
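The three steps can be made concrete with a minimal, CPU-only sketch. This is not Nvidia's implementation: Python's `zlib` stands in for the GPU-side nvCOMP DEFLATE coder, a fixed 8-bit quantizer with a hard component cutoff stands in for the paper's dynamic-programming bit allocation, and synthetic low-rank data stands in for a real KV cache.

```python
import zlib

import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for KV data: 512 tokens x 64 features with rank ~8,
# mimicking the "strong low-rank structure" the paper exploits.
W = rng.normal(size=(8, 64))                      # shared feature mixing
calib = rng.normal(size=(512, 8)) @ W             # offline calibration data

# --- Step 0 (one-time calibration): PCA basis from the calibration set ---
mean = calib.mean(axis=0)
_, _, basis = np.linalg.svd(calib - mean, full_matrices=False)

def compress(kv, n_keep=8):
    """Align features with PCA, quantize the leading components to 8 bits,
    drop the trailing ones (0 bits), then entropy-code with DEFLATE."""
    coeffs = (kv - mean) @ basis.T                # step 1: PCA alignment
    kept = coeffs[:, :n_keep]                     # step 2: crude bit budget
    scale = float(np.abs(kept).max() / 127) or 1.0
    q = np.round(kept / scale).astype(np.int8)
    return zlib.compress(q.tobytes()), scale      # step 3: DEFLATE pack

def decompress(blob, scale, n_tokens, n_keep=8):
    """Run the pipeline in reverse to restore an approximate KV cache."""
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int8)
    coeffs = np.zeros((n_tokens, basis.shape[0]))
    coeffs[:, :n_keep] = q.reshape(n_tokens, n_keep).astype(np.float64) * scale
    return coeffs @ basis + mean                  # invert the PCA alignment

kv = rng.normal(size=(512, 8)) @ W                # a new "conversation"
blob, scale = compress(kv)
restored = decompress(blob, scale, n_tokens=512)
ratio = kv.nbytes / len(blob)                     # compression ratio
err = float(np.abs(kv - restored).max())          # worst-case error
```

Because the synthetic data really is low-rank, dropping the trailing components costs almost nothing here; real KV tensors are only approximately low-rank, which is why the actual system budgets bits per component instead of using a hard cutoff.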

To decompress the data when the user returns, KVTC simply performs the computations in reverse. To speed up the process, it performs the heavy lifting of decompression in chunks, layer-by-layer. This allows the AI model to begin computing the next response early using the first decompressed chunk while the subsequent chunks are being decompressed in the background.
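The chunked overlap can be illustrated with a background thread: while the consumer works on one restored chunk, the next is already being inflated. This is an analogy only, with `zlib` and a thread pool standing in for GPU-side decompression, and `consume` standing in for the model attending over a restored layer.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Six per-layer "KV" blobs, each compressed independently so they can be
# restored chunk by chunk.
layers = [bytes([i]) * 100_000 for i in range(6)]
blobs = [zlib.compress(layer) for layer in layers]

def consume(layer_bytes: bytes) -> int:
    """Stand-in for the model computing over one restored layer."""
    return len(layer_bytes)

processed = []
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(zlib.decompress, blobs[0])   # prefetch first chunk
    for nxt in blobs[1:] + [None]:
        restored = future.result()                    # current chunk is ready
        if nxt is not None:
            # Kick off the next chunk's decompression in the background...
            future = pool.submit(zlib.decompress, nxt)
        # ...while "the model" works on the chunk that just arrived.
        processed.append(consume(restored))
```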

20x compression, less than 1% accuracy penalty

Nvidia researchers tested KVTC on a diverse roster of models ranging from 1.5B to 70B parameters, including the Llama 3 family, Mistral NeMo, and the reasoning-heavy R1-distilled Qwen 2.5 models. They evaluated these models on a variety of benchmarks, including complex math and coding challenges like MATH-500 and LiveCodeBench, as well as intensive long-context retrieval tasks like “Needle In A Haystack” and key-value retrieval.

They pitted KVTC against several popular baselines: token eviction methods (e.g., H2O and TOVA), heavy quantization techniques (e.g., KIVI and GEAR), and xKV (a prompt compression technique based on singular value decomposition).

At an effective 20x compression ratio, KVTC consistently kept accuracy within one percentage point of the original, uncompressed models across most tasks. When the researchers pushed the system to extreme ratios of up to 32x and 64x, KVTC held its ground remarkably well.

By contrast, popular baselines like KIVI and GEAR began to suffer massive accuracy degradation at just a 5x compression ratio, particularly on long-context tasks. Standard cache eviction methods like H2O and TOVA proved entirely inadequate as generic compressors, effectively breaking down when asked to retrieve deep contextual information.

Consider the deployment of a smaller reasoning model like Qwen 2.5 1.5B for a coding assistant. Normally, this model requires 29 KB of memory for every single token. Using an 8x compression setting, KVTC shrank that footprint to roughly 3.2 KB per token, while suffering a negligible 0.3 percentage point drop in coding accuracy. 

For enterprise architects, deciding when to deploy this technique depends heavily on the use case. “KVTC is optimized for long-context, multi-turn scenarios,” Lancucki said. He pointed to coding assistants, iterative agentic reasoning workflows — particularly when waiting for high-latency tool outputs — and iterative RAG as ideal applications. “However, the users should skip KVTC for short conversations,” he added, because the uncompressed sliding window of the newest tokens dominates the sequence in shorter interactions, preventing meaningful compression ratios.

KVTC is highly portable, and an optimized implementation will soon be integrated into the KV Block Manager (KVBM) within the Dynamo framework, making it compatible with popular open-source inference engines like vLLM.

Most importantly for user experience, KVTC considerably reduces the time to first token (TTFT), the delay between sending a prompt and the model generating the first response token. On an 8,000-token prompt, a vanilla 12B model running on an Nvidia H100 GPU takes roughly 3 seconds to recompute the history from scratch. Meanwhile, a system can decompress the KVTC cache in just 380 milliseconds, delivering up to an 8x reduction in TTFT.

Because KVTC does not alter how the model pays attention to tokens, it is theoretically compatible with token eviction methods like Dynamic Memory Sparsification (DMS), another advanced compression technique. DMS is an autoregressive token eviction method that optimizes memory by identifying and dropping the least important tokens from the context window entirely. 

“In principle, KVTC is complementary to DMS,” Lancucki stated. “While DMS evicts individual tokens along the time axis, KVTC compresses the data at each position separately.” However, he cautioned that while they target different dimensions, “it remains to be tested what compression ratios can be achieved with KVTC on sparsified caches.”

As models continue to scale natively to multi-million token context windows, the need for robust memory management will only grow. “Given the structural similarities and recurring patterns in KV caches across various model architectures, the emergence of a dedicated, standardized compression layer is probable,” Lancucki said. Supported by hardware advancements, AI infrastructure could soon treat KV cache compression as an invisible, standardized layer, much like video compression is to streaming today.


How LinkedIn replaced five feed retrieval systems with one LLM, at 1.3 billion-user scale

LinkedIn’s feed reaches more than 1.3 billion members — and the architecture behind it hadn’t kept pace. The system had accumulated five separate retrieval pipelines, each with its own infrastructure and optimization logic, serving different slices of what users might want to see. Engineers at the company spent the last year tearing that apart and replacing it with a single LLM-based system. The result, LinkedIn says, is a feed that understands professional context more precisely and costs less to run at scale.

The redesign touched three layers of the stack: how content is retrieved, how it’s ranked, and how the underlying compute is managed. Tim Jurka, vice president of engineering at LinkedIn, told VentureBeat the team ran hundreds of tests over the past year before reaching a milestone that, he says, reinvented a large chunk of its infrastructure.

“Starting from our entire system for retrieving content, we’ve moved over to using really large-scale LLMs to understand content much more richly on LinkedIn and be able to match it much in a much more personalized way to members,” Jurka said. “All the way to how we rank content, using really, really large sequence models, generative recommenders, and combining that end-to-end system to make things much more relevant and meaningful for members.”

One feed, 1.3 billion members

The core challenge, Jurka said, is two-sided: LinkedIn has to match members’ stated professional interests — their title, skills, industry — to their actual behavior over time, and it has to surface content that goes beyond what their immediate network is posting. Those two signals frequently pull in different directions.

People use LinkedIn in different ways: some look to connect with others in their industry, others prioritize thought leadership, job seekers hunt for roles, and recruiters use it to find candidates.

How LinkedIn unified five pipelines into one

LinkedIn has spent more than 15 years building AI-driven recommendation systems, including prior work on job search and people search. LinkedIn’s feed, the one that greets you when you open the website, was built on a heterogeneous architecture, the company said in a blog post. Content fed to users came from various sources, including a chronological index of a user’s network, geographic trending topics, interest-based filtering, industry-specific content, and other embedding-based systems.

The company said this method meant each source had its own infrastructure and optimization strategy. But while it worked, maintenance costs soared. Jurka said using LLMs to scale out its new recommendation algorithm also meant updating the architecture surrounding the feed.

“There’s a lot that goes into that, including how we maintain that kind of member context in a prompt, making sure we provide the right data to hydrate the model, profile data, recent activity data, etc,” he said. “The second is how you actually sample the most meaningful kind of data points to then fine-tune the LLM.”

LinkedIn tested different iterations of the data mix in an offline testing environment. 

One of LinkedIn’s first hurdles in revamping its retrieval system revolved around converting its data into text for LLMs to process. To do this, LinkedIn built a prompt library that lets engineers create templated sequences. For posts, LinkedIn focused on format, author information, engagement counts, article metadata, and the post’s text. For members, they incorporated profile data, skills, work history, education and “a chronologically ordered sequence of posts they’ve previously engaged with.”
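A templating step like the one described might look like the following sketch. The field names and layout are illustrative assumptions, not LinkedIn's actual prompt library.

```python
POST_TEMPLATE = (
    "format: {fmt}\n"
    "author: {author} ({author_title})\n"
    "engagement: {engagement}\n"
    "text: {text}"
)

def render_post(post: dict) -> str:
    """Flatten one post's structured fields into templated text."""
    return POST_TEMPLATE.format(**post)

def render_member(profile: dict, history: list[dict]) -> str:
    """Profile fields plus a chronological list of engaged posts."""
    lines = [
        f"title: {profile['title']}",
        f"skills: {', '.join(profile['skills'])}",
        "recently engaged with:",
    ]
    lines += [f"- {post['text']}" for post in history]  # oldest to newest
    return "\n".join(lines)

post_text = render_post({
    "fmt": "article", "author": "Ada", "author_title": "ML Engineer",
    "engagement": "high", "text": "Scaling feeds with LLMs",
})
prompt = render_member(
    {"title": "Data Engineer", "skills": ["Spark", "SQL"]},
    [{"text": "Why feature stores matter"}, {"text": "Lakehouse 101"}],
)
```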

One of the most consequential findings from that testing phase involved how LLMs handle numbers. When a post had, say, 12,345 views, that figure appeared in the prompt as “views:12345,” and the model treated it like any other text token, stripping it of its significance as a popularity signal. To fix this, the team broke engagement counts into percentile buckets and wrapped them in special tokens, so the model could distinguish them from unstructured text. The intervention meaningfully improved how the system weighs post reach.
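The bucketing fix can be sketched in a few lines; the cut points, labels, and token format here are invented for illustration, since LinkedIn's actual percentile edges and special tokens are not public.

```python
import bisect

# Bucket edges estimated offline from a corpus of view counts
# (the numbers and labels here are invented for illustration).
CUTS = [10, 100, 1_000, 10_000, 100_000]
BUCKETS = ["p0_20", "p20_40", "p40_60", "p60_80", "p80_95", "p95_100"]

def bucketize(views: int) -> str:
    """Map a raw count to a percentile bucket wrapped in special tokens,
    so the model sees one popularity signal instead of digit tokens."""
    return f"<views:{BUCKETS[bisect.bisect_right(CUTS, views)]}>"
```

With this scheme, a post with 12,345 views would be rendered as a single bucket token rather than the digit string `views:12345`.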

Teaching the feed to read professional history as a sequence

Of course, if LinkedIn wants its feed to feel more personal and its posts to reach the right audiences, it needs to reimagine how it ranks posts, too. Traditional ranking models, the company said, misunderstand how people engage with content: engagement isn’t random but follows patterns emerging from someone’s professional journey.

LinkedIn built a proprietary Generative Recommender (GR) model for its feed that treats interaction history as a sequence, or “a professional story told through the posts you’ve engaged with over time.”

“Instead of scoring each post in isolation, GR processes more than a thousand of your historical interactions to understand temporal patterns and long-term interests,” LinkedIn’s blog said. “As with retrieval, the ranking model relies on professional signals and engagement patterns, never demographic attributes, and is regularly audited for equitable treatment across our member base.”

The compute cost of running LLMs at LinkedIn’s scale

With a revitalized data pipeline and feed, LinkedIn faced another problem: GPU cost. 

LinkedIn invested heavily in new training infrastructure to reduce how much it leans on GPUs. The biggest architectural shift was disaggregating CPU-bound feature processing from GPU-heavy model inference — keeping each type of compute doing what it’s suited for rather than bottlenecking on GPU availability. The team also wrote custom C++ data loaders to cut the overhead that Python multiprocessing was adding, and built a custom Flash Attention variant to optimize attention computation during inference. Checkpointing was parallelized rather than serialized, which helped squeeze more out of available GPU memory.

“One of the things we had to engineer for was that we needed to use a lot more GPUs than we’d like to,” Jurka said. “Being very deliberate about how you coordinate between CPU and GPU workloads because the nice thing about these kinds of LLMs and prompt context that we use to generate embeddings is you can dynamically scale them.” 

For engineers building recommendation or retrieval systems, LinkedIn’s redesign offers a concrete case study in what replacing fragmented pipelines with a unified embedding model actually requires: rethinking how numerical signals are represented in prompts, separating CPU and GPU workloads deliberately, and building ranking models that treat user history as a sequence rather than a set of independent events. The lesson isn’t that LLMs solve feed problems — it’s that deploying them at scale forces you to solve a different class of problems than the ones you started with.

Y Combinator-backed Random Labs launches Slate V1, claiming the first ‘swarm-native’ coding agent

The software engineering world is currently wrestling with a fundamental paradox of the AI era: as models become more capable, the “systems problem” of managing them has become the primary bottleneck to real-world productivity. While a developer might have access to the raw intelligence of a frontier model, that intelligence often degrades the moment a task requires a long horizon or a deep context window.

But help appears to be on the way: San Francisco-based, Y Combinator-backed startup Random Labs has officially launched Slate V1, described as the industry’s first “swarm native” autonomous coding agent designed to execute massively parallel, complex engineering tasks.

Emerging from an open beta, the tool utilizes a “dynamic pruning algorithm” to maintain context in large codebases while scaling output to enterprise complexity. Co-founded by Kiran and Mihir Chintawar in 2024, the company aims to bridge the global engineering shortage by positioning Slate as a collaborative tool for the “next 20 million engineers” rather than a replacement for human developers.

With the release of Slate V1, the team at Random Labs is attempting to architect a way out of this bind by introducing the first “swarm-native” agentic coding environment. Slate is not merely a wrapper or a chatbot with file access; it is an implementation of a “hive mind” philosophy designed to scale agentic work with the complexity of a human organization.

By leveraging a novel architectural primitive called Thread Weaving, Slate moves beyond the rigid task trees and lossy compaction methods that have defined the first generation of AI coding assistants.

Strategy: Action space

At the heart of Slate’s effectiveness is a deep engagement with Recursive Language Models (RLMs).

In a traditional setup, an agent might be asked to “fix a bug,” a prompt that forces the model to juggle high-level strategy and low-level execution simultaneously.

Random Labs identifies this as a failure to tap into “Knowledge Overhang”—the latent intelligence a model possesses but cannot effectively access when it is tactically overwhelmed.

Slate solves this by using a central orchestration thread that essentially “programs in action space”. This orchestrator doesn’t write the code directly; instead, it uses a TypeScript-based DSL to dispatch parallel worker threads to handle specific, bounded tasks.

This creates a clear separation between the “kernel”—which manages the execution graph and maintains strategic alignment—and the worker “processes” that execute tactical operations in the terminal.

By mapping onto an OS-style framework, inspired by Andrej Karpathy’s “LLM OS” concept, Slate is able to treat the limited context window of a model as precious RAM, actively, intelligently managing what is retained and what is discarded.

Episodic memory and the swarm

The true innovation of the “Thread Weaving” approach lies in how it handles memory. Most agents today rely on “compaction,” which is often just a fancy term for lossy compression that risks dropping critical project state. Slate instead generates “episodes”.

When a worker thread completes a task, it doesn’t return a sprawling transcript of every failed attempt; it returns a compressed summary of the successful tool calls and conclusions.

Because these episodes share context directly with the orchestrator rather than relying on brittle message passing, the system maintains a “swarm” intelligence.
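In rough Python, the episode idea might look like the sketch below. The names and data structures are hypothetical; by Random Labs' description, the real orchestrator uses a TypeScript-based DSL and parallel worker threads.

```python
def worker(task: str) -> dict:
    """A worker runs many tool calls, including dead ends, but returns
    only a compressed 'episode' of what actually succeeded."""
    transcript = [
        {"tool": "grep", "args": "TODO", "ok": False},   # failed attempt
        {"tool": "grep", "args": "FIXME", "ok": True},
        {"tool": "edit", "args": "src/main.py", "ok": True},
    ]
    return {
        "task": task,
        "tool_calls": [t for t in transcript if t["ok"]],  # drop failures
        "conclusion": "patched src/main.py",
    }

def orchestrator(tasks: list[str]) -> list[dict]:
    """The 'kernel' dispatches bounded tasks and folds the episodes back
    into its own context, never the raw transcripts."""
    return [worker(task) for task in tasks]

episodes = orchestrator(["fix bug"])
```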

This architecture allows for massive parallelism. A developer can have Claude Sonnet orchestrating a complex refactor while GPT-5.4 executes code, and GLM 5—a favorite for its agentic search capabilities—simultaneously researches library documentation in the background. It’s a similar approach to the one taken by Perplexity with its new Computer multi-model agent.

By selecting the “right model for the job,” Slate ensures that users aren’t overspending on intelligence for simple tactical steps while still benefiting from the strategic depth of the world’s most powerful models.

The business of autonomy

From a commercial perspective, Random Labs is navigating the early beta period with a mix of transparency and strategic ambiguity.

While the company has not yet published a fixed-price subscription sheet, the Slate CLI documentation confirms a shift toward a usage-based credit model.

Commands like /usage and /billing allow users to monitor their credit burn in real-time, and the inclusion of organization-level billing toggles suggests a clear focus on professional engineering teams rather than solo hobbyists.

There is also a significant play toward integration. Random Labs recently announced that direct support for OpenAI’s Codex and Anthropic’s Claude Code is slated for release next week.

This suggests that Slate isn’t trying to compete with these models’ native interfaces, but rather to act as the superior orchestration layer that allows engineers to use all of them at once, safely and cost-effectively.


Architecturally, the system is designed to maximize caching through subthread reuse, a “novel context engineering” trick that the team claims keeps the swarm approach from becoming a financial burden for users.

Stability

Perhaps the most compelling argument for the Slate architecture is its stability. In internal testing, an early version of this threading system managed to pass 2/3 of the tests on the make-mips-interpreter task within the Terminal Bench 2.0 suite.

This is a task where even the newest frontier models, like Opus 4.6, often succeed less than 20% of the time when used in standard, non-orchestrated harnesses.

This success in a “mutated” or changing environment is what separates a tool from a partner. According to Random Labs’ documentation, one fintech founder in NYC described Slate as their “best debugging tool,” a sentiment that echoes the broader goal of Random Labs: to build agents that don’t just complete a prompt, but scale like an organization.

As the industry moves past simple “chat with your code” interfaces, the “Thread Weaving” of Slate V1 offers a glimpse into a future where the primary role of the human engineer is to direct a hive mind of specialized models, each working in concert to solve the long-horizon problems of modern software.

Anthropic gives Claude shared context across Microsoft Excel and PowerPoint, enabling reusable workflows in multiple applications

Anthropic has upgraded its Claude AI model with new capabilities for Microsoft Excel and PowerPoint, marking a strategic move to expand its enterprise footprint and potentially challenging Microsoft’s newly launched Copilot Cowork — which Claude also partially powers. 

The updated add-ins are available to Mac and Windows users on paid Claude plans starting today, March 11.

Anthropic is also expanding how enterprises can deploy the tools.

Claude for Excel and Claude for PowerPoint can now be accessed either through a Claude account or through an existing LLM gateway routing to Claude models on Amazon Bedrock, Google Cloud Vertex AI or Microsoft Foundry.

That gives enterprises more flexibility to use the add-ins within cloud and compliance setups they may already have in place.

Shared context across Office apps

Starting March 11, paid Claude users on Mac and Windows can access a new beta experience in which Claude for Excel and Claude for PowerPoint share the full context of a user’s conversation with the AI model between the two applications, with no need to manually copy and paste it over.

That means Claude can carry information, instructions and task history between an open spreadsheet and an open presentation in a single continuous session.

For example, Claude can write formulas to extract data from an Excel workbook and immediately apply it to a stylized PowerPoint slide in the same session.

“In practice: a financial analyst can ask Claude to pull comparable company financials from an open workbook, build out a trading comps table in Excel, drop the valuation summary into the pitch deck, and draft the email to the MD—without switching tabs or re-explaining the dataset at each step,” Anthropic said in a press release. 

This builds on Anthropic’s release of a Claude plugin for Excel back in October 2025.

Repeatable workflows inside applications 

A central feature of this launch is Skills, which allows teams to build and save repeatable workflows directly inside the Excel and PowerPoint sidebars.

Rather than re-uploading references or re-prompting instructions, users can save standardized processes—such as specific variance analyses or approved slide templates—as one-click actions available to the entire organization.

That could include workflows for recurring financial analysis, preparing presentations in a preferred house style or running common review steps that would otherwise need to be rewritten as prompts each time.

Anthropic said every Skill, whether personal or organization-wide, will work inside the add-ins the same way MCP connectors do.

“Workflows that previously lived in one person’s head become one-click actions available to the whole organization,” the company said. 

Anthropic distinguishes these Skills from Instructions, which let users set persistent preferences across the add-ins, such as preferred number formatting in Excel or presentation-writing rules in PowerPoint.

Anthropic is also shipping a preloaded starter set of Skills, including:

  • Excel: Auditing models for formula errors, populating DCF and LBO templates, and cleaning messy data ranges.

  • PowerPoint: Building competitive landscape decks and reviewing investment banking materials for narrative alignment.

Similarly, Microsoft’s new Copilot Cowork capability introduced on Monday enables enterprise users to deploy agents to complete tasks across Microsoft applications such as Excel and PowerPoint. 

The software giant openly stated that Copilot Cowork was built in conjunction with Anthropic. Anthropic, for its part, released its own stand-alone Claude Cowork application for Mac and Windows earlier this year, which lets Claude access, edit, create and move information between files on a user’s computer autonomously, at the user’s direction.

Previously, even with autonomous tools like the standalone Claude Cowork app, users often had to ask the AI to complete tasks in separate steps for each application. Now, Claude maintains a continuous session that reads live data and writes formulas across both apps simultaneously.

Battle of the enterprise app agents

Ever since the launch of Claude Cowork earlier this year, Anthropic has been making a case to be the chat and productivity platform of choice for enterprises.

Competitors like Google, through its Google Workspace suite that includes Gmail and Google Docs, and Microsoft, with its continued leadership in the Office suite, can bring AI capabilities directly into users’ workflows.

Anthropic did not present the new Skills feature as equivalent to the more autonomous, agentic behavior Microsoft is now emphasizing with its own Copilot Cowork.

But the release does show Anthropic steadily expanding beyond chatbot use cases and into more structured, repeatable work inside the applications many business users already rely on.

Anthropic, through Claude Cowork, Claude Code and the Claude model family, has worked its way into many organizations’ systems, leveraging strong performance on coding benchmarks and general knowledge to navigate a computer and complete knowledge work rapidly, at scale, and with high quality.

OpenClaw, the open source AI agent that has taken the developer world by storm, owes much of its existence to Claude Code. 

The result is another sign that the battle over enterprise AI is no longer just about which model performs best on benchmarks. It is increasingly about what AI tools and systems enterprises trust to get real work done across their existing applications, files, and workflows.

Google finds that AI agents learn to cooperate when trained against unpredictable opponents

Training standard AI models against a diverse pool of opponents — rather than building complex hardcoded coordination rules — is enough to produce cooperative multi-agent systems that adapt to each other on the fly. That’s the finding from Google’s Paradigms of Intelligence team, which argues the approach offers a scalable and computationally efficient blueprint for enterprise multi-agent deployments without requiring specialized scaffolding.

The technique works by training an LLM agent via decentralized reinforcement learning against a mixed pool of opponents — some actively learning, some static and rule-based. Instead of hardcoded rules, the agent uses in-context learning to read each interaction and adapt its behavior in real time.

Why multi-agent systems keep fighting each other

The AI landscape is rapidly shifting away from isolated systems toward a fleet of agents that must negotiate, collaborate, and operate in shared spaces simultaneously. In multi-agent systems, the success of a task depends on the interactions and behaviors of multiple entities as opposed to a single agent.

The central friction in these multi-agent systems is that their interactions frequently involve competing goals. Because these autonomous agents are designed to maximize their own specific metrics, ensuring they don’t actively undermine one another in these mixed-motive scenarios is incredibly difficult.

Multi-agent reinforcement learning (MARL) tries to address this problem by training multiple AI agents operating, interacting, and learning in the same shared environment at the same time. However, in real-world enterprise architectures, a single, centralized system rarely has visibility over or controls every moving part. Developers must rely on decentralized MARL, where individual agents must figure out how to interact with others while only having access to their own limited, local data and observations.

One of the main problems with decentralized MARL is that the agents frequently get stuck in suboptimal states as they try to maximize their own specific rewards. The researchers refer to it as “mutual defection,” based on the Prisoner’s Dilemma puzzle used in game theory. For example, think of two automated pricing algorithms locked in a destructive race to the bottom. Because each agent optimizes strictly for its own selfish reward, they arrive at a stalemate where the broader enterprise loses.

Another problem is that traditional training frameworks are designed for stationary environments, meaning the rules of the game and the behavior of the environment are relatively fixed. In a multi-agent system, from the perspective of any single agent, the environment is fundamentally unpredictable and constantly shifting because the other agents are simultaneously learning and adapting their own policies.

While enterprise developers currently rely on frameworks that use rigid state machines, these methods often hit a scalability wall in complex deployments.

“The primary limitation of hardcoded orchestration is its lack of flexibility,” Alexander Meulemans, co-author of the paper and Senior Research Scientist on Google’s Paradigms of Intelligence team, told VentureBeat. “While rigid state machines function adequately in narrow domains, they can fail to scale as the scope and complexity of agent deployments broaden. Our in-context approach complements these existing frameworks by fostering adaptive social behaviors that are deeply embedded during the post-training phase.”

What this means for developers using LangGraph, CrewAI, or AutoGen

Frameworks like LangGraph require developers to explicitly define agents, state transitions, and routing logic as a graph. LangChain describes this approach as equivalent to a state machine, where agent nodes and their connections represent states and transition matrices. Google’s approach inverts that model: rather than hardcoding how agents should coordinate, it produces cooperative behavior through training, leaving the agents to infer coordination rules from context.
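The contrast can be made concrete with a plain-Python state machine in the spirit of graph-based orchestration (a sketch of the pattern only, not the LangGraph API; node names are hypothetical):

```python
# Hardcoded orchestration as a state machine: every route between
# agents is fixed by the developer in advance.

def research(state):
    state["notes"] = f"notes on {state['task']}"
    return "write"            # transition is hardcoded

def write(state):
    state["draft"] = f"draft from {state['notes']}"
    return "review"

def review(state):
    state["approved"] = True
    return None               # terminal state

NODES = {"research": research, "write": write, "review": review}

def run(task):
    state, node = {"task": task}, "research"
    while node is not None:   # walk the fixed transition graph
        node = NODES[node](state)
    return state

result = run("market sizing")
# The coordination logic lives entirely in developer-written transitions;
# Google's approach instead trains agents to infer coordination from context.
```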

The researchers show that developers can achieve advanced, cooperative multi-agent systems using the same standard sequence modeling and reinforcement learning techniques that already power today’s foundation models.

The team validated the concept using a new method called Predictive Policy Improvement (PPI), though Meulemans notes the underlying principle is model-agnostic.

“Rather than training a small set of agents with fixed roles, teams should implement a ‘mixed pool’ training routine,” Meulemans said. “Developers can reproduce these dynamics using standard, out-of-the-box reinforcement learning algorithms (such as GRPO).”

By having agents interact with diverse co-players (varying in system prompts, fine-tuned parameters, or underlying policies), teams create a robust learning environment. This produces strategies that remain resilient when interacting with new partners and ensures that multi-agent learning leads toward stable, long-term cooperative behaviors.
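A minimal sketch of such a mixed-pool routine (all names and the update hook are illustrative placeholders, not the paper’s PPI implementation):

```python
import random

def make_pool():
    """A diverse pool of co-players: static rule-based strategies plus,
    in a real setup, other actively learning agents."""
    always_cooperate = lambda history: "C"
    always_defect = lambda history: "D"
    tit_for_tat = lambda history: history[-1] if history else "C"
    return [always_cooperate, always_defect, tit_for_tat]

def train(agent_policy, update_fn, episodes=100, rounds=10):
    pool = make_pool()
    for _ in range(episodes):
        co_player = random.choice(pool)     # vary the partner every episode
        my_history, their_history = [], []
        trajectory = []
        for _ in range(rounds):
            # Decentralized: the agent sees only its local interaction history.
            my_move = agent_policy(their_history)
            their_move = co_player(my_history)
            trajectory.append((tuple(their_history), my_move))
            my_history.append(my_move)
            their_history.append(their_move)
        # A standard RL update (e.g. REINFORCE or GRPO) would consume the
        # trajectory and the episode's rewards here.
        update_fn(trajectory, my_history, their_history)
```

Randomizing the co-player each episode is what forces the policy to infer, from context alone, which kind of partner it is facing.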

How the researchers proved it works

To build agents that can successfully deduce a co-player’s strategy, the researchers created a decentralized training setup where the AI is pitted against a highly diverse, mixed pool of opponents composed of actively learning models and static, rule-based programs. This forced diversity requires the agent to dynamically figure out who it is interacting with and adapt its behavior on the fly, entirely from the context of the interaction.

For enterprise developers, the phrase “in-context learning” often triggers concerns about context window bloat, API costs, and latency, especially when windows are already packed with retrieval-augmented generation (RAG) data and system prompts. However, Meulemans clarifies that this technique focuses on efficiency rather than token count. “Our method focuses on optimizing how agents utilize their available context during post-training, rather than strictly demanding larger context windows,” he said. By training agents to parse their interaction history to infer strategies, they use their allocated context more adaptively without requiring longer context windows than existing applications.

Using the Iterated Prisoner’s Dilemma (IPD) as a benchmark, the researchers achieved robust, stable cooperation without any of the traditional crutches: no artificial separation between meta and inner learners, and no hardcoded assumptions about how the opponent’s algorithm functions. Because the agent adapts in real time while also updating its core foundation model weights across many interactions, it effectively occupies both roles simultaneously. In fact, the agents performed better when given no information about their adversaries and forced to adapt to their behavior through trial and error.
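The in-context adaptation the IPD benchmark rewards can be sketched with a hand-written stand-in policy (illustrative only; the paper’s agents learn this behavior rather than having it coded):

```python
# Iterated Prisoner's Dilemma: the agent infers the co-player's strategy
# purely from the interaction history, with no prior information about it.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def adaptive_agent(my_history, their_history):
    """Cooperate while the opponent reciprocates; punish persistent defectors.
    A hand-written stand-in for behavior a trained model infers in context."""
    if not their_history:
        return "C"                       # probe with cooperation
    defect_rate = their_history.count("D") / len(their_history)
    return "D" if defect_rate > 0.5 else "C"

def play(agent, opponent, rounds=20):
    my_hist, opp_hist, score = [], [], 0
    for _ in range(rounds):
        a = agent(my_hist, opp_hist)
        b = opponent(opp_hist, my_hist)  # opponent sees mirrored histories
        score += PAYOFF[(a, b)]
        my_hist.append(a)
        opp_hist.append(b)
    return score

tit_for_tat = lambda my_h, their_h: their_h[-1] if their_h else "C"
always_defect = lambda my_h, their_h: "D"

# Against tit-for-tat the agent locks into mutual cooperation (3 per round);
# against a pure defector it switches to defending itself after one round.
print(play(adaptive_agent, tit_for_tat))    # 60
print(play(adaptive_agent, always_defect))  # 19
```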

The developer’s role shifts from rule writer to architect

The researchers say that their work bridges the gap between multi-agent reinforcement learning and the training paradigms of modern foundation models. “Since foundation models naturally exhibit in-context learning and are trained on diverse tasks and behaviors, our findings suggest a scalable and computationally efficient path for the emergence of cooperative social behaviors using standard decentralized learning techniques,” they write.

As relying on in-context behavioral adaptation becomes the standard over hardcoding strict rules, the human element of AI engineering will fundamentally shift. “The AI application developer’s role may evolve from designing and managing individual interaction rules to designing and providing high-level architectural oversight for training environments,” Meulemans said. This transition elevates developers from writing narrow rulebooks to taking on a strategic role, defining the broad parameters that ensure agents learn to be helpful, safe, and collaborative in any situation.

Google upgrades Gemini for Workspace allowing it to pull data from multiple apps to create Docs, Sheets, Slides and more

Lest you think Microsoft would have all the fun introducing new AI features for white-collar enterprise work this week with its Copilot Cowork announcement yesterday, Google is here to take back the spotlight.

The search giant and, increasingly, AI leader today announced a sweeping series of updates to its Gemini AI models embedded into Google Workspace — the productivity suite of cloud-based apps including Drive, Docs, Sheets, Slides, and more. They’re being made available both to individual consumers and enterprises, though you’ll need an AI Pro ($20 per month) or higher subscription plan for the former, and your enterprise will need to be enrolled in the “Gemini Alpha” program and have the features switched on by an administrator.

The biggest news: it’s now possible to have Gemini automatically create these file types from a single text prompt and fill them out with information gathered from other files and apps throughout the user’s Google Workspace, including emails, chats, files, and the open web via Google Search.

By synthesizing information across these disparate apps and experiences, Gemini acts as an assistant capable of drafting, iterating, and perfecting complex, finished, professional-grade content in seconds, effectively ending the era of the manual “dig” for information.

The message is simple: the era of searching across multiple windows, tabs, files and folders for your information is over — Gemini will do it and put it all together for you in a nearly finished product, simply from a natural language prompt in plain English (or the language of your choosing).

And best of all for enterprise technical leaders: this functionality is now provided first-party by Google itself, shortcutting or eliminating much of the need to build their own orchestration systems — at least for organizations that keep most of their data in these Google applications and don’t wish to pursue that route.

Prompt to document, spreadsheet, slide deck and more

The rollout spans the entire Workspace suite, with specific features tailored to the unique demands of each application:

Google Docs: “Help me create”: The new “Help me create” experience allows users to generate fully formatted first drafts by simply describing their goal. Because Gemini can access Drive, Gmail, and Chat, a user can prompt: “Draft a newsletter using the meeting minutes from my January HOA meeting and the list of upcoming events”. The result is a contextualized document that includes smart chips and structured formatting, rather than a generic template.

Google Sheets gets a 9x speed boost: The most striking efficiency claim in this release involves “Fill with Gemini”. A 95-participant study conducted by Google found that using Gemini to auto-populate tables with categorized or summarized data was 9x faster than manual entry for 100-cell tasks. Users can now describe a goal—like optimizing a weekly schedule to maximize profit while balancing staff skills—and Gemini handles the multi-step construction from start to finish.

Google Slides gets narrative-first design: Slides is receiving updates that allow Gemini to act as a design collaborator. It can now turn rough brainstorm sketches into editable diagrams and generate slide layouts that balance visual weight and hierarchy while matching the theme of an existing deck. Google also teased an upcoming feature that will generate an entire presentation from a single prompt based on a reference document.

Google Drive: The Knowledge Base: Perhaps the most fundamental shift is in Google Drive, which is moving from “passive storage” to an “active knowledge base” that compiles data from multiple files and file types stored there and allows Gemini to access it and move it around as needed in creating and editing projects.

  • AI Overviews: Similar to Google Search, Drive will now provide a summarized answer with citations at the top of search results, removing the need to open multiple files to find a specific detail.

  • Ask Gemini in Drive: This allows for complex, cross-file queries, such as comparing multiple catering proposals or synthesizing months of research on a specific topic.

  • Projects: Users can now save curated lists of sources as “projects” to share with others, maintaining built-in security and compliance controls.

Not just Gemini — numerous Google AI models power this experience

While the user interface of the new Workspace updates is designed for simplicity, the backend architecture relies on a specialized ensemble of Google’s most advanced AI models.

These features are not powered by a single general-purpose engine but rather a suite of task-specific models developed by Google DeepMind and Google Research.

  • Gemini 3 Flash & Deep Think: The core text generation, summarization, and reasoning capabilities—such as “Help me create” in Docs and “AI Overviews” in Drive—are driven by the Gemini 3 family. Specifically, Gemini 3 Flash is utilized for high-speed summarization, while Gemini 3 Deep Think handles more complex reasoning tasks involving science, research, and engineering.

  • Google Research OR-Tools: To solve the “advanced optimization problems” in Sheets, such as complex employee scheduling or budget maximization, Google integrates its OR-Tools (Operations Research tools) alongside DeepMind’s logic models.

  • Nano Banana 2 (Gemini 3 Flash Image): The professional layouts and editable diagrams found in Slides are generated by Nano Banana 2, a state-of-the-art multi-image-to-image model. This model handles everything from text-to-image generation to complex style transfers, ensuring that new slides match a company’s existing brand aesthetics.

  • Veo & Lyria 3: For multimedia integration, Google utilizes Veo for high-fidelity video generation and Lyria 3 for professional-grade music and vocal arrangements, both of which include SynthID watermarking for AI identification.

Licensing and availability

Google is positioning these features as premium additions to its ecosystem. The new Gemini capabilities are rolling out in beta starting today.

Feature set, target audience, and availability:

  • Google AI Ultra & Pro: Individual power users; English (global for Docs, Sheets, Slides)

  • Gemini Alpha: Business/enterprise customers; English (global for Docs, Sheets, Slides)

  • Google Drive updates: U.S. customers (initial); English (U.S. only for now)

The Gemini Alpha program is a pre-release initiative that allows Google Workspace administrators to grant users early access to experimental AI features before they are made generally available.

To participate, your organization must have a supported subscription — such as Business Standard, Business Plus, Enterprise, or Education tiers — along with Google AI Pro or Ultra add-ons.

Participation is managed entirely by your Google Workspace administrator, as the program is turned off by default. An admin can manually enable access via the Google Admin console by navigating to Menu > Generative AI > Gemini for Workspace and selecting the Alpha features panel. Once enabled, eligible users gain access to these next-generation content creation tools.

For business users, Google emphasizes that these features are built with “enterprise-grade data protections,” ensuring that sensitive company data used to ground Gemini’s responses remains confidential and is not used to train global models.

Community and leadership reactions

The announcement was met with immediate traction on social media, spearheaded by CEO Sundar Pichai. In a post on X, Pichai highlighted the practical, time-saving nature of the updates:

“New Gemini updates to make @GoogleWorkspace more personal, helpful and collaborative… no more digging through folders.”

The reaction from the broader tech community has focused heavily on the “9x faster” claim for Sheets, a metric that resonates with data analysts and project managers who spend a significant portion of their week on manual data entry.

Yulie Kwon Kim, VP of Product for Workspace, framed the release as a fundamental reimagining of content creation, stating that Gemini is no longer just a “tool” but a “partner that works alongside you throughout the creative process”.

As these features move from beta to general availability in the coming months, the true test will be how effectively Gemini handles the nuance of complex, real-world data without human intervention. For now, Google has signaled its intent: the era of starting with a blank page is officially over.

AI is the latest escalation in the cloud contest — and enterprise technical leaders should take note

For CTOs, CIOs, and product managers, the deep integration of Gemini into Google Workspace is not merely a suite of new features; it is a fundamental shift toward an “agentic” operating model.

This announcement arrives just 24 hours after Microsoft unveiled “Copilot Cowork,” a cloud-based AI agentic tool designed to complete work on a user’s behalf across the entire Microsoft 365 suite. Both tech giants are now converging on a singular vision: the AI assistant as an execution layer that can navigate multiple files, formats, and data sources to independently plan and deliver finished workplace materials.

By transforming static storage into an active knowledge base, these platforms are providing technical leaders with a framework to reduce the “digital debt” of searching through siloed applications, effectively reimagining white-collar work as a series of delegated outcomes rather than manual tasks.

The scale of this transformation is underpinned by Google’s massive and rapidly expanding footprint. As of early 2026, Google Workspace has surpassed 3 billion monthly active users globally. Within this ecosystem, the paid enterprise segment is seeing explosive growth, with approximately 11 million paying business customers—up from 8 million just one year prior.

More specifically, over 8 million paid Gemini Enterprise seats have already been deployed across more than 2,800 companies. While Microsoft leverages a multi-model architecture incorporating Anthropic’s Claude models for its “Cowork” features, Google is doubling down on its own integrated stack of Gemini 3 and DeepMind logic to provide a seamless, context-aware environment for its vast user base.

From an interpretive standpoint, this “Gemini-fication” of work represents the democratization of advanced analytics. When a manager can use natural language to solve complex optimization problems in Sheets or generate entire presentations from a single prompt, the traditional boundaries of professional roles begin to blur.

While early studies suggest these agentic tools can lead to productivity gains of 15% to 35%, the real value for technical leaders lies in headcount leverage—the ability to maintain high output with leaner teams.

As AI assistants evolve into autonomous agents that navigate enterprise data to “do the work for you,” the role of the knowledge worker is shifting from “creator” to “orchestrator,” requiring a strategic pivot in how enterprises hire and measure human talent in an AI-first economy.

Microsoft announces Copilot Cowork with help from Anthropic — a cloud-powered AI agent that works across M365 apps

If you thought Anthropic was about to run away with the enterprise AI business…you’re not totally off the mark, actually.

This morning, Microsoft announced “Copilot Cowork,” a new cloud-based agentic AI automation tool within Microsoft’s existing Microsoft 365 Copilot, except now it can complete work on users’ behalf across many Microsoft apps instead of being confined to each one. If it sounds suspiciously similar to Anthropic’s own “Claude Cowork” applications for Mac and Windows (released in January and February of 2026, respectively), that’s to be expected, as Microsoft and Anthropic worked together on this new feature.

Copilot Cowork is the centerpiece of what Microsoft is calling “Wave 3” of Microsoft 365 Copilot — a sweeping platform update that also brings agentic capabilities directly into individual Office apps, makes Anthropic’s Claude models available in mainline Copilot Chat, and introduces new enterprise pricing tiers designed to bundle AI productivity with security and governance.

Anthropic’s initial Claude Cowork applications released in the first two months of 2026 helped trigger a $285 billion selloff in enterprise software stocks as investors repriced companies whose core functionality — project management, writing, data analysis, workflow automation — overlapped with what Anthropic’s AI could do.

Thus, to some AI power users and observers in business and tech who have shared their views on X, the arrival of the closely named and similarly featured Copilot Cowork appears to be an instance of Microsoft playing “catch up.”

Like Claude Cowork, Copilot Cowork allows users to delegate complex, multi-step tasks to an AI agent that plans, executes, and delivers finished work. In this case, the AI can move across and use the tools and features of Microsoft’s Outlook, Teams, Excel, PowerPoint, and other M365 applications.

CEO Satya Nadella promoted the launch on X, writing: “Announcing Copilot Cowork, a new way to complete tasks and get work done in M365. When you hand off a task to Cowork, it turns your request into a plan and executes it across your apps and files, grounded in your work data and operating within M365’s security and governance boundaries.”

Copilot Cowork is currently in Research Preview with a limited set of customers. Broader access will come through Microsoft’s Frontier program in late March 2026. Enterprises interested in getting early access can join the Frontier program at adoption.microsoft.com/en-us/copilot/frontier-program/. Microsoft also published a companion blog post, “Powering Frontier Transformation with Copilot and agents,” that outlines how organizations can prepare for the rollout.

The announcement represents Microsoft’s most significant step yet in transforming Copilot from a conversational assistant into what the company calls an execution layer — an AI that doesn’t just answer questions but actually completes work on a user’s behalf.

But with Claude Cowork offering much of the same functionality, the question is whether Microsoft’s first-party offering comes with enough unique advantages, and enough integration with the trusted systems enterprises already use, to catch on.

Claude Cowork vs. Copilot Cowork: same DNA, different bets on how work gets done

Microsoft’s announcement blog post explicitly states that Copilot Cowork integrates “the technology behind Claude Cowork,” and both products share a core premise: AI should plan and execute multi-step work, not just respond to prompts.

But the two products diverge sharply in where they operate, what they can reach, and who they’re built for.

Claude Cowork is a desktop agent. It lives on your machine — first Mac, now Windows — and operates within folders you explicitly grant it access to.

It can read, edit, and create local files, automate browser tasks, and connect to external services through Anthropic’s growing library of MCP connectors and plugins spanning tools like Google Drive, Slack, DocuSign, and Salesforce. Its power comes from flexibility: users can point it at essentially any local workflow, and Anthropic’s open plugin architecture means its reach keeps expanding.

But it is fundamentally a personal tool. The user manages what Claude can see, and the security model depends on folder-level sandboxing and individual judgment about what to share.

Copilot Cowork operates in the cloud, inside Microsoft 365’s infrastructure, and draws on something Claude Cowork simply cannot access: the full graph of a user’s enterprise work data.

That means Outlook email threads, Teams conversations, calendar history, SharePoint files, Excel workbooks, and the relationships between them. When Copilot Cowork reschedules a meeting or builds a briefing document, it is pulling from signals across all of those systems simultaneously — a capability that requires deep integration with M365’s APIs and data layer rather than just local file access. Enterprise IT administrators retain control through existing identity, permissions, and compliance policies, and all actions are auditable by default.

The practical upshot is that these products are likely to appeal to different buyers solving different problems, at least in the near term.

Organizations that are deeply embedded in the Microsoft 365 ecosystem — which is to say, most large enterprises — are the natural audience for Copilot Cowork. For a Fortune 500 company whose employees live in Outlook, Teams, and SharePoint all day, the value proposition is compelling: an AI agent that already understands your organizational context, operates within your existing security and compliance framework, and doesn’t require employees to adopt a new application or manage local file permissions.

IT departments that have spent years configuring M365 governance policies will find Copilot Cowork far easier to greenlight than a standalone desktop agent that requires individual users to make security decisions about folder access.

Claude Cowork, by contrast, is likely to attract organizations and individuals who need more flexibility than M365 can provide — teams working across heterogeneous tool stacks, power users who want granular control over what the AI can touch, and companies already building on Anthropic’s API and plugin ecosystem.

Startups and mid-market companies that haven’t standardized on Microsoft’s suite may find Claude Cowork more natural, since it doesn’t assume M365 as the center of gravity. Creative agencies, research teams, legal shops using specialized software, and technical organizations that prize customization over managed simplicity are plausible early adopters.

There is also a pricing and access dimension. Claude Cowork is available today to anyone with a $20/month Claude Pro subscription.

Copilot Cowork is in limited Research Preview and will require a Microsoft 365 Copilot license, which currently runs $30 per user per month on top of existing M365 enterprise subscriptions.

Microsoft also introduced Microsoft 365 E7, a new top-tier enterprise bundle priced at $99 per user per month and available May 1. It includes Copilot, the Agent 365 agentic AI control suite, the Microsoft Entra Suite for identity management and security, and the full E5 security stack, representing the all-in price for organizations that want AI productivity, agent governance, and advanced security in a single license.

For individual knowledge workers or small teams, Claude Cowork is more accessible. For organizations already paying for M365 Copilot, Copilot Cowork arrives as an incremental capability within an existing investment.

The most intriguing question may be whether the two products end up competing at all or instead serve as complementary distribution channels for the same underlying intelligence.

Microsoft is explicitly positioning itself as model-agnostic, choosing “the right model for the job regardless of who built it.”

Anthropic, for its part, benefits from having its technology embedded in the world’s dominant enterprise productivity suite while maintaining a standalone product that keeps its brand and direct customer relationship intact.

It is possible — perhaps even likely — that some enterprises will end up using both: Copilot Cowork for M365-native workflows and Claude Cowork for everything else.

From chat to execution: how Copilot Cowork actually works

Charles Lamanna, Microsoft’s president of business applications and agents, framed the product as the logical next step in Copilot’s evolution. “Copilot Cowork is built for that: it helps Copilot take action, not just chat,” he wrote in a blog post accompanying the announcement.

The workflow is straightforward in concept but ambitious in scope. Users describe an outcome they want — preparing for a client meeting, researching a company, building a product launch plan — and Cowork automatically breaks that request into a structured plan.

It then grounds the work in the user’s existing emails, meetings, messages, files, and data using what Microsoft calls Work IQ, a system that draws on signals across the M365 suite so the AI operates with contextual awareness of the user’s actual work environment.

Critically, the plan executes in the background. Users can have a dozen tasks running simultaneously, each progressing while they focus on other work. Cowork checks in if it needs clarification and presents recommended actions for user approval before applying changes. Microsoft emphasized that users never give up control — the AI works independently but transparently.

Jared Spataro, Microsoft’s chief marketing officer for AI at Work, described the shift in a companion blog post: “Tasks are no longer confined to a single turn or a single app. They can run for minutes or hours, coordinating actions and producing real outputs along the way.”

Calendar triage, meeting prep, deep research, and launch planning

Microsoft showcased four scenarios that illustrate what Cowork can do in practice.

In the first, Cowork reviews a user’s Outlook calendar, identifies conflicts and low-value meetings, and proposes changes — rescheduling, declining, or adding focus blocks — that it then applies once approved. In the second, it handles end-to-end meeting preparation: pulling relevant inputs from email and files, scheduling prep time, and producing a briefing document, supporting analysis, and a client-ready deck, all saved in M365 for team collaboration.

The third scenario demonstrates deep research capabilities. Cowork can gather earnings reports, SEC filings, analyst commentary, and news, then organize findings with citations into an executive summary, a structured research memo, and an Excel workbook with labeled tabs. The fourth tackles product launch planning, building a competitive comparison in Excel, distilling a value proposition document, generating a pitch deck, and outlining milestones and owners.

In each case, Microsoft stressed that Cowork isn’t just creating content — it’s coordinating the work around it, producing multiple connected deliverables across applications in a single workflow.

The Anthropic connection: Claude technology powers Copilot’s new agent brain

This is the clearest public confirmation yet that Microsoft’s deepening relationship with Anthropic — the $30 billion Azure compute deal announced in November 2025, the integration of Claude Opus 4.6 into Microsoft Foundry in early February 2026, the ongoing internal adoption of Claude Code across Microsoft engineering teams, according to The Verge — has now reached the company’s flagship productivity suite.

But it goes beyond Cowork. According to Spataro’s companion blog post, Claude is now available in mainline Copilot Chat for Frontier program users, alongside the latest generation of OpenAI models. That means Anthropic’s models aren’t just powering Cowork’s task execution behind the scenes — they’re becoming a general-purpose option that users can access directly in everyday Copilot conversations.

And this is despite Anthropic’s ongoing clash with the U.S. Department of War over its “red lines” prohibiting AI use in mass domestic surveillance and fully autonomous lethal weaponry, which the Department of War claims are unnecessary, already guided by existing law, and should not be enforced by an outside contract vendor. Notably, Microsoft is also a major vendor of the Department of War (formerly Defense) and governments more broadly.

Yet in a major difference, Microsoft uses other AI models across the Microsoft 365 Copilot experience.

Spataro was pointed about why. “Many AI tools lock users into a single vendor’s models,” he wrote. “Others force people to choose between tools, experiences, or modes depending on the task. That fragmentation creates friction for individuals and complexity for organizations.” Microsoft’s answer is a multi-model architecture where “Copilot automatically applies the right model for the task, all grounded in your enterprise context.”

That multi-model positioning is a significant strategic signal. Microsoft has invested $13 billion in OpenAI, which has long served as the primary provider of AI models for Microsoft’s products.

The decision to power a major new M365 capability with Anthropic’s technology suggests Microsoft increasingly views model diversity not as a hedge but as a competitive advantage — choosing the best available AI for each specific task rather than remaining locked to a single provider.

Wave 3 extends agentic AI beyond Cowork

Copilot Cowork is the headline feature of the Wave 3 update, but it’s far from the only change.

What Microsoft previously called “Agent Mode” in Word, Excel, PowerPoint, and Outlook has been rebranded as simply how Copilot works in those apps going forward.

In Excel and Word, these agentic capabilities are now generally available. In PowerPoint and Outlook, they’re rolling out over the coming months. Copilot can now refine a Word document into a polished draft, improve Excel spreadsheets with real formulas, produce slides in PowerPoint that match an organization’s brand kits and layout conventions, and draft and refine emails directly in Outlook — all grounded in the user’s work context through Work IQ.

Copilot Chat is also becoming a more capable starting point. Users can now create documents, spreadsheets, and presentations directly from a conversation, or take workplace actions like scheduling meetings and sending emails without switching apps.

Microsoft is opening the chat experience to third-party agents as well, with integrations from Adobe, Monday.com, Figma, and others surfacing directly within Copilot Chat through open standards including MCP.

Enterprise-grade security and a cautious rollout

Microsoft emphasized that Copilot Cowork runs within M365’s existing security and governance framework. Identity, permissions, and compliance policies apply by default, and all actions and outputs are auditable. Tasks run in a protected, sandboxed cloud environment so they can continue progressing safely as users switch devices.

The cautious rollout plan signals that Microsoft is treating this as a deliberate enterprise deployment rather than a consumer splash. Copilot Cowork is currently being tested with a limited set of customers in what Microsoft calls a Research Preview. Broader availability will come through the company’s Frontier program in late March 2026.

The timing aligns with other major Microsoft announcements today, including the general availability of Agent 365 and Microsoft 365 Enterprise 7 — products designed to bring security and governance to AI agents operating inside large organizations.

Agent 365, which Microsoft calls “the control plane for agents,” will be generally available on May 1 at $15 per user per month, giving IT and security teams a single place to observe, secure, and govern every AI agent operating across an organization. Microsoft cited an IDC projection that agent use will increase by an order of magnitude in the next few years, with “hundreds of millions — and soon billions — of agents operating across enterprises.”

Together, the announcements paint a picture of Microsoft building out the infrastructure to support autonomous AI agents at enterprise scale while keeping IT administrators firmly in control.

What it means for the AI agent race

Copilot Cowork arrives at an inflection point in the AI industry. Every major platform company is now racing to deliver agents that don’t just converse but execute.

Anthropic has its standalone Claude Cowork. OpenAI recently launched GPT-5.4 with native computer use capabilities and its own Windows apps integrations, and earlier, launched its own Codex AI coding application and hired the creator of the popular open source AI agent tool OpenClaw. Google has of course been steadily expanding Workspace integrations for AI agents.

Microsoft’s advantage is distribution. With hundreds of millions of M365 users across the enterprise, Copilot Cowork has a built-in audience that no standalone AI product can match.

By pairing that reach with what it considers the best available AI technology — even when that technology comes from a competitor to its $13 billion investment partner — Microsoft is betting that the enterprise agent market will be won not by the company with the best model but by the one that integrates most deeply into the workflows people already use.

Whether Copilot Cowork delivers on that promise will depend on execution quality and user trust — the same open questions facing every AI agent product on the market. But with Anthropic’s Claude technology running inside M365’s security perimeter and Nadella personally promoting the launch, Microsoft — like the rest of the tech industry — is clearly banking on the fact that the era of AI as a passive assistant is over. The next chapter is AI that does the work for you, without you.

Enterprise agentic AI requires a process layer most companies haven’t built

Presented by Celonis


85% of enterprises want to become agentic within three years — yet 76% admit their operations can’t support it. According to the Celonis 2026 Process Optimization Report, based on a survey of more than 1,600 global business leaders, organizations are aggressively pursuing AI-driven transformation. Yet most acknowledge that the foundational work — modernizing workflows, reducing process friction, and building operational resilience — remains unfinished. The ambition is clear. The infrastructure to execute on it is not.

To act autonomously and effectively, AI agents need optimized, AI-ready processes and the process data and operational context that only come from process intelligence. Without that, they’re guessing. And 82% of decision-makers believe AI will fail to deliver return on investment (ROI) if it doesn’t understand how the business runs.

“The scale of the opportunity is truly remarkable: 89% of leaders see AI as their biggest competitive opportunity,” says Patrick Thompson, global SVP of customer transformation. “That’s not a marginal finding. What’s interesting is the shift in the framing. Leaders are confident that AI will transform operations. The question now is how to fuel their ambitions with the right AI enablers.”

Explaining the gap between ambition and reality

Right now, 85% of teams are using gen AI tools for everyday tasks, so the “will this work?” question is largely settled. The real question has shifted to: “Why isn’t it working the way we need it to?” And that’s a much harder problem, because it’s structural. It’s siloed teams. Systems that don’t talk to each other. AI that looks impressive in a demo but falters once it’s dropped into a real enterprise environment. That’s the wall companies are hitting.

So, despite the overwhelming ambition, only 19% of organizations use multi-agent systems today. It all comes down to an operational readiness problem, Thompson says.

“Nine in ten leaders are already using or exploring multi-agent systems, so the will is absolutely there, but ambition without infrastructure doesn’t get you very far,” he explains.

Until now, process has largely been a “good enough” problem, because messy, disconnected processes can still produce results — just inefficient and opaque ones. As long as the business is growing, there has been no urgent need to fix them. AI changed the calculus. If 82% of leaders believe AI can only deliver ROI with proper business context, then sub-optimal processes aren’t just an operational inconvenience — they’re actively blocking the AI strategy. Suddenly, process optimization isn’t a background IT project, but a prerequisite for competing.

“This is where structural modernization becomes critical,” he says. “Organizations that have invested in modernizing their data, systems, and processes are in a far stronger position to enable AI at scale.”

The other AI stopper: Lack of business context

AI cannot deliver its full ROI until it understands the operational context of the business. That includes how KPIs are defined and calculated, unique internal policies and procedures, how the organization is structured, and where real decision authority sits.

This knowledge is usually trapped in different departments that have developed their own languages and systems over time. They don’t naturally share a common understanding. Bringing AI into that environment is something like dropping someone into a conversation that’s been going on for years, without any of the backstory.

Process intelligence becomes the connective layer — a shared operational language that grounds AI decisions in how the business actually runs.

Why AI adoption is also a change management problem

The AI adoption challenge is less a technology problem and more of a change-management and operating-model problem than many leaders want to admit, because technology problems feel easier to solve. The data shows that only 6% of leaders cite resistance to change as a hurdle. The real blockers are siloed teams (54%) and a lack of coordination between departments (44%). And 93% of process and operations leaders explicitly state that process optimization is as much about people and culture as it is about tools and technology.

“When companies come to us looking for a technology fix, part of our job is helping them see that the operating model has to evolve alongside the tooling,” Thompson says. “You can’t bolt AI onto a broken process and expect it to work. True enterprise modernization means redesigning how teams, systems, and decisions connect, and AI only works when that modernization happens first.”

Making process optimization a strategic advantage

How do you make process optimization a strategic advantage, rather than another operational project? Connect it directly to outcomes that executives care about. When processes work, they go beyond IT metrics, directly affecting board-level concerns. A full 63% of leaders use process optimization to proactively manage risks, while 58% see faster decision-making.

Plus, the economic and geopolitical environment right now makes agility a survival skill. Look at the supply chain industry, where 66% already view process optimization as a critical business-wide initiative.

“That’s the mindset shift we’re trying to catalyze across the rest of the organization,” Thompson says. “It’s not maintenance work. It’s what lets you move fast when the world changes, and right now the world is moving constantly.”

Closing the readiness gap in enterprise agentic AI

To succeed, organizations must close the readiness gap, and that starts with being honest about where they’re starting from, Thompson says.

“The biggest risk I see is companies continuing to layer AI on top of fragmented, opaque processes and then wondering why they’re not getting results,” he says. “Moving from static, traditional tools to real process intelligence, where you have live visibility into how your operations actually run, that’s the foundational shift that makes agentic AI viable.”

Without it, agents get deployed in the wrong places, can’t be integrated with existing systems, and organizations end up with expensive pilots that don’t scale. The call to action is clear: stop starting with tools and start with operational visibility.

“The leaders who will win in the agentic era aren’t necessarily the ones with the most sophisticated AI,” he says. “They’re the ones who’ve done the hard work of building a shared, accurate picture of their operations. Process intelligence is the starting point. It’s what enables enterprise modernization in practice, creating the operational clarity AI needs to deliver real ROI. Master your processes, give AI the context it needs, and then you can actually deploy it somewhere it will deliver.”


Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

Dynamic UI for dynamic AI: Inside the emerging A2UI model

With agentic AI, businesses are operating more dynamically. Instead of traditional pre-programmed bots and static rules, agents can now “think” and invent alternate paths when unforeseen conditions arise. For instance, using a business domain ont…