The landscape of enterprise artificial intelligence shifted fundamentally today as OpenAI announced $110 billion in new funding from three of tech’s largest firms: $30 billion from SoftBank, $30 billion from Nvidia, and $50 billion from Amazon.
But while the first two are contributing capital alone, OpenAI is going further with Amazon, establishing a fully “Stateful Runtime Environment” on Amazon Web Services (AWS), the world’s most widely used cloud platform.
This signals OpenAI’s and Amazon’s vision of the next phase of the AI economy — moving from chatbots to autonomous “AI coworkers” known as agents — and that this evolution requires a different architectural foundation than the one that built GPT-4.
For enterprise decision-makers, this announcement isn’t just a headline about massive capital; it is a technical roadmap for where the next generation of agentic intelligence will live and breathe.
And for enterprises already running on AWS, it’s particularly good news: they will soon have the option of a new runtime environment from OpenAI, though the companies have yet to announce a precise timeline for when it will arrive.
At the heart of the new OpenAI-Amazon partnership is a technical distinction that will define developer workflows for the next decade: the difference between “stateless” and “stateful” environments.
To date, most developers have interacted with OpenAI through stateless APIs. In a stateless model, every request is an isolated event; the model has no “memory” of previous interactions unless the developer manually feeds the entire conversation history back into the prompt. Microsoft, OpenAI’s longtime cloud partner and major investor, remains the exclusive third-party cloud provider for these stateless APIs through Azure.
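The stateless pattern can be sketched in a few lines. In this illustration, `call_model` is a placeholder standing in for a real chat-completions API call, not an actual OpenAI SDK function:

```python
# Stateless pattern: the caller owns all memory. Every request must re-send
# the full conversation history, because the model retains nothing between
# calls. `call_model` is a stand-in for a real chat-completions API call.

def call_model(messages):
    # Placeholder: a real implementation would POST `messages` to the API.
    return f"(reply based on {len(messages)} messages)"

history = [{"role": "system", "content": "You are a support assistant."}]

def ask(question):
    history.append({"role": "user", "content": question})
    reply = call_model(history)          # entire history resent each turn
    history.append({"role": "assistant", "content": reply})
    return reply

ask("Where is my order?")
ask("Can you expedite it?")
# By turn two, the request already carries four prior messages plus the new one.
```

The cost of this pattern grows with every turn: each new question ships the whole transcript back over the wire.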
The newly announced Stateful Runtime Environment, by contrast, will be hosted on Amazon Bedrock — a paradigm shift.
This environment allows models to maintain persistent context, memory, and identity. Rather than a series of disconnected calls, the stateful environment enables “AI coworkers” to handle ongoing projects, remember prior work, and move seamlessly across different software tools and data sources.
As OpenAI notes on its website: “Now, instead of manually stitching together disconnected requests to make things work, your agents automatically execute complex steps with ‘working context’ that carries forward memory/history, tool and workflow state, environment use, and identity/permission boundaries.”
For builders of complex agents, this reduces the “plumbing” required to maintain context, as the infrastructure itself now handles the persistent state of the agent.
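OpenAI has not published the runtime’s API, so any concrete code is speculative. The sketch below only models the shape of the “working context” the announcement describes, with memory, tool state, and identity held by the runtime rather than re-sent by the caller; all names are illustrative:

```python
from dataclasses import dataclass, field

# Illustrative only: OpenAI has not published the runtime's API. This models
# the "working context" described in the announcement -- memory, tool and
# workflow state, and identity/permission boundaries held server-side, so
# the caller no longer stitches history together on every request.

@dataclass
class WorkingContext:
    identity: str                                  # agent identity/permissions
    memory: list = field(default_factory=list)     # carries forward across calls
    tool_state: dict = field(default_factory=dict)

class StatefulAgent:
    def __init__(self, identity):
        self.ctx = WorkingContext(identity=identity)

    def step(self, task):
        # Each call sees everything the agent has done so far.
        self.ctx.memory.append(task)
        return f"{self.ctx.identity}: step {len(self.ctx.memory)} ({task})"

agent = StatefulAgent("billing-agent")
agent.step("pull last month's invoices")
agent.step("flag anomalies")   # remembers step 1 without the caller re-sending it
```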
The vehicle for this stateful intelligence is OpenAI Frontier, an end-to-end platform designed to help enterprises build, deploy, and manage teams of AI agents, launched back in early February 2026.
Frontier is positioned as a solution to the “AI opportunity gap”—the disconnect between model capabilities and the ability of a business to actually put them into production.
Key features of the Frontier platform include:
Shared Business Context: Connecting siloed data from CRMs, ticketing tools, and internal databases into a single semantic layer.
Agent Execution Environment: A dependable space where agents can run code, use computer tools, and solve real-world problems.
Built-in Governance: Every AI agent has a unique identity with explicit permissions and boundaries, allowing for use in regulated environments.
While the Frontier application itself will continue to be hosted on Microsoft Azure, AWS has been named the exclusive third-party cloud distribution provider for the platform.
This means that while the “engine” may sit on Azure, AWS customers will be able to access and manage these agentic workloads directly through Amazon Bedrock, integrated with AWS’s existing infrastructure services.
For now, OpenAI has launched a dedicated Enterprise Interest Portal on its website. This serves as the primary intake point for organizations looking to move past isolated pilots and into production-grade agentic workflows.
The portal is a structured “request for access” form where decision-makers provide:
Firmographic Data: Basic details including company size (ranging from startups of 1–50 to large-scale enterprises with 20,000+ employees) and contact information.
Business Needs Assessment: A dedicated field for leadership to outline specific business challenges and requirements for “AI coworkers”.
By submitting this form, enterprises signal their readiness to work directly with OpenAI and AWS teams to implement solutions like multi-system customer support, sales operations, and finance audits that require high-reliability state management.
The scale of the announcement was mirrored in the public statements from the key players on social media.
Sam Altman, CEO of OpenAI, expressed excitement about the Amazon partnership, specifically highlighting the “stateful runtime environment” and the use of Amazon’s custom Trainium chips.
However, Altman was quick to clarify the boundaries of the deal: “Our stateless API will remain exclusive to Azure, and we will build out much more capacity with them”.
Amazon CEO Andy Jassy emphasized the demand from his own customer base, stating, “We have lots of developers and companies eager to run services powered by OpenAI models on AWS”. He noted that the collaboration would “change what’s possible for customers building AI apps and agents”.
Early adopters have already begun to weigh in on the utility of the Frontier approach. Joe Park, EVP at State Farm, noted that the platform is helping the company accelerate its AI capabilities to “help millions plan ahead, protect what matters most, and recover faster”.
For CTOs and enterprise decision-makers, the OpenAI-Amazon-Microsoft triangle creates a new set of strategic choices. The decision of where to allocate budget now depends heavily on the specific use case:
For High-Volume, Standard Tasks: If your organization relies on standard API calls for content generation, summarization, or simple chat, Microsoft Azure remains the primary destination. These “stateless” calls are exclusive to Azure, even if they originate from an Amazon-linked collaboration.
For Complex, Long-Running Agents: If your goal is to build “AI coworkers” that require deep integration with AWS-hosted data and persistent memory across weeks of work, the AWS Stateful Runtime Environment is the clear choice.
For Custom Infrastructure: OpenAI has committed to consuming 2 gigawatts of AWS Trainium capacity to power Frontier and other advanced workloads. This suggests that enterprises looking for the most cost-efficient way to run OpenAI models at massive scale may find an advantage in the AWS-Trainium ecosystem.
Despite the massive infusion of Amazon capital, the legal and financial ties between Microsoft and OpenAI remain remarkably rigid. A joint statement released by both companies clarified that their “commercial and revenue share relationship remains unchanged”.
Crucially, Microsoft continues to maintain its “exclusive license and access to intellectual property across OpenAI models and products”. Furthermore, Microsoft will receive a share of the revenue generated by the OpenAI-Amazon partnership.
This ensures that while OpenAI is diversifying its infrastructure, Microsoft remains the ultimate beneficiary of OpenAI’s commercial success, regardless of which cloud the compute actually runs on.
The definition of Artificial General Intelligence (AGI) also remains a protected term in the Microsoft agreement. The contractual processes for determining when AGI has been reached—and the subsequent impact on commercial licensing—have not been altered by the Amazon deal.
Ultimately, OpenAI is positioning itself as more than a model or tool provider; it is an infrastructure player attempting to straddle the two largest clouds on Earth.
For the user, this means more choice and more specialized environments. For the enterprise, it means that the era of “one-size-fits-all” AI procurement is over.
The choice between Azure and AWS for OpenAI services is now a technical decision about the nature of the work itself: whether your AI needs to simply “think” (stateless) or to “remember and act” (stateful).
AI agents now carry more access and more connections to enterprise systems than any other software in the environment. That makes them a bigger attack surface than anything security teams have had to govern before, and the industry doesn’t yet have a framework for it. “If that attack vector gets utilized, it can result in a data breach, or even worse,” said Spiros Xanthos, founder and CEO of Resolve AI, speaking at a recent VentureBeat AI Impact Series event.
Traditional security frameworks are built around human interactions. There’s not yet an agreed-upon construct for AI agents that have personas and can work autonomously, noted Jon Aniano, SVP of product and CRM applications at Zendesk, at the same event. Agentic AI is moving faster than enterprises can build guardrails around it, according to Aniano and other enterprise leaders, and Model Context Protocol (MCP), while decreasing integration complexity, is making the problem worse.
“Right now it’s an unsolved problem because it’s the wild, wild West,” Aniano said. “We don’t even have a defined technical agent-to-agent protocol that all companies agree on. How do you balance user expectations versus what keeps your platform safe?”
Enterprises are increasingly hooking into MCP servers because they simplify integration between agents, tools and data. However, MCP servers tend to be “extremely permissive,” he said.
They are “actually probably worse than an API,” he contended, because APIs at least have more controls in place to constrain what agents can do.
Today’s agents are acting on behalf of humans based on explicit permissions, thus establishing human accountability. “But you might have tens, hundreds of agents in the future with their own identity, their own access,” said Xanthos. “It becomes a very complex matrix.”
Even as his startup is developing autonomous AI agents for site reliability engineering (SRE) and system management, he acknowledged that the industry “completely lacks the framework” for autonomous agents.
“It’s completely on us and to anybody who builds agents to figure out what restrictions to give them,” he said. And customers must be able to trust those decisions.
Some existing security tools do offer fine-grained access — Splunk, for instance, developed a method to provide access to certain indexes in underlying data stores, he noted — but most are broader and human-oriented.
“We’re trying to figure this out with existing tools,” he said. “But I don’t think they’re sufficient for the era of agents.”
At Zendesk and other customer relationship management (CRM) platform providers, AI is involved in a number of user interactions, Aniano noted — in fact, now it’s at a “volume and a scale that we haven’t contemplated as businesses and as a society.”
It can get tricky when AI is helping out human agents; the audit trail can become a labyrinth.
“So now you’ve got a human talking to a human that’s talking to an AI,” Aniano noted. “The human tells the AI to take action. Who’s at fault if it’s the wrong action?” This becomes even more complicated when there are “multiple pieces of AI and multiple humans” in the mix.
To prevent agents from going off the rails, Zendesk tends to be “very strict” about access and scope; however, customers can define their own guardrails based on their needs. In most cases, AI can access knowledge sources, but they’re not writing code or running commands on servers, Aniano said. If an AI does call an API, it is “declaratively designed” and sanctioned, and actions are specifically called out.
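The “declaratively designed and sanctioned” approach Aniano describes can be sketched as an explicit per-identity allowlist, where any tool call outside an agent’s declared scope is refused. The names below are illustrative, not a Zendesk API:

```python
# A sketch of the "declaratively designed and sanctioned" pattern: each agent
# identity carries an explicit allowlist of tools, and any call outside that
# scope is refused. Tool and agent names here are hypothetical.

SANCTIONED_TOOLS = {
    "support-agent": {"read_kb_article", "create_ticket"},
    "billing-agent": {"read_invoice"},
}

def call_tool(agent_id, tool, *args):
    allowed = SANCTIONED_TOOLS.get(agent_id, set())
    if tool not in allowed:
        raise PermissionError(f"{agent_id} is not sanctioned to call {tool}")
    return f"{tool} executed for {agent_id}"

call_tool("support-agent", "create_ticket", "order delayed")   # in scope
try:
    call_tool("support-agent", "read_invoice")                 # out of scope
except PermissionError as e:
    print(e)
```

The key design choice is deny-by-default: an agent with no entry in the registry can call nothing, which is the inverse of the “extremely permissive” MCP defaults criticized above.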
However, customer demand is flooding these scenarios and “we’re kind of holding the gates right now,” he said.
The industry must develop concrete standards for agent interactions. “We’re entering a world where, with things like MCP that can auto-discover tools, we’re going to have to create new methods of safety for deciding what tools these bots can interact with,” said Aniano.
When it comes to security, enterprises are rightly concerned when AI takes over authentication tasks, such as sending out and processing one-time passwords (OTP), SMS codes, or other two-step verification methods, he said. What happens if an AI mis-authenticates or misidentifies someone? This can lead to sensitive data leakage or open the door for attackers.
“There’s a spectrum now, and the end of that spectrum today is a human,” Aniano said. However, “the end of that spectrum tomorrow might be a specialized agent designed to do the same kind of gut feeling or human-level interaction.”
Customers themselves are on a spectrum of adoption and comfort. In certain companies, particularly in financial services and other highly regulated environments, humans still must be involved in authentication, Aniano noted. In other cases, legacy or old-guard companies only trust humans to authenticate other humans.
He noted that Zendesk is experimenting with new AI agents that are “a little more connected to systems,” and working with a select group of customers around guardrailing.
At some point in the future, agents may actually be more trusted than humans for certain tasks, and granted permissions “way beyond” what humans have today, Xanthos said. But we’re a long way from that, and, for the most part, the fear of something going wrong is what’s holding enterprises back.
“Which is a good fear, right? I’m not saying that it is a bad thing,” he said. Many enterprises simply aren’t yet comfortable with an agent doing all steps of a workflow or fully closing the loop by itself. They still want human review.
Resolve AI is on the cusp of giving agents standing authorization in a few cases that are “generally safe,” such as in coding; from there they’ll move to more open-ended scenarios that are not all that risky, Xanthos explained. But he acknowledged that there will always be very risky situations where AI mistakes could “mutate the state of the production system,” as he put it.
Ultimately, though: “There’s no going back, obviously; this is moving faster than maybe even mobile did. So the question is what do we do about it?”
Both speakers pointed to interim measures available within existing tooling. Xanthos noted that some tools — Splunk among them — already offer fine-grained index-level access controls that can be applied to agents. Aniano described Zendesk’s approach as a practical starting point: declaratively designed API calls with explicitly sanctioned actions, strict access and scope limits, and human review before expanding agent permissions.
The underlying principle, as Aniano put it: “We’re always checking those gates and seeing how we can widen the aperture” — meaning don’t grant standing authorization until you’ve validated each expansion.
Former Twitter co-founder Jack Dorsey’s company Block (the parent of merchant payments system Square, mobile peer-to-peer payments app Cash App, music streamer Tidal, and open-source agentic AI system Goose) sent shockwaves across the business world tonight after announcing a headcount reduction of more than 40%, cutting its workforce by more than 4,000 people from a prior total of roughly 10,000, despite its latest quarterly earnings statement, released today, showing $2.87 billion in gross profit, up 24% year-over-year.
The culprit? Newfound AI efficiencies. As Dorsey put it in a note shared on his own former social network, X:
“we’re not making this decision because we’re in trouble. our business is strong. gross profit continues to grow, we continue to serve more and more customers, and profitability is improving. but something has changed. we’re already seeing that the intelligence tools we’re creating and using, paired with smaller and flatter teams, are enabling a new way of working which fundamentally changes what it means to build and run a company. and that’s accelerating rapidly.
i had two options: cut gradually over months or years as this shift plays out, or be honest about where we are and act on it now. i chose the latter. repeated rounds of cuts are destructive to morale, to focus, and to the trust that customers and shareholders place in our ability to lead. i’d rather take a hard, clear action now and build from a position we believe in than manage a slow reduction of people toward the same outcome. a smaller company also gives us the space to grow our business the right way, on our own terms, instead of constantly reacting to market pressures.”
The core of this reorganization is a pivot toward an “intelligence-native” model. Dorsey argues that a significantly smaller team, leveraging the very tools they are building, can deliver more value than a traditional large-scale organization. Block is re-engineering its entire operational stack to be orchestrated by AI, moving away from human-intensive management hierarchies toward what it calls “agentic AI infrastructure”.
This includes four primary focus areas:
Customer Capabilities: Atomic features that allow customers to build directly on top of Block’s infrastructure.
Proactive Intelligence: Moving from reactive dashboards to tools like Moneybot that anticipate customer needs before they ask.
Intelligence Models: A system to orchestrate the company’s internal operations, aiming for extreme speed and product velocity.
Operational Orchestration: An AI model designed to manage the internal decision-making and risk-assessment processes of the firm.
The financial strength cited in the lede is driven by deep engagement in Cash App and Square. Cash App’s gross profit grew 33% YoY to $1.83 billion, while Square saw its strongest year on record for new volume added (NVA).
Specific product highlights include:
Cash App Green: This status program for “modern earners” — a segment of 125 million people including gig workers and freelancers — has become a cornerstone of the company’s engagement strategy.
Square AI: Now embedded in the Square Dashboard, it provides sellers with instant insights into staffing and customer behavior.
Consumer Lending: Cash App Borrow origination volume surged 223% YoY, proving to be a high-return product that manages income variability for users.
Block also exceeded the Rule of 40—the industry benchmark where the sum of gross profit growth and adjusted operating income margin exceeds 40%—for the first time in the fourth quarter.
Not everyone was convinced by Dorsey’s letter stating that AI efficiencies were the primary driver of the layoffs. As Will Slaughter wrote on X: “In 3 years from December 2019 to December 2022, Block $XYZ more than tripled its headcount from 3,900 to 12,500. Unwinding less than half an insane COVID overhiring binge has much more to do with Jack Dorsey’s managerial incompetence than whether AI is going to take your job.”
Entrepreneur Marcelo P. Lima offered a similar sentiment on X, writing in part: “Everyone will assume Jack Dorsey ‘greatest of all time’ is doing this because of AI. He’s not. Block has been massively bloated for years. Don’t forget, Jack was head of Twitter. When Elon took over, he fired 80% of staff within 5 months and the product got better. This was before generative AI and Claude Code.”
And yet, regardless of how heavily AI factored into these layoffs in particular, the outcome on the wider enterprise landscape may ultimately be the same. With Block’s stock price rising more than 24% on the news, the boards and leadership of other public companies will likely be forced to at least entertain the idea of similarly drastic cuts if they believe AI can replace human labor and drive greater organizational efficiencies.
As user @khuppy wrote on X: “By Q2, if you aren’t firing lots of employees, your board will fire you for being a dinosaur who doesn’t implement AI. It’s going to happen fast now. Feudalism, here we come…”
Clearly, companies across sectors, especially those in tech and services, will be re-examining their headcount in light of Block’s latest move.
Despite the robust financial performance, the human cost is stark. The reduction from over 10,000 to just under 6,000 employees is one of the most drastic in fintech history. Dorsey’s internal note, while aimed at transparency, was met with a mix of awe at the technical vision and criticism of the timing.
Affected employees are receiving a severance package that includes 20 weeks of salary plus one week per year of tenure, equity vesting through May, and a $5,000 transition fund.
Dorsey noted that communication channels would stay open through Thursday evening so the team could say goodbye properly, stating, “i’d rather it feel awkward and human than efficient and cold.”
For enterprise decision-makers, Block’s move represents a fundamental challenge to the “growth at all costs” hiring model that has defined the last decade of tech.
Leadership teams should view this not merely as a cost-cutting measure, but as a strategic reset where organizational value is measured by the ratio of output to “intelligence-native” tools rather than total headcount. Executives should begin by auditing their own internal workflows to identify where agentic AI can consolidate roles and flatten management hierarchies before market pressures force a more reactive, less orderly contraction.
Even if it doesn’t lead to cuts as drastic, or to hiring slowdowns and freezes, Block’s move should likely prompt at least the kind of policy Shopify CEO Tobi Lutke introduced separately nearly a year ago: “Before asking for more Headcount and resources, teams must demonstrate why they cannot get what they want done using AI.”
While the community reaction to Block’s layoffs highlights the potential for brand damage and morale loss, the 24% surge in Block’s stock price suggests that the public market is increasingly rewarding lean, automated efficiency over human-intensive scaling.
Decision-makers should evaluate their current “bloat” against the benchmark set by Dorsey: if a company of 6,000 can drive $12.20 billion in gross profit, the standard for organizational efficiency has been permanently raised.
In building LLM applications, enterprises often have to create very long system prompts to adjust the model’s behavior for their applications. These prompts contain company knowledge, preferences, and application-specific instructions. At enterprise scale, these contexts can push inference latency past acceptable thresholds and drive per-query costs up significantly.
On-Policy Context Distillation (OPCD), a new training framework proposed by researchers at Microsoft, helps bake the knowledge and preferences of applications directly into a model. OPCD uses the model’s own responses during training, which avoids some of the pitfalls of other training techniques. This improves the abilities of models for bespoke applications while preserving their general capabilities.
In-context learning allows developers to update a model’s behavior at inference time without modifying its underlying parameters. Updating parameters is typically a slow and expensive process. However, in-context knowledge is transient. This knowledge does not carry across different conversations with the model, meaning you have to feed the model the exact same massive set of instructions or documents every time. For an enterprise application, this might mean repeatedly pasting company policies, customer tickets, or dense technical manuals into the prompt. This eventually slows down the model, drives up costs, and can confuse the system.
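A back-of-the-envelope calculation shows why re-sending a long prompt matters at scale. The per-token price, prompt size, and query volume below are assumptions for illustration, not quoted figures:

```python
# Back-of-the-envelope cost of re-sending a long system prompt on every call.
# All numbers (price, prompt size, volume) are assumed for illustration.

PRICE_PER_1K_INPUT_TOKENS = 0.003   # assumed $/1K input tokens
SYSTEM_PROMPT_TOKENS = 5_000        # policies, manuals, instructions
QUERY_TOKENS = 200
QUERIES_PER_DAY = 100_000

def daily_cost(prompt_tokens):
    tokens = (prompt_tokens + QUERY_TOKENS) * QUERIES_PER_DAY
    return tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

with_prompt = daily_cost(SYSTEM_PROMPT_TOKENS)   # prompt resent every call
distilled = daily_cost(0)                        # prompt baked into weights
print(f"${with_prompt:,.0f}/day vs ${distilled:,.0f}/day")
```

Under these assumptions the resent prompt accounts for the overwhelming majority of daily input-token spend, which is the overhead distillation aims to eliminate.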
“Enterprises often use long system prompts to enforce safety constraints (e.g., hate speech detection) or to provide domain-specific expertise (e.g., medical knowledge),” said Tianzhu Ye, co-author of the paper and researcher at Microsoft Research Asia, in comments provided to VentureBeat. “However, lengthy prompts significantly increase computational overhead and latency at inference time.”
The main idea behind context distillation is to train a model to internalize the information that you repeatedly insert into the context. Like other distillation techniques, it follows a teacher-student paradigm. The teacher is an AI model that receives the massive, detailed prompt. Because it has all the instructions and reference documents, it generates highly tailored responses. The student is a model being trained that only sees the main question and doesn’t have access to the full context. Its goal is simply to observe the teacher’s responses and learn to mimic its behavior.
Through this training process, the student model effectively compresses the complex instructions from the teacher’s prompt directly into its parameters. For an enterprise, the primary value happens at inference time. Because the student model has internalized the context, you can deploy it in your application without needing to paste in the lengthy instructions again. This makes the model significantly faster to run, with far less computational overhead.
However, classic context distillation relies on a flawed training method called “off-policy training,” where the model is trained on fixed datasets that were collected before the training process. This is problematic in several ways. During training, the student is only exposed to ground-truth data and teacher-generated answers, creating what Ye calls “exposure bias.” In production, the model must come up with its own token sequences to reach those answers. Because it never practiced making its own decisions or recovering from its own mistakes during training, it can easily derail when operating independently. It’s like showing a student videos of a professional driver and expecting them to learn driving without trial and error.
Another problem is the “forward Kullback-Leibler (KL) divergence” minimization measure used to train the model. Under this method, the model is graded on how similar its answers are to the teacher, which encourages “mode-covering” behavior, Ye says. The student model is often smaller or lacks the rich context the teacher had, meaning it simply lacks the capacity to perfectly replicate the teacher’s complex reasoning. Because the student is forced to try and cover all those possibilities anyway, its underlying guesses become overly broad and unfocused.
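The mode-covering versus mode-seeking distinction can be made concrete with toy distributions. Here a “teacher” puts probability mass on two distinct modes that a limited student cannot both match; forward KL prefers a vague student that spreads mass over both, while reverse KL prefers a student that commits to one mode:

```python
import math

# Toy illustration of forward vs reverse KL. The teacher has two modes a
# limited student cannot both match; the student must spread out or commit.

def kl(p, q):
    # KL(p || q) over discrete distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [0.49, 0.02, 0.49]   # bimodal teacher distribution
spread  = [0.34, 0.32, 0.34]   # covers both modes, vague everywhere
commit  = [0.90, 0.05, 0.05]   # picks one mode decisively

# Forward KL(teacher || student) grades coverage of the teacher's modes.
assert kl(teacher, spread) < kl(teacher, commit)   # rewards the vague student

# Reverse KL(student || teacher) penalizes mass where the teacher has none.
assert kl(commit, teacher) < kl(spread, teacher)   # rewards the focused student
```

The second assertion is the behavior Ye describes: under reverse KL, the student is better off being confidently right about one mode than diffusely hedging across all of them.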
In real-world applications, this can result in hallucinations, where the AI gets confused and confidently makes things up because it is trying to mimic a depth of knowledge it does not actually possess. It also means that the model cannot generalize well to new tasks.
To fix the critical issues with the old teacher-student dynamic, the Microsoft researchers introduced On-Policy Context Distillation (OPCD). The most important shift in OPCD is that the student model learns from its own generation trajectories as opposed to a static dataset (which is why it is called “on-policy”). Instead of passively studying a dataset of the teacher’s perfect outputs, the student is given a task without seeing the massive instruction prompt and has to generate an answer entirely on its own.
As the student generates its answer, the teacher acts as a live instructor. The teacher has access to the full, customized prompt and evaluates the student’s output. At every step along the student’s generation, the system compares the student’s token distribution against what the context-aware teacher would do.
OPCD uses “reverse KL divergence” to grade the student. “By minimizing reverse KL divergence, it promotes ‘mode-seeking’ behavior. It focuses on high-probability regions of the student’s distribution,” Ye said. “It suppresses tokens that the student considers unlikely, even if the teacher’s belief assigned them high probability. This alignment helps the student correct its own mistakes and avoid the broad, hallucinatory distributions of standard distillation.”
Because the student model actively practices making its own decisions and learns to correct its own mistakes during training, it behaves more reliably when deployed in a live application. It successfully bakes complex business rules, safety constraints, or specialized knowledge directly into its permanent memory.
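A schematic OPCD training step, under the mechanics described above, might look like the following. This is a sketch of the idea, not the paper’s implementation: the student rolls out its own answer, the frozen prompt-aware teacher scores each of the student’s own tokens, and the loss is the per-token reverse KL:

```python
import math

# Schematic OPCD step (not the paper's code): the student generates its OWN
# rollout (on-policy); at each token the frozen, prompt-aware teacher supplies
# reference probabilities; the loss is reverse KL(student || teacher),
# averaged over the rollout.

def reverse_kl(student_probs, teacher_probs):
    return sum(s * math.log(s / t)
               for s, t in zip(student_probs, teacher_probs) if s > 0)

def opcd_step(student_rollout, teacher_for):
    # student_rollout: per-token distributions from the student's own sample
    losses = []
    for step, s_probs in enumerate(student_rollout):
        t_probs = teacher_for(step)      # teacher scores the student's path
        losses.append(reverse_kl(s_probs, t_probs))
    return sum(losses) / len(losses)     # gradient of this updates the student

# Toy rollout: 2 generation steps over a 3-token vocabulary.
rollout = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
teacher = lambda step: [[0.6, 0.3, 0.1], [0.05, 0.9, 0.05]][step]
loss = opcd_step(rollout, teacher)
```

The crucial on-policy detail is that `rollout` comes from the student’s own sampling, so the teacher grades the paths the student actually takes, including its mistakes, rather than a fixed dataset of teacher outputs.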
The researchers tested OPCD in two key areas: experiential knowledge distillation and system prompt distillation. For experiential knowledge distillation, the researchers wanted to see if an LLM could learn from its own past successes and permanently adopt those lessons. They tested this on models of various sizes, using mathematical reasoning problems.
First, the model solved problems and was asked to write down general rules it learned from its successes. Then, using OPCD, they baked those written lessons directly into the model’s parameters. The results showed that the models improved dramatically without needing the learned experience pasted into their prompts anymore. On complex math problems, an 8-billion-parameter model improved from a 75.0% baseline to 80.9%. For example, on the Frozen Lake navigation game, a small 1.7-billion parameter model initially had a success rate of 6.3%. After OPCD baked in the learned experience, its accuracy jumped to 38.3%.
The second set of experiments was on long system prompts. Enterprises often use massive system prompts to enforce strict behavioral guidelines, like maintaining a professional tone, ensuring medical accuracy, or filtering out toxic language. The researchers tested whether OPCD could permanently bake these dense behavioral rules into the models so they would not have to be sent with every single user query. Their experiments show that OPCD successfully internalized these complex rules and massively boosted performance. When testing a 3-billion parameter Llama model on safety and toxicity classification, the base model scored 30.7%. After using OPCD to internalize the safety prompt, its accuracy spiked to 83.1%. On medical question answering, the same model improved from 59.4% to 76.3%.
One of the key challenges of fine-tuning models is catastrophic forgetting, where the model becomes too focused on the fine-tune task and worse at general tasks. The researchers tracked out-of-distribution performance to test for this tunnel vision. When they distilled strict safety rules into a model, they immediately tested its ability to answer unrelated medical questions. OPCD successfully maintained the model’s general medical knowledge, outperforming the old off-policy methods by approximately 4 percentage points. It specialized without losing its broader intelligence.
While OPCD is a powerful tool for internalizing static knowledge and complex rules, it does not replace all external context methods. “RAG is better when the required information is highly dynamic or involves a massive, frequently updated external database that cannot be compressed into model weights,” Ye said.
For enterprise teams evaluating their pipelines, adopting OPCD does not require overhauling existing systems or investing in specialized hardware. “OPCD can be integrated into existing workflows with very little friction,” Ye said. “Any team already running standard RLVR [Reinforcement Learning from Verifiable Rewards] pipelines can adopt OPCD without major architectural changes.”
In practice, the student model acts as the policy model performing rollouts, while the frozen teacher model serves as a reference providing logits. The hardware requirements are highly accessible. According to Ye, enterprise teams can reproduce the researchers’ experiments using about eight A100 GPUs.
The data requirements are similarly lightweight. For experiential knowledge distillation, developers only need around 30 seed examples to generate solution traces. Because the technique is applied to previously unoptimized environments, even a small amount of data yields the majority of the performance improvement. For system prompt distillation, existing optimized prompts and standard task datasets are sufficient.
The researchers built their own implementation on verl, an open-source RLVR codebase, proving that the technique fits cleanly within conventional reinforcement learning frameworks. They plan to release their implementation as open source following internal reviews.
Looking ahead, OPCD paves the way for genuinely self-improving models that continuously adapt to bespoke enterprise environments. Once deployed, a model can extract lessons from real-world interactions and use OPCD to progressively internalize those characteristics without requiring manual supervision or data annotation from model trainers.
“This represents a fundamental paradigm shift in model improvement: the core improvements to the model would move from training time to test time,” Ye said. “Using the model—and allowing it to gather experience—would become the primary driver of its advancement.”
When your token usage averages 8 billion a day, you have a massive scale problem.
This was the case at AT&T, and chief data officer Andy Markus and his team recognized that it simply wasn’t feasible (or economical) to push everything through large reasoning models.
So, when building out an internal Ask AT&T personal assistant, they reconstructed the orchestration layer. The result: a multi-agent stack built on LangChain, in which large language model “super agents” direct smaller, underlying “worker” agents that perform narrower, purpose-driven work.
This flexible orchestration layer has dramatically improved latency, speed and response times, Markus told VentureBeat. Most notably, his team has seen up to 90% cost savings.
“I believe the future of agentic AI is many, many, many small language models (SLMs),” he said. “We find small language models to be just about as accurate, if not as accurate, as a large language model on a given domain area.”
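Stripped of the LangChain specifics, the super-agent/worker pattern Markus describes is a routing layer: a large model classifies the request, then a small purpose-built worker handles it. The sketch below is purely illustrative — the function names and keyword-based classifier are hypothetical stand-ins, not AT&T’s actual code:

```python
def classify_intent(request: str) -> str:
    """Stand-in for the LLM 'super agent' that routes requests.
    A real system would use a language model here, not keywords."""
    if "sql" in request.lower() or "query" in request.lower():
        return "nl_to_sql"
    if "invoice" in request.lower() or "document" in request.lower():
        return "doc_processing"
    return "general"

WORKERS = {
    # Each worker would wrap a small language model (SLM) tuned for one
    # narrow domain; here they are plain functions for illustration.
    "nl_to_sql": lambda req: f"[sql-worker] handling: {req}",
    "doc_processing": lambda req: f"[doc-worker] handling: {req}",
    "general": lambda req: f"[general-worker] handling: {req}",
}

def handle(request: str) -> str:
    """The 'super agent' step: classify, then dispatch to a worker."""
    intent = classify_intent(request)
    return WORKERS[intent](request)

print(handle("Convert this question to a SQL query"))
```

The cost savings Markus cites come from the dispatch step: only the routing decision needs a large model, while the bulk of the token volume flows through cheaper, domain-specific workers.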
Most recently, Markus and his team used this re-architected stack along with Microsoft Azure to build and deploy Ask AT&T Workflows, a graphical drag-and-drop agent builder for employees to automate tasks.
The agents pull from a suite of proprietary AT&T tools that handle document processing, natural language-to-SQL conversion, and image analysis. “As the workflow is executed, it’s AT&T’s data that’s really driving the decisions,” Markus said. Rather than asking general questions, “we’re asking questions of our data, and we bring our data to bear to make sure it focuses on our information as it makes decisions.”
Still, a human always oversees the “chain reaction” of agents. All agent actions are logged, data is isolated throughout the process, and role-based access is enforced when agents pass workloads off to one another.
“Things do happen autonomously, but the human on the loop still provides a check and balance of the entire process,” Markus said.
AT&T doesn’t take a “build everything from scratch” mindset, Markus noted; it relies instead on models that are “interchangeable and selectable” and on “never rebuilding a commodity.” As functionality matures across the industry, the team will deprecate homegrown tools in favor of off-the-shelf options, he explained.
“Because in this space, things change every week, if we’re lucky, sometimes multiple times a week,” he said. “We need to be able to pilot, plug in and plug out different components.”
They run “really rigorous” evaluations of available options as well as their own; for instance, their Ask Data with Relational Knowledge Graph has topped the Spider 2.0 text-to-SQL accuracy leaderboard, and other tools have scored highly on the BIRD SQL benchmark.
In the case of homegrown agentic tools, his team uses LangChain as a core framework, grounds models with standard retrieval-augmented generation (RAG) and other in-house algorithms, and partners closely with Microsoft, using the tech giant’s search functionality for its vector store.
Ultimately, though, it’s important not to fuse agentic AI or other advanced tools into everything for the sake of it, Markus advised. “Sometimes we overcomplicate things,” he said. “Sometimes I’ve seen a solution over-engineered.”
Instead, builders should ask themselves whether a given tool actually needs to be agentic. That means asking questions like: What accuracy level could be achieved with a simpler, single-turn generative solution? How could the problem be broken into smaller pieces, each of which could be delivered “way more accurately,” as Markus put it?
Accuracy, cost and tool responsiveness should be core principles. “Even as the solutions have gotten more complicated, those three pretty basic principles still give us a lot of direction,” he said.
Ask AT&T Workflows has been rolled out to 100,000-plus employees. More than half say they use it every day, and active adopters report productivity gains as high as 90%, Markus said.
“We’re looking at, are they using the system repeatedly? Because stickiness is a good indicator of success,” he said.
The agent builder offers “two journeys” for employees. One is pro-code, where users can program Python behind the scenes, dictating rules for how agents should work. The other is no-code, featuring a drag-and-drop visual interface for a “pretty light user experience,” Markus said.
Interestingly, even proficient users are gravitating toward the latter option. At a recent hackathon geared to a technical audience, participants were given a choice between the two, and more than half chose the no-code option. “This was a surprise to us, because these people were all very competent in the programming aspect,” Markus said.
Employees are using agents across a variety of functions; for instance, a network engineer may build a series of them to address alerts and reconnect customers when they lose connectivity. In this scenario, one agent can correlate telemetry to identify the network issue and its location, pull change logs and check for known issues. Then, it can open a trouble ticket.
Another agent could then come up with ways to solve the issue and even write new code to patch it. Once the problem is resolved, a third agent can then write up a summary with preventative measures for the future.
“The [human] engineer would watch over all of it, making sure the agents are performing as expected and taking the right actions,” Markus said.
That same engineering discipline — breaking work into smaller, purpose-built pieces — is now reshaping how AT&T writes code itself, through what Markus calls “AI-fueled coding.”
He compared the process to RAG: devs use agile coding methods in an integrated development environment (IDE) along with “function-specific” build archetypes that dictate how code should interact.
The output is not loose code; the code is “very close to production grade,” and could reach that quality in one turn. “We’ve all worked with vibe coding, where we have an agentic kind of code editor,” Markus noted. But AI-fueled coding “eliminates a lot of the back and forth iterations that you might see in vibe coding.”
He sees this coding technique as “tangibly redefining” the software development cycle, ultimately shortening development timelines and increasing output of production-grade code. Non-technical teams can also get in on the action, using plain language prompts to build software prototypes.
His team, for instance, has used the technique to build an internal curated data product in 20 minutes; without AI, building it would have taken six weeks. “We develop software with it, modify software with it, do data science with it, do data analytics with it, do data engineering with it,” Markus said. “So it’s a game changer.”
ServiceNow is handling 90% of its own employee IT requests autonomously, resolving cases 99% faster than human agents. On Thursday, it announced the products and technology it plans to use to do the same for everyone else.
Organizations have spent three years running pilots that stall when AI gets to the execution layer. The agent can identify the problem and recommend a fix, then hand it back to a human because it lacks the permissions to finish the job or because no one trusts it to act autonomously inside a governed environment.
The gap most teams are hitting isn’t capability. It’s governance and workflow continuity.
ServiceNow’s answer is a new framework called Autonomous Workforce; a new employee-facing product called EmployeeWorks built on its December acquisition of Moveworks; and an underlying architectural approach it calls “role automation.”
ServiceNow has been building toward this for two decades. The platform started as a ticketing system, evolved into a workflow automation engine, and spent the last two years layering AI onto that foundation through its Now Assist product.
What’s different is that the new approach stops treating AI as a feature sitting on top of workflows and starts treating it as a worker operating inside them. That shift, from AI that assists to AI that executes, is where the broader enterprise market is headed. ServiceNow is making a specific architectural bet about how to get there.
The announcement has three parts: ServiceNow EmployeeWorks lets employees describe a problem in plain language and have it fixed without filing a ticket; Autonomous Workforce executes work end to end; and role automation is the architectural layer that governs how those specialists operate inside existing enterprise permissions.
Most enterprise AI assistants, including Microsoft Copilot and Google Gemini, require employees to know which tool handles which problem. Moveworks, which had 5.5 million enterprise users before the December acquisition, was built around a single entry point that routes across that ambiguity automatically.
Bhavin Shah, founder of Moveworks and now SVP at ServiceNow following the acquisition, framed the problem directly in a briefing with press and analysts.
“Over the last two years, organizations have raced to adopt AI, but in many cases that rush has created fragmented tools, disconnected AI experiences and employees bouncing between systems just to get simple things done,” he said.
ServiceNow is proposing a new architectural layer it calls role automation, and it differs from the agents most enterprises are already running.
Conventional AI agents are task-oriented: they’re given a goal, they reason toward it and in doing so they figure out what they’re allowed to do at runtime. That creates problems in enterprise environments where governance, audit trails and permission boundaries aren’t optional.
With role automation, an AI specialist does not reason its way into permissions. It inherits them. The same access control framework, CMDB (configuration management database) context, SLA (service level agreement) logic and entitlement rules that govern human workers on the ServiceNow platform govern the AI specialist from the moment it is deployed. It cannot exceed its defined scope. It cannot self-escalate privileges based on what it learns mid-task.
The company draws a three-tier distinction: task agents handle individual automation steps, agentic workflows mix deterministic and probabilistic execution, and role automation sits above both as a fully virtualized employee role with defined responsibilities and pre-inherited governance.
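The governance distinction can be made concrete with a toy sketch: a specialist whose permission set is fixed at deploy time, which logs every action and escalates rather than improvises when asked to exceed its scope. The class and method names below are hypothetical illustrations, not ServiceNow’s API:

```python
class ScopeViolation(Exception):
    """Raised when a specialist is asked to act outside its role."""

class RoleSpecialist:
    def __init__(self, role: str, permissions: frozenset):
        self.role = role
        # frozenset: scope is fixed at deployment and immutable -- the
        # agent cannot self-escalate privileges mid-task.
        self.permissions = permissions
        self.audit_log = []  # every decision is recorded

    def act(self, action: str, target: str) -> str:
        if action not in self.permissions:
            # Outside defined scope: escalate to a human, don't improvise.
            self.audit_log.append(("escalated", action, target))
            raise ScopeViolation(f"{self.role} cannot '{action}'; escalating")
        self.audit_log.append(("executed", action, target))
        return f"{action} on {target}: done"

# An L1 service-desk specialist with a pre-inherited, bounded scope:
desk = RoleSpecialist(
    "L1 Service Desk",
    frozenset({"reset_password", "provision_software"}),
)
desk.act("reset_password", "user42")       # within scope: executes
try:
    desk.act("delete_database", "prod")    # outside scope: escalates
except ScopeViolation:
    pass
```

The contrast with a conventional task agent is that the check happens before any reasoning: the permission boundary is data the specialist inherits, not a conclusion it reaches at runtime.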
The first product built on this architecture, the Level 1 Service Desk AI Specialist, handles common IT requests end to end — password resets, software access provisioning and network troubleshooting — documenting each resolution and escalating to a human agent only when it hits something outside its defined scope.
Alan Rosa has seen what happens when AI governance fails in healthcare. As CISO and SVP of infrastructure and operations at CVS Health, he manages AI deployment across 300,000 employees where compliance isn’t optional.
Speaking at the same briefing, his framework for scaling AI maps directly onto what ServiceNow is claiming architecturally. CVS Health was already a customer of both ServiceNow and Moveworks before the December acquisition. Rosa said the combination of the two platforms is encouraging and that the potential is “coming to life,” though CVS Health has not committed publicly to deploying Autonomous Workforce.
“Boring is beautiful,” Rosa said. “Predictable. Stable. You have to start with responsible, explainable AI. No bias, no hallucinations, clear guardrails. Everyone understands the rules.”
On the temptation to chase the newest AI capabilities before governance is in place, he was direct: “Don’t chase butterflies. Focus on gritty, unsexy, operational use cases. The ones with real ROI that have an impact on people’s lives.”
Rosa’s approach treats AI as a continuously evolving set of capabilities requiring dynamic rather than static testing. CVS Health runs every AI use case through clinical, legal, privacy and security review before it touches production.
“Static review doesn’t cut it when AI is learning and adapting,” he said. “Wash, rinse, repeat.”
Rosa’s framework requires governance to be embedded in the deployment architecture from the start, not retrofitted after a problem surfaces. That is precisely the claim ServiceNow is making about role automation. AI specialists that inherit existing enterprise permissions and workflow logic are structurally less likely to break governance boundaries than agents that determine their own scope at runtime.
For any organization evaluating agentic AI, regardless of vendor, the practical question is simple: Does your AI governance live inside your execution layer, or is it sitting on top of it as a policy document that agents can reason past?
That is what ServiceNow is trying to solve with Autonomous Workforce and EmployeeWorks, baking governance and workflow context directly into the agentic layer rather than bolting it on afterward. For practitioners, the starting point is governance architecture, not capability. Before deploying any agentic AI, map where your permissions, workflow logic and audit requirements actually live. If that foundation isn’t in place, no agent framework will hold at enterprise scale.
“Scale and trust go together,” Rosa said. “If you lose trust, you lose the right to scale.”
Perplexity, the AI-powered search company valued at $20 billion, on Wednesday launched what it calls the most ambitious product in its three-year history: a multi-model agent orchestration platform called Computer that coordinates 19 different AI models to complete complex, long-running workflows entirely in the background.
The product, currently available only to Perplexity Max subscribers at $200 per month, is the company’s clearest articulation yet of a thesis it has been refining for more than a year: that AI models are not converging into general-purpose commodities but are instead specializing — and that the company best positioned to win the next era of AI is the one that can orchestrate all of them together.
“What has Perplexity been up to last two months? We’ve silently been working on the next big thing,” CEO Aravind Srinivas wrote on X, announcing that “Computer unifies every current capability of AI into a single system.” Srinivas said the system treats models as interchangeable tools rather than core products. “It’s multi-model by design,” he wrote. “When models specialise, they just become tools similar to the file system, CLI tools, connectors, browser, search.”
Computer arrives at a moment when the AI industry is grappling with a fundamental question: now that foundation models have become extraordinarily capable, who captures the value? The model makers — OpenAI, Anthropic, Google — or the companies that sit above them and turn raw intelligence into reliable, accurate products?
Perplexity is making a $20 billion bet on the latter.
At its core, Computer functions as what Perplexity describes as “a general-purpose digital worker” — a system that can accept a high-level objective from a user, decompose it into subtasks, and delegate those subtasks to whichever AI model is best suited for each one. The Verge described it as existing “somewhere between OpenClaw and Claude Cowork,” referring to the viral open-source autonomous agent and Anthropic’s enterprise collaboration tool, respectively.
The platform’s central reasoning engine runs on Anthropic’s Claude Opus 4.6, which handles orchestration logic and coding tasks. Google’s Gemini powers deep research queries. Google’s Nano Banana generates images, and Veo 3.1 handles video. xAI’s Grok is deployed for lightweight, speed-sensitive tasks. OpenAI’s GPT-5.2 manages long-context recall and expansive web search. In total, the system coordinates 19 models on the backend, according to the company.
That model roster is not fixed. Perplexity says new models can be added as they demonstrate strength in specific domains, and the existing lineup will shift as models evolve. Users can also step into the orchestrator role themselves, manually assigning subtasks to particular models if they prefer.
What makes Computer distinct from existing agent tools is its combination of scope and accessibility. A user can describe a desired outcome — say, “Plan a weeklong trip to Japan, find flights under $1,200, and build a full itinerary with restaurant reservations” — and Computer will autonomously break that project into components, assign each to the right model, and work on it in the background. Perplexity says the system can operate quietly for extended periods, checking in with the user only when it genuinely needs input.
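Conceptually, that loop is a decompose-and-route step. The sketch below is illustrative only — the task-to-model mapping mirrors the roster reported above, but the code, function names, and hard-coded decomposition are hypothetical, not Perplexity’s implementation:

```python
# Task types mapped to the specialist models reported for Computer.
MODEL_FOR_TASK = {
    "orchestration": "claude-opus-4.6",
    "coding": "claude-opus-4.6",
    "deep_research": "gemini",
    "image": "nano-banana",
    "video": "veo-3.1",
    "fast_lookup": "grok",
    "long_context_search": "gpt-5.2",
}

def decompose(goal: str) -> list:
    """Stand-in for the orchestrator model breaking a high-level goal
    into (task_type, subtask) pairs; a real system would plan these
    with the central reasoning model, not a fixed template."""
    return [
        ("deep_research", f"research options for: {goal}"),
        ("fast_lookup", f"check prices for: {goal}"),
        ("long_context_search", f"compile sources for: {goal}"),
    ]

def run(goal: str) -> list:
    """Route each subtask to the model best suited for it."""
    return [
        f"{MODEL_FOR_TASK[task]} <- {subtask}"
        for task, subtask in decompose(goal)
    ]

for step in run("weeklong trip to Japan under $1,200"):
    print(step)
```

The user-override feature described above maps onto the same structure: letting the user edit the `(task_type, subtask)` plan directly instead of accepting the orchestrator’s decomposition.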
The intellectual foundation of Computer rests on data that Perplexity has been collecting across its enterprise customer base — data that, according to the company, no other AI company has access to at the same scale.
At a recent press briefing that VentureBeat attended with other reporters in San Francisco, Perplexity executives shared enterprise usage statistics that illustrated a dramatic shift in how businesses use AI models. In January 2025, more than 90 percent of enterprise tasks on the Perplexity platform were spread across just two models. By December 2025, no single model commanded more than 25 percent of usage across businesses and task types.
That shift, executives said, was driven partly by increasingly intelligent model routing on Perplexity’s side, and partly by a simple reality: models are getting better at different things, not the same things. A new frontier model emerged on average every 17.5 days in 2025, and each one brought distinct strengths rather than uniform improvement.
Claude, for instance, has emerged as the model of choice for software engineering tasks — a reputation so strong that even OpenClaw, the viral autonomous agent created by Austrian programmer Peter Steinberger (who was subsequently hired by OpenAI), was originally built on Claude’s code capabilities. But Claude’s strengths in coding do not translate to writing or creative generation, where Gemini tends to outperform. And in long-context retrieval and broad web search, GPT-5.2 holds advantages.
“What we’ve learned in this time is that they are not commoditizing. They’re specializing,” a senior Perplexity executive said at the briefing, characterizing Claude Opus 4.6 as “a terrible writer” while noting its coding prowess, and adding: “Everybody has job security on that one.”
This specialization dynamic creates what Perplexity sees as a structural advantage. A marketing team using Claude, executives argued, will generally produce worse results than one using Gemini. An engineering team using Gemini will underperform one using Claude. No company operates with only one type of team — and no single model can serve all of them equally well.
Computer’s launch arrives in the immediate wake of OpenClaw, the open-source autonomous agent that went viral earlier this month and prompted OpenAI to hire its creator. OpenClaw captured the imagination of the AI community by demonstrating what a fully autonomous agent could accomplish when given broad access to a user’s entire digital ecosystem — files, email, messaging apps, API keys, and more.
But it also demonstrated the risks. In a widely shared incident this week, Meta AI security researcher Summer Yue posted screenshots on X of her frantic attempts to stop OpenClaw from deleting her entire email inbox — a process the agent had initiated and was refusing to halt. “I had to RUN to my Mac Mini like I was diffusing a bomb,” Yue wrote.
Perplexity has been vocal about why Computer runs entirely in the cloud rather than accessing a user’s local machine — an approach taken by rivals like Anthropic’s Claude and OpenAI’s Operator.
The company argues that local access creates unnecessary risk, comparing it to malware in how easily it can damage data or expose sensitive information. Computer instead operates inside what Perplexity describes as a safe and secure development sandbox, meaning security failures are contained and cannot spread to a user’s primary network or device. The company also said it has run thousands of tasks internally using Computer, from publishing web copy to building apps.
The distinction extends to accessibility. Where OpenClaw requires terminal access, API key configuration, and a dedicated machine (typically a Mac mini), Computer is designed to be invoked from a phone, a Slack message, or the Perplexity app.
At the press briefing, executives elaborated on the philosophy, positioning Computer’s browser agent capabilities — built on Perplexity’s Comet browser technology — as central to the product. One executive noted that Perplexity’s browser agent usage numbers are three to five times higher than ChatGPT’s agent numbers published by The Information in January, despite Perplexity’s much smaller user base.
Perplexity’s product ambitions are backed by a business that, by the company’s own metrics, is growing faster than its user base — and executives say the company has barely begun to focus on monetization.
At the press briefing, executives disclosed that Perplexity grew users by 3.7x in 2025 and revenue by 4.7x, meaning the company is extracting more value from its existing users over time. Consumer subscriptions remain the largest revenue component, but the enterprise business is ramping with what executives acknowledged is a remarkably lean operation.
“We only have five people on our enterprise sales team,” one executive said, before adding that the company’s revenue per employee working on deals may be unmatched in the industry. Another executive noted that 92 percent of the Fortune 500 have Perplexity usage — though that figure encompasses employees signing up with personal accounts and work email addresses for the consumer version, not necessarily formal enterprise contracts.
A common enterprise sales conversation, executives said, starts with: “Did you know that there’s already 3,000 of your employees using Perplexity, and they’re using the consumer version that doesn’t adhere to all of your security policies?”
Notably, Perplexity is not pursuing advertising revenue, even as competitors like OpenAI move toward ad-supported models. Executives said advertising is fundamentally misaligned with the company’s accuracy mission. “The challenge with ads is, you know, a user will just start doubting everything,” one executive said. The company confirmed it has taken no economics on its shopping integrations and expressed doubt that any shopping-based monetization would materialize this year.
On the question of an IPO, Srinivas indicated the company has “very good properties of a company that can go public” given its low capital expenditure and healthy margins, but stopped short of committing to a timeline. Another executive warned that “a lot of IPO talk is hype” and that “if you over promise and under deliver the market punches you severely.”
TestingCatalog also reported this week that a new “Usage and Credits” settings area has appeared in Perplexity’s development builds, which would let users purchase additional credits to extend usage — potentially easing backlash from subscribers who saw their Deep Research query limits cut from roughly 500 per day to as few as 20 per month between late 2025 and early 2026.
Perhaps the least-discussed but most strategically significant element of Perplexity’s story is its search API business — an infrastructure play that positions the company not just as a consumer product but as a foundational layer for the broader AI ecosystem.
At the press briefing, executives revealed that Perplexity launched its search API approximately four months ago and already has four of the “Mag Seven” — the seven largest technology companies by market capitalization — using it in production at significant scale. “You guys cover the Mag Seven, you know that they don’t turn on a feature in production unless they’ve run rigorous evals and compared it,” one executive told reporters.
This disclosure suggests that the world’s largest technology companies have evaluated Perplexity’s search index against alternatives and concluded it is better optimized for AI-native use cases — a fundamentally different optimization target than Google’s traditional index, which was designed for humans scanning lists of links.
“Everything in our index is optimized, not for a human to see 10 blue links,” one executive explained. “It’s for an AI to be able to take those snippets and consume it in this context window and then reason through it.”
The company also confirmed it has fully independent search infrastructure, no longer relying on any third-party APIs from Google or Bing for its index — a significant departure from its earlier years.
For Chinese open-source models, which Perplexity uses in its orchestration stack, the company runs all inference from its own U.S. data centers, post-training the models for accuracy, removing what executives described as “state-infused propaganda,” and building custom inference kernels. The company open-sourced its methodology for depropagandizing Chinese models for others to use as well.
The search API creates a powerful data flywheel, executives argued: Perplexity can observe which snippets its search ranker surfaces for a given query, then track which of those snippets the LLM actually uses in its final output. That feedback loop makes the next query on a similar topic smarter — an advantage that pure API search businesses like Exa cannot replicate because they lack the consumer product generating user queries and feedback.
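The flywheel can be illustrated with a minimal sketch: log which sources were surfaced versus actually cited by the LLM, then derive a per-source usage rate that a ranker could fold into future scoring. This is a toy illustration of the feedback loop described, not Perplexity’s ranker:

```python
from collections import defaultdict

surfaced = defaultdict(int)  # times a source's snippet was shown to the LLM
used = defaultdict(int)      # times the LLM cited it in the final answer

def record(query_result: dict):
    """Log one query's outcome: what the ranker surfaced vs. what the
    model actually consumed in its answer."""
    for src in query_result["surfaced"]:
        surfaced[src] += 1
    for src in query_result["used"]:
        used[src] += 1

def usage_rate(src: str) -> float:
    """The feedback signal: of the times this source was surfaced,
    how often did the model actually use it?"""
    return used[src] / surfaced[src] if surfaced[src] else 0.0

record({"surfaced": ["a.com", "b.com", "c.com"], "used": ["a.com"]})
record({"surfaced": ["a.com", "b.com"], "used": ["a.com", "b.com"]})
# a.com was used both times it was surfaced; c.com never was -- a
# ranker can boost the former and demote the latter on similar queries.
```

The structural point the executives made is visible in the sketch: the `used` signal only exists if you own both the ranker and the consumer product generating queries, which a pure API search business cannot observe.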
Perplexity’s ambitions are not without complications. The company faces active lawsuits from multiple publishers, and the legal landscape grew more contentious this week.
As Business Insider’s Melia Russell reported, Perplexity filed a motion on February 24 in its ongoing legal battle with Dow Jones (publisher of The Wall Street Journal) and the New York Post, alleging that the publishers “cherry-picked” responses from Perplexity’s search engine to support their copyright claims. The company said it identified hundreds of prompts the publishers submitted that “were clear attempts to induce copyright-infringing answers,” including one instance where a user allegedly hit the “retry” button more than 50 times.
At the press briefing, Perplexity executives framed the broader copyright debate in historical terms, noting that waves of lawsuits have accompanied every major technology shift since radio. They expressed confidence that AI companies will ultimately prevail, particularly on the question of whether underlying knowledge — as distinct from unique creative expression — can be freely accessed by AI systems. “Countries have copyright law for one reason: to promote innovation,” one executive said, noting that the law protects unique expression while keeping the underlying knowledge open.
On user agents specifically, executives argued that a user’s AI agent is legally and technologically an extension of the user, not an independent actor. In the Amazon lawsuit, which challenges Perplexity’s ability to act as a purchasing agent on behalf of users, one executive offered a pointed analogy: “What Amazon’s claiming is that you shouldn’t be able to have your personal shopper be employed by you. It needs to be employed by them. They want you to use Rufus.”
Executives also clarified the company’s approach to citations, noting that citing a source like The New York Times (which is currently suing the company) does not necessarily mean Perplexity crawled that publication directly. “We can get the summary of that somewhere else, but we cite, we always try to cite that original source,” one executive said. “So drive that traffic to the New York Times if somebody clicks instead of driving them to a summary.”
Computer’s launch crystallizes a tension that has been building in the AI industry for months. The major model makers — OpenAI, Anthropic, Google — have been racing to build end-to-end products that keep users within their ecosystems. OpenAI’s Codex and ChatGPT, Anthropic’s Claude Code and Cowork, Google’s Gemini — all assume that one model family can handle the full range of user needs.
Perplexity is making the opposite bet: that the future belongs to the orchestration layer, not the model layer. It is a bet with historical parallels. In the early days of cloud computing, the companies that built the best abstraction layers above commodity infrastructure — not the infrastructure providers themselves — often captured outsized value. Perplexity is positioning itself as that abstraction layer for AI.
The risk, of course, is that model makers could restrict API access or degrade service to platform competitors. Srinivas has said he isn’t worried, noting that he received congratulations from Anthropic and Google after Computer’s launch and that model makers benefit when their systems are part of broader workflows. But the AI industry’s history of platform dynamics suggests this détente may not last forever.
For enterprise technology leaders evaluating their AI strategies, Computer raises a practical question: should organizations standardize on a single model provider’s ecosystem, accepting its limitations in exchange for simplicity? Or should they invest in multi-model orchestration, gaining access to the best capabilities across providers at the cost of additional complexity?
Perplexity is betting that as models continue to specialize and the gap between their respective strengths widens, the answer will become obvious. The company’s enterprise usage data — showing a market that went from two-model dominance to no-model dominance in just 12 months — suggests the shift is already underway.
Computer is currently available to Perplexity Max subscribers, with a rollout to Pro and Enterprise users planned in the coming weeks. The company has also announced a developer event on March 11, where it plans to share more details about its search API, ranking embeddings, and the infrastructure powering its orchestration stack.
For years, the “last mile” of digital transformation has been littered with forgotten PDFs and ignored training manuals.
Organizations spend millions on sophisticated software like SAP or Salesforce, only for employees to struggle with basic navigation. Now, as the era of agentic AI arrives, companies face a double-edged sword: they must teach human employees to collaborate with AI, while simultaneously teaching AI agents to navigate the labyrinthine interfaces of the modern enterprise.
One idea that seems to be gaining momentum among AI-forward businesses: using screen recordings and tutorial walkthroughs of someone performing an enterprise task — be it creating a new ticket or processing an invoice — and training AI to replicate the flow based on the screen capture. Just this week, a startup called Standard Intelligence went viral on X showing an early demo of an open-ended version of this for the physical and digital world.
But there are already players tackling this problem for the enterprise head-on. Case in point: Guidde, an Israeli startup born during the video-centric years of the COVID-19 pandemic, today announced an oversubscribed $50 million Series B funding round led by PSG Equity to address this exact knowledge infrastructure crisis.
Instead of feeding an agent a static PDF manual, Guidde provides high-fidelity “Video Ground Truth”—a rich stream of data captured from real human experts as they navigate complex software.
The investment signals a shift in how the tech industry views documentation—not as a static byproduct of work, but as the critical telemetry needed to train the next generation of autonomous digital agents.
At its core, Guidde is an AI Digital Adoption Platform (ADAP). However, its technological breakthrough lies in what happens behind the scenes during a recording.
Guidde isn’t just recording pixels; it is capturing every click, scroll, and latent interaction with the HTML page—the subtle pauses, the specific scroll depths, and the corrections a human makes when a system lags. This telemetry transforms raw video into a Vision-Language-Action (VLA) training set.
Meanwhile, the platform’s Magic Redaction automatically obscures sensitive data like passwords or credit card numbers during capture, ensuring materials remain secure and HIPAA-aligned.
“Every time you click a button, you drag-and-drop, you scroll, you type, we gather the interaction… all of it, we do cleanse it—there’s no private information,” explained Guidde co-founder and CEO Yoav Einav in an exclusive interview with VentureBeat.
Under the hood, the platform captures the underlying metadata and DOM (Document Object Model) changes synchronized with the video frames. The differentiator is the telemetry hidden beneath the surface.
This rich metadata creates a “digital world model” of enterprise software. And because each enterprise uses its own unique mix of apps and processes, Guidde is creating a data moat that allows enterprise agents to reason through legacy UIs with the same spatial awareness as a human, ensuring that automation actually works in a production environment rather than just a lab demo.
For a human, it’s a tutorial. For an AI agent, it is a high-fidelity map of the interface. This allows agents to “see” and reason through complex UIs the way humans do, solving the “last mile” of automation where agents previously failed due to lack of specific enterprise and in-situ usage context.
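To make the idea concrete, here is a minimal sketch of what one such telemetry record might look like, pairing a video frame with the DOM-level context of a user action and converting it into a vision-language-action training triple. Every field and function name here is illustrative; Guidde has not published its actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class InteractionEvent:
    """Hypothetical capture record: one user action plus its DOM context."""
    timestamp_ms: int          # position within the screen recording
    action: str                # "click", "scroll", "type", "drag"
    x: int                     # screen coordinates of the action
    y: int
    dom_selector: str          # CSS path of the element acted on
    dom_diff: dict = field(default_factory=dict)  # DOM changes after the action
    scroll_depth: float = 0.0  # fraction of the page scrolled at action time
    latency_ms: int = 0        # pause before the action (human hesitation)

def to_vla_sample(event: InteractionEvent, frame_id: str) -> dict:
    """Turn raw telemetry into a (vision, language, action) training triple."""
    return {
        "vision": frame_id,  # reference to the synchronized video frame
        "language": f"{event.action} on {event.dom_selector}",
        "action": {"type": event.action, "x": event.x, "y": event.y},
    }

event = InteractionEvent(4200, "click", 812, 344, "button#submit-invoice")
sample = to_vla_sample(event, "frame_000105")
print(sample["language"])  # click on button#submit-invoice
```

The key point the sketch illustrates is that the video frame alone is ambiguous; it is the synchronized selector, coordinates, and DOM diff that let an agent ground "what happened" in "where and on what."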
In a sense, Guidde is building the equivalent of a self-driving car for computer usage: a Waymo for enterprise software.
The platform has evolved into three distinct products designed to scale with an organization’s maturity:
Guidde Create: The engine for subject matter experts to turn workflows into documentation in minutes.
Guidde Broadcast: A personalized recommendation engine—often compared to Netflix—that delivers answers inside the tools people actually use. It knows who the user is and what department they are in to surface relevant content exactly when needed.
Guidde Discover: The newly launched “agentic” pillar. Like Waze mapping roads by observing drivers, Discover maps software routes by tracking how employees work. It understands the workflow, creates the content, and updates it automatically when the UI changes.
The most non-obvious aspect of Guidde’s growth is its dual-purpose mission. “We’re the only platform that trains both humans and agents,” Einav stated.
As companies roll out AI tools like Microsoft 365 Copilot or ServiceNow agents, they hit a proficiency gap. One of Guidde’s largest customers revealed they were paying over $1 million a year for a sophisticated AI tool, yet “nobody knows how to use them because they did like a 30-minute training session, and then that’s it.” Guidde closes this gap by providing “bite-sized” video tutorials in the flow of work.
Simultaneously, these videos train the AI agents themselves. Foundation models like Gemini or GPT-4 often hallucinate when tasked with specific enterprise workflows because they weren’t trained on the highly specific, internal “vanilla workflows” found in private enterprise systems. Guidde provides the “starting point,” the “metadata,” and the “x, y coordinates of the button” that an agent needs to complete an action without getting stuck.
To maintain this level of accuracy, Guidde employs a multimodal infrastructure. The system doesn’t rely on a single model; instead, it uses a “fleet” of models that evaluate one another.
Google Gemini: Generally used for visual tasks like analyzing PDFs or PowerPoints.
Anthropic Claude: Leveraged for writing the storyline and narrative scripts.
Feedback Loops: When a user edits a video, that data is fed back into the model to prevent the same mistakes from occurring in future captures.
This approach allows Guidde to replace a legacy stack of six or seven disconnected tools—Loom for capture, Adobe Premiere for editing, 11Labs for text-to-speech, and Synthesia for avatars—with a single, AI-native platform. “We basically pack everything for you,” Einav says, “and automate the entire process based on your brand guidelines.”
The genesis of Guidde lies in a frustration familiar to any product leader. Before founding the company, Einav and co-founder Dan Sahar spent years mastering video traffic at Qwilt, a company they started in 2010 to analyze how people watched Netflix and Disney+.
When COVID-19 hit, they saw a massive opportunity to apply that video expertise to the workplace. They observed that short video explainers could increase free-to-paid account conversions by 30%, but the friction of creating them was unsustainable.
In an interview, Einav recalled the “tedious work” of the old world: “My team in Israel were creating the content, someone in the US with a US accent was doing the narration, someone in the marketing team would write the script… and someone in the enablement team would do the edit.” This fragmented workflow meant a single video took two to three weeks to produce. “And then two weeks later, the product changes, and you need to redo it from scratch,” Einav added.
Guidde was built to collapse this cycle into seconds. By automating the “Magic Capture” of a workflow, the platform generates a structured narrative script and professional AI voiceover instantly. This removes the editing bottleneck, transforming subject matter experts into “training powerhouses.”
Guidde’s pricing structure reflects its transition from a utility to a core piece of enterprise infrastructure:
Free: $0 (Up to 25 videos, web-app support).
Pro: $18/creator/month (Unlimited videos, brand kits).
Business: $39/creator/month (Unlimited text-to-voice, analytics).
Enterprise: Custom pricing (Multi-language translation, SSO, Magic Redaction).
The platform’s impact is already visible in the numbers: a 41% reduction in video creation time and 34% fewer inbound support tickets.
For customers like Emerson, this translates to 40–60% quicker guide creation. Support teams, in particular, are finding they can offload 80% of their ticket volume with agents—but only if those agents have the content to be useful.
“The agent without the content is useless,” Einav warns, noting that most enterprise documentation is either years out of date or entirely undocumented.
Guidde already claims 4,500 enterprise customers and seeks to expand this number with its new round of funding. Support and operations leaders have been vocal about the platform’s ease of use. Christopher Cummings, VP of Client Experience at DocNetwork, highlighted its ability to provide “quick, personalized video responses to customer questions.”
Meanwhile, Wren Cotrone, a Director of Customer Support, noted that “Once you set the branding the way you want, you can really zoom through this stuff.”
Ronen Nir, Managing Director at PSG, summarized the investment thesis: “Guidde is solving one of the biggest blockers to successful AI adoption: the knowledge infrastructure.”
The paradigm shift from text-only LLMs to agentic video intelligence is the defining trend of 2026. Guidde’s Series B signals that the “ground truth” for enterprise agents will come from raw video observation, not static documentation.
By capturing how work gets done across tens of millions of workflows, Guidde is building a dataset that few others possess.
As Einav put it: “It starts with humans in the loop, and over time moves toward full autonomy.” For the modern enterprise, the map is no longer a static document—it’s a living, breathing video intelligence layer that guides both the workforce and the agents that support them.
Gong, the revenue intelligence company that has spent a decade turning recorded sales calls into data, today launched what it calls Mission Andromeda — its most ambitious platform release to date, bundling a new AI-powered coaching product, a sales-focused chatbot, unified account management tools, and open interoperability with rival AI systems through the Model Context Protocol.
The release arrives at a pivotal moment. The revenue technology market is consolidating at a pace that would have been unthinkable two years ago, and Gong — still a private company with roughly $300 million in annual recurring revenue — finds itself at the center of a category that Gartner only formally defined three months ago. Mission Andromeda is Gong’s answer to a basic question facing every enterprise AI vendor in 2026: Can you move beyond surfacing insights and actually change how people work?
“The whole show, Andromeda, is basically a collection of very significant capabilities that take us a huge step forward,” Eilon Reshef, Gong’s co-founder and chief product officer, told VentureBeat in an interview ahead of the launch. He described it as an effort to make revenue teams “more productive as individuals” and to give leaders “better decisions” — positioning the release not as a feature dump, but as an operating system upgrade.
Mission Andromeda contains four main components, each targeting a different layer of the sales workflow.
The headliner is Gong Enable, a brand-new product with its own pricing tier — Reshef described it as “in the tens of dollars per seat per month” — that attacks what the company sees as a gaping hole in most sales organizations: the disconnect between training and performance. Highspot and Seismic announced their intent to merge in February 2026, creating a combined enablement giant, and Gong is now moving directly onto their turf.
Gong Enable has three pieces. The first, AI Call Reviewer, analyzes completed customer calls and grades reps based on their organization’s own methodology. When asked whether this operates in real time, Reshef was direct: “For that particular agent, it’s post-call, because obviously you want to grade the whole call as a whole — maybe you didn’t do anything in minute one, minute 30.” The second piece, AI Trainer, lets reps practice high-stakes conversations — pricing objections, renewal risk scenarios — against AI-generated simulations built from the company’s own winning call patterns. The third, Initiative Tracking, links coaching programs to revenue metrics so leaders can see whether new behaviors actually show up in live deals.
Beyond Enable, the launch includes Gong Assistant, a conversational AI chatbot purpose-built for revenue teams that lets users ask questions about customer calls inside the platform. The release also introduces Account Console and Account Boards, which unify customer activity, risk signals, and next steps into a single view for sales and post-sales teams. And rounding out the package is built-in support for the Model Context Protocol, the open standard originally developed by Anthropic, enabling Gong to exchange data with AI systems from Microsoft, Salesforce, HubSpot, and others.
In a market where every company wants to claim proprietary AI supremacy, Reshef described a notably pragmatic approach to the models powering the new features. Gong uses both internal models and foundation models from external providers, he said, noting that “four out of the five leading AI companies, LLM, are basically Gong customers.”
The company picks models task by task. “Based on the product or task at hand, we pick the right model,” he said. “We would sometimes swap in and out a model if we feel it’s best for our customers and they get more and more power.” Reshef drew a clear line between what needs a large language model and what does not: “Our revenue prediction models are not using LLMs, but kind of the core interaction chatbots — of course, you’re going to use the foundation model.”
This approach contrasts with competitors that have hitched their wagon to a single AI provider. It also reflects a philosophical choice: Gong’s real moat, Reshef suggested, is not the models themselves but the data underneath — what the company calls the Revenue Graph, its proprietary layer that captures phone calls, Zoom meetings, emails, text messages, WhatsApp conversations, and more, stitching them together into a connected intelligence layer.
Storing and analyzing every customer conversation a sales team has raises obvious questions about privacy and data governance. Reshef was eager to address them head-on.
“We’ve been around the block for a long while — a little bit over a decade — with AI first,” he said. “Over the years, we’ve developed exactly those capabilities that are the most boring pieces of AI, which is: how do you collect the right data? How do you manage it? How do you manage permissions about it, retention policies, right to be forgotten?”
On the sensitive question of whether Gong trains its AI on customer data across accounts, Reshef drew a firm boundary. Training, he explained, happens per customer: “The majority of the training happens based on each customer’s data.” He pointed to large accounts like Cisco, which he said has 20,000 Gong users — enough data to train the AI Trainer from within their own environment. “AI Trainer can go mine what’s working in their environment. It might not work in their competitor’s environment — maybe their benefits are different, their objections are different.”
Cross-customer training, he said, happens “only in very, very rare cases, very safe based — like transcription. But we don’t do it for business-specific processes.”
Gong’s support for Model Context Protocol is perhaps the most strategically significant piece of the launch. The company now offers built-in client and server support for MCP, enabling organizations to connect Gong with other AI systems while maintaining clear controls over data access, usage, and provenance. Gong first announced MCP support in October 2025 at its Celebrate conference, where it revealed initial integrations with Microsoft Dynamics 365, Microsoft 365 Copilot, Salesforce Agentforce, and HubSpot CRM. Today’s launch builds on that foundation.
But Reshef did not sugarcoat MCP’s limitations. “MCP is very immature when it comes to security,” he told VentureBeat. The protocol lets enterprise AI systems share data and context, but trust remains the enterprise’s responsibility. He explained a two-sided model: Gong can pull data from partners like Zendesk through certified integrations, and simultaneously makes its own MCP server available so that tools like Microsoft Copilot can query Gong’s data. “It’s up to the company which connections they actually feel are secure enough,” he said. “The safest ones are the ones that we’ve kind of like certified in a way. But MCP is an open protocol. They can connect it to their own systems. We have no control over this.”
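MCP itself is a JSON-RPC 2.0 protocol, so the two-sided model Reshef describes boils down to which parties are allowed to send messages like the one below. This is an illustrative sketch of the request shape an MCP client (say, Microsoft Copilot) might send to an MCP server such as Gong's; the tool name and arguments are hypothetical, not Gong's actual API.

```python
import json

# Shape of a standard MCP "tools/call" request (JSON-RPC 2.0).
# The tool name "search_calls" and its arguments are invented for
# illustration; only the envelope follows the MCP specification.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",  # standard MCP method for invoking a server tool
    "params": {
        "name": "search_calls",                      # hypothetical tool
        "arguments": {"account": "Acme Corp", "limit": 5},
    },
}

payload = json.dumps(request)
print(payload)
```

The security question Reshef raises lives outside this message: nothing in the protocol itself says whether the server on the other end should be trusted with the query or with what comes back, which is why the enterprise must decide which connections are "secure enough."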
That candor matters. As MCP adoption accelerates across the enterprise software stack, security teams are scrambling to understand what happens when agentic AI systems start talking to each other without humans in the loop. Gong appears to be betting that transparency about the protocol’s immaturity will build more trust than marketing bravado.
When asked for hard numbers, Reshef offered a mix of platform-wide results and measured candor about the newest features. Existing Gong customers report roughly a 50 percent reduction in sales rep ramp time and 10 to 15 percent improvements in win rates, he said.
But on Gong Enable specifically, he acknowledged the product is still brand new. “The trainer has been in the market for literally, you know, days, a week,” he said. “I would probably lie to you if I said, ‘Hey, we’re already seeing people crushing it after taking three or four courses.'” For the earlier version of Enable that includes the AI Call Reviewer, however, he said customers are “definitely seeing a very high kind of skill improvement” and are attributing increases in win rates and quota attainment to those gains — though he conceded that “it’s always hard to do 100 percent attribution.”
Morningstar, one of Gong’s early adopters, offered a pre-launch endorsement. Rae Cheney, Director of Sales Enablement Technology at Morningstar, said in a statement that Gong Enable helped the firm “spend less time on status updates and more time on the work that actually moves deals.”
One of the more interesting threads in Reshef’s remarks concerned his view of AI autonomy — or rather, its limits. He pushed back on what he called a “common misperception about AI” — that it operates completely autonomously.
“There has to be a person in the middle, which I call operator,” he said. “It could be RevOps. It could be enablement. In the case of training, it could be analysts. Sometimes it could be even business leaders.” Those operators, he argued, are responsible for a “repeatable process of AI doing something, measuring the AI” and adjusting over time.
This philosophy extends to the AI Call Reviewer’s feedback. Gong does not dictate what the system trains on — enablement leaders choose. “We don’t decide what they want to train on. We let them choose,” Reshef said. “You iterate, you optimize, you see how it goes, and there has to be somebody in the organization who’s responsible for making sure this aligns with the business needs.”
That stance puts Gong at odds with the more aggressive “autonomous agent” rhetoric emerging from some competitors, and it may resonate with enterprise buyers who remain cautious about letting AI run unsupervised in revenue-critical workflows.
Mission Andromeda does not exist in a vacuum. The revenue AI landscape has been reshaped by a remarkable wave of consolidation over the past six months.
In a category-defining move, Clari and Salesloft merged in December 2025 to form what they called a “Revenue AI powerhouse,” combining roughly $450 million in ARR under new CEO Steve Cox. Just two weeks ago, Highspot and Seismic signed a definitive agreement to merge, creating a combined entity worth more than $6 billion focused on AI-powered sales enablement — the very same territory Gong is now invading with Enable.
Meanwhile, Gong was named a Leader in the inaugural 2025 Gartner Magic Quadrant for Revenue Action Orchestration, published in December. The company placed highest among the 12 vendors evaluated on both the “Ability to Execute” and “Completeness of Vision” axes and ranked first in all four evaluated use cases in Gartner’s companion Critical Capabilities report.
In his interview, Reshef did not name competitors directly, but he drew a clear contrast. “We’ve built a product from the ground up. It’s all organic,” he said. “All of the other players in the field have sort of stitched together tools. And obviously you can’t just get it to be a coherent product if you just stitch together tools. Some of them even have multiple logins.” That is a thinly veiled shot at the merged Clari-Salesloft entity, which Forrester has described as presenting a “bifurcated approach” — Salesloft serving frontline users while Clari supports management insights.
Reshef also pointed to growth as a competitive weapon. “We’re growing at the top deck side in terms of SaaS companies,” he said, adding that Gong hired roughly 200 R&D employees this year and plans to hire another 200. “It’s kind of a flywheel where we can invest more in R&D, we make the product better, we get more capabilities, more flexibility, more enterprise customers.”
Any discussion of Gong’s trajectory inevitably raises the question of a public offering. When asked directly, Reshef declined to comment: “I wouldn’t comment on IPO at this stage. No.”
The company has been on a clear growth arc. Gong has raised approximately $584 million to date, with its last official funding round valuing it at $7.25 billion — the culmination of a series of rapid jumps from $750 million in 2019 to $2.2 billion in 2020. The company reached an annual sales run rate of approximately $300 million in January 2025, driven largely by the adoption of AI, according to Calcalist.
But that valuation has since slipped. As Calcalist reported in November 2025, Gong is conducting a secondary round for company employees and investors at a valuation of roughly $4.5 billion — well below its 2021 peak. The offering is being conducted through Nasdaq’s private market platform and was in advanced stages at the time of the report. It is not yet clear whether the company has repriced employee options, some of which were issued at significantly higher valuations than the current secondary round. Gong told Calcalist that it “regularly receives inquiries from potential investors” but as a private company does not “engage in speculation.”
The structured quarterly launch cadence that Mission Andromeda inaugurates — complete with galactic naming conventions and coordinated product narratives — certainly resembles the kind of predictable, story-driven approach that public market investors reward. Reshef framed it differently: “We felt like having quarterly launches with a name, a mission, and a story around it makes it easier to work… It’s a good way to educate the market on a regular basis.”
Reshef’s most revealing comment came when he laid out the company’s long-term thesis: Gong aims to increase productivity for revenue professionals by 50 percent. “We’re not there yet,” he admitted. “I think we’re like at 20 to 30 — whatever, hard to measure.”
He broke the productivity gain into two categories. The first is making high-complexity human tasks — like conducting a live Zoom sales call — better, through coaching, training, and review. “I think there’s going to be a long while, if ever, that Zoom conversations are going to get replaced by bots,” he said. The second is automating the manual drudgery: call preparation, post-meeting summaries, follow-up emails, account research briefs.
The distinction matters because it frames Gong’s ambition not as replacing salespeople but as making them dramatically more effective — a message calibrated to appeal to the thousands of revenue leaders who control Gong’s buying decisions. Whether that thesis holds will depend on whether Gong Enable, the AI Trainer, and the rest of Mission Andromeda can deliver measurable gains in a market that has been burned before by tools that promise insight but struggle to change behavior.
Gong currently serves more than 5,000 companies worldwide. The Clari-Salesloft merger has produced a rival with deeper combined resources. The Highspot-Seismic combination is assembling a sales enablement colossus. And a new Gartner category means every enterprise buyer now has a framework for comparison shopping. The next twelve months will test whether Mission Andromeda is the release that cements Gong’s position at the center of the revenue AI category — or the last big swing before the consolidated giants close in.
“Our mission is to be at the forefront,” Reshef said. “If everybody else is doing 20 percent, we’re going to do 50. If everybody is going to do 50, we’re going to do 80.”
In the revenue AI wars, that kind of confidence is easy to project. Delivering on it, with brand-new products still days old and a market being remade around you in real time, is something else entirely.
Claude Code has become increasingly popular in the first year since its launch, and especially in recent months, as developers and non-technical users alike flock to AI unicorn Anthropic's hit coding agent to build, on their own and in days, full applications and websites that would once have required months of work and dedicated technical teams. It's not a stretch to say it helped spur the "vibe coding" boom: using plain English instead of programming languages to write software.
But it’s all been restricted to the desktop Claude Code apps and Terminal command-line interfaces and integrated development environments (IDEs) — until today. Now, Anthropic has added a new mode, Remote Control, that lets users issue commands to Claude Code from their iPhone or Android smartphone — starting with subscribers to Anthropic’s Claude Max ($100-$200 USD monthly) subscription tier.
Anthropic posted on X saying Remote Control will also make its way to Claude Pro ($20 USD monthly) subscribers in the future.
Announced earlier today by Claude Code Product Manager Noah Zweben, Remote Control is a synchronization layer that bridges local CLI environments with the Claude mobile app and web interface.
The feature allows developers to initiate a complex task in their terminal and maintain full control of it from a phone or tablet, effectively decoupling the AI agent from the physical workstation.
Currently, Remote Control is available as a Research Preview for subscribers on the Claude Max tier. While access for Claude Pro ($20/month) users is expected shortly, the feature remains a high-end tool for power users and is notably absent from Team or Enterprise plans during this initial phase.
To access the feature, users must update to Claude Code version 2.1.52 or later, then execute the command claude remote-control or use the in-session slash command /rc. Once active, the terminal displays a QR code that, when scanned, opens a responsive, synchronized session in the Claude mobile app.
The messaging behind the release centers on the preservation of a developer’s “flow state.”
In his announcement, Zweben framed the update as a lifestyle upgrade rather than just a technical one, encouraging users to “take a walk, see the sun, walk your dog without losing your flow.”
This “Remote Control” is not a cloud-based replacement for local development, but a portal into it. According to official documentation, the core value is that “Claude keeps running on your machine, and you can control the session from the Claude app.”
This ensures that local context—filesystem access, environment variables, and Model Context Protocol (MCP) servers—remains active and reachable even if the user is miles away from their desk.
Claude Code Remote Control functions as a secure bridge between your local terminal and Anthropic’s cloud interface, which provides the Anthropic AI models, Opus 4.6 and Sonnet 4.6, that power Claude Code.
When you run the command, your desktop machine initiates an outbound connection to Anthropic’s API for serving the models — meaning you aren’t opening any “inbound” ports or exposing your computer to the open web. Instead, your local machine polls the API for instructions.
When you visit the session URL or use the Claude app, you are essentially using those devices as a “remote window” to view and command the process still running on your computer. Your files and MCP servers never leave your machine; only the chat messages and tool results flow through the encrypted bridge.
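The outbound-only pattern described above, including the automatic reconnection behavior, can be sketched in a few lines. Everything here is an assumption for illustration: the function names, the command shapes, and the stop signal are invented, not Anthropic's actual Remote Control API; only the pattern (poll outward, execute locally, back off and retry on network failure) reflects the description.

```python
import time

def poll_for_instructions(fetch, execute, max_backoff=60):
    """Long-poll a relay for commands; back off and retry on network errors.

    fetch:   callable making an outbound request for the next queued command
             (returns None when nothing is queued) -- no inbound ports needed.
    execute: callable running the command against the local files/MCP servers.
    """
    backoff = 1
    while True:
        try:
            command = fetch()      # outbound HTTPS poll to the relay
            backoff = 1            # successful poll: reset the retry backoff
            if command is None:
                continue           # nothing queued yet; poll again
            if command == "stop":
                break              # hypothetical session-end signal
            execute(command)       # runs locally, next to the real filesystem
        except ConnectionError:
            time.sleep(backoff)            # network dropped: wait, then retry
            backoff = min(backoff * 2, max_backoff)

# Demo with a stubbed transport: two commands queued, then a stop signal.
queue = ["ls", "git status", "stop"]
ran = []
poll_for_instructions(fetch=lambda: queue.pop(0), execute=ran.append)
print(ran)  # ['ls', 'git status']
```

The design consequence is exactly what the article notes: because every connection originates from the laptop, the phone never talks to the machine directly, and a dropped network simply pauses the loop until polling succeeds again.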
To get started, ensure you are on a Pro or Max plan and have authenticated your CLI using the /login command. Simply navigate to your project directory and run claude remote-control to initialize the session. The terminal will then generate a unique session URL and a QR code (toggleable via the spacebar) for your mobile device.
Once you open that link on your phone, tablet, or another browser, the two surfaces stay in perfect sync—allowing you to start a task at your desk and continue it from the couch while maintaining full access to your local filesystem and project configuration.
Prior to this official release, the developer community went to great lengths to “hack” mobile access into their terminal-based workflows.
Power users frequently relied on a patchwork of third-party tools like Tailscale for secure tunneling, Termius or Termux for mobile SSH access, and Tmux for session persistence.
Some developers even built complex custom WebSocket bridges just to get a responsive mobile UI for their local Claude sessions.
These unofficial solutions, while functional, were often brittle and prone to timeout issues. Remote Control replaces these workarounds with a native streaming connection that requires no port forwarding or complex VPN configurations.
It also includes automatic reconnection logic: if a user’s laptop sleeps or the network drops, the session remains alive in the background and reconnects as soon as the host machine is back online.
The launch of Remote Control serves as an “escalation of force” in what has become a dominant business for Anthropic. As of February 2026, Claude Code has hit a $2.5 billion annualized run rate — a figure that has more than doubled since the start of the year alone.
Claude Code is currently experiencing its “ChatGPT moment,” surging to 29 million daily installs within Visual Studio Code. Its efficiency is no longer theoretical; recent analysis suggests that 4% of all public GitHub commits worldwide are now authored by Claude Code.
By extending this power to mobile, Anthropic is further entrenching its lead in the “agentic” coding space, moving beyond simple autocomplete to a world where the AI acts as an autonomous collaborator.
The move toward mobile terminal control signals a broader shift in the software market. We are entering an era where AI tools are writing roughly 41% of all code. For developers, this translates to a migration from “line-by-line” typing to “strategic oversight.”
This trend is likely to accelerate as mobile-tethered agents become the norm. The barrier between “idea” and “production” is collapsing, enabling a single developer to manage complex systems that previously required entire DevOps teams. This shift has already rattled the broader tech market; shares of major cybersecurity firms like CrowdStrike and Datadog fell as much as 11% following the launch of Claude Code’s automated security scanning features.
As Claude Code moves from the desk to the pocket, the definition of a “software engineer” is being rewritten. In the coming year, the industry may see a surge in “one-person unicorns”—startups built and maintained almost entirely via mobile agentic commands—marking the end of the manual coding era as we knew it.