Recursive language models (RLMs) are an inference technique developed by researchers at MIT CSAIL that treat long prompts as an external environment to the model. Instead of forcing the entire prompt into the model’s context window, the framework allows the LLM to programmatically examine, decompose, and recursively call itself over snippets of the text.
Rather than expanding context windows or summarizing old information, the MIT team reframes long-context reasoning as a systems problem. By letting models treat prompts as something they can inspect with code, recursive language models allow LLMs to reason over millions of tokens without retraining. This offers enterprises a practical path to long-horizon tasks like codebase analysis, legal review, and multi-step reasoning that routinely break today’s models.
Because the framework is designed as a wrapper around existing models, it can serve as a drop-in replacement for applications that make direct calls to LLMs.
While frontier models are becoming increasingly sophisticated at reasoning, their ability to process massive amounts of information is not scaling at the same rate. This bottleneck is driven by two distinct limitations: the hard physical constraint on how much text a model can process at once (context length) and “context rot.”
The challenge, the researchers argue, is whether it’s possible to scale the effective context size of general-purpose LLMs by orders of magnitude without retraining them. This capability is becoming increasingly important for enterprise applications, where LLMs are adopted for long-horizon tasks requiring the processing of millions of tokens — a challenge Zhang argues can’t be solved by simply expanding context windows.
“There is an entropy argument that implies you need exponentially more data samples as you increase the effective context window size,” Alex Zhang, a co-author of the paper, told VentureBeat.
Current approaches to extending context often rely on compaction, where the model summarizes older parts of the conversation to free up space. However, this method fails for tasks requiring random access to specific details located in earlier parts of the prompt.
The concept behind RLMs is drawn from “out-of-core” algorithms used in classical computing. These algorithms are designed to process datasets too large to fit into a computer’s main memory by keeping the data on a hard drive and fetching only the necessary chunks as needed.
RLMs apply this logic to generative AI. Instead of feeding a long prompt directly into the neural network, the framework loads the text as a string variable inside a Python coding environment. The LLM is given general context about the data (such as the total character count) but does not “see” the text initially.
Once the prompt is stored as a variable, the LLM acts as a programmer. It writes Python code to interact with the external variable, using standard commands to peek into the data. For example, the model might use regular expressions to search for specific keywords like “Chapter 1” or “financial results.”
When the code execution finds a relevant snippet, the RLM pulls only that specific chunk into its active context window for analysis.
For example, if the prompt is a massive book, the LLM might write a loop that identifies chapter boundaries and then triggers a sub-call to summarize each chapter individually.
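As a rough illustration, the code a root model writes inside the REPL might look like the sketch below; the `prompt` variable and the `llm_subcall` helper are assumptions for illustration, not the actual interface published by the MIT team.

# Hypothetical sketch of code a root LM might emit inside the REPL.
# `prompt` holds the long input as a Python string in the environment;
# `llm_subcall` stands in for whatever sub-call primitive the framework exposes.
import re

chapter_starts = [m.start() for m in re.finditer(r"Chapter \d+", prompt)]
chapter_starts.append(len(prompt))

summaries = []
for start, end in zip(chapter_starts, chapter_starts[1:]):
    chunk = prompt[start:end]  # pull only this slice into active context
    summaries.append(llm_subcall(f"Summarize this chapter:\n{chunk}"))

answer = llm_subcall("Combine these chapter summaries into one overview:\n" + "\n".join(summaries))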
The architecture typically involves two agents. A “root language model,” often a capability-heavy model like GPT-5, acts as the orchestrator. It plans the approach, writes the code, and manages the data flow within the REPL environment. A “recursive language model,” often a faster and cheaper model, acts as the worker. The root LM calls this worker to process the specific text snippets isolated by the code.
Because the prompt resides in the environment’s memory rather than the model’s context window, the system can handle inputs far larger than the model’s training limit. Importantly, to the end-user, the RLM behaves exactly like a standard model: It accepts a string and returns an answer. This allows enterprise teams to swap standard API calls for RLMs.
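In practice, that swap can be as small as changing a single call site. A minimal sketch, assuming hypothetical `RLM` and `complete` names rather than the repository's real API:

# Hypothetical usage; class and method names are illustrative.
# Before: response = call_llm(long_prompt)
rlm = RLM(root_model="gpt-5", sub_model="gpt-5-mini")
response = rlm.complete(long_prompt)  # same contract as a plain LLM call: string in, answer out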
For developers looking to experiment, the RLM code is currently available on GitHub.
“A key argument for RLMs is that most complex tasks can be decomposed into smaller, ‘local’ sub-tasks,” Zhang said. “However, how to perform this context/problem decomposition is non-trivial, and the model must be capable of performing this.”
To validate the framework, the researchers tested RLMs against base models and other agentic approaches like CodeAct and summary agents across a variety of long-context tasks, including retrieval and multi-hop question answering.
The results demonstrated strong performance gains at the 10 million+ token scale. On BrowseComp-Plus, a benchmark involving inputs of 6 to 11 million tokens, standard base models failed completely, scoring 0%. In contrast, the RLM powered by GPT-5 achieved a score of 91.33%, significantly outperforming the Summary Agent (70.47%) and CodeAct (51%).
The framework also excelled at tasks with high computational complexity. On OOLONG-Pairs, an information-dense reasoning benchmark where the difficulty scales quadratically with input length, base GPT-5 models failed catastrophically with a score of just 0.04%. The RLM achieved an F1 score (a balanced measure of precision and recall) of 58%, demonstrating emergent capabilities to handle dense tasks that paralyze standard models. Similarly, on code understanding tasks (CodeQA benchmark), the RLM more than doubled the performance of the base GPT-5 model, jumping from 24% to 62%.
Regarding the context rot problem, the data showed that while the base GPT-5 performance degrades rapidly as task complexity increases, RLM performance holds steady, consistently outperforming the base model on contexts longer than 16,000 tokens.
Despite the increased complexity of the workflow, RLMs often maintained comparable or lower average costs than the baselines. On the BrowseComp-Plus benchmark, the RLM was up to three times cheaper than the summarization baseline.
However, the researchers noted that while median costs are low, RLM trajectories are “long-tailed.” Outlier runs can become expensive if the model gets stuck in loops or performs redundant verifications. While GPT-5 was conservative in its sub-calls, the open-source Qwen3-Coder model sometimes attempted thousands of sub-calls for simple tasks.
“Today, you likely will have to implement your own guardrails and logic to control RLM behavior,” Zhang said. However, he hypothesizes that future models could be trained to manage their own compute budgets more effectively. Companies like Prime Intellect are planning to integrate RLM into the training process of models, possibly addressing the edge cases where the model’s inference budget spikes.
For enterprise architects deciding where to place their bets, the RLM framework offers a new tool for handling information-dense problems.
“I think RLMs are still extremely useful for chatbots (think long chat histories), but ultimately they argue for an alternative way of using LMs,” Zhang said. “I think RLMs work in tandem with standard retrieval methods like RAG; they do not serve as a replacement, and can be used in different settings or together.”
Every year, NeurIPS produces hundreds of impressive papers, and a handful that subtly reset how practitioners think about scaling, evaluation and system design. In 2025, the most consequential works weren’t about a single breakthrough model. Instead, they challenged fundamental assumptions that academia and industry have quietly relied on: Bigger models mean better reasoning, RL creates new capabilities, attention is “solved” and generative models inevitably memorize.
This year’s top papers collectively point to a deeper shift: AI progress is now constrained less by raw model capacity and more by architecture, training dynamics and evaluation strategy.
Below is a technical deep dive into five of the most influential NeurIPS 2025 papers — and what they mean for anyone building real-world AI systems.
Paper: Artificial Hivemind: The Open-Ended Homogeneity of Language Models
For years, LLM evaluation has focused on correctness. But in open-ended or ambiguous tasks like brainstorming, ideation or creative synthesis, there often is no single correct answer. The risk instead is homogeneity: Models producing the same “safe,” high-probability responses.
This paper introduces Infinity-Chat, a benchmark designed explicitly to measure diversity and pluralism in open-ended generation. Rather than scoring answers as right or wrong, it measures:
Intra-model collapse: How often the same model repeats itself
Inter-model homogeneity: How similar different models’ outputs are
The result is uncomfortable but important: Across architectures and providers, models increasingly converge on similar outputs — even when multiple valid answers exist.
For corporations, this reframes “alignment” as a trade-off. Preference tuning and safety constraints can quietly reduce diversity, leading to assistants that feel too safe, predictable or biased toward dominant viewpoints.
Takeaway: If your product relies on creative or exploratory outputs, diversity metrics need to be first-class citizens.
Paper: Gated Attention for Large Language Models
Transformer attention has been treated as settled engineering. This paper proves it isn’t.
The authors introduce a small architectural change: Apply a query-dependent sigmoid gate after scaled dot-product attention, per attention head. That’s it. No exotic kernels, no massive overhead.
Across dozens of large-scale training runs — including dense and mixture-of-experts (MoE) models trained on trillions of tokens — this gated variant:
Improved stability
Reduced “attention sinks”
Enhanced long-context performance
Consistently outperformed vanilla attention
The gate introduces:
Non-linearity in attention outputs
Implicit sparsity, suppressing pathological activations
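A minimal sketch of the mechanism, based on the paper's description rather than the authors' released code; the gate projection shape and its placement here are an illustrative reading:

import torch
import torch.nn as nn

class GatedAttentionOutput(nn.Module):
    """Query-dependent, per-head sigmoid gate applied after scaled dot-product attention."""
    def __init__(self, head_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(head_dim, 1)  # one gate scalar per head, per position

    def forward(self, attn_out: torch.Tensor, queries: torch.Tensor) -> torch.Tensor:
        # attn_out, queries: (batch, num_heads, seq_len, head_dim)
        gate = torch.sigmoid(self.gate_proj(queries))  # (batch, num_heads, seq_len, 1)
        return attn_out * gate  # implicitly suppresses heads the query does not need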
This challenges the assumption that attention failures are purely data or optimization problems.
Takeaway: Some of the biggest LLM reliability issues may be architectural — not algorithmic — and solvable with surprisingly small changes.
Paper: 1,000-Layer Networks for Self-Supervised Reinforcement Learning
Conventional wisdom says RL doesn’t scale well without dense rewards or demonstrations. This paper shows that assumption is incomplete.
By scaling network depth aggressively, from the typical 2 to 5 layers to nearly 1,000 layers, the authors demonstrate dramatic gains in self-supervised, goal-conditioned RL, with performance improvements ranging from 2X to 50X.
The key isn’t brute force. It’s pairing depth with contrastive objectives, stable optimization regimes and goal-conditioned representations.
For agentic systems and autonomous workflows, this suggests that representation depth — not just data or reward shaping — may be a critical lever for generalization and exploration.
Takeaway: RL’s scaling limits may be architectural, not fundamental.
Paper: Why Diffusion Models Don’t Memorize: The Role of Implicit Dynamical Regularization in Training
Diffusion models are massively overparameterized, yet they often generalize remarkably well. This paper explains why.
The authors identify two distinct training timescales:
One where generative quality rapidly improves
Another — much slower — where memorization emerges
Crucially, the memorization timescale grows linearly with dataset size, creating a widening window where models improve without overfitting.
This reframes early stopping and dataset scaling strategies. Memorization isn’t inevitable — it’s predictable and delayed.
Takeaway: For diffusion training, dataset size doesn’t just improve quality — it actively delays overfitting.
Paper: Does Reinforcement Learning Really Incentivize Reasoning in LLMs?
Perhaps the most strategically important result of NeurIPS 2025 is also the most sobering.
This paper rigorously tests whether reinforcement learning with verifiable rewards (RLVR) actually creates new reasoning abilities in LLMs — or simply reshapes existing ones.
Their conclusion: RLVR primarily improves sampling efficiency, not reasoning capacity. At large sample sizes, the base model often already contains the correct reasoning trajectories.
RL is better understood as:
A distribution-shaping mechanism
Not a generator of fundamentally new capabilities
Takeaway: To truly expand reasoning capacity, RL likely needs to be paired with mechanisms like teacher distillation or architectural changes — not used in isolation.
Taken together, these papers point to a common theme:
The bottleneck in modern AI is no longer raw model size — it’s system design.
Diversity collapse requires new evaluation metrics
Attention failures require architectural fixes
RL scaling depends on depth and representation
Memorization depends on training dynamics, not parameter count
Reasoning gains depend on how distributions are shaped, not just optimized
For builders, the message is clear: Competitive advantage is shifting from “who has the biggest model” to “who understands the system.”
Maitreyi Chatterjee is a software engineer.
Devansh Agarwal currently works as an ML engineer at FAANG.
Anthropic’s open source standard, the Model Context Protocol (MCP), released in late 2024, allows users to connect AI models and the agents atop them to external tools in a structured, reliable format. It is the engine behind Anthropic’s hit AI agentic programming harness, Claude Code, allowing it to access numerous functions like web browsing and file creation immediately when asked.
But there was one problem: Claude Code typically had to “read” the instruction manual for every single tool available, regardless of whether it was needed for the immediate task, using up the available context that could otherwise be filled with more information from the user’s prompts or the agent’s responses.
At least until last night. The Claude Code team released an update that fundamentally alters this equation. Dubbed MCP Tool Search, the feature introduces “lazy loading” for AI tools, allowing agents to dynamically fetch tool definitions only when necessary.
It is a shift that moves AI agents from a brute-force architecture to something resembling modern software engineering—and according to early data, it effectively solves the “bloat” problem that was threatening to stifle the ecosystem.
To understand the significance of Tool Search, one must understand the friction of the previous system. The Model Context Protocol (MCP), released in 2024 by Anthropic as an open source standard was designed to be a universal standard for connecting AI models to data sources and tools—everything from GitHub repositories to local file systems.
However, as the ecosystem grew, so did the “startup tax.”
Thariq Shihipar, a member of the technical staff at Anthropic, highlighted the scale of the problem in the announcement.
“We’ve found that MCP servers may have up to 50+ tools,” Shihipar wrote. “Users were documenting setups with 7+ servers consuming 67k+ tokens.”
In practical terms, this meant a developer using a robust set of tools might sacrifice 33% or more of their available context window limit of 200,000 tokens before they even typed a single character of a prompt, as AI newsletter author Aakash Gupta pointed out in a post on X.
The model was effectively “reading” hundreds of pages of technical documentation for tools it might never use during that session.
Community analysis provided even starker examples.
Gupta further noted that a single Docker MCP server could consume 125,000 tokens just to define its 135 tools.
“The old constraint forced a brutal tradeoff,” he wrote. “Either limit your MCP servers to 2-3 core tools, or accept that half your context budget disappears before you start working.”
The solution Anthropic rolled out — which Shihipar called “one of our most-requested features on GitHub” — is elegant in its restraint. Instead of preloading every definition, Claude Code now monitors context usage.
According to the release notes, the system automatically detects when tool descriptions would consume more than 10% of the available context.
When that threshold is crossed, the system switches strategies. Instead of dumping raw documentation into the prompt, it loads a lightweight search index.
When the user asks for a specific action—say, “deploy this container”—Claude Code doesn’t scan a massive, pre-loaded list of 200 commands. Instead, it queries the index, finds the relevant tool definition, and pulls only that specific tool into the context.
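Conceptually, the lookup works something like the sketch below; the function and index names are illustrative assumptions, since Anthropic has not published the internal implementation:

# Hypothetical sketch of lazy tool loading; names are illustrative, not Claude Code's real API.
def resolve_tools(user_request: str, tool_index, context_budget: int) -> list[dict]:
    """Return only the tool definitions relevant to this request."""
    # Step 1: search a lightweight index instead of preloading every definition
    candidates = tool_index.search(user_request, top_k=5)

    # Step 2: fetch full definitions only for the matches, respecting the budget
    selected, used_tokens = [], 0
    for match in candidates:
        definition = tool_index.fetch_definition(match.tool_id)
        if used_tokens + definition["token_count"] > context_budget:
            break
        selected.append(definition)
        used_tokens += definition["token_count"]
    return selected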
“Tool Search flips the architecture,” Gupta analyzed. “The token savings are dramatic: from ~134k to ~5k in Anthropic’s internal testing. That’s an 85% reduction while maintaining full tool access.”
For developers maintaining MCP servers, this shifts the optimization strategy.
Shihipar noted that the `server instructions` field in the MCP definition—previously a “nice to have”—is now critical. It acts as the metadata that helps Claude “know when to search for your tools, similar to skills.”
While the token savings are the headline metric—saving money and memory is always popular—the secondary effect of this update might be more important: focus.
LLMs are notoriously sensitive to “distraction.” When a model’s context window is stuffed with thousands of lines of irrelevant tool definitions, its ability to reason decreases. It creates a “needle in a haystack” problem where the model struggles to differentiate between similar commands, such as `notification-send-user` versus `notification-send-channel`.
Boris Cherny, Head of Claude Code, emphasized this in his reaction to the launch on X: “Every Claude Code user just got way more context, better instruction following, and the ability to plug in even more tools.”
The data backs this up. Internal benchmarks shared by the community indicate that enabling Tool Search improved the accuracy of the Opus 4 model on MCP evaluations from 49% to 74%.
For the newer Opus 4.5, accuracy jumped from 79.5% to 88.1%.
By removing the noise of hundreds of unused tools, the model can dedicate its “attention” mechanisms to the user’s actual query and the relevant active tools.
This update signals a maturation in how we treat AI infrastructure. In the early days of any software paradigm, brute force is common. But as systems scale, efficiency becomes the primary engineering challenge.
Aakash Gupta drew a parallel to the evolution of integrated development environments (IDEs) like VSCode or JetBrains. “The bottleneck wasn’t ‘too many tools.’ It was loading tool definitions like 2020-era static imports instead of 2024-era lazy loading,” he wrote. “VSCode doesn’t load every extension at startup. JetBrains doesn’t inject every plugin’s docs into memory.”
By adopting “lazy loading”—a standard best practice in web and software development—Anthropic is acknowledging that AI agents are no longer just novelties; they are complex software platforms that require architectural discipline.
For the end user, this update is seamless: Claude Code simply feels “smarter” and retains more memory of the conversation. But for the developer ecosystem, it opens the floodgates.
Previously, there was a “soft cap” on how capable an agent could be. Developers had to curate their toolsets carefully to avoid lobotomizing the model with excessive context. With Tool Search, that ceiling is effectively removed. An agent can theoretically have access to thousands of tools—database connectors, cloud deployment scripts, API wrappers, local file manipulators—without paying a penalty until those tools are actually touched.
It turns the “context economy” from a scarcity model into an access model. As Gupta summarized, “They’re not just optimizing context usage. They’re changing what ‘tool-rich agents’ can mean.”
The update is rolling out immediately for Claude Code users. For developers building MCP clients, Anthropic recommends implementing the `ToolSearchTool` to support this dynamic loading, ensuring that as the agentic future arrives, it doesn’t run out of memory before it even says hello.
Agentic systems and enterprise search depend on data retrieval that works efficiently and accurately. Database provider MongoDB thinks its newest embedding models can help address the decline in retrieval quality that emerges as more AI systems go into production.
As agentic and RAG systems move into production, retrieval quality is emerging as a quiet failure point — one that can undermine accuracy, cost, and user trust even when models themselves perform well.
The company launched four new versions of its embedding and reranking models. Voyage 4 will be available in four modes: voyage-4, voyage-4-large, voyage-4-lite, and voyage-4-nano.
MongoDB said the voyage-4 embedding serves as its general-purpose model; MongoDB considers Voyage-4-large its flagship model. Voyage-4-lite focuses on tasks requiring little latency and lower costs, and voyage-4-nano is intended for more local development and testing environments or for on-device data retrieval.
Voyage-4-nano is also MongoDB’s first open-weight model. All models are available via an API and on MongoDB’s Atlas platform.
The company said the models outperform similar models from Google and Cohere on the RTEB benchmark. Hugging Face’s RTEB benchmark puts Voyage 4 as the top embedding model.
“Embedding models are one of those invisible choices that can really make or break AI experiences,” Frank Liu, product manager at MongoDB, said in a briefing. “You get them wrong, your search results will feel pretty random and shallow, but if you get them right, your application suddenly feels like it understands your users and your data.”
He added that the goal of the Voyage 4 models is to improve the retrieval of real-world data, which often collapses once agentic and RAG pipelines go into production.
MongoDB also released a new multimodal embedding model, voyage-multimodal-3.5, that can handle documents that include text, images, and video. This model vectorizes the data and extracts semantic meaning from the tables, graphics, figures, and slides typically found in enterprise documents.
For enterprises, an agentic system is only as good as its ability to reliably retrieve the right information at the right time. This requirement becomes harder as workloads scale and context windows fragment.
Several model providers target that layer of agentic AI. Google’s Gemini Embedding model topped the embedding leaderboards, and Cohere launched its Embed 4 multimodal model, which processes documents more than 200 pages long. Mistral said its coding-embedding model, Codestral Embedding, outperforms Cohere, Google, and even MongoDB’s Voyage Code 3. MongoDB argues that benchmark performance alone doesn’t address the operational complexity enterprises face in production.
MongoDB said many clients have found that their data stacks cannot handle context-aware, retrieval-intensive workloads in production. The company said it’s seeing more fragmentation with enterprises having to stitch together different solutions to connect databases with a retrieval or reranking model. To help customers who don’t want fragmented solutions, the company is offering its models through a single data platform, Atlas.
MongoDB’s bet is that retrieval can’t be treated as a loose collection of best-of-breed components anymore. For enterprise agents to work reliably at scale, embeddings, reranking, and the data layer need to operate as a tightly integrated system rather than a stitched-together stack.
Rather than asking how AI agents can work for them, a key question in enterprise is now: Are agents playing well together?
This makes orchestration across multi-agent systems and platforms a critical concern — and a key differentiator.
“Agent-to-agent communications is emerging as a really big deal,” G2’s chief innovation officer Tim Sanders told VentureBeat. “Because if you don’t orchestrate it, you get misunderstandings, like people speaking foreign languages to each other. Those misunderstandings reduce the quality of actions and raise the specter of hallucinations, which could be security incidents or data leakage.”
Orchestration to this point has largely been around data, but that’s quickly turning to action. “Conductor-like solutions” are increasingly bringing together agents, robotic process automation (RPA), and data repositories. Sanders likened the progression to that of answer engine optimization, which initially began with monitoring and now creates bespoke content and code.
“Orchestration platforms coordinate a variety of different agentic solutions to increase the consistency of outcomes,” he said.
Early providers include Salesforce MuleSoft, UiPath Maestro, and IBM Watsonx Orchestrate. These “phase one” software-based observability dashboards help IT leaders see all agentic actions across an enterprise.
But coordination can only add so much value; these platforms will morph into technical risk management tools that provide greater quality control. This could include, for instance, agent assessments, policy recommendations and proactive scoring (such as how reliable agents are when they call on enterprise tools, or how often they hallucinate and when).
Enterprise leaders have become wary of relying on vendors to minimize risks and errors; many IT decision-makers, in fact, do not trust a vendor’s statements about the reliability of their agents, he said.
Third-party tools are beginning to bridge the gap and automate tedious guardrail processes and escalation tickets. Teams are already experiencing “ticket exhaustion” in semi-automated systems, where agents hit guardrails and require human permission to proceed.
As an example: The loan process at a bank requires 17 steps for approval, and an agent keeps interrupting human workflows with approval requests when it runs into established guardrails.
Third-party orchestration platforms can manage these tickets, approving or denying them, or even challenging the need for approval altogether. They can eventually eliminate the need for persistent human-in-the-loop oversight so organizations can experience “true velocity gains” measured not in percentages but in multiples (that is, 3X versus 30%).
“Where it goes from there is remote management of the entire agentic process for organizations,” Sanders said.
In another critical evolution in the agentic era, human evaluators will become designers, moving from human-in-the-loop to human-on-the-loop, according to Sanders. That is: They will begin designing agents to automate workflows.
Agent builder platforms continue to innovate their no-code solutions, Sanders said, meaning nearly anyone can now stand up an agent using natural language. “This will democratize agentic AI, and the super skill will be the ability to express a goal, provide context and envision pitfalls, very similar to a good people manager today.”
Agent-first automation stacks “dramatically outperform” hybrid automation stacks in almost every attribute, he noted: satisfaction, quality of actions, security, cost savings.
Organizations should begin “expeditious programs” to infuse agents across workflows, especially with highly repetitive work that poses bottlenecks. Likely at first, there will be a strong human-in-the-loop element to ensure quality and promote change management.
“Serving as an evaluator will strengthen the understanding of how these systems work,” Sanders said, “and eventually enable all of us to operate upstream in agentic workflows instead of downstream.”
IT leaders should take inventory today of all the different elements of their automation stack. Whether these elements are rules-based automation, RPA, or agentic automation, they must learn everything going on in the organization to optimally use emerging orchestration platforms.
“If they don’t, there could actually be dis-synergies across organizations where old school technology and cutting edge technology clash at the point of delivery, oftentimes customer-facing,” Sanders said. “You can’t orchestrate what you can’t see clearly.”
In the chaotic world of Large Language Model (LLM) optimization, engineers have spent the last few years developing increasingly esoteric rituals to get better answers.
We’ve seen “Chain of Thought” (asking the model to think step-by-step and often, show those “reasoning traces” to the user), “Emotional Blackmail” (telling the model its career depends on the answer, or that it is being accused of sexual misconduct), and complex multi-shot prompting frameworks.
But a new paper released by Google Research suggests that we may have been overthinking it. The researchers found that simply repeating the input query—literally copying and pasting the prompt so it appears twice—consistently improves performance across major models including Gemini, GPT-4o, Claude, and DeepSeek.
The paper, titled “Prompt Repetition Improves Non-Reasoning LLMs,” released last month just before the holidays, presents a finding that is almost suspiciously simple: for tasks that don’t require complex reasoning steps, stating the prompt twice yields significantly better results than stating it once.
Even better, because of how transformer architecture works, this “one weird trick” comes with virtually zero penalty in terms of generation speed.
To understand why repeating a question makes a supercomputer smarter, you have to look at the architectural limitations of the standard Transformer model.
Most modern LLMs are trained as “causal” language models. This means they process text strictly from left to right. When the model is processing the 5th token in your sentence, it can “attend” (pay attention) to tokens 1 through 4, but it has zero knowledge of token 6, because it hasn’t happened yet.
This creates a fundamental constraint in how models understand user queries. As the authors note, the order of information matters immensely.
A query formatted as <CONTEXT> <QUESTION> often yields different results than <QUESTION> <CONTEXT> because, in the latter case, the model reads the question before it knows the context it’s supposed to apply it to.
Prompt repetition hacks this limitation by transforming an input of <QUERY> into <QUERY><QUERY>.
By the time the model begins processing the second iteration of the query, it has already “read” the first iteration. This allows the tokens in the second copy to attend to every single token in the first copy.
Effectively, the second repetition enjoys a form of bidirectional attention—it can “look back” at the entire query to resolve ambiguities or retrieve specific details that might have been missed in a single pass.
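Applying the trick client-side is trivial. A minimal sketch, assuming a generic `call_model` client rather than any particular SDK:

# Minimal sketch of prompt repetition; `call_model` is a placeholder for any LLM client.
def ask_with_repetition(query: str) -> str:
    repeated_prompt = f"{query}\n\n{query}"  # state the full query twice
    return call_model(repeated_prompt)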
The researchers, Yaniv Leviathan, Matan Kalman, and Yossi Matias, tested this hypothesis across a suite of seven popular benchmarks, including ARC, OpenBookQA, GSM8K, and MMLU-Pro. They evaluated seven different models, ranging from lightweight models like Gemini 2.0 Flash Lite and GPT-4o-mini to heavyweights like Claude 3.7 Sonnet and DeepSeek V3.
The results were statistically stark. When models were asked not to use explicit reasoning (i.e., just to give a direct answer), prompt repetition won 47 out of 70 head-to-head tests against the baseline, with zero losses.
The gains were particularly dramatic in tasks requiring precise retrieval from a prompt. The team designed a custom “NameIndex” benchmark, where the model is given a list of 50 names and asked to identify the 25th one.
Baseline Performance: Gemini 2.0 Flash-Lite scored a dismal 21.33% accuracy.
With Repetition: Accuracy skyrocketed to 97.33%.
This massive jump illustrates the “causal blind spot” perfectly. In a single pass, the model might lose track of the count by the time it reaches the 25th name. In the repeated pass, the model effectively has the entire list in its “working memory” before it attempts to solve the retrieval task.
Usually, adding text to a prompt increases costs and latency. If you double the input, surely you double the wait time?
Surprisingly, no. The paper demonstrates that prompt repetition is essentially “free” regarding user-perceived latency.
LLM processing is divided into two stages:
Prefill: The model processes the input prompt. This is highly parallelizable; the GPU can crunch the entire prompt matrix simultaneously.
Generation (Decoding): The model generates the answer one token at a time. This is serial and slow.
Prompt repetition only increases the work in the prefill stage. Because modern hardware handles prefill so efficiently, the user barely notices the difference. The researchers found that repeating the prompt did not increase the length of the generated answer, nor did it increase the “time to first token” latency for most models.
The only exceptions were Anthropic’s models (Claude Haiku and Sonnet) on extremely long requests, where the prefill stage eventually hit a bottleneck. But for the vast majority of use cases, the technique improves accuracy without slowing down the chat experience.
There is a caveat: this technique is primarily for “non-reasoning” tasks—scenarios where you want a direct answer rather than a step-by-step derivation.
When the researchers tested prompt repetition combined with “Chain of Thought” (asking the model to “think step by step”), the gains largely vanished, showing neutral to slightly positive results (5 wins, 1 loss, 22 ties).
The authors posit that reasoning models naturally perform a version of repetition themselves. When a model “thinks,” it often restates the premise of the question in its generated output before solving it. Therefore, explicitly repeating the prompt in the input becomes redundant.
However, for applications where you need a fast, direct answer without the verbosity (and cost) of a long reasoning trace, prompt repetition offers a powerful alternative.
For enterprise leadership, this research represents that rarest of things in AI development: a “free” optimization. But capitalizing on it requires nuance; this isn’t a setting to toggle blindly across an entire organization, but rather a tactical adjustment that ripples across engineering, orchestration, and security.
For technical leads balancing the eternal triangle of speed, quality, and cost, prompt repetition offers a way to punch above your weight class. The data shows that smaller, faster models—like Gemini 2.0 Flash Lite—can achieve near-perfect retrieval accuracy (jumping from 21.33% to 97.33%) simply by processing the input twice.
This changes the calculus for model selection: before upgrading to a larger, more expensive model to solve an accuracy bottleneck, engineers should first test whether simple repetition allows their current “Lite” models to close the gap. It is a potential strategy for retaining the speed and cost benefits of lightweight infrastructure without sacrificing performance on extraction and retrieval tasks.
This logic naturally shifts the burden to the orchestration layer. For those managing the middleware and API gateways that glue AI applications together, prompt repetition should likely become a standard, invisible component of the pipeline logic rather than a user behavior.
However, because the technique is neutral for reasoning-heavy tasks but highly effective for direct answers, it requires conditional application. A smart orchestration harness would automatically identify requests routed to non-reasoning endpoints—such as entity extraction, classification, or simple Q&A—and double the prompt before passing it to the model. This optimizes performance at the infrastructure level, delivering better results without requiring action from end-users or increasing the generation budget.
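A rough sketch of that routing logic at the gateway layer might look like the following; the task categories and function names are assumptions for illustration:

# Hypothetical middleware hook; routing categories and names are illustrative.
NON_REASONING_TASKS = {"entity_extraction", "classification", "simple_qa"}

def prepare_prompt(task_type: str, prompt: str) -> str:
    """Double the prompt only for direct-answer endpoints; leave reasoning tasks untouched."""
    if task_type in NON_REASONING_TASKS:
        return f"{prompt}\n\n{prompt}"
    return prompt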
Finally, this heightened attentiveness introduces a new variable for security teams.
If repeating a prompt clarifies a user’s intent to the model, it stands to reason that malicious intents might be clarified as well. Security directors will need to update their red-teaming protocols to test “repeated injection” attacks—verifying whether repeating a jailbreak command (e.g., “Ignore previous instructions”) makes the model “attend” to the breach more effectively. Conversely, this mechanism offers a new defensive tool: repeating System Prompts.
Stating safety guardrails twice at the start of the context window could force the model to attend to safety constraints more rigorously, acting as a low-cost reinforcement for robust security operations.
This research highlights a crucial insight for developers building on top of LLMs: our current models are still deeply constrained by their unidirectional nature. While we wait for new architectures that might solve causal blindness, crude but effective workarounds like prompt repetition offer immediate value.
The authors suggest this could become a default behavior for future systems.
We might soon see inference engines that silently double our prompts in the background before sending them to the model, or “reasoning” models trained to internalize this repetition strategy to be more efficient.
For now, if you are struggling to get a model to follow complex instructions or retrieve specific details from a long document, the solution might not be a better prompt. You might just need to say it again.
Egnyte, the $1.5 billion cloud content governance company, has embedded AI coding tools across its global team of more than 350 developers — but not to reduce headcount. Instead, the company continues to hire junior engineers, using AI to accelerate onboarding, deepen codebase understanding, and shorten the path from junior to senior contributor.
The approach challenges a dominant 2025 narrative that automation will replace developers, showing instead how enterprises are using AI to scale engineering capacity while keeping humans firmly in the loop.
“To have engineers disappear or us not hiring junior engineers doesn’t look like the likely outcome,” Amrit Jassal, Egnyte CTO and co-founder, told VentureBeat. “You’ve got to have people, you’re training and doing all types of succession planning. The junior engineer of today is the senior engineer of tomorrow.”
Egnyte — which has more than 22,000 users including NASDAQ, Red Bull, and BuzzFeed — has rolled out Claude Code, Cursor, Augment, and Gemini CLI coding tools across its developer base to support its core business strategies and expand its newer AI offerings like customer-facing copilots and customizable AI agents.
Devs use these tools across a variety of tasks, the simplest of which include data retrieval, code comprehension, smart search, and code lookup. Egnyte’s code base has lots of Java code, which uses numerous libraries, each with different versions, Jassal explained. AI tools are great for peer-to-peer programming, helping new users get a lay of the land, or existing users probe into different code repositories.
“We have a pretty big code base, right?” Jassal said. “Let’s say you’re looking at an iOS application, but you’re not well versed; you will fire up Google CLI or an Augment, and ask it to discover the code base.”
Some Egnyte devs are moving into automatic pull request summaries, which provide simple overviews of code changes that essentially explain the “what,” “how,” and “why” of proposed modifications.
“But obviously, any change that’s made, we don’t want to hear that AI made the change; it has to be that developer made the change,” Jassal pointed out. “I would not trust AI to commit to the production code base.”
Commits still pass through human review and security validation, and anything red-flagged is escalated to senior engineers. Devs are warned of the dangers of settling into autopilot mode or blindly trusting code. A model may not have been exposed to, or given enough samples of, certain coding components and infrastructure in its training.
Another growing, and closely monitored, use case for AI is unit testing, where code components are run in isolation to ensure they work as intended. “At the end of the day, it is a productivity improvement tool,” he said. “It is really a continuation, it’s like any other tool, it’s not some magic.”
Beyond core engineering, AI is helping other teams collaborate with programmers. Product management, for instance, is using tools like Vercel to bring “demo-worthy” prototypes, rather than just ideas, to devs, who can then move ahead with mock-ups. Or, if UX teams are looking to change certain elements on a dashboard, AI can quickly spin up a handful of options, like different widgets or buttons.
“Then you come to engineering with that, and the engineer immediately knows what you really intend to do with it,” Jassal said.
However, day-to-day activities for all Egnyte engineers, including junior developers, extend beyond just coding.
Junior developers are given hands-on tasks across the full development lifecycle to accelerate their growth and experience, Jassal said. For instance, they assist with requirement analysis in early software engineering phases, as well as deployment, productization and post-deployment maintenance.
In turn, these activities require “Egnyte-specific tacit knowledge and experience” offered by senior engineers. One clear example of work that sits firmly with senior engineers is authoring architecture notes, as these cut across the platform and require a more holistic, system-level view, Jassal said.
“Many of the traditional roadblocks are navigated faster these days with AI; for example, understanding the codebase, dissecting requirements, auto-testing,” he said. “This faster track allows our talented junior hires to progress more quickly and provide higher value to the company sooner.”
The company expects a much faster learning curve from junior to mid-level engineers. “It’s always the case that people coming straight into the workforce are much more excited about trying new things,” Jassal said. But that has to be colored with reality to temper expectations, he added.
On the other hand, some senior engineers may need to be ramped up in their adoption because they’re hesitant or had ho-hum or bad experiences with earlier generation tools. This requires incremental introduction.
“The senior people, having been burnt multiple times, bring that perspective,” he said. “So both [types of engineers] play an important role.”
“In general, I would say it has been really hyped by folks who want to sell you tokens,” Jassal said referring to people who talk about human coders becoming obsolete.
“Vibe coding” could be construed in a similar vein: Like others in software development, he prefers the term “AI assisted coding,” wherein programmers have a self-driven loop, generating code, analyzing exceptions, then correcting and scaling.
At least in Egnyte’s case, hiring will continue, even if at a slower clip as people become more productive thanks to AI, Jassal said.
“We are not just hiring for scale, but to develop the next generation of senior developers and inject fresh perspectives into our development practices,” he said.
The takeaway for technical decision-makers is not that AI will eliminate engineering jobs — but that it will reshape how talent is developed.
At Egnyte, AI-assisted coding is compressing learning curves and raising expectations, not removing humans from the process. Enterprises that treat AI as a replacement risk hollowing out their future senior talent pipeline; those that treat it as infrastructure can move faster without losing the judgment, creativity, and accountability that only engineers provide.
Our LLM API bill was growing 30% month-over-month. Traffic was increasing, but not that fast. When I analyzed our query logs, I found the real problem: Users ask the same questions in different ways.
“What’s your return policy?”, “How do I return something?” and “Can I get a refund?” were all hitting our LLM separately, generating nearly identical responses, each incurring full API costs.
Exact-match caching, the obvious first solution, captured only 18% of these redundant calls. The same semantic question, phrased differently, bypassed the cache entirely.
So, I implemented semantic caching based on what queries mean, not how they’re worded. After implementing it, our cache hit rate increased to 67%, reducing LLM API costs by 73%. But getting there requires solving problems that naive implementations miss.
Traditional caching uses query text as the cache key. This works when queries are identical:
# Exact-match caching
cache_key = hash(query_text)
if cache_key in cache:
    return cache[cache_key]
But users don’t phrase questions identically. My analysis of 100,000 production queries found:
Only 18% were exact duplicates of previous queries
47% were semantically similar to previous queries (same intent, different wording)
35% were genuinely novel queries
That 47% represented massive cost savings we were missing. Each semantically-similar query triggered a full LLM call, generating a response nearly identical to one we’d already computed.
Semantic caching replaces text-based keys with embedding-based similarity lookup:
from datetime import datetime
from typing import Optional

class SemanticCache:
    def __init__(self, embedding_model, similarity_threshold=0.92):
        self.embedding_model = embedding_model
        self.threshold = similarity_threshold
        self.vector_store = VectorStore()      # FAISS, Pinecone, etc.
        self.response_store = ResponseStore()  # Redis, DynamoDB, etc.

    def get(self, query: str) -> Optional[str]:
        """Return cached response if a semantically similar query exists."""
        query_embedding = self.embedding_model.encode(query)
        # Find the most similar cached query
        matches = self.vector_store.search(query_embedding, top_k=1)
        if matches and matches[0].similarity >= self.threshold:
            cache_id = matches[0].id
            return self.response_store.get(cache_id)
        return None

    def set(self, query: str, response: str):
        """Cache a query-response pair."""
        query_embedding = self.embedding_model.encode(query)
        cache_id = generate_id()
        self.vector_store.add(cache_id, query_embedding)
        self.response_store.set(cache_id, {
            'query': query,
            'response': response,
            'timestamp': datetime.utcnow()
        })
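For context, this is roughly how the cache sits in front of the model in a request handler; `generate_llm_response` stands in for whatever LLM client the application already uses:

# Illustrative request path; `generate_llm_response` and `embedding_model` are placeholders
# for the client and encoder the application already has.
cache = SemanticCache(embedding_model)

def answer(query: str) -> str:
    cached = cache.get(query)
    if cached is not None:
        return cached                        # cache hit: no LLM call
    response = generate_llm_response(query)  # cache miss: pay for one LLM call
    cache.set(query, response)
    return response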
The key insight: Instead of hashing query text, I embed queries into vector space and find cached queries within a similarity threshold.
The similarity threshold is the critical parameter. Set it too high, and you miss valid cache hits. Set it too low, and you return wrong responses.
Our initial threshold of 0.85 seemed reasonable; 85% similar should be “the same question,” right?
Wrong. At 0.85, we got cache hits like:
Query: “How do I cancel my subscription?”
Cached: “How do I cancel my order?”
Similarity: 0.87
These are different questions with different answers. Returning the cached response would be incorrect.
I discovered that optimal thresholds vary by query type:
| Query type | Optimal threshold | Rationale |
| --- | --- | --- |
| FAQ-style questions | 0.94 | High precision needed; wrong answers damage trust |
| Product searches | 0.88 | More tolerance for near-matches |
| Support queries | 0.92 | Balance between coverage and accuracy |
| Transactional queries | 0.97 | Very low tolerance for errors |
I implemented query-type-specific thresholds:
class AdaptiveSemanticCache:
    def __init__(self, embedding_model):
        # Same components as SemanticCache above
        self.embedding_model = embedding_model
        self.vector_store = VectorStore()
        self.response_store = ResponseStore()
        self.thresholds = {
            'faq': 0.94,
            'search': 0.88,
            'support': 0.92,
            'transactional': 0.97,
            'default': 0.92
        }
        self.query_classifier = QueryClassifier()

    def get_threshold(self, query: str) -> float:
        query_type = self.query_classifier.classify(query)
        return self.thresholds.get(query_type, self.thresholds['default'])

    def get(self, query: str) -> Optional[str]:
        threshold = self.get_threshold(query)
        query_embedding = self.embedding_model.encode(query)
        matches = self.vector_store.search(query_embedding, top_k=1)
        if matches and matches[0].similarity >= threshold:
            return self.response_store.get(matches[0].id)
        return None
I couldn’t tune thresholds blindly. I needed ground truth on which query pairs were actually “the same.”
Our methodology:
Step 1: Sample query pairs. I sampled 5,000 query pairs at various similarity levels (0.80-0.99).
Step 2: Human labeling. Annotators labeled each pair as “same intent” or “different intent.” I used three annotators per pair and took a majority vote.
Step 3: Compute precision/recall curves. For each threshold, we computed:
Precision: Of cache hits, what fraction had the same intent?
Recall: Of same-intent pairs, what fraction did we cache-hit?
def compute_precision_recall(pairs, labels, threshold):
    """Compute precision and recall at a given similarity threshold."""
    predictions = [1 if pair.similarity >= threshold else 0 for pair in pairs]
    true_positives = sum(1 for p, l in zip(predictions, labels) if p == 1 and l == 1)
    false_positives = sum(1 for p, l in zip(predictions, labels) if p == 1 and l == 0)
    false_negatives = sum(1 for p, l in zip(predictions, labels) if p == 0 and l == 1)
    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
    return precision, recall
Step 4: Select threshold based on cost of errors. For FAQ queries where wrong answers damage trust, I optimized for precision (0.94 threshold gave 98% precision). For search queries where missing a cache hit just costs money, I optimized for recall (0.88 threshold).
Semantic caching adds latency: You must embed the query and search the vector store before knowing whether to call the LLM.
Our measurements:
| Operation | Latency (p50) | Latency (p99) |
| --- | --- | --- |
| Query embedding | 12ms | 28ms |
| Vector search | 8ms | 19ms |
| Total cache lookup | 20ms | 47ms |
| LLM API call | 850ms | 2400ms |
The 20ms overhead is negligible compared to the 850ms LLM call we avoid on cache hits. Even at p99, the 47ms overhead is acceptable.
However, cache misses now take 20ms longer than before (embedding + search + LLM call). At our 67% hit rate, the math works out favorably:
Before: 100% of queries × 850ms = 850ms average
After: (33% × 870ms) + (67% × 20ms) = 287ms + 13ms = 300ms average
Net latency improvement of 65% alongside the cost reduction.
Cached responses go stale. Product information changes, policies update and yesterday’s correct answer becomes today’s wrong answer.
I implemented three invalidation strategies:
Simple expiration based on content type:
from datetime import timedelta

TTL_BY_CONTENT_TYPE = {
    'pricing': timedelta(hours=4),      # Changes frequently
    'policy': timedelta(days=7),        # Changes rarely
    'product_info': timedelta(days=1),  # Daily refresh
    'general_faq': timedelta(days=14),  # Very stable
}
When underlying data changes, invalidate related cache entries:
class CacheInvalidator:
    def on_content_update(self, content_id: str, content_type: str):
        """Invalidate cache entries related to updated content."""
        # Find cached queries that referenced this content
        affected_queries = self.find_queries_referencing(content_id)
        for query_id in affected_queries:
            self.cache.invalidate(query_id)
        self.log_invalidation(content_id, len(affected_queries))
For responses that might become stale without explicit events, I implemented periodic freshness checks:
def check_freshness(self, cached_response: dict) -> bool:
    """Verify the cached response is still valid."""
    # Re-run the query against current data
    fresh_response = self.generate_response(cached_response['query'])
    # Compare semantic similarity of the responses
    cached_embedding = self.embed(cached_response['response'])
    fresh_embedding = self.embed(fresh_response)
    similarity = cosine_similarity(cached_embedding, fresh_embedding)
    # If responses diverged significantly, invalidate
    if similarity < 0.90:
        self.cache.invalidate(cached_response['id'])
        return False
    return True
We run freshness checks on a sample of cached entries daily, catching staleness that TTL and event-based invalidation miss.
After three months in production:
| Metric | Before | After | Change |
| --- | --- | --- | --- |
| Cache hit rate | 18% | 67% | +272% |
| LLM API costs | $47K/month | $12.7K/month | -73% |
| Average latency | 850ms | 300ms | -65% |
| False-positive rate | N/A | 0.8% | — |
| Customer complaints (wrong answers) | Baseline | +0.3% | Minimal increase |
The 0.8% false-positive rate (queries where we returned a cached response that was semantically incorrect) was within acceptable bounds. These cases occurred primarily at the boundaries of our threshold, where similarity was just above the cutoff but intent differed slightly.
Don’t use a single global threshold. Different query types have different tolerance for errors. Tune thresholds per category.
Don’t skip the embedding step on cache hits. You might be tempted to skip embedding overhead when returning cached responses, but you need the embedding for cache key generation. The overhead is unavoidable.
Don’t forget invalidation. Semantic caching without invalidation strategy leads to stale responses that erode user trust. Build invalidation from day one.
Don’t cache everything. Some queries shouldn’t be cached: Personalized responses, time-sensitive information, transactional confirmations. Build exclusion rules.
def should_cache(self, query: str, response: str) -> bool:
    """Determine if a response should be cached."""
    # Don't cache personalized responses
    if self.contains_personal_info(response):
        return False
    # Don't cache time-sensitive information
    if self.is_time_sensitive(query):
        return False
    # Don't cache transactional confirmations
    if self.is_transactional(query):
        return False
    return True
Semantic caching is a practical pattern for LLM cost control that captures redundancy exact-match caching misses. The key challenges are threshold tuning (use query-type-specific thresholds based on precision/recall analysis) and cache invalidation (combine TTL, event-based and staleness detection).
At 73% cost reduction, this was our highest-ROI optimization for production LLM systems. The implementation complexity is moderate, but the threshold tuning requires careful attention to avoid quality degradation.
Sreenivasa Reddy Hulebeedu Reddy is a lead software engineer.
A new framework from researchers Alexander and Jacob Roman rejects the complexity of current AI tools, offering a synchronous, type-safe alternative designed for reproducibility and cost-conscious science.
In the rush to build autonomous AI agents, developers have largely been forced into a binary choice: surrender control to massive, complex ecosystems like LangChain, or lock themselves into single-vendor SDKs from providers like Anthropic or OpenAI. For software engineers, this is an annoyance. For scientists trying to use AI for reproducible research, it is a dealbreaker.
Enter Orchestral AI, a new Python framework released on GitHub this week that attempts to chart a third path.
Developed by theoretical physicist Alexander Roman and software engineer Jacob Roman, Orchestral positions itself as the “scientific computing” answer to agent orchestration—prioritizing deterministic execution and debugging clarity over the “magic” of async-heavy alternatives.
The core philosophy behind Orchestral is an intentional rejection of the complexity that plagues the current market. While frameworks like AutoGPT and LangChain rely heavily on asynchronous event loops—which can make error tracing a nightmare—Orchestral utilizes a strictly synchronous execution model.
“Reproducibility demands understanding exactly what code executes and when,” the founders argue in their technical paper. By forcing operations to happen in a predictable, linear order, the framework ensures that an agent’s behavior is deterministic—a critical requirement for scientific experiments where a “hallucinated” variable or a race condition could invalidate a study.
Despite this focus on simplicity, the framework is provider-agnostic. It ships with a unified interface that works across OpenAI, Anthropic, Google Gemini, Mistral, and local models via Ollama. This allows researchers to write an agent once and swap the underlying “brain” with a single line of code—crucial for comparing model performance or managing grant money by switching to cheaper models for draft runs.
Orchestral introduces a concept the founders call “LLM-UX”—user experience designed from the perspective of the model itself.
The framework simplifies tool creation by automatically generating JSON schemas from standard Python type hints. Instead of writing verbose descriptions in a separate format, developers can simply annotate their Python functions. Orchestral handles the translation, ensuring that the data types passed between the LLM and the code remain safe and consistent.
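The general pattern resembles the sketch below, which derives a JSON schema from ordinary type hints; this is a conceptual illustration, not Orchestral's actual decorator or API:

from typing import get_type_hints

# Illustrative only: maps Python type hints to JSON Schema types the way a
# framework like Orchestral might when exposing a function as an LLM tool.
PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool_schema(fn) -> dict:
    hints = get_type_hints(fn)
    hints.pop("return", None)
    params = {name: {"type": PY_TO_JSON.get(tp, "string")} for name, tp in hints.items()}
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": params, "required": list(params)},
    }

def orbital_period(semi_major_axis_au: float, star_mass_solar: float) -> float:
    """Return the orbital period in years using Kepler's third law."""
    return (semi_major_axis_au ** 3 / star_mass_solar) ** 0.5

schema = tool_schema(orbital_period)  # ready to hand to a tool-calling LLM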
This philosophy extends to the built-in tooling. The framework includes a persistent terminal tool that maintains its state (like working directories and environment variables) between calls. This mimics how human researchers interact with command lines, reducing the cognitive load on the model and preventing the common failure mode where an agent “forgets” it changed directories three steps ago.
Orchestral’s origins in high-energy physics and exoplanet research are evident in its feature set. The framework includes native support for LaTeX export, allowing researchers to drop formatted logs of agent reasoning directly into academic papers.
It also tackles the practical reality of running LLMs: cost. The framework includes an automated cost-tracking module that aggregates token usage across different providers, allowing labs to monitor burn rates in real-time.
Perhaps most importantly for safety-conscious fields, Orchestral implements “read-before-edit” guardrails. If an agent attempts to overwrite a file it hasn’t read in the current session, the system blocks the action and prompts the model to read the file first. This prevents the “blind overwrite” errors that terrify anyone using autonomous coding agents.
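The guardrail amounts to a small piece of session state. A hedged sketch of the pattern, not Orchestral's implementation:

# Conceptual sketch of a read-before-edit guardrail; not Orchestral's actual code.
class FileGuardrail:
    def __init__(self):
        self.read_this_session: set[str] = set()

    def record_read(self, path: str):
        self.read_this_session.add(path)

    def check_write(self, path: str):
        if path not in self.read_this_session:
            # Block the edit and tell the model what to do instead
            raise PermissionError(f"Read {path} before editing it.")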
While Orchestral is easy to install via pip install orchestral-ai, potential users should look closely at the license. Unlike the MIT or Apache licenses common in the Python ecosystem, Orchestral is released under a Proprietary license.
The documentation explicitly states that “unauthorized copying, distribution, modification, or use… is strictly prohibited without prior written permission”. This “source-available” model allows researchers to view and use the code, but restricts them from forking it or building commercial competitors without an agreement. This suggests a business model focused on enterprise licensing or dual-licensing strategies down the road.
Furthermore, early adopters will need to be on the bleeding edge of Python environments: the framework requires Python 3.13 or higher, explicitly dropping support for the widely used Python 3.12 due to compatibility issues.
“Civilization advances by extending the number of important operations which we can perform without thinking about them,” the founders write, quoting mathematician Alfred North Whitehead.
Orchestral attempts to operationalize this for the AI era. By abstracting away the “plumbing” of API connections and schema validation, it aims to let scientists focus on the logic of their agents rather than the quirks of the infrastructure. Whether the academic and developer communities will embrace a proprietary tool in an ecosystem dominated by open source remains to be seen, but for those drowning in async tracebacks and broken tool calls, Orchestral offers a tempting promise of sanity.