Presented by SAP
SAP consulting projects today involve a vast amount of documentation, multiple stakeholders, and compressed timelines, and much of the work still depends on manually retrieving knowledge from online SAP documentation. At the same time, cloud ERP programs now demand faster design cycles, continuous enhancements rather than big-bang rollouts, and near-real-time decision-making. Joule for Consultants, SAP’s conversational AI solution, was designed to help meet these expectations and support consultants throughout their daily tasks, from reconciling best practices and validating design considerations, to navigating SAP’s expanding AI, data, and application landscape.
The promise: consultants work more productively than ever, produce better results, and deliver faster, high-quality SAP cloud transformations.
That promise attracted early attention from KPMG firms, which are collectively among the largest SAP enablers participating in the Joule early access program and one of SAP’s largest customers overall. KPMG has onboarded 29 member firms around the world so far, and thousands of its consultants now use Joule for Consultants in their daily work.
“For us it wasn’t about experimenting,” says Valentino Koester, global head of the SAP360 and SAP AI program at KPMG International. “It was more about positioning our people and member firm clients at the forefront of AI-enabled consulting.”
“Competitive pressure is intense in the SAP implementation market,” Koester says. “The core asset you have as a consultancy is knowledge and experience, bundled in all kinds of ways. AI and Joule for Consultants allow us to scale that knowledge instantly across our global network, making sure customers can access it from all over the world, no matter who they talk to in the organization.”
Whether it’s a junior consultant or a senior manager using it, Joule ensures SAP best practices, industry benchmarks, and the innovation an organization has invested in for years are not siloed in one team, or in one country, where they can’t be shared at speed and with accuracy.
“This makes our teams more agile in responding to emerging client needs or regulatory shifts, when new market entrants emerge or technology changes,” Koester says. “The agility our consultants gain allows us to advise customers not just reactively, but proactively. In many cases, it becomes a form of early forecasting.”
For example, consultants can spot potential supply chain risks early through AI-enabled process mining they’ve implemented for a customer, or apply analytical capabilities with SAP Analytics Cloud before those risks actually materialize.
KPMG customer transformations typically follow the firm’s own transformation methodology, which is closely aligned with SAP’s RISE methodology. RISE moves through six SAP Activate phases (discover, prepare, explore, realize, deploy, and run), and each has recurring challenges that can slow momentum if not addressed early, Koester explains.
In the discover phase, teams invest heavily in business-case modeling, benchmarking, and stakeholder alignment — activities that are time-intensive and difficult to complete without full visibility. The prepare phase introduces extensive project mobilization, reporting demands, early risk identification, and governance setup, any of which can stall progress before execution begins.
During the explore and realize phases, long design workshops and piles of documentation can bog down decision-making. Defects and bottlenecks must be identified as early as possible, or they risk cascading downstream as rework. In the deploy and run phases, organizations must develop and deliver training content while overcoming change resistance, which requires sustained communication to maintain adoption. Once live, continuous KPI monitoring and process optimization help prevent issues from settling into the operating model and eroding value over time. AI can help consultants perform these tasks, as well as any necessary review and corrections, with a higher degree of accuracy and more quickly than ever before.
“By adopting tools like Joule for Consultants, we want to enhance the work our professionals do, making them more effective and more productive, so they have more time to focus on what matters most,” Koester says. “That’s the client relationship, strategic decision-making, and delivering measurable business outcomes.”
Joule for Consultants isn’t only reducing repetitive work; it’s also reshaping how KPMG approaches SAP-enabled transformations. The AI tool has surfaced insights that traditionally required deep expertise and immediate recall, enhancing consultants’ ability to respond to market dynamics and competitive pressure.
For example, in early design workshops with customers, unless highly experienced consultants were on site who could answer every question about business processes and explain the design considerations and technical logic of the new system in real time, teams frequently had to defer answers or validate them later, especially when novel questions or edge cases came up. That slowed the momentum of the transformation.
“With Joule for Consultants, we were hoping to validate the guidance our consultants can give on the spot, in real time, to maintain implementation momentum and strengthen client confidence in their SAP system and KPMG advisors,” Koester says. “Our people can instantly surface SAP best practices, guidelines, and risk scenarios during workshops now, allowing our teams to move forward and helping reduce delay within the engagement.”
They’re seeing the same success during internal learning and enablement or sales-related activities. Joule has supercharged KPMG’s internal SAP University, a learning program for new employees. Recently, junior consultants have been able to prepare and present complex and technical RFP responses with confidence, despite limited experience, as Joule guided them through each step in a structured, high-quality way.
To ensure a smooth internal rollout to analysts, KPMG positioned Joule not as the introduction of a new tool, but as the adoption of a new way of working.
“We emphasized the organizational impact it would have, with Joule enabling smarter, more effective ways of working,” Koester says. “Consultants quickly saw benefits, such as less time spent searching for technical information and more time to spend advising their customers and doing the functional work we’re very strong at.”
Responsible AI remained a core pillar of the rollout. In addition to aligning the initiative with KPMG’s Trusted AI Framework, every participating KPMG firm conducted a risk assessment to mitigate potential risks for clients and ensure compliant, secure use of Joule for Consultants. Early adoption therefore began with awareness-building conversations across the network, helping teams understand what these “new ways of working” look like and where AI can genuinely support delivery. KPMG also appointed a dedicated enablement team to ensure Joule for Consultants is integrated into the broader organizational picture.
“We’re working week after week on increasing the active adoption by consultants, while also using the chance to get their feedback,” he explains. “Our grassroots innovation feedback tools don’t just ask what they like and dislike, but what successful use cases they’re uncovering, and how much time they think they’re saving.”
The goal, he adds, isn’t to automate consulting, but to enable consultants to do their jobs better. And because Joule for Consultants is continuing to evolve, KPMG expects its role to expand significantly over time. SAP continues to enhance feature sets and response quality with each release, while developing AI agents that collaborate with consultants and intelligently automate portions of selected workflows. KPMG is working closely with SAP to responsibly integrate these capabilities into more phases of transformation projects as they mature, so that Joule becomes a part of its overall approach to consulting.
“If early signals hold, Joule for Consultants isn’t just a helpful tool — it’s on track to become a standard in how SAP projects are delivered,” Koester says.
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.
Anthropic has released Claude Code v2.1.0, a notable update to its “vibe coding” development environment for autonomously building software, spinning up AI agents, and completing a wide range of computer tasks, according to Head of Claude Code Boris Cherny in a post on X last night.
The release introduces improvements across agent lifecycle control, skill development, session portability, and multilingual output — all bundled in a dense package of 1,096 commits.
It comes amid a growing wave of praise for Claude Code from software developers and startup founders on X, as they increasingly use the system — powered by Anthropic’s Claude model family, including the flagship Opus 4.5 — to push beyond simple completions and into long-running, modular workflows.
Claude Code was originally released as a “command line” tool back in February 2025, almost a year ago, alongside Anthropic’s then cutting-edge Claude 3.7 Sonnet large language model (LLM). It has been updated numerous times since then, as Anthropic has also advanced its underlying LLMs.
The new version, Claude Code 2.1.0, introduces infrastructure-level features aimed at developers deploying structured workflows and reusable skills. These changes reduce the manual scaffolding required to manage agents across sessions, tools, and environments, letting teams spend less time on configuration and more time on building.
Key additions include the following (a few are illustrated in the configuration sketches after this list):
Hooks for agents, skills, and slash commands, enabling scoped PreToolUse, PostToolUse, and Stop logic. This gives developers fine-grained control over state management, tool constraints, and audit logging — reducing unexpected behavior and making agent actions easier to debug and reproduce.
Hot reload for skills, so new or updated skills in ~/.claude/skills or .claude/skills become available immediately without restarting sessions. Developers can iterate on skill logic in real time, eliminating the stop-start friction that slows down experimentation.
Forked sub-agent context via context: fork in skill frontmatter, allowing skills and slash commands to run in isolated contexts. This prevents unintended side effects and makes it safer to test new logic without polluting the main agent’s state.
Wildcard tool permissions (e.g., Bash(npm *), Bash(*-h*)) for easier rule configuration and access management. Teams can define broader permission patterns with fewer rules, reducing configuration overhead and the risk of mismatched permissions blocking legitimate workflows.
Language-specific output via a language setting, enabling workflows that require output in Japanese, Spanish, or other languages. Global teams and multilingual projects no longer need post-processing workarounds to localize Claude’s responses.
Session teleportation via /teleport and /remote-env slash commands, which allow claude.ai subscribers to resume and configure remote sessions at claude.ai/code. Developers can seamlessly move work between local terminals and the web interface — ideal for switching devices or sharing sessions with collaborators.
Improved terminal UX, including Shift+Enter working out of the box in iTerm2, Kitty, Ghostty, and WezTerm without modifying terminal configs. This removes a common setup frustration and lets developers start working immediately in their preferred terminal.
Unified Ctrl+B behavior for backgrounding both agents and shell commands simultaneously. Developers can push long-running tasks to the background with a single keystroke, freeing up the terminal for other work without losing progress.
New Vim motions including ; and , to repeat f/F/t/T motions, yank operator (y, yy, Y), paste (p/P), text objects, indent/dedent (>>, <<), and line joining (J). Power users who rely on Vim-style editing can now work faster without switching mental models or reaching for the mouse.
MCP list_changed notifications, allowing MCP servers to dynamically update their available tools, prompts, and resources without requiring reconnection. This keeps workflows running smoothly when tool configurations change, avoiding interruptions and manual restarts.
Agents continue after permission denial, allowing subagents to try alternative approaches rather than stopping entirely. This makes autonomous workflows more resilient, reducing the need for human intervention when an agent hits a permissions wall.
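To make the hooks, permissions, and forked-context items above concrete, here is a minimal sketch of what the configuration can look like. It follows the settings.json shape documented for Claude Code hooks and permissions, but treat it as illustrative: the audit script path is hypothetical, and the exact fields for agent- and skill-scoped hooks may differ.

```json
{
  "permissions": {
    "allow": ["Bash(npm *)", "Bash(*-h*)"]
  },
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{ "type": "command", "command": "./scripts/audit-log.sh" }]
      }
    ]
  }
}
```

A skill that should run in an isolated, forked context declares it in its frontmatter. The name and description fields follow the documented skill format; context: fork is the new key described above, and the skill itself is an invented example:

```markdown
---
name: release-notes
description: Summarize recent commits into draft release notes
context: fork
---
Gather the commit log since the last tag and draft grouped release notes.
```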
Beyond the headline features, this release includes numerous quality-of-life improvements designed to reduce daily friction and help developers stay in flow (two of the configuration-driven items are sketched after the list).
/plan command shortcut to enable plan mode directly from the prompt — fewer keystrokes to switch modes means less context-switching and faster iteration on complex tasks.
Slash command autocomplete now works when / appears anywhere in input, not just at the beginning. Developers can compose commands more naturally without backtracking to the start of a line.
Real-time thinking block display in Ctrl+O transcript mode, giving developers visibility into Claude’s reasoning as it happens. This makes it easier to catch misunderstandings early and steer the agent before it goes down the wrong path.
respectGitignore support in settings.json for per-project control over @-mention file picker behavior. Teams can keep sensitive or irrelevant files out of suggestions, reducing noise and preventing accidental exposure of ignored content.
IS_DEMO environment variable to hide email and organization from the UI, useful for streaming or recording sessions. Developers can share their work publicly without leaking personal or company information.
Skills progress indicators showing tool uses as they happen during execution. Developers get real-time feedback on what Claude is doing, reducing uncertainty during long-running operations and making it easier to spot issues mid-flight.
Skills visible in slash command menu by default from /skills/ directories (opt out with user-invocable: false in frontmatter). Custom skills are now more discoverable, helping teams adopt shared workflows without hunting through documentation.
Improved permission prompt UX, with the Tab hint moved to the footer and cleaner Yes/No input labels with contextual placeholders. Clearer prompts mean fewer mistakes and faster decisions when approving tool access.
Multiple startup performance optimizations and improved terminal rendering performance, especially for text with emoji, ANSI codes, and Unicode characters. Faster startup and smoother rendering reduce waiting time and visual distractions, keeping developers focused on the task at hand.
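Both of the opt-in/opt-out items above are plain configuration. The sketches below use the key names given in the release notes, though exact placement and accepted values may vary, and the skill is invented for illustration:

```json
{
  "respectGitignore": true
}
```

```markdown
---
name: internal-helper
description: Shared glue steps invoked by other skills, not by users
user-invocable: false
---
```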
The release also includes numerous bug fixes, including a security fix for a case where sensitive data (OAuth tokens, API keys, passwords) could be exposed in debug logs, fixes for session persistence after transient server errors, and a resolution for API context overflow when background tasks produce large output. Together, these fixes improve reliability and reduce the risk of data leaks or lost work.
Claude Code 2.1.0 arrives in the midst of a significant shift in developer behavior. Originally built as an internal tool at Anthropic, Claude Code is now gaining real traction among external power users — especially those building autonomous workflows, experimenting with agent tooling, and integrating Claude into terminal-based pipelines.
According to X discussions in late December 2025 and early January 2026, enthusiasm surged as developers began describing Claude Code as a game-changer for “vibe coding,” agent composition, and productivity at scale.
@JsonBasedman captured the prevailing sentiment: “I don’t even see the timeline anymore, it’s just ‘Holy shit Claude code is so good’…”
“Claude Code addiction is real,” opined Matt Shumer, co-founder and CEO of Hyperwrite/Otherside AI, in another X post.
Non-developers have embraced the accessibility. @LegallyInnovate, a lawyer, noted: “Trying Claude code for the first time today. I’m a lawyer not a developer. It’s AMAZING. I am blown away and probably not even scratching the surface.”
Some users are shifting away from popular alternatives — @troychaplin switched from Cursor, calling Claude Code “so much better!” for standalone use.
Claude Code has even fueled discussion that Anthropic has actually achieved artificial general intelligence (AGI), the so-called “holy grail” of AI development: a system that outperforms humans at most “economically valuable work,” according to the definition offered by Anthropic rival OpenAI.
@deepfates argued that Claude Code may not be AGI, but that “if Claude Code is good enough to do that, combine ideas on the computer, then I think it is ‘artificial general intellect’ at least. And that is good enough to create a new frontier…”
A clear pattern emerges: users who engage with Claude Code as an orchestration layer — configuring tools, defining reusable components, and layering logic — report transformative results. Those treating it as a standard AI assistant often find its limitations more apparent.
Claude Code 2.1.0 doesn’t try to paper over those divisions — it builds for the advanced tier. Features like agent lifecycle hooks, hot-reloading of skills, wildcard permissioning, and session teleportation reinforce Claude Code’s identity as a tool for builders who treat agents not as chatbots, but as programmable infrastructure.
In total, these updates don’t reinvent Claude Code, but they do lower friction for repeat users and unlock more sophisticated workflows. For teams orchestrating multi-step agent logic, Claude Code 2.1.0 makes Claude feel less like a model — and more like a framework.
Claude Code is available to Claude Pro ($20/month), Claude Max ($100/month), Claude Team (Premium Seat, $150/month), and Claude Enterprise (variable pricing) subscribers.
The /teleport and /remote-env commands require access to Claude Code’s web interface at claude.ai/code. Full installation instructions and documentation are available at code.claude.com/docs/en/setup.
With reusable skills, lifecycle hooks, and improved agent control, Claude Code continues evolving from a chat-based coding assistant into a structured environment for programmable, persistent agents.
As enterprise teams and solo builders increasingly test Claude in real workflows — from internal copilots to complex bash-driven orchestration — version 2.1.0 makes it easier to treat agents as first-class components of a production stack.
Anthropic appears to be signaling that it views Claude Code not as an experiment, but as infrastructure. And with this release, it’s building like it means it.
Nvidia CEO Jensen Huang said last year that we are now entering the age of physical AI. While the company continues to offer LLMs for software use cases, Nvidia is increasingly positioning itself as a provider of AI models for fully AI-powered systems — including agentic AI in the physical world.
At CES 2026, Nvidia announced a slate of new models designed to push AI agents beyond chat interfaces and into physical environments.
Nvidia launched Cosmos Reason 2, the latest version of its vision-language model designed for embodied reasoning. Cosmos Reason 1, released last year, introduced a two-dimensional ontology for embodied reasoning and currently leads Hugging Face’s leaderboard for physical reasoning on video.
Cosmos Reason 2 builds on the same ontology while giving enterprises more flexibility to customize applications and enabling physical agents to plan their next actions, similar to how software-based agents reason through digital workflows.
Nvidia also released a new version of Cosmos Transfer, a model that lets developers generate training simulations for robots.
Other vision-language models, such as Google’s PaliGemma and Pixtral Large from Mistral, can process visual inputs, but not all commercially available VLMs support reasoning.
“Robotics is at an inflection point. We are moving from specialist robots limited to single tasks to generalist-specialist systems,” said Kari Briski, Nvidia vice president for generative AI software, in a briefing with reporters, referring to robots that combine broad foundational knowledge with deep task-specific skills. “These new robots combine broad fundamental knowledge with deep proficiency in complex tasks.”
She added that Cosmos Reason 2 “enhances the reasoning capabilities that robots need to navigate the unpredictable physical world.”
Briski noted that Nvidia’s roadmap follows “the same pattern of assets across all of our open models.”
“In building specialized AI agents, a digital workforce, or the physical embodiment of AI in robots and autonomous vehicles, more than just the model is needed,” Briski said. “First, the AI needs the compute resources to train and simulate the world around it. Data is the fuel for AI to learn and improve, and we contribute to the world’s largest collection of open and diverse datasets, going beyond just opening the weights of the models. The open libraries and training scripts give developers the tools to purpose-build AI for their applications, and we publish blueprints and examples to help deploy AI as systems of models.”
The company now has open models for each of those branches: Cosmos for physical AI, the open-reasoning vision-language-action (VLA) model GR00T for robotics, and its Nemotron models for agentic AI.
Nvidia is making the case that open models across different branches of AI form a shared enterprise ecosystem that feeds data, training, and reasoning to agents in both the digital and physical worlds.
Briski said Nvidia plans to continue expanding its open models, including its Nemotron family, beyond reasoning to include a new RAG and embeddings model to make information more readily available to agents. The company released Nemotron 3, the latest version of its agentic reasoning models, in December.
Nvidia announced three new additions to the Nemotron family: Nemotron Speech, Nemotron RAG and Nemotron Safety.
In a blog post, Nvidia said Nemotron Speech delivers “real-time low-latency speech recognition for live captions and speech AI applications” and is 10 times faster than other speech models.
Nemotron RAG technically comprises two models: an embedding model and a rerank model, both of which can understand images to provide more multimodal insights for data agents to tap.
“Nemotron RAG is at the top of what we call the MMTEB, or the Massive Multilingual Text Embedding Benchmark, with strong multilingual performance while using less computing power and memory, so the models are a good fit for systems that must handle a lot of requests very quickly and with low delay,” Briski said.
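The two-stage design Briski describes, an embedding model to narrow candidates followed by a reranker to order them, is a common retrieval pattern. Here is a minimal sketch of that pattern using stand-in open models from the sentence-transformers library rather than Nemotron RAG itself; the documents and query are invented.

```python
# Two-stage retrieval: an embedding model narrows candidates, a reranker reorders them.
from sentence_transformers import SentenceTransformer, CrossEncoder
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # stand-in reranker

docs = ["Q3 revenue rose 12%...", "Travel policy update...", "Invoice INV-1042 overdue..."]
query = "Which invoices are overdue?"

# Stage 1: embed everything and keep the top-k candidates by cosine similarity.
doc_vecs = embedder.encode(docs, normalize_embeddings=True)
q_vec = embedder.encode(query, normalize_embeddings=True)
top_k = np.argsort(doc_vecs @ q_vec)[::-1][:2]

# Stage 2: rerank only the candidates with a cross-encoder for the final ordering.
scores = reranker.predict([(query, docs[i]) for i in top_k])
ranked = [docs[i] for i, _ in sorted(zip(top_k, scores), key=lambda t: -t[1])]
print(ranked)
```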
Nemotron Safety detects sensitive data so that AI agents do not accidentally expose personally identifiable information.
Fintech Brex is betting that the future of enterprise AI isn’t better orchestration — it’s less of it.
As generative AI agents move from copilots to autonomous systems, Brex CTO James Reggio says traditional agent orchestration frameworks are becoming a constraint rather than an enabler. Instead of relying on a central coordinator or rigid workflows, Brex has built what it calls an “Agent Mesh”: a network of narrow, role-specific agents that communicate in plain language and operate independently — but with full visibility.
“Our goal is to use AI to make Brex effectively disappear,” Reggio told VentureBeat. “We’re aiming for total automation.”
Brex learned that for its purposes, agents need to work in narrow, specific roles to be more modular, flexible, and auditable.
Reggio said the architectural goal is to enable every manager in an enterprise “to have a single point of contact within Brex that’s handling the totality of their responsibilities, be it spend management, requesting travel, or approving spend limit requests.”
The financial services industry has long embraced AI and machine learning to handle the massive amounts of data it processes. But when it came to adopting generative AI models and agents, the industry initially took a more cautious road. Now, more financial services companies, including Brex, have launched AI-powered platforms and several agentic workflows.
Brex’s first foray into generative AI was with its Brex Assistant, released in 2023, which helped customers automate certain finance and expense tasks. It provides suggestions to complete expenses, automatically fills in information, and follows up on expenses that violate policies.
Reggio acknowledges that Brex Assistant works, but it’s not enough. “I think to some degree, it remains a bit of a technology where we don’t entirely know the limits of it,” he said. “There’s quite a large number of patterns that need to exist around it that are kind of being developed by the industry as the technology matures and as more companies build with it.”
Brex Assistant uses multiple models, including Anthropic’s Claude and custom Brex models, as well as OpenAI’s API. The assistant automates some tasks but is still limited in how low-touch it can be.
Reggio said Brex Assistant still plays a big role in the company’s autonomy journey, mainly because the output of its Agent Mesh flows into the assistant application.
The consensus in the industry is that multi-agent ecosystems, in which agents communicate to accomplish tasks, require an orchestration framework to guide them.
Reggio, on the other hand, has a different take. “Deterministic orchestration infrastructure … was a solution for the problems that we saw two years ago, which was that agents, just like the models, hallucinate a lot,” Reggio said. “They’re not very good with multiple tools, so you need to give them these degrees of freedom, but in a more structured, rigid system. But as the models get better, I think it’s starting to hold back the range of possibilities that are expanding.”
More traditional agent orchestration architectures either center on a single agent that does everything or, more commonly, on a coordinator or orchestrator plus tool agents with explicitly defined workflows. Reggio said both frameworks are too rigid and solve issues more commonly seen in traditional software than in AI.
The difference, Reggio argues, is structural:
Traditional orchestration: predefined workflows, central coordinator, deterministic paths
Agent Mesh: event-driven, role-specialized agents, message-based coordination
Agent Mesh relies on stitching together networks of many small agents, each specializing in a single task. The agents, which use the same hybrid mix of models as Brex Assistant, communicate with other agents “in plain English” over a shared message stream. A routing model quickly determines which tools to invoke, he said.
A single reimbursement request triggers several tasks: a compliance check to align with expense policies, budget validation, receipt matching, and then payment initiation. While a single agent can certainly be coded to do all of that, that approach is “brittle and error-prone”; in the mesh, each step instead responds to new information as it arrives on the message stream.
Reggio said the idea is to disambiguate all of those separate tasks and assign them to smaller agents instead. He likened the architecture to a Wi-Fi mesh, where no single node controls the system — reliability emerges from many small, overlapping contributors.
“We basically found a really good fit with the idea of embodying specific roles as agents on top of the best platform to manage specific responsibilities, much like how you might delegate accounts payable to one team versus expense management to another team,” Reggio said.
Brex defines three core ideas in the Agent Mesh architecture (illustrated in the toy sketch after this list):
Config, where definitions of the agent, model, tools and subscription live
MessageStream, a log of every message, tool call and state transition
Clock, which ensures deterministic ordering
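To make those pieces concrete, here is a deliberately tiny, hypothetical sketch of an event-driven mesh in the spirit of the description above. It is not Brex’s implementation: the agents are plain functions standing in for LLM-backed roles, and the flow mirrors the compliance, budget, and payment steps of the reimbursement example.

```python
# A toy event-driven mesh: role-specific agents subscribe to a shared message
# stream and react to plain-language messages as they arrive.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Message:
    seq: int    # Clock: a monotonic sequence number for deterministic ordering
    topic: str
    body: str

@dataclass
class MessageStream:
    log: list[Message] = field(default_factory=list)  # audit trail of every message
    subscribers: dict[str, list[Callable]] = field(default_factory=dict)
    clock: int = 0

    def subscribe(self, topic: str, handler: Callable) -> None:
        self.subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, body: str) -> None:
        self.clock += 1
        msg = Message(self.clock, topic, body)
        self.log.append(msg)
        for handler in self.subscribers.get(topic, []):
            handler(msg, self)

# Config: each narrow agent declares the single role it handles.
def compliance_agent(msg: Message, stream: MessageStream) -> None:
    # In a real mesh, an LLM would check the expense against policy here.
    stream.publish("budget", f"policy ok: {msg.body}")

def budget_agent(msg: Message, stream: MessageStream) -> None:
    stream.publish("payment", f"budget ok: {msg.body}")

def payment_agent(msg: Message, stream: MessageStream) -> None:
    print(f"[{msg.seq}] initiating payment for: {msg.body}")

stream = MessageStream()
stream.subscribe("expense", compliance_agent)
stream.subscribe("budget", budget_agent)
stream.subscribe("payment", payment_agent)
stream.publish("expense", "reimburse $42 team lunch, receipt attached")
```

Because every message lands in the log with a sequence number, the sequence of decisions can be replayed and reviewed in order, which is what makes this style of system auditable.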
Brex also built evaluations into the system, in which the LLM acts as a judge, and an audit agent reviews each agent’s decisions to ensure they adhere to accuracy and behavioral policies.
Brex says it has seen substantial efficiency gains among its customers in its AI ecosystem. Brex did not provide third-party benchmarks or customer-specific data to validate those gains.
But Reggio said enterprise customers using Brex Assistant and the company’s machine learning systems “are able to achieve 99% automation, especially for customers that really leaned into AI.”
This is a marked improvement over the 60% to 70% automation Brex customers were able to achieve in their expense processes before the launch of Brex Assistant.
The company is still early in its autonomy journey, Reggio said. But if the Agent Mesh approach works, the most successful outcome may be invisible: employees no longer thinking about expenses at all.
For decades, we have adapted to software. We learned shell commands, memorized HTTP method names and wired together SDKs. Each interface assumed we would speak its language. In the 1980s, we typed ‘grep’ and ‘ls’ into a shell; by the mid-2000s, we were invoking REST endpoints like GET /users; by the 2010s, we imported SDKs (client.orders.list()) so we didn’t have to think about HTTP. But underlying each of those steps was the same premise: Expose capabilities in a structured form so others can invoke them.
But now we are entering the next interface paradigm. Modern LLMs are challenging the notion that a user must choose a function or remember a method signature. Instead of “Which API do I call?” the question becomes: “What outcome am I trying to achieve?” In other words, the interface is shifting from code to language. In this shift, Model Context Protocol (MCP) emerges as the abstraction that allows models to interpret human intent, discover capabilities and execute workflows, effectively exposing software functions not as programmers know them, but as natural-language requests.
MCP is not a hype term; multiple independent studies identify the architectural shift required for “LLM-consumable” tool invocation. One blog by Akamai engineers describes the transition from traditional APIs to “language-driven integrations” for LLMs. An academic paper on AI agentic workflows and enterprise APIs discusses how enterprise API architecture must evolve to support goal-oriented agents rather than human-driven calls. In short: We are no longer merely designing APIs for code; we are designing capabilities for intent.
Why does this matter for enterprises? Because enterprises are drowning in internal systems, integration sprawl and user training costs. Workers struggle not because they don’t have tools, but because they have too many tools, each with its own interface. When natural language becomes the primary interface, the barrier of “which function do I call?” disappears. One recent business blog observed that natural-language interfaces (NLIs) are enabling self-serve data access for marketers who previously had to wait for analysts to write SQL. When the user simply states intent (like “fetch last quarter revenue for region X and flag anomalies”), the system underneath can translate that into calls, orchestrate them, maintain context memory and deliver results.
To understand how this evolution works, consider the interface ladder:
| Era | Interface | Who it was built for |
|---|---|---|
| CLI | Shell commands | Expert users typing text |
| API | Web or RPC endpoints | Developers integrating systems |
| SDK | Library functions | Programmers using abstractions |
| Natural language (MCP) | Intent-based requests | Human + AI agents stating what they want |
Through each step, humans had to “learn the machine’s language.” With MCP, the machine absorbs the human’s language and works out the rest. That’s not just a UX improvement; it’s an architectural shift.
Under MCP, the functions of code are still there: data access, business logic and orchestration. But they’re discovered rather than invoked manually. For example, rather than calling “billingApi.fetchInvoices(customerId=…),” you say “Show all invoices for Acme Corp since January and highlight any late payments.” The model resolves the entities, calls the right systems, filters and returns structured insight. The developer’s work shifts from wiring endpoints to defining capability surfaces and guardrails.
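What does defining such a capability surface look like in practice? Below is a minimal sketch using FastMCP from the official MCP Python SDK; the billing tool, its fields and its stub data are hypothetical, invented for illustration. Note that the docstring does real work: it is the natural-language metadata an LLM uses to decide when to call the tool.

```python
# A minimal MCP server exposing one billing capability as a discoverable tool.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("billing")

@mcp.tool()
def fetch_invoices(customer: str, since: str) -> list[dict]:
    """Return invoices for a customer since an ISO date, flagging late payments."""
    # A real implementation would query the billing system; this returns a stub.
    return [{"id": "INV-1042", "customer": customer, "since": since, "late": True}]

if __name__ == "__main__":
    mcp.run()  # serve over stdio so an LLM client can discover and call the tool
```

An MCP-aware model receiving “Show all invoices for Acme Corp since January and highlight any late payments” can discover fetch_invoices from its name, description and typed parameters, fill in the arguments, and post-process the result, without the user ever knowing the function exists.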
This shift transforms developer experience and enterprise integration. Teams often struggle to onboard new tools because they require mapping schemas, writing glue code and training users. With a natural-language front, onboarding involves defining business entity names, declaring capabilities and exposing them via the protocol. The human (or AI agent) no longer needs to know parameter names or call order. Studies show that using LLMs as interfaces to APIs can reduce the time and resources required to develop chatbots or tool-invoked workflows.
The change also brings productivity benefits. Enterprises that adopt LLM-driven interfaces can turn data access latency (hours or days) into conversation latency (seconds). For instance, where an analyst previously had to export CSVs, run transforms and build slides, a language interface allows “Summarize the top five risk factors for churn over the last quarter” to generate narrative and visuals in one go. The human then reviews, adjusts and acts, shifting from data plumber to decision maker. That matters: According to a survey by McKinsey & Company, 63% of organizations using gen AI are already creating text outputs, and more than one-third are generating images or code. While many are still in the early days of capturing enterprise-wide ROI, the signal is clear: Language as interface unlocks new value.
In architectural terms, this means software design must evolve. MCP demands systems that publish capability metadata, support semantic routing, maintain context memory and enforce guardrails. API designers no longer ask “What function will the user call?” but rather “What intent might the user express?” A recently published framework for improving enterprise APIs for LLMs shows how APIs can be enriched with natural-language-friendly metadata so that agents can select tools dynamically. The implication: Software becomes modular around intent surfaces rather than function surfaces.
Language-first systems also bring risks and requirements. Natural language is ambiguous by nature, so enterprises must implement authentication, logging, provenance and access control, just as they did for APIs. Without these guardrails, an agent might call the wrong system, expose data or misinterpret intent. One post on “prompt collapse” calls out the danger: As natural-language UI becomes dominant, software may turn into “a capability accessed through conversation” and the company into “an API with a natural-language frontend”. That transformation is powerful, but only safe if systems are designed for introspection, audit and governance.
The shift also has cultural and organizational ramifications. For decades, enterprises hired integration engineers to design APIs and middleware. With MCP-driven models, companies will increasingly hire ontology engineers, capability architects and agent enablement specialists. These roles focus on defining the semantics of business operations, mapping business entities to system capabilities and curating context memory. Because the interface is now human-centric, skills such as domain knowledge, prompt framing, oversight and evaluation become central.
What should enterprise leaders do today? First, think of natural language as the interface layer, not as a fancy add-on. Map the business workflows that can safely be invoked via language. Then catalogue the underlying capabilities you already have: data services, analytics and APIs. Ask: “Are these discoverable? Can they be called via intent?” Finally, pilot an MCP-style layer: Build a small domain (customer support triage, say) where a user or agent can express outcomes in language, and let systems do the orchestration. Then iterate and scale.
Natural language is not just a new front-end. It is becoming the default interface layer for software, succeeding the CLI, then the API, then the SDK. MCP is the abstraction that makes this possible. The benefits include faster integration, more modular systems, higher productivity and new roles. For organizations still tethered to calling endpoints manually, the shift will feel like learning a new platform all over again. The question is no longer “Which function do I call?” but “What do I want to do?”
Dhyey Mavani is accelerating gen AI and computational mathematics.