A team of researchers led by Nvidia has released DreamDojo, a new AI system designed to teach robots how to interact with the physical world by watching tens of thousands of hours of human video — a development that could significantly reduce the time and cost required to train the next generation of humanoid machines.
The research, published this month and involving collaborators from UC Berkeley, Stanford, the University of Texas at Austin, and several other institutions, introduces what the team calls “the first robot world model of its kind that demonstrates strong generalization to diverse objects and environments after post-training.”
At the core of DreamDojo is what the researchers describe as “a large-scale video dataset” comprising “44k hours of diverse human egocentric videos, the largest dataset to date for world model pretraining.” The dataset, called DreamDojo-HV, is a dramatic leap in scale — “15x longer duration, 96x more skills, and 2,000x more scenes than the previously largest dataset for world model training,” according to the project documentation.
The system operates in two distinct phases. First, DreamDojo “acquires comprehensive physical knowledge from large-scale human datasets by pre-training with latent actions.” Then it undergoes “post-training on the target embodiment with continuous robot actions” — essentially learning general physics from watching humans, then fine-tuning that knowledge for specific robot hardware.
For enterprises considering humanoid robots, this approach addresses a stubborn bottleneck. Teaching a robot to manipulate objects in unstructured environments traditionally requires massive amounts of robot-specific demonstration data — expensive and time-consuming to collect. DreamDojo sidesteps this problem by leveraging existing human video, allowing robots to learn from observation before ever touching a physical object.
One of the technical breakthroughs is speed. Through a distillation process, the researchers achieved “real-time interactions at 10 FPS for over 1 minute” — a capability that enables practical applications like live teleoperation and on-the-fly planning. The team demonstrated the system working across multiple robot platforms, including the GR-1, G1, AgiBot, and YAM humanoid robots, showing what they call “realistic action-conditioned rollouts” across “a wide range of environments and object interactions.”
The release comes at a pivotal moment for Nvidia’s robotics ambitions — and for the broader AI industry. At the World Economic Forum in Davos last month, CEO Jensen Huang declared that AI robotics represents a “once-in-a-generation” opportunity, particularly for regions with strong manufacturing bases. According to Digitimes, Huang has also stated that the next decade will be “a critical period of accelerated development for robotics technology.”
The financial stakes are enormous. Huang told CNBC’s “Halftime Report” on February 6 that the tech industry’s capital expenditures — potentially reaching $660 billion this year from major hyperscalers — are “justified, appropriate and sustainable.” He characterized the current moment as “the largest infrastructure buildout in human history,” with companies like Meta, Amazon, Google, and Microsoft dramatically increasing their AI spending.
That infrastructure push is already reshaping the robotics landscape. Robotics startups raised a record $26.5 billion in 2025, according to data from Dealroom. European industrial giants including Siemens, Mercedes-Benz, and Volvo have announced robotics partnerships in the past year, while Tesla CEO Elon Musk has claimed that 80 percent of his company’s future value will come from its Optimus humanoid robots.
For technical decision-makers evaluating humanoid robots, DreamDojo’s most immediate value may lie in its simulation capabilities. The researchers highlight downstream applications including “reliable policy evaluation without real-world deployment and model-based planning for test-time improvement” — capabilities that could let companies simulate robot behavior extensively before committing to costly physical trials.
This matters because the gap between laboratory demonstrations and factory floors remains significant. A robot that performs flawlessly in controlled conditions often struggles with the unpredictable variations of real-world environments — different lighting, unfamiliar objects, unexpected obstacles. By training on 44,000 hours of diverse human video spanning thousands of scenes and nearly 100 distinct skills, DreamDojo aims to build the kind of general physical intuition that makes robots adaptable rather than brittle.
The research team, led by Linxi “Jim” Fan, Joel Jang, and Yuke Zhu, with Shenyuan Gao and William Liang as co-first authors, has indicated that code will be released publicly, though a timeline was not specified.
Whether DreamDojo translates into commercial robotics products remains to be seen. But the research signals where Nvidia’s ambitions are heading as the company increasingly positions itself beyond its gaming roots. As Kyle Barr observed at Gizmodo earlier this month, Nvidia now views “anything related to gaming and the ‘personal computer’” as “outliers on Nvidia’s quarterly spreadsheets.”
The shift reflects a calculated bet: that the future of computing is physical, not just digital. Nvidia has already invested $10 billion in Anthropic and signaled plans to invest heavily in OpenAI’s next funding round. DreamDojo suggests the company sees humanoid robots as the next frontier where its AI expertise and chip dominance can converge.
For now, the 44,000 hours of human video at the heart of DreamDojo represent something more fundamental than a technical benchmark. They represent a theory — that robots can learn to navigate our world by watching us live in it. The machines, it turns out, have been taking notes.
Presented by F5
As enterprises pour billions into GPU infrastructure for AI workloads, many are discovering that their expensive compute resources sit idle far more than expected. The culprit isn’t the hardware. It’s the often-invisible data delivery layer between storage and compute that’s starving GPUs of the information they need.
“While people are focusing their attention, justifiably so, on GPUs, because they’re very significant investments, those are rarely the limiting factor,” says Mark Menger, solutions architect at F5. “They’re capable of more work. They’re waiting on data.”
AI performance increasingly depends on an independent, programmable control point between AI frameworks and object storage — one that most enterprises haven’t deliberately architected. As AI workloads scale, bottlenecks and instability emerge wherever AI frameworks are tightly coupled to specific storage endpoints, particularly during scaling events, failures, and cloud transitions.
“Traditional storage access patterns were not designed for highly parallel, bursty, multi-consumer AI workloads,” says Maggie Stringfellow, VP, product management – BIG-IP. “Efficient AI data movement requires a distinct data delivery layer designed to abstract, optimize, and secure data flows independently of storage systems, because GPU economics make inefficiency immediately visible and expensive.”
These bidirectional patterns include massive ingestion from continuous data capture, simulation output, and model checkpoints. Combined with read-intensive training and inference workloads, they stress the tightly coupled infrastructure that storage systems depend on.
While storage vendors have done significant work in scaling the data throughput into and out of their systems, that focus on throughput alone creates knock-on effects across the switching, traffic management, and security layers coupled to storage.
The stress on S3-compatible systems from AI workloads is multidimensional and differs significantly from traditional application patterns. It’s less about raw throughput and more about concurrency, metadata pressure, and fan-out considerations. Training and fine-tuning create particularly challenging patterns, like massive parallel reads of small to mid-size objects. These workloads also involve repeated passes through training data across epochs and periodic checkpoint write bursts.
RAG workloads introduce their own complexity through request amplification. A single request can fan out into dozens or hundreds of additional data chunks, cascading into further related chunks and more complex documents. The stress concentrates less on capacity or raw storage speed and more on request management and traffic shaping.
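The arithmetic behind that fan-out is easy to sketch. The toy function below is a generic illustration of request amplification — the parameter values are invented for the example, not any vendor’s measured numbers:

```python
# Hypothetical illustration of RAG request amplification: one user query
# fans out into retrieval requests, each of which may pull in related
# chunks, multiplying the load that reaches object storage.

def amplification(queries: int, chunks_per_query: int, related_per_chunk: int) -> int:
    """Total storage requests generated by a batch of RAG queries."""
    primary = queries * chunks_per_query      # initial retrieval fan-out
    secondary = primary * related_per_chunk   # cascading related-chunk reads
    return primary + secondary

# A single query retrieving 20 chunks, each pulling in 3 related chunks,
# turns into 80 storage requests before any reranking or re-retrieval.
print(amplification(1, 20, 3))  # → 80
```

Even with modest assumptions, a few hundred concurrent users can generate tens of thousands of small-object reads per second — which is why the stress lands on request management rather than raw capacity.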
When AI frameworks connect directly to storage endpoints without an intermediate delivery layer, operational fragility compounds quickly during scaling events, failures, and cloud transitions.
“Any instability in the storage service now has an uncontained blast radius,” Menger says. “Anything here becomes a system failure, not a storage failure. Or frankly, aberrant behavior in one application can have knock-on effects to all consumers of that storage service.”
Menger describes a pattern he’s seen with three different customers, where tight coupling cascaded into complete system failures.
“We see large training or fine-tuning workloads overwhelm the storage infrastructure, and the storage infrastructure goes down,” he explains. “At that scale, the recovery is never measured in seconds. Minutes if you’re lucky. Usually hours. The GPUs are now not being fed. They’re starved for data. These high value resources, for that entire time the system is down, are negative ROI.”
The financial impact of introducing an independent data delivery layer extends beyond preventing catastrophic failures.
Decoupling allows data access to be optimized independently of storage hardware, Stringfellow says, improving GPU utilization by reducing idle time and contention while making cost and performance more predictable as scale increases.
“It enables intelligent caching, traffic shaping, and protocol optimization closer to compute, which lowers cloud egress and storage amplification costs,” she explains. “Operationally, this isolation protects storage systems from unbounded AI access patterns, resulting in more predictable cost behavior and stable performance under growth and variability.”
F5’s answer is to position its Application Delivery and Security Platform, powered by BIG-IP, as a “storage front door” that provides health-aware routing, hotspot avoidance, policy enforcement, and security controls without requiring application rewrites.
“Introducing a delivery tier in between compute and storage helps define boundaries of accountability,” Menger says. “Compute is about execution. Storage is about durability. Delivery is about reliability.”
The programmable control point, which uses event-based conditional logic rather than generative AI, enables traffic management that goes beyond simple load balancing. Routing decisions are based on real backend health, with the system monitoring leading indicators to catch early signs of trouble. And when problems emerge, it can isolate misbehaving components without taking down the entire service.
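The pattern described here — health-aware routing with failure isolation — can be sketched in a few lines. This is a generic, illustrative toy, not F5’s implementation; the backend names, latency threshold, and 30-second back-off are invented for the example:

```python
# A minimal sketch of health-aware routing: track a leading indicator
# (rolling average latency) per storage backend, isolate a backend that
# degrades before it fails outright, and route around it.

import time

class Backend:
    def __init__(self, name: str, latency_threshold: float = 0.5):
        self.name = name
        self.latency_threshold = latency_threshold  # leading indicator: slow responses
        self.recent_latencies: list[float] = []
        self.isolated_until = 0.0                   # circuit-breaker state

    def record(self, latency: float) -> None:
        # Keep a rolling window of the last 10 observed latencies.
        self.recent_latencies = (self.recent_latencies + [latency])[-10:]
        avg = sum(self.recent_latencies) / len(self.recent_latencies)
        # Isolate the backend if average latency degrades, before hard failure.
        if avg > self.latency_threshold:
            self.isolated_until = time.monotonic() + 30  # back off for 30s

    def healthy(self) -> bool:
        return time.monotonic() >= self.isolated_until

def route(backends: list[Backend]) -> Backend:
    """Pick the healthy backend with the lowest recent average latency."""
    candidates = [b for b in backends if b.healthy()]
    if not candidates:
        raise RuntimeError("no healthy storage backends")
    return min(candidates, key=lambda b: sum(b.recent_latencies)
               / max(len(b.recent_latencies), 1))
```

The point of the sketch is the containment boundary: a slow backend is quarantined by the delivery layer, so one misbehaving consumer or endpoint does not become a system-wide failure.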
“An independent, programmable data delivery layer becomes necessary because it allows policy, optimization, security, and traffic control to be applied uniformly across both ingestion and consumption paths without modifying storage systems or AI frameworks,” Stringfellow says. “By decoupling data access from storage implementation, organizations can safely absorb bursty writes, optimize reads, and protect backend systems from unbounded AI access patterns.”
AI isn’t just pushing storage teams on throughput; it’s forcing them to treat data movement as both a performance and security problem, Stringfellow says. Security can no longer be assumed simply because data sits deep in the data center. AI introduces automated, high-volume access patterns that must be authenticated, encrypted, and governed at speed. That’s where F5 BIG-IP comes into play.
“F5 BIG-IP sits directly in the AI data path to deliver high-throughput access to object storage while enforcing policy, inspecting traffic, and making payload-informed traffic management decisions,” Stringfellow says. “Feeding GPUs quickly is necessary, but not sufficient; storage teams now need confidence that AI data flows are optimized, controlled, and secure.”
Looking ahead, the requirements for data delivery will only intensify, Stringfellow says.
“AI data delivery will shift from bulk optimization toward real-time, policy-driven data orchestration across distributed systems,” she says. “Agentic and RAG-based architectures will require fine-grained runtime control over latency, access scope, and delegated trust boundaries. Enterprises should start treating data delivery as programmable infrastructure, not a byproduct of storage or networking. The organizations that do this early will scale faster and with less risk.”
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.
OpenAI on Wednesday released GPT-5.3-Codex, which the company calls its most capable coding agent to date, in an announcement timed to land at the exact same moment Anthropic unveiled its own flagship model upgrade, Claude Opus 4.6. The synchronized launches mark the opening salvo in what industry observers are calling the AI coding wars — a high-stakes battle to capture the enterprise software development market.
The dueling announcements came amid an already heated week between the two AI giants, who are also set to air competing Super Bowl advertisements on Sunday, and whose executives have been trading barbs publicly over business models, access, and corporate ethics.
“I love building with this model; it feels like more of a step forward than the benchmarks suggest,” OpenAI CEO Sam Altman wrote on X minutes after the launch. He later added: “It was amazing to watch how much faster we were able to ship 5.3-Codex by using 5.3-Codex, and for sure this is a sign of things to come.”
That claim — that the model helped build itself — is a significant milestone in AI development. According to OpenAI’s announcement, the Codex team used early versions of GPT-5.3-Codex to debug its own training runs, manage deployment infrastructure, and diagnose test results and evaluations. The company describes it as “our first model that was instrumental in creating itself.”
The new model posts substantial gains across multiple industry benchmarks. GPT-5.3-Codex achieves 57% on SWE-Bench Pro, a rigorous evaluation of real-world software engineering that spans four programming languages and tests contamination-resistant, industrially relevant challenges. It scores 77.3% on Terminal-Bench 2.0, which measures the terminal skills essential for coding agents, and 64% on OSWorld, an agentic computer-use benchmark where models must complete productivity tasks in visual desktop environments.
The Terminal-Bench 2.0 result is particularly striking. According to performance data released Wednesday, GPT-5.3-Codex scored 77.3% compared to GPT-5.2-Codex’s 64.0% and the base GPT-5.2 model’s 62.2% — a 13.3-percentage-point leap in a single generation. One user on X noted that the score “absolutely demolished” Anthropic’s Opus 4.6, which reportedly achieved 65.4% on the same benchmark.
OpenAI also claims the model accomplishes these results with dramatically improved efficiency: less than half the tokens of its predecessor for equivalent tasks, plus more than 25% faster inference per token.
“Notably, GPT-5.3-Codex does so with fewer tokens than any prior model, letting users simply build more,” the company stated in its announcement.
Perhaps more significant than the benchmark improvements is OpenAI’s positioning of GPT-5.3-Codex as a model that transcends pure coding. The company explicitly states that “Codex goes from an agent that can write and review code to an agent that can do nearly anything developers and professionals can do on a computer.”
This expanded capability set includes debugging, deploying, monitoring, writing product requirement documents, editing copy, conducting user research, building slide decks, and analyzing data in spreadsheet applications. The model shows strong performance on GDPval, an OpenAI evaluation released in 2025 that measures performance on well-specified knowledge-work tasks across 44 occupations.
The expansion signals OpenAI’s ambition to capture not just the developer tools market but the broader enterprise productivity software space — a market that includes established players like Microsoft, Salesforce, and ServiceNow, all of whom are racing to embed AI agents into their platforms.
The pivot toward general-purpose computing brings new security considerations. In a notable disclosure, OpenAI revealed that GPT-5.3-Codex is the first model it classifies as “High capability” for cybersecurity-related tasks under its Preparedness Framework, and the first directly trained to identify software vulnerabilities.
“While we don’t have definitive evidence it can automate cyber attacks end-to-end, we’re taking a precautionary approach and deploying our most comprehensive cybersecurity safety stack to date,” the company stated. Mitigations include dual-use safety training, automated monitoring, trusted access for advanced capabilities, and enforcement pipelines incorporating threat intelligence.
Altman highlighted this development on X: “This is our first model that hits ‘high’ for cybersecurity on our preparedness framework. We are piloting a Trusted Access framework, and committing $10 million in API credits to accelerate cyber defense.”
The company is also expanding the private beta of Aardvark, its security research agent, and partnering with open-source maintainers to provide free codebase scanning for widely used projects. OpenAI cited Next.js as an example where a security researcher used Codex to discover vulnerabilities disclosed last week.
The cybersecurity announcement, however, has been overshadowed by the increasingly personal nature of the OpenAI-Anthropic rivalry. The timing of Wednesday’s release cannot be understood without the context of OpenAI’s intensifying competition with Anthropic, the AI safety-focused startup founded in 2021 by former OpenAI researchers, including Dario and Daniela Amodei.
Both companies scheduled major product announcements for 10 a.m. Pacific Time on Wednesday. Anthropic unveiled Claude Opus 4.6, which it describes as its “smartest model” that “plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes.”
The head-to-head timing follows a week of escalating tensions. Anthropic announced it will air Super Bowl advertisements mocking OpenAI’s recent decision to begin testing ads within ChatGPT for free users.
Altman responded with unusual directness, calling the advertisements “funny” but “clearly dishonest” in an extensive X post.
“We would obviously never run ads in the way Anthropic depicts them. We are not stupid and we know our users would reject that,” Altman wrote. “I guess it’s on brand for Anthropic doublespeak to use a deceptive ad to critique theoretical deceptive ads that aren’t real, but a Super Bowl ad is not where I would expect it.”
He went further, characterizing Anthropic as an “authoritarian company” that “wants to control what people do with AI.”
“Anthropic serves an expensive product to rich people,” Altman wrote. “More Texans use ChatGPT for free than total people use Claude in the US, so we have a differently-shaped problem than they do.”
The public sparring masks a deadly serious business competition. The rivalry plays out against a backdrop of explosive enterprise AI adoption, where both companies are fighting for position in a rapidly expanding market.
According to survey data from Andreessen Horowitz released this week, enterprise spending on large language models has dramatically outpaced even bullish projections. Average enterprise LLM spending reached $7 million in 2025, 180% higher than 2024’s actual spending of $2.5 million — and 56% above what enterprises had projected for 2025 just a year earlier. Spending is projected to reach $11.6 million per enterprise in 2026, a further 65% increase.
The a16z data reveals shifting market dynamics that help explain the intensity of the competition. OpenAI maintains the largest average share of enterprise AI wallet, but that share is shrinking — from 62% in 2024 to a projected 53% in 2026. Anthropic’s share, meanwhile, has grown from 14% to a projected 18% over the same period, with Google showing similar gains.
Enterprise adoption patterns tell a more nuanced story. While OpenAI leads in overall usage, only 46% of surveyed OpenAI customers are using its most capable models in production, compared to 75% for Anthropic and 76% for Google. When including testing environments, 89% of Anthropic customers are testing or using the company’s most capable models — the highest rate among major providers.
For software development specifically — one of the primary use cases for both companies’ coding agents — the a16z survey shows OpenAI with approximately 35% market share, with Anthropic claiming a substantial and growing portion of the remainder.
These market dynamics explain why both companies are positioning themselves as platforms rather than mere model providers. OpenAI on Wednesday also launched Frontier, a new platform designed to serve as a comprehensive hub for businesses adopting a range of AI tools — including those developed by third parties — that can operate together seamlessly.
“We can be the partner of choice for AI transformation for enterprise. The sky is the limit in terms of revenue we can generate from a platform like that,” Fidji Simo, OpenAI’s CEO of applications, told reporters this week.
This follows Monday’s launch of the Codex desktop application for macOS, which OpenAI says has already surpassed 500,000 downloads. The app enables users to manage multiple AI coding agents simultaneously — a capability that becomes increasingly important as enterprises deploy agents for complex, long-running tasks.
The platform ambitions require extraordinary capital. The dueling launches underscore the staggering financial requirements of frontier AI development, with both companies burning through billions while racing to establish market dominance.
Anthropic is currently in discussions for a funding round that could bring in more than $20 billion at a valuation of at least $350 billion, according to Bloomberg, and is simultaneously planning an employee tender offer at that valuation.
OpenAI, meanwhile, has disclosed more than $1 trillion in financial obligations to backers — including Oracle, Microsoft, and Nvidia — that are essentially fronting compute costs in expectation of future returns.
GPT-5.3-Codex was “co-designed for, trained with, and served on NVIDIA GB200 NVL72 systems,” according to OpenAI’s announcement — a reference to Nvidia’s latest Blackwell-generation AI supercomputing architecture.
The financial pressure adds urgency to both companies’ enterprise strategies. Unlike established tech giants with diversified revenue streams, both Anthropic and OpenAI must prove they can generate sufficient revenue from AI products to justify their extraordinary valuations and infrastructure costs.
Looking ahead, OpenAI says GPT-5.3-Codex is available immediately for paid ChatGPT users across all Codex surfaces: the desktop app, command-line interface, IDE extensions, and web interface. API access is expected to follow.
The model includes a new interactivity feature: users can choose between “pragmatic” or “friendly” personalities — a customization Altman suggests users feel strongly about. More substantively, the model provides frequent progress updates during tasks, allowing users to interact in real time, ask questions, discuss approaches, and steer toward solutions without losing context.
“Instead of waiting for a final output, you can interact in real time,” OpenAI stated. “GPT-5.3-Codex talks through what it’s doing, responds to feedback, and keeps you in the loop from start to finish.”
The company promises more capabilities in the coming weeks, with Altman declaring: “I believe Codex is going to win.”
He concluded his response to Anthropic with a philosophical statement that frames the competition in stark terms: “This time belongs to the builders, not the people who want to control them.”
Whether that message resonates with enterprise customers — who according to a16z data cite trust, security, and compliance as their top concerns — remains to be seen. What’s clear is that the AI coding wars have begun in earnest, and neither company intends to cede ground.
Anthropic on Thursday released Claude Opus 4.6, a major upgrade to its flagship artificial intelligence model that the company says plans more carefully, sustains longer autonomous workflows, and outperforms competitors including OpenAI’s GPT-5.2 on key enterprise benchmarks — a release that arrives at a tumultuous moment for the AI industry and global software markets.
The launch comes just three days after OpenAI released its own Codex desktop application in a direct challenge to Anthropic’s Claude Code momentum, and amid a $285 billion rout in software and services stocks that investors attribute partly to fears that Anthropic’s AI tools could disrupt established enterprise software businesses.
For the first time, Anthropic’s Opus-class models will feature a 1 million token context window, allowing the AI to process and reason across vastly more information than previous versions. The company also introduced “agent teams” in Claude Code — a research preview feature that enables multiple AI agents to work simultaneously on different aspects of a coding project, coordinating autonomously.
“We’re focused on building the most capable, reliable, and safe AI systems,” an Anthropic spokesperson told VentureBeat about the announcements. “Opus 4.6 is even better at planning, helping solve the most complex coding tasks. And the new agent teams feature means users can split work across multiple agents — one on the frontend, one on the API, one on the migration — each owning its piece and coordinating directly with the others.”
The release intensifies an already fierce competition between Anthropic and OpenAI, the two most valuable privately held AI companies in the world. OpenAI on Monday released a new desktop application for its Codex artificial intelligence coding system, a tool the company says transforms software development from a collaborative exercise with a single AI assistant into something more akin to managing a team of autonomous workers.
AI coding assistants have exploded in popularity over the last year, and OpenAI said more than 1 million developers have used Codex in the past month. The new Codex app is part of OpenAI’s ongoing effort to lure users and market share away from rivals like Anthropic and Cursor.
The timing of Anthropic’s release — just 72 hours after OpenAI’s Codex launch — underscores the breakneck pace of competition in AI development tools. OpenAI faces intensifying competition from Anthropic, which posted the largest share increase of any frontier lab since May 2025, according to a recent Andreessen Horowitz survey. Forty-four percent of enterprises now use Anthropic in production, driven by rapid capability gains in software development since late 2024. The desktop launch is a strategic counter to Claude Code’s momentum.
According to Anthropic’s announcement, Opus 4.6 achieves the highest score on Terminal-Bench 2.0, an agentic coding evaluation, and leads all other frontier models on Humanity’s Last Exam, a complex multi-discipline reasoning test. On GDPval-AA — a benchmark measuring performance on economically valuable knowledge work tasks in finance, legal and other domains — Opus 4.6 outperforms OpenAI’s GPT-5.2 by approximately 144 Elo points, which translates to obtaining a higher score approximately 70% of the time.
The stakes are substantial. Asked about Claude Code’s financial performance, the Anthropic spokesperson noted that in November, the company announced that Claude Code reached $1 billion in run rate revenue only six months after becoming generally available in May 2025.
The spokesperson highlighted major enterprise deployments: “Claude Code is used by Uber across teams like software engineering, data science, finance, and trust and safety; wall-to-wall deployment across Salesforce’s global engineering org; tens of thousands of devs at Accenture; and companies across industries like Spotify, Rakuten, Snowflake, Novo Nordisk, and Ramp.”
That enterprise traction has translated into skyrocketing valuations. Earlier this month, Anthropic signed a term sheet for a $10 billion funding round at a $350 billion valuation. Bloomberg reported that Anthropic is simultaneously working on a tender offer that would allow employees to sell shares at that valuation, offering liquidity to staffers who have watched the company’s worth multiply since its 2021 founding.
One of Opus 4.6’s most significant technical improvements addresses what the AI industry calls “context rot” — the degradation of model performance as conversations grow longer. Anthropic says Opus 4.6 scores 76% on MRCR v2, a needle-in-a-haystack benchmark testing a model’s ability to retrieve information hidden in vast amounts of text, compared to just 18.5% for Sonnet 4.5.
“This is a qualitative shift in how much context a model can actually use while maintaining peak performance,” the company said in its announcement.
The model also supports outputs of up to 128,000 tokens — enough to complete substantial coding tasks or documents without breaking them into multiple requests.
For developers, Anthropic is introducing several new API features alongside the model: adaptive thinking, which allows Claude to decide when deeper reasoning would be helpful rather than requiring a binary on-off choice; four effort levels (low, medium, high, max) to control intelligence, speed and cost tradeoffs; and context compaction, a beta feature that automatically summarizes older context to enable longer-running tasks.
Anthropic, which has built its brand around AI safety research, emphasized that Opus 4.6 maintains alignment with its predecessors despite its enhanced capabilities. On the company’s automated behavior audit measuring misaligned behaviors such as deception, sycophancy, and cooperation with misuse, Opus 4.6 “showed a low rate” of problematic responses while also achieving “the lowest rate of over-refusals — where the model fails to answer benign queries — of any recent Claude model.”
When asked how Anthropic thinks about safety guardrails as Claude becomes more agentic, particularly with multiple agents coordinating autonomously, the spokesperson pointed to the company’s published framework: “Agents have tremendous potential for positive impacts in work but it’s important that agents continue to be safe, reliable, and trustworthy. We outlined our framework for developing safe and trustworthy agents last year which shares core principles developers should consider when building agents.”
The company said it has developed six new cybersecurity probes to detect potentially harmful uses of the model’s enhanced capabilities, and is using Opus 4.6 to help find and patch vulnerabilities in open-source software as part of defensive cybersecurity efforts.
The rivalry between Anthropic and OpenAI has spilled into consumer marketing in dramatic fashion. Both companies will feature prominently during Sunday’s Super Bowl. Anthropic is airing commercials that mock OpenAI’s decision to begin testing advertisements in ChatGPT, with the tagline: “Ads are coming to AI. But not to Claude.”
OpenAI CEO Sam Altman responded by calling the ads “funny” but “clearly dishonest,” posting on X that his company would “obviously never run ads in the way Anthropic depicts them” and that “Anthropic wants to control what people do with AI” while serving “an expensive product to rich people.”
The exchange highlights a fundamental strategic divergence: OpenAI has moved to monetize its massive free user base through advertising, while Anthropic has focused almost exclusively on enterprise sales and premium subscriptions.
The launch occurs against a backdrop of historic market volatility in software stocks. A new AI automation tool from Anthropic PBC sparked a $285 billion rout in stocks across the software, financial services and asset management sectors on Tuesday as investors raced to dump shares with even the slightest exposure. A Goldman Sachs basket of US software stocks sank 6%, its biggest one-day decline since April’s tariff-fueled selloff.
The catalyst was Anthropic's launch on Friday of plug-ins for its Claude Cowork agent, which enable automated tasks across legal, sales, marketing and data analysis. The move underscored the AI industry's growing push into industries that can unlock the lucrative enterprise revenue needed to fund massive investments in the technology.
Thomson Reuters plunged 15.83% on Tuesday, its biggest single-day drop on record, and LegalZoom.com sank 19.68%. European legal software providers, including LexisNexis owner RELX and Wolters Kluwer, experienced their worst single-day performances in decades.
Not everyone agrees the selloff is warranted. Nvidia CEO Jensen Huang said on Tuesday that fears AI would replace software and related tools were “illogical,” adding that “time will prove itself.” Mark Murphy, head of U.S. enterprise software research at JPMorgan, said in a Reuters report that it “feels like an illogical leap” to conclude a new plug-in from an LLM would “replace every layer of mission-critical enterprise software.”
Among the more notable product announcements: Anthropic is releasing Claude in PowerPoint in research preview, allowing users to create presentations using the same AI capabilities that power Claude’s document and spreadsheet work. The integration puts Claude directly inside a core Microsoft product — an unusual arrangement given Microsoft’s 27% stake in OpenAI.
The Anthropic spokesperson framed the move pragmatically in an interview with VentureBeat: “Microsoft has an official add-in marketplace for Office products with multiple add-ins available to help people with slide creation and iteration. Any developer can build a plugin for Excel or PowerPoint. We’re participating in that ecosystem to bring Claude into PowerPoint. This is about participating in the ecosystem and giving users the ability to work with the tools that they want, in the programs they want.”
Data from a16z’s recent enterprise AI survey suggests both Anthropic and OpenAI face an increasingly competitive landscape. While OpenAI remains the most widely used AI provider in the enterprise, with approximately 77% of surveyed companies using it in production in January 2026, Anthropic’s adoption is rising rapidly — from near-zero in March 2024 to approximately 40% using it in production by January 2026.
The survey data also shows that 75% of Anthropic’s enterprise customers are using it in production, with 89% either testing or in production — figures that exceed OpenAI’s corresponding rates of 46% in production and 73% testing or in production among its own customer base.
Enterprise spending on AI continues to accelerate. Average enterprise LLM spend reached $7 million in 2025, up 180% from $2.5 million in 2024, with projections suggesting $11.6 million in 2026 — a 65% increase year-over-year.
Opus 4.6 is available immediately on claude.ai, the Claude API, and major cloud platforms. Developers can access it via claude-opus-4-6 through the API. Pricing remains unchanged at $5 per million input tokens and $25 per million output tokens, with premium pricing of $10/$37.50 for prompts exceeding 200,000 tokens using the 1 million token context window.
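For budgeting purposes, the quoted rates translate into a simple per-request cost function. This is a rough sketch based only on the prices above; actual billing rules (for example, how the long-context tier applies to output tokens) may differ.

```python
# Approximate cost calculator for the Opus 4.6 rates quoted above:
# $5/$25 per million input/output tokens, rising to $10/$37.50 when the
# prompt exceeds 200,000 tokens. Tiering logic is simplified for illustration.
LONG_CONTEXT_THRESHOLD = 200_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the approximate cost in USD for one request."""
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        rate_in, rate_out = 10.00, 37.50  # long-context premium tier
    else:
        rate_in, rate_out = 5.00, 25.00   # standard tier
    return input_tokens / 1e6 * rate_in + output_tokens / 1e6 * rate_out

# A 50k-token prompt with a 2k-token reply costs about $0.30 at standard rates.
print(f"${request_cost(50_000, 2_000):.2f}")
```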
For users who find Opus 4.6 “overthinking” simpler tasks — a characteristic Anthropic acknowledges can add cost and latency — the company recommends adjusting the effort parameter from its default high setting to medium.
The recommendation captures something essential about where the AI industry now stands. These models have grown so capable that their creators must now teach customers how to make them think less. Whether that represents a breakthrough or a warning sign depends entirely on which side of the disruption you’re standing on — and whether you remembered to sell your software stocks before Tuesday.
Five years ago, Databricks coined the term ‘data lakehouse’ to describe a new type of data architecture that combines a data lake with a data warehouse. That term and data architecture are now commonplace across the data industry for analytics workloads.
Now, Databricks is once again looking to create a new category with its Lakebase service, which becomes generally available today. While the data lakehouse construct deals with OLAP (online analytical processing) databases, Lakebase is all about OLTP (online transaction processing) and operational databases. The service has been in development since June 2025 and is based on technology Databricks gained through its acquisition of PostgreSQL database provider Neon; it was further enhanced in October 2025 with the acquisition of Mooncake, which brought capabilities that help bridge PostgreSQL with lakehouse data formats.
Lakebase is a serverless operational database that represents a fundamental rethinking of how databases work in the age of autonomous AI agents. Early adopters, including easyJet, Hafnia and Warner Music Group, are cutting application delivery times by 75 to 95%, but the deeper architectural innovation positions databases as ephemeral, self-service infrastructure that AI agents can provision and manage without human intervention.
This isn’t just another managed Postgres service. Lakebase treats operational databases as lightweight, disposable compute running on data lake storage rather than monolithic systems requiring careful capacity planning and database administrator (DBA) oversight.
“Really, for the vibe coding trend to take off, you need developers to believe they can actually create new apps very quickly, but you also need the central IT team, or DBAs, to be comfortable with the tsunami of apps and databases,” Databricks co-founder Reynold Xin told VentureBeat. “Classic databases simply won’t scale to that because they can’t afford to put a DBA per database and per app.”
The production numbers demonstrate immediate impact beyond the agent provisioning vision. Hafnia reduced delivery time for production-ready applications from two months to five days — a 92% reduction — using Lakebase as the transactional engine for its internal operations portal. The shipping company moved beyond static BI reports to real-time business applications for fleet, commercial and finance workflows.
EasyJet consolidated more than 100 Git repositories into just two and cut development cycles from nine months to four months — a 56% reduction — while building a web-based revenue management hub on Lakebase to replace a decade-old desktop app and one of Europe’s largest legacy SQL Server environments.
Warner Music Group is moving insights directly into production systems using the unified foundation, while Quantum Capital Group uses it to maintain consistent, governed data for identifying and evaluating oil and gas investments — eliminating the data duplication that previously forced teams to maintain multiple copies in different formats.
The acceleration stems from the elimination of two major bottlenecks: database cloning for test environments and ETL pipeline maintenance for syncing operational and analytical data.
Traditional databases couple storage and compute — organizations provision a database instance with attached storage and scale by adding more instances or storage. AWS Aurora innovated by separating these layers using proprietary storage, but the storage remained locked inside AWS’s ecosystem and wasn’t independently accessible for analytics.
Lakebase takes the separation of storage and compute to its logical conclusion by putting storage directly in the data lakehouse. The compute layer runs essentially vanilla PostgreSQL — maintaining full compatibility with the Postgres ecosystem — but every write goes to lakehouse storage in formats that Spark, Databricks SQL and other analytics engines can immediately query without ETL.
“The unique technical insight was that data lakes decouple storage from compute, which was great, but we need to introduce data management capabilities like governance and transaction management into the data lake,” Xin explained. “We’re actually not that different from the lakehouse concept, but we’re building lightweight, ephemeral compute for OLTP databases on top.”
Databricks built Lakebase with the technology it gained from the acquisition of Neon. But Xin emphasized that Databricks significantly expanded Neon’s original capabilities to create something fundamentally different.
“They didn’t have the enterprise experience, and they didn’t have the cloud scale,” Xin said. “We brought the Neon team’s novel architectural idea together with the robustness of the Databricks infrastructure and combined them. So now we’ve created a super scalable platform.”
Xin outlined a vision, tied directly to the economics of AI coding tools, that explains why the Lakebase construct matters beyond current use cases. As development costs plummet, enterprises will shift from buying hundreds of SaaS applications to building millions of bespoke internal applications.
“As the cost of software development goes down, which we’re seeing today because of AI coding tools, it will shift from the proliferation of SaaS in the last 10 to 15 years to the proliferation of in-house application development,” Xin said. “Instead of building maybe hundreds of applications, they’ll be building millions of bespoke apps over time.”
This creates an impossible fleet management problem with traditional approaches. You cannot hire enough DBAs to manually provision, monitor and troubleshoot thousands of databases. Xin’s solution: Treat database management itself as a data problem rather than an operations problem.
Lakebase stores all telemetry and metadata — query performance, resource utilization, connection patterns, error rates — directly in the lakehouse, where it can be analyzed using standard data engineering and data science tools. Instead of configuring dashboards in database-specific monitoring tools, data teams query telemetry data with SQL or analyze it with machine learning models to identify outliers and predict issues.
“Instead of creating a dashboard for every 50 or 100 databases, you can actually look at the chart to understand if something has misbehaved,” Xin explained. “Database management will look very similar to an analytics problem. You look at outliers, you look at trends, you try to understand why things happen. This is how you manage at scale when agents are creating and destroying databases programmatically.”
The implications extend to autonomous agents themselves. An AI agent experiencing performance issues could query the telemetry data to diagnose problems — treating database operations as just another analytics task rather than requiring specialized DBA knowledge. Database management becomes something agents can do for themselves using the same data analysis capabilities they already have.
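Lakebase's internals aren't public, but the "management as analytics" pattern Xin describes can be illustrated in miniature: telemetry from every database lands in one queryable table, and a single SQL query replaces per-database dashboards. The schema and the sqlite3 stand-in below are invented for illustration.

```python
# Toy version of "database management as an analytics problem": fleet-wide
# telemetry in one table, one SQL query to flag misbehaving databases.
# sqlite3 stands in for the lakehouse; the schema is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telemetry (db_name TEXT, p99_latency_ms REAL)")
conn.executemany(
    "INSERT INTO telemetry VALUES (?, ?)",
    [("orders", 12.0), ("billing", 14.0), ("search", 11.0), ("ml_feats", 95.0)],
)

# Flag any database whose p99 latency is more than double the fleet average.
outliers = conn.execute(
    """
    SELECT db_name, p99_latency_ms
    FROM telemetry
    WHERE p99_latency_ms > 2 * (SELECT AVG(p99_latency_ms) FROM telemetry)
    """
).fetchall()
print(outliers)  # [('ml_feats', 95.0)]
```

The same query works unchanged whether the fleet holds fifty databases or fifty thousand, which is the scaling argument Xin is making.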
The Lakebase construct signals a fundamental shift in how enterprises should think about operational databases — not as precious, carefully managed infrastructure requiring specialized DBAs, but as ephemeral, self-service resources that scale programmatically like cloud compute.
This matters whether or not autonomous agents materialize as quickly as Databricks envisions, because the underlying architectural principle — treating database management as an analytics problem rather than an operations problem — changes the skill sets and team structures enterprises need.
Data leaders should pay attention to the convergence of operational and analytical data happening across the industry. When writes to an operational database are immediately queryable by analytics engines without ETL, the traditional boundaries between transactional systems and data warehouses blur. This unified architecture reduces the operational overhead of maintaining separate systems, but it also requires rethinking data team structures built around those boundaries.
When the lakehouse concept launched, competitors rejected it before eventually adopting it themselves. Xin expects the same trajectory for Lakebase.
“It just makes sense to separate storage and compute and put all the storage in the lake — it enables so many capabilities and possibilities,” he said.
The chief data officer (CDO) has evolved from a niche compliance role into one of the most critical positions for AI deployment. These executives now sit at the intersection of data governance, AI strategy, and workforce readiness. Their decisions determine whether enterprises move from AI pilots to production scale or remain stuck in experimentation mode.
That’s why Informatica’s third annual survey — the largest survey yet of CDOs specifically on AI readiness, spanning 600 executives globally — carries particular weight. The findings expose a dangerous disconnect that explains why so many organizations struggle to scale AI beyond pilots: While 69% of enterprises have deployed generative AI and 47% are running agentic AI systems, 76% admit their governance frameworks can’t keep pace with how employees actually use these technologies.
The survey reveals what Informatica calls a “trust paradox” — and explains why data leaders are dangerously overconfident about AI readiness. Organizations deployed generative AI systems faster than they built the governance and training infrastructure to support them. The result: Employees generally trust the data powering AI systems, but organizations acknowledge their workforces lack the literacy to question that data or use AI responsibly. Seventy-five percent of data leaders say employees need upskilling in data literacy. Seventy-four percent require AI literacy training for day-to-day operations.
“The gap now is just, can you trust the data to set an agent loose on it?” Graeme Thompson, CIO at Informatica, told VentureBeat. “The agents do what they’re supposed to do if you give them the right information. There’s just such a lack of trust in the data that I think that’s the gap.”
GenAI adoption jumped from 48% a year ago to 69% today. Nearly half of organizations (47%) now run agentic AI — systems that autonomously take actions rather than just generate content. This rapid expansion has created a race to acquire vector databases, upgrade data pipelines, and expand compute infrastructure.
But Thompson dismisses infrastructure gaps as the primary problem. The technology exists and works. The limitation is organizational, not technical.
“The technology that we have available at the moment, the infrastructure, is more than — it’s not the problem yet,” Thompson said. He compared the situation to amateur athletes blaming their equipment. “There’s a long way to go before the equipment is the problem in the room. People chase equipment like golfers. Those golfers are a sucker for a new driver, a new putter that’s going to cure their physical inability to hit a golf ball straight.”
The survey data supports this. When asked about 2026 investment priorities, the top three are all people and process issues: data privacy and security (43%), AI governance (41%), and workforce upskilling (39%).
The survey data combined with Thompson’s implementation experience reveals specific lessons for data leaders trying to move from pilots to production.
The trust paradox exists because organizations can deploy AI technology faster than they can train people to use it responsibly. Seventy-five percent need data literacy upskilling. Seventy-four percent need AI literacy training. The technology gap is a people gap.
“It’s much easier to get your people that know your company and know your data and know your processes to learn AI than it is to bring an AI person in that doesn’t know anything about those things and teach them about your company,” Thompson said. “And also the AI people are super expensive, just like data scientists are super expensive.”
Thompson structures Informatica so the CDO reports directly to him as CIO. This makes data governance an execution function rather than a separate strategic layer.
“That is a deliberate decision based on that function being a get things done function instead of an ivory tower function,” Thompson said. The structure ensures data teams and application owners share common priorities through a common boss. “If they have a common boss, their priorities should be aligned. And if not, it’s because the boss isn’t doing his job, not because the two functions aren’t working off the same priority list.”
If 76% of organizations can’t govern AI usage effectively, reporting structure may be part of the problem. Siloed data and IT functions create the conditions for pilots that never scale.
The breakthrough insight is that AI literacy programs must extend beyond technology teams into business functions. At Informatica, the chief marketing officer is one of Thompson’s strongest AI partners.
“You need that literacy across your business teams as well as in your technology teams,” Thompson said.
He noted that the marketing operations team understands the technology and the data: It knows the answer to “How do I get more value out of my limited marketing program dollars each year?” is to automate and add AI to how that job is done, not to add people and more Google ad dollars.
Business-side literacy creates pull rather than push for AI adoption. Marketing, sales and operations teams start demanding AI capabilities because they see strategic value, not just efficiency gains.
Data leaders have spent decades fighting perceptions that IT is just a cost center. AI offers the opportunity to change that narrative, but only if CDOs reframe the value proposition away from productivity savings.
“I am very disappointed that, given this new technology capability on a plate, as IT people and as data people, we immediately turn around and talk about productivity savings,” Thompson said. “What a waste of an opportunity.”
The tactical shift: Pitch AI’s ability to remove headcount constraints entirely rather than reduce existing headcount. This reframes AI from operational efficiency to strategic capability. Organizations can expand market reach, enter new geographies and test initiatives that were previously cost-prohibitive.
“It’s not about saving money,” Thompson said. “And if that’s mainly the approach that you have, then your company’s not going to win.”
Don’t wait for perfect horizontal data governance layers before delivering production value. Pick one high-value use case. Build the complete governance, data quality and literacy stack for that specific workflow. Validate results. Then replicate the pattern to adjacent use cases.
This delivers production value while building organizational capability incrementally.
“I think this space is moving so quickly that if you try and solve 100% your governance problem before you get to your semantic layer problem, before you get to your glossary of terms problem, then you’re never going to generate any outcome and people are going to lose patience,” Thompson said.
Airtable is applying its data-first design philosophy to AI agents with Tuesday’s debut of Superagent, a standalone research agent that deploys teams of specialized AI agents working in parallel to complete research tasks.
The technical innovation lies in how Superagent’s orchestrator maintains context. Earlier agent systems used simple model routing where an intermediary filtered information between models. Airtable’s orchestrator maintains full visibility over the entire execution journey: the initial plan, execution steps and sub-agent results. This creates what co-founder Howie Liu calls “a coherent journey” where the orchestrator made all decisions along the way.
“It ultimately comes down to how you leverage the model’s self-reflective capability,” Liu told VentureBeat. He co-founded Airtable more than a dozen years ago with a cloud-based relational database at its core.
Airtable built its business on a singular bet: Software should adapt to how people work, not the other way around. That philosophy powered growth to over 500,000 organizations, including 80% of the Fortune 100, using its platform to build custom applications fitted to their workflows.
The Superagent technology is an evolution of capabilities originally developed by DeepSky (formerly known as Gradient), which Airtable acquired in October 2025.
Liu frames Airtable and Superagent as complementary form factors that address different enterprise needs: Airtable provides the structured foundation, while Superagent handles unstructured research tasks.
“We obviously started with a data layer. It’s in the name Airtable: It’s a table of data,” Liu said.
The platform evolved as scaffolding around that core database with workflow capabilities, automations, and interfaces that scale to thousands of users. “I think Superagent is a very complementary form factor, which is very unstructured,” Liu said. “These agents are, by nature, very free form.”
The decision to build free-form capabilities reflects industry learnings about using increasingly capable models. Liu said that as the models have gotten smarter, the best way to use them is to have fewer restrictions on how they run.
When a user submits a query, the orchestrator creates a visible plan that breaks complex research into parallel workstreams. For example, if a user is researching a company for investment, the orchestrator splits the task into workstreams such as researching the team, the funding history and the competitive landscape. Each workstream is delegated to a specialized agent that executes independently; the agents work in parallel, coordinated by the system, each contributing its piece to the whole.
The sub-agent approach solves two problems: It manages context windows by aggregating cleaned results without polluting the main orchestrator’s context, and it enables adaptation during execution. Superagent uses multiple frontier models for different sub-tasks, drawing on OpenAI, Anthropic and Google.
“Maybe it tried doing a research task in a certain way that didn’t work out, couldn’t find the right information, and then it decided to try something else,” Liu said. “It knows that it tried the first thing and it didn’t work. So it won’t make the same mistake again.”
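The plan-delegate-aggregate loop described above can be sketched in a few lines. This is a toy illustration of the pattern, not Airtable's implementation: the sub-agents are stub functions where a real system would invoke LLMs and web search.

```python
# Minimal fan-out/aggregate sketch: an orchestrator plans parallel
# workstreams, runs a "sub-agent" per workstream, and aggregates only
# cleaned summaries into the final context. Agents here are stubs.
from concurrent.futures import ThreadPoolExecutor

def sub_agent(workstream: str) -> str:
    # Stand-in for a specialized agent researching one workstream.
    return f"summary of {workstream}"

def orchestrate(query: str) -> dict:
    # Step 1: the orchestrator creates a visible plan of workstreams.
    plan = ["team", "funding history", "competitive landscape"]
    # Step 2: delegate each workstream to an independent sub-agent in parallel.
    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        results = list(pool.map(sub_agent, plan))
    # Step 3: aggregate only the cleaned summaries, keeping the orchestrator's
    # context free of raw intermediate output.
    return {"query": query, "plan": plan, "findings": dict(zip(plan, results))}

report = orchestrate("Research Acme Corp for investment")
print(report["findings"]["team"])  # summary of team
```

Because the orchestrator holds the plan and all sub-agent results, it can notice a failed workstream and re-plan — the "won't make the same mistake again" behavior Liu describes.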
From a builder perspective, Liu argues that agent performance depends more on data structure quality than model selection or prompt engineering. He based this on Airtable’s experience building an internal data analysis tool to figure out what works.
The internal tool experiment revealed that data preparation consumed more effort than agent configuration.
“We found that the hardest part to get right was not actually the agent harness, but most of the special sauce had more to do with massaging the data semantics,” Liu said. “Agents really benefit from good data semantics.”
The data preparation work focused on three areas: restructuring data so agents could find the right tables and fields, clarifying what those fields represent, and ensuring agents could use them reliably in queries and analysis.
For organizations evaluating multi-agent systems or building custom implementations, Liu’s experience points to several technical priorities.
Data architecture precedes agent deployment. The internal experiment demonstrated that enterprises should expect data preparation to consume more resources than agent configuration. Organizations with unstructured data or poor schema documentation will struggle with agent reliability and accuracy regardless of model sophistication.
Context management is critical. Simply stitching different LLMs together to create an agentic workflow isn’t enough. There needs to be a proper context orchestrator that can maintain state and information with a view of the whole workflow.
Relational databases matter. Relational database architecture provides cleaner semantics for agent navigation than document stores or unstructured repositories. Organizations standardizing on NoSQL for performance reasons should consider maintaining relational views or schemas for agent consumption.
Orchestration requires planning capabilities. Just like a relational database has a query planner to optimize results, agentic workflows need an orchestration layer that plans and manages outcomes.
“So the punchline and the short version is that a lot of it comes down to having a really good planning and execution orchestration layer for the agent, and being able to fully leverage the models for what they’re good at,” Liu said.
Presented by Splunk
Organizations across every industry are rushing to take advantage of agentic AI. The promise is compelling for digital resilience — the potential to move organizations from reactive to preemptive operations.
But there is a fundamental flaw in how most organizations are approaching this transformation.
Walk into any boardroom discussing AI strategy, and you will hear endless debates about LLMs, reasoning engines, and GPU clusters. The conversation is dominated by the “brain” (which models to use) and the “body” (what infrastructure to run them on).
What is conspicuously absent? Any serious discussion about the senses — the operational data that AI agents need to perceive and navigate their environment.
This is not a minor oversight. It is a category error that will determine which organizations successfully deploy agentic AI and which ones create expensive, dangerous chaos.
Consider the self-driving car analogy. You could possess the world’s most sophisticated autonomous driving AI, but without LiDAR, cameras, radar, and real-time sensor feeds, that AI is worthless. Worse than worthless, it’s dangerous.
The same principle applies to enterprise agentic AI. An AI agent tasked with security incident response, infrastructure optimization, or customer service orchestration needs continuous, contextual, high-quality machine data to function. Without it, you are asking agents to make critical decisions while essentially blindfolded.
For agentic AI to operate successfully in enterprise environments, it requires three fundamental sensory capabilities:
1. Real-time operational awareness: Agents need continuous streams of telemetry, logs, events, and metrics across the entire technology stack. This isn’t batch processing; it is live data flowing from applications, infrastructure, security tools, and cloud platforms. When a security agent detects anomalous behavior, it needs to see what is happening right now, not what happened an hour ago.
2. Contextual understanding: Raw data streams aren’t enough. Agents need the ability to correlate information across domains instantly. A spike in failed login attempts means nothing in isolation. But correlate it with a recent infrastructure change and unusual network traffic, and suddenly you have a confirmed security incident. This context separates signal from noise.
3. Historical memory: Effective agents understand patterns, baselines, and anomalies over time. They need access to historical data that provides context: What does normal look like? Has this happened before? This memory enables agents to distinguish between routine fluctuations and genuine issues requiring intervention.
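The correlation step in capability 2 is the easiest to make concrete: the failed-login example, expressed as a windowed join between two event streams. The thresholds, timestamps and event shapes below are invented for illustration.

```python
# Sketch of cross-domain correlation: a burst of failed logins is promoted
# to an incident only when it follows another signal (an infrastructure
# change) within a time window. Timestamps are epoch seconds; all values
# are invented for illustration.
WINDOW_SECONDS = 600          # correlate signals within 10 minutes
FAILED_LOGIN_THRESHOLD = 50   # burst size considered anomalous

def correlate(failed_logins, infra_changes):
    """Return (ts, change) pairs where a login burst follows an infra change."""
    incidents = []
    for ts, count in failed_logins:
        if count < FAILED_LOGIN_THRESHOLD:
            continue  # routine noise, not a burst
        for change_ts, change in infra_changes:
            if 0 <= ts - change_ts <= WINDOW_SECONDS:
                incidents.append((ts, change))
    return incidents

logins = [(1000, 3), (1200, 80)]          # (timestamp, failures per minute)
changes = [(900, "firewall rule update")]  # (timestamp, description)
print(correlate(logins, changes))  # [(1200, 'firewall rule update')]
```

Neither stream is alarming alone; the incident only emerges from the join — which is the "context separates signal from noise" point above.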
Here is where things get uncomfortable for most organizations: The data infrastructure required for successful agentic AI has been on the “we should do that someday” list for years.
In traditional analytics, poor data quality results in slower insights. Frustrating, but not catastrophic. In agentic environments, however, these problems become immediately operational:
Inconsistent decisions: Agents oscillate between doing nothing and triggering unnecessary failovers because fragmented data sources contradict each other.
Stalled automation: Workflows break mid-stream because the agent lacks visibility into system dependencies or ownership.
Manual recovery: When things go wrong, teams spend days reconstructing events because there is no clear data lineage to explain the agent’s actions.
The velocity of agentic AI doesn’t hide these data problems; it exposes and amplifies them at machine speed. What used to be a quarterly data hygiene initiative is now an existential operational risk.
The organizations that will dominate in the agentic era aren’t those deploying the most agents or using the fanciest models. They are the ones who recognized that agentic sensing infrastructure is the actual competitive differentiator.
These winners are investing in four critical capabilities, all of which are central to the Cisco Data Fabric:
1. Unified data at infinite scale and finite cost: Transforming disconnected monitoring tools into a unified operational data platform is imperative. To support real-time autonomous operations, organizations need data infrastructures that can efficiently scale to handle petabyte-level datasets. Crucially, this must be done cost-effectively through strategies like tiering, federation, and AI automation. True autonomous operations are only possible when unified data platforms deliver both high performance and economic sustainability.
2. Built-in context and correlation: Sophisticated organizations are moving beyond raw data collection to delivering data that arrives enriched with context. Relationships between systems, dependencies across services, and the business impact of technical components must be embedded in the data workflow. This ensures agents spend less time discovering context and more time acting on it.
3. Traceable lineage and governance: In a world where AI agents make consequential decisions, the ability to answer “why did the agent do that?” is mandatory. Organizations need complete data lineage showing exactly what information informed each decision. This isn’t just for debugging; it is essential for compliance, auditability, and building trust in autonomous systems.
4. Open, interoperable standards: Agents do not operate in single-vendor vacuums. They need to sense across platforms, cloud providers, and on-premises systems. This requires a commitment to open standards and API integrations. Organizations that lock themselves into proprietary data formats will find their agents operating with partial blindness.
As we move deeper into 2026, the strategic question isn’t “How many AI agents can we deploy?”
It is: “Can our agents sense what is actually happening in our environment accurately, continuously, and with full context?”
If the answer is no, get ready for agentic chaos.
The good news is that this infrastructure isn’t just valuable for AI agents. It enhances human operations, traditional automation, and business intelligence immediately. The organizations that treat operational data as critical infrastructure will find that their AI agents can work autonomously, reliably, and at scale.
In 2026 and beyond, the competitive moat isn’t the sophistication of your AI models — it’s the operational data providing agents the insights to deliver the right outcome.
Cisco Data Fabric, powered by Splunk Platform, provides a unified data fabric architecture for the agentic AI era. Learn more about Cisco Data Fabric.
Mangesh Pimpalkhare is SVP and GM, Splunk Platform.
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.
While vector databases still have many valid use cases, organizations including OpenAI are leaning on PostgreSQL to get things done.
In a blog post on Thursday, OpenAI disclosed how it is using the open-source PostgreSQL database.
OpenAI runs ChatGPT and its API platform for 800 million users on a single-primary PostgreSQL instance — not a distributed database, not a sharded cluster. One Azure Database for PostgreSQL Flexible Server handles all writes. Nearly 50 read replicas spread across multiple regions handle reads. The system processes millions of queries per second while maintaining low double-digit millisecond p99 latency and five-nines availability.
The setup challenges conventional scaling wisdom and offers enterprise architects insight into what actually works at massive scale.
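The split described above — all writes to one primary, reads fanned out across replicas — can be sketched as a small query router. This is an illustrative sketch, not OpenAI's implementation; the `FakeConn` class and the SELECT-prefix heuristic are assumptions standing in for a real driver and a real statement classifier.

```python
import itertools

class ReadWriteRouter:
    """Route writes to the single primary; round-robin reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # simple round-robin balancing

    def execute(self, sql, params=()):
        # Crude heuristic (an assumption): anything not a SELECT goes to the primary.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas).run(sql, params)
        return self.primary.run(sql, params)

class FakeConn:
    """Stand-in for a real database connection."""

    def __init__(self, name):
        self.name = name
        self.queries = []

    def run(self, sql, params):
        self.queries.append(sql)
        return self.name

primary = FakeConn("primary")
replicas = [FakeConn(f"replica-{i}") for i in range(3)]
router = ReadWriteRouter(primary, replicas)

router.execute("INSERT INTO messages VALUES (1)")         # lands on the primary
targets = [router.execute("SELECT 1") for _ in range(3)]  # spread over replicas
```

In production this routing usually lives in a proxy or the driver layer rather than application code, but the shape of the decision — one writer, many readers — is the same.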
The lesson here isn’t to copy OpenAI’s stack. It’s that architectural decisions should be driven by workload patterns and operational constraints — not by scale panic or fashionable infrastructure choices. OpenAI’s PostgreSQL setup shows how far proven systems can stretch when teams optimize deliberately instead of re-architecting prematurely.
“For years, PostgreSQL has been one of the most critical, under-the-hood data systems powering core products like ChatGPT and OpenAI’s API,” OpenAI engineer Bohan Zhang wrote in a technical disclosure. “Over the past year, our PostgreSQL load has grown by more than 10x, and it continues to rise quickly.”
The company achieved this scale through targeted optimizations, including connection pooling that cut connection time from 50 milliseconds to 5 milliseconds, and cache locking to prevent “thundering herd” problems where cache misses trigger database overload.
PostgreSQL handles operational data for ChatGPT and OpenAI’s API platform. The workload is heavily read-oriented, which makes PostgreSQL a good fit. However, PostgreSQL’s multiversion concurrency control (MVCC) creates challenges under heavy write loads.
When updating data, PostgreSQL copies entire rows to create new versions, causing write amplification and forcing queries to scan through multiple versions to find current data.
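To see why updates amplify writes under MVCC, here is a toy Python model of versioned rows: every update appends a complete new copy of the row, and a reader must filter versions against its snapshot. This is a didactic simplification, not PostgreSQL's actual on-disk format — the transaction-ID scheme and visibility rule are deliberately minimal.

```python
class MVCCTable:
    """Toy MVCC: every update copies the whole row as a new version."""

    def __init__(self):
        self.versions = []  # (txid, key, row) tuples; old versions linger as dead tuples
        self.txid = 0

    def update(self, key, row):
        self.txid += 1
        # Full-row copy on every update — this is the write amplification.
        self.versions.append((self.txid, key, dict(row)))

    def read(self, key, snapshot_txid):
        # A reader sees the newest version committed at or before its snapshot,
        # which means scanning past versions it cannot see.
        visible = [row for txid, k, row in self.versions
                   if k == key and txid <= snapshot_txid]
        return visible[-1] if visible else None

t = MVCCTable()
t.update("user:1", {"name": "Ada", "credits": 10})  # txid 1
t.update("user:1", {"name": "Ada", "credits": 20})  # txid 2
t.update("user:1", {"name": "Ada", "credits": 30})  # txid 3

# Three full copies of the row now exist; in PostgreSQL the two stale
# versions would remain on disk until VACUUM reclaims them.
```

A long-running reader holding an old snapshot still sees its original version, which is the point of MVCC — but it is also why write-heavy tables accumulate dead tuples.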
Rather than fighting this limitation, OpenAI built its strategy around it. At OpenAI’s scale, these tradeoffs aren’t theoretical — they determine which workloads stay on PostgreSQL and which ones must move elsewhere.
At large scale, conventional database wisdom points to one of two paths: shard PostgreSQL across multiple primary instances so writes can be distributed, or migrate to a distributed SQL database like CockroachDB or YugabyteDB designed to handle massive scale from the start. Most organizations would have taken one of these paths years ago, well before reaching 800 million users.
Sharding eliminates the single-writer bottleneck by splitting writes across multiple primaries, while a distributed SQL database handles that distribution automatically. But both approaches introduce significant complexity: application code must route queries to the correct shard, distributed transactions become harder to manage, and operational overhead increases substantially.
Instead of sharding PostgreSQL, OpenAI established a hybrid strategy: no new tables in PostgreSQL. New workloads default to sharded systems like Azure Cosmos DB. Existing write-heavy workloads that can be horizontally partitioned get migrated out. Everything else stays in PostgreSQL with aggressive optimization.
This approach offers enterprises a practical alternative to wholesale re-architecture. Rather than spending years rewriting hundreds of endpoints, teams can identify specific bottlenecks and move only those workloads to purpose-built systems.
OpenAI’s experience scaling PostgreSQL reveals several practices that enterprises can adopt regardless of their scale.
Build operational defenses at multiple layers. OpenAI’s approach combines cache locking to prevent “thundering herd” problems, connection pooling (which dropped their connection time from 50ms to 5ms), and rate limiting at application, proxy, and query levels. Workload isolation routes low-priority and high-priority traffic to separate instances, ensuring a poorly optimized new feature can’t degrade core services.
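The cache-locking idea — let exactly one request recompute a missed key while concurrent requests for the same key wait and then reuse the result — can be sketched with a per-key lock. A minimal sketch under assumed names (`SingleFlightCache`, `loader`); real systems typically implement this in the cache layer or with a pattern like Go's singleflight.

```python
import threading

class SingleFlightCache:
    """On a cache miss, only one caller per key hits the backing store."""

    def __init__(self, loader):
        self.loader = loader   # function that fetches a missing key from the DB
        self.cache = {}
        self._locks = {}
        self._meta = threading.Lock()
        self.loads = 0         # number of trips to the "database"

    def _lock_for(self, key):
        # Serialize lock creation so every thread shares the same per-key lock.
        with self._meta:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        with self._lock_for(key):   # concurrent missers queue here...
            if key in self.cache:   # ...and find the value on the re-check
                return self.cache[key]
            self.loads += 1
            self.cache[key] = self.loader(key)
            return self.cache[key]

cache = SingleFlightCache(loader=lambda k: f"row-for-{k}")

# 50 threads miss the same hot key at once; only one load reaches the store.
threads = [threading.Thread(target=cache.get, args=("hot-key",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, all 50 misses would hit the database simultaneously — exactly the overload pattern the article describes.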
Review and monitor ORM-generated SQL in production. Object-Relational Mapping (ORM) frameworks like Django, SQLAlchemy, and Hibernate automatically generate database queries from application code, which is convenient for developers. However, OpenAI found one ORM-generated query joining 12 tables that caused multiple high-severity incidents when traffic spiked. The convenience of letting frameworks generate SQL creates hidden scaling risks that only surface under production load. Make reviewing these queries a standard practice.
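One low-cost version of this practice is to capture every statement the data layer actually sends and flag the complex ones for review. The sketch below uses Python's stdlib `sqlite3` trace hook as a stand-in for an ORM-instrumented engine (SQLAlchemy and Django expose equivalent hooks); the join-counting heuristic is an assumption for illustration, not a standard rule.

```python
import sqlite3

captured = []

conn = sqlite3.connect(":memory:")
# The trace callback receives every SQL statement as it executes.
conn.set_trace_callback(captured.append)

conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, user_id INTEGER)")
conn.execute("SELECT * FROM users u JOIN orders o ON o.user_id = u.id")

# Crude review heuristic: flag any statement containing a JOIN. In production
# you would alert on high join counts — the incident described involved a
# query joining 12 tables.
suspicious = [sql for sql in captured if " JOIN " in sql.upper()]
```

Logging what the ORM actually emits, rather than what developers think it emits, is what surfaces these queries before traffic spikes do.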
Enforce strict operational discipline. OpenAI permits only lightweight schema changes — anything triggering a full table rewrite is prohibited. Schema changes have a 5-second timeout. Long-running queries get automatically terminated to prevent blocking database maintenance operations. When backfilling data, they enforce rate limits so aggressive that operations can take over a week.
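The backfill discipline above — cap throughput and accept that the job takes days — can be sketched as a batched plan against a rate limit. Illustrative only: the row counts, batch size, and rate cap are made-up numbers, and the pause is computed rather than slept so the arithmetic is visible.

```python
def backfill_schedule(total_rows, batch_size, max_rows_per_sec):
    """Plan a throttled backfill: yield (batch_rows, pause_seconds) pairs."""
    seconds_per_batch = batch_size / max_rows_per_sec  # pacing between batches
    done = 0
    while done < total_rows:
        n = min(batch_size, total_rows - done)
        done += n
        yield n, seconds_per_batch

# Hypothetical numbers: 2 billion rows, 10k-row batches, capped at 3,000 rows/sec.
plan = list(backfill_schedule(2_000_000_000, 10_000, 3_000))
total_seconds = sum(pause for _, pause in plan)
days = total_seconds / 86_400
# At this cap the backfill takes roughly a week — the scale of the
# deliberately slow operations the article describes.
```

The point of the computation is that the duration is a consequence of the rate cap, chosen to protect the primary, not an inefficiency to be optimized away.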
Read-heavy workloads with burst writes can run on single-primary PostgreSQL longer than commonly assumed. The decision to shard should depend on workload patterns rather than user counts.
This approach is particularly relevant for AI applications, which often have heavily read-oriented workloads with unpredictable traffic spikes. These characteristics align with the pattern where single-primary PostgreSQL scales effectively.
The lesson is straightforward: identify actual bottlenecks, optimize proven infrastructure where possible, and migrate selectively when necessary. Wholesale re-architecture isn’t always the answer to scaling challenges.
For the modern CFO, the hardest part of the job often isn’t the math—it’s the storytelling. After the books are closed and the variances calculated, finance teams spend days, sometimes weeks, manually copy-pasting charts into PowerPoint slides to explain why the numbers moved.
Today, 11-year-old Israeli fintech company Datarails announced a set of new generative AI tools designed to automate that “last mile” of financial reporting, effectively allowing finance leaders to “vibe code” their way to a board deck.
Launching today to accompany the firm’s newly announced $70 million Series C funding round, the company’s new Strategy, Planning, and Reporting AI Finance Agents promise to answer complex financial questions with fully formatted assets, not just text.
A finance professional can now ask, “What’s driving our profitability changes this year?” or “Why did Marketing go over budget last month?” and the system will instantly generate board-ready PowerPoint slides, PDF reports, or Excel files containing the answer.
The deployment of these agents marks a fundamental shift in how the “Office of the CFO” interacts with data.
The promise of the new agents is to solve the fragmentation problem that plagues finance departments. Unlike a sales leader who lives in Salesforce, or a CIO who relies on ServiceNow, the CFO has no single “system of truth.” Data is scattered across ERPs, HRIS, CRMs, and bank portals.
A major barrier to AI adoption in finance has been security. CFOs are rightfully hesitant to plug P&L data into public models.
Datarails has addressed this by leveraging Microsoft’s Azure OpenAI Service. “We use the OpenAI in Azure to ensure the privacy and the security for our customers, they don’t like to share the data in [an] open LLM,” Datarails CEO and co-founder Didi Gurfinkel noted in an interview with VentureBeat. This allows the platform to utilize state-of-the-art models while keeping data within a secure enterprise perimeter.
Datarails’ new agents sit on top of a unified data layer that connects these disparate systems. Because the AI is grounded in the company’s own unified internal data, it avoids the hallucinations common in generic LLMs while offering the level of privacy required for sensitive financial data.
“If the CFO wants to leverage AI on the CFO level or the organization data, they need to consolidate the data,” Gurfinkel explained.
By solving that consolidation problem first, Datarails can now offer agents that understand the context of the business.
“Now the CFO can use our agents to run analysis, get insights, create reports… because now the data is ready,” Gurfinkel said.
The launch taps into a broader trend in software development where natural language prompts replace complex coding or manual configuration—a concept tech circles refer to as “vibe coding.” Gurfinkel believes this is the future of financial engineering.
“Very soon, the CFO and the financial team themselves will be able to develop applications,” Gurfinkel predicted. “The LLMs become so strong that in one prompt, they can replace full product runs.”
He described a workflow where a user could simply prompt: “That was my budget and my actual of the past year. Now build me the budget for the next year.”
The new agents are designed to handle exactly these types of complex, multi-variable scenarios. For example, a user could ask, “What happens if revenue grows slower next quarter?” and receive a scenario analysis in return.
Because the output can be delivered as an Excel file, finance teams can verify the formulas and assumptions, maintaining the audit trail that generic AI tools often lack.
For most engineering teams, the arrival of a new enterprise financial platform signals a looming headache: months of data migration, schema redesigns, and the inevitable friction of forcing non-technical users to abandon their preferred workflows. Datarails has engineered its way around this friction by building what might be best described as an “anti-implementation.”
Instead of demanding a “rip and replace” of legacy systems, the platform accepts the messy reality of the modern finance stack. The architecture is designed to decouple the data storage from the presentation layer, effectively treating the organization’s existing Excel files as a frontend interface while Datarails acts as the backend database.
“We are not replacing anything,” Gurfinkel explained. “The implementation can be very fast, from a few hours to maybe a few days.”
From a technical perspective, this means the “engineering” requirement is almost entirely stripped away. There are no ETL pipelines to build or Python scripts to maintain. The system comes pre-wired with over 200 native connectors—linking directly to ERPs like NetSuite and Sage, CRMs like Salesforce, and various HRIS and bank portals.
The heavy lifting is replaced by a “no-code” mapping process. A finance analyst, not a developer, maps the fields from their General Ledger to their Excel models in a self-service workflow. For modules like Month-End Close, the company explicitly promises that “no IT support is needed,” a phrase that likely comes as a relief to stretched CTOs. Even complex setups, such as the new Cash Management module which requires banking integrations, are typically fully operational within two to three weeks.
The result is a system where the “technical debt” usually associated with financial transformation is rendered obsolete. The finance team gets their “single source of truth” without ever asking engineering to provision a database.
Datarails wasn’t always the “FinanceOS” for the AI era. Founded in 2015 by Gurfinkel alongside co-founders Eyal Cohen (COO) and Oded Har-Tal (CTO), the Tel Aviv-based startup spent its early years tackling a drier problem: version control for Excel. The initial premise was to synchronize and manage spreadsheets across enterprises, but adoption was sluggish as the team struggled to find the right product-market fit.
The breakthrough came in 2020 with a strategic pivot. The team realized that finance professionals didn’t want to replace Excel with a new dashboard; they wanted to fix Excel’s limitations—specifically manual consolidation and data fragmentation. By shifting focus to SMB finance teams and embracing an “Excel-native” automation philosophy, the company found its stride.
This alignment led to rapid scaling, fueled by a $55 million Series A in June 2021 led by Zeev Ventures, followed quickly by a $50 million Series B in March 2022 led by Qumra Capital. While the company faced headwinds during the tech downturn—resulting in an 18% workforce reduction in late 2022—it has since rebounded aggressively. By 2025, Datarails had nearly doubled its workforce to over 400 employees globally, driven by a multi-product expansion strategy that now includes Month-End Close and Cash Management solutions.
The new AI capabilities are supported by the $70 million Series C injection from One Peak, along with existing investors Vertex Growth, Vintage Investment Partners, and others. The funding arrives after a year of 70% revenue growth for Datarails, driven largely by the expansion of its product suite.
More than 50% of the company’s growth in 2025 came from solutions launched in the last 12 months, including Datarails Month-End Close (a tool for automating reconciliations and workflow management) and Datarails Cash Management (for real-time liquidity monitoring).
These products serve as the “plumbing” that makes the new AI agents effective. By automating the month-end close and unifying cash data, Datarails ensures that when a CFO asks the AI a question, the underlying numbers are accurate and up-to-date.
For Gurfinkel, the goal is to make the finance office “AI-native” without forcing users to abandon their favorite tool: Excel.
“We are not replacing anything,” Gurfinkel said. “We connect the Excel so Excel now becomes the calculation and the presentation.”
With the launch of these new agents, Datarails is betting that the future of finance isn’t about learning new software, but about having a conversation with the data you already have.