Five years ago, Databricks coined the term ‘data lakehouse’ to describe a new type of data architecture that combines a data lake with a data warehouse. That term and data architecture are now commonplace across the data industry for analytics workloads.
Now, Databricks is once again looking to create a new category with its Lakebase service, which becomes generally available today. While the data lakehouse construct deals with OLAP (online analytical processing) databases, Lakebase is all about OLTP (online transaction processing) and operational databases. The Lakebase service has been in development since June 2025 and is based on technology Databricks gained via its acquisition of PostgreSQL database provider Neon. It was further enhanced in October 2025 with the acquisition of Mooncake, which brought capabilities to help bridge PostgreSQL with lakehouse data formats.
Lakebase is a serverless operational database that represents a fundamental rethinking of how databases work in the age of autonomous AI agents. Early adopters, including easyJet, Hafnia and Warner Music Group, are cutting application delivery times by 75 to 95%, but the deeper architectural innovation positions databases as ephemeral, self-service infrastructure that AI agents can provision and manage without human intervention.
This isn’t just another managed Postgres service. Lakebase treats operational databases as lightweight, disposable compute running on data lake storage rather than monolithic systems requiring careful capacity planning and database administrator (DBA) oversight.
“Really, for the vibe coding trend to take off, you need developers to believe they can actually create new apps very quickly, but you also need the central IT team, or DBAs, to be comfortable with the tsunami of apps and databases,” Databricks co-founder Reynold Xin told VentureBeat. “Classic databases simply won’t scale to that because they can’t afford to put a DBA per database and per app.”
The production numbers demonstrate immediate impact beyond the agent provisioning vision. Hafnia reduced delivery time for production-ready applications from two months to five days — or 92% — using Lakebase as the transactional engine for their internal operations portal. The shipping company moved beyond static BI reports to real-time business applications for fleet, commercial and finance workflows.
EasyJet consolidated more than 100 Git repositories into just two and cut development cycles from nine months to four months — a 56% reduction — while building a web-based revenue management hub on Lakebase to replace a decade-old desktop app and one of Europe’s largest legacy SQL Server environments.
Warner Music Group is moving insights directly into production systems using the unified foundation, while Quantum Capital Group uses it to maintain consistent, governed data for identifying and evaluating oil and gas investments — eliminating the data duplication that previously forced teams to maintain multiple copies in different formats.
The acceleration stems from the elimination of two major bottlenecks: database cloning for test environments and ETL pipeline maintenance for syncing operational and analytical data.
Traditional databases couple storage and compute — organizations provision a database instance with attached storage and scale by adding more instances or storage. AWS Aurora innovated by separating these layers using proprietary storage, but the storage remained locked inside AWS’s ecosystem and wasn’t independently accessible for analytics.
Lakebase takes the separation of storage and compute to its logical conclusion by putting storage directly in the data lakehouse. The compute layer runs essentially vanilla PostgreSQL — maintaining full compatibility with the Postgres ecosystem — but every write goes to lakehouse storage in formats that Spark, Databricks SQL and other analytics engines can immediately query without ETL.
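To make that concrete, here is a minimal sketch of what the pattern implies for an application developer. The endpoint, credentials and table are placeholders rather than real Lakebase configuration; the premise, per Databricks, is simply that the compute layer speaks standard PostgreSQL while the underlying storage is also visible to lakehouse engines.

```python
# Minimal, hypothetical sketch: the application writes through Lakebase's
# standard Postgres interface, and the same rows are visible to lakehouse
# engines without ETL. Endpoint, credentials and table are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="lakebase-instance.example.com",  # placeholder endpoint
    dbname="orders_db",
    user="app_user",
    password="...",
)
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO orders (order_id, customer_id, total) VALUES (%s, %s, %s)",
        ("o-1001", "c-42", 129.99),
    )

# Because the storage lives in the lakehouse, an analytics engine could query
# the same data directly, conceptually something like:
#   SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id;
# run from Spark or Databricks SQL, with no pipeline in between.
```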
“The unique technical insight was that data lakes decouple storage from compute, which was great, but we need to introduce data management capabilities like governance and transaction management into the data lake,” Xin explained. “We’re actually not that different from the lakehouse concept, but we’re building lightweight, ephemeral compute for OLTP databases on top.”
Databricks built Lakebase with the technology it gained from the acquisition of Neon. But Xin emphasized that Databricks significantly expanded Neon’s original capabilities to create something fundamentally different.
“They didn’t have the enterprise experience, and they didn’t have the cloud scale,” Xin said. “We brought the Neon team’s novel architectural idea together with the robustness of the Databricks infrastructure and combined them. So now we’ve created a super scalable platform.”
Xin outlined a vision directly tied to the economics of AI coding tools that explains why the Lakebase construct matters beyond current use cases. As development costs plummet, enterprises will shift from buying hundreds of SaaS applications to building millions of bespoke internal applications.
“As the cost of software development goes down, which we’re seeing today because of AI coding tools, it will shift from the proliferation of SaaS in the last 10 to 15 years to the proliferation of in-house application development,” Xin said. “Instead of building maybe hundreds of applications, they’ll be building millions of bespoke apps over time.”
This creates an impossible fleet management problem with traditional approaches. You cannot hire enough DBAs to manually provision, monitor and troubleshoot thousands of databases. Xin’s solution: Treat database management itself as a data problem rather than an operations problem.
Lakebase stores all telemetry and metadata — query performance, resource utilization, connection patterns, error rates — directly in the lakehouse, where it can be analyzed using standard data engineering and data science tools. Instead of configuring dashboards in database-specific monitoring tools, data teams query telemetry data with SQL or analyze it with machine learning models to identify outliers and predict issues.
“Instead of creating a dashboard for every 50 or 100 databases, you can actually look at the chart to understand if something has misbehaved,” Xin explained. “Database management will look very similar to an analytics problem. You look at outliers, you look at trends, you try to understand why things happen. This is how you manage at scale when agents are creating and destroying databases programmatically.”
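If telemetry for every database lands in the lakehouse as ordinary tables, fleet health checks reduce to ordinary data analysis. Below is a minimal sketch of that idea, with hypothetical table and column names rather than Lakebase's actual schema.

```python
# Illustrative sketch: treating database fleet health as a data problem. The
# table layout and column names are hypothetical, not Lakebase's actual schema.
import pandas as pd

# Imagine hourly telemetry for thousands of databases landing in the lakehouse:
# one row per database per hour with columns (database_id, hour, p99_latency_ms).
telemetry = pd.read_parquet("telemetry/db_metrics.parquet")  # placeholder path

# Flag databases whose latest p99 latency deviates sharply from their own baseline.
stats = telemetry.groupby("database_id")["p99_latency_ms"].agg(["mean", "std"])
latest = (telemetry.sort_values("hour")
          .groupby("database_id").tail(1)
          .set_index("database_id"))
z_score = (latest["p99_latency_ms"] - stats["mean"]) / stats["std"]

# The handful of misbehaving databases out of thousands, no per-database dashboard needed.
print(z_score[z_score > 3].sort_values(ascending=False).head(20))
```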
The implications extend to autonomous agents themselves. An AI agent experiencing performance issues could query the telemetry data to diagnose problems — treating database operations as just another analytics task rather than requiring specialized DBA knowledge. Database management becomes something agents can do for themselves using the same data analysis capabilities they already have.
The Lakebase construct signals a fundamental shift in how enterprises should think about operational databases — not as precious, carefully managed infrastructure requiring specialized DBAs, but as ephemeral, self-service resources that scale programmatically like cloud compute.
This matters whether or not autonomous agents materialize as quickly as Databricks envisions, because the underlying architectural principle — treating database management as an analytics problem rather than an operations problem — changes the skill sets and team structures enterprises need.
Data leaders should pay attention to the convergence of operational and analytical data happening across the industry. When writes to an operational database are immediately queryable by analytics engines without ETL, the traditional boundaries between transactional systems and data warehouses blur. This unified architecture reduces the operational overhead of maintaining separate systems, but it also requires rethinking data team structures built around those boundaries.
When lakehouse launched, competitors rejected the concept before eventually adopting it themselves. Xin expects the same trajectory for Lakebase.
“It just makes sense to separate storage and compute and put all the storage in the lake — it enables so many capabilities and possibilities,” he said.
The chief data officer (CDO) has evolved from a niche compliance role into one of the most critical positions for AI deployment. These executives now sit at the intersection of data governance, AI strategy, and workforce readiness. Their decisions determine whether enterprises move from AI pilots to production scale or remain stuck in experimentation mode.
That’s why Informatica’s third annual survey — the largest survey yet of CDOs specifically on AI readiness, spanning 600 executives globally — carries particular weight. The findings expose a dangerous disconnect that explains why so many organizations struggle to scale AI beyond pilots: While 69% of enterprises have deployed generative AI and 47% are running agentic AI systems, 76% admit their governance frameworks can’t keep pace with how employees actually use these technologies.
The survey reveals what Informatica calls a “trust paradox” — and explains why data leaders are dangerously overconfident about AI readiness. Organizations deployed generative AI systems faster than they built the governance and training infrastructure to support them. The result: Employees generally trust the data powering AI systems, but organizations acknowledge their workforces lack the literacy to question that data or use AI responsibly. Seventy-five percent of data leaders say employees need upskilling in data literacy. Seventy-four percent require AI literacy training for day-to-day operations.
“The gap now is just, can you trust the data to set an agent loose on it?” Graeme Thompson, CIO at Informatica, told VentureBeat. “The agents do what they’re supposed to do if you give them the right information. There’s just such a lack of trust in the data that I think that’s the gap.”
GenAI adoption jumped from 48% a year ago to 69% today. Nearly half of organizations (47%) now run agentic AI — systems that autonomously take actions rather than just generate content. This rapid expansion has created a race to acquire vector databases, upgrade data pipelines, and expand compute infrastructure.
But Thompson dismisses infrastructure gaps as the primary problem. The technology exists and works. The limitation is organizational, not technical.
“The technology that we have available at the moment, the infrastructure, is more than — it’s not the problem yet,” Thompson said. He compared the situation to amateur athletes blaming their equipment. “There’s a long way to go before the equipment is the problem in the room. People chase equipment like golfers. Those golfers are a sucker for a new driver, a new putter that’s going to cure their physical inability to hit a golf ball straight.”
The survey data supports this. When asked about 2026 investment priorities, the top three are all people and process issues: data privacy and security (43%), AI governance (41%), and workforce upskilling (39%).
The survey data combined with Thompson’s implementation experience reveals specific lessons for data leaders trying to move from pilots to production.
The trust paradox exists because organizations can deploy AI technology faster than they can train people to use it responsibly. Seventy-five percent need data literacy upskilling. Seventy-four percent need AI literacy training. The technology gap is a people gap.
“It’s much easier to get your people that know your company and know your data and know your processes to learn AI than it is to bring an AI person in that doesn’t know anything about those things and teach them about your company,” Thompson said. “And also the AI people are super expensive, just like data scientists are super expensive.”
Thompson structures Informatica so the CDO reports directly to him as CIO. This makes data governance an execution function rather than a separate strategic layer.
“That is a deliberate decision based on that function being a get things done function instead of an ivory tower function,” Thompson said. The structure ensures data teams and application owners share common priorities through a common boss. “If they have a common boss, their priorities should be aligned. And if not, it’s because the boss isn’t doing his job, not because the two functions aren’t working off the same priority list.”
If 76% of organizations can’t govern AI usage effectively, reporting structure may be part of the problem. Siloed data and IT functions create the conditions for pilots that never scale.
The breakthrough insight is that AI literacy programs must extend beyond technology teams into business functions. At Informatica, the chief marketing officer is one of Thompson’s strongest AI partners.
“You need that literacy across your business teams as well as in your technology teams,” Thompson said.
He noted that the marketing operations team understands the technology and data. It knows that the answer to “How do I get more value out of my limited marketing program dollars each year?” is automating and adding AI to how that job is done, not adding people or more Google ad dollars.
Business-side literacy creates pull rather than push for AI adoption. Marketing, sales and operations teams start demanding AI capabilities because they see strategic value, not just efficiency gains.
Data leaders have spent decades fighting perceptions that IT is just a cost center. AI offers the opportunity to change that narrative, but only if CDOs reframe the value proposition away from productivity savings.
“I am very disappointed that, given this new technology capability on a plate, as IT people and as data people, we immediately turn around and talk about productivity savings,” Thompson said. “What a waste of an opportunity.”
The tactical shift: Pitch AI’s ability to remove headcount constraints entirely rather than reduce existing headcount. This reframes AI from operational efficiency to strategic capability. Organizations can expand market reach, enter new geographies and test initiatives that were previously cost-prohibitive.
“It’s not about saving money,” Thompson said. “And if that’s mainly the approach that you have, then your company’s not going to win.”
Don’t wait for perfect horizontal data governance layers before delivering production value. Pick one high-value use case. Build the complete governance, data quality and literacy stack for that specific workflow. Validate results. Then replicate the pattern to adjacent use cases.
This delivers production value while building organizational capability incrementally.
“I think this space is moving so quickly that if you try and solve 100% your governance problem before you get to your semantic layer problem, before you get to your glossary of terms problem, then you’re never going to generate any outcome and people are going to lose patience,” Thompson said.
Airtable is applying its data-first design philosophy to AI agents with the debut of Superagent on Tuesday. It’s a standalone research agent that deploys teams of specialized AI agents working in parallel to complete research tasks.
The technical innovation lies in how Superagent’s orchestrator maintains context. Earlier agent systems used simple model routing where an intermediary filtered information between models. Airtable’s orchestrator maintains full visibility over the entire execution journey: the initial plan, execution steps and sub-agent results. This creates what co-founder Howie Liu calls “a coherent journey” where the orchestrator made all decisions along the way.
“It ultimately comes down to how you leverage the model’s self-reflective capability,” Liu told VentureBeat. He co-founded Airtable more than a dozen years ago with a cloud-based relational database at its core.
Airtable built its business on a singular bet: Software should adapt to how people work, not the other way around. That philosophy powered growth to over 500,000 organizations, including 80% of the Fortune 100, using its platform to build custom applications fitted to their workflows.
The Superagent technology is an evolution of capabilities originally developed by DeepSky (formerly known as Gradient), which Airtable acquired in October 2025.
Liu frames Airtable and Superagent as complementary form factors that together address different enterprise needs. Airtable provides the structured foundation, and Superagent handles unstructured research tasks.
“We obviously started with a data layer. It’s in the name Airtable: It’s a table of data,” Liu said.
The platform evolved as scaffolding around that core database with workflow capabilities, automations, and interfaces that scale to thousands of users. “I think Superagent is a very complementary form factor, which is very unstructured,” Liu said. “These agents are, by nature, very free form.”
The decision to build free-form capabilities reflects industry learnings about using increasingly capable models. Liu said that as the models have gotten smarter, the best way to use them is to have fewer restrictions on how they run.
When a user submits a query, the orchestrator creates a visible plan that breaks complex research into parallel workstreams. If you’re researching a company for investment, for example, it will split the task into pieces such as researching the team, the funding history and the competitive landscape. Each workstream gets delegated to a specialized agent that executes independently. These agents work in parallel, their work coordinated by the system, each contributing its piece to the whole.
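A minimal sketch of that plan-then-parallelize pattern is below. It illustrates the orchestration concept Liu describes, not Airtable's implementation; the sub-agent function is a placeholder for real model and tool calls.

```python
# Minimal sketch of the plan-then-parallelize pattern, not Airtable's code.
# run_sub_agent is a placeholder for whatever model and tool calls execute a
# single workstream.
import asyncio

async def run_sub_agent(workstream: str, query: str) -> str:
    """Placeholder: a specialized agent researches one slice of the task."""
    await asyncio.sleep(0)  # stand-in for model and tool calls
    return f"findings for {workstream!r} on {query!r}"

async def orchestrate(query: str) -> dict:
    # Step 1: the orchestrator produces a visible plan (hard-coded here).
    plan = ["team", "funding history", "competitive landscape"]
    # Step 2: each workstream runs in parallel as its own sub-agent.
    results = await asyncio.gather(*(run_sub_agent(w, query) for w in plan))
    # Step 3: the orchestrator keeps the plan and every result in view, so it
    # can reflect, retry a failed workstream, or synthesize the final report.
    return dict(zip(plan, results))

print(asyncio.run(orchestrate("Acme Corp investment research")))
```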
The sub-agent approach solves two problems: It manages context windows by aggregating cleaned results without polluting the main orchestrator’s context, and it enables adaptation during execution. Superagent uses multiple frontier models for different sub-tasks, including models from OpenAI, Anthropic and Google.
“Maybe it tried doing a research task in a certain way that didn’t work out, couldn’t find the right information, and then it decided to try something else,” Liu said. “It knows that it tried the first thing and it didn’t work. So it won’t make the same mistake again.”
From a builder perspective, Liu argues that agent performance depends more on data structure quality than model selection or prompt engineering. He based this on Airtable’s experience building an internal data analysis tool to figure out what works.
The internal tool experiment revealed that data preparation consumed more effort than agent configuration.
“We found that the hardest part to get right was not actually the agent harness, but most of the special sauce had more to do with massaging the data semantics,” Liu said. “Agents really benefit from good data semantics.”
The data preparation work focused on three areas: restructuring data so agents could find the right tables and fields, clarifying what those fields represent, and ensuring agents could use them reliably in queries and analysis.
For organizations evaluating multi-agent systems or building custom implementations, Liu’s experience points to several technical priorities.
Data architecture precedes agent deployment. The internal experiment demonstrated that enterprises should expect data preparation to consume more resources than agent configuration. Organizations with unstructured data or poor schema documentation will struggle with agent reliability and accuracy regardless of model sophistication.
Context management is critical. Simply stitching different LLMs together to create an agentic workflow isn’t enough. There needs to be a proper context orchestrator that can maintain state and information with a view of the whole workflow.
Relational databases matter. Relational database architecture provides cleaner semantics for agent navigation than document stores or unstructured repositories. Organizations standardizing on NoSQL for performance reasons should consider maintaining relational views or schemas for agent consumption.
Orchestration requires planning capabilities. Just like a relational database has a query planner to optimize results, agentic workflows need an orchestration layer that plans and manages outcomes.
“So the punchline and the short version is that a lot of it comes down to having a really good planning and execution orchestration layer for the agent, and being able to fully leverage the models for what they’re good at,” Liu said.
Presented by Splunk
Organizations across every industry are rushing to take advantage of agentic AI. The promise is compelling for digital resilience — the potential to move organizations from reactive to preemptive operations.
But there is a fundamental flaw in how most organizations are approaching this transformation.
Walk into any boardroom discussing AI strategy, and you will hear endless debates about LLMs, reasoning engines, and GPU clusters. The conversation is dominated by the “brain” (which models to use) and the “body” (what infrastructure to run them on).
What is conspicuously absent? Any serious discussion about the senses — the operational data that AI agents need to perceive and navigate their environment.
This is not a minor oversight. It is a category error that will determine which organizations successfully deploy agentic AI and which ones create expensive, dangerous chaos.
Consider the self-driving car analogy. You could possess the world’s most sophisticated autonomous driving AI, but without LiDAR, cameras, radar, and real-time sensor feeds, that AI is worthless. Worse than worthless, it’s dangerous.
The same principle applies to enterprise agentic AI. An AI agent tasked with security incident response, infrastructure optimization, or customer service orchestration needs continuous, contextual, high-quality machine data to function. Without it, you are asking agents to make critical decisions while essentially blindfolded.
For agentic AI to operate successfully in enterprise environments, it requires three fundamental sensory capabilities:
1. Real-time operational awareness: Agents need continuous streams of telemetry, logs, events, and metrics across the entire technology stack. This isn’t batch processing; it is live data flowing from applications, infrastructure, security tools, and cloud platforms. When a security agent detects anomalous behavior, it needs to see what is happening right now, not what happened an hour ago.
2. Contextual understanding: Raw data streams aren’t enough. Agents need the ability to correlate information across domains instantly. A spike in failed login attempts means nothing in isolation. But correlate it with a recent infrastructure change and unusual network traffic, and suddenly you have a confirmed security incident. This context separates signal from noise.
3. Historical memory: Effective agents understand patterns, baselines, and anomalies over time. They need access to historical data that provides context: What does normal look like? Has this happened before? This memory enables agents to distinguish between routine fluctuations and genuine issues requiring intervention.
Here is where things get uncomfortable for most organizations: The data infrastructure required for successful agentic AI has been on the “we should do that someday” list for years.
In traditional analytics, poor data quality results in slower insights. Frustrating, but not catastrophic. In agentic environments, however, these problems become immediately operational:
Inconsistent decisions: Agents oscillate between doing nothing and triggering unnecessary failovers because fragmented data sources contradict each other.
Stalled automation: Workflows break mid-stream because the agent lacks visibility into system dependencies or ownership.
Manual recovery: When things go wrong, teams spend days reconstructing events because there is no clear data lineage to explain the agent’s actions.
The velocity of agentic AI doesn’t hide these data problems; it exposes and amplifies them at machine speed. What used to be a quarterly data hygiene initiative is now an existential operational risk.
The organizations that will dominate in the agentic era aren’t those deploying the most agents or using the fanciest models. They are the ones who recognized that agentic sensing infrastructure is the actual competitive differentiator.
These winners are investing in four critical capabilities, all of which are central to the Cisco Data Fabric:
1. Unified data at infinite scale and finite cost: Transforming disconnected monitoring tools into a unified operational data platform is imperative. To support real-time autonomous operations, organizations need data infrastructures that can efficiently scale to handle petabyte-level datasets. Crucially, this must be done cost-effectively through strategies like tiering, federation, and AI automation. True autonomous operations are only possible when unified data platforms deliver both high performance and economic sustainability.
2. Built-in context and correlation: Sophisticated organizations are moving beyond raw data collection to delivering data that arrives enriched with context. Relationships between systems, dependencies across services, and the business impact of technical components must be embedded in the data workflow. This ensures agents spend less time discovering context and more time acting on it.
3. Traceable lineage and governance: In a world where AI agents make consequential decisions, the ability to answer “why did the agent do that?” is mandatory. Organizations need complete data lineage showing exactly what information informed each decision. This isn’t just for debugging; it is essential for compliance, auditability, and building trust in autonomous systems.
4. Open, interoperable standards: Agents do not operate in single-vendor vacuums. They need to sense across platforms, cloud providers, and on-premises systems. This requires a commitment to open standards and API integrations. Organizations that lock themselves into proprietary data formats will find their agents operating with partial blindness.
As we move deeper into 2026, the strategic question isn’t “How many AI agents can we deploy?”
It is: “Can our agents sense what is actually happening in our environment accurately, continuously, and with full context?”
If the answer is no, get ready for agentic chaos.
The good news is that this infrastructure isn’t just valuable for AI agents. It enhances human operations, traditional automation, and business intelligence immediately. The organizations that treat operational data as critical infrastructure will find that their AI agents can operate autonomously, reliably, and at scale.
In 2026 and beyond, the competitive moat isn’t the sophistication of your AI models — it’s the operational data providing agents the insights to deliver the right outcome.
Cisco Data Fabric, powered by Splunk Platform, provides a unified data fabric architecture for the agentic AI era. Learn more about Cisco Data Fabric.
Mangesh Pimpalkhare is SVP and GM, Splunk Platform.
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.
While vector databases still have many valid use cases, organizations including OpenAI are leaning on PostgreSQL to get things done.
In a blog post on Thursday, OpenAI disclosed how it is using the open-source PostgreSQL database.
OpenAI runs ChatGPT and its API platform for 800 million users on a single-primary PostgreSQL instance — not a distributed database, not a sharded cluster. One Azure PostgreSQL Flexible Server handles all writes. Nearly 50 read replicas spread across multiple regions handle reads. The system processes millions of queries per second while maintaining low double-digit millisecond p99 latency and five-nines availability.
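That topology implies a simple routing rule in application code: reads fan out to replicas, writes go to the one primary. The rough sketch below uses placeholder endpoints, not OpenAI's actual infrastructure.

```python
# Illustrative sketch of the read/write split that topology implies: writes go
# to the single primary, reads fan out across replicas. Endpoints are
# placeholders, not OpenAI's infrastructure.
import random
import psycopg2

PRIMARY_DSN = "host=pg-primary.example.com dbname=app"
REPLICA_DSNS = [f"host=pg-replica-{i}.example.com dbname=app" for i in range(50)]

def get_connection(readonly: bool):
    """Route read-only traffic to a replica, everything else to the primary."""
    dsn = random.choice(REPLICA_DSNS) if readonly else PRIMARY_DSN
    return psycopg2.connect(dsn)

# Reads, the vast majority of the workload, never touch the primary.
with get_connection(readonly=True) as conn, conn.cursor() as cur:
    cur.execute("SELECT plan FROM subscriptions WHERE user_id = %s", ("u-123",))
```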
The setup challenges conventional scaling wisdom and offers enterprise architects insight into what actually works at massive scale.
The lesson here isn’t to copy OpenAI’s stack. It’s that architectural decisions should be driven by workload patterns and operational constraints — not by scale panic or fashionable infrastructure choices. OpenAI’s PostgreSQL setup shows how far proven systems can stretch when teams optimize deliberately instead of re-architecting prematurely.
“For years, PostgreSQL has been one of the most critical, under-the-hood data systems powering core products like ChatGPT and OpenAI’s API,” OpenAI engineer Bohan Zhang wrote in a technical disclosure. “Over the past year, our PostgreSQL load has grown by more than 10x, and it continues to rise quickly.”
The company achieved this scale through targeted optimizations, including connection pooling that cut connection time from 50 milliseconds to 5 milliseconds and cache locking to prevent ‘thundering herd’ problems where cache misses trigger database overload.
PostgreSQL handles operational data for ChatGPT and OpenAI’s API platform. The workload is heavily read-oriented, which makes PostgreSQL a good fit. However, PostgreSQL’s multiversion concurrency control (MVCC) creates challenges under heavy write loads.
When updating data, PostgreSQL copies entire rows to create new versions, causing write amplification and forcing queries to scan through multiple versions to find current data.
Rather than fighting this limitation, OpenAI built its strategy around it. At OpenAI’s scale, these tradeoffs aren’t theoretical — they determine which workloads stay on PostgreSQL and which ones must move elsewhere.
At large scale, conventional database wisdom points to one of two paths: shard PostgreSQL across multiple primary instances so writes can be distributed, or migrate to a distributed SQL database like CockroachDB or YugabyteDB designed to handle massive scale from the start. Most organizations would have taken one of these paths years ago, well before reaching 800 million users.
Sharding or moving to a distributed SQL database eliminates the single-writer bottleneck. A distributed SQL database handles this coordination automatically, but both approaches introduce significant complexity: application code must route queries to the correct shard, distributed transactions become harder to manage and operational overhead increases substantially.
Instead of sharding PostgreSQL, OpenAI established a hybrid strategy: no new tables in PostgreSQL. New workloads default to sharded systems like Azure Cosmos DB. Existing write-heavy workloads that can be horizontally partitioned get migrated out. Everything else stays in PostgreSQL with aggressive optimization.
This approach offers enterprises a practical alternative to wholesale re-architecture. Rather than spending years rewriting hundreds of endpoints, teams can identify specific bottlenecks and move only those workloads to purpose-built systems.
OpenAI’s experience scaling PostgreSQL reveals several practices that enterprises can adopt regardless of their scale.
Build operational defenses at multiple layers. OpenAI’s approach combines cache locking to prevent “thundering herd” problems, connection pooling (which dropped their connection time from 50ms to 5ms), and rate limiting at application, proxy and query levels. Workload isolation routes low-priority and high-priority traffic to separate instances, ensuring a poorly optimized new feature can’t degrade core services.
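The cache-locking idea is worth spelling out, since it is a pattern any team can apply. The sketch below is a generic illustration, not OpenAI's code: when a hot key expires, one request recomputes the value while the rest wait for it, so the database never sees the full herd.

```python
# Generic sketch of per-key cache locking, not OpenAI's implementation: when a
# hot entry expires, a single request recomputes it while the rest wait, so the
# database never sees the full herd.
import threading

cache: dict = {}
locks: dict = {}
locks_guard = threading.Lock()

def get_or_compute(key, compute):
    value = cache.get(key)
    if value is not None:
        return value
    # One lock per key, so unrelated keys never block each other.
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        # Re-check: another thread may have filled the cache while we waited.
        value = cache.get(key)
        if value is None:
            value = compute()  # only one caller hits the database
            cache[key] = value
        return value
```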
Review and monitor ORM-generated SQL in production. Object-Relational Mapping (ORM) frameworks like Django, SQLAlchemy, and Hibernate automatically generate database queries from application code, which is convenient for developers. However, OpenAI found one ORM-generated query joining 12 tables that caused multiple high-severity incidents when traffic spiked. The convenience of letting frameworks generate SQL creates hidden scaling risks that only surface under production load. Make reviewing these queries a standard practice.
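One practical way to do that is to make the generated SQL visible during development and in logs. The sketch below uses SQLAlchemy as an example; the tables are made up, but printing a statement and enabling engine echo are standard features, and Django exposes something similar via str(queryset.query).

```python
# Sketch: making ORM-generated SQL visible so a 12-table join gets caught in
# review rather than in an incident. The tables here are made up; printing a
# statement and enabling engine echo are standard SQLAlchemy features.
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

metadata = MetaData()
users = Table("users", metadata,
              Column("id", Integer, primary_key=True),
              Column("name", String))
orders = Table("orders", metadata,
               Column("id", Integer, primary_key=True),
               Column("user_id", Integer))

# 1) Inspect the SQL a query object will emit before it ships.
stmt = (select(users.c.name, orders.c.id)
        .join_from(users, orders, users.c.id == orders.c.user_id))
print(stmt)

# 2) Or log every statement the engine actually runs, then review the logs
#    for unexpectedly deep joins before a traffic spike exposes them.
engine = create_engine("sqlite://", echo=True)
```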
Enforce strict operational discipline. OpenAI permits only lightweight schema changes — anything triggering a full table rewrite is prohibited. Schema changes have a 5-second timeout. Long-running queries get automatically terminated to prevent blocking database maintenance operations. When backfilling data, they enforce rate limits so aggressive that operations can take over a week.
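Those guardrails map onto plain PostgreSQL settings. The sketch below shows the general shape; the specific values echo the article, but whether OpenAI applies them exactly this way is an assumption.

```python
# Sketch of those guardrails as plain PostgreSQL settings; the DSN is a
# placeholder and the exact values are taken from the article, not from
# OpenAI's configuration.
import psycopg2

conn = psycopg2.connect("dbname=app")  # placeholder connection string

with conn, conn.cursor() as cur:
    # A schema change aborts if it cannot acquire its lock within 5 seconds,
    # instead of queueing behind traffic and blocking the whole table.
    cur.execute("SET lock_timeout = '5s'")
    # Adding a nullable column is a lightweight change with no table rewrite.
    cur.execute("ALTER TABLE orders ADD COLUMN promo_code text")

with conn, conn.cursor() as cur:
    # Terminate runaway queries before they interfere with maintenance work.
    cur.execute("SET statement_timeout = '30s'")
```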
Read-heavy workloads with burst writes can run on single-primary PostgreSQL longer than commonly assumed. The decision to shard should depend on workload patterns rather than user counts.
This approach is particularly relevant for AI applications, which often have heavily read-oriented workloads with unpredictable traffic spikes. These characteristics align with the pattern where single-primary PostgreSQL scales effectively.
The lesson is straightforward: identify actual bottlenecks, optimize proven infrastructure where possible, and migrate selectively when necessary. Wholesale re-architecture isn’t always the answer to scaling challenges.
For the modern CFO, the hardest part of the job often isn’t the math—it’s the storytelling. After the books are closed and the variances calculated, finance teams spend days, sometimes weeks, manually copy-pasting charts into PowerPoint slides to explain why the numbers moved.
Today, 11-year-old Israeli fintech company Datarails announced a set of new generative AI tools designed to automate that “last mile” of financial reporting, effectively allowing finance leaders to “vibe code” their way to a board deck.
Launching today to accompany the firm’s newly announced $70 million Series C funding round, the company’s new Strategy, Planning, and Reporting AI Finance Agents promise to answer complex financial questions with fully formatted assets, not just text.
A finance professional can now ask, “What’s driving our profitability changes this year?” or “Why did Marketing go over budget last month?” and the system will instantly generate board-ready PowerPoint slides, PDF reports, or Excel files containing the answer.
The deployment of these agents marks a fundamental shift in how the “Office of the CFO” interacts with data.
The promise of the new agents is to solve the fragmentation problem that plagues finance departments. Unlike a sales leader who lives in Salesforce, or a CIO who relies on ServiceNow, the CFO has no single “system of truth”. Data is scattered across ERPs, HRIS, CRMs, and bank portals.
A major barrier to AI adoption in finance has been security. CFOs are rightfully hesitant to plug P&L data into public models.
Datarails has addressed this by leveraging Microsoft’s Azure OpenAI Service. “We use the OpenAI in Azure to ensure the privacy and the security for our customers, they don’t like to share the data in [an] open LLM,” noted Datarails CEO and co-founder Didi Gurfinkel. This allows the platform to utilize state-of-the-art models while keeping data within a secure enterprise perimeter.
Datarails’ new agents sit on top of a unified data layer that connects these disparate systems. Because the AI is grounded in the company’s own unified internal data, it avoids the hallucinations common in generic LLMs while offering a level of privacy required for sensitive financial data.
“If the CFO wants to leverage AI on the CFO level or the organization data, they need to consolidate the data,” Gurfinkel explained in an interview with VentureBeat.
By solving that consolidation problem first, Datarails can now offer agents that understand the context of the business.
“Now the CFO can use our agents to run analysis, get insights, create reports… because now the data is ready,” Gurfinkel said.
The launch taps into a broader trend in software development where natural language prompts replace complex coding or manual configuration—a concept tech circles refer to as “vibe coding.” Gurfinkel believes this is the future of financial engineering.
“Very soon, the CFO and the financial team themselves will be able to develop applications,” Gurfinkel predicted. “The LLMs become so strong that in one prompt, they can replace full product runs.”
He described a workflow where a user could simply prompt: “That was my budget and my actual of the past year. Now build me the budget for the next year.”
The new agents are designed to handle exactly these types of complex, multi-variable scenarios. For example, a user could ask, “What happens if revenue grows slower next quarter?” and receive a scenario analysis in return.
Because the output can be delivered as an Excel file, finance teams can verify the formulas and assumptions, maintaining the audit trail that generic AI tools often lack.
For most engineering teams, the arrival of a new enterprise financial platform signals a looming headache: months of data migration, schema redesigns, and the inevitable friction of forcing non-technical users to abandon their preferred workflows. Datarails has engineered its way around this friction by building what might be best described as an “anti-implementation.”
Instead of demanding a “rip and replace” of legacy systems, the platform accepts the messy reality of the modern finance stack. The architecture is designed to decouple the data storage from the presentation layer, effectively treating the organization’s existing Excel files as a frontend interface while Datarails acts as the backend database.
“We are not replacing anything,” Gurfinkel explained. “The implementation can be very fast, from a few hours to maybe a few days”.
From a technical perspective, this means the “engineering” requirement is almost entirely stripped away. There are no ETL pipelines to build or Python scripts to maintain. The system comes pre-wired with over 200 native connectors—linking directly to ERPs like NetSuite and Sage, CRMs like Salesforce, and various HRIS and bank portals.
The heavy lifting is replaced by a “no-code” mapping process. A finance analyst, not a developer, maps the fields from their General Ledger to their Excel models in a self-service workflow. For modules like Month-End Close, the company explicitly promises that “no IT support is needed,” a phrase that likely comes as a relief to stretched CTOs. Even complex setups, such as the new Cash Management module which requires banking integrations, are typically fully operational within two to three weeks.
The result is a system where the “technical debt” usually associated with financial transformation is rendered obsolete. The finance team gets their “single source of truth” without ever asking engineering to provision a database.
Datarails wasn’t always the “FinanceOS” for the AI era. Founded in 2015 by Gurfinkel alongside co-founders Eyal Cohen (COO) and Oded Har-Tal (CTO), the Tel Aviv-based startup spent its early years tackling a drier problem: version control for Excel. The initial premise was to synchronize and manage spreadsheets across enterprises, but adoption was sluggish as the team struggled to find the right product-market fit.
The breakthrough came in 2020 with a strategic pivot. The team realized that finance professionals didn’t want to replace Excel with a new dashboard; they wanted to fix Excel’s limitations—specifically manual consolidation and data fragmentation. By shifting focus to SMB finance teams and embracing an “Excel-native” automation philosophy, the company found its stride.
This alignment led to rapid scaling, fueled by a $55 million Series A in June 2021 led by Zeev Ventures, followed quickly by a $50 million Series B in March 2022 led by Qumra Capital. While the company faced headwinds during the tech downturn—resulting in an 18% workforce reduction in late 2022—it has since rebounded aggressively. By 2025, Datarails had nearly doubled its workforce to over 400 employees globally, driven by a multi-product expansion strategy that now includes Month-End Close and Cash Management solutions.
The new AI capabilities are supported by the $70 million Series C injection from One Peak, along with existing investors Vertex Growth, Vintage Investment Partners, and others. The funding arrives after a year of 70% revenue growth for Datarails, driven largely by the expansion of its product suite.
More than 50% of the company’s growth in 2025 came from solutions launched in the last 12 months, including Datarails Month-End Close (a tool for automating reconciliations and workflow management) and Datarails Cash Management (for real-time liquidity monitoring).
These products serve as the “plumbing” that makes the new AI agents effective. By automating the month-end close and unifying cash data, Datarails ensures that when a CFO asks the AI a question, the underlying numbers are accurate and up-to-date.
For Gurfinkel, the goal is to make the finance office “AI-native” without forcing users to abandon their favorite tool: Excel.
“We are not replacing anything,” Gurfinkel said. “We connect the Excel so Excel now becomes the calculation and the presentation.”
With the launch of these new agents, Datarails is betting that the future of finance isn’t about learning new software, but about having a conversation with the data you already have.
Elon Musk’s social network X (formerly known as Twitter) last night released some of the code and architecture of its overhauled social recommendation algorithm under a permissive, enterprise-friendly open source license (Apache 2.0) on Github, allowing for commercial usage and modification.
This is the algorithm that decides which X posts and accounts to show to which users on the social network.
The new X algorithm, as opposed to the manual heuristic rules and legacy models of the past, is based on a “Transformer” architecture powered by Grok, the AI language model from X’s parent company, xAI.
This is a significant release for enterprises that have brand accounts on X, or whose leaders and employees use X to post company promotional messages, links and other content — as it now provides a look at how X evaluates posts and accounts on the platform, and what criteria go into its decision to show a post or specific account to users.
Therefore, it’s imperative for any businesses using X to post promotional and informational content to understand how the X algorithm works as best as they can, in order to maximize their usage of the platform.
To analogize: imagine trying to navigate a hike through a massive forest without a map. You’d likely end up lost and waste time and energy (resources) trying to get to your destination.
But with a map, you could plot your route, look for the appropriate landmarks, check your progress along the way, and revise your path as necessary to stay on track. X open sourcing its new transformer-based recommendation algorithm is in many ways just this — providing a “map” to all those who use the platform on how to achieve the best performance they (and their brands) can.
Here is the technical breakdown of the new architecture and five data-backed strategies to leverage it for commercial growth.
In March 2023, shortly after it was acquired by Musk, X open sourced an earlier version of its recommendation algorithm.
However, that release revealed a tangled web of “spaghetti code” and manual heuristics, and was criticized by outlets like Wired (where my wife works, full disclosure) and organizations including the Center for Democracy and Technology as too heavily redacted to be useful. It was seen as a static snapshot of a decaying system.
The code released on January 19, 2026, confirms that the spaghetti is gone. X has replaced the manual filtering layers with a unified, AI-driven Transformer architecture.
The system uses a RecsysBatch input model that ingests user history and action probabilities to output a raw score. It is cleaner, faster, and infinitely more ruthless.
But there is a catch: The specific “weighting constants”—the magic numbers that tell us exactly how much a Like or Reply is worth—have been redacted from this release.
Here are the five strategic imperatives for brands operating in this new, Grok-mediated environment.
In the 2023 legacy code, content drifted through complex clusters, often finding life hours after posting. The new Grok architecture is designed for immediate signal processing.
Community analysis of the new Rust-based scoring functions reveals a strict “Velocity” mechanic.
The lifecycle of a corporate post is determined in the first half-hour. If engagement signals (clicks, dwells, replies) fail to exceed a dynamic threshold in the first 15 minutes, the post is mathematically unlikely to breach the general “For You” pool.
The architecture includes a specific scorer that penalizes multiple posts from the same user in a short window. Posting 10 times a day yields diminishing returns; the algorithm actively downranks your 3rd, 4th, and 5th posts to force variety into the feed. Space your announcements out.
Thus, the takeaway for business data leads is to coordinate your internal comms and advocacy programs with military precision.
“Employee advocacy” can no longer be asynchronous. If your employees or partners engage with a company announcement two hours later, the mathematical window has likely closed. You must front-load engagement in the first 10 minutes to artificially spike the velocity signal.
In 2023, data suggested that an author replying to comments was a “cheat code” for visibility. In 2026, this strategy has become a trap.
While early analysis circulated rumors of a “75x” boost for replies, developers examining the new repository have confirmed that the actual weighting constants are hidden.
More importantly, X’s Head of Product, Nikita Bier, has explicitly stated that “Replies don’t count anymore” for revenue sharing, in a move designed to kill “reply rings” and spam farms.
Bier clarified that replies only generate value if they are high-quality enough to generate “Home Timeline impressions” on their own merit.
As this is the case, businesses should stop optimizing for “reply volume” and start optimizing for “reply quality.” The algorithm is actively hostile toward low-effort engagement rings.
Businesses and individuals should not reply incessantly to every comment with emojis or generic thanks. They should only reply if the response adds enough value to stand alone as a piece of content in a user’s feed.
With replies devalued, focus on the other positive signals visible in the code: dwell_time (how long a user freezes on your post) and share_via_dm. Long-form threads or visual data that force a user to stop scrolling are now mathematically safer bets than controversial questions.
The 2023 algorithm used X paid subscription status as one of many variables. The 2026 architecture simplifies this into a brutal base-score reality.
Code analysis reveals that before a post is evaluated for quality, the account is assigned a base score. X accounts that are “verified” by paying the monthly “Premium” subscription ($3 per month for individual account Premium Basic, $200/month for businesses) receive a significantly higher ceiling (up to +100) compared to unverified accounts, which are capped (max +55).
Therefore, if your brand, executives, or key spokespeople are not verified (X Premium or Verified Organizations), you are competing with a handicap.
For a business looking to acquire customers or leads via X, verification is a mandatory infrastructure cost to remove a programmatic throttle on your reach.
The Grok model has replaced complex “toxicity” rules with a simplified feedback loop.
While the exact weight of someone filing a “Report” on your X post or account over objectionable or false material is hidden in the new config files, it remains the ultimate negative signal.
But it isn’t the only one. The model also outputs probabilities for P(not_interested) and P(mute_author). Irrelevant clickbait doesn’t just get ignored; it actively trains the model to predict that users will mute you, permanently suppressing your future reach.
In a system driven by AI probabilities, a “Report” or “Block” signal trains the model to permanently dissociate your brand from that user’s entire cluster.
In practice, this means “rage bait” or controversial takes are now incredibly dangerous for brands. It takes only a tiny fraction of users utilizing the “Report” function to tank a post’s visibility entirely. Your content strategy must prioritize engagement that excites users enough to reply, but never enough to report.
The most significant takeaway from today’s release is what is missing. The repository provides the architecture (the “car”), but it hides the weights (the “fuel”).
As X user @Tenobrus noted, the repo is “barebones” regarding constants. This means you cannot rely solely on the code to dictate strategy. You must triangulate the code with executive communications.
When Bier announces a change to “revenue share” logic, you must assume it mirrors a change in the “ranking” logic.
Therefore, data decision makers should assign a technical lead to monitor both the xai-org/x-algorithm repository and the public statements of the Engineering team. The code tells you *how* the system thinks; the executives tell you *what* it is currently rewarding.
The Grok-based transformer architecture is cleaner, faster, and more logical than its predecessor. It does not care about your legacy or your follower count. It cares about Velocity and Quality.
1. Verify to secure the base score.
2. Front-load engagement to survive the 30-minute velocity check.
3. Avoid “spammy” replies; focus on standalone value.
4. Monitor executive comms to fill in the gaps left by the code.
In the era of Grok, the algorithm is smarter. Your data and business strategy using X ought to be, too.
When an enterprise LLM retrieves a product name, technical specification, or standard contract clause, it’s using expensive GPU computation designed for complex reasoning — just to access static information. This happens millions of times per day. Each lookup wastes cycles and inflates infrastructure costs.
DeepSeek’s newly released research on “conditional memory” addresses this architectural limitation directly. The work introduces Engram, a module that separates static pattern retrieval from dynamic reasoning. It delivers results that challenge assumptions about what memory is actually for in neural networks. The paper was co-authored by DeepSeek founder Liang Wenfeng.
Through systematic experiments, DeepSeek found the optimal balance between computation and memory: 75% of sparse model capacity allocated to dynamic reasoning and 25% to static lookups. Surprisingly, the memory system improved reasoning more than knowledge retrieval.
Complex reasoning benchmarks jumped from 70% to 74% accuracy, while knowledge-focused tests improved from 57% to 61%. These improvements came from tests including Big-Bench Hard, ARC-Challenge, and MMLU.
The research arrives as enterprises face mounting pressure to deploy more capable AI systems while navigating GPU memory constraints and infrastructure costs. DeepSeek’s approach offers a potential path forward by fundamentally rethinking how models should be structured.
Agentic memory systems, sometimes referred to as contextual memory — like Hindsight, MemOS, or Memp — focus on episodic memory. They store records of past conversations, user preferences, and interaction history. These systems help agents maintain context across sessions and learn from experience. But they’re external to the model’s forward pass and don’t optimize how the model internally processes static linguistic patterns.
For Chris Latimer, founder and CEO of Vectorize, which developed Hindsight, the conditional memory approach used in Engram solves a different problem than agentic AI memory.
“It’s not solving the problem of connecting agents to external memory like conversation histories and knowledge stores,” Latimer told VentureBeat. “It’s more geared towards squeezing performance out of smaller models and getting more mileage out of scarce GPU resources.”
Conditional memory tackles a fundamental issue: Transformers lack a native knowledge lookup primitive. When processing text, they must simulate retrieval of static patterns through expensive neural computation across multiple layers. These patterns include named entities, technical terminology, and common phrases.
The DeepSeek paper illustrates this with a concrete example. Recognizing “Diana, Princess of Wales” requires consuming multiple layers of attention and feed-forward networks to progressively compose features. The model essentially uses deep, dynamic logic circuits to perform what should be a simple hash table lookup. It’s like recomputing your phone number on a calculator every time rather than just looking it up.
“The problem is that Transformer lacks a ‘native knowledge lookup’ ability,” the researchers write. “Many tasks that should be solved in O(1) time like retrieval have to be ‘simulated for retrieval’ through a large amount of computation, which is very inefficient.”
Engram introduces “conditional memory” to work alongside the conditional computation of mixture-of-experts (MoE) architectures.
The mechanism is straightforward. The module takes sequences of two to three tokens and uses hash functions to look them up in a massive embedding table. Retrieval happens in constant time, regardless of table size.
But retrieved patterns need filtering. A hash lookup for “Apple” might collide with unrelated content, or the word might mean the fruit rather than the company. Engram solves this with a gating mechanism. The model’s current understanding of context (accumulated through earlier attention layers) acts as a filter. If retrieved memory contradicts the current context, the gate suppresses it. If it fits, the gate lets it through.
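A heavily simplified sketch of that lookup-and-gate idea appears below. It illustrates the concept only; the hashing scheme, table size and dimensions are arbitrary placeholders, not DeepSeek's Engram implementation.

```python
# Heavily simplified sketch of the hash-lookup-plus-gate idea. This is an
# illustration of the concept only; the hashing scheme, table size and
# dimensions are arbitrary placeholders, not DeepSeek's Engram implementation.
import torch
import torch.nn as nn

class ToyConditionalMemory(nn.Module):
    def __init__(self, hidden_dim=512, table_size=100_000):
        super().__init__()
        self.table = nn.Embedding(table_size, hidden_dim)  # static pattern store
        self.gate = nn.Linear(hidden_dim, hidden_dim)      # context-conditioned filter
        self.table_size = table_size

    def forward(self, token_ids, hidden):
        # token_ids: (batch, seq) integer ids; hidden: (batch, seq, hidden_dim)
        # Hash each bigram of token ids into the table: an O(1) lookup no matter
        # how large the table grows.
        prev = torch.roll(token_ids, shifts=1, dims=1)
        bigram_hash = (token_ids * 1_000_003 + prev) % self.table_size
        retrieved = self.table(bigram_hash)

        # Gate the retrieved memory with the model's current contextual state:
        # collisions or context mismatches get suppressed, good matches pass.
        g = torch.sigmoid(self.gate(hidden))
        return hidden + g * retrieved

memory = ToyConditionalMemory()
out = memory(torch.randint(0, 50_000, (2, 16)), torch.randn(2, 16, 512))
```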
The module isn’t applied at every layer. Strategic placement balances performance gains against system latency.
This dual-system design raises a critical question: How much capacity should each get? DeepSeek’s key finding is that the optimal split is 75-80% for computation and 20-25% for memory. Testing showed that pure MoE (100% computation) is suboptimal: Too much computation wastes depth reconstructing static patterns, while too much memory loses reasoning capacity.
Perhaps Engram’s most pragmatic contribution is its infrastructure-aware design. Unlike MoE’s dynamic routing, which depends on runtime hidden states, Engram’s retrieval indices depend solely on input token sequences. This deterministic nature enables a prefetch-and-overlap strategy.
“The challenge is that GPU memory is limited and expensive, so using bigger models gets costly and harder to deploy,” Latimer said. “The clever idea behind Engram is to keep the main model on the GPU, but offload a big chunk of the model’s stored information into a separate memory on regular RAM, which the model can use on a just-in-time basis.”
During inference, the system can asynchronously retrieve embeddings from host CPU memory via PCIe. This happens while GPU computes preceding transformer blocks. Strategic layer placement leverages computation of early layers as a buffer to mask communication latency.
The researchers demonstrated this with a 100B-parameter embedding table entirely offloaded to host DRAM. They achieved throughput penalties below 3%. This decoupling of storage from compute addresses a critical enterprise constraint as GPU high-bandwidth memory remains expensive and scarce.
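The overlap trick can be approximated with standard PyTorch primitives: keep the big table in host memory, start the copy of just the rows a batch needs on a separate CUDA stream, and synchronize right before the layer that consumes them. The sketch below is illustrative, not DeepSeek's code.

```python
# Sketch of prefetch-and-overlap with standard PyTorch primitives (requires a
# CUDA device). Shapes, the toy "layers" and the table size are placeholders;
# this illustrates the idea, not DeepSeek's implementation.
import torch

device = torch.device("cuda")
copy_stream = torch.cuda.Stream()

# The large embedding table lives in host RAM.
cpu_table = torch.randn(200_000, 512)

def prefetch(indices: torch.Tensor) -> torch.Tensor:
    """Start an asynchronous host-to-GPU copy of only the rows this batch needs."""
    rows = cpu_table[indices].pin_memory()         # pinned staging buffer
    with torch.cuda.stream(copy_stream):
        return rows.to(device, non_blocking=True)  # overlaps with GPU compute

# The lookup indices depend only on the input tokens, so the copy starts early...
indices = torch.randint(0, 200_000, (8, 128))
memory_gpu = prefetch(indices)

# ...while the default stream runs the earlier transformer blocks.
hidden = torch.randn(8, 128, 512, device=device)
for _ in range(4):
    hidden = torch.relu(hidden)                    # stand-in for real blocks

# Wait for the copy only at the layer that consumes the retrieved memory.
torch.cuda.current_stream().wait_stream(copy_stream)
hidden = hidden + memory_gpu
```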
For enterprises evaluating AI infrastructure strategies, DeepSeek’s findings suggest several actionable insights:
1. Hybrid architectures outperform pure approaches. The 75/25 allocation law indicates that optimal models should split sparse capacity between computation and memory.
2. Infrastructure costs may shift from GPU to memory. If Engram-style architectures prove viable in production, infrastructure investment patterns could change. The ability to store 100B+ parameters in CPU memory with minimal overhead suggests that memory-rich, compute-moderate configurations may offer better performance-per-dollar than pure GPU scaling.
3. Reasoning improvements exceed knowledge gains. The surprising finding that reasoning benefits more than knowledge retrieval suggests that memory’s value extends beyond obvious use cases.
For enterprises leading AI adoption, Engram demonstrates that the next frontier may not be simply bigger models. It’s smarter architectural choices that respect the fundamental distinction between static knowledge and dynamic reasoning. The research suggests that optimal AI systems will increasingly resemble hybrid architectures.
Organizations waiting to adopt AI later in the cycle should monitor whether major model providers incorporate conditional memory principles into their architectures. If the 75/25 allocation law holds across scales and domains, the next generation of foundation models may deliver substantially better reasoning performance at lower infrastructure costs.
A core element of any data retrieval operation is the retriever, the component whose job is to fetch the relevant content for a given query.
In the AI era, retrievers have been used as part of RAG pipelines. The approach is straightforward: retrieve relevant documents, feed them to an LLM, and let the model generate an answer based on that context.
While retrieval might have seemed like a solved problem, it wasn't for modern agentic AI workflows.
In research published this week, Databricks introduced Instructed Retriever, a new architecture that the company claims delivers up to 70% improvement over traditional RAG on complex, instruction-heavy enterprise question-answering tasks. The difference comes down to how the system understands and uses metadata.
“A lot of the systems that were built for retrieval before the age of large language models were really built for humans to use, not for agents to use,” Michael Bendersky, a research director at Databricks, told VentureBeat. “What we found is that in a lot of cases, the errors that are coming from the agent are not because the agent is not able to reason about the data. It’s because the agent is not able to retrieve the right data in the first place.”
The core problem stems from how traditional RAG handles what Bendersky calls “system-level specifications.” These include the full context of user instructions, metadata schemas, and examples that define what a successful retrieval should look like.
In a typical RAG pipeline, a user query gets converted into an embedding, similar documents are retrieved from a vector database, and those results feed into a language model for generation. The system might incorporate basic filtering, but it fundamentally treats each query as an isolated text-matching exercise.
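As a rough illustration of that isolation, a conventional RAG loop might be sketched as follows; `embed`, `vector_store` and `llm` are hypothetical stand-ins rather than any specific product API.

```python
# Minimal sketch of a conventional RAG loop. All helpers are hypothetical.

def traditional_rag(query: str, k: int = 5) -> str:
    query_vector = embed(query)                                 # the query becomes a single embedding ...
    docs = vector_store.similarity_search(query_vector, k=k)    # ... matched purely on text similarity
    context = "\n\n".join(d.text for d in docs)
    # Metadata (timestamps, ratings, authors, document types) never shapes retrieval;
    # at best it is appended to the prompt and left for the LLM to sort out afterward.
    return llm.generate(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```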
This approach breaks down with real enterprise data. Enterprise documents often include rich metadata like timestamps, author information, product ratings, document types, and domain-specific attributes. When a user asks a question that requires reasoning over these metadata fields, traditional RAG struggles.
Consider this example: “Show me five-star product reviews from the past six months, but exclude anything from Brand X.” Traditional RAG cannot reliably translate that natural language constraint into the appropriate database filters and structured queries.
“If you just use a traditional RAG system, there’s no way to make use of all these different signals about the data that are encapsulated in metadata,” Bendersky said. “They need to be passed on to the agent itself to do the right job in retrieval.”
The issue becomes more acute as enterprises move beyond simple document search to agentic workflows. A human using a search system can reformulate queries and apply filters manually when initial results miss the mark. An AI agent operating autonomously needs the retrieval system itself to understand and execute complex, multi-faceted instructions.
Databricks’ approach fundamentally redesigns the retrieval pipeline. The system propagates complete system specifications through every stage of both retrieval and generation. These specifications include user instructions, labeled examples and index schemas.
The architecture adds three key capabilities, illustrated in the sketch after this list:
Query decomposition: The system breaks complex, multi-part requests into a search plan containing multiple keyword searches and filter instructions. A request for “recent FooBrand products excluding lite models” gets decomposed into structured queries with appropriate metadata filters. Traditional systems would attempt a single semantic search.
Metadata reasoning: Natural language instructions get translated into database filters. “From last year” becomes a date filter, “five-star reviews” becomes a rating filter. The system understands both what metadata is available and how to match it to user intent.
Contextual relevance: The reranking stage uses the full context of user instructions to boost documents that match intent, even when keywords are a weaker match. The system can prioritize recency or specific document types based on specifications rather than just text similarity.
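As a rough illustration, a decomposed search plan for a request like the FooBrand example might look like the following sketch; the field names, filter syntax and the `index.search` and `rerank` helpers are hypothetical, not Databricks' internal format.

```python
from datetime import date, timedelta

# Hypothetical shape of a decomposed search plan for a request like
# "five-star FooBrand reviews from the past six months, excluding lite models".

search_plan = {
    "queries": [
        {"keywords": "FooBrand product review", "fields": ["title", "body"]},
        {"keywords": "FooBrand reliability feedback", "fields": ["body"]},
    ],
    "filters": [
        {"field": "rating", "op": "=", "value": 5},                # "five-star"
        {"field": "published_at", "op": ">=",
         "value": str(date.today() - timedelta(days=182))},        # "past six months"
        {"field": "model_line", "op": "!=", "value": "lite"},      # "excluding lite models"
    ],
    "rerank_instructions": "Prefer recent reviews and official product pages over forum posts.",
}

def run_plan(index, plan):
    """Run each sub-query with the shared metadata filters, then rerank on the full instructions."""
    candidates = []
    for q in plan["queries"]:
        candidates += index.search(q["keywords"], filters=plan["filters"], fields=q["fields"])
    return rerank(candidates, instructions=plan["rerank_instructions"])
```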
“The magic is in how we construct the queries,” Bendersky said. “We kind of try to use the tool as an agent would, not as a human would. It has all the intricacies of the API and uses them to the best possible ability.”
Over the latter half of 2025, the industry shifted away from RAG toward agentic AI memory, sometimes referred to as contextual memory. Approaches including Hindsight and A-MEM emerged, offering the promise of a RAG-free future.
Bendersky argues that contextual memory and sophisticated retrieval serve different purposes. Both are necessary for enterprise AI systems.
“There’s no way you can put everything in your enterprise into your contextual memory,” Bendersky noted. “You kind of need both. You need contextual memory to provide specifications, to provide schemas, but still you need access to the data, which may be distributed across multiple tables and documents.”
Contextual memory excels at maintaining task specifications, user preferences, and metadata schemas within a session. It keeps the “rules of the game” readily available. But the actual enterprise data corpus exists outside this context window. Most enterprises have data volumes that exceed even generous context windows by orders of magnitude.
Instructed Retriever leverages contextual memory for system-level specifications while using retrieval to access the broader data estate. The specifications in context inform how the retriever constructs queries and interprets results. The retrieval system then pulls specific documents from potentially billions of candidates.
This division of labor matters for practical deployment. Loading millions of documents into context is neither feasible nor efficient. The metadata alone can be substantial when dealing with heterogeneous systems across an enterprise. Instructed Retriever solves this by making metadata immediately usable without requiring it all to fit in context.
Instructed Retriever is available now as part of Databricks Agent Bricks; it’s built into the Knowledge Assistant product. Enterprises using Knowledge Assistant to build question-answering systems over their documents automatically leverage the Instructed Retriever architecture without building custom RAG pipelines.
The system is not available as open source, though Bendersky indicated Databricks is considering broader availability. For now, the company’s strategy is to release benchmarks like StaRK-Instruct to the research community while keeping the implementation proprietary to its enterprise products.
The technology shows particular promise for enterprises with complex, highly structured data that includes rich metadata. Bendersky cited use cases across finance, e-commerce, and healthcare. Essentially any domain where documents have meaningful attributes beyond raw text can benefit.
“What we’ve seen in some cases kind of unlocks things that the customer cannot do without it,” Bendersky said.
He explained that without Instructed Retriever, users must take on more data management work, restructuring content into the right tables and formats, before an LLM can reliably retrieve the correct information.
“Here you can just create an index with the right metadata, point your retriever to that, and it will just work out of the box,” he said.
For enterprises building RAG-based systems today, the research surfaces a critical question: Is your retrieval pipeline actually capable of the instruction-following and metadata reasoning your use case requires?
The 70% improvement Databricks demonstrates isn’t achievable through incremental optimization. It represents an architectural difference in how system specifications flow through the retrieval and generation process. Organizations that have invested in carefully structuring their data with detailed metadata may find that traditional RAG is leaving much of that structure’s value on the table.
For enterprises looking to implement AI systems that can reliably follow complex, multi-part instructions over heterogeneous data sources, the research indicates that retrieval architecture may be the critical differentiator.
Those still relying on basic RAG for production use cases involving rich metadata should evaluate whether their current approach can fundamentally meet their requirements. The performance gap Databricks demonstrates suggests that a more sophisticated retrieval architecture is now table stakes for enterprises with complex data estates.