When an enterprise LLM retrieves a product name, technical specification, or standard contract clause, it’s using expensive GPU computation designed for complex reasoning — just to access static information. This happens millions of times per day. Each lookup wastes cycles and inflates infrastructure costs.
DeepSeek’s newly released research on “conditional memory” addresses this architectural limitation directly. The work introduces Engram, a module that separates static pattern retrieval from dynamic reasoning. It delivers results that challenge assumptions about what memory is actually for in neural networks. The paper was co-authored by DeepSeek founder Liang Wenfeng.
Through systematic experiments, DeepSeek found the optimal balance between computation and memory: 75% of sparse model capacity allocated to dynamic reasoning and 25% to static lookups. Counterintuitively, the memory system improved reasoning more than it improved knowledge retrieval.
Complex reasoning benchmarks jumped from 70% to 74% accuracy, while knowledge-focused tests improved from 57% to 61%. The gains were measured on evaluations including Big-Bench Hard, ARC-Challenge, and MMLU.
The research arrives as enterprises face mounting pressure to deploy more capable AI systems while navigating GPU memory constraints and infrastructure costs. DeepSeek’s approach offers a potential path forward by fundamentally rethinking how models should be structured.
Agentic memory systems, sometimes referred to as contextual memory — like Hindsight, MemOS, or Memp — focus on episodic memory. They store records of past conversations, user preferences, and interaction history. These systems help agents maintain context across sessions and learn from experience. But they’re external to the model’s forward pass and don’t optimize how the model internally processes static linguistic patterns.
For Chris Latimer, founder and CEO of Vectorize, which developed Hindsight, the conditional memory approach used in Engram solves a different problem than agentic AI memory.
“It’s not solving the problem of connecting agents to external memory like conversation histories and knowledge stores,” Latimer told VentureBeat. “It’s more geared towards squeezing performance out of smaller models and getting more mileage out of scarce GPU resources.”
Conditional memory tackles a fundamental issue: Transformers lack a native knowledge lookup primitive. When processing text, they must simulate retrieval of static patterns through expensive neural computation across multiple layers. These patterns include named entities, technical terminology, and common phrases.
The DeepSeek paper illustrates this with a concrete example. Recognizing “Diana, Princess of Wales” requires consuming multiple layers of attention and feed-forward networks to progressively compose features. The model essentially uses deep, dynamic logic circuits to perform what should be a simple hash table lookup. It’s like using a calculator to remember your phone number rather than just looking it up.
“The problem is that Transformer lacks a ‘native knowledge lookup’ ability,” the researchers write. “Many tasks that should be solved in O(1) time like retrieval have to be ‘simulated for retrieval’ through a large amount of computation, which is very inefficient.”
Engram introduces “conditional memory” to work alongside the conditional computation of mixture-of-experts (MoE) models.
The mechanism is straightforward. The module takes sequences of two to three tokens and uses hash functions to look them up in a massive embedding table. Retrieval happens in constant time, regardless of table size.
But retrieved patterns need filtering. A hash lookup for “Apple” might collide with unrelated content, or the word might mean the fruit rather than the company. Engram solves this with a gating mechanism. The model’s current understanding of context (accumulated through earlier attention layers) acts as a filter. If retrieved memory contradicts the current context, the gate suppresses it. If it fits, the gate lets it through.
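To make the mechanism concrete, here is a minimal PyTorch-style sketch of the idea as the paper describes it: hash short token n-grams into a fixed embedding table, then let the model's contextual state gate what gets admitted. This is not DeepSeek's code; the class name, hashing scheme, and dimensions are all illustrative.

```python
import torch
import torch.nn as nn

class ConditionalMemorySketch(nn.Module):
    """Illustrative hashed n-gram memory with context gating (not DeepSeek's implementation)."""

    def __init__(self, num_slots: int = 1_000_000, d_model: int = 1024):
        super().__init__()
        self.num_slots = num_slots
        self.table = nn.Embedding(num_slots, d_model)   # static "memory" lookup table
        self.gate = nn.Linear(2 * d_model, d_model)     # context-conditioned gate

    def hash_bigrams(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq). Mix each token with its predecessor as a stand-in
        # for hashing the last 2-3 tokens; one lookup index per position, in O(1).
        prev = torch.roll(token_ids, shifts=1, dims=1)
        return (token_ids * 1_000_003 + prev * 97) % self.num_slots

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model), the model's current contextual understanding.
        retrieved = self.table(self.hash_bigrams(token_ids))  # constant-time retrieval
        gate = torch.sigmoid(self.gate(torch.cat([hidden, retrieved], dim=-1)))
        return hidden + gate * retrieved  # context suppresses or admits the memory

# Toy usage: 2 sequences of 8 tokens with 1024-dim hidden states.
mem = ConditionalMemorySketch()
tokens = torch.randint(0, 50_000, (2, 8))
out = mem(tokens, torch.randn(2, 8, 1024))  # same shape as the hidden states
```

The gate is what keeps a hash collision, or a context-inappropriate sense of a word, from polluting the hidden state.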
The module isn’t applied at every layer. Strategic placement balances performance gains against system latency.
This dual-system design raises a critical question: How much capacity should each get? DeepSeek’s key finding: the optimal split is 75-80% for computation and 20-25% for memory. In testing, pure MoE (100% computation) proved suboptimal. Too much computation wastes depth reconstructing static patterns; too much memory loses reasoning capacity.
Perhaps Engram’s most pragmatic contribution is its infrastructure-aware design. Unlike MoE’s dynamic routing, which depends on runtime hidden states, Engram’s retrieval indices depend solely on input token sequences. This deterministic nature enables a prefetch-and-overlap strategy.
“The challenge is that GPU memory is limited and expensive, so using bigger models gets costly and harder to deploy,” Latimer said. “The clever idea behind Engram is to keep the main model on the GPU, but offload a big chunk of the model’s stored information into a separate memory on regular RAM, which the model can use on a just-in-time basis.”
During inference, the system can asynchronously retrieve embeddings from host CPU memory via PCIe while the GPU computes the preceding transformer blocks. Strategic layer placement uses the computation of early layers as a buffer to mask communication latency.
The researchers demonstrated this with a 100B-parameter embedding table entirely offloaded to host DRAM. They achieved throughput penalties below 3%. This decoupling of storage from compute addresses a critical enterprise constraint as GPU high-bandwidth memory remains expensive and scarce.
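A deliberately simplified sketch of the prefetch-and-overlap idea, assuming a background thread stands in for the asynchronous PCIe transfer: because the lookup indices depend only on the input tokens, the fetch from host RAM can start before the early layers finish. The table size and helper functions here are placeholders, not the paper's system.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Placeholder host-RAM table (the real one holds ~100B parameters in DRAM, not GPU memory).
HOST_TABLE = np.random.rand(100_000, 256).astype(np.float32)

def lookup_indices(token_ids: list[int]) -> list[int]:
    # Deterministic: depends only on input tokens, so it can run before the
    # forward pass ever reaches the memory layer.
    return [(t * 1_000_003) % HOST_TABLE.shape[0] for t in token_ids]

def fetch_embeddings(indices: list[int]) -> np.ndarray:
    # Stand-in for the host-to-GPU copy over PCIe.
    return HOST_TABLE[indices]

def early_layers(token_ids: list[int]) -> np.ndarray:
    # Stand-in for the transformer blocks placed *before* the memory layer.
    return np.random.rand(len(token_ids), 256).astype(np.float32)

token_ids = [101, 2043, 318, 262, 3139, 286]

with ThreadPoolExecutor(max_workers=1) as pool:
    # Start the memory fetch immediately...
    future = pool.submit(fetch_embeddings, lookup_indices(token_ids))
    # ...and overlap it with the early-layer computation.
    hidden = early_layers(token_ids)
    retrieved = future.result()  # ideally already finished by the time it's needed

output = hidden + retrieved  # gating omitted; see the earlier sketch
```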
For enterprises evaluating AI infrastructure strategies, DeepSeek’s findings suggest several actionable insights:
1. Hybrid architectures outperform pure approaches. The 75/25 allocation law indicates that optimal models should split sparse capacity between computation and memory.
2. Infrastructure costs may shift from GPU to memory. If Engram-style architectures prove viable in production, infrastructure investment patterns could change. The ability to store 100B+ parameters in CPU memory with minimal overhead suggests that memory-rich, compute-moderate configurations may offer better performance-per-dollar than pure GPU scaling.
3. Reasoning improvements exceed knowledge gains. The surprising finding that reasoning benefits more than knowledge retrieval suggests that memory’s value extends beyond obvious use cases.
For enterprises leading AI adoption, Engram demonstrates that the next frontier may not be simply bigger models. It’s smarter architectural choices that respect the fundamental distinction between static knowledge and dynamic reasoning. The research suggests that optimal AI systems will increasingly resemble hybrid architectures.
Organizations waiting to adopt AI later in the cycle should monitor whether major model providers incorporate conditional memory principles into their architectures. If the 75/25 allocation law holds across scales and domains, the next generation of foundation models may deliver substantially better reasoning performance at lower infrastructure costs.
A core element of any data retrieval operation is the retriever, the component whose job is to fetch the relevant content for a given query.
In the AI era, retrievers have been used as part of retrieval-augmented generation (RAG) pipelines. The approach is straightforward: retrieve relevant documents, feed them to an LLM, and let the model generate an answer based on that context.
While retrieval might have seemed like a solved problem, it actually wasn’t solved for modern agentic AI workflows.
In research published this week, Databricks introduced Instructed Retriever, a new architecture that the company claims delivers up to 70% improvement over traditional RAG on complex, instruction-heavy enterprise question-answering tasks. The difference comes down to how the system understands and uses metadata.
“A lot of the systems that were built for retrieval before the age of large language models were really built for humans to use, not for agents to use,” Michael Bendersky, a research director at Databricks, told VentureBeat. “What we found is that in a lot of cases, the errors that are coming from the agent are not because the agent is not able to reason about the data. It’s because the agent is not able to retrieve the right data in the first place.”
The core problem stems from how traditional RAG handles what Bendersky calls “system-level specifications.” These include the full context of user instructions, metadata schemas, and examples that define what a successful retrieval should look like.
In a typical RAG pipeline, a user query gets converted into an embedding, similar documents are retrieved from a vector database, and those results feed into a language model for generation. The system might incorporate basic filtering, but it fundamentally treats each query as an isolated text-matching exercise.
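For reference, here is a minimal sketch of that traditional pipeline. The corpus, embedding model choice, and prompt format are illustrative, and the final LLM call is omitted.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # model choice is illustrative

docs = [
    "FooBrand Lite launched in 2023 with a 2-year warranty.",      # hypothetical corpus
    "FooBrand Pro reviews average 4.8 stars over the past year.",
    "Standard contract clause 7 covers data retention for 90 days.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    # 1. Embed the query, 2. rank documents by cosine similarity, 3. return top-k.
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity (vectors are normalized)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

query = "How good are FooBrand Pro reviews?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# The prompt would then be sent to an LLM for generation (call omitted here).
print(prompt)
```

Note that nothing in this loop looks at metadata: every query is reduced to a text-similarity match.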
This approach breaks down with real enterprise data. Enterprise documents often include rich metadata like timestamps, author information, product ratings, document types, and domain-specific attributes. When a user asks a question that requires reasoning over these metadata fields, traditional RAG struggles.
Consider this example: “Show me five-star product reviews from the past six months, but exclude anything from Brand X.” Traditional RAG cannot reliably translate that natural language constraint into the appropriate database filters and structured queries.
“If you just use a traditional RAG system, there’s no way to make use of all these different signals about the data that are encapsulated in metadata,” Bendersky said. “They need to be passed on to the agent itself to do the right job in retrieval.”
The issue becomes more acute as enterprises move beyond simple document search to agentic workflows. A human using a search system can reformulate queries and apply filters manually when initial results miss the mark. An AI agent operating autonomously needs the retrieval system itself to understand and execute complex, multi-faceted instructions.
Databricks’ approach fundamentally redesigns the retrieval pipeline. The system propagates complete system specifications through every stage of both retrieval and generation. These specifications include user instructions, labeled examples, and index schemas.
The architecture adds three key capabilities:
Query decomposition: The system breaks complex, multi-part requests into a search plan containing multiple keyword searches and filter instructions. A request for “recent FooBrand products excluding lite models” gets decomposed into structured queries with appropriate metadata filters. Traditional systems would attempt a single semantic search.
Metadata reasoning: Natural language instructions get translated into database filters. “From last year” becomes a date filter, “five-star reviews” becomes a rating filter. The system understands both what metadata is available and how to match it to user intent.
Contextual relevance: The reranking stage uses the full context of user instructions to boost documents that match intent, even when keywords are a weaker match. The system can prioritize recency or specific document types based on specifications rather than just text similarity.
“The magic is in how we construct the queries,” Bendersky said. “We kind of try to use the tool as an agent would, not as a human would. It has all the intricacies of the API and uses them to the best possible ability.”
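Databricks has not released the implementation, so the following is only a schematic sketch of the behavior described above: decompose a request into keyword queries plus metadata filters, apply the filters, and only then rank semantically. The document schema, plan format, and the "Brand X" stand-in are invented for illustration.

```python
from datetime import date, timedelta

# Hypothetical document store with metadata alongside text.
DOCS = [
    {"text": "Great battery life on this phone.", "brand": "FooBrand",
     "rating": 5, "date": date(2025, 11, 2)},
    {"text": "Lite model feels cheap.", "brand": "FooBrand Lite",
     "rating": 2, "date": date(2025, 9, 15)},
    {"text": "Solid camera, average battery.", "brand": "BarBrand",
     "rating": 5, "date": date(2024, 1, 5)},
]

# A search plan an instructed retriever might produce for:
# "Show me five-star product reviews from the past six months, but exclude anything from Brand X."
plan = {
    "keyword_queries": ["product review"],
    "filters": {
        "rating": 5,
        "date_after": date.today() - timedelta(days=182),
        "exclude_brands": ["BarBrand"],  # stand-in for "Brand X"
    },
}

def apply_filters(doc: dict, filters: dict) -> bool:
    # Translate the plan's metadata constraints into concrete predicates.
    return (
        doc["rating"] == filters["rating"]
        and doc["date"] >= filters["date_after"]
        and doc["brand"] not in filters["exclude_brands"]
    )

candidates = [d for d in DOCS if apply_filters(d, plan["filters"])]
# Semantic search and reranking over `candidates` would happen here, informed by
# the full system specification rather than text similarity alone.
print([d["text"] for d in candidates])
```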
Over the latter half of 2025, there was an industry shift away from RAG toward agentic AI memory, sometimes referred to as contextual memory. Approaches including Hindsight and A-MEM emerged, offering the promise of a RAG-free future.
Bendersky argues that contextual memory and sophisticated retrieval serve different purposes. Both are necessary for enterprise AI systems.
“There’s no way you can put everything in your enterprise into your contextual memory,” Bendersky noted. “You kind of need both. You need contextual memory to provide specifications, to provide schemas, but still you need access to the data, which may be distributed across multiple tables and documents.”
Contextual memory excels at maintaining task specifications, user preferences, and metadata schemas within a session. It keeps the “rules of the game” readily available. But the actual enterprise data corpus exists outside this context window. Most enterprises have data volumes that exceed even generous context windows by orders of magnitude.
Instructed Retriever leverages contextual memory for system-level specifications while using retrieval to access the broader data estate. The specifications in context inform how the retriever constructs queries and interprets results. The retrieval system then pulls specific documents from potentially billions of candidates.
This division of labor matters for practical deployment. Loading millions of documents into context is neither feasible nor efficient. The metadata alone can be substantial when dealing with heterogeneous systems across an enterprise. Instructed Retriever solves this by making metadata immediately usable without requiring it all to fit in context.
Instructed Retriever is available now as part of Databricks Agent Bricks; it’s built into the Knowledge Assistant product. Enterprises using Knowledge Assistant to build question-answering systems over their documents automatically leverage the Instructed Retriever architecture without building custom RAG pipelines.
The system is not available as open source, though Bendersky indicated Databricks is considering broader availability. For now, the company’s strategy is to release benchmarks like StaRK-Instruct to the research community while keeping the implementation proprietary to its enterprise products.
The technology shows particular promise for enterprises with complex, highly structured data that includes rich metadata. Bendersky cited use cases across finance, e-commerce, and healthcare. Essentially any domain where documents have meaningful attributes beyond raw text can benefit.
“What we’ve seen in some cases kind of unlocks things that the customer cannot do without it,” Bendersky said.
He explained that without Instructed Retriever, users have to do more data management work to put content into the right structure and tables before an LLM can reliably retrieve the correct information.
“Here you can just create an index with the right metadata, point your retriever to that, and it will just work out of the box,” he said.
For enterprises building RAG-based systems today, the research surfaces a critical question: Is your retrieval pipeline actually capable of the instruction-following and metadata reasoning your use case requires?
The 70% improvement Databricks demonstrates isn’t achievable through incremental optimization. It represents an architectural difference in how system specifications flow through the retrieval and generation process. Organizations that have invested in carefully structuring their data with detailed metadata may find that traditional RAG is leaving much of that structure’s value on the table.
For enterprises looking to implement AI systems that can reliably follow complex, multi-part instructions over heterogeneous data sources, the research indicates that retrieval architecture may be the critical differentiator.
Those still relying on basic RAG for production use cases involving rich metadata should evaluate whether their current approach can fundamentally meet their requirements. The performance gap Databricks demonstrates suggests that a more sophisticated retrieval architecture is now table stakes for enterprises with complex data estates.
For decades the data landscape was relatively static. Relational databases (hello, Oracle!) dominated as the default, organizing information into familiar rows and columns.
That stability eroded as successive waves introduced NoSQL document stores, graph databases, and most recently vector-based systems. In the era of agentic AI, data infrastructure is once again in flux — and evolving faster than at any point in recent memory.
As 2026 dawns, one lesson has become unavoidable: data matters more than ever.
Perhaps the most consequential trend out of 2025 that will continue to be debated into 2026 (and maybe beyond) is the role of RAG.
The problem is that the original RAG pipeline architecture is much like a basic search: retrieval returns the result of a specific query at a specific point in time. It is also often limited to a single data source, or at least that’s the way RAG pipelines were built in the past (the past being anytime prior to June 2025).
Those limitations have led to a growing conga line of vendors all claiming that RAG is dying, on the way out, or already dead.
What is emerging, though, is a mix of alternative approaches (like contextual memory) and nuanced, improved takes on RAG. For example, Snowflake recently announced its agentic document analytics technology, which expands the traditional RAG data pipeline to enable analysis across thousands of sources without needing structured data first. Numerous other RAG-like approaches are also emerging, including GraphRAG, which will likely only grow in usage and capability in 2026.
So now RAG isn’t (entirely) dead, at least not yet. Organizations will still find use cases in 2026 where data retrieval is needed and some enhanced version of RAG will likely still fit the bill.
Enterprises in 2026 should evaluate use cases individually. Traditional RAG works for static knowledge retrieval, whereas enhanced approaches like GraphRAG suit complex, multi-source queries.
While RAG won’t entirely disappear in 2026, one approach that will likely surpass it in terms of usage for agentic AI is contextual memory, also known as agentic or long-context memory. This technology enables LLMs to store and access pertinent information over extended periods.
Multiple such systems emerged over the course of 2025, including Hindsight, the A-MEM framework, General Agentic Memory (GAM), LangMem, and Memobase.
RAG will remain useful for static data, but agentic memory is critical for adaptive assistants and agentic AI workflows that must learn from feedback, maintain state, and adapt over time.
In 2026, contextual memory will no longer be a novel technique; it will become table stakes for many operational agentic AI deployments.
At the beginning of the modern generative AI era, purpose-built vector databases (like Pinecone and Milvus, among others) were all the rage.
For an LLM to gain access to new information (generally, but not exclusively, via RAG), that data first has to be made searchable. The best way to do that is by encoding the data in vectors — that is, numerical representations of what the data represents.
In 2025, what became painfully obvious was that vectors were no longer a specific database type but rather a specific data type that could be integrated into an existing multimodel database. So instead of an organization being required to use a purpose-built system, it could just use an existing database that supports vectors. For example, Oracle supports vectors, and so does every database offered by Google.
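PostgreSQL with the pgvector extension is one concrete example of vectors living as a column type inside a general-purpose database (the article cites Oracle and Google; pgvector is used here purely because its syntax is compact). Connection details and table contents are placeholders.

```python
import psycopg2  # assumes PostgreSQL with the pgvector extension installed

conn = psycopg2.connect("dbname=appdb user=app password=secret host=localhost")  # placeholder DSN
cur = conn.cursor()

# Vectors live as an ordinary column type, alongside relational data.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        body text,
        embedding vector(3)   -- toy dimension; real embeddings run to hundreds or thousands
    );
""")
cur.execute(
    "INSERT INTO documents (body, embedding) VALUES (%s, %s::vector)",
    ("example document", "[0.1, 0.9, 0.2]"),
)

# Nearest-neighbor search with pgvector's cosine-distance operator <=>.
cur.execute(
    "SELECT body FROM documents ORDER BY embedding <=> %s::vector LIMIT 5",
    ("[0.1, 0.8, 0.3]",),
)
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()
```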
Oh, and it gets better. Amazon S3, long the de facto leader in cloud-based object storage, now allows users to store vectors, further reducing the need for a dedicated vector database. That doesn’t mean object storage replaces vector search engines — performance, indexing, and filtering still matter — but it does narrow the set of use cases where specialized systems are required.
No, that doesn’t mean purpose-built vector databases are dead. Much like with RAG, there will continue to be use cases for purpose-built vector databases in 2026. What will change is that those use cases will likely narrow to organizations that need the highest levels of performance or a specific optimization that a general-purpose solution doesn’t support.
As 2026 starts, what’s old is new again. The open-source PostgreSQL database will be 40 years old in 2026, yet it will be more relevant than it has ever been before.
Over the course of 2025, the supremacy of PostgreSQL as the go-to database for building any type of GenAI solution became apparent. Snowflake spent $250 million to acquire PostgreSQL database vendor Crunchy Data; Databricks spent $1 billion on Neon; and Supabase raised a $100 million Series E, giving it a $5 billion valuation.
All that money serves as a clear signal that enterprises are defaulting to PostgreSQL. The reasons are many, including its open-source base, flexibility, and performance. For vibe coding (a core use case for Supabase and Neon in particular), PostgreSQL is the standard.
Expect to see more growth and adoption of PostgreSQL in 2026 as more organizations come to the same conclusions as Snowflake and Databricks.
It’s likely that there will be more innovation addressing problems that many organizations assume are already solved.
In 2025, we saw numerous innovations in areas like AI parsing of unstructured data sources such as PDFs. That capability has existed for several years, but it proved harder to operationalize at scale than many assumed. Databricks now has an advanced parser, and other vendors, including Mistral, have emerged with their own improvements.
The same is true with natural language to SQL translation. While some might have assumed that was a solved problem, it’s one that continued to see innovation in 2025 and will see more in 2026.
It’s critical for enterprises to stay vigilant in 2026. Don’t assume foundational capabilities like parsing or natural language to SQL are fully solved. Keep evaluating new approaches that may significantly outperform existing tools.
2025 was a big year for big money going into data vendors.
Meta invested $14.3 billion in data labeling vendor Scale AI; IBM said it plans to acquire data streaming vendor Confluent for $11 billion; and Salesforce picked up Informatica for $8 billion.
Organizations should expect the pace of acquisitions of all sizes to continue in 2026, as big vendors realize the foundational importance of data to the success of agentic AI.
The impact of acquisitions and consolidation on enterprises in 2026 is hard to predict. Consolidation can lead to vendor lock-in, but it can also expand platform capabilities.
In 2026, the question won’t be whether enterprises are using AI — it will be whether their data systems are capable of sustaining it. As agentic AI matures, durable data infrastructure — not clever prompts or short-lived architectures — will determine which deployments scale and which quietly stall out.