In the race to bring artificial intelligence into the enterprise, a small but well-funded startup is making a bold claim: The problem holding back AI adoption in complex industries has never been the models themselves.
Contextual AI, a two-and-a-half-year-old company backed by investors including Bezos Expeditions and Bain Capital Ventures, on Monday unveiled Agent Composer, a platform designed to help engineers in aerospace, semiconductor manufacturing, and other technically demanding fields build AI agents that can automate the kind of knowledge-intensive work that has long resisted automation.
The announcement arrives at a pivotal moment for enterprise AI. Four years after ChatGPT ignited a frenzy of corporate AI initiatives, many organizations remain stuck in pilot programs, struggling to move experimental projects into full-scale production. Chief financial officers and business unit leaders are growing impatient with internal efforts that have consumed millions of dollars but delivered limited returns.
Douwe Kiela, Contextual AI’s chief executive, believes the industry has been focused on the wrong bottleneck. “The model is almost commoditized at this point,” Kiela said in an interview with VentureBeat. “The bottleneck is context — can the AI actually access your proprietary docs, specs, and institutional knowledge? That’s the problem we solve.”
To understand what Contextual AI is attempting, it helps to understand a concept that has become central to modern AI development: retrieval-augmented generation, or RAG.
When large language models like those from OpenAI, Google, or Anthropic generate responses, they draw on knowledge embedded during training. But that knowledge has a cutoff date, and it cannot include the proprietary documents, engineering specifications, and institutional knowledge that make up the lifeblood of most enterprises.
RAG systems attempt to solve this by retrieving relevant documents from a company’s own databases and feeding them to the model alongside the user’s question. The model can then ground its response in actual company data rather than relying solely on its training.
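The retrieve-then-ground loop can be sketched in a few lines. Everything below is illustrative rather than any vendor's actual system: production RAG uses vector embeddings and a real LLM call, which are replaced here by toy stand-ins (a keyword-overlap ranker and a printed prompt).

```python
# Minimal RAG sketch. Real systems use embedding-based retrieval and an
# LLM API; both are deliberately stubbed out here for illustration.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Ground the model by prepending the retrieved company documents."""
    context_block = "\n".join(f"- {c}" for c in context)
    return ("Answer using ONLY the context below.\n"
            f"Context:\n{context_block}\n"
            f"Question: {query}")

# Hypothetical internal documents:
docs = [
    "Spec 114: the X7 sensor samples at 2 kHz.",
    "HR policy: vacation requests need manager approval.",
    "Spec 201: the X7 sensor operates from -40C to 85C.",
]
query = "What is the X7 sensor sample rate?"
prompt = build_prompt(query, retrieve("X7 sensor sample rate", docs))
print(prompt)
```

The prompt that reaches the model now carries the relevant specs and omits the unrelated HR policy, which is the whole point: the answer can be grounded in retrieved company data instead of training-time knowledge.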
Kiela helped pioneer this approach during his time as a research scientist at Facebook AI Research and later as head of research at Hugging Face, the influential open-source AI company. He holds a Ph.D. from Cambridge and serves as an adjunct professor in symbolic systems at Stanford University.
But early RAG systems, Kiela acknowledges, were crude.
“Early RAG was pretty crude — grab an off-the-shelf retriever, connect it to a generator, hope for the best,” he said. “Errors compounded through the pipeline. Hallucinations were common because the generator wasn’t trained to stay grounded.”
When Kiela founded Contextual AI in June 2023, he set out to solve these problems systematically. The company developed what it calls a “unified context layer” — a set of tools that sit between a company’s data and its AI models, ensuring that the right information reaches the model in the right format at the right time.
The approach has earned recognition. According to a Google Cloud case study, Contextual AI achieved the highest performance on Google’s FACTS benchmark for grounded, hallucination-resistant results. The company fine-tuned Meta’s open-source Llama models on Google Cloud’s Vertex AI platform, focusing specifically on reducing the tendency of AI systems to invent information.
Agent Composer extends Contextual AI’s existing platform with orchestration capabilities — the ability to coordinate multiple AI tools across multiple steps to complete complex workflows.
The platform offers three ways to create AI agents. Users can start with pre-built agents designed for common technical workflows like root cause analysis or compliance checking. They can describe a workflow in natural language and let the system automatically generate a working agent architecture. Or they can build from scratch using a visual drag-and-drop interface that requires no coding.
What distinguishes Agent Composer from competing approaches, the company says, is its hybrid architecture. Teams can combine strict, deterministic rules for high-stakes steps — compliance checks, data validation, approval gates — with dynamic reasoning for exploratory analysis.
“For highly critical workflows, users can choose completely deterministic steps to control agent behavior and avoid uncertainty,” Kiela said.
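Conceptually, such a hybrid workflow interleaves plain code with model calls. The sketch below is hypothetical and not Contextual AI's API: `compliance_gate` is a deterministic step that can never behave unpredictably, while `reasoning_step` stands in for an LLM call.

```python
# Hypothetical hybrid agent workflow: deterministic gates run as plain
# code; "reasoning" steps would call a model in a real agent.

def compliance_gate(record: dict) -> dict:
    """Deterministic step: hard-fail on policy violations, no model involved."""
    if record.get("export_controlled"):
        raise ValueError("blocked: export-controlled document")
    return record

def reasoning_step(record: dict) -> dict:
    """Dynamic step: in a real agent this would be an LLM call."""
    record["summary"] = f"analysis of {record['id']}"  # stand-in for model output
    return record

def run_workflow(record: dict, steps) -> dict:
    for step in steps:  # each step is a named function, so runs are auditable
        record = step(record)
    return record

result = run_workflow({"id": "doc-42", "export_controlled": False},
                      [compliance_gate, reasoning_step])
print(result["summary"])  # analysis of doc-42
```

The design point is that the gate either passes or raises; no amount of model nondeterminism can route a controlled document past it.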
The platform also includes what the company calls “one-click agent optimization,” which takes user feedback and automatically adjusts agent performance. Every step of an agent’s reasoning process can be audited, and responses come with sentence-level citations showing exactly where information originated in source documents.
Contextual AI says early customers have reported significant efficiency gains, though the company acknowledges these figures come from customer self-reporting rather than independent verification.
“These come directly from customer evals, which are approximations of real-world workflows,” Kiela said. “The numbers are self-reported by our customers as they describe the before-and-after scenario of adopting Contextual AI.”
The claimed results are nonetheless striking. An advanced manufacturer reduced root-cause analysis from eight hours to 20 minutes by automating sensor data parsing and log correlation. A specialty chemicals company reduced product research from hours to minutes using agents that search patents and regulatory databases. A test equipment maker now generates test code in minutes instead of days.
Keith Schaub, vice president of technology and strategy at Advantest, a semiconductor test equipment company, offered an endorsement. “Contextual AI has been an important part of our AI transformation efforts,” Schaub said. “The technology has been rolled out to multiple teams across Advantest and select end customers, saving meaningful time across tasks ranging from test code generation to customer engineering workflows.”
The company’s other customers include Qualcomm, the semiconductor giant; ShipBob, a tech-enabled logistics provider that claims to have achieved 60 times faster issue resolution; and Nvidia, the chip maker whose graphics processors power most AI systems.
Perhaps the biggest challenge Contextual AI faces is not competing products but the instinct among engineering organizations to build their own solutions.
“The biggest objection is ‘we’ll build it ourselves,’” Kiela acknowledged. “Some teams try. It sounds exciting to do, but is exceptionally hard to do this well at scale. Many of our customers started with DIY, and found themselves still debugging retrieval pipelines instead of solving actual problems 12-18 months later.”
The alternative — off-the-shelf point solutions — presents its own problems, the company argues. Such tools deploy quickly but often prove inflexible and difficult to customize for specific use cases.
Agent Composer attempts to occupy a middle ground, offering a platform approach that combines pre-built components with extensive customization options. The system supports models from OpenAI, Anthropic, and Google, as well as Contextual AI’s own Grounded Language Model, which was specifically trained to stay faithful to retrieved content.
Pricing starts at $50 per month for self-serve usage, with custom enterprise pricing for larger deployments.
“The justification to CFOs is really about increasing productivity and getting them to production faster with their AI initiatives,” Kiela said. “Every technical team is struggling to hire top engineering talent, so making their existing teams more productive is a huge priority in these industries.”
Looking ahead, Kiela outlined three priorities for the coming year: workflow automation with actual write actions across enterprise systems rather than just reading and analyzing; better coordination among multiple specialized agents working together; and faster specialization through automatic learning from production feedback.
“The compound effect matters here,” he said. “Every document you ingest, every feedback loop you close, those improvements stack up. Companies building this infrastructure now are going to be hard to catch.”
The enterprise AI market remains fiercely competitive, with offerings from major cloud providers, established software vendors, and scores of startups all chasing the same customers. Whether Contextual AI’s bet on context over models will pay off depends on whether enterprises come to share Kiela’s view that the foundation model wars matter less than the infrastructure that surrounds them.
But there is a certain irony in the company’s positioning. For years, the AI industry has fixated on building ever-larger, ever-more-powerful models — pouring billions into the race for artificial general intelligence. Contextual AI is making a quieter argument: that for most real-world work, the magic isn’t in the model. It’s in knowing where to look.
Mistral AI, the French artificial intelligence company that has positioned itself as Europe’s leading challenger to American AI giants, announced on Tuesday the general availability of Mistral Vibe 2.0, a significant upgrade to its terminal-based coding agent and the startup’s most aggressive push yet into the competitive AI-assisted software development market.
The release is a pivotal moment for the Paris-based company, which is transitioning its developer tools from a free testing phase to a commercial product integrated with its paid subscription plans. The move comes just days after Mistral CEO Arthur Mensch told Bloomberg Television at the World Economic Forum in Davos that the company expects to cross €1 billion in revenue by the end of 2026 — a projection that would still leave it far behind American competitors but would cement its position as Europe’s preeminent AI firm.
“The announcement is more of an upgrade and general availability,” Timothée Lacroix, cofounder of Mistral, said in an exclusive interview with VentureBeat. “We produced Devstral 2 in December, and we released at the time a first version of Vibe. Everything was free and in testing. Now we have finalized and improved the CLI, and we are moving Mistral Vibe to a paid plan that’s bundled with our Le Chat plans.”
Mistral Vibe 2.0 arrives as technology executives across industries grapple with a fundamental tension: the promise of AI-powered coding tools is immense, but the most capable models are controlled by a handful of American companies — OpenAI, Anthropic, and Google — whose closed-source approaches leave enterprises with limited control over their most sensitive intellectual property.
Mistral is betting that its open-source approach, combined with deep customization capabilities, will appeal to organizations wary of sending proprietary code to third-party providers. The strategy targets a specific pain point that Lacroix says plagues enterprises with legacy systems.
“The code bases that large enterprises work with are large and have been built upon years and years, and they haven’t seen the web,” Lacroix explained. “They potentially rely on large libraries or large domain-specific languages that are unknown to typical language models. And so what we’re able to do with the Vibe CLI and our models is to go and customize them to a customer’s code base and its specific IP to get an improved experience.”
This customization capability addresses a limitation that has frustrated many enterprise technology leaders: general-purpose AI coding assistants trained on public code repositories often struggle with proprietary frameworks, internal coding conventions, and domain-specific languages that exist only within corporate walls. A bank’s internal trading system, a manufacturer’s proprietary control software, or a pharmaceutical company’s research pipeline may rely on decades of accumulated code written in conventions that no public AI model has ever encountered.
The updated Vibe CLI introduces several features designed to give developers more granular control over how the AI agent operates. Custom subagents allow organizations to build specialized AI agents for targeted tasks—such as deployment scripts, pull request reviews, or test generation—that can be invoked on demand rather than relying on a single general-purpose assistant.
Multi-choice clarifications are a departure from the behavior of many AI coding tools that attempt to infer developer intent when instructions are ambiguous. Instead, Vibe 2.0 prompts users with options before taking action, reducing the risk of unwanted code changes. Slash-command skills enable developers to load preconfigured workflows for common tasks like deploying, linting, or generating documentation through simple commands. Unified agent modes allow teams to configure custom operational modes that combine specific tools, permissions, and behaviors, enabling developers to switch contexts without switching between different applications. The tool also now ships with continuous updates through the command line, eliminating the need for manual version management.
Mistral Vibe 2.0 is available through two subscription tiers. The Le Chat Pro plan costs $14.99 per month and provides full access to the Vibe CLI and Devstral 2, the underlying model that powers the agent, with students receiving a 50 percent discount. The Le Chat Team plan, priced at $24.99 per seat per month, adds unified billing, administrative controls, and priority support for organizations.
Both plans include generous usage allowances for sustained development work, with the option to continue beyond limits through pay-as-you-go pricing at API rates. The underlying Devstral 2 model, which previously was offered free through Mistral’s API during a testing period, now moves to paid access with input pricing of $0.40 per million tokens and output pricing of $2.00 per million tokens.
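At those rates, per-session costs are easy to estimate. The arithmetic below uses the stated Devstral 2 API prices; the token counts in the example are invented for illustration.

```python
# Back-of-envelope cost at the stated Devstral 2 API rates:
# $0.40 per million input tokens, $2.00 per million output tokens.
IN_RATE, OUT_RATE = 0.40, 2.00  # dollars per million tokens

def cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * IN_RATE + output_tokens / 1e6 * OUT_RATE

# Hypothetical coding session: 5M tokens of context read, 1M generated.
print(f"${cost(5_000_000, 1_000_000):.2f}")  # $4.00
```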
The Devstral 2 model family that powers Vibe CLI is Mistral’s bet that smaller, more efficient models can compete with — and in some cases outperform — the massive systems built by better-funded American rivals. Devstral 2, a 123-billion-parameter dense transformer, achieves 72.2 percent on SWE-bench Verified, a widely used benchmark for evaluating AI systems’ ability to solve real-world software engineering problems.
Perhaps more significant for enterprise deployment, the model is roughly five times smaller than DeepSeek V3.2 and eight times smaller than Kimi K2 — Chinese models that have drawn attention for matching American AI systems at a fraction of the cost. The smaller Devstral 2 Small, at 24 billion parameters, can run on consumer hardware including laptops.
“Those two models are dense, which makes it also—I mean, the small one is something that can run on a laptop, really, which is great if you’re working on the train,” Lacroix noted. “But the fact that the larger one is also dense is interesting for on-prem or more resource-constrained usage, where it’s easier to get efficient use of a dense model rather than large mixture of experts, and it requires smaller hardware to start.”
The distinction between dense and mixture-of-experts architectures is technically significant. While mixture-of-experts models can theoretically offer more capability per compute dollar by activating only portions of their parameters for any given task, they require more complex infrastructure to deploy efficiently. Dense models, by contrast, activate all parameters for every computation but are more straightforward to run on conventional hardware — a meaningful consideration for enterprises that want to deploy AI systems on their own infrastructure rather than relying on cloud providers.
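The trade-off can be put in rough numbers. The dense figure below is Devstral 2's stated parameter count; the mixture-of-experts figures approximate DeepSeek V3's published configuration (671B total, 37B active per token) and are used here purely for illustration.

```python
# Illustrative dense-vs-MoE comparison. A dense model activates every
# parameter per token; an MoE activates only its routed experts, but
# all experts must still be resident in memory.

def active_params(total_b: float, frac_active: float) -> float:
    """Parameters touched per token, in billions."""
    return total_b * frac_active

dense = active_params(123, 1.0)        # Devstral 2: all 123B active
moe = active_params(671, 37 / 671)     # illustrative MoE: 37B of 671B active

print(f"dense compute/token: {dense:.0f}B params")
print(f"MoE compute/token:   {moe:.0f}B params, but 671B must be resident")
```

The MoE does far less compute per token, yet serving it still requires holding every expert in memory, which is why the dense model can be the easier fit for on-premises hardware.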
For regulated industries — particularly financial services, healthcare, and defense — the question of where AI models run and who has access to the data they process is not merely technical but existential. Banks cannot send proprietary trading algorithms to external AI providers. Healthcare organizations face strict regulations about patient data. Defense contractors operate under security clearances that prohibit sharing sensitive information with foreign entities.
Lacroix suggests that the on-premises deployment capability, while important, is secondary to a more fundamental concern about ownership and control. “The fact that it’s on-prem, I think, is less relevant than the fact that it’s owned by the company and that it’s on wherever they feel safe moving that data — like they’re not shipping the entire code base to a third party,” he said. “I think that’s important.”
This framing positions Mistral not merely as a vendor of AI tools but as a partner in building proprietary AI capabilities that become strategic assets for client organizations. “When we work with a company to then customize them and potentially fine-tune them or continue pre-training them, then they become assets to that company, and they are their own competitive advantage, really,” Lacroix explained.
Mistral has actively cultivated relationships with governments to underscore this positioning. The company serves defense ministries in Europe and Southeast Asia, both directly and through defense contractors. At Davos, Mensch described AI as critical not only to economic sovereignty but to “strategic sovereignty,” noting that autonomous systems like drones require AI capabilities and that deterrence in this domain is increasingly important.
Mistral’s positioning as a European alternative to American AI giants takes on added significance amid rising geopolitical tensions. At the World Economic Forum, Mensch was characteristically blunt about the competitive landscape, dismissing claims that Chinese AI development lags the United States as a “fairy tale.”
“China is not behind the West,” Mensch said in his Bloomberg Television interview. The capabilities of China’s open-source technology, he added, are “probably stressing the CEOs in the U.S.”
The comments reflect a broader anxiety in the AI industry about the durability of American technological leadership. Chinese companies including DeepSeek and Alibaba have released open-source models that match or exceed many American systems, often at dramatically lower costs. For Mistral, this competitive pressure validates its strategy of focusing on efficiency and customization rather than attempting to match the massive training runs of better-capitalized American rivals.
European Commission digital chief Henna Virkkunen, also speaking at Davos, underscored the strategic importance of technological sovereignty. “It’s so important that we are not dependent on one country or one company when it comes to some very critical fields of our economy or society,” she said.
For American enterprise customers, Lacroix suggests that Mistral’s European identity and government relationships need not be a concern — and may even be an advantage. “One of the benefits when working as we do, like with open weights, and especially when deploying on customers’ premises and giving them control, is that the wider geopolitics don’t necessarily matter that much,” he said. “I think the benefits of the open-source scene is that it gives you confidence that you know what you’re using, and you’re in total control of it.”
Mistral’s transition from a pure model company to what Lacroix describes as “a full enterprise platform around developing AI applications” reflects a broader maturation in the AI industry. The realization that model weights alone do not capture the full value of AI systems has pushed companies across the sector toward more integrated offerings.
“We don’t think the only value we provide is in the model,” Lacroix said. “We started as a models company. We are now building a full enterprise platform around developing AI applications. We have a part of our company that provides services to integrate deeply. And so the way we make money, and I guess the question behind this is the value that is core to Mistral, is that full-stack solution to getting to the ROI of AI.”
This full-stack approach includes fine-tuning on internal languages and domain-specific languages, reinforcement learning with customer-specific environments, and end-to-end code modernization services that can migrate entire codebases to modern technology stacks. Mistral says it already delivers these solutions to some of the world’s largest organizations in finance, defense, and infrastructure.
The revenue milestone Mensch projected at Davos — crossing €1 billion by year’s end — would represent remarkable growth for a company founded in 2023. But it would still leave Mistral far behind American competitors whose valuations stretch into the hundreds of billions. OpenAI, now reportedly valued at more than $150 billion, and Anthropic, valued at approximately $60 billion, operate at a scale that Mistral cannot match through organic growth alone.

To close the gap, Mistral is looking at acquisitions. “We are in the process of looking at a few opportunities,” Mensch said at Davos, though he declined to specify target business areas or geographic regions. The company’s September fundraise brought in €1.7 billion, with Dutch semiconductor equipment giant ASML joining as a key investor, valuing Mistral at €11.7 billion.
Looking beyond the immediate product announcement, Lacroix sees the current generation of AI coding tools as a transitional phase toward more autonomous software development. “For a few tasks, it’s already becoming the default entry point — like if I want to prototype something, or if I want to quickly iterate on an idea. I think it’s already faster,” he said. “What I see today is there is still some story that needs to happen on how you do the work asynchronously and in a way where it’s easy to orchestrate several tasks and several improvements on the same code base in a flow that feels natural.”
The current experience, he suggests, does not yet feel like having “your own team of developers that can really 10x yourself.” But he expects rapid improvement, driven by abundant training data and intense industry interest. Perhaps more ambitiously, Lacroix sees the file-manipulation and tool-calling capabilities built for coding as applicable far beyond software development. “What I’m really excited about is the use of these tools outside of coding,” he said. “The really strong realization is you now have an agent that is great at working with a file system, that can edit information and that expands its context a lot, and it’s really great at using all sorts of tools. Those tools don’t need to be necessarily related to coding, really.”
For chief technology officers and engineering leaders evaluating AI coding tools, Mistral’s announcement crystallizes the strategic choice now facing enterprises: accept the convenience and raw capability of closed-source American models, or bet on the flexibility and control of open-source alternatives that can be customized and deployed behind corporate firewalls. Human evaluations comparing Devstral 2 against Claude Sonnet 4.5 showed that Anthropic’s model was “significantly preferred,” according to Mistral’s own benchmarking — an acknowledgment that closed-source leaders retain advantages that efficiency and customization cannot fully offset.
But Lacroix is betting that for enterprises with proprietary code, legacy systems, and regulatory constraints, customization will matter more than raw performance on public benchmarks. “The point is that you can now get all of this vibe coding disruption and goodness in an environment where customization is needed, which was difficult before,” he said. “And that’s, I think, the main point that we’re making with this announcement.”
The AI coding wars, in other words, are no longer just about which model writes the best code. They’re about who gets to own the model that understands yours.
As artificial intelligence reshapes software development, a small startup is betting that the industry’s next big bottleneck won’t be writing code — it will be trusting it.
Theorem, a San Francisco-based company that emerged from Y Combinator’s Spring 2025 batch, announced Tuesday it has raised $6 million in seed funding to build automated tools that verify the correctness of AI-generated software. Khosla Ventures led the round, with participation from Y Combinator, e14, SAIF, Halcyon, and angel investors including Blake Borgesson, co-founder of Recursion Pharmaceuticals, and Arthur Breitman, co-founder of blockchain platform Tezos.
The investment arrives at a pivotal moment. AI coding assistants from companies like GitHub, Amazon, and Google now generate billions of lines of code annually. Enterprise adoption is accelerating. But the ability to verify that AI-written software actually works as intended has not kept pace — creating what Theorem’s founders describe as a widening “oversight gap” that threatens critical infrastructure from financial systems to power grids.
“We’re already there,” said Jason Gross, Theorem’s co-founder, when we asked whether AI-generated code is outpacing human review capacity. “If you asked me to review 60,000 lines of code, I wouldn’t know how to do it.”
Theorem’s core technology combines formal verification — a mathematical technique that proves software behaves exactly as specified — with AI models trained to generate and check proofs automatically. The approach transforms a process that historically required years of PhD-level engineering into something the company claims can be completed in weeks or even days.
Formal verification has existed for decades but remained confined to the most mission-critical applications: avionics systems, nuclear reactor controls, and cryptographic protocols. The technique’s prohibitive cost — often requiring eight lines of mathematical proof for every single line of code — made it impractical for mainstream software development.
Gross knows this firsthand. Before founding Theorem, he earned his PhD at MIT working on verified cryptography code that now powers the HTTPS security protocol protecting trillions of internet connections daily. That project, by his estimate, consumed fifteen person-years of labor.
“Nobody prefers to have incorrect code,” Gross said. “Software verification has just not been economical before. Proofs used to be written by PhD-level engineers. Now, AI writes all of it.”
Theorem’s system operates on a principle Gross calls “fractional proof decomposition.” Rather than exhaustively testing every possible behavior — computationally infeasible for complex software — the technology allocates verification resources proportionally to the importance of each code component.
The approach recently identified a bug that slipped past testing at Anthropic, the AI safety company behind the Claude chatbot. Gross said the technique helps developers “catch their bugs now without expending a lot of compute.”
In a recent technical demonstration called SFBench, Theorem used AI to translate 1,276 problems from Rocq (a formal proof assistant) to Lean (another verification language), then automatically proved each translation equivalent to the original. The company estimates a human team would have required approximately 2.7 person-years to complete the same work.
“Everyone can run agents in parallel, but we are also able to run them sequentially,” Gross explained, noting that Theorem’s architecture handles interdependent code — where solutions build on each other across dozens of files — that trips up conventional AI coding agents limited by context windows.
The startup is already working with customers in AI research labs, electronic design automation, and GPU-accelerated computing. One case study illustrates the technology’s practical value.
A customer came to Theorem with a 1,500-page PDF specification and a legacy software implementation plagued by memory leaks, crashes, and other elusive bugs. Their most urgent problem: improving performance from 10 megabits per second to 1 gigabit per second — a 100-fold increase — without introducing additional errors.
Theorem’s system generated 16,000 lines of production code, which the customer deployed without ever manually reviewing it. The confidence came from a compact executable specification — a few hundred lines that generalized the massive PDF document — paired with an equivalence-checking harness that verified the new implementation matched the intended behavior.
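The general shape of such a harness can be sketched briefly. This is the idea, not Theorem's actual tooling, and the randomized check below is strictly weaker than a formal proof: it illustrates how an executable spec serves as the oracle, while formal verification would cover all inputs rather than a sample.

```python
# Sketch of an equivalence-checking harness (illustrative only): a
# compact executable spec is the oracle, and the fast implementation
# is validated against it across many generated inputs.
import random

def spec_parse(data: bytes) -> list[int]:
    """Executable specification: slow but obviously-correct reference."""
    return [b for b in data]

def fast_parse(data: bytes) -> list[int]:
    """New high-performance implementation to be validated."""
    return list(data)

def check_equivalence(trials: int = 1000) -> bool:
    rng = random.Random(0)  # fixed seed keeps the check reproducible
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
        if fast_parse(data) != spec_parse(data):
            return False
    return True

print(check_equivalence())  # True
```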
“Now they have a production-grade parser operating at 1 Gbps that they can deploy with the confidence that no information is lost during parsing,” Gross said.
The funding announcement arrives as policymakers and technologists increasingly scrutinize the reliability of AI systems embedded in critical infrastructure. Software already controls financial markets, medical devices, transportation networks, and electrical grids. AI is accelerating how quickly that software evolves — and how easily subtle bugs can propagate.
Gross frames the challenge in security terms. As AI makes it cheaper to find and exploit vulnerabilities, defenders need what he calls “asymmetric defense” — protection that scales without proportional increases in resources.
“Software security is a delicate offense-defense balance,” he said. “With AI hacking, the cost of hacking a system is falling sharply. The only viable solution is asymmetric defense. If we want a software security solution that can last for more than a few generations of model improvements, it will be via verification.”
Asked whether regulators should mandate formal verification for AI-generated code in critical systems, Gross offered a pointed response: “Now that formal verification is cheap enough, it might be considered gross negligence to not use it for guarantees about critical systems.”
Theorem enters a market where numerous startups and research labs are exploring the intersection of AI and formal verification. The company’s differentiation, Gross argues, lies in its singular focus on scaling software oversight rather than applying verification to mathematics or other domains.
“Our tools are useful for systems engineering teams, working close to the metal, who need correctness guarantees before merging changes,” he said.
The founding team reflects that technical orientation. Gross brings deep expertise in programming language theory and a track record of deploying verified code into production at scale. Co-founder Rajashree Agrawal, a machine learning research engineer, focuses on training the AI models that power the verification pipeline.
“We’re working on formal program reasoning so that everyone can oversee not just the work of an average software-engineer-level AI, but really harness the capabilities of a Linus Torvalds-level AI,” Agrawal said, referencing the legendary creator of Linux.
Theorem plans to use the funding to expand its team, increase compute resources for training verification models, and push into new industries including robotics, renewable energy, cryptocurrency, and drug synthesis. The company currently employs four people.
The startup’s emergence signals a shift in how enterprise technology leaders may need to evaluate AI coding tools. The first wave of AI-assisted development promised productivity gains — more code, faster. Theorem is wagering that the next wave will demand something different: mathematical proof that speed doesn’t come at the cost of safety.
Gross frames the stakes in stark terms. AI systems are improving exponentially. If that trajectory holds, he believes superhuman software engineering is inevitable — capable of designing systems more complex than anything humans have ever built.
“And without a radically different economics of oversight,” he said, “we will end up deploying systems we don’t control.”
The machines are writing the code. Now someone has to check their work.