As artificial intelligence reshapes software development, a small startup is betting that the industry’s next big bottleneck won’t be writing code — it will be trusting it.
Theorem, a San Francisco-based company that emerged from Y Combinator’s Spring 2025 batch, announced Tuesday it has raised $6 million in seed funding to build automated tools that verify the correctness of AI-generated software. Khosla Ventures led the round, with participation from Y Combinator, e14, SAIF, Halcyon, and angel investors including Blake Borgesson, co-founder of Recursion Pharmaceuticals, and Arthur Breitman, co-founder of blockchain platform Tezos.
The investment arrives at a pivotal moment. AI coding assistants from companies like GitHub, Amazon, and Google now generate billions of lines of code annually. Enterprise adoption is accelerating. But the ability to verify that AI-written software actually works as intended has not kept pace — creating what Theorem’s founders describe as a widening “oversight gap” that threatens critical infrastructure from financial systems to power grids.
“We’re already there,” said Jason Gross, Theorem’s co-founder, when we asked whether AI-generated code is outpacing human review capacity. “If you asked me to review 60,000 lines of code, I wouldn’t know how to do it.”
Theorem’s core technology combines formal verification — a mathematical technique that proves software behaves exactly as specified — with AI models trained to generate and check proofs automatically. The approach transforms a process that historically required years of PhD-level engineering into something the company claims can be completed in weeks or even days.
Formal verification has existed for decades but remained confined to the most mission-critical applications: avionics systems, nuclear reactor controls, and cryptographic protocols. The technique’s prohibitive cost — often requiring eight lines of mathematical proof for every single line of code — made it impractical for mainstream software development.
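To make the idea concrete, here is a toy example (in Lean 4, one of the proof assistants mentioned later in this story) of what a machine-checked proof looks like. It is far simpler than anything a verified system would need, but the workflow is the same: state a property, then convince the proof checker that it holds for every possible input.

```lean
-- Toy specification: reversing a list never changes its length.
-- The `simp` tactic discharges the goal using the standard library's
-- lemmas; the checker accepts the theorem only if the proof is airtight.
theorem reverse_preserves_length (l : List Nat) :
    l.reverse.length = l.length := by
  simp
```

Unlike a unit test, which samples a handful of inputs, a theorem like this covers all inputs at once, which is what makes the technique attractive for code no human will read line by line.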
Gross knows this firsthand. Before founding Theorem, he earned his PhD at MIT working on verified cryptography code that now powers the HTTPS security protocol protecting trillions of internet connections daily. That project, by his estimate, consumed fifteen person-years of labor.
“Nobody prefers to have incorrect code,” Gross said. “Software verification has just not been economical before. Proofs used to be written by PhD-level engineers. Now, AI writes all of it.”
Theorem’s system operates on a principle Gross calls “fractional proof decomposition.” Rather than exhaustively testing every possible behavior — computationally infeasible for complex software — the technology allocates verification resources proportionally to the importance of each code component.
The approach recently identified a bug that slipped past testing at Anthropic, the AI safety company behind the Claude chatbot. Gross said the technique helps developers “catch their bugs now without expending a lot of compute.”
In a recent technical demonstration called SFBench, Theorem used AI to translate 1,276 problems from Rocq (a formal proof assistant) to Lean (another verification language), then automatically proved each translation equivalent to the original. The company estimates a human team would have required approximately 2.7 person-years to complete the same work.
“Everyone can run agents in parallel, but we are also able to run them sequentially,” Gross explained, noting that Theorem’s architecture handles interdependent code — where solutions build on each other across dozens of files — that trips up conventional AI coding agents limited by context windows.
The startup is already working with customers in AI research labs, electronic design automation, and GPU-accelerated computing. One case study illustrates the technology’s practical value.
A customer came to Theorem with a 1,500-page PDF specification and a legacy software implementation plagued by memory leaks, crashes, and other elusive bugs. Their most urgent problem: improving performance from 10 megabits per second to 1 gigabit per second — a 100-fold increase — without introducing additional errors.
Theorem’s system generated 16,000 lines of production code, which the customer deployed without ever manually reviewing it. The confidence came from a compact executable specification — a few hundred lines that generalized the massive PDF document — paired with an equivalence-checking harness that verified the new implementation matched the intended behavior.
“Now they have a production-grade parser operating at 1 Gbps that they can deploy with the confidence that no information is lost during parsing,” Gross said.
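Theorem has not published its harness, but the general shape of equivalence checking is easy to sketch. The snippet below is a hypothetical, much weaker analogue: a randomized differential test that feeds the same inputs to two parser implementations and demands identical outputs. Both parser functions are stand-ins invented for illustration; a formal harness would prove agreement for all inputs, not just a random sample.

```python
import random

# Hypothetical stand-in for the slow legacy implementation.
def legacy_parse(data: bytes) -> list[int]:
    return [b & 0x7F for b in data]

# Hypothetical stand-in for the new, faster implementation.
def fast_parse(data: bytes) -> list[int]:
    return [b % 128 for b in data]

def check_equivalence(trials: int = 10_000) -> bool:
    """Differential test: feed both parsers identical random inputs and
    require identical outputs. Random testing only samples the input
    space; a formal equivalence proof covers it exhaustively."""
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
        if legacy_parse(data) != fast_parse(data):
            return False
    return True
```

The gap between this sketch and Theorem's claim is exactly the point of the company's pitch: random testing builds confidence, while a proof against the executable specification delivers a guarantee.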
The funding announcement arrives as policymakers and technologists increasingly scrutinize the reliability of AI systems embedded in critical infrastructure. Software already controls financial markets, medical devices, transportation networks, and electrical grids. AI is accelerating how quickly that software evolves — and how easily subtle bugs can propagate.
Gross frames the challenge in security terms. As AI makes it cheaper to find and exploit vulnerabilities, defenders need what he calls “asymmetric defense” — protection that scales without proportional increases in resources.
“Software security is a delicate offense-defense balance,” he said. “With AI hacking, the cost of hacking a system is falling sharply. The only viable solution is asymmetric defense. If we want a software security solution that can last for more than a few generations of model improvements, it will be via verification.”
Asked whether regulators should mandate formal verification for AI-generated code in critical systems, Gross offered a pointed response: “Now that formal verification is cheap enough, it might be considered gross negligence to not use it for guarantees about critical systems.”
Theorem enters a market where numerous startups and research labs are exploring the intersection of AI and formal verification. The company’s differentiation, Gross argues, lies in its singular focus on scaling software oversight rather than applying verification to mathematics or other domains.
“Our tools are useful for systems engineering teams, working close to the metal, who need correctness guarantees before merging changes,” he said.
The founding team reflects that technical orientation. Gross brings deep expertise in programming language theory and a track record of deploying verified code into production at scale. Co-founder Rajashree Agrawal, a machine learning research engineer, focuses on training the AI models that power the verification pipeline.
“We’re working on formal program reasoning so that everyone can oversee not just the work of an average software-engineer-level AI, but really harness the capabilities of a Linus Torvalds-level AI,” Agrawal said, referencing the legendary creator of Linux.
Theorem plans to use the funding to expand its team, increase compute resources for training verification models, and push into new industries including robotics, renewable energy, cryptocurrency, and drug synthesis. The company currently employs four people.
The startup’s emergence signals a shift in how enterprise technology leaders may need to evaluate AI coding tools. The first wave of AI-assisted development promised productivity gains — more code, faster. Theorem is wagering that the next wave will demand something different: mathematical proof that speed doesn’t come at the cost of safety.
Gross frames the stakes in stark terms. AI systems are improving exponentially. If that trajectory holds, he believes superhuman software engineering is inevitable — capable of designing systems more complex than anything humans have ever built.
“And without a radically different economics of oversight,” he said, “we will end up deploying systems we don’t control.”
The machines are writing the code. Now someone has to check their work.
Model Context Protocol has a security problem that won’t go away.
When VentureBeat first reported on MCP’s vulnerabilities last October, the data was already alarming. Pynt’s research showed that deploying just 10 MCP plug-ins creates a 92% probability of exploitation — with meaningful risk even from a single plug-in.
The core flaw hasn’t changed: MCP shipped without mandatory authentication. Authorization frameworks arrived six months after widespread deployment. As Merritt Baer, chief security officer at Enkrypt AI, warned at the time: “MCP is shipping with the same mistake we’ve seen in every major protocol rollout: insecure defaults. If we don’t build authentication and least privilege in from day one, we’ll be cleaning up breaches for the next decade.”
Three months later, the cleanup has already begun — and it’s worse than expected.
Clawdbot changed the threat model. The viral personal AI assistant that can clear inboxes and write code overnight runs entirely on MCP. Every developer who spun up a Clawdbot on a VPS without reading the security docs just exposed their company to the protocol’s full attack surface.
Itamar Golan saw it coming. He sold Prompt Security to SentinelOne for an estimated $250 million last year. This week, he posted a warning on X: “Disaster is coming. Thousands of Clawdbots are live right now on VPSs … with open ports to the internet … and zero authentication. This is going to get ugly.”
He’s not exaggerating. When Knostic scanned the internet, they found 1,862 MCP servers exposed with no authentication. They tested 119. Every server responded without requiring credentials.
Anything Clawdbot can automate, attackers can weaponize.
The vulnerabilities aren’t edge cases. They’re direct consequences of MCP’s design decisions. Three critical CVEs from the past six months illustrate the pattern:
CVE-2025-49596 (CVSS 9.4): Anthropic’s MCP Inspector exposed unauthenticated access between its web UI and proxy server, allowing full system compromise via a malicious webpage.
CVE-2025-6514 (CVSS 9.6): Command injection in mcp-remote, an OAuth proxy with 437,000 downloads, enabled attackers to take over systems by connecting to a malicious MCP server.
CVE-2025-52882 (CVSS 8.8): Popular Claude Code extensions exposed unauthenticated WebSocket servers, enabling arbitrary file access and code execution.
Three critical vulnerabilities in six months. Three different attack vectors. One root cause: MCP’s authentication was always optional, and developers treated optional as unnecessary.
Equixly recently analyzed popular MCP implementations and also found several vulnerabilities: 43% contained command injection flaws, 30% permitted unrestricted URL fetching, and 22% leaked files outside intended directories.
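Command injection, the most common flaw class in Equixly's findings, typically looks like the contrast below. This is an illustrative sketch, not code from any audited MCP server: a hypothetical tool handler that shells out to `grep`, first in the vulnerable style, then in a safer one.

```python
import subprocess

# UNSAFE pattern behind most reported command-injection flaws:
# interpolating model- or user-controlled text into a shell string.
# A pattern like "x; rm -rf ~" would execute the second command.
def run_grep_unsafe(pattern: str, path: str) -> str:
    return subprocess.run(
        f"grep {pattern} {path}",  # attacker-controlled input reaches the shell
        shell=True, capture_output=True, text=True,
    ).stdout

# Safer pattern: pass an argument list, never a shell string, so
# metacharacters like ';' or '$(...)' are treated as literal text.
def run_grep_safe(pattern: str, path: str) -> str:
    return subprocess.run(
        ["grep", "--", pattern, path],
        capture_output=True, text=True,
    ).stdout
```

When the "user" supplying the pattern is an AI agent that itself processes untrusted documents, the unsafe variant turns every prompt injection into remote code execution.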
Forrester analyst Jeff Pollard described the risk in a blog post: “From a security perspective, it looks like a very effective way to drop a new and very powerful actor into your environment with zero guardrails.”
That’s not an exaggeration. An MCP server with shell access can be weaponized for lateral movement, credential theft, and ransomware deployment, all triggered by a prompt injection hidden in a document the AI was asked to process.
Security researcher Johann Rehberger disclosed a file exfiltration vulnerability last October. Prompt injection could trick AI agents into transmitting sensitive files to attacker accounts.
Anthropic launched Cowork this month; it expands MCP-based agents to a broader, less security-aware audience. Same vulnerability, and this time it’s immediately exploitable. PromptArmor demonstrated a malicious document that manipulated the agent into uploading sensitive financial data.
Anthropic’s mitigation guidance: Users should watch for “suspicious actions that may indicate prompt injection.”
a16z partner Olivia Moore spent a weekend using Clawdbot and captured the disconnect: “You’re giving an AI agent access to your accounts. It can read your messages, send texts on your behalf, access your files, and execute code on your machine. You need to actually understand what you’re authorizing.”
Most users don’t. Most developers don’t either. And MCP’s design never required them to.
Inventory your MCP exposure now. Traditional endpoint detection sees node or Python processes started by legitimate applications. It doesn’t flag them as threats. You need tooling that identifies MCP servers specifically.
Treat authentication as mandatory. The MCP specification recommends OAuth 2.1. The SDK includes no built-in authentication. Every MCP server touching production systems needs auth enforced at deployment, not after the incident.
Restrict network exposure. Bind MCP servers to localhost unless remote access is explicitly required and authenticated. The 1,862 exposed servers Knostic found suggest most exposures are accidental.
Assume prompt injection attacks are coming and will succeed. MCP servers inherit the blast radius of the tools they wrap. If a server wraps cloud credentials, filesystems, or deployment pipelines, design its access controls assuming the agent will be compromised.
Force human approval for high-risk actions. Require explicit confirmation before agents send external email, delete data, or access sensitive information. Treat the agent like a fast but literal junior employee who will do exactly what you say, including things you didn’t mean.
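The deployment-side checks above can be sketched in a few lines. This is a hypothetical guard module, not the real MCP SDK (whose transports and auth hooks differ); the environment variable name and the action list are assumptions chosen for illustration.

```python
import hmac
import os

# Bind to loopback unless remote access is explicitly required and authenticated.
BIND_HOST = "127.0.0.1"

# Shared bearer token, injected at deployment time (name is hypothetical).
EXPECTED_TOKEN = os.environ.get("MCP_AUTH_TOKEN", "")

def is_authorized(presented_token: str) -> bool:
    """Reject every request that lacks the deployment's bearer token.
    An empty configured token means auth was never set up, so fail closed.
    hmac.compare_digest avoids timing side channels."""
    return bool(EXPECTED_TOKEN) and hmac.compare_digest(
        presented_token, EXPECTED_TOKEN
    )

# Illustrative list of actions that should never run without a human in the loop.
HIGH_RISK_ACTIONS = {"send_email", "delete_data", "read_secrets"}

def requires_human_approval(action: str) -> bool:
    """Gate high-risk tool calls behind explicit user confirmation."""
    return action in HIGH_RISK_ACTIONS
```

None of this is sophisticated, which is the uncomfortable part: the 1,862 exposed servers were missing checks this small.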
Security vendors moved early to monetize MCP risk, but most enterprises didn’t move nearly as fast.
Clawdbot adoption exploded in Q4 2025. Most 2026 security roadmaps have zero AI agent controls. The gap between developer enthusiasm and security governance is measured in months. The window for attackers is wide open.
Golan is right. This is going to get ugly. The question is whether organizations will secure their MCP exposure before someone else exploits it.
Your web gateway can’t see it. Your cloud access broker can’t see it. Your endpoint protection can’t see it. And yet 95% of organizations experienced browser-based attacks last year, according to Omdia research surveying more than 1,000 IT and security leaders.
Still, three campaigns in 12 months are making the threat more concrete. ShadyPanda infected 4.3 million users through extensions that had been legitimate for seven years. Cyberhaven’s security extension was weaponized against 400,000 corporate customers on Christmas Eve. Trust Wallet lost $8.5 million from 2,520 wallets in 48 hours. None triggered traditional alerts.
The pattern is consistent: Attackers aren’t exploiting zero-days or bypassing perimeter defenses. They’re operating inside trusted browser sessions — where traditional security tools lose visibility after login.
“Let’s be honest, people are using a browser the majority of their day anyway,” said Sam Evans, CISO of Clearwater Analytics. “Having the major security component in the browser has made our lives very simple.” That convenience is exactly what makes the browser the highest-risk execution environment enterprises still treat as infrastructure, not attack surface.
VentureBeat recently spoke with Elia Zaitsev, CTO of CrowdStrike, about what’s driving these attacks. “The browser has become a prime target because modern adversaries don’t break in, they log in,” he said.
He added that as work, communication, and AI usage move into the browser, attackers increasingly operate inside trusted sessions, abusing valid identities, tokens, and access. Traditional security controls were never designed to stop this kind of activity because they assume “trust-once” access is granted and lack visibility into what happens inside live browser sessions.
Traditional enterprise security stacks were built to inspect traffic before authentication, not behavior after access is granted. Interviews with CISOs already running browser-layer controls reveal six operational patterns that consistently reduce exposure — assuming identity and endpoint foundations are in place.
The Omdia research quantifies the gap: 64% of encrypted traffic goes uninspected, and 65% of organizations lack control over data shared in AI tools, according to the study. LayerX’s Enterprise Browser Extension Security Report 2025 found that 99% of enterprise users have at least one browser extension, 53% with high or critical permissions granting access to cookies, passwords, and page content. Another 17% come from non-official stores, and 26% were sideloaded without IT knowing.
“Traditional endpoint detection products were using some machine learning, and they would get to a probability of maybe 85%,” Evans told VentureBeat. “This could be a threat, but we’re not really sure. How do we take action? Should I pull the fire alarm?”
“At the end of the day, it’s the device the person uses day in and day out that carries the highest risk,” he said.
“For a long time, the browser was treated as a window, not an execution layer,” Zaitsev said. “It was designed for searches and static web access, not for running core business applications or autonomous AI workflows. That’s changed dramatically. Today, SaaS applications, cloud identities, AI tools, and agentic workflows all run through the browser, making it the first line of enterprise execution and defense.”
Browser isolation from Menlo Security, Cloudflare, and Symantec addresses rendering threats by executing web content in remote containers. But thousands of extensions now run locally with privileged access, GenAI tools create new exfiltration paths, and session-based attacks hijack authenticated tokens. Isolation protects users before authentication — not after attackers inherit valid sessions, tokens, and extension privileges.
Trust can be accumulated over years — then weaponized overnight.
The long game. ShadyPanda submitted clean extensions to Chrome and Edge stores in 2018, accumulated Google’s “Featured” and “Verified” badges, then weaponized them seven years later. Clean Master became a remote code execution backdoor running hourly JavaScript downloads — not malware with a fixed function, but a backdoor letting attackers decide what comes next.
The credential hijack. Browser auto-updates function as a software supply chain — and inherit its risks. Cyberhaven attackers phished one developer’s credentials in 2024. The Chrome Web Store approved the malicious upload. Within 48 hours, 400,000 corporate customers had auto-updated to compromised code.
The API key leak. Control planes are attack surfaces, not internal safeguards. Trust Wallet attackers used a leaked Chrome Web Store API key to push malicious updates, bypassing all internal release controls. They drained around $8.5 million from wallets within two days. No phishing required. No zero-days. Just the auto-update mechanism doing what it was designed to do.
“Nation-state actors typically exploit browser access for long-term, covert intelligence collection, while financially motivated e-crime groups prioritize speed, using browser-based attacks to harvest credentials, session tokens, and sensitive data for rapid monetization or resale,” Zaitsev said. “Despite different objectives, both rely on the same browser-layer blind spot to operate inside trusted sessions and bypass traditional detection.”
Session hijacking illustrates why this matters. The most important signals are behavioral and contextual, not credentials themselves. That includes how a user interacts with the browser in real time, whether actions align with expected behavior, how data is being accessed or moved, and whether the session context suddenly changes in ways that indicate abuse.
Once attackers capture a valid token, they replay it from anywhere. Authentication already happened, and MFA already passed. Zaitsev argues that detecting session hijacking early requires correlating in-session browser behavior with identity posture, endpoint signals, and threat intelligence. When those signals are unified, distinguishing a legitimate user from a hijacker becomes possible. That’s something siloed enterprise browsers and legacy security tools can’t see.
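One of those correlated signals, impossible travel, is simple enough to sketch. The check below is a minimal, hypothetical version: if the same session token appears in two locations that no flight could connect in the elapsed time, flag it. The 900 km/h threshold is an assumption approximating airliner speed; production systems weigh many more signals.

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two coordinates, in kilometers."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def impossible_travel(prev: tuple, curr: tuple,
                      max_speed_kmh: float = 900.0) -> bool:
    """Flag a session whose implied travel speed exceeds what any flight
    could achieve. prev/curr are (lat, lon, unix_seconds) tuples."""
    dist = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = max((curr[2] - prev[2]) / 3600.0, 1e-6)  # avoid divide-by-zero
    return dist / hours > max_speed_kmh
```

A token minted in San Francisco and replayed from London ten minutes later fails this check even though authentication and MFA both "passed."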
GenAI traffic surged 890% in 2024, with organizations now averaging 66 GenAI applications, according to Palo Alto Networks’ State of Generative AI 2025 report. GenAI-related data loss incidents more than doubled, accounting for 14% of all data security incidents.
Evans remembers the board conversation that started it all. “In October 2023, they asked, ‘What are your thoughts on ChatGPT?’ I said it’s an incredible productivity tool, however, I don’t know how we could let our employees use it, because my biggest fear is somebody copies and pastes customer data into it or our source code.”
Legitimate GenAI use and data exfiltration look identical at the network level. Both are encrypted browser sessions sending data to approved SaaS endpoints, often involving copy-and-paste into browser-based tools. The distinction only becomes clear at the browser layer, where you can see what data is being pasted, whether the destination is approved, and whether the behavior matches normal work patterns.
Evans found a balance. “If somebody goes to chatgpt.com, we allow them to use it. They just can’t copy and paste anything into it. They can’t upload any files, but they can ask questions and compare answers with our corporate version.” Employees get AI for research without risking customer data in model training.
“It seems like there’s a new one every five minutes,” Evans said. “Browser-layer controls maintain those categories, so if a new tool shows up, we can feel pretty good that employees won’t be able to copy and paste or upload our data.”
CrowdStrike acquired Seraphic Security and SGNL for a combined $1.16 billion in January 2026, signaling how seriously vendors are betting on the browser layer. Palo Alto Networks bought Talon in 2023.
Two camps are emerging. Island wants enterprises to replace Chrome and Edge entirely with a purpose-built browser, and has reached a $4.8 billion valuation as of March 2025. Menlo Security bets most enterprises won’t switch browsers, so it layers protection on top of whatever employees already use.
The tradeoff is real. Replacement browsers offer deeper control but require adoption. Security layers preserve user choice but see less. Both are winning deals.
Zaitsev says neither approach works without tying browser activity to identity. Authentication tells you who logged in. It doesn’t tell you if that session gets hijacked 10 minutes later, or if the user starts exfiltrating data to an unauthorized GenAI tool. Catching that requires correlating browser behavior with endpoint and identity signals in real time — something most enterprises can’t do yet.
For buyers, the decision isn’t about vendors — it’s about whether browser activity is tied into identity, endpoint, and SOC workflows, or left as a standalone control plane.
Securing the browser that employees actually use matters more than which enterprise browser to deploy. Today’s workforce moves across multiple browsers and managed and unmanaged devices. What matters is visibility and control inside live sessions without breaking how people work.
Evans put it more simply: “I wanted security closer to the end user, on the device they use every day. Having security in the browser made our lives simple. Road warriors dealing with hotel captive portals that normally get blocked by edge products? We don’t worry about that anymore.”
Based on interviews with CISOs running browser-layer controls in production, six patterns keep showing up. One caveat: These assume you already have mature identity and endpoint infrastructure. If you don’t, start there.
Build a complete extension inventory. Use browser management APIs to enumerate every extension, flag anything requesting sensitive permissions, and cross-reference against known-malicious hashes.
Break the auto-update kill chain. Fast patching reduces exposure to known vulnerabilities but creates supply chain risk. Implement version pinning with 48- to 72-hour delays. The Cyberhaven attack was detected in roughly 25 hours. A staged rollout would have contained it.
Move data protection to where data moves. “DLP is where we got the biggest win,” Evans said. “Customer data exfiltration can happen through social media, personal file shares, and web-based email. Being able to block copy-paste into certain site categories, block file uploads was incredibly powerful.”
Eliminate browser sprawl. “It does no good to deploy an enterprise browser when someone can download Opera, or Frank’s browser of the month, and bypass all the controls,” Evans said. Every unmanaged browser is a policy-free zone.
Extend identity into sessions, treat GenAI as unvetted, feed signals to the SOC. Session hijackers inherit valid credentials but not normal behavior patterns. Watch for impossible travel, permission escalation, and bulk access anomalies. Evans found that browser-layer blocking surfaced shadow AI tools employees actually wanted, which IT could then enable properly. And browser telemetry should flow into existing SOC workflows. “The AI does initial triage,” Evans said, “telling analysts where to look based on what we’ve seen before.”
Show the board a working demo. “I didn’t just come with concerns,” Evans said. “I came with a solution. When I explained how enterprise browsers work, the board said, ‘Can you really do it?’ At our July 2024 audit committee, they asked how it was going. I said, ‘Let me show you.’ Pulled up a screenshot — here I am on ChatGPT, tried to paste something, got: ‘Policy prevents this.’ They said, ‘Wow.’ That calmed their nerves.”
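The first pattern, extension inventory, is the easiest to start on. Here is a hypothetical sketch that walks a Chrome profile's Extensions folder and flags manifests requesting sensitive permissions. The directory layout and permission list are assumptions (layouts vary by OS and browser); managed fleets should use enterprise browser-management and reporting APIs rather than scraping disk.

```python
import json
from pathlib import Path

# Permissions worth reviewing: they grant access to cookies, browsing
# history, or the content of every page the user visits.
SENSITIVE = {"cookies", "history", "webRequest", "clipboardRead", "<all_urls>"}

def inventory_extensions(profile_dir: str) -> list[dict]:
    """Enumerate extensions under <profile>/Extensions/<id>/<version>/
    and flag any whose manifest requests sensitive permissions."""
    findings = []
    for manifest in Path(profile_dir, "Extensions").glob("*/*/manifest.json"):
        data = json.loads(manifest.read_text(encoding="utf-8"))
        perms = set(data.get("permissions", [])) | set(
            data.get("host_permissions", [])
        )
        findings.append({
            "name": data.get("name", "?"),
            "version": data.get("version", "?"),
            "flagged": sorted(perms & SENSITIVE),
        })
    return findings
```

Cross-referencing the resulting list against known-malicious extension IDs and hashes, as the pattern recommends, is then a straightforward join against a threat feed.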
The browser security gap is real. The fix isn’t necessarily a new platform purchase. Start by assessing what you have: inventory extensions, delay auto-updates, and enforce data policies at the browser layer with existing tools.
“No security tool is 100% perfect,” Evans said. “But with browser-layer controls deployed, we sleep a lot easier.”
Breach rates won’t improve by stacking more perimeter tools onto architectures that assume trust ends at login. Outcomes improve when you treat the browser as what it’s become: the primary execution environment for enterprise work.