Adobe Rolls Out Smarter AI Image Editing In Photoshop And Firefly

Adobe is rolling out more AI-powered tools across its creative software. The software giant says the latest update brings more AI power to Photoshop and Firefly.

‘The future of the space economy’: Colorado startup Lux Aeterna raises $10 million to develop reusable satellites

The Colorado company Lux Aeterna wants to help open up the space economy with a fleet of fully reusable satellites, and it just raised $10 million to help make that happen.

New Interactive Sky Formula 1 Features Let Viewers Take The Wheel

New side-bar interface lets you access race replays, driver cams, background details, live in-race standings and more on your terms.

Plugable Ships “Build Your Own” Thunderbolt 5 Local AI Enclosure

Plugable’s new TBT5-AI enclosure lets users plug workstation-class power into their PC by hosting a user-supplied GPU at their desk, bypassing cloud subscription fees.

OpenAI Acquires Promptfoo To Embed Security Testing Into Its Agents

OpenAI acquires Promptfoo to embed AI red-teaming and security testing directly into its Frontier agent platform, signaling that agent safety is now table stakes.

Andrej Karpathy’s new open source ‘autoresearch’ lets you run hundreds of AI experiments a night — with revolutionary implications

Over the weekend, Andrej Karpathy (the influential former Tesla AI lead, co-founder and former member of OpenAI, and the person who coined the term “vibe coding”) posted on X about his new open source project, autoresearch.

It wasn’t a finished model or a massive corporate product: it was, by his own admission, a simple 630-line script made available on GitHub under a permissive, enterprise-friendly MIT License. But the ambition was massive: automating the scientific method with AI agents while we humans sleep.

“The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement,” he stated on X.

The system functions as an autonomous optimization loop. An AI agent is given a training script and a fixed compute budget (typically 5 minutes on a GPU).

It reads its own source code, forms a hypothesis for improvement (such as changing a learning rate or an architecture depth), modifies the code, runs the experiment, and evaluates the results.

If the validation loss—measured in bits per byte (val_bpb)—improves, it keeps the change; if not, it reverts and tries again. In one overnight run, Karpathy’s agent completed 126 experiments, driving loss down from 0.9979 to 0.9697.
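Stripped to its control flow, the keep-or-revert loop described above is greedy hill climbing on a single metric. The minimal Python sketch below illustrates only that control flow, under loud assumptions: `train_and_eval` is a hypothetical stand-in for a fixed-budget training run, the "code" being edited is reduced to a dict of two hyperparameters, and the synthetic loss function has nothing to do with Karpathy's actual repository.

```python
import random

def train_and_eval(config: dict) -> float:
    """Hypothetical stand-in for a fixed-budget training run.

    Returns a synthetic validation loss (lower is better). The real
    autoresearch agent edits and runs an actual training script instead.
    """
    lr_penalty = (config["lr"] - 3e-3) ** 2 * 1e4
    depth_penalty = (config["depth"] - 12) ** 2 * 1e-3
    return 0.95 + lr_penalty + depth_penalty

def propose(config: dict, rng: random.Random) -> dict:
    """Form a 'hypothesis': perturb one hyperparameter at random."""
    candidate = dict(config)
    if rng.random() < 0.5:
        candidate["lr"] *= rng.choice([0.5, 0.8, 1.25, 2.0])
    else:
        candidate["depth"] = max(1, candidate["depth"] + rng.choice([-2, -1, 1, 2]))
    return candidate

def research_loop(config: dict, n_experiments: int, seed: int = 0):
    """Run n experiments, keeping a change only if the loss improves."""
    rng = random.Random(seed)
    best_loss = train_and_eval(config)
    history = [best_loss]
    for _ in range(n_experiments):
        candidate = propose(config, rng)
        loss = train_and_eval(candidate)
        if loss < best_loss:  # improvement: keep the change
            config, best_loss = candidate, loss
        # otherwise: revert by simply discarding the candidate
        history.append(best_loss)
    return config, best_loss, history

if __name__ == "__main__":
    final, loss, history = research_loop({"lr": 1e-3, "depth": 8}, n_experiments=126)
    print(f"start loss {history[0]:.4f} -> final loss {loss:.4f}; config {final}")
```

Because a change is kept only when it beats the current best, the recorded loss can never get worse from one experiment to the next; all the intelligence in the real system lives in how the agent proposes changes, which this sketch replaces with random perturbation.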

Today, Karpathy reported that after leaving the agent to tune a “depth=12” model for two days, it successfully processed approximately 700 autonomous changes.

The agent found roughly 20 additive improvements that transferred perfectly to larger models. Stacking these changes dropped the “Time to GPT-2” metric on the leaderboard from 2.02 hours to 1.80 hours—an 11% efficiency gain on a project Karpathy believed was already well-tuned.
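The 11% figure is a relative reduction in wall-clock time, (2.02 - 1.80) / 2.02. Applying the same arithmetic to the overnight val_bpb numbers gives a much smaller relative change. The two percentages measure different things (training time versus loss), so they are not directly comparable; the snippet below simply reproduces both from the figures quoted above.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction from `before` to `after` (positive means better)."""
    return (before - after) / before

# Figures quoted in the article.
time_to_gpt2 = relative_reduction(2.02, 1.80)  # hours
val_bpb = relative_reduction(0.9979, 0.9697)   # bits per byte

print(f"Time to GPT-2: {time_to_gpt2:.1%} faster")  # prints 10.9% faster
print(f"val_bpb: {val_bpb:.1%} lower")              # prints 2.8% lower
```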

“Seeing the agent do this entire workflow end-to-end and all by itself… is wild,” Karpathy remarked, noting that the agent caught oversights in attention scaling and regularization that he had missed manually over two decades of work.

This is more than just a productivity hack; it is a fundamental shift in how intelligence is refined. By automating the “scientific method” for code, Karpathy has turned machine learning into an evolutionary process that runs at the speed of silicon rather than the speed of human thought.

And more than this, it showed the broader AI and machine learning community on X that this type of process could be applied far beyond computer science, to fields like marketing, health, and, well, basically anything that requires research.

Autoresearch spreads far and wide

The reaction was swift and viral, with Karpathy’s post garnering more than 8.6 million views in the intervening two days as builders and researchers scrambled to scale the “Karpathy loop”.

Varun Mathur, CEO of AI tool aggregator platform Hyperspace AI, took the single-agent loop and distributed it across a peer-to-peer network. Every node running the Hyperspace agent became an autonomous researcher.

On the night of March 8–9, 35 autonomous agents on the Hyperspace network ran 333 experiments completely unsupervised. The results were a masterclass in emergent strategy:

  • Hardware Diversity as a Feature: Mathur noted that while H100 GPUs used “brute force” to find aggressive learning rates, CPU-only agents on laptops were forced to be clever. These “underdog” agents focused on initialization strategies (like Kaiming and Xavier init) and normalization choices because they couldn’t rely on raw throughput.

  • Gossip-Based Discovery: Using the GossipSub protocol, agents shared their wins in real time. When one agent found that Kaiming initialization dropped loss by 21%, the idea spread through the network like a digital virus. Within hours, 23 other agents had incorporated the discovery into their own hypotheses.

  • The Compression of History: In just 17 hours, these agents independently rediscovered ML milestones—such as RMSNorm and tied embeddings—that took human researchers at labs like Google Brain and OpenAI nearly eight years to formalize.
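The "digital virus" dynamic is classic epidemic-style gossip. The toy simulation below is neither GossipSub (the libp2p pub/sub protocol, which maintains a structured mesh of peers) nor Hyperspace's code. It is a plain push-gossip model, assuming each agent forwards its best-known result to a few random peers per round and that a peer adopts any result better than its own. It records how many of the 35 agents know about a single discovery after each round.

```python
import random

def gossip_rounds(n_agents: int = 35, fanout: int = 3, seed: int = 0) -> list[int]:
    """Push-gossip toy model; returns how many agents hold the discovery after each round.

    Agent 0 starts with the discovery (a hypothetical lower loss). Each round,
    every agent pushes its best-known loss to `fanout` random peers, and a peer
    adopts it if it beats the peer's current value.
    """
    rng = random.Random(seed)
    discovery = 0.79  # hypothetical improved loss found by agent 0
    best = [1.0] * n_agents
    best[0] = discovery
    informed_counts = [1]
    while informed_counts[-1] < n_agents:
        snapshot = list(best)  # this round's messages carry last round's state
        for sender in range(n_agents):
            peers = rng.sample([p for p in range(n_agents) if p != sender], fanout)
            for peer in peers:
                if snapshot[sender] < best[peer]:
                    best[peer] = snapshot[sender]
        informed_counts.append(sum(1 for b in best if b == discovery))
    return informed_counts

if __name__ == "__main__":
    counts = gossip_rounds()
    print(f"all agents informed after {len(counts) - 1} rounds: {counts}")
```

In models like this the number of informed agents grows roughly geometrically at first, so a discovery typically saturates a small network in a handful of rounds. That is the intuition behind the rapid spread, not a model of the actual timing on Hyperspace.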

Run 36,500 marketing experiments each year instead of 30

While the ML purists focused on loss curves, the business world saw a different kind of revolution. Eric Siu, founder of ad agency Single Grain, applied autoresearch to the “Experiment Loop” of marketing.

“Most marketing teams run ~30 experiments a year,” Siu wrote on X. “The next generation will run 36,500+. Easily.” He continued:

“They’ll run experiments while they sleep.
Current marketing teams run 20-30 experiments a year. Maybe 52 if they’re ‘good’.
New landing page.
New ad creative.
Maybe a subject line test.
That’s considered ‘data-driven marketing.’
But the next generation of marketing systems will run 36,500+ experiments per year.”

Siu’s framework replaces the training script with a marketing asset—a landing page, an ad creative, or a cold email. The agent modifies a variable (the subject line or the CTA), deploys it, measures the “positive reply rate,” and keeps or discards.
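Swapping the training script for a marketing asset changes one important detail: reply rates are noisy, so "keeps or discards" needs a rule that doesn't reward luck. The article doesn't say what rule Siu uses. The sketch below assumes a one-sided two-proportion z-test at the conventional 5% level; the function name and the send/reply counts are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def keep_variant(ctrl_replies: int, ctrl_sent: int,
                 var_replies: int, var_sent: int,
                 alpha: float = 0.05) -> bool:
    """Keep the variant only if its positive reply rate is significantly higher.

    One-sided two-proportion z-test with a pooled standard error.
    """
    p_ctrl = ctrl_replies / ctrl_sent
    p_var = var_replies / var_sent
    pooled = (ctrl_replies + var_replies) / (ctrl_sent + var_sent)
    se = sqrt(pooled * (1 - pooled) * (1 / ctrl_sent + 1 / var_sent))
    if se == 0:
        return False  # no replies (or all replies) in both arms: no evidence either way
    z = (p_var - p_ctrl) / se
    p_value = 1 - normal_cdf(z)
    return p_value < alpha

if __name__ == "__main__":
    # Hypothetical cold-email test with 1,000 sends per arm.
    print(keep_variant(20, 1000, 35, 1000))  # 2.0% -> 3.5%: kept
    print(keep_variant(20, 1000, 25, 1000))  # 2.0% -> 2.5%: discarded
```

With 1,000 sends per arm, a lift from 2.0% to 3.5% clears the bar, while a lift from 2.0% to 2.5% does not; smaller effects need more sends per experiment.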

Siu argues that this creates a “proprietary map” of what resonates with a specific audience—a moat built not of code, but of experiment history. “The companies that win won’t have better marketers,” he wrote, “they’ll have faster experiment loops”.

Community discussion and ‘spoiling’ the validation set

Despite the fervor, the GitHub Discussions revealed a community grappling with the implications of such rapid, automated progress.

The Over-Optimization Trap: Researcher alexisthual raised a pointed concern: “Aren’t you concerned that launching that many experiments will eventually ‘spoil’ the validation set?” The fear is that with enough agents, parameters will be optimized for the specific quirks of the validation data rather than for performance that generalizes.
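The "spoiling" worry can be made concrete with a small simulation. Assume, hypothetically, that every candidate change is useless (its true loss equals the baseline) and that each measurement on the finite validation set adds independent noise. Keeping the best of many such measurements still reports an "improvement", and that apparent gain grows with the number of experiments, purely from fitting noise. The noise level and trial counts below are arbitrary assumptions.

```python
import random
import statistics

def apparent_gain(n_experiments: int, noise_sd: float = 0.002,
                  trials: int = 1000, seed: int = 0) -> float:
    """Average apparent loss reduction when the best of n useless changes is kept.

    Every candidate's true loss equals the baseline (1.0); its measured
    validation loss is baseline plus Gaussian noise. Any 'gain' is therefore
    pure overfitting to the validation measurement.
    """
    rng = random.Random(seed)
    baseline = 1.0
    gains = []
    for _ in range(trials):
        best_measured = min(baseline + rng.gauss(0.0, noise_sd)
                            for _ in range(n_experiments))
        gains.append(max(0.0, baseline - best_measured))  # keep only if it "improves"
    return statistics.mean(gains)

if __name__ == "__main__":
    for n in (1, 10, 100, 700):
        print(f"{n:>4} experiments -> apparent gain {apparent_gain(n):.4f}")
```

One common mitigation is to confirm kept changes on a separate held-out set that the search loop never sees.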

The Meaning of the Gains: User samionb questioned whether a drop from 0.9979 to 0.9697 was truly noticeable. Karpathy’s response was characteristically direct: “All we’re doing is optimizing performance per compute… these are real and substantial gains”.

The Human Element: On X, user witcheer, Head of Growth at crypto platform Yari Finance, documented their own overnight run on a Mac Mini M4, noting that while 26 of 35 experiments failed or crashed, the seven that succeeded revealed that “the model got better by getting simpler”.

This insight—that less is often more—was reached without a single human intervention.

The future: curiosity as the bottleneck

The release of autoresearch suggests a future of research across domains where, thanks to simple AI instruction mechanisms, the role of the human shifts from “experimenter” to “experimental designer.”

As tools like DarkMatter, Optimization Arena, and NanoClaw emerge to support this swarm, the bottleneck of AI progress is no longer the ability of the “meat computer” (Karpathy’s term for the human brain) to write code; it is our ability to define the constraints of the search.

Andrej Karpathy has once again shifted the vibe. We are no longer just coding models; we are seeding ecosystems that learn while we sleep.

New RØDECaster Console Makes Moving Audio Podcasts To Video Easier

The RØDECaster Video Core comes hot on the heels of the RØDECaster Video S and is the latest product in RØDE’s range of all-in-one video and audio production consoles.

The State Of Venture Capital In 2026: Welcome To The Value Creation Era

AI dominance, liquidity recovery, and selective capital define 2026 venture capital, as investors shift from hype to real value creation across tech sectors.

Anthropic rolls out Code Review for Claude Code as it sues over Pentagon blacklist and partners with Microsoft

Anthropic on Monday released Code Review, a multi-agent code review system built into Claude Code that dispatches teams of AI agents to scrutinize every pull request for bugs that human reviewers routinely miss. The feature, now available in research preview for Team and Enterprise customers, arrives on what may be the most consequential day in the company’s history: Anthropic simultaneously filed lawsuits against the Trump administration over a Pentagon blacklisting, while Microsoft announced a new partnership embedding Claude into its Microsoft 365 Copilot platform.

The convergence of a major product launch, a federal legal battle, and a landmark distribution deal with the world’s largest software company captures the extraordinary tension defining Anthropic’s current moment. The San Francisco-based AI lab is simultaneously trying to grow a developer tools business approaching $2.5 billion in annualized revenue, defend itself against an unprecedented government designation as a national security threat, and expand its commercial footprint through the very cloud platforms now navigating the fallout.

Code Review is Anthropic’s most aggressive bet yet that engineering organizations will pay significantly more — $15 to $25 per review — for AI-assisted code quality assurance that prioritizes thoroughness over speed. It also signals a broader strategic pivot: the company isn’t just building models; it’s building opinionated developer workflows around them.

How a team of AI agents reviews your pull requests

Code Review works differently from the lightweight code review tools most developers are accustomed to. When a developer opens a pull request, the system dispatches multiple AI agents that operate in parallel. These agents independently search for bugs, then cross-verify each other’s findings to filter out false positives, and finally rank the remaining issues by severity. The output appears as a single overview comment on the PR along with inline annotations for specific bugs.

Anthropic designed the system to scale dynamically with the complexity of the change. Large or intricate pull requests receive more agents and deeper analysis; trivial changes get a lighter pass. The company says the average review takes approximately 20 minutes — far slower than the near-instant feedback of tools like GitHub Copilot’s built-in review, but deliberately so.
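Anthropic hasn't published Code Review's internals, so the following is only a hypothetical Python sketch of the fan-out, cross-verify, and rank pipeline described above. The "agents" are plain functions returning candidate findings; cross-verification is assumed to be a simple quorum (a finding survives only if enough independent agents report it); and the rule scaling agent count with diff size is an invented heuristic. `Finding`, `agents_for_diff`, and `review` are illustrative names, not a real Anthropic API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    message: str
    severity: int  # hypothetical scale: 3 = critical, 2 = major, 1 = minor

def agents_for_diff(lines_changed: int) -> int:
    """Invented heuristic: dispatch more agents for larger diffs."""
    if lines_changed < 50:
        return 2
    if lines_changed < 1000:
        return 4
    return 8

def review(lines_changed: int, agents, quorum: int = 2) -> list[Finding]:
    n = max(1, min(len(agents), agents_for_diff(lines_changed)))
    # 1. Fan out: each agent searches independently, in parallel.
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(lambda agent: set(agent()), agents[:n]))
    # 2. Cross-verify: keep findings reported by at least `quorum` agents.
    votes = Counter(f for found in results for f in found)
    verified = [f for f, count in votes.items() if count >= quorum]
    # 3. Rank survivors by severity, most severe first.
    return sorted(verified, key=lambda f: (-f.severity, f.file, f.line))

if __name__ == "__main__":
    auth_bug = Finding("auth.py", 42, "token check always passes", 3)
    style_nit = Finding("util.py", 7, "unused import", 1)
    false_alarm = Finding("db.py", 3, "possible SQL injection", 2)
    agents = [
        lambda: [auth_bug, style_nit, false_alarm],
        lambda: [auth_bug],
        lambda: [style_nit, auth_bug],
        lambda: [],
    ]
    for finding in review(120, agents):
        print(finding)
```

In this toy run, the finding only one agent reported is filtered out as a likely false positive, and the critical authentication bug is ranked first.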

“We built Code Review based on customer and internal feedback,” an Anthropic spokesperson told VentureBeat. “In our testing, we’ve found it provides high-value feedback and has helped catch bugs that we may have missed otherwise. Developers and engineering teams use a range of tools, and we build for that reality. The goal is to give teams a capable option at every stage of the development process.”

The system emerged from Anthropic’s own engineering practices, where the company says code output per engineer has grown 200% over the past year. That surge in AI-assisted code generation created a review bottleneck that the company says it now hears about from customers on a weekly basis. Before Code Review, only 16% of Anthropic’s internal PRs received substantive review comments. That figure has jumped to 54%.

Crucially, Code Review does not approve pull requests. That decision remains with human reviewers. Instead, the system functions as a force multiplier, surfacing issues so that human reviewers can focus on architectural decisions and higher-order concerns rather than line-by-line bug hunting.

Why Anthropic thinks $20 per review is a bargain

The pricing will draw immediate scrutiny. At $15 to $25 per review, billed on token usage and scaling with PR size, Code Review is substantially more expensive than alternatives. GitHub Copilot offers code review natively as part of its existing subscription, and startups like CodeRabbit operate at significantly lower price points. Anthropic’s more basic code review GitHub Action — which remains open source — is itself a lighter-weight and cheaper option.

Anthropic frames the cost not as a productivity expense but as an insurance product. “For teams shipping to production, the cost of a shipped bug dwarfs $20/review,” the company’s spokesperson told VentureBeat. “A single production incident — a rollback, a hotfix, an on-call page — can cost more in engineer hours than a month of Code Review. Code Review is an insurance product for code quality, not a productivity tool for churning through PRs faster.”

That framing is deliberate and revealing. Rather than competing on speed or price — the dimensions where lightweight tools have an advantage — Anthropic is positioning Code Review as a depth-first tool aimed at engineering leaders who manage production risk. The implicit argument is that the real cost comparison isn’t Code Review versus CodeRabbit, but Code Review versus the fully loaded cost of a production outage, including engineer time, customer impact, and reputational damage.
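The "insurance" framing reduces to a break-even calculation: the tool pays for itself if the incident costs it prevents exceed its review spend. The sketch below uses entirely hypothetical inputs (monthly PR volume, cost per incident) and a $20 average review price taken from the middle of the quoted $15 to $25 range; actual billing is token-based and varies with PR size.

```python
def breakeven_incidents(prs_per_month: int, cost_per_review: float,
                        cost_per_incident: float) -> float:
    """Incidents per month the tool must prevent to cover its own cost."""
    monthly_spend = prs_per_month * cost_per_review
    return monthly_spend / cost_per_incident

# Hypothetical team: 400 PRs/month at $20 per review, $25,000 per production incident.
print(breakeven_incidents(400, 20.0, 25_000))  # 0.32
```

Under these assumptions, preventing roughly one incident every three months would cover the spend. The result scales linearly with the incident-cost estimate, so that input deserves the most scrutiny.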

Whether that argument holds up will depend on the data. Anthropic has not yet published external benchmarks comparing Code Review’s bug-detection rates against competitors, and the spokesperson did not provide specific figures on bugs caught per dollar or developer hours saved when asked directly. For engineering leaders evaluating the tool, that gap in publicly available comparative data may slow adoption, even if the theoretical ROI case is compelling.

What the internal numbers reveal — and what they don’t

Anthropic’s internal usage data provides an early window into the system’s performance characteristics. On large pull requests exceeding 1,000 lines changed, 84% receive findings, averaging 7.5 issues per review. On small PRs under 50 lines, that drops to 31% with an average of 0.5 issues. The company reports that less than 1% of findings are marked incorrect by engineers.

That sub-1% figure is the kind of stat that demands careful unpacking. When asked how “marked incorrect” is defined, the Anthropic spokesperson explained that it means “an engineer actively resolving the comment without fixing it. We’ll continue to monitor feedback and engagement while Code Review is in research preview.”

The methodology matters. This is an opt-in disagreement metric — an engineer has to take the affirmative step of dismissing a finding. In practice, developers under time pressure may simply ignore irrelevant findings rather than actively marking them as wrong, which would cause false positives to go uncounted. Anthropic acknowledged the limitation implicitly by noting the system is in research preview and that it will continue monitoring engagement data. The company has not yet conducted or published a controlled evaluation comparing agent findings against a ground-truth baseline established by expert human reviewers.
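A back-of-the-envelope model shows why an opt-in metric can understate false positives. Assume, hypothetically, that some fraction of findings are genuinely wrong, that engineers actively dismiss only a fraction of those wrong findings (ignoring the rest), and that correct findings are never dismissed without a fix. The observed "marked incorrect" rate is then the product of the two fractions, and the true rate can only be recovered if the dismissal probability is known. The probabilities below are illustrative, not Anthropic data.

```python
def observed_incorrect_rate(true_fp_rate: float, dismiss_prob: float) -> float:
    """Share of findings that show up as 'marked incorrect' under the opt-in metric."""
    return true_fp_rate * dismiss_prob

def implied_true_fp_rate(observed_rate: float, dismiss_prob: float) -> float:
    """Invert the model: the true false-positive rate an observed rate implies."""
    return observed_rate / dismiss_prob

# Hypothetical: if only 1 in 10 wrong findings is actively dismissed,
# an observed rate just under 1% is consistent with a true rate near 9%.
print(f"{implied_true_fp_rate(0.009, 0.10):.0%}")  # prints 9%
```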

The anecdotal evidence is nonetheless striking. Anthropic described a case where a one-line change to a production service — the kind of diff that typically receives a cursory approval — was flagged as critical by Code Review because it would have broken authentication for the service. In another example involving TrueNAS’s open-source middleware, Code Review surfaced a pre-existing bug in adjacent code during a ZFS encryption refactor: a type mismatch that was silently wiping the encryption key cache on every sync. These are precisely the categories of bugs — latent issues in touched-but-unchanged code, and subtle behavioral changes hiding in small diffs — that human reviewers are statistically most likely to miss.

A Pentagon lawsuit casts a long shadow over enterprise AI

The Code Review launch does not exist in a vacuum. On the same day, Anthropic filed two lawsuits — one in the U.S. District Court for the Northern District of California and another in the D.C. Circuit Court of Appeals — challenging the Trump administration’s decision to label the company a supply chain risk to national security, a designation historically reserved for foreign adversaries.

The legal confrontation stems from a breakdown in contract negotiations between Anthropic and the Pentagon. As CNN reported, the Defense Department wanted unrestricted access to Claude for “all lawful purposes,” while Anthropic insisted on two redlines: that its AI would not be used for fully autonomous weapons or mass domestic surveillance. When talks collapsed by a Pentagon-set deadline on February 27, President Trump directed all federal agencies to cease using Anthropic’s technology, and Defense Secretary Pete Hegseth formally designated the company a supply chain risk.

According to CNBC, the complaint alleges that these actions are “unprecedented and unlawful” and are “harming Anthropic irreparably,” with the company stating that contracts are already being cancelled and “hundreds of millions of dollars” in near-term revenue are in jeopardy.

“Seeking judicial review does not change our longstanding commitment to harnessing AI to protect our national security,” the Anthropic spokesperson told VentureBeat, “but this is a necessary step to protect our business, our customers, and our partners. We will continue to pursue every path toward resolution, including dialogue with the government.”

For enterprise buyers evaluating Code Review and other Claude-based tools, the lawsuit introduces a novel category of vendor risk. The supply chain risk designation doesn’t just affect Anthropic’s government contracts — as CNBC reported, it requires defense contractors to certify they don’t use Claude in their Pentagon-related work. That creates a chilling effect that could extend well beyond the defense sector, even as the company’s commercial momentum accelerates.

Microsoft, Google, and Amazon draw a line around Claude’s commercial availability

The market’s response to the Pentagon crisis has been notably bifurcated. While the government moved to isolate Anthropic, the company’s three largest cloud distribution partners moved in the opposite direction.

Microsoft on Monday announced it is integrating Claude into Microsoft 365 Copilot through a new product called Copilot Cowork, developed in close collaboration with Anthropic. As Yahoo Finance reported, the service enables enterprise users to perform tasks like building presentations, pulling data into Excel spreadsheets, and coordinating meetings — the kind of agentic productivity capabilities that sent shares of SaaS companies like Salesforce, ServiceNow, and Intuit tumbling when Anthropic first debuted its Cowork product on January 30.

The timing is not coincidental. As TechCrunch reported last week, Microsoft, Google, and Amazon Web Services all confirmed that Claude remains available to their customers for non-defense workloads. Microsoft’s legal team specifically concluded that “Anthropic products, including Claude, can remain available to our customers — other than the Department of War — through platforms such as M365, GitHub, and Microsoft’s AI Foundry.”

That three of the world’s most powerful technology companies publicly reaffirmed their commitment to distributing Anthropic’s models — on the same day the company sued the federal government — tells enterprise customers something important about the market’s assessment of both Claude’s technical value and the legal durability of the supply chain risk designation.

Data security and what enterprise buyers need to know next

For organizations considering Code Review, the data handling question looms especially large. The system necessarily ingests proprietary source code to perform its analysis. Anthropic’s spokesperson addressed this directly: “Anthropic does not train models on our customers’ data. This is part of why customers in highly regulated industries, from Novo Nordisk to Intuit, trust us to deploy AI safely and effectively.”

The spokesperson did not detail specific retention policies or compliance certifications when asked, though the company’s reference to pharmaceutical and financial services clients suggests it has undergone the kind of security review those industries require.

Administrators get several controls for managing costs and scope, including monthly organization-wide spending caps, repository-level enablement, and an analytics dashboard tracking PRs reviewed, acceptance rates, and total costs. Once enabled, reviews run automatically on new pull requests with no per-developer configuration required.

The revenue figure Anthropic confirmed — a $2.5 billion run rate as of February 12 for Claude Code — underscores just how quickly developer tooling has become a material revenue line for the company. The spokesperson pointed to Anthropic’s recent Series G fundraise for additional context but did not break out what share of total company revenue Claude Code now represents.

Code Review is available now in research preview for Claude Code Team and Enterprise plans. Whether it can justify its premium in a market already crowded with cheaper alternatives will depend on whether Anthropic can convert anecdotal bug catches and internal usage stats into the kind of rigorous, externally validated evidence that engineering leaders with production budgets require — all while navigating a legal and political environment unlike anything the AI industry has previously faced.

What The Claude Code Security Panic Got Wrong About Cybersecurity

Claude Code Security spooked investors but misses the bigger problem. The real risk to enterprises is in SaaS integrations and AI agents nobody is watching.