Anthropic says its most powerful AI cyber model is too dangerous to release publicly — so it built Project Glasswing

Anthropic on Tuesday announced Project Glasswing, a sweeping cybersecurity initiative that pairs an unreleased frontier AI model — Claude Mythos Preview — with a coalition of twelve major technology and finance companies in an effort to find and patch software vulnerabilities across the world’s most critical infrastructure before adversaries can exploit them.

The launch partners include Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks. Anthropic says it has also extended access to more than 40 additional organizations that build or maintain critical software, and is committing up to $100 million in usage credits for Claude Mythos Preview across the effort, along with $4 million in direct donations to open-source security organizations.

The announcement arrives at a moment of extraordinary momentum — and extraordinary scrutiny — for the San Francisco-based AI startup. Anthropic disclosed on Sunday that its annualized revenue run rate has surpassed $30 billion, up from approximately $9 billion at the end of 2025, and the number of business customers each spending over $1 million annually now exceeds 1,000, doubling in less than two months. The company simultaneously announced a multi-gigawatt compute deal with Google and Broadcom. On the same day, Bloomberg reported that Anthropic had poached a senior Microsoft executive, Eric Boyd, to lead its infrastructure expansion.

But Glasswing is something categorically different from a revenue milestone or a compute deal. It’s Anthropic’s most ambitious attempt to translate frontier AI capabilities — capabilities the company itself describes as dangerous — into a defensive advantage before those same capabilities proliferate to hostile actors.

Why Anthropic built a model it considers too dangerous to release publicly

At the center of Project Glasswing sits Claude Mythos Preview, a general-purpose frontier model that Anthropic says has already identified thousands of high-severity zero-day vulnerabilities — meaning flaws previously unknown to software developers — in every major operating system and every major web browser, along with a range of other critical software.

The company is not making the model generally available.

“We do not plan to make Claude Mythos Preview generally available due to its cybersecurity capabilities,” Newton Cheng, Frontier Red Team Cyber Lead at Anthropic, told VentureBeat in an exclusive interview. “However, given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout — for economies, public safety, and national security — could be severe.”

That language — “the fallout could be severe” — is striking coming from the company that built the model. Anthropic is effectively arguing that the tool it created is powerful enough to reshape the cybersecurity landscape, and that the only responsible thing to do is to keep it restricted while giving defenders a head start.

The technical results reinforce that claim. According to Anthropic's press release, Mythos Preview surfaced nearly all of these vulnerabilities, and developed many of the related exploits, entirely autonomously, without any human steering. Three examples stand out:

  • A 27-year-old vulnerability in OpenBSD, widely regarded as one of the most security-hardened operating systems in the world and commonly used to run firewalls and critical infrastructure. The flaw allowed an attacker to remotely crash any machine running the OS simply by connecting to it.

  • A 16-year-old vulnerability in FFmpeg, the near-ubiquitous video encoding and decoding library, sitting in a line of code that automated testing tools had exercised five million times without ever catching the problem.

  • And, perhaps most alarmingly, a set of vulnerabilities in the Linux kernel that Mythos Preview found and chained together autonomously to escalate from ordinary user access to complete control of the machine.

All three vulnerabilities have been reported to the relevant maintainers and have since been patched. For many other vulnerabilities still in the remediation pipeline, Anthropic says it is publishing cryptographic hashes of the details today, with plans to reveal specifics after fixes are in place.
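The hash-commitment approach is a standard disclosure technique: publishing only a digest today proves, once the details are eventually released, that they existed in exactly this form all along, without revealing anything exploitable in the meantime. Anthropic has not described its exact scheme, but a minimal sketch of the general pattern, in Python with illustrative inputs, looks like this:

```python
import hashlib
import secrets

def commit(report_text: str, nonce: str) -> str:
    """Return a SHA-256 commitment to a vulnerability report.

    Publishing only this digest proves, later, that the report existed
    in exactly this form on the publication date, without revealing its
    contents. The random nonce prevents brute-forcing short reports.
    """
    return hashlib.sha256(f"{nonce}:{report_text}".encode()).hexdigest()

# Today: publish only the digest. After the fix ships: publish the
# report and nonce so anyone can recompute the hash and verify it.
nonce = secrets.token_hex(16)
digest = commit("illustrative report: heap overflow in parser", nonce)
print(digest)
```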

On the CyberGym evaluation benchmark, Mythos Preview scored 83.1%, compared to 66.6% for Claude Opus 4.6, Anthropic's next-best model. The margins on coding benchmarks are similarly stark: Mythos Preview achieves 93.9% on SWE-bench Verified versus 80.8% for Opus 4.6, and 77.8% on SWE-bench Pro versus 53.4%.

How Anthropic plans to disclose thousands of zero-days without overwhelming open-source maintainers

Finding thousands of zero-days at once sounds impressive. Actually handling the output responsibly is a logistical nightmare — and one of the sharpest criticisms that security researchers have raised about AI-driven vulnerability discovery. Flooding open-source maintainers, many of whom are unpaid volunteers, with an avalanche of critical bug reports could easily do more harm than good.

Cheng told VentureBeat that Anthropic has built a triage pipeline specifically to manage this problem. “We triage every bug that we find and then send the highest severity bugs to professional human triagers we have contracted to assist in our disclosure process by manually validating every bug report before we send it out to ensure that we send only high-quality reports to maintainers,” he said.

That pipeline is designed to prevent exactly the scenario that maintainers fear most: an automated firehose of unverified reports. “We do not submit large volumes of findings to a single project without first reaching out in an effort to agree on a pace the maintainer can sustain,” Cheng added.

When Anthropic has access to the source code, the company aims to include a candidate patch with every report, labeled by provenance — meaning the maintainer knows the patch was written or reviewed by a model — and offers to collaborate on a production-quality fix. “Models can write patches,” Cheng noted, “but there are many factors that impact patch quality, and we strongly recommend that autonomously-written patches are put under the same scrutiny and testing that human-written patches are.”

On disclosure timelines, Anthropic says it follows a coordinated vulnerability disclosure framework. Once a patch is available, the company will generally wait 45 days before publishing full technical details, giving downstream users time to deploy the fix before exploitation information becomes public. Cheng said the company may shorten that buffer “if the details are already publicly known through other channels, or if earlier publication would materially help defenders identify and mitigate ongoing attacks,” or extend it “when patch deployment is unusually complex or the affected footprint is unusually broad.”

Those are reasonable principles, but they will be tested at a scale that no vulnerability disclosure program has ever attempted. The sheer volume of findings — thousands of zero-days across every major platform — means that even a well-designed triage process will face bottlenecks. And the 45-day disclosure window assumes that maintainers can actually produce, test, and ship a patch in that time, which is far from guaranteed for complex kernel-level bugs or deeply embedded cryptographic flaws.

The source code leak, the CMS blunder, and why trust is Anthropic’s biggest vulnerability

The irony of a company claiming to build the most capable cyber model ever constructed while simultaneously suffering a string of embarrassing security lapses has not been lost on observers.

In late March, a draft blog post about Mythos was left in an unsecured, publicly searchable data store: a CMS misconfiguration that exposed roughly 3,000 internal assets, including what appeared to be strategic plans for the model's rollout. Days later, on March 31, a packaging error meant that anyone who installed Claude Code via npm pulled down Anthropic's complete original source code (512,000 lines) for approximately three hours, an incident that drew widespread attention in the developer community and was first reported by VentureBeat.

When asked why partners and governments should trust Anthropic as the custodian of a model it describes as having unprecedented cyber capabilities, Cheng was direct. “Security is central to how we build and ship,” he told VentureBeat. “These two incidents, a blog CMS misconfiguration and an npm packaging error, were human errors in publishing tooling, not breaches of our security architecture. We’ve made changes to prevent these from happening again, and we’ll continue to improve our processes.”

It is a technically accurate distinction — neither incident involved a breach of Anthropic’s core model weights, training infrastructure, or API systems — but it is also a distinction that may prove difficult to sustain as a public argument. For an organization asking governments and Fortune 500 companies to trust it with a tool that can autonomously find and exploit vulnerabilities in the Linux kernel, even minor operational lapses carry outsized reputational risk. The fact that the Mythos leak itself was what first alerted the security community to the model’s existence, weeks before the planned announcement, underscores the point.

What Microsoft, CrowdStrike, and the Linux Foundation found when they tested the model

The coalition’s breadth is notable. It includes direct competitors — Google and Microsoft — alongside cybersecurity incumbents, financial institutions, and the steward of the world’s largest open-source ecosystem. And several partners have already been running Mythos Preview against their own infrastructure for weeks.

CrowdStrike’s CTO Elia Zaitsev framed the initiative in terms of collapsing timelines: “The window between a vulnerability being discovered and being exploited by an adversary has collapsed — what once took months now happens in minutes with AI.” AWS Vice President and CISO Amy Herzog said her teams have already been testing Mythos Preview against critical codebases, where the model is “already helping us strengthen our code.” And Microsoft’s Global CISO Igor Tsyganskiy noted that when tested against CTI-REALM, Microsoft’s open-source security benchmark, “Claude Mythos Preview showed substantial improvements compared to previous models.”

Perhaps the most revealing comment came from Jim Zemlin, CEO of the Linux Foundation, who pointed to the fundamental asymmetry that has plagued open-source security for decades: “In the past, security expertise has been a luxury reserved for organizations with large security teams. Open-source maintainers — whose software underpins much of the world’s critical infrastructure — have historically been left to figure out security on their own.” Project Glasswing, he said, “offers a credible path to changing that equation.”

To back that claim with dollars, Anthropic says it has donated $2.5 million to Alpha-Omega and OpenSSF through the Linux Foundation, and $1.5 million to the Apache Software Foundation. Maintainers interested in access can apply through Anthropic’s Claude for Open Source program.

Inside the pricing, the compute deal, and Anthropic’s path toward a potential IPO

After the research preview period — during which Anthropic’s $100 million credit commitment will cover most usage — Claude Mythos Preview will be available to participants at $25 per million input tokens and $125 per million output tokens. Participants can access the model through the Claude API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry.
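At those rates, cost scales linearly with token volume, with output tokens priced at five times the input rate. A back-of-envelope sketch in Python, where the workload numbers are purely hypothetical:

```python
# Announced research-preview pricing: $25 per million input tokens,
# $125 per million output tokens. Workload figures are hypothetical.
INPUT_RATE = 25 / 1_000_000    # dollars per input token
OUTPUT_RATE = 125 / 1_000_000  # dollars per output token

def run_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g., one deep audit pass over a large codebase: 50M tokens read,
# 5M tokens of analysis written back out.
print(f"${run_cost(50_000_000, 5_000_000):,.2f}")  # $1,875.00
```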

Those prices reflect the model's computational intensity. The draft blog post that leaked in March described Mythos as a large, compute-intensive model that would be expensive for both Anthropic and its customers to serve. On the safety side, Anthropic plans to develop and launch new safeguards with an upcoming Claude Opus model, allowing the company to "improve and refine them with a model that does not pose the same level of risk as Mythos Preview," as Cheng told VentureBeat. Security professionals whose legitimate work is affected by those safeguards will be able to apply to an upcoming Cyber Verification Program.

The financial context matters. Project Glasswing launched just days after Anthropic disclosed its revenue milestone and the Google-Broadcom compute deal. Broadcom signed an expanded deal with Anthropic that will give the AI startup access to about 3.5 gigawatts' worth of computing capacity drawing on Google's AI processors, according to CNBC. The scale of compute being marshaled is staggering, and it helps explain why Anthropic needs both the revenue from enterprise cybersecurity partnerships and the infrastructure to serve a model of Mythos Preview's size.

The timing also intersects with growing speculation about Anthropic’s path to a public offering. The company is reportedly evaluating an IPO as early as October 2026. A high-profile, government-adjacent cybersecurity initiative with blue-chip partners is exactly the kind of program that burnishes an IPO narrative — particularly when the company can simultaneously point to $30 billion in annualized revenue and a compute footprint measured in gigawatts.

Anthropic says defenders have months, not years, before adversaries catch up

The most consequential question raised by Project Glasswing is not whether Mythos Preview’s capabilities are real — the partner endorsements and patched vulnerabilities suggest they are — but how much time defenders actually have before similar capabilities are available to adversaries.

Cheng was candid about the timeline. “Frontier AI capabilities are likely to advance substantially over just the next few months,” he told VentureBeat. “Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely.” He described Project Glasswing as “an important step toward giving defenders a durable advantage in the coming AI-driven era of cybersecurity” but added a crucial caveat: “It’s important to note, this is a starting point. No one organization can solve these cybersecurity problems alone.”

That framing, months rather than years, is worth taking seriously. DARPA's Cyber Grand Challenge, which culminated in 2016, was a competition to create automatic defensive systems capable of reasoning about flaws, formulating patches, and deploying them on a network in real time. At the time, the winning AI-powered bot, Mayhem, finished last when it was entered against human teams in the DEF CON capture-the-flag contest. A decade later, Anthropic is claiming that a frontier AI model can find vulnerabilities that survived 27 years of expert human review and millions of automated security tests, and can chain exploits together autonomously to achieve full system compromise.

The delta between those two data points illustrates why the industry is treating this as a genuine inflection point, not a marketing exercise. Anthropic itself has firsthand experience with the offensive side of this equation: the company disclosed in November 2025 that a Chinese state-sponsored group achieved 80 to 90 percent autonomous tactical execution using Claude across approximately 30 targets, according to Anthropic’s misuse report.

Project Glasswing arrives during one of the most turbulent weeks in Anthropic's history. In the span of days, the company has announced a model it considers too dangerous for public release, disclosed that its annualized revenue has more than tripled, sealed a multi-gigawatt compute deal, hired a senior Microsoft executive, made it more expensive for Claude Code subscribers to use third-party tools like OpenClaw, and weathered a major outage of its Claude chatbot on Tuesday morning. Anthropic says it will report publicly on what it has learned within 90 days. In the medium term, the company has proposed that an independent, third-party body might be the ideal home for continued work on large-scale cybersecurity projects.

Whether any of that is fast enough depends on a race that is already underway. Anthropic built a model that can autonomously crack open the most hardened operating systems on the planet — and is now betting that sharing it with defenders, under careful restrictions, will do more good than the inevitable moment when similar capabilities land in less careful hands. It is, in essence, a wager that transparency can outrun proliferation. The next few months will determine whether that bet pays off, or whether the glasswing’s wings were never quite opaque enough to hide what was coming.

LLM-referred traffic converts at 30-40% — and most enterprises aren’t optimizing for it

For more than two decades, digital discovery has operated on a simple model: search, scan, click, decide.

That worked when humans did the searching. But with the advent of AI agents, the primary consumer of information is no longer always human.

This is giving rise to a new paradigm: Answer engine optimization (AEO), also referred to as generative engine optimization (GEO). Because agents consume data very differently from humans, success is no longer defined by rankings and clicks, but by whether content is understood, selected, and cited by AI systems.

The SEO model that the web was built on simply isn’t going to cut it anymore, and enterprises need to prepare now.

How LLMs interpret web content

Traditional SEO is built around keywords, rankings, page-level optimization, and click-through rates. Users manually search across multiple sources and click around to get what they need. Simple, but sometimes frustrating and a definite time suck.

But AEO operates on a whole different level. Agents are increasingly taking over users’ workflows: Claude Code, OpenClaw, CrewAI, Microsoft Copilot, AutoGen, LangChain, Agent Bricks, Agentforce, Google Vertex, Perplexity’s web interface, and whatever else comes along.

These agents do not "browse" the web the way humans do. They analyze user intent based not just on phrasing, but on persistent memory and context from past sessions, rather than relying on simple autocomplete. And they require materials that are concise, structured, and to the point.

What's more, agents are moving beyond browsing to delegation, taking on more of the downstream work. What started as "search, read, decide" is evolving into "agent retrieves, agent summarizes, human decides," and beyond that, "agent acts, human validates."

"In practice, AEO begins where SEO stops," said Dustin Engel, founder of the consultancy Elegant Disruption, who describes AEO as "the next layer of discovery," or "zero-click discovery."

In this new world where agents synthesize answers, users may never even see an enterprise’s website, click-through rates decline, and attribution and citability (rather than pure visibility, or showing up at the top of a list of blue links) become critical.

“The new default is closer to a citation map: Where the model is pulling from, how often you show up, and how you are described,” Engel said.

Some, like Adam Yang of Q&A platform Quora, argue that AEO is already becoming the default over SEO.

This is for “a certain class of queries,” Yang notes. Any question where the user wants a synthesized answer — “what’s the best approach to X,” “compare these two options,” “what do I need to know about Y” — is increasingly resolved by an AI without a click.

Google’s own AI Overviews are already accelerating this on the consumer side, many analysts note. “SEO isn’t dead,” Yang said. “But the optimization target has shifted from ‘rank on page 1’ to ‘get cited in the answer.’”

How devs are already using AI agents

Are there scenarios where regular search/Googling is still the best option?

“Absolutely,” said analyst Wyatt Mayham of Northwest AI Consulting. Notably, for personal tasks like finding nearby restaurants or local service providers. The interface is “just better” in those cases because it integrates maps, reviews, and photos. “That experience is hard to beat right now,” he said.

For work-related research, though, he says he’s “barely” using traditional search anymore, and it’s getting “closer to zero” every month.

“When I need to understand a company or a person professionally, agents do it faster and give me a more useful output than a page of blue links ever did,” he said.

His firm uses autonomous agents “heavily,” and built a Claude Skills function that powers its sales operation. Before a discovery call with a prospect, team members can trigger a skill that pulls the contact’s LinkedIn profile, scrapes their company website, grabs relevant info from sources like ZoomInfo, and crafts a clear picture of their revenue, team size, tech stack, and pain points.

“By the time I get on a call, I have a tailored research brief ready to go without spending 30 to 45 minutes manually Googling around,” Mayham said.

The big advantage is that these tools run in the background, he noted. You don’t have to sit clicking through browser tabs: You just tell the agent what you need, it does it, and you get a structured output that’s actually useful.

“It’s collapsed what used to be a full hour of sales prep into a few minutes,” Mayham said.

Carlos Dutra, CEO and founder of IT consultancy Vindler Solutions, said Claude Code has “genuinely changed” his daily workflow. He uses it for most of his coding work, and what surprised him wasn’t the speed, but the fact that he didn’t need to open and keep track of browser tabs.

“Not because I’m lazy, but because the answers are better,” he said. He still uses Google for some tasks: Pricing pages, recent news, anything that needs to be current.

“But for technical reasoning? Agents have mostly replaced search for me personally,” he said.

Quora’s Yang has had a similar experience. He’s been using Claude Code daily for the past few months, primarily for content strategy, knowledge management, and competitive research. Workflows that used to take him half a day now take 30 minutes.

But what's been most advantageous is that he can now run research and synthesis tasks in parallel that he previously had to do sequentially. Also helpful is that agents' context retention across sessions is "meaningfully better" than that of web-based tools.

When he needs to understand a concept, map a competitive landscape, or synthesize industry trends, Claude or Perplexity are the go-to before opening a browser tab. “I’ve started treating agent search as my first stop, not Google. Traditional search is now where I verify, not where I discover.”

The kinks are real, though. Mayham pointed out that LinkedIn, in particular, is “aggressive” about blocking automated access, and many other sites have (or are implementing) similar protections. Users will hit walls when agents can’t get through, so a fallback plan is important for those relying on agents.

“The reliability isn’t 100% yet, and that’s probably the biggest thing holding broader adoption back,” he said.

Mayham’s advice for other devs: Stop chasing shiny objects. A new AI tool launches “practically every day,” and many (experienced devs included) are jumping from platform to platform without ever going deep with any of them.

“Pick a model, go deep, build real workflows on it,” he emphasized. “You’ll get more value from mastery of one platform than surface-level experimentation across five.”

How enterprises can compete in an AEO-driven world

When AI agents do the searching, the rules change. The question is no longer whether your content ranks on the first page; it's whether the model selects you as the source when generating an answer.

Structure matters much more than it used to. Content should:

  • Be organized around conversational intent, provide direct answers, and mirror real user questions and follow-ups;

  • Be authoritative and reflect strong expertise;

  • Be fresh (and, when necessary, regularly refreshed);

  • Have clear headers and established FAQ schema (see the sketch below).
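On that last point, "FAQ schema" refers to schema.org's FAQPage structured data, typically embedded in a page as JSON-LD so machines can parse the questions and answers directly. A minimal sketch in Python, with placeholder Q&A text:

```python
import json

# Minimal schema.org FAQPage markup, emitted as JSON-LD for embedding
# in a <script type="application/ld+json"> tag. The question-and-answer
# content here is illustrative placeholder text.
faq = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "What is answer engine optimization (AEO)?",
        "acceptedAnswer": {
            "@type": "Answer",
            "text": "AEO structures content so AI systems can retrieve and cite it.",
        },
    }],
}
print(json.dumps(faq, indent=2))
```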

Another must is maintaining a strong brand presence across the forums and platforms — Wikipedia, Reddit, LinkedIn, industry publications — that models are trained on. Enterprises might also consider investing in original data, like research.

In Mayham’s experience, when a business gets recommended by an LLM during a search-style query, the conversion rate is “dramatically higher” than traditional channels. For his company, LLM-referred traffic is converting at 30 to 40%, which “blows away what we see from SEO or paid social.”

“The intent signal is just different when someone is having a conversation with an AI and it recommends you by name.”

Discoverability inside LLMs will matter as much as Google rankings, “maybe more,” Mayham said. “It’s a whole new surface for customer acquisition that most businesses aren’t even thinking about yet.”

Vindler’s Dutra agreed that the “uncomfortable truth” is that most enterprise content is becoming “basically invisible” in agent-driven queries. “AEO is about whether your content survives being chunked, embedded, and semantically retrieved,” he said.

The companies getting ahead aren’t doing anything “exotic,” he noted. They have clean, declarative content that doesn’t require context to understand. Those still writing copy stuffed with keywords are going to fall behind because LLMs care about semantic clarity.

A quick test he gives clients: Ask an LLM a question your page is supposed to answer, without giving it the URL. “If it can’t construct the answer from your content, you have a problem.”
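That test is straightforward to automate. A rough sketch using the Anthropic Python SDK, assuming an API key is configured in the environment; the model name, question, and domain below are placeholders rather than Dutra's actual tooling:

```python
import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from the env

client = anthropic.Anthropic()

# Ask the question your page is supposed to answer, giving no URL, then
# check whether the model reaches for your content on its own.
question = "What's the best approach to X for mid-size retailers?"  # placeholder
reply = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model ID; substitute a current one
    max_tokens=512,
    messages=[{"role": "user", "content": question}],
)
answer = reply.content[0].text
print(answer)
print("mentions us:", "yourbrand.com" in answer.lower())  # placeholder domain
```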

Jeff Oxford of SEO agency Visibility Labs offers valuable step-by-step advice:

  1. Engage in conversations on Reddit, which is one of the most-cited domains in AI search. Enterprises should establish a positive reputation on Reddit, and engage on any relevant threads where customers are asking for recommendations.

  2. Build a strong YouTube presence. According to Ahrefs, which tracks internet behavior, YouTube mentions have the “strongest correlation” with AI visibility across ChatGPT, AI Mode, and AI Overviews. “This makes sense, since both Google and OpenAI have trained their models on YouTube transcripts,” Oxford said, “and YouTube is the most-cited domain in Google’s AI products.”

  3. Invest in digital PR and brand mentions; the latter is the second-highest correlated factor with AI visibility. “Brands need to improve their digital presence by being in as many places as possible,” Oxford said.

  4. Create content aligned with AI citation patterns. Enterprises should audit the prompts and topics where AI search engines are surfacing competitors, then create authoritative content on those same topics.

“The goal is to become a source that AI models consider worth citing,” he noted.

Still, there may be a lot of unnecessary hype around how drastically enterprises need to change, said Shashi Bellamkonda, principal research director at consultancy firm Info-Tech Research Group.

Those following best practices of producing content that their audience actually needs, written by experts and showcasing expert opinion, are in a good position to be cited in AI-powered search.

He pointed out that Google developed its E-E-A-T framework (experience, expertise, authoritativeness, and trustworthiness) to evaluate content quality and helpfulness and to help algorithms identify reliable, high-quality information.

To stand out, enterprises should use structured data and schema to signal the context: Is this an article, a research study, a product overview? “Original long-form content will be valued by AI-powered answer engines,” Bellamkonda said. “Copycat strategies or trying to game the system are taboo in this era.”

Experts should also share their thoughts across several channels, and “About Us” pages must be “robust” and include bios highlighting thought leaders’ expertise.

“Ultimately, the reputation of AI-powered search is in making sure the user likes the search rather than what you think they should read,” Bellamkonda said. “So a good focus on the end user is a great way to succeed.”


Block introduces Managerbot, a proactive Square AI agent and the clearest proof point yet for Jack Dorsey’s AI bet

Block today unveiled Managerbot, a new AI agent embedded in the Square platform that proactively monitors a seller’s business, identifies emerging problems, and proposes actionable solutions — without the seller ever having to ask a question. The product marks the most tangible manifestation of CEO Jack Dorsey’s controversial bet that artificial intelligence can fundamentally reshape how his company operates, builds products, and serves the millions of small businesses that depend on Square to run day-to-day commerce.

In an exclusive interview with VentureBeat, Willem Avé, Block’s head of product at Square, described Managerbot as a decisive break from the company’s earlier Square AI assistant, which functioned as a reactive chatbot that answered seller questions about sales, employees, and business performance.

“The big shift from Square AI to Managerbot is really from reactive to proactive,” Avé said. “What that means is the primary interface is not a question box. You assign tasks to Managerbot, and that could be based on data, an insight, or a signal from your business.”

The product is beginning to roll out now, with full availability to Square sellers expected over the coming months. Block declined to say whether Managerbot would carry an additional fee or be bundled into existing Square subscriptions.

How Managerbot predicts inventory shortages, optimizes schedules, and writes marketing campaigns on its own

Avé outlined three core domains where Managerbot operates today: inventory forecasting, employee shift scheduling, and automated marketing campaign creation. In every case, the agent acts before the seller does — watching over the business, detecting patterns, and surfacing recommendations with proposed actions attached.

In the inventory domain, Managerbot continuously monitors a seller’s stock levels, sales velocity, and external signals such as weather patterns and local events, then alerts the seller when an item is about to run out — or when it should stock up ahead of anticipated demand. “In warmer weather, we can see that you sell more of a certain good,” Avé explained. “That’s the forecasting capability, combined with local data — weather, events — so we can help sellers manage both their inventory and cash flows.”
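Block has not published Managerbot's internals, but the logic Avé describes reduces to a familiar reorder-point calculation: compare forecast-adjusted days of stock cover against resupply lead time. A minimal sketch, in which every number and the demand multiplier are hypothetical:

```python
def days_of_cover(stock_on_hand: float, daily_velocity: float) -> float:
    """How many days until this item runs out at the current sales pace."""
    return stock_on_hand / max(daily_velocity, 1e-9)

def should_reorder(stock: float, velocity: float, lead_time_days: float,
                   demand_multiplier: float = 1.0) -> bool:
    """Alert when forecast-adjusted cover falls inside the restock window.

    demand_multiplier stands in for external signals (a heat wave, a
    local event) that scale expected sales velocity up or down.
    """
    return days_of_cover(stock, velocity * demand_multiplier) <= lead_time_days

# Hypothetical: 120 units in stock, 18 sold per day at baseline, 5-day
# resupply, and a warm weekend forecast lifting demand by 40%.
print(should_reorder(120, 18, 5, demand_multiplier=1.4))  # True: ~4.8 days left
```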

For shift scheduling — a task that Avé described as “one of those interesting, very hard computer science problems” that consumes hours of a small business owner’s week — Managerbot analyzes forecasted sales data and then generates optimized employee schedules that balance worker preferences with coverage needs. “It turns out that frontier models are actually pretty good at it,” Avé said.

The third capability tackles what Avé called “the whole bucket of things that sellers could do if they had more time” — principally marketing. Managerbot identifies sales trends across a seller’s catalog and automatically drafts win-back campaigns and promotional outreach targeted at a store’s best customer segments. Avé said Block is seeing “very meaningful lift” from Managerbot-generated campaigns compared to what some sellers create manually, though he declined to share specific performance figures publicly.

Block built Managerbot on frontier AI models from OpenAI and Anthropic — but says the real innovation is underneath

Managerbot runs on third-party frontier models — Avé specifically referenced Anthropic’s Sonnet and OpenAI’s GPT family — but Block’s competitive advantage, he argued, lies in the “agent harness” the company has built around those models. That harness draws heavily on Goose, Block’s open-source agent framework, and incorporates learnings from its consumer-facing Money Bot on Cash App.

The challenge specific to Square is scale and complexity. A seller running a small business might interact with hundreds of different tools across invoicing, inventory, customer management, marketing, payroll, and scheduling. Managerbot must navigate all of them coherently within a single agentic loop. “This isn’t like, you know, you load a skill and call it a day — think about hundreds of skills,” Avé said. “Actually, managing the context and managing the way that we progressively disclose tools, and some of the other innovation that we have at the harness layer, is I think some of the secret sauce.”
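"Progressively disclose tools" is carrying real weight in that quote. The general pattern, which Block has not detailed publicly, is to expose only the handful of tool schemas relevant to the current task instead of loading all of them into the model's context at once. A minimal sketch of the idea, with a hypothetical registry and deliberately crude relevance scoring:

```python
# Sketch of progressive tool disclosure: rather than handing the model
# hundreds of tool schemas at once, shortlist the few relevant to the
# task at hand. The registry and scoring are hypothetical, not Block's.

TOOL_REGISTRY = {
    "inventory.get_stock_levels": "Read current stock counts per item",
    "payroll.run_report": "Summarize payroll costs for a date range",
    "marketing.draft_campaign": "Draft an email campaign for a customer segment",
    # ... a real catalog would hold hundreds of entries
}

def relevant_tools(task: str, k: int = 2) -> dict[str, str]:
    """Crude keyword-overlap scoring; a production harness would rank
    tool descriptions with embeddings instead."""
    words = set(task.lower().split())
    scored = sorted(
        TOOL_REGISTRY.items(),
        key=lambda kv: -len(words & set(kv[1].lower().split())),
    )
    return dict(scored[:k])

# Only this shortlist enters the model's context for the current turn.
print(relevant_tools("draft a win-back email campaign for lapsed customers"))
```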

A critical design decision shapes every interaction: Managerbot does not autonomously execute changes to a seller’s business. Every write action — whether adjusting a shift schedule, publishing a marketing campaign, or modifying inventory — requires explicit seller approval. To facilitate that approval, Managerbot generates visual UI previews showing exactly what will change before the seller clicks “yes.” “We want to earn trust with sellers, so any write action is prompted to the user to approve,” Avé said. “The seller needs a visual representation of what the change is. You can’t just describe in words all the time what you’re going to go do.”
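That approval flow follows a simple shape: the agent emits a proposed action together with a rendered preview, and execution is gated on an explicit yes. A minimal sketch of the pattern, with hypothetical names rather than Block's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ProposedAction:
    """A write the agent wants to make, held until the seller approves."""
    description: str            # human-readable summary for the preview UI
    preview: str                # rendered view of exactly what will change
    execute: Callable[[], None]

def approval_gate(action: ProposedAction, approved: bool) -> str:
    """Run a proposed write only after explicit seller approval."""
    if not approved:
        return f"Discarded: {action.description}"
    action.execute()
    return f"Applied: {action.description}"

# Hypothetical example: the agent proposes a schedule change and shows a
# before/after preview; nothing happens until the seller clicks yes.
action = ProposedAction(
    description="Move Friday closing shift from Ana to Luis",
    preview="Fri 5pm-10pm: Ana -> Luis",
    execute=lambda: print("schedule updated"),
)
print(approval_gate(action, approved=True))
```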

An $80 million fine and chatbot blunders hang over Block’s push to automate financial recommendations

That human-in-the-loop caution carries additional weight given Block's recent history. In January 2025, 48 state financial regulators imposed an $80 million fine on Block for violations of the Bank Secrecy Act and anti-money laundering laws related to Cash App. The Connecticut Department of Banking stated in announcing the settlement that regulators "found Block was not in compliance with certain requirements, creating the potential that its services could be used to support money laundering, terrorism financing, or other illegal activities." The Illinois Department of Financial and Professional Regulation simultaneously joined the coordinated enforcement action.

Separately, reporting from The Guardian has documented instances of Block’s customer-facing chatbots making serious errors, including telling customers to cancel or close their accounts. When VentureBeat raised this concern during the interview, Avé acknowledged the stakes but redirected to Managerbot’s specific safeguards.

“Financial accuracy and financial data — the value of these products really come from recommendations,” Avé said. “We need to be better than whatever you can feed to ChatGPT. If you take a CSV of your sales and put it in ChatGPT or Claude, we need our product to be better and answer that question either more accurately or better than what’s available in the market.” He pointed to the harness layer’s role in reducing hallucinations through tuning, prompt engineering, and optimized tool-call loops, while acknowledging the inherent limitations of probabilistic systems: “It’s never going to be zero. Obviously, these are probabilistic systems, and we have guidance and call-outs in the tool to provide that.” On regulated domains like lending and payments, Avé was more definitive: “In any sort of regulated domains — banking, lending, payments — there are strict guardrails on what we can and can’t say to sellers. Those are just part of the product and business.”

Dorsey cut 4,000 jobs in the name of AI — Managerbot is the first answer to what those tools are actually building

It is impossible to evaluate Managerbot outside the context of the radical organizational surgery Block performed just weeks ago. In late February, Dorsey announced that Block would cut more than 4,000 of its roughly 10,000 employees — nearly half the workforce — explicitly citing AI as the driving rationale. As the BBC reported, Dorsey wrote that “AI fundamentally changes what it means to build and run a company.” Block’s stock surged more than 20 percent on the news, according to ABC7.

The company’s Q4 2025 earnings report, released alongside the layoff announcement, showed gross profit of $2.87 billion — up 24 percent year over year — and raised 2026 guidance to $12.2 billion in gross profit, according to AlphaSense’s earnings analysis. Block also reported a greater than 40 percent increase in production code shipped per engineer since September 2025 through the use of agentic coding tools. As CNBC commentator Steve Sedgwick wrote in an opinion piece following the announcement, “I keep getting told on CNBC that AI will create new jobs to replace those being lost. I’ve been asking the same question for years now.” The Observer’s Mark Minevich was more pointed, calling Block’s layoffs “probably the first legitimate mass layoff driven by A.I. as the actual operating thesis.”

Managerbot, then, is the product answer to the obvious follow-up question: if Block shed 4,000 workers in the name of intelligence tools, what exactly are those intelligence tools building? Avé framed the product as proof of concept for Block’s entire strategic thesis. “Block has been in the press recently about rebuilding as an intelligence company, and it’s like, a lot of people are asking, ‘What does that mean for us?'” Avé said. “What I like to do is show, not tell. We’re building Managerbot, which I think is one of the more advanced, maybe the most advanced, small business agent out there today.”

Sellers who use Managerbot are consolidating their businesses onto Square — and that may be the real strategic payoff

Perhaps the most consequential signal Avé shared was an early behavioral pattern: sellers who begin using Managerbot are voluntarily migrating more of their business operations onto the Square platform, consolidating payroll, time cards, and shift scheduling into Block’s ecosystem to feed the agent more data. “When they start interacting with Managerbot, they want to move more of their business onto Square because they see the value,” Avé said. “They’re like, ‘I should put my payroll here. I should get time cards here. I should get my shift schedules here,’ because once all that data is in one place, they can make better decisions and manage their business better.”

This dynamic could prove to be Managerbot’s most significant long-term effect — not as a standalone feature, but as a gravitational force pulling sellers deeper into Block’s integrated commerce stack. Block’s Q4 earnings already showed Square’s new volume added grew 29 percent year over year, with sales-led NVA surging 62 percent. Avé also argued that Square’s first-party architecture — built organically rather than through acquisitions — gives it a structural advantage over competitors in the AI era. “We’ve kind of harmonized and canonicalized this data at a sensible layer,” he said. “It’s not super hard to create more skills for these data domains.”

When VentureBeat pressed Avé on the tension between helping sellers and upselling them on Block’s own financial products — lending, payments processing, and other services that generate revenue for the company — he acknowledged the concern but framed Managerbot’s mission in terms of decision-making quality. “The goal for Managerbot is to help sellers increase their decision-making correctness,” Avé said. “If we can make sellers better at running their business by making better decisions and giving time back, I think that’s a good thing.”

Block says Managerbot isn’t a chatbot — it’s a business protector that compounds the company’s entire AI strategy

Avé was insistent that Managerbot represents something categorically different from the chatbot-as-advisor model that has proliferated across enterprise software. “A lot of people are building chatbots as advisors — it can answer a question for you,” he said. “What we really want Managerbot to be is a protector of your business. This is identifying trends. This is spotting things that you might have missed. This is helping you run your business and take actions.”

He also argued that the agent model compounds Block’s development velocity in ways that traditional software cannot match. “It’s much more straightforward to add a capability to Managerbot than it is to build a big Web 2.0 UI,” Avé said. “If we can deliver more capabilities, more features, more value to our sellers, the whole system compounds.”

Whether that compounding materializes — and whether sellers ultimately experience Managerbot as a trusted protector or a sophisticated upsell engine — will determine much about Block’s future. The company has staked its corporate identity, its headcount, and its Wall Street narrative on the conviction that AI agents can deliver more value with fewer humans in the loop. Managerbot is the first product to carry the full weight of that promise. And the small business owners who keep their shops open with Square terminals, who juggle shift schedules on napkins and skip marketing because there aren’t enough hours in the day — they didn’t ask to be the test case for Silicon Valley’s boldest AI thesis. But as of today, they are.
