Chinese AI startup MiniMax, headquartered in Shanghai, has sent shockwaves through the AI industry today with the release of its new M2.5 language model in two variants, which promises to make high-end artificial intelligence so cheap you might stop worrying about the bill entirely.
It’s also said to be “open source,” though the weights (settings) and code haven’t been posted yet, nor have the exact license type or terms been disclosed. But that’s almost beside the point given how cheaply MiniMax is serving it through its API and those of partners.
For the last few years, using the world’s most powerful AI was like hiring an expensive consultant—it was brilliant, but you watched the clock (and the token count) constantly. M2.5 changes that math, dropping the cost of the frontier by as much as 95%.
By delivering performance that rivals the top-tier models from Google and Anthropic at a fraction of the cost, particularly in agentic tool use for enterprise tasks, including creating Microsoft Word, Excel and PowerPoint files, MiniMax is betting that the future isn’t just about how smart a model is, but how often you can afford to use it.
Indeed, to this end, MiniMax says it worked “with senior professionals in fields such as finance, law, and social sciences” to ensure the model could perform real work up to their specifications and standards.
This release matters because it signals a shift from AI as a “chatbot” to AI as a “worker”. When intelligence becomes “too cheap to meter,” developers stop building simple Q&A tools and start building “agents”—software that can spend hours autonomously coding, researching, and organizing complex projects without breaking the bank.
In fact, MiniMax has already deployed this model into its own operations. Currently, 30% of all tasks at MiniMax HQ are completed by M2.5, and a staggering 80% of their newly committed code is generated by M2.5!
As the MiniMax team writes in their release blog post, “we believe that M2.5 provides virtually limitless possibilities for the development and operation of agents in the economy.”
The secret to M2.5’s efficiency lies in its Mixture of Experts (MoE) architecture. Rather than running all of its 230 billion parameters for every single word it generates, the model only “activates” 10 billion. This allows it to maintain the reasoning depth of a massive model while moving with the agility of a much smaller one.
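To make the routing idea concrete, here is a minimal sketch of top-k expert selection in plain Python. It is a toy, not MiniMax's implementation: the eight scalar "experts", the random gate scores, and the names `top_k_route` and `moe_forward` are all assumptions for illustration. Only the 230 billion and 10 billion parameter figures come from the reporting above.

```python
import math
import random


def top_k_route(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)[:k]
    # Softmax over only the selected experts so their weights sum to 1.
    peak = max(gate_scores[i] for i in top)
    exps = [math.exp(gate_scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_scores, k):
    """Run only the routed experts and mix their outputs by gate weight."""
    routes = top_k_route(gate_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]


# Hypothetical toy setup: 8 experts, each a simple scalar function.
NUM_EXPERTS = 8
TOP_K = 2
experts = [lambda x, scale=scale: scale * x for scale in range(1, NUM_EXPERTS + 1)]

rng = random.Random(0)  # stand-in for a learned gating network
gate_scores = [rng.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]

output, used = moe_forward(3.0, experts, gate_scores, TOP_K)
print(f"experts used: {used} of {NUM_EXPERTS}, output: {output:.3f}")

# Figures reported for M2.5: 230B total parameters, 10B active per token.
TOTAL_PARAMS_B = 230
ACTIVE_PARAMS_B = 10
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"active share of parameters: {active_fraction:.1%}")
```

With the reported figures, roughly 4.3% of the model's parameters do work on any given token, which is where the speed and cost savings come from.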
To train this complex system, MiniMax developed a proprietary Reinforcement Learning (RL) framework called Forge. MiniMax engineer Olive Song stated on the ThursdAI podcast on YouTube that this technique was instrumental in scaling performance despite the model's relatively small number of parameters, and that the model was trained over a period of two months.
Forge is designed to help the model learn from “real-world environments” — essentially letting the AI practice coding and using tools in thousands of simulated workspaces.
“What we realized is that there’s a lot of potential with a small model like this if we train reinforcement learning on it with a large amount of environments and agents,” Song said. “But it’s not a very easy thing to do,” she added, noting that this was what the team spent “a lot of time” on.
To keep the model stable during this intense training, they used a mathematical approach called CISPO (Clipping Importance Sampling Policy Optimization) and shared the formula on their blog.
This formula ensures the model doesn’t over-correct during training, allowing it to develop what MiniMax calls an “Architect Mindset”. Instead of jumping straight into writing code, M2.5 has learned to proactively plan the structure, features, and interface of a project first.
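The formula itself is not reproduced in this article. As a rough illustration of what clipping an importance-sampling weight can look like, the sketch below computes a clipped weight per token and uses it to scale a policy-gradient style term. This is one reading of the technique's name, not MiniMax's published objective: the function names, the `eps_low` and `eps_high` bounds, and the sample numbers are all assumptions.

```python
import math


def clipped_is_weight(logp_new, logp_old, eps_low, eps_high):
    """Importance-sampling ratio pi_new / pi_old, clipped to [1 - eps_low, 1 + eps_high]."""
    ratio = math.exp(logp_new - logp_old)
    return min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)


def cispo_style_surrogate(logps_new, logps_old, advantages, eps_low=0.2, eps_high=0.2):
    """Average of clipped_weight * advantage * log-prob over tokens.

    Assumption: in an autodiff framework the clipped weight would be treated
    as a constant (stop-gradient), so every token still contributes a
    gradient through its log-probability. Plain Python has no gradients,
    so this only shows the forward computation.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logps_new, logps_old, advantages):
        weight = clipped_is_weight(lp_new, lp_old, eps_low, eps_high)
        total += weight * adv * lp_new
    return total / len(logps_new)


# Hypothetical per-token log-probabilities and advantages.
logps_old = [-1.0, -2.0, -0.5]
logps_new = [-0.9, -0.2, -0.6]  # the second token's probability jumped sharply
advantages = [1.0, 1.0, -0.5]

weights = [clipped_is_weight(n, o, 0.2, 0.2) for n, o in zip(logps_new, logps_old)]
print("clipped weights:", [round(w, 3) for w in weights])
print("surrogate value:", round(cispo_style_surrogate(logps_new, logps_old, advantages), 4))
```

The point of the clip is visible in the second token: its raw ratio is far above 1, but its influence on the update is capped, which is the "don't over-correct" behavior described above.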
The results of this architecture are reflected in the latest industry leaderboards. M2.5 hasn’t just improved; it has vaulted into the top tier of coding models, approaching Anthropic’s latest model, Claude Opus 4.6, released just a week ago, and showing that Chinese companies are now just days away from catching up to far better resourced (in terms of GPUs) U.S. labs.
Here are some of the new MiniMax M2.5 benchmark highlights:
SWE-Bench Verified: 80.2% — Matches Claude Opus 4.6 speeds
BrowseComp: 76.3% — Industry-leading search & tool use.
Multi-SWE-Bench: 51.3% — SOTA in multi-language coding
BFCL (Tool Calling): 76.8% — High-precision agentic workflows.
On the ThursdAI podcast, host Alex Volkov pointed out that MiniMax M2.5 operates extremely quickly and therefore uses fewer tokens to complete tasks, on the order of $0.15 per task compared to $3.00 for Claude Opus 4.6.
MiniMax is offering two versions of the model through its API, both focused on high-volume production use:
M2.5-Lightning: Optimized for speed, delivering 100 tokens per second. It costs $0.30 per 1M input tokens and $2.40 per 1M output tokens.
Standard M2.5: Optimized for cost, running at 50 tokens per second. It costs half as much as the Lightning version ($0.15 per 1M input tokens / $1.20 per 1M output tokens).
In plain language: MiniMax claims you can run four “agents” (AI workers) continuously for an entire year for roughly $10,000.
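As a back-of-the-envelope check on that claim, the sketch below estimates what it would cost to have four agents generate output tokens nonstop for a year at the listed speeds and prices. It counts output tokens only, because the article gives no input-token usage profile, so the result is a lower bound rather than MiniMax's own calculation. The function name and the assumption of uninterrupted generation are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def yearly_output_cost(agents, tokens_per_second, usd_per_million_output):
    """Cost in USD of agents generating output tokens continuously for one year."""
    tokens = agents * tokens_per_second * SECONDS_PER_YEAR
    return tokens / 1_000_000 * usd_per_million_output


# Speeds and prices as listed for the two API variants.
standard = yearly_output_cost(agents=4, tokens_per_second=50, usd_per_million_output=1.20)
lightning = yearly_output_cost(agents=4, tokens_per_second=100, usd_per_million_output=2.40)

print(f"Standard M2.5, output only:  ${standard:,.2f}")
print(f"M2.5-Lightning, output only: ${lightning:,.2f}")
```

Under these assumptions the standard tier comes to about $7,569 a year in output tokens and Lightning to about $30,275, so the roughly $10,000 figure looks consistent with the standard tier once input tokens are added.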
For enterprise users, this pricing is roughly 1/10th to 1/20th the cost of competing proprietary models like GPT-5.2 or Claude Opus 4.6.
| Model | Input ($ per 1M tokens) | Output ($ per 1M tokens) | Total Cost |
| --- | --- | --- | --- |
| Qwen 3 Turbo | $0.05 | $0.20 | $0.25 |
| deepseek-chat (V3.2-Exp) | $0.28 | $0.42 | $0.70 |
| deepseek-reasoner (V3.2-Exp) | $0.28 | $0.42 | $0.70 |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 |
| Grok 4.1 Fast (non-reasoning) | $0.20 | $0.50 | $0.70 |
| MiniMax M2.5 | $0.15 | $1.20 | $1.35 |
| MiniMax M2.5-Lightning | $0.30 | $2.40 | $2.70 |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $3.50 |
| Kimi-k2.5 | $0.60 | $3.00 | $3.60 |
| GLM-5 | $1.00 | $3.20 | $4.20 |
| ERNIE 5.0 | $0.85 | $3.40 | $4.25 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 |
| Qwen3-Max (2026-01-23) | $1.20 | $6.00 | $7.20 |
| Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00 |
| GPT-5.2 | $1.75 | $14.00 | $15.75 |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 |
| Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00 |
| Claude Opus 4.6 | $5.00 | $25.00 | $30.00 |
| GPT-5.2 Pro | $21.00 | $168.00 | $189.00 |
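The "1/10th to 1/20th" comparison can be checked against the table's own numbers. The sketch below sums input and output prices the same way the Total Cost column does, then computes how many times more expensive two proprietary models are than standard M2.5. The prices are copied from the table; the dictionary layout and function names are assumptions, and real costs depend on each workload's mix of input and output tokens.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "MiniMax M2.5": (0.15, 1.20),
    "MiniMax M2.5-Lightning": (0.30, 2.40),
    "GPT-5.2": (1.75, 14.00),
    "Claude Opus 4.6": (5.00, 25.00),
}


def blended_total(model):
    """Input price plus output price, matching the table's Total Cost column."""
    input_price, output_price = PRICES[model]
    return input_price + output_price


def cost_multiple(expensive, cheap):
    """How many times more the first model costs than the second."""
    return blended_total(expensive) / blended_total(cheap)


for rival in ("GPT-5.2", "Claude Opus 4.6"):
    multiple = cost_multiple(rival, "MiniMax M2.5")
    print(f"{rival} costs about {multiple:.1f}x standard M2.5")
```

On these blended figures, GPT-5.2 works out to roughly 12 times the cost of standard M2.5 and Claude Opus 4.6 to roughly 22 times, which is in line with the range quoted above.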
For technical leaders, M2.5 represents more than just a cheaper API. It changes the operational playbook for enterprises right now.
The pressure to “optimize” prompts to save money is gone. You can now deploy high-context, high-reasoning models for routine tasks that were previously cost-prohibitive.
The 37% speed improvement in end-to-end task completion means the “agentic” pipelines valued by AI orchestrators — where models talk to other models — finally move fast enough for real-time user applications.
In addition, M2.5’s high scores in financial modeling (74.4% on MEWC) suggest it can handle the “tacit knowledge” of specialized industries like law and finance with minimal oversight.
Because M2.5 is positioned as an open-source model, organizations could potentially run intensive, automated code audits at a scale that was previously impossible without massive human intervention, all while maintaining better control over data privacy. Until the licensing terms and weights are posted, however, “open source” remains just a moniker.
MiniMax M2.5 is a signal that the frontier of AI is no longer just about who can build the biggest brain, but who can make that brain the most useful—and affordable—worker in the room.
OpenAI on Thursday launched GPT-5.3-Codex-Spark, a stripped-down coding model engineered for near-instantaneous response times, marking the company’s first significant inference partnership outside its traditional Nvidia-dominated infrastructure. The model runs on hardware from Cerebras Systems, a Sunnyvale-based chipmaker whose wafer-scale processors specialize in low-latency AI workloads.
The partnership arrives at a pivotal moment for OpenAI. The company finds itself navigating a frayed relationship with longtime chip supplier Nvidia, mounting criticism over its decision to introduce advertisements into ChatGPT, a newly announced Pentagon contract, and internal organizational upheaval that has seen a safety-focused team disbanded and at least one researcher resign in protest.
“GPUs remain foundational across our training and inference pipelines and deliver the most cost effective tokens for broad usage,” an OpenAI spokesperson told VentureBeat. “Cerebras complements that foundation by excelling at workflows that demand extremely low latency, tightening the end-to-end loop so use cases such as real-time coding in Codex feel more responsive as you iterate.”
The careful framing — emphasizing that GPUs “remain foundational” while positioning Cerebras as a “complement” — underscores the delicate balance OpenAI must strike as it diversifies its chip suppliers without alienating Nvidia, the dominant force in AI accelerators.
Codex-Spark represents OpenAI’s first model purpose-built for real-time coding collaboration. The company claims the model delivers generation speeds 15 times faster than its predecessor, though it declined to provide specific latency metrics such as time-to-first-token or tokens-per-second figures.
“We aren’t able to share specific latency numbers, however Codex-Spark is optimized to feel near-instant—delivering 15x faster generation speeds while remaining highly capable for real-world coding tasks,” the OpenAI spokesperson said.
The speed gains come with acknowledged capability tradeoffs. On SWE-Bench Pro and Terminal-Bench 2.0 — two industry benchmarks that evaluate AI systems’ ability to perform complex software engineering tasks autonomously — Codex-Spark underperforms the full GPT-5.3-Codex model. OpenAI positions this as an acceptable exchange: developers get responses fast enough to maintain creative flow, even if the underlying model cannot tackle the most sophisticated multi-step programming challenges.
The model launches with a 128,000-token context window and supports text only — no image or multimodal inputs. OpenAI has made it available as a research preview to ChatGPT Pro subscribers through the Codex app, command-line interface, and Visual Studio Code extension. A small group of enterprise partners will receive API access to evaluate integration possibilities.
“We are making Codex-Spark available in the API for a small set of design partners to understand how developers want to integrate Codex-Spark into their products,” the spokesperson explained. “We’ll expand access over the coming weeks as we continue tuning our integration under real workloads.”
The technical architecture behind Codex-Spark tells a story about inference economics that increasingly matters as AI companies scale consumer-facing products. Cerebras’s Wafer Scale Engine 3 — a single chip roughly the size of a dinner plate containing 4 trillion transistors — eliminates much of the communication overhead that occurs when AI workloads spread across clusters of smaller processors.
For training massive models, that distributed approach remains necessary and Nvidia’s GPUs excel at it. But for inference — the process of generating responses to user queries — Cerebras argues its architecture can deliver results with dramatically lower latency. Sean Lie, Cerebras’s CTO and co-founder, framed the partnership as an opportunity to reshape how developers interact with AI systems.
“What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible — new interaction patterns, new use cases, and a fundamentally different model experience,” Lie said in a statement. “This preview is just the beginning.”
OpenAI’s infrastructure team did not limit its optimization work to the Cerebras hardware. The company announced latency improvements across its entire inference stack that benefit all Codex models regardless of underlying hardware, including persistent WebSocket connections and optimizations within the Responses API. The results: 80 percent reduction in overhead per client-server round trip, 30 percent reduction in per-token overhead, and 50 percent reduction in time-to-first-token.
The Cerebras partnership takes on additional significance given the increasingly complicated relationship between OpenAI and Nvidia. Last fall, when OpenAI announced its Stargate infrastructure initiative, Nvidia publicly committed to investing $100 billion to support OpenAI as it built out AI infrastructure. The announcement appeared to cement a strategic alliance between the world’s most valuable AI company and its dominant chip supplier.
Five months later, that megadeal has effectively stalled, according to multiple reports. Nvidia CEO Jensen Huang has publicly denied tensions, telling reporters in late January that there is “no drama” and that Nvidia remains committed to participating in OpenAI’s current funding round. But the relationship has cooled considerably, with friction stemming from multiple sources.
OpenAI has aggressively pursued partnerships with alternative chip suppliers, including the Cerebras deal and separate agreements with AMD and Broadcom. From Nvidia’s perspective, OpenAI may be using its influence to commoditize the very hardware that made its AI breakthroughs possible. From OpenAI’s perspective, reducing dependence on a single supplier represents prudent business strategy.
“We will continue working with the ecosystem on evaluating the most price-performant chips across all use cases on an ongoing basis,” OpenAI’s spokesperson told VentureBeat. “GPUs remain our priority for cost-sensitive and throughput-first use cases across research and inference.” The statement reads as a careful effort to avoid antagonizing Nvidia while preserving flexibility — and reflects a broader reality that training frontier AI models still requires exactly the kind of massive parallel processing that Nvidia GPUs provide.
The Codex-Spark launch comes as OpenAI navigates a series of internal challenges that have intensified scrutiny of the company’s direction and values. Earlier this week, reports emerged that OpenAI disbanded its mission alignment team, a group established in September 2024 to promote the company’s stated goal of ensuring artificial general intelligence benefits humanity. The team’s seven members have been reassigned to other roles, with leader Joshua Achiam given a new title as OpenAI’s “chief futurist.”
OpenAI previously disbanded another safety-focused group, the superalignment team, in 2024. That team had concentrated on long-term existential risks from AI. The pattern of dissolving safety-oriented teams has drawn criticism from researchers who argue that OpenAI’s commercial pressures are overwhelming its original non-profit mission.
The company also faces fallout from its decision to introduce advertisements into ChatGPT. Researcher Zoë Hitzig resigned this week over what she described as the “slippery slope” of ad-supported AI, warning in a New York Times essay that ChatGPT’s archive of intimate user conversations creates unprecedented opportunities for manipulation. Anthropic seized on the controversy with a Super Bowl advertising campaign featuring the tagline: “Ads are coming to AI. But not to Claude.”
Separately, the company agreed to provide ChatGPT to the Pentagon through GenAI.mil, a new Department of Defense program that requires OpenAI to permit “all lawful uses” without company-imposed restrictions, terms that Anthropic reportedly rejected. Reports also emerged that Ryan Beiermeister, OpenAI’s vice president of product policy who had expressed concerns about a planned explicit content feature, was terminated in January following a discrimination allegation she denies.
Despite the surrounding turbulence, OpenAI’s technical roadmap for Codex suggests ambitious plans. The company envisions a coding assistant that seamlessly blends rapid-fire interactive editing with longer-running autonomous tasks — an AI that handles quick fixes while simultaneously orchestrating multiple agents working on more complex problems in the background.
“Over time, the modes will blend — Codex can keep you in a tight interactive loop while delegating longer-running work to sub-agents in the background, or fanning out tasks to many models in parallel when you want breadth and speed, so you don’t have to choose a single mode up front,” the OpenAI spokesperson told VentureBeat.
This vision would require not just faster inference but sophisticated task decomposition and coordination across models of varying sizes and capabilities. Codex-Spark establishes the low-latency foundation for the interactive portion of that experience; future releases will need to deliver the autonomous reasoning and multi-agent coordination that would make the full vision possible.
For now, Codex-Spark operates under separate rate limits from other OpenAI models, reflecting constrained Cerebras infrastructure capacity during the research preview. “Because it runs on specialized low-latency hardware, usage is governed by a separate rate limit that may adjust based on demand during the research preview,” the spokesperson noted. The limits are designed to be “generous,” with OpenAI monitoring usage patterns as it determines how to scale.
The Codex-Spark announcement arrives amid intense competition for AI-powered developer tools. Anthropic’s Claude Cowork product triggered a selloff in traditional software stocks last week as investors considered whether AI assistants might displace conventional enterprise applications. Microsoft, Google, and Amazon continue investing heavily in AI coding capabilities integrated with their respective cloud platforms.
OpenAI’s Codex app has demonstrated rapid adoption since launching ten days ago, with more than one million downloads and weekly active users growing 60 percent week-over-week. More than 325,000 developers now actively use Codex across free and paid tiers. But the fundamental question facing OpenAI — and the broader AI industry — is whether speed improvements like those promised by Codex-Spark translate into meaningful productivity gains or merely create more pleasant experiences without changing outcomes.
Early evidence from AI coding tools suggests that faster responses encourage more iterative experimentation. Whether that experimentation produces better software remains contested among researchers and practitioners alike. What seems clear is that OpenAI views inference latency as a competitive frontier worth substantial investment, even as that investment takes it beyond its traditional Nvidia partnership into untested territory with alternative chip suppliers.
The Cerebras deal is a calculated bet that specialized hardware can unlock use cases that general-purpose GPUs cannot cost-effectively serve. For a company simultaneously battling competitors, managing strained supplier relationships, and weathering internal dissent over its commercial direction, it is also a reminder that in the AI race, standing still is not an option. OpenAI built its reputation by moving fast and breaking conventions. Now it must prove it can move even faster — without breaking itself.