Enterprises that have been juggling separate models for reasoning, multimodal tasks, and agentic coding may be able to simplify their stack: Mistral’s new Small 4 brings all three into a single open-source model, with adjustable reasoning levels under the hood.
Small 4 enters a crowded field of small models — including Qwen and Claude Haiku — that are competing on inference cost and benchmark performance. Mistral’s pitch: shorter outputs that translate to lower latency and cheaper tokens.
Mistral Small 4 updates Mistral Small 3.2, which came out in June 2025, and is available under an Apache 2.0 license. “With Small 4, users no longer need to choose between a fast instruct model, a powerful reasoning engine, or a multimodal assistant: one model now delivers all three, with configurable reasoning effort and best-in-class efficiency,” Mistral said in a blog post.
The company said that despite its smaller size — Mistral Small 4 has 119 billion total parameters with only 6 billion active parameters per token — the model combines the capabilities of all of Mistral’s models: the reasoning of Magistral, the multimodal understanding of Pixtral, and the agentic coding performance of Devstral. It also has a 256K context window that the company said works well for long-form conversations and analysis.
Rob May, co-founder and CEO of the small language model marketplace Neurometric, told VentureBeat that Mistral Small 4 stands out for its architectural flexibility. However, it joins a growing number of small models that, he said, risk adding further fragmentation to the market.
“From a technical perspective, yes, it can be competitive against other models,” May said. “The bigger issue is that it has to overcome market confusion. Mistral has to win the mindshare to get a shot at being part of that test set first. Only then can they show the technical capabilities of the model.”
Small models still offer a good option for enterprise builders seeking a comparable LLM experience at a lower cost.
The model is built on a mixture-of-experts architecture, much like other Mistral models. It features 128 experts with four active per token, which Mistral says enables efficient scaling and specialization.
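The routing idea behind a mixture-of-experts layer can be sketched in a few lines. The expert count and active-expert count below come from the article; everything else (the toy hidden size, the random "experts," the router itself) is invented for illustration and is not Mistral's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 128  # total experts, per the article
TOP_K = 4          # experts active per token, per the article
HIDDEN = 64        # toy hidden size for this sketch

router_w = rng.standard_normal((HIDDEN, NUM_EXPERTS))
# Each "expert" here is just a small linear map standing in for an FFN block.
experts = rng.standard_normal((NUM_EXPERTS, HIDDEN, HIDDEN))

def moe_forward(x):
    """Route one token vector through its top-k experts."""
    logits = x @ router_w              # one score per expert
    top = np.argsort(logits)[-TOP_K:]  # indices of the 4 chosen experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected experts only
    # Only the chosen experts run, which is why active parameters per token
    # stay far below the total parameter count.
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out, top

token = rng.standard_normal(HIDDEN)
out, chosen = moe_forward(token)
```

This sparsity is the mechanism behind the 119B-total / 6B-active split: the router selects a small subset of experts per token, so compute scales with the active parameters, not the total.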
This allows Mistral Small 4 to respond faster, even to more reasoning-intensive outputs. It can also process and reason about text and images, allowing users to parse documents and graphs.
Mistral said the model features a new parameter, called reasoning_effort, that allows users to “dynamically adjust the model’s behavior.” Enterprises can configure Small 4 to deliver fast, lightweight responses in the style of Mistral Small 3.2, or make it more verbose in the vein of Magistral, providing step-by-step reasoning for complex tasks, according to Mistral.
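Assuming an OpenAI-style chat payload — the field placement, model identifier, and allowed effort values below are guesses for illustration, not Mistral's documented schema — a request using the reasoning_effort knob might be assembled like this:

```python
import json

def build_request(prompt: str, effort: str) -> dict:
    # "low"/"medium"/"high" are assumed values; Mistral's post only says
    # the setting is configurable.
    assert effort in {"low", "medium", "high"}
    return {
        "model": "mistral-small-4",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # the parameter named in Mistral's post
    }

# Low effort for Small-3.2-style fast replies; high for Magistral-style
# step-by-step reasoning.
fast = build_request("Summarize this contract.", "low")
deep = build_request("Prove this invariant holds.", "high")
payload = json.dumps(fast)
```

The practical point is that the same deployed model serves both request profiles, so an enterprise can route quick extraction tasks and heavyweight reasoning tasks to one endpoint instead of two models.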
Mistral said Small 4 runs on fewer chips than comparable models, with a recommended setup of four Nvidia HGX H100s or H200s, or two Nvidia DGX B200s.
“Delivering advanced open-source AI models requires broad optimization. Through close collaboration with Nvidia, inference has been optimized for both open source vLLM and SGLang, ensuring efficient, high-throughput serving across deployment scenarios,” Mistral said.
According to Mistral’s benchmarks, Small 4 performs close to the level of Mistral Medium 3.1 and Mistral Large 3, particularly in MMLU Pro.
Mistral said the instruction-following performance makes Small 4 suited for high-volume enterprise tasks such as document understanding.
While competitive with small models from other companies, Small 4 still trails other popular open-source models, especially on reasoning-intensive tasks. Qwen 3.5 122B and Qwen 3-next 80B outperform Small 4 on LiveCodeBench, as does Claude Haiku in instruct mode.
Mistral Small 4 did, however, beat OpenAI’s GPT-OSS 120B on the LCR.
Mistral argues that Small 4 achieves these scores with “significantly shorter outputs” that translate to lower inference costs and latency than the other models. In instruct mode specifically, Small 4 produces the shortest outputs of any model tested — 2.1K characters vs. 14.2K for Claude Haiku and 23.6K for GPT-OSS 120B. In reasoning mode, outputs are much longer (18.7K), which is expected for that use case.
May said that while model choice depends on an organization’s goals, latency is one of three pillars enterprises should prioritize. “It depends on your goals and what you are optimizing your architecture to accomplish. Enterprises should prioritize these three pillars: reliability and structured output; latency-to-intelligence ratio; and fine-tunability and privacy,” May said.
Cursor, the San Francisco AI coding platform from startup Anysphere, valued at $29.3 billion, has launched Composer 2, a new in-house coding model now available inside its agentic AI coding environment, with drastically improved benchmark scores over its prior in-house model.
It’s also making Composer 2 Fast, a higher-priced but faster variant, the default experience for users.
Here’s the cost breakdown:
Composer 2 Standard: $0.50/$2.50 per 1 million input/output tokens
Composer 2 Fast: $1.50/$7.50 per 1 million input/output tokens
That’s a big drop from Cursor’s predecessor in-house model, Composer 1.5, from February, which cost $3.50 per million input tokens and $17.50 per million output tokens; Composer 2 is about 86% cheaper on both counts.
Composer 2 Fast is also roughly 57% cheaper than Composer 1.5.
There are also discounts for “cache-read pricing” — that is, for sending some of the same tokens in a prompt to the model again: $0.20 per million tokens for Composer 2 and $0.35 per million for Composer 2 Fast, versus $0.35 per million for Composer 1.5.
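The list prices above can be compared with simple arithmetic. The workload mix below (4M fresh input tokens, 6M cache-read tokens, 2M output tokens) is an arbitrary illustration, not a Cursor figure:

```python
# ($ per 1M fresh input, $ per 1M cached input, $ per 1M output),
# using the per-token prices quoted in this article.
RATES = {
    "Composer 2":      (0.50, 0.20, 2.50),
    "Composer 2 Fast": (1.50, 0.35, 7.50),
    "Composer 1.5":    (3.50, 0.35, 17.50),
}

def workload_cost(model, fresh_in_m, cached_in_m, out_m):
    """Total cost in dollars for a workload measured in millions of tokens."""
    fresh, cached, out = RATES[model]
    return fresh * fresh_in_m + cached * cached_in_m + out * out_m

# Same hypothetical workload priced under each model.
costs = {m: workload_cost(m, 4, 6, 2) for m in RATES}
```

On that mix, the sample workload comes out around $8.20 on Composer 2 versus roughly $51.10 on Composer 1.5, which is where the roughly 86% savings claim shows up in practice; Composer 2 Fast lands in between at about $23.10.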
It also matters that this appears to be a Cursor-native release, not a broadly distributed standalone model. In the company’s announcement and model documentation, Composer 2 is described as available in Cursor, tuned for Cursor’s agent workflow and integrated with the product’s tool stack.
The materials provided do not indicate separate availability through external model platforms or as a general-purpose API outside the Cursor environment.
The deeper technical claim in this release is not merely that Composer 2 scores higher than Composer 1.5. It is that Cursor says the model is better suited to long-horizon agentic coding.
In its blog, Cursor says the quality gains come from its first continued pretraining run, which gave it a stronger base for scaled reinforcement learning. From there, the company says it trained Composer 2 on long-horizon coding tasks and that the model can solve problems requiring hundreds of actions.
That framing is important because it addresses one of the biggest unresolved issues in coding AI. Many models are good at isolated code generation. Far fewer remain reliable across a longer workflow that includes reading a repository, deciding what to change, editing multiple files, running commands, interpreting failures and continuing toward a goal.
Cursor’s documentation reinforces that this is the use case it cares about. It describes Composer 2 as an agentic model with a 200,000-token context window, tuned for tool use, file edits and terminal operations inside Cursor.
It also notes training techniques such as self-summarization for long-running tasks. For developers already using Cursor as their main environment, that tighter tuning may matter more than a generic leaderboard claim.
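Self-summarization for long-running tasks can be sketched generically: when the accumulated context nears the window limit, older turns are collapsed into a compact summary so the agent can keep working. The trigger threshold, the crude token estimate, and the keep-last-four policy below are assumptions for illustration; Cursor has not published its mechanism at this level of detail:

```python
CONTEXT_LIMIT = 200_000  # tokens, matching Composer 2's stated window
SUMMARY_TRIGGER = 0.8    # summarize once 80% of the window is used (assumption)

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)

def summarize(turns: list[str]) -> str:
    # Stand-in for a model call that condenses earlier turns into a digest.
    return f"[summary of {len(turns)} earlier steps]"

def append_turn(history: list[str], turn: str) -> list[str]:
    """Add a turn; compress older history if the context window is nearly full."""
    history = history + [turn]
    used = sum(count_tokens(t) for t in history)
    if used > CONTEXT_LIMIT * SUMMARY_TRIGGER:
        # Keep the most recent turns verbatim; compress everything older.
        keep = history[-4:]
        history = [summarize(history[:-4])] + keep
    return history

history: list[str] = []
for _ in range(10):
    history = append_turn(history, "x" * 100_000)  # oversized turns force compression
```

The design tradeoff is that summaries lose detail the agent might later need, which is why a scheme like this keeps the latest turns verbatim and only compresses the older tail.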
Cursor’s published results show a clear improvement over prior Composer models. The company lists Composer 2 at 61.3 on CursorBench, 61.7 on Terminal-Bench 2.0, and 73.7 on SWE-bench Multilingual.
That compares with Composer 1.5 at 44.2, 47.9 and 65.9, and Composer 1 at 38.0, 40.0 and 56.9.
The release is more measured than some model launches because Cursor is not claiming universal leadership.
On Terminal-Bench 2.0, which measures how well an AI agent performs tasks in command line terminal-style interfaces, GPT-5.4 still leads at 75.1, while Composer 2 scores 61.7, ahead of Opus 4.6 at 58.0, Opus 4.5 at 52.1 and Composer 1.5 at 47.9.
That makes Cursor’s pitch more pragmatic and arguably more useful for buyers. The company is not saying Composer 2 is the single best model at everything. It is saying the model has moved into a more competitive quality tier while offering more attractive economics and stronger integration with the product developers are already using.
Cursor also included a performance-versus-cost chart on its CursorBench benchmarking suite that appears designed to make a Pareto-style argument for Composer 2.
In that graphic, Composer 2 sits at a stronger cost-to-performance point than Composer 1.5 and compares favorably with higher-cost GPT-5.4 and Opus 4.6 settings shown by Cursor. The company’s message is not simply that Composer 2 scores higher than its predecessor, but that it may offer a more efficient cost-to-intelligence tradeoff for everyday coding work inside Cursor.
For readers deciding whether to use Composer 2, the most important question may not be benchmark performance alone. It may be whether they want a model optimized for Cursor’s own product experience.
That can be a strength. According to the documentation, Composer 2 can access Cursor’s agent tool stack, including semantic code search, file and folder search, file reads, file edits, shell commands, browser control and web access.
That kind of integration can be more valuable than raw model quality if the goal is to complete real software tasks rather than produce impressive one-shot answers.
But it also narrows the addressable audience. Teams looking for a model they can deploy broadly across multiple external tools and platforms should recognize that Cursor is presenting Composer 2 as a model for Cursor users, not as a generally available standalone foundation model.
The significance of Composer 2 is not that Cursor has suddenly taken the top spot on every coding benchmark. It has not. The more important point is that Cursor is making an operational argument: its model is getting better, its pricing is low enough to encourage broader use, and its faster tier is responsive enough that the company is comfortable making it the default despite the higher cost.
That combination could resonate with engineering teams that increasingly care less about abstract model prestige and more about whether an assistant can stay useful across long coding sessions without becoming prohibitively expensive.
Cursor’s broader pricing structure helps frame the competitive pressure around this launch. On its current pricing page, Cursor offers a free Hobby tier, a Pro plan at $20 per month, Pro+ at $60 per month, and Ultra at $200 per month for individual users, with higher tiers offering more usage across models from OpenAI, Anthropic and Google.
On the business side, Teams costs $40 per user per month, while Enterprise is custom-priced and adds pooled usage, centralized billing, usage analytics, privacy controls, SSO, audit logs and granular admin controls. In other words, Cursor is not just charging for access to a coding model. It is charging for a managed application layer that sits on top of multiple model providers while adding team features, governance and workflow tooling.
That model is increasingly under pressure as first-party AI companies push deeper into coding itself. OpenAI and Anthropic are no longer just selling models through third-party products; they are also shipping their own coding interfaces, agents and evaluation frameworks — such as Codex and Claude Code — raising the question of how much room remains for an intermediary platform.
Commenters on X, while unverified and not necessarily representative of the broader market, have increasingly described moving from Cursor to Anthropic’s Claude Code, especially among power users drawn to terminal-first workflows, longer-running agent behavior and lower perceived overhead.
Some of those posts describe frustration with Cursor’s pricing, context loss or editor-centric experience, while praising Claude Code as a more direct and fully agentic way to work. Even treated cautiously, that kind of social chatter points to the strategic problem Cursor faces: it has to prove that its integrated platform, team controls and now its own in-house models add enough value to justify sitting between developers and the model makers’ increasingly capable coding products.
That makes Composer 2 strategically important for Cursor.
By offering a much cheaper in-house model than Composer 1.5, tuning it tightly to Cursor’s own tool stack and making a faster version the default, the company is trying to show that it provides more than a wrapper around outside systems.
The challenge is that as first-party coding products improve, developers and enterprise buyers may increasingly ask whether they want a separate AI coding platform at all, or whether the model makers’ own tools are becoming sufficient on their own.