Anthropic unveils Claude 4 models, sets benchmark in AI performance

Published 22/05/2025, 17:50
© Reuters

Investing.com -- Anthropic has unveiled Claude 4, its most advanced generation of AI models to date, signaling a bold step forward in its race against OpenAI, Alphabet Inc Class A (NASDAQ:GOOGL), and xAI. The release features Claude Opus 4 and Claude Sonnet 4, both of which raise the bar for AI reasoning, coding capabilities, and sustained agentic performance.

Claude Opus 4, now billed as the world’s best coding model, delivers a record-setting 72.5% on SWE-bench and 43.2% on Terminal-bench, outperforming all competitors on long-running and complex agent workflows. In parallel, Sonnet 4 rounds out the offering with faster response times and enhanced alignment, showing significant gains over its predecessor Sonnet 3.7, including a 72.7% score on SWE-bench that is on par with Opus 4.

"Claude Opus 4 excels at solving complex challenges that other models can’t," reported Cognition, a Claude integration partner, highlighting its superior durability during hours-long computational tasks. Rakuten validated its strength with a single-agent refactoring task that ran for seven hours without performance degradation, one of several indicators that Claude is extending the frontier of what AI can handle autonomously.

Beyond raw benchmark performance, both models introduce new capabilities critical to agentic applications. Among them: extended thinking with tool use, memory improvements for local file-based continuity, and a 65% reduction in shortcut or loophole behaviors compared with Sonnet 3.7. Such behaviors remain an ongoing concern for AI deployed in sensitive or mission-critical workflows.

Anthropic also formally launched Claude Code, its development workflow assistant, into general availability. The system integrates seamlessly with JetBrains and VS Code, providing background execution with GitHub Actions and new SDK support to power custom AI developer tools. GitHub has already announced plans to base its next-generation Copilot coding agent on Claude Sonnet 4, citing "agentic scenario excellence."

The Claude 4 platform introduces tiered flexibility with two distinct reasoning modes: near-instant responses for faster turnarounds and extended thinking for tasks requiring deeper analysis. Both Opus 4 and Sonnet 4 are accessible via Anthropic’s API, Amazon (NASDAQ:AMZN) Bedrock, and Google’s Vertex AI, with pricing unchanged at $15 per million input tokens and $75 per million output tokens for Opus 4, and $3 and $15 respectively for Sonnet 4.
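To put the per-million-token pricing in concrete terms, the short Python sketch below estimates what a single job would cost under the rates quoted above. The model labels (`opus-4`, `sonnet-4`), the function name, and the example token counts are hypothetical and chosen only for illustration; they are not API identifiers or figures from Anthropic.

```python
# Prices in USD per million tokens, as quoted in the article.
# The dictionary keys are illustrative labels, not official model identifiers.
PRICES_PER_MILLION = {
    "opus-4": {"input": 15.00, "output": 75.00},
    "sonnet-4": {"input": 3.00, "output": 15.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one job on the given model."""
    rates = PRICES_PER_MILLION[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, 200_000, 50_000)
        print(f"{model}: ${cost:.2f}")
```

For this assumed workload the estimate comes to $6.75 on Opus 4 and $1.35 on Sonnet 4, a fivefold gap that follows directly from the quoted rates.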

These advancements give Anthropic a compelling technical edge over its rivals, many of whom are also in pursuit of safe-but-capable multimodal agents. “These models are a large step toward the virtual collaborator—maintaining full context, sustaining focus on longer projects, and driving transformational impact,” Anthropic said in its release.

With a leadership team formed by former OpenAI researchers, Anthropic has consistently emphasized safety alongside capability, a positioning that could resonate with enterprise buyers wary of unchecked model behavior. The latest Claude models, paired with improved agent tooling and memory-rich workflows, are expected to accelerate the company’s adoption in both commercial and developer ecosystems.

© 2007-2025 - Fusion Media Limited. All Rights Reserved.