OpenAI has announced GPT-5.4 as its new reference model for professional work, programming, and complex agent workflows. The launch clearly signals the company’s direction: less emphasis on simple conversation and more focus on turning its models into tools capable of executing real tasks with less friction, fewer repetitions, and greater precision.

GPT-5.4 is already rolling out across ChatGPT, the API, and Codex. In ChatGPT, it appears as GPT-5.4 Thinking, while GPT-5.4 Pro is reserved for users who need maximum performance on particularly demanding tasks. The core idea is that OpenAI wants its new model not only to reason and code but also to navigate tools, manipulate documents, work with spreadsheets, prepare presentations, and maintain context during long processes. In other words, it’s no longer just about responding well, but about doing useful work from start to finish.

The company states that GPT-5.4 combines the best of recent advances in reasoning, programming, and agent workflows. It also incorporates many of the strengths of GPT-5.3-Codex, its more coding-focused model, and transfers those capabilities to a more generalist system. This convergence is a key aspect of the announcement because it helps explain why OpenAI presents GPT-5.4 not as just another variant but as the new center of gravity for its ecosystem.

One of the most striking aspects of this launch is OpenAI’s ambition to position GPT-5.4 within the realm of specialized professional work. In GDPval, an evaluation measuring agents’ ability to produce work outputs across 44 occupations spanning 9 major industries, GPT-5.4 achieves an 83.0% win-or-tie rate, compared to 70.9% for GPT-5.2, a jump of 12.1 percentage points. According to OpenAI, the model matches or exceeds industry professionals in a large majority of comparisons, reinforcing the narrative that AI is no longer just assisting but beginning to take part in structured, task-oriented work.

Progress is also evident in very specific areas. OpenAI reports that GPT-5.4 has been specially fine-tuned for creating and editing spreadsheets, presentations, and documents. In an internal modeling test similar to tasks performed by junior investment banking analysts, GPT-5.4 achieved an average score of 87.3%, versus 68.4% for GPT-5.2. For presentations, human evaluators preferred the outputs generated by GPT-5.4 68.0% of the time, citing better aesthetics, greater visual variety, and more effective use of image generation capabilities.

For OpenAI, this evolution is about efficiency as well as quality. GPT-5.4 is presented as a model that uses fewer tokens than GPT-5.2 to solve problems, which should lead to lower overall costs for many workflows, even if the price per token is higher. In the API, GPT-5.4 costs $2.50 per million input tokens and $15 per million output tokens for requests of up to 272,000 tokens, while GPT-5.4 Pro costs $30 per million input tokens and $180 per million output tokens. Above 272,000 tokens, rates increase, a reminder that long-context processing remains a powerful but not inexpensive feature.
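The per-request arithmetic behind these rates can be sketched as follows. This is a minimal illustration, not an official calculator: the function name `estimate_cost`, the dictionary keys used as model labels, and the assumption that the 272,000-token threshold is measured on input tokens are all hypothetical. The article gives no figures for the higher long-context tier, so the sketch declines to price those requests rather than guess.

```python
from dataclasses import dataclass

# Assumption: the 272,000-token threshold applies to input tokens.
STANDARD_CONTEXT_LIMIT = 272_000


@dataclass(frozen=True)
class Rates:
    input_per_million: float   # USD per 1,000,000 input tokens
    output_per_million: float  # USD per 1,000,000 output tokens


# Rates quoted in the article; the keys are illustrative labels,
# not confirmed API model identifiers.
RATES = {
    "gpt-5.4": Rates(2.50, 15.00),
    "gpt-5.4-pro": Rates(30.00, 180.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the standard rates."""
    if input_tokens > STANDARD_CONTEXT_LIMIT:
        # Long-context rates are not stated in the article.
        raise ValueError("long-context rates are not covered by this sketch")
    rates = RATES[model]
    total = (
        input_tokens * rates.input_per_million
        + output_tokens * rates.output_per_million
    )
    return total / 1_000_000
```

Under these assumptions, a request with 100,000 input tokens and 10,000 output tokens comes to about $0.40 on GPT-5.4 and about $4.80 on GPT-5.4 Pro.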

Another major advantage of the model is its ability to handle a context window of up to 1 million tokens. This capacity is designed for analyzing entire codebases, large document collections, or long multi-step workflows with verification steps. It’s more than just a headline figure: in practice, developers and teams will be able to address much broader problems within a single interaction, which is particularly useful for agents, audits, document analysis, and enterprise automation.

GPT-5.4 also marks a significant shift in computer use. OpenAI describes it as its first general-purpose model with native capabilities for operating a computer. This enables agents to interact with applications, websites, and desktop environments via screenshots, keyboard, and mouse, both to perform tasks and to validate or correct them when failures occur. In OSWorld-Verified, one of the benchmarks measuring this behavior, GPT-5.4 achieves a success rate of 75.0%, surpassing not only GPT-5.2 at 47.3% but also human baseline performance at 72.4%.
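The observe, act, and verify cycle described here can be sketched in a few lines. Everything in this example is hypothetical: `Action`, `run_agent`, `FakeTextField`, and `correcting_policy` are illustrative names, the "screenshot" is just a string, and the policy is a hard-coded rule standing in for a model call. It is not OpenAI's interface, only a toy showing how a loop can both perform a task and correct an earlier mistake.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    kind: str          # "type" or "backspace" in this toy example
    payload: str = ""


@dataclass
class FakeTextField:
    """Stand-in for a desktop: the 'screenshot' is simply the field's text."""
    text: str = ""

    def screenshot(self) -> str:
        return self.text

    def perform(self, action: Action) -> None:
        if action.kind == "type":
            self.text += action.payload
        elif action.kind == "backspace":
            self.text = self.text[:-1]


def run_agent(
    env: FakeTextField,
    policy: Callable[[str], Action],
    goal_reached: Callable[[str], bool],
    max_actions: int = 10,
) -> Optional[int]:
    """Observe, check the goal, act; return actions used, or None on failure."""
    actions_taken = 0
    while True:
        observation = env.screenshot()
        if goal_reached(observation):
            return actions_taken
        if actions_taken == max_actions:
            return None  # budget exhausted without reaching the goal
        env.perform(policy(observation))
        actions_taken += 1


TARGET = "ok"


def correcting_policy(observation: str) -> Action:
    """Hard-coded stand-in for a model: undo wrong text, else type the next char."""
    if not TARGET.startswith(observation):
        return Action("backspace")
    return Action("type", TARGET[len(observation)])
```

Starting from a field that already contains a stray "x", the loop first deletes the wrong character and then types the target text, which mirrors the "validate or correct" behavior the paragraph describes.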

In programming, the company reports notable improvements. GPT-5.4 matches or outperforms GPT-5.3-Codex on SWE-Bench Pro, achieving 57.7%, and shows gains over GPT-5.2 on several indicators related to tool use and real task execution. It also stands out in complex front-end development, delivering more polished results both visually and functionally. Coinciding with the launch, OpenAI introduced an experimental Codex skill called “Playwright (Interactive),” aimed at visual debugging of web and Electron applications during development.

Another key area is tool search and management. OpenAI states that GPT-5.4 improves performance in ecosystems with many functions, connectors, or MCP servers, thanks to tool search, a system that avoids loading all tool definitions up front. Instead of flooding the prompt with thousands of unnecessary tokens, the model receives a lightweight list and looks up the full definition of a tool only when it needs it. In a test of 250 tasks involving 36 enabled MCP servers, this approach reduced total token usage by 47% without losing accuracy. This suggests that agents could become faster, cheaper, and more viable in complex enterprise environments.
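The idea of deferring tool definitions can be illustrated with a small sketch. It assumes a hypothetical in-memory registry: the `Tool` and `ToolRegistry` classes, the keyword-matching search, and the three example tools are all invented for illustration and do not reflect how OpenAI implements tool search. The point is only the shape of the technique: a short list goes into the prompt, and full definitions are fetched on demand.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    summary: str                                    # one line, sent up front
    definition: dict = field(default_factory=dict)  # full schema, sent on demand


class ToolRegistry:
    def __init__(self, tools):
        self._tools = {tool.name: tool for tool in tools}

    def lightweight_list(self):
        """Names and summaries only: the part included in the initial prompt."""
        return [(tool.name, tool.summary) for tool in self._tools.values()]

    def search(self, query, limit=3):
        """Return full definitions of tools matching any word in the query."""
        words = query.lower().split()
        scored = []
        for tool in self._tools.values():
            text = f"{tool.name} {tool.summary}".lower()
            score = sum(1 for word in words if word in text)
            if score:
                scored.append((score, tool))
        # Highest score first; ties keep registration order (stable sort).
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [tool.definition for _, tool in scored[:limit]]


# Hypothetical tools, for illustration only.
registry = ToolRegistry([
    Tool("send_email", "Send an email message",
         {"name": "send_email", "parameters": {"to": "string", "body": "string"}}),
    Tool("create_invoice", "Create a new invoice in billing",
         {"name": "create_invoice", "parameters": {"customer_id": "string", "amount": "number"}}),
    Tool("list_invoices", "List invoices for a customer",
         {"name": "list_invoices", "parameters": {"customer_id": "string"}}),
])

matches = registry.search("create invoice")
```

Here `lightweight_list()` yields three short entries, while `search("create invoice")` returns only the two invoice-related definitions, with `create_invoice` ranked first. With dozens of servers and hundreds of tools, keeping the unmatched definitions out of the prompt is where the token savings would come from.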

Reliability also sees progress. OpenAI claims GPT-5.4 is its most accurate model yet. In a set of anonymized prompts where users had flagged factual errors, GPT-5.4’s individual statements were 33% less likely to be false than those of GPT-5.2, and complete responses were 18% less likely to contain errors. Hallucinations are not entirely gone, but the figures point to progress on a persistent criticism of generative models.

In ChatGPT, one of the most visible new features is the so-called reasoning preamble. GPT-5.4 Thinking can display an initial plan or approach for tackling complex and lengthy queries, allowing users to correct course mid-response without restarting the conversation. OpenAI presents this as a way to improve controllability and usefulness in extended tasks. This capability is already available on ChatGPT web and Android, with iOS support coming very soon.

Overall, GPT-5.4 doesn’t seem like just a generational upgrade. What OpenAI has unveiled is a model designed to support a transition: from AI that simply responds to AI that works. Its behavior outside internal tests and benchmarks remains to be seen, but the company’s message is clear. The near future of its products involves agents that understand better, program more effectively, use tools more judiciously, work on real documents, and maintain context through much longer processes. And GPT-5.4 is now the central piece of that strategy.

Frequently Asked Questions

What is GPT-5.4 and why is it significant for developers and businesses?
GPT-5.4 is OpenAI’s new reference model for professional work, coding, and agent-based tool usage. Its importance lies in combining reasoning, code generation, long context, computer interaction, and improved document, presentation, and spreadsheet handling in a single system.

What is the difference between GPT-5.4 Thinking and GPT-5.4 Pro in ChatGPT?
GPT-5.4 Thinking is the version integrated into ChatGPT for Plus, Team, and Pro users, while GPT-5.4 Pro is tailored for those needing maximum performance on highly complex tasks. The Pro version is also available via API for more demanding workloads.

What is the purpose of a 1 million token context window in GPT-5.4?
It allows the model to analyze large codebases, extensive document collections, or long multi-step workflows without fragmenting the information. This is especially useful for automation, document audits, financial analysis, programming, and enterprise agents.

How much does it cost to use GPT-5.4 via the OpenAI API?
For requests of up to 272,000 tokens of context, GPT-5.4 costs $2.50 per million input tokens, $0.25 per million cached input tokens, and $15 per million output tokens. GPT-5.4 Pro costs $30 per million input tokens and $180 per million output tokens.

via: News GPT 5.4
