GPT-4.5, An Enigma at OpenAI

La Jugada

1 year ago

OpenAI is taking an unconventional approach with the introduction of GPT-4.5. Sam Altman stresses that GPT-4.5 is not a reasoning model and should not be confused with the specialized models that are. It is currently available only to ChatGPT Pro users; the rollout to ChatGPT Plus has been postponed by a week because of GPU shortages.

API Access Comes at a Premium

GPT-4.5 is also accessible via the API, through the Chat Completions, Assistants, and Batch endpoints, but its pricing is significantly higher than that of GPT-4o. At $75 per million input tokens and $150 per million output tokens, the model is 30 times more expensive for input and 15 times more expensive for output. Because of that cost, OpenAI is considering eventually retiring the model from the API and plans to monitor closely how it is used in practice.
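To make the price gap concrete, here is a minimal sketch of the per-request arithmetic. The GPT-4.5 prices come from the figures above; the GPT-4o prices are not quoted in this article and are back-calculated from the stated 30x and 15x ratios, so treat them as an assumption. The function name `request_cost` and the token counts are hypothetical, chosen only for illustration.

```python
# Prices in USD per million tokens.
# GPT-4.5 figures are quoted in the article; GPT-4o figures are an assumption,
# back-calculated from the article's 30x (input) and 15x (output) ratios.
GPT45_INPUT, GPT45_OUTPUT = 75.0, 150.0
GPT4O_INPUT, GPT4O_OUTPUT = GPT45_INPUT / 30, GPT45_OUTPUT / 15


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Return the USD cost of one request at per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
    gpt45 = request_cost(2_000, 500, GPT45_INPUT, GPT45_OUTPUT)
    gpt4o = request_cost(2_000, 500, GPT4O_INPUT, GPT4O_OUTPUT)
    print(f"GPT-4.5: ${gpt45:.4f}  GPT-4o: ${gpt4o:.4f}  ratio: {gpt45 / gpt4o:.1f}x")
```

For this hypothetical request the blended cost works out to roughly 22.5 times that of GPT-4o. The effective multiplier always falls between 15 and 30, depending on how the request splits between input and output tokens.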

Prioritizing Emotional Intelligence Over Pure Reasoning

Built on a “classic” strategy of scaling data and computational power, GPT-4.5 shines in creativity and enhanced emotional intelligence. OpenAI positions it as particularly beneficial for writing, communication, training, brainstorming, and even agent-based planning. However, its multimodal capabilities remain limited: images are currently supported as input only. The context window is 128k tokens and the maximum output is 16k tokens.
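The two token limits interact: a long prompt can leave less room for output than the output cap alone suggests. The sketch below illustrates this under two assumptions not stated in the article: that "128k" and "16k" mean 128,000 and 16,000 tokens exactly, and that prompt and output tokens share the same context window. The helper name `output_budget` is hypothetical.

```python
# Limits as reported in the article; the exact values are an assumption
# ("128k" read as 128,000 tokens, "16k" read as 16,000 tokens).
CONTEXT_WINDOW = 128_000
MAX_OUTPUT = 16_000


def output_budget(prompt_tokens: int) -> int:
    """Return how many output tokens remain available for a prompt of this size.

    Assumes prompt and output tokens share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    # The budget is capped by the output limit and can never go below zero.
    return max(0, min(MAX_OUTPUT, remaining))


if __name__ == "__main__":
    for prompt in (10_000, 120_000, 130_000):
        print(prompt, "->", output_budget(prompt))
```

Under these assumptions, a 10,000-token prompt leaves the full 16,000-token output budget, a 120,000-token prompt leaves 8,000, and a prompt larger than the context window leaves none.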

Benchmark Results: Emotion Trumps Pure Logic

Benchmark tests illustrate that GPT-4.5 does not excel in tasks that demand pure reasoning—it lags behind models such as GPT-4o and Deep Research in coding and software problem-solving challenges. Yet, in agentic tasks—like executing operations in a Python environment with a Linux terminal and GPU acceleration (e.g., “Load Mistral 7B in Docker”)—it outperforms certain competitors (namely, o1 and o3-mini), although Deep Research remains superior.

Further evaluations, including tasks such as designing machine learning models or replicating pull requests as part of a developer’s workflow, position GPT-4.5 in the mid-range compared to its OpenAI peers. Its persuasive capabilities have also been tested via scenarios like MakeMePay and MakeMeSay, where GPT-4.5 demonstrates innovative strategies to secure small, frequent contributions or prompt other models to articulate key terms.

A New Kind of Cognition

Internal benchmarks—particularly the SimpleQA test for intrinsic intelligence and hallucination rates—indicate that training on synthetic data from smaller models has enabled GPT-4.5 to grasp subtle nuances and emotions more effectively. This results in a warmer, more natural interaction style that garners higher human preference ratings: 57% on everyday queries and 63.2% on professional ones.

For further insights into reasoning models and benchmark results, check out the detailed article on Artificial Intelligence.
