Discover GPT-4.5 ‘Orion’: OpenAI’s Most Powerful AI Model Yet Unveiled!

On Thursday, OpenAI unveiled GPT-4.5, also known as Orion. The new AI model is OpenAI’s largest yet, trained with more computing power and data than any of its predecessors.

However, OpenAI’s white paper notes that the company does not consider GPT-4.5 a frontier model.

Starting Thursday, subscribers to ChatGPT Pro, which costs $200 per month, will find GPT-4.5 available in ChatGPT as part of a research preview. Developers using OpenAI’s paid API tiers can also access GPT-4.5 today. Other users, like those on ChatGPT Plus and ChatGPT Team, will get it next week, as confirmed by an OpenAI spokesperson to TechCrunch.

The tech world has eagerly awaited Orion, which is widely viewed as a test of traditional AI training approaches. OpenAI trained GPT-4.5 with the same method it used for previous models: scaling up computing power and data during a phase called unsupervised learning.

In previous generations, scaling up has led to significant improvements in areas like math, writing, and coding. OpenAI notes that GPT-4.5 has a “deeper world knowledge” and “higher emotional intelligence.” However, there’s evidence that the advantages of scaling may be diminishing. GPT-4.5 has not outperformed newer reasoning models from DeepSeek, Anthropic, or even OpenAI’s own recent models.

OpenAI acknowledges that running GPT-4.5 is costly, leading the company to reconsider the model’s long-term availability in its API. Developers will pay $75 per million input tokens (roughly 750,000 words) and $150 per million output tokens. By comparison, GPT-4o costs just $2.50 per million input tokens and $10 per million output tokens.
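A quick back-of-the-envelope calculation makes that gap concrete. The sketch below assumes only the per-million-token prices quoted in this article; the model names are plain dictionary keys for illustration, not official API identifiers:

```python
# Per-million-token prices (USD) as quoted in the article.
PRICES = {
    "gpt-4.5": {"input": 75.00, "output": 150.00},
    "gpt-4o":  {"input": 2.50,  "output": 10.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single API request for `model`."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a request with 10,000 input tokens and 1,000 output tokens.
# GPT-4.5: 10,000 * $75/1M + 1,000 * $150/1M = $0.75 + $0.15 = $0.90
# GPT-4o:  10,000 * $2.50/1M + 1,000 * $10/1M = $0.025 + $0.01 = $0.035
```

At these rates, the same request costs roughly 25 times more on GPT-4.5 than on GPT-4o, which helps explain why OpenAI is hedging on keeping the model in its API long term.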

In a blog post, OpenAI stated, “We’re sharing GPT-4.5 as a research preview to better understand its strengths and limitations.” They expressed eagerness to discover how users might apply it in unexpected ways.

Mixed Performance

OpenAI emphasizes that GPT-4.5 isn’t a direct replacement for GPT-4o, the main model powering most of its API and ChatGPT. While GPT-4.5 supports features such as file and image uploads and the canvas tool, it doesn’t currently support ChatGPT’s two-way Advanced Voice Mode.

On the positive side, GPT-4.5 outshines GPT-4o and many other models in some areas. For instance, on OpenAI’s SimpleQA benchmark, which tests models on straightforward factual questions, GPT-4.5 is more accurate than GPT-4o and OpenAI’s reasoning models o1 and o3-mini. OpenAI also claims that GPT-4.5 “hallucinates” less frequently, meaning it’s less likely to generate false information.

Interestingly, OpenAI did not include one of its top reasoning models, deep research, in the SimpleQA results. An OpenAI representative said that deep research’s performance on this benchmark hasn’t been reported publicly, suggesting it isn’t a relevant comparison. Notably, Perplexity AI’s Deep Research model beats GPT-4.5 on factual accuracy tests.

SimpleQA benchmarks (Credit: OpenAI)

On coding problems, GPT-4.5 performs about as well as GPT-4o and o3-mini on the SWE-Bench Verified benchmark, though it still lags behind OpenAI’s deep research and Claude 3.7 Sonnet. On the SWE-Lancer benchmark, which assesses the ability to build complete software features, GPT-4.5 outperforms GPT-4o and o3-mini but doesn’t match deep research.

OpenAI’s SWE-Bench Verified benchmark (Credit: OpenAI)

OpenAI’s SWE-Lancer Diamond benchmark (Credit: OpenAI)

GPT-4.5 doesn’t match top reasoning models like o3-mini, DeepSeek’s R1, and Claude 3.7 Sonnet on challenging academic benchmarks such as AIME and GPQA. However, it performs on par with other non-reasoning models on these tests, suggesting strong math and science capabilities for its class.

OpenAI claims GPT-4.5 excels in ways that benchmarks may not fully measure, particularly in understanding human emotions and intents. It reportedly has a warmer tone and performs creatively well in tasks like writing and design.

In one casual test, OpenAI asked GPT-4.5 and two other models to generate an SVG unicorn. Only GPT-4.5 managed to produce something resembling a unicorn.

Left: GPT-4.5, Middle: GPT-4o, Right: o3-mini (Credit: OpenAI)

In another example, when asked about feelings of failure after a test, GPT-4.5 gave the most empathetic response, while the other models simply provided helpful suggestions.

“We’re looking forward to understanding more about GPT-4.5’s strengths and weaknesses,” OpenAI said in its blog. They acknowledged that academic benchmarks don’t always reflect practical usefulness.

GPT-4.5’s emotional intelligence in action (Credit: OpenAI)

Scaling Laws Challenged

OpenAI says GPT-4.5 pushes the limits of unsupervised learning. Yet its shortcomings also suggest that the current scaling laws may not hold forever.

Earlier, OpenAI co-founder Ilya Sutskever remarked that we may have reached “peak data” and hinted that traditional pre-training might be coming to an end. His viewpoint reflects concerns shared by many in the AI community.

In light of this, the industry is shifting toward reasoning models, which take longer to produce an answer but generally deliver more reliable results. By investing more compute at inference time, AI labs believe they can significantly improve performance.

OpenAI aims to eventually merge its GPT models with reasoning models, starting with GPT-5 expected later this year. Although GPT-4.5 faced delays and high training costs, it seems to be a step toward even stronger models in the future.


