Researchers from the Center for AI Safety and Scale AI have introduced a challenging test called “Humanity’s Last Exam.” This exam aims to assess whether today’s advanced AI systems are close to achieving human-level knowledge in various fields.
Launched in January 2025, the test features 2,500 questions spanning over 100 subjects, crafted with input from more than 1,000 experts from 500 institutions worldwide. The questions are designed to be difficult for AI to answer, requiring deep understanding rather than quick web searches.
When early models such as OpenAI's GPT-4o and Google's Gemini 1.5 Pro were first tested, the results were underwhelming; OpenAI's o1 model, for example, scored only 8.3%. Despite these low scores, the researchers predicted that advances in AI could push models above 50% accuracy on the exam by late 2025. A year later, the best result was 48.4%, achieved by Google's Gemini 3 Deep Think, while human experts typically score around 90% within their own fields.
The questions in "Humanity's Last Exam" were rigorously selected. Over 70,000 questions were submitted; roughly 13,000 survived expert review, and the final 2,500 were chosen from that pool. The result is a set that would challenge even PhD students. One question, for instance, asks about Jason's great-grandfather in Greek mythology, while another involves advanced physics, such as the forces acting on a block on a frictionless rail.
The creators of the exam emphasize its depth and breadth. Unlike earlier benchmarks such as the Massive Multitask Language Understanding (MMLU) dataset, on which leading models now score so highly that the test no longer distinguishes them, Humanity's Last Exam was built to remain difficult. Other demanding benchmarks, such as Francois Chollet's ARC-AGI, probe abstract reasoning rather than the breadth of expert knowledge this exam targets.
It is worth remembering that performing well on this exam does not mean an AI has achieved true intelligence. As neuroscientist Manuel Schottdorf of the University of Delaware points out, high scores reflect skill at answering closed questions, not the capacity for autonomous research. Mastery of the exam, in other words, is a milestone rather than the end goal: genuine general intelligence demands far more than knowledge recall.
As the AI landscape evolves, tests like Humanity's Last Exam will play a key role in gauging how close we are to machines that think like humans. The conversation around AI's progress continues to grow, with social media buzzing about these results, and people responding with a mix of excitement and caution about what they mean for the future.
This ongoing exploration of AI capabilities will likely shape our approach to technology in various fields, from education to healthcare. As researchers push forward, the quest for true intelligence remains, inviting both curiosity and scrutiny from experts and enthusiasts alike.

