Imagine asking a highly intelligent system a simple question, like an author’s PhD dissertation title or their birthday, only for it to confidently present three different, yet entirely incorrect, answers. This failure mode, common even in today’s advanced AI models, is known as hallucination.
Though the goal is to make AI systems more useful and reliable, hallucinations remain a stubborn problem. A hallucination is when a language model confidently generates an answer that isn’t true, or provides “plausible but false statements”. Even advanced models like GPT-5, while showing significant improvement in reasoning, still exhibit these confident errors.
So, why do these sophisticated systems make things up, and what can be done about it? OpenAI has addressed these questions in a recent research paper, explaining why hallucinations occur and how its models are evolving to tackle them.
What Exactly Are AI Hallucinations?
Hallucinations are not merely errors; they are statements that sound plausible but are factually incorrect. They can manifest in surprising ways, even for seemingly straightforward queries.
For instance, when a widely used chatbot was asked for the PhD dissertation title of Adam Tauman Kalai, an author of a recent OpenAI paper, it confidently produced three different, incorrect titles. When asked for his birthday, it similarly gave three wrong dates. These examples highlight how models can generate detailed, yet entirely false, information with an air of authority.
The Root Cause: “Teaching to the Test”
One of the primary reasons hallucinations persist is rooted in current evaluation methods, which set the wrong incentives. These evaluations often measure model performance in a way that encourages guessing rather than an honest acknowledgment of uncertainty.
Think of it like a multiple-choice test: if you don’t know an answer, taking a wild guess offers a chance of being right, whereas leaving it blank guarantees a zero. Similarly, when language models are graded solely on accuracy—the percentage of questions they answer correctly—they are incentivized to guess rather than to say, “I don’t know”.
For example, if a model is asked for someone’s birthday it doesn’t know, guessing “September 10” gives it a 1-in-365 chance of being right, while saying “I don’t know” guarantees zero points. Over numerous test questions, a model that guesses strategically can therefore appear to perform better on leaderboards than a cautious model that admits uncertainty. This approach reduces grading to a false dichotomy between “right” and “wrong,” overlooking the value of humility and of indicating uncertainty.
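The arithmetic behind this incentive can be sketched in a few lines of Python. The numbers here are illustrative assumptions, not figures from the paper: a model that genuinely knows 30% of the answers, and a 1-in-365 chance that a wild guess (like a birthday) happens to be right.

```python
def expected_accuracy(p_know, guess_when_unsure, p_lucky=1 / 365):
    """Expected score under an accuracy-only scoreboard:
    1 point for a correct answer, 0 for a wrong answer OR an abstention."""
    if guess_when_unsure:
        # Known answers plus the occasional lucky guess
        return p_know + (1 - p_know) * p_lucky
    return p_know  # abstaining earns nothing, so score = fraction known

def expected_error(p_know, guess_when_unsure, p_lucky=1 / 365):
    """Expected rate of confident wrong answers, i.e. hallucinations."""
    if guess_when_unsure:
        return (1 - p_know) * (1 - p_lucky)
    return 0.0  # the cautious model is never confidently wrong

guesser_acc = expected_accuracy(0.30, guess_when_unsure=True)   # slightly above 0.30
honest_acc  = expected_accuracy(0.30, guess_when_unsure=False)  # exactly 0.30
guesser_err = expected_error(0.30, guess_when_unsure=True)      # nearly 0.70
```

Under accuracy-only grading the guessing model always has the higher expected score, so the scoreboard rewards exactly the behavior that produces hallucinations.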
A comparison of the gpt-5-thinking-mini and OpenAI o4-mini models on the SimpleQA evaluation illustrates this point:
| Metric | gpt-5-thinking-mini | OpenAI o4-mini |
|---|---|---|
| Abstention rate | 52% | 1% |
| Accuracy rate (higher is better) | 22% | 24% |
| Error rate (lower is better) | 26% | 75% |
| Total | 100% | 100% |
While OpenAI o4-mini shows slightly higher accuracy, its error rate (i.e., rate of hallucination) is significantly higher. This demonstrates how strategically guessing when uncertain can improve accuracy metrics but drastically increases errors and hallucinations. These accuracy-only scoreboards dominate leaderboards, pushing developers to build models that guess rather than acknowledge limits.
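The three rows of the table have to be read together, which is easiest to see by recomputing them from raw answer counts. The counts below are back-derived from the rounded percentages in the table, not taken from the actual evaluation data:

```python
def simpleqa_breakdown(correct, wrong, abstain):
    """Split a model's answers into accuracy, error, and abstention rates.
    Note that accuracy and error need not sum to 100% once abstentions exist."""
    total = correct + wrong + abstain
    return {
        "accuracy": 100 * correct / total,
        "error": 100 * wrong / total,
        "abstention": 100 * abstain / total,
    }

# Counts chosen to reproduce the (rounded) table rows above:
mini_gpt5 = simpleqa_breakdown(correct=22, wrong=26, abstain=52)  # gpt-5-thinking-mini
mini_o4   = simpleqa_breakdown(correct=24, wrong=75, abstain=1)   # OpenAI o4-mini
# o4-mini wins on accuracy by 2 points but produces roughly 3x the errors.
```

Comparing the two dictionaries makes the trade-off explicit: almost every question o4-mini declines to abstain on becomes either a small accuracy gain or, far more often, a confident error.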
The Deeper Dive: How Next-Word Prediction Contributes
Beyond evaluation methods, the fundamental way language models learn also plays a role in hallucinations. Language models initially learn through pretraining, a process where they predict the next word in vast amounts of text. Unlike traditional machine learning, there are no “true/false” labels attached to each statement; the model only sees positive examples of fluent language and must approximate the overall distribution.
It becomes incredibly challenging for a model to distinguish between valid and invalid statements when it hasn’t seen examples labeled as invalid. While consistent patterns, like spelling and parentheses, can be learned reliably and errors in these areas disappear with scale, arbitrary, low-frequency facts—much like a pet’s birthday—cannot be predicted from patterns alone.
These unpredictable facts are a source of hallucinations. Ideally, subsequent stages after pretraining should eliminate these errors, but this isn’t fully successful due to the aforementioned evaluation incentives.
Beyond Accuracy: Unpacking Common Misconceptions
OpenAI’s research clarifies several common misconceptions about hallucinations:
- Claim: Hallucinations will be eliminated by improving accuracy because a 100% accurate model never hallucinates.
- Finding: Accuracy will never reach 100% because some real-world questions are inherently unanswerable due to unavailable information, limited model abilities, or ambiguities.
- Claim: Hallucinations are inevitable.
- Finding: They are not, because language models can abstain when uncertain, much like a human saying “I don’t know”.
- Claim: Avoiding hallucinations requires a degree of intelligence exclusively achievable with larger models.
- Finding: It can be easier for a small model to know its limits. A small model with no knowledge of Māori, for example, can simply state “I don’t know” when asked a question in Māori, while a model with partial knowledge might struggle to determine its confidence. Being “calibrated” (knowing when you don’t know) requires less computation than being perfectly accurate.
- Claim: Hallucinations are a mysterious glitch in modern language models.
- Finding: We understand the statistical mechanisms through which hallucinations arise and are even rewarded in current evaluations.
- Claim: To measure hallucinations, we just need a good hallucination evaluation.
- Finding: Good hallucination evaluations already exist. However, they have little impact against hundreds of traditional accuracy-based evaluations that actively penalize humility and reward guessing.
Towards a Solution: Cultivating AI Humility
The path to rectifying hallucinations is clear, though challenging: we need to change the way we evaluate AI.
The fix is straightforward: penalize confident errors more heavily than expressions of uncertainty, and give partial credit for appropriate admissions of uncertainty. This isn’t a new idea; standardized tests have long used negative marking for wrong answers or partial credit for blank responses to discourage blind guessing.
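Such a scoring rule can be sketched in a few lines. The specific values, a 1-point penalty per confident error and 0.25 points of partial credit per abstention, are illustrative assumptions, not numbers from the paper or any benchmark:

```python
def accuracy_only(outcome):
    """Today's common scoreboard: abstaining scores the same as being wrong."""
    return {"correct": 1.0, "wrong": 0.0, "abstain": 0.0}[outcome]

def uncertainty_aware(outcome, wrong_penalty=1.0, abstain_credit=0.25):
    """Negative marking: confident errors cost points, and an honest
    'I don't know' earns partial credit. Parameter values are illustrative."""
    return {"correct": 1.0,
            "wrong": -wrong_penalty,
            "abstain": abstain_credit}[outcome]

def mean_score(outcomes, rule):
    """Average score of a list of per-question outcomes under a grading rule."""
    return sum(rule(o) for o in outcomes) / len(outcomes)

# Outcome mixes shaped like the SimpleQA comparison earlier in this article:
guesser = ["correct"] * 24 + ["wrong"] * 75 + ["abstain"] * 1
honest  = ["correct"] * 22 + ["wrong"] * 26 + ["abstain"] * 52

# Accuracy-only grading favors the guesser; negative marking flips the ranking.
assert mean_score(guesser, accuracy_only) > mean_score(honest, accuracy_only)
assert mean_score(honest, uncertainty_aware) > mean_score(guesser, uncertainty_aware)
```

The exact penalty and credit values are a design choice; what matters is that a confident wrong answer scores strictly worse than an abstention, so guessing stops being the dominant strategy.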
It’s not enough to simply add a few new uncertainty-aware tests. Instead, the widely used, accuracy-based evaluations must be updated so their scoring actively discourages guessing. If the main scoreboards continue to reward lucky guesses, models will continue to learn to guess. By fixing these scoreboards, we can broaden the adoption of effective hallucination-reduction techniques.
Key Takeaways
- AI hallucinations are confident but incorrect statements generated by language models.
- Current evaluation methods often incentivize guessing over admitting uncertainty.
- Addressing hallucinations requires penalizing confident errors and rewarding expressions of uncertainty.
- Hallucinations are not inevitable and can be mitigated by cultivating AI humility.
- Accuracy-based evaluations need updates to discourage guessing and promote effective hallucination-reduction techniques.