Warning about ‘hallucinating’ ChatGPT models

Recent testing shows that o3 and o4-mini – the most powerful reasoning models in OpenAI’s product portfolio – fabricate false information more often than their predecessors.


Just two days after announcing GPT-4.1, OpenAI officially launched not one but two new models, named o3 and o4-mini. Both demonstrate stronger reasoning capabilities and bring a number of notable improvements.

However, according to TechCrunch, this pair of new models still suffers from “hallucination”, or fabricating information. In fact, they hallucinate more than some of OpenAI’s older models.

According to IBM, hallucinations occur when a large language model (LLM) – typically a chatbot or computer vision tool – perceives patterns or objects that do not exist or are imperceptible to humans, producing results that are nonsensical or misleading.

In other words, users expect the AI to produce accurate answers grounded in the data it was trained on. In some cases, however, the model’s output is not grounded in any real data, resulting in “hallucinated” responses.

In its latest report, OpenAI found that o3 “hallucinated” on 33% of the questions in PersonQA, the company’s internal benchmark for measuring the accuracy of a model’s knowledge about people.

For comparison, that is roughly double the hallucination rate of OpenAI’s previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. Meanwhile, the o4-mini model did even worse on PersonQA, hallucinating on 48% of the questions.
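To make the metric concrete: a hallucination rate like the 33% figure above is simply the fraction of benchmark questions on which the model’s answer does not match the known facts. PersonQA itself is internal to OpenAI and its grading method is not public, so the sketch below is purely illustrative – the names `Example`, `is_hallucination`, and `hallucination_rate`, and the toy containment-based grader, are all assumptions, not OpenAI’s actual evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    question: str
    ground_truth: str


def is_hallucination(answer: str, truth: str) -> bool:
    # Hypothetical grader: a real benchmark would use exact-match rules
    # or a model-based judge; simple containment stands in for that here.
    return truth.lower() not in answer.lower()


def hallucination_rate(examples: List[Example],
                       answer_fn: Callable[[str], str]) -> float:
    # Fraction of questions on which the model's answer fails to
    # reflect the ground truth.
    hallucinated = sum(
        is_hallucination(answer_fn(ex.question), ex.ground_truth)
        for ex in examples
    )
    return hallucinated / len(examples)
```

Under this kind of measurement, a returned rate of 0.33 would correspond to the 33% figure reported for o3, and 0.48 to the 48% figure for o4-mini.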

More worryingly, OpenAI itself does not actually know why this happens. In the technical report on o3 and o4-mini, the company writes that “further research is needed to understand why hallucinations get worse” as reasoning models are scaled up.

The o3 and o4-mini models performed better in some areas, including programming and math-related tasks. However, because they make more claims overall, both models end up producing “more correct statements, but also more incorrect ones.”


By hightechz.net
