
New study finds way to spot AI hallucinations

The rise of artificial intelligence is a cause for concern (PA)

Researchers have found a new method that could help spot when generative AI is likely to “hallucinate” – invent facts because it does not know the answer to a query – and help prevent such incidents from occurring.

In a new study, a team from the University of Oxford developed a statistical method that can identify when a question asked of a large language model (LLM) – the technology used to power generative AI chatbots – is likely to produce an incorrect answer.

Their research has been published in the journal Nature.

Hallucinations have been identified as a key concern around generative AI models, as the sophistication of the technology and its conversational ability mean such models can pass off made-up information as fact in order to respond to a query.

At a time when more students are turning to generative AI tools to help with research and to complete assignments – tasks many models are marketed as being helpful with – many industry experts and AI scientists are calling for more action to combat AI hallucinations, particularly when it comes to medical or legal queries.

The researchers at the University of Oxford said their work had found a way of telling the difference between when a model is certain about an answer and when it is just making something up.

Study author Dr Sebastian Farquhar said: “LLMs are highly capable of saying the same thing in many different ways, which can make it difficult to tell when they are certain about an answer and when they are literally just making something up.

“With previous approaches, it wasn’t possible to tell the difference between a model being uncertain about what to say versus being uncertain about how to say it. But our new method overcomes this.”
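The idea Dr Farquhar describes – separating uncertainty about what to say from uncertainty about how to say it – can be illustrated with a simple sketch: ask the model the same question several times, group the answers that mean the same thing, and treat a wide spread of distinct meanings as a warning sign. The outline below is an illustration of that general idea, not the Oxford team’s implementation; the answer-sampling function, the meaning-equivalence check and the number of samples are all assumed placeholders.

import math

def semantic_uncertainty(question, sample_answer, same_meaning, n_samples=10):
    # Illustrative sketch only. `sample_answer(question)` stands in for a call
    # to an LLM, and `same_meaning(a, b)` stands in for some check that two
    # answers express the same fact. Both are assumptions, not part of the study.
    answers = [sample_answer(question) for _ in range(n_samples)]

    # Group answers by meaning, so different phrasings of the same answer
    # count as one cluster rather than many.
    clusters = []
    for ans in answers:
        for cluster in clusters:
            if same_meaning(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])

    # Entropy over the meaning clusters: low when the model keeps giving the
    # same answer in different words, high when the meanings themselves vary.
    probs = [len(c) / n_samples for c in clusters]
    return -sum(p * math.log(p) for p in probs)

In such a sketch, a high score would suggest the model is unsure about what to say rather than merely how to say it, and a simple threshold could then flag the answer for review.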

But Dr Farquhar said there was further work to do on ironing out the errors AI models can make.

“Semantic uncertainty helps with specific reliability problems, but this is only part of the story,” he said.

“If an LLM makes consistent mistakes, this new method won’t catch that. The most dangerous failures of AI come when a system does something bad but is confident and systematic.

“There is still a lot of work to do.”