13 December 2023

Hallucinations are a major problem for CX-focused generative AI - guest blog by Jonas Berggren

AI Hallucinations

Roberto Mata was a passenger on an Avianca Airlines flight in 2019. He claims that he sustained an injury from a serving cart on the flight and that the injury was caused by the negligence of the airline's cabin crew.

Mr Mata engaged a lawyer to present his case to the court: Steven Schwartz, an attorney with Levidow, Levidow & Oberman who had been licensed in New York for more than three decades.

Mr Schwartz submitted the negligence case to the court and cited six similar cases where the airline had always paid compensation. The only problem was that he had researched the cases using ChatGPT, and ChatGPT had hallucinated the results.

None of the cases really existed. Schwartz then found himself defending his own conduct, facing court sanctions for wasting the court’s time by submitting fictitious case citations.

What happened?

ChatGPT is very smart. It has been trained on billions of connections between different words - in the case of GPT-4, the word connections reportedly number over one trillion. But ChatGPT does not ‘know’ facts; it predicts the most likely next words. So when it is asked a question and is not certain of the answer, it will sometimes insert ‘almost right’ or ‘likely’ information instead of simply saying that it doesn’t know.

It’s a bit like when you ask a child a question, and they start making up an answer that is wrong because giving any answer is preferable to just saying, ‘I don’t know.’
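To make that tendency concrete, here is a deliberately toy sketch in Python (the phrases and probabilities are invented purely for illustration; a real model scores individual tokens, not whole sentences): whatever scores highest gets emitted, and admitting uncertainty is rarely the highest-scoring continuation.

```python
# Toy illustration only - the phrases and probabilities below are invented.
# A real language model scores possible next tokens and emits a likely one;
# "I don't know" is just another continuation, and rarely the top-scoring one.
candidate_continuations = {
    "the court ruled in favour of the passenger": 0.46,   # plausible-sounding
    "the case was settled before trial": 0.38,            # also plausible
    "I don't know whether such a case exists": 0.16,      # rarely wins
}

# Greedy decoding: always emit the highest-probability continuation,
# whether or not it corresponds to anything real.
best = max(candidate_continuations, key=candidate_continuations.get)
print("Model continues with:", best)
```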

In his 2023 book, ‘The AI-Empowered Customer Experience’, Simon Kriss explains how hallucination works:

We are asking the AI to write an example crime report for a stolen bicycle but giving it limited information to work with. Let’s see what else it hallucinates to fill in the gaps.

Generative AI (gAI) Prompt: Write me an example of what the narrative of a completed crime report might be for the overnight theft of a bicycle. Write it in a formal criminal justice tone of voice.

Because we gave the prompt very little information, the AI will hallucinate its own. When I did this in ChatGPT, the hallucinations included that the reporting person was a 43-year-old male who was visibly distressed and that the Trek 500 bicycle had an approximate value of $2,500. None of this is true, nor was it supplied to the gAI engine.
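If you want to repeat Simon’s experiment yourself, a minimal sketch using the OpenAI Python client might look like this (the model name is an assumption; use whichever chat model you have access to, and expect different invented details on every run):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

prompt = (
    "Write me an example of what the narrative of a completed crime report "
    "might be for the overnight theft of a bicycle. Write it in a formal "
    "criminal justice tone of voice."
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: substitute whichever chat model you use
    messages=[{"role": "user", "content": prompt}],
)

# The narrative will include details (names, ages, bicycle value, and so on)
# that were never supplied in the prompt - the model fills the gaps itself.
print(response.choices[0].message.content)
```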

Simon’s book - and the Avianca case - clearly demonstrate that when a generative AI tool like ChatGPT is not sure of the next statement, it will often fill in the blanks with the most likely text. Simon didn’t specify the age of the person whose bicycle was stolen, but age may be required information on a crime report, so ChatGPT simply made some assumptions.

This has important ramifications for the use of gAI in the customer service environment. We must appreciate that ChatGPT (or similar AI models) is not just a dictionary of answers or a big list of Frequently Asked Questions. It is an AI system trained on a large amount of data, and each answer is generated in real time - it is not pre-scripted.
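The difference can be sketched in a few lines of Python (the function and data names here are hypothetical, purely to illustrate the contrast): a scripted FAQ bot can honestly fall back to ‘I don’t know’, while a generative bot composes fresh text every time.

```python
# Hypothetical sketch of the contrast described above - not a real product.
FAQ_ANSWERS = {
    "what is your refund policy?": "Refunds are available within 30 days of purchase.",
}

def faq_bot(question: str) -> str:
    # Pre-scripted: an unknown question falls through to an honest fallback.
    return FAQ_ANSWERS.get(question.lower().strip(),
                           "Sorry, I don't have an answer for that.")

def call_llm(question: str) -> str:
    # Stand-in for a real gAI API call; a real model would return freshly
    # generated, fluent text here rather than a stored answer.
    return f"(newly generated answer about: {question})"

def generative_bot(question: str) -> str:
    # Generated in real time: the model always produces *something* fluent,
    # whether or not the underlying facts exist or were ever supplied.
    return call_llm(question)

print(faq_bot("Can I pay by invoice?"))         # honest fallback
print(generative_bot("Can I pay by invoice?"))  # always an answer, right or not
```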

We should not be surprised by this tendency to hallucinate. Humans do it all the time. If you have never heard of the Simons and Chabris selective attention test, you can try it for yourself on YouTube. It demonstrates that when our eyes are focused on one task, our brain can completely ignore something else in plain view. Our eyes are such detailed cameras that if our brain had to process every single ‘pixel’, it would be impossible ever to get anything done - so the brain filters, and fills in the gaps with what it expects.

Musicians will also know that when they are in a flow state, their hands will move to the shape of chords and notes without any active thought about which finger needs to go where. In fact, this is why learners of the guitar and piano struggle - they are still thinking about each individual finger and where it needs to be.

As Large Language Models get bigger, companies such as OpenAI have promised that hallucinations will become much less frequent. However, if you are designing a chatbot using gAI for your customers, then how often will it be acceptable for the bot to hallucinate an answer with information that doesn’t really exist?

Perhaps answering correctly 99% of the time is OK for a bot that is focused on giving recipe advice to retail customers. The worst that could happen is a poor mix of ingredients. But what about regulated industries?

If a government tax agency uses gAI to advise customers on tax regulations, then it is clearly not acceptable for 1% of the advice to be false. The same applies to a doctor using gAI to get a second opinion on a diagnosis. What about a customer using a bot to ask a bank for financial advice?
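To put that 1% in perspective, here is a quick back-of-the-envelope calculation (the daily interaction volume is an assumed figure, chosen only to illustrate the scale):

```python
# Back-of-the-envelope: what a 1% hallucination rate means at volume.
# The daily interaction count is an assumption for illustration only.
daily_interactions = 10_000
accuracy = 0.99

wrong_answers_per_day = daily_interactions * (1 - accuracy)
print(f"Roughly {wrong_answers_per_day:.0f} potentially hallucinated answers per day")
# Roughly 100 potentially hallucinated answers per day - every day.
```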

There are many customer interactions where the risk of gAI hallucination may rule it out as a viable solution. Humans may still make mistakes, but when they are not sure of an answer, they will check, crosscheck, and ask a supervisor for advice - not just hallucinate an answer.

AI companies may be reducing the prevalence of hallucination, but as long as it still appears in generative AI answers, it will remain a problem for some important customer interactions.


Written by Jonas Berggren, Head of Business Development NE

Jonas Berggren joined Transcom in 2020 as Head of Business Development, Northern Europe. Before that, he was a co-founder and partner of Feedback Lab by Differ, and earlier in his career he was CEO of Teleperformance Nordic.
