The automatic transcription tool Whisper, launched by OpenAI in September 2022, has raised concerns among healthcare professionals because of its tendency to introduce “hallucinations” into transcribed text. Although this open-source technology was designed to transcribe conversations in multiple languages, several engineers, researchers, and clinicians have detected accuracy problems in its output, casting doubt on its reliability in critical environments such as hospitals.
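For context, the open-source release can be run locally in a few lines of Python. A minimal sketch using the `whisper` package (the file name and model size are illustrative):

```python
# Minimal sketch: transcribe a recording with the open-source `whisper`
# package (https://github.com/openai/whisper). "consultation.mp3" is a
# placeholder file name.
import whisper

# Load one of the published checkpoints; "base" trades accuracy for speed.
model = whisper.load_model("base")

# transcribe() performs decoding and language detection by default.
result = model.transcribe("consultation.mp3")

print(result["text"])  # the full transcript
for segment in result["segments"]:
    # Each segment carries timestamps, which makes it possible to spot
    # text that appears where the audio was actually silent.
    print(f'[{segment["start"]:.1f}s - {segment["end"]:.1f}s] {segment["text"]}')
```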
With more than 30,000 doctors and 40 health systems using Whisper to record meetings and patient consultations, the consequences of these errors can be significant. Although Whisper was trained on more than 680,000 hours of audio collected from the internet, recent studies indicate that hallucinations in its transcriptions are common: one researcher found distortions in 8 out of 10 transcriptions; another, after reviewing more than 100 hours of transcripts, found that about half contained inaccurate information; and a developer found problems in nearly all of his 26,000 transcriptions.
What Are Whisper’s Hallucinations?
Whisper’s so-called “hallucinations” range from violent or racist phrases to invented diseases and nonsensical expressions that appear during silences in the recordings. In some transcriptions, the system has even inserted stock phrases from YouTube videos, such as “Thank you for watching.” Hallucination is a familiar failure mode in chatbots, but it is unusual in transcription tools, which are expected to reproduce the original audio faithfully.
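Because many of these insertions appear during silent stretches, one commonly suggested workaround (a general preprocessing technique, not an OpenAI recommendation) is to strip silence before transcription. A minimal sketch, assuming the `pydub` library and the placeholder file name from above; the thresholds are illustrative and would need tuning per recording setup:

```python
# Sketch: drop silent stretches before transcription, since hallucinated
# text often shows up where the audio contains no speech.
from pydub import AudioSegment
from pydub.silence import split_on_silence

import whisper

audio = AudioSegment.from_file("consultation.mp3")

# Keep only the chunks that contain sound; retain a little padding so
# words at chunk boundaries are not clipped.
chunks = split_on_silence(
    audio,
    min_silence_len=1000,            # treat >= 1 s of quiet as silence
    silence_thresh=audio.dBFS - 16,  # relative to the file's average loudness
    keep_silence=200,                # ms of padding kept around each chunk
)

speech_only = sum(chunks, AudioSegment.empty())
speech_only.export("consultation_trimmed.mp3", format="mp3")

model = whisper.load_model("base")
result = model.transcribe("consultation_trimmed.mp3")
print(result["text"])
```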
These issues have led some hospitals to reconsider Whisper’s use in critical contexts. OpenAI has thanked researchers for sharing their findings and announced that it will keep working to improve the model’s accuracy, particularly by reducing hallucinations. It has also emphasized that the tool should not be used in high-stakes decision-making contexts.
A Call for Caution in the Adoption of AI in Healthcare
The Whisper findings have highlighted the challenges of applying artificial intelligence in the healthcare sector, where precision is essential for patient safety. As AI advances, the medical community insists that these models be subjected to rigorous testing before they are deployed in high-stakes environments such as hospitals.
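One concrete form such testing can take is benchmarking model output against human-verified reference transcripts. A minimal sketch using the `jiwer` package; the transcripts shown are invented for illustration:

```python
# Sketch: quantify transcription accuracy as word error rate (WER)
# against a clinician-verified reference. Hallucinated text the speaker
# never said surfaces as insertion errors.
from jiwer import wer

reference  = "patient reports mild chest pain since tuesday"
hypothesis = "patient reports mild chest pain since tuesday thank you for watching"

error_rate = wer(reference, hypothesis)
print(f"WER: {error_rate:.2f}")  # nonzero here because of the inserted phrase
```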
As OpenAI continues to improve its technology, healthcare professionals, engineers, and research centers are evaluating the impact of Whisper and other AI systems in the field, underscoring the importance of ensuring these tools are reliable in an area as sensitive as healthcare.