Concerns Arise Over Transcriptions from OpenAI’s Whisper
Software engineers, developers, and academic researchers are raising serious concerns about the accuracy of transcriptions generated by OpenAI’s Whisper, as reported by the Associated Press.
Generative AI is known to produce inaccurate or misleading output, but "hallucinations" in transcription are particularly troubling because a transcript is expected to faithfully reflect the source audio. Researchers have nonetheless found instances where Whisper inserted fabricated content, including racial commentary and fictitious medical treatments.
According to the AP report, a University of Michigan researcher studying public meetings found hallucinations in 80% of the Whisper transcriptions examined. A machine learning engineer who reviewed more than 100 hours of Whisper transcriptions identified hallucinations in over half of them, and another developer reported finding hallucinations in nearly all of the 26,000 transcripts created using the tool.
An OpenAI spokesperson acknowledged the issue and said the company is actively working to improve the accuracy of its models, with a specific focus on reducing hallucinations. The spokesperson also emphasized that Whisper's usage policies explicitly prohibit its use in high-stakes decision-making contexts, such as medical settings.
“We appreciate the feedback and research provided by experts in the field,” the spokesperson added.