Transcription errors with NZ accent
Jonathan Peters
I'm not sure which ASR model Heidi uses under the hood, but my first impression is that it struggles to accurately transcribe speech in New Zealand accents. I have found NVIDIA's recently released Parakeet models more accurate than competitors for speech-to-text with NZ accents. Could, say, parakeet-tdt-0.6b-v2 be deployed within Heidi?
Tom
Hey Jonathan Peters, thanks for your feedback! I have a few more questions for you:
- Can you provide specific examples or phrases where the transcription errors occur most frequently?
- Have you noticed if the transcription accuracy improves or worsens in different environments or background noise levels?
- Are there specific words or medical terms that are consistently transcribed incorrectly?
Jonathan Peters
Hi Tom. Thanks for getting back to me!
I won't be able to answer question 2, as all of my sessions take place in similarly quiet environments with minimal background noise. Answering questions 1 and 3 will take a longer assessment period. My latest (and first) Heidi experience produced a transcript that was full of errors and very difficult to decipher. I'll keep using Heidi for a few months and get back to you if I pick up any clearer patterns.
Further thoughts:
From reading what documentation I can find about Heidi, it seems that its ASR model is a bespoke one made specifically for medical contexts (i.e. Heidi is not deploying one of the more mainstream/frontier ASR models behind the scenes). Can you confirm that?
In the future, would it be possible to select different ASR models within Heidi?
My role as a clinical psychologist means that my conversations with clients typically don't take a medical flavour. An additional point of concern is that the transcription of my last session stopped after around 6500 words (about halfway through the session); I might have hit a token limit mid-transcription. This suggests an alternative hypothesis to the NZ accent being the point of difficulty: if I'm right about the medical orientation of the model, it might not be tuned for an hour-long talking therapy session. Keen to keep this conversation alive!
Cheers
Jonathan
Jonathan Peters
Tom, I have an answer to question 3 that supports the NZ accent hypothesis: Heidi consistently transcribes "Pumice" as "Thomas". "Pumice" in an NZ English accent sounds a lot like "Thomas" in a US English accent. I've tried that word pair in Parakeet, and it has no such issue. Hope that is helpful!