The task of transcribing research interviews takes a frustratingly long time. Not only do you need to do the laborious work of typing out every word that is uttered on your audio recording/s, you also need to distinguish between different voices (which can get tricky with focus groups) and continually rewind the audio to catch any words that are jumbled, mispronounced, or hard to hear.
This means that even if you’re a lightning-quick typist, your transcription process can still be slow.
How long does it take? A good rule of thumb is to allow about 5-6 hours of transcription per 1 hour of audio. Professional transcribers might be able to reduce that to 2-3 hours,* but this is still a huge time investment, especially if you have a lot of audio to transcribe.
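To see what those ratios mean for a whole project, here is a minimal Python sketch of the arithmetic. The function name and the 10-hour project are hypothetical, chosen only for illustration; the ratios are the rules of thumb quoted above.

```python
def transcription_hours(audio_hours, hours_per_audio_hour):
    """Return a (low, high) estimate of typing hours for the given audio."""
    low, high = hours_per_audio_hour
    return audio_hours * low, audio_hours * high


# Rules of thumb from the article: hours of work per hour of audio.
RESEARCHER = (5, 6)
PROFESSIONAL = (2, 3)

audio_hours = 10  # hypothetical project: ten one-hour interviews

print(transcription_hours(audio_hours, RESEARCHER))    # (50, 60)
print(transcription_hours(audio_hours, PROFESSIONAL))  # (20, 30)
```

In other words, ten hours of interviews could mean 50-60 hours of typing for a researcher working alone.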
A reliable automated transcription tool would be the holy grail of research software. Imagine if you could skip those long hours of tedious typing and get perfect transcripts at the click of a button!
Alas, although software developers have done amazing things to simplify many research tasks, they haven’t yet created a perfect option for transcription. Current voice-to-text technology struggles to interpret a wide range of accents, pronunciations, and vocabularies; and human participants (and interviewers) will often speak simultaneously or inaudibly, making it difficult for any software to interpret what is being said.
Bottom line: human ears are still the gold standard transcription tool.
But this is changing. Automated transcription tools do exist, even if they aren’t perfect. They won’t fully do the job of creating accurate transcripts for you, but they can give you a head start.
When you use an automated transcription service, you sign up either for a subscription, or for an account with a set price per audio minute. You submit your audio files to the service provider, and they use their technology (typically AI speech recognition and natural language processing) to approximate a written version of what is said in the audio recording. They then return that written copy to you.
Otter (US$12.99/month for up to 100 audio hours) and Scribie (US$0.10 per audio minute, or US$6 per audio hour) are two options that work reasonably well compared to some competitors. NVivo also has an automated transcription service, though it has mixed reviews.
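If you want to compare the two pricing models for your own project, the sums can be sketched in a few lines of Python. This is a rough illustration only: it uses the prices quoted above (which may have changed), the function names are hypothetical, and it assumes that exceeding Otter's 100-hour allowance simply means paying for another month.

```python
import math

# Prices as quoted in the article; check the providers' sites for current rates.
OTTER_MONTHLY_USD = 12.99
OTTER_MONTHLY_CAP_HOURS = 100
SCRIBIE_USD_PER_MINUTE = 0.10


def scribie_cost(audio_hours):
    """Pay-per-minute cost in US$ for the given hours of audio."""
    return round(audio_hours * 60 * SCRIBIE_USD_PER_MINUTE, 2)


def otter_cost(audio_hours):
    """Subscription cost in US$, assuming one extra month per 100 hours (an assumption)."""
    months = max(1, math.ceil(audio_hours / OTTER_MONTHLY_CAP_HOURS))
    return round(months * OTTER_MONTHLY_USD, 2)


for hours in (1, 20, 150):  # hypothetical project sizes
    print(hours, "audio hours:", scribie_cost(hours), "vs", otter_cost(hours))
```

On these figures, the per-minute option is cheaper only for very small jobs: the flat monthly fee equals roughly 130 minutes of per-minute pricing.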
The quality of these automated transcripts can vary a lot between service providers and between recordings. Many claim an accuracy rate of around 80% (or as low as 60% for poor-quality audio). The team at Academic Consulting Ltd** (who provide data analysis training for AUT staff & students) recently reviewed a selection of automated transcription providers and found some common issues:
- an inability to identify when speakers started and stopped
- an inability to represent nonverbal cues or emotions
- inconsistent ability to represent filler words
- poor punctuation
Because no human beings review the automated transcripts, these errors stay in the version sent to you. That means you need to work through and make corrections manually. (However, there are plenty of services that provide full or partial professional transcription to reduce the incidence of errors.)
So: should you pay for automated transcription through one of these services? Our advice is: maybe, but it depends on your circumstances.
If you are a very slow typist, automated services can cut down the amount of time you spend on your transcripts. You will still need to spend a fair amount of time correcting the written documents that the service provides, and you will still need to refer back to your audio recordings to ensure the fullness and accuracy of the transcripts. But an automated service can reduce the initial workload of getting the bulk of the words onto (digital) paper.
If you have a lot of audio to transcribe, automated transcription can save a lot of time. Those with dozens or hundreds of recorded interviews may find the task of manual transcription overwhelming, yet the cost of professional transcription prohibitive. In that circumstance, automated transcription (with manual corrections) can be a good compromise. You may like to pay for just one or two transcripts first to give you an idea of their quality and test out different service providers. Some providers offer a free trial, which is good for this purpose.
If your audio is relatively clear, you may also like to consider automated transcription because the accuracy of the outputs can make it worth the money. Clear audio means better voice-to-text recognition, which means fewer manual corrections. This can improve the value of automated transcription.
And of course, if you have research funds to spare, it may make sense to pay for automated transcription rather than let the money sit. (Though if you have the cash to spend, you might prefer to pay that little bit more for a higher-quality professional transcriber.)
However, a couple of caveats with automated transcription:
- You will still need to spend a lot of time on corrections. Don’t make the mistake of thinking that automated transcription will reduce your time commitment to zero. Inaccurate transcripts make for sloppy (and perhaps even irresponsible) research. It would be horrifying to misquote a participant because of a transcription error.
- Check with your ethics advisor. Automated transcription services need to store and work with your audio files, and in some cases that could be a violation of the data security or confidentiality provisions in your ethics approval. As soon as any third party has access to your files, you will need to know how the files are being stored, how/when they are deleted, and who has access to them; and you will need to ensure that any access is appropriate and approved.
Have you tried an automated transcription service? Share your experience in the comments!
*Professional transcribers aren’t just fast typists; they also increase their efficiency with professional transcription tools such as specialised audio players, audio quality enhancement tools, and even foot pedals for hands-free control of audio recordings. That’s why their speed of 2-3 hours per audio hour is often unachievable for even quick-fingered researchers.
**Special thanks to Andrew Lavery for his quality analysis of automated transcription services, and to Dr Lyn Lavery for contributing her knowledge to this article.
Thanks for the article. It summarized the highlights of using automated services for transcription well. I'd like to ask if there is any feedback on using Office365 for this purpose?
(Transcribe your recordings: https://support.microsoft.com/en-us/office/transcribe-your-recordings-7fc2efec-245e-45f0-b053-2a97531ecf57)
Regards,
Hi Shabnam, I personally haven’t tried Office365 for transcription purposes – would love to hear from anyone who has! One thing to be careful of with Office365 transcription is the privacy implications. Its transcription service uses Word for the Web, which could violate some ethics approvals depending on how the audio files are accessed and stored.