Critics are quick to argue that the technology isn’t mature enough to be implemented across all industries, while advocates are just as quick to disagree. Their stance? Speech recognition is ready, and it is now both faster and more precise than people.
At Capturi, one of our top speech recognition specialists, Morten Højfeldt Rasmussen, has worked with the technology since 2004. In this article, we pick his brain to understand the benefits and disadvantages of weaving speech recognition into our everyday workflows.
This blog post is part 2 of 3 in a blog series.
Blog post 1: The magic of speech recognition: What are the possibilities for businesses?
Benefits of speech recognition
1. Work faster
The main and obvious advantage of speech recognition is that it streamlines any work process that involves writing. We’ve all been in a situation where we couldn’t keep up with the conversation because we were trying to jot down notes. In hindsight, it would’ve been much more helpful to use speech recognition instead. Morten explains:
“Personally, I think it’s wildly interesting that you can go to a meeting with your mobile without having to consider taking notes. You can focus 100% on your meeting.”
In a world where administration and documentation are a major part of virtually all industries, speech recognition is powerful enough to transcribe conversations and facilitate many writing tasks. Most of us would agree that our time is better spent collaborating than rushing to write something down.
2. Be more intuitive and objective
To many people, it’s more natural to speak than to write their thoughts down. What’s more, our brains come up with ideas much faster than we can write them down.
The advantage of speech recognition here is that it provides a better platform for employees who find it easier to collaborate without the burden of writing things down to remember later. Speech recognition takes care of that for you. In addition, because it’s a machine, speech recognition is objective. Morten says:
“The speech recognizer does not have a specific agenda and doesn’t cherry-pick specific parts of a conversation. As a machine, it’s impartial, consistent and punctual. It always outputs something.”
A great benefit of using speech recognition is avoiding human error. We are not perfect, and we tend (consciously or unconsciously) to be subjective. Our business can sometimes suffer because of it.
3. Save time
We live increasingly busy lives, so why wouldn’t you consider any solution that helps save time? The third advantage of speech recognition is just that: time optimization. A Stanford experiment from 2016 showed that entering text with speech recognition was about three times faster (and more accurate) than typing on a smartphone keyboard. When Capturi uses speech recognition to transform speech into text, it also uses the technology’s ability to organize information intelligently. Morten explains how this is possible:
“Each word is tagged with a start and end point, divided into intervals and tagged with a probability. For each time interval, all alternatives are considered. If I say 'one', for example, have I said one or won? We save and test all the different alternatives in the transcription, and, in that way, we can still search all the variables.”
When you don’t have to take notes or spend time searching through them, you free up time for creative thinking, personal advancement and other opportunities in your career. You can simply record and, if necessary, search and relisten to topics from your meetings.
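To make Morten’s description a little more concrete, here is a minimal Python sketch of how such a searchable transcription could be stored. It is not Capturi’s actual implementation: the names `Alternative`, `Interval`, `best_transcript` and `search`, as well as the sample words and probabilities, are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Alternative:
    """One candidate word for a time interval (hypothetical structure)."""
    word: str
    probability: float


@dataclass(frozen=True)
class Interval:
    """A slice of the recording with every candidate word kept."""
    start: float  # seconds from the start of the recording
    end: float
    alternatives: Tuple[Alternative, ...]


def best_transcript(intervals: List[Interval]) -> str:
    """Join the most probable word from each interval into plain text."""
    return " ".join(
        max(interval.alternatives, key=lambda alt: alt.probability).word
        for interval in intervals
    )


def search(intervals: List[Interval], query: str,
           min_probability: float = 0.0) -> List[Tuple[float, float, float]]:
    """Find a word among ALL alternatives, not only the most probable one.

    Returns (start, end, probability) for each interval where the word
    appears with at least min_probability.
    """
    query = query.lower()
    hits = []
    for interval in intervals:
        for alt in interval.alternatives:
            if alt.word.lower() == query and alt.probability >= min_probability:
                hits.append((interval.start, interval.end, alt.probability))
    return hits


# Made-up example data: did the speaker say "one" or "won"?
recording = [
    Interval(0.0, 0.4, (Alternative("we", 0.9), Alternative("he", 0.1))),
    Interval(0.4, 0.8, (Alternative("won", 0.6), Alternative("one", 0.4))),
    Interval(0.8, 1.3, (Alternative("contract", 0.95), Alternative("contact", 0.05))),
]

print(best_transcript(recording))
print(search(recording, "one"))
```

In this sketch the plain-text transcript picks "won", but a search for "one" still finds the same moment in the recording, because the less likely alternative was saved together with its start and end time. That is the idea behind being able to jump back and relisten to a topic even when the recognizer’s first guess was wrong.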
Disadvantages of speech recognition
1. Sensitivity to environment
One of the challenges of speech recognition is its sensitivity to sound. Morten explained that this is especially challenging in open-plan offices, where there is often more background noise. That noise affects the recording, so the user has to think about limiting interference: closing windows and doors, or finding a quieter corner of the building.
2. Learning curve
Another disadvantage is that it may be awkward to record and listen to yourself talking. Just as it may be strange for some to see themselves on video or hear themselves on a voicemail, it can be weird for you and your coworkers to hear sequences from meeting conversations.
As with all new tech, there is a bit of a learning curve before you can use it successfully. However, once you’ve mastered speech recognition, you’ll improve your quality of work life immensely.
In general, hearing your own voice can make you more self-aware and possibly even make you hold back. However, from an efficiency perspective, this self-awareness can be seen as a positive consequence that nudges you to be more professional in what you choose to say in meetings. You might even save the "what did you do last weekend?" conversation for the lunch break.
3. Accuracy is not always 100%
The final challenge of speech recognition is that it cannot exactly match a person’s accuracy. It would take considerably longer to have a person sit and transcribe conversations, but ultimately, the result would be more precise.
Morten explained that you can train the technology to adapt to the voices of many different kinds of people, “but translating the spoken word into written text will never be very "pretty" or productive because of repetitions of words and non-grammatical sentences.”
Furthermore, the technology struggles even more when dialects come into play:
“What we do now is that we’re taking speakers from all over the country – both women and men and different ages – to cover dialects as wide as possible without taking too many people with very powerful dialects (it would cause too much error when recognizing speech). You can’t just take standard English, because there would be a lot of people who would fall off the spectrum. That would be like learning a whole new language.”
Although a speech recognizer can be adapted and trained, as Morten explained, people will always be better at capturing the small variations of dialects and linguistic features. People will also always be better than a machine at comparing and summarizing these variations.
However, the technology is constantly improving. Over the last five years, there have been many advances in speech recognition. Big names like Google, Microsoft, IBM and other industry leaders are undoubtedly pushing the technology forward. You can bet that we will be following along and reporting on the exciting future developments!
Join us next time, when we publish the third post in our blog series and dive into how you can streamline your meetings. Read the first blog post here.