Speech Intelligibility - The Factors That Affect How We Hear Dialog

The dictionary definition of intelligibility is "the quality or condition of being intelligible - capable of being understood; comprehensible; clear enough to be understood." It seems that there is a growing issue with people not being able to hear the dialog in content aimed at domestic consumption, whether that is TV programmes or OTT shows. The number of complaints has increased as a result of shows like Jamaica Inn, Happy Valley and SS-GB here in the UK. In this article we are going to take a look at the factors that control our ability to discern what is being said - speech intelligibility.

How Voice Frequencies Affect Speech Intelligibility

Vowel sounds, which sit in the lower frequencies, are much louder than consonants, but it is the consonants that are extremely important for speech intelligibility…

Why Consonants Are Essential To Speech Intelligibility

Vowels have more energy, but it's the consonants that can make or break speech intelligibility. To ensure high speech intelligibility you must preserve the consonants…
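To make that energy difference concrete, here is a minimal sketch, my own illustration rather than anything from DPA or the research cited below, that compares the energy in a rough "vowel" band with the energy in a rough "consonant" band of a dialog recording. The band edges and the "speech.wav" path are illustrative assumptions:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfilt

def band_energy_db(x, fs, lo_hz, hi_hz):
    """Energy of x inside [lo_hz, hi_hz] Hz, in dB, via a 4th-order band-pass."""
    sos = butter(4, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos")
    y = sosfilt(sos, x)
    return 10 * np.log10(np.mean(y ** 2) + 1e-12)

# "speech.wav" is a placeholder path; any mono dialog recording will do.
fs, x = wavfile.read("speech.wav")
x = x.astype(np.float64)
x /= np.max(np.abs(x))  # normalise to +/-1

# Illustrative band edges: vowel energy sits low, consonant cues sit higher.
vowel_db = band_energy_db(x, fs, 250, 1000)
consonant_db = band_energy_db(x, fs, 2000, 4000)
print(f"vowel band: {vowel_db:.1f} dB, consonant band: {consonant_db:.1f} dB")
```

On typical dialog the lower band carries noticeably more energy, which is exactly why the quieter consonant band is the first casualty when it gets masked or filtered out.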

How Voice Directivity Affects Speech Intelligibility

Speech intelligibility changes depending on where the microphone is placed. Learn how to get the best out of the mic regardless of placement.

The McGurk Effect

When we talk to one another, you may be surprised to learn that it's not just our ears that are paying attention. Our eyes are also picking up visual cues that can help fill in the gaps and give us a better sense of what we should be hearing, especially in more challenging environments.

Our brains combine data from both sight and hearing to produce a fuller picture of what we hear, but for this to be effective the visual and aural information has to match. It may surprise you to learn that our eyes can win over what we hear, and the evidence for this is what is called the McGurk effect.

Ba or Ga?

Cognitive psychologist Harry McGurk discovered this by accident in 1976. The effect shows up when we see a person's mouth produce one sound whilst hearing another sound. The most common example is when we 'see' a person mouthing "ga" whilst we hear the sound "ba". As a result of this mismatch, we hear "da" instead. Watch this video and see what you hear. Then, try it again with your eyes closed.

The current understanding is that the McGurk effect happens because our brains fail to recognise that the two stimuli aren't coming from the same source. The fact that certain syllable combinations aren't merged together indicates that there is some underlying mechanism deciding what types of audiovisual information should or shouldn't be integrated.

Researchers still don’t completely understand how our brains link disparate events together, but for now, it’s another reminder that we can’t always trust what we see and hear.

Predicting What Will Be Said And The Effect Of Sound Effect Cues

Here in the UK, Salford University has undertaken some research on intelligibility, with a team studying whether the addition of relevant sound cues helps with intelligibility for people with normal hearing.

They used four experimental conditions, testing for the recognition of a particular keyword such as 'sword'. The conditions combined low or high predictability of the word with the presence or absence of a relevant sound effect. The tests were undertaken with multi-talker babble as the masking sound, set 2 dB below the wanted speech for the listeners with normal hearing.

For low predictability, they tested the sentence "Mary should think about the sword", where you would not be likely to predict that the final word would be 'sword'.

For the high predictability test, they used the sentence “He killed the dragon with his sword” where the word 'sword' could be predicted. 
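To make the masking condition concrete, here is a minimal sketch, my own illustration and not the Salford team's code, of mixing a babble masker at a fixed level below the wanted speech - 2 dB, as in the normal-hearing condition. The function names and the synthetic stand-in signals are my own assumptions:

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(x ** 2))

def mix_speech_and_babble(speech, babble, snr_db=2.0):
    """Scale the babble so it sits snr_db below the speech, then sum.

    speech, babble: mono float arrays of equal length.
    snr_db: speech-to-babble ratio in dB (2.0 puts the masker
            2 dB below the wanted speech, as in the experiment).
    """
    gain = rms(speech) / (rms(babble) * 10 ** (snr_db / 20))
    return speech + gain * babble

# Toy usage with synthetic signals standing in for real recordings.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(48000) / 48000)  # stand-in "speech"
babble = rng.standard_normal(48000)                          # stand-in "babble"
mix = mix_speech_and_babble(speech, babble, snr_db=2.0)
```

The key point is that the masker level is set relative to the measured level of the speech, so every listener hears the same speech-to-babble ratio.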

Initial Results

(See the chart of these results in the original post.)

For people with normal hearing, the predictability of a word coming improved intelligibility by 73.5%, whereas adding the sound cue, in this case the swish of a sword, improved intelligibility by 69.5%. They found that combining predictability and a sound cue improved intelligibility by a further 18.7%.

Acknowledgments

Thanks to DPA's Mic University and Salford University for their help with resources for this article.

Conclusion

I hope this article shows that intelligibility isn't just about volume relative to other sounds, but is also affected by the clarity of the consonants, which are much quieter than the vowel sounds. Added to that, the consonants occupy a relatively narrow frequency band that lands around the area where chest-mounted microphones have a dip in their frequency response, whereas the boom mic position above the actors is a very good place to pick up the consonants. With the McGurk effect and the research from Salford University, what we see is that the way the words are arranged is also important to speech intelligibility.

What all of this shows is that it isn't just the way we capture the speech that matters: scriptwriters have a part to play in intelligibility through the order and structure of the words. In conjunction with the director and sound team, they also have a say in which sounds surround the dialog, and poor choices here can have a negative impact on intelligibility.

Watch out for a follow-up article in which we learn that subtitles are becoming the norm for people with normal hearing, and we will ask whether the loudness delivery specs have actually contributed to this increased use of subtitles.
