Production Expert


Loudness and Dialog Intelligibility in TV Mixes - What Can We Do About TV Mixes That Are Too Cinematic?

I have been relatively quiet for a while on the issue of loudness and loudness workflows in a broadcast setting. However, since my last articles on loudness, a number of issues have been exercising the little grey cells, as Agatha Christie's Poirot would say, and I plan to explore them further in this article. If you haven't read my previous articles on the subject, do check the More On Loudness section at the bottom of this article, especially Are TV Mixes Becoming Too Cinematic?, Intelligibility Explained - We Don’t Just Hear With Our Ears, and Are TV Mixes Getting Too Big For The Domestic Living Room?

BS1770 - A Brief Recap

Before we get into the nitty-gritty, here is a brief recap of how we measure loudness using the BS1770 standard, on which all loudness-related delivery specs are based.

In the BS1770 specification, there are two key parameters that are used as pass/fail criteria: Integrated Loudness, the average loudness for the whole programme, and Maximum True Peak level.

However, there are a number of other parameters that are very useful to help us mix loudness-compliant content. These are Momentary Loudness, the average loudness over the last 400 ms; Short Term Loudness, the average loudness over the last 3 seconds, which is great for level setting; and the one I want to take a detailed look at now, Loudness Range (LRA), which gives a sense of the dynamic range of a mix.
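
To make those window lengths concrete, here is a minimal Python sketch of how a meter might turn channel samples into Momentary and Short Term loudness. It assumes the samples have already been K-weighted (a real BS1770 meter applies a two-stage pre-filter first, which is omitted here), and it uses the BS1770 channel weightings (1.0 for L, R and C, 1.41 for the surrounds, LFE excluded) and the -0.691 offset. The function names are hypothetical.

```python
import math
from typing import Dict, List

# BS1770 channel weightings; the LFE channel is not included in the measurement.
CHANNEL_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}


def window_loudness(channels: Dict[str, List[float]], end: int, length: int) -> float:
    """Loudness (LUFS) of the `length` samples ending just before index `end`.

    ASSUMPTION: the samples are already K-weighted; the BS1770 pre-filter
    and high-pass stages a real meter would apply are omitted here.
    """
    if end <= 0 or length <= 0:
        raise ValueError("need at least one sample in the window")
    start = max(0, end - length)
    weighted_power = 0.0
    for name, samples in channels.items():
        block = samples[start:end]
        mean_square = sum(s * s for s in block) / len(block)
        weighted_power += CHANNEL_WEIGHTS[name] * mean_square
    if weighted_power == 0.0:
        return float("-inf")  # digital silence has no finite loudness
    return -0.691 + 10 * math.log10(weighted_power)


def momentary_loudness(channels: Dict[str, List[float]], end: int, sample_rate: int) -> float:
    """Average loudness over the last 400 ms (hypothetical helper name)."""
    return window_loudness(channels, end, int(0.4 * sample_rate))


def short_term_loudness(channels: Dict[str, List[float]], end: int, sample_rate: int) -> float:
    """Average loudness over the last 3 seconds (hypothetical helper name)."""
    return window_loudness(channels, end, int(3.0 * sample_rate))
```

Because the Momentary window is so much shorter, a sudden loud moment pushes the Momentary reading up well before the Short Term reading catches up, which is why Short Term is the steadier guide for level setting.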

Loudness Range - How It Can Help

Take a look at this table of recommendations for the suggested Loudness Range for different delivery systems. I find it a great help as guidance on where to pitch the loudness range of my mixes. A lot of what I do is radio, and long before these loudness workflows came along I would identify the key narrative elements in the programme I am working on and make sure that they are clearly audible even in the most challenging listening environments, like a car travelling on the motorway where the ambient noise is very high. I know that other elements I add to enhance the narrative won’t be heard in high ambient noise environments, but those listening in a quieter space will be able to enjoy those details as well, while the key elements will be audible for everyone. This also means the dialog needs to be at a consistent level, otherwise a realistically whispered line will drop below the ambient noise in the car and be lost.

LRA Being Added To Delivery Specs And Recommendations

It would appear I am not alone in considering the importance of loudness range. The Digital Production Partnership (DPP) updated their unified UK delivery specs for all UK broadcasters and added this guidance on Loudness Range…

Loudness Range -  This describes the perceptual dynamic range measured over the duration of the programme - Programmes should aim for an LRA of no more than 18LU
Loudness Range of Dialogue - Dialogue must be acquired and mixed so that it is clear and easy to understand - Speech content in factual programmes should aim for an LRA of no more than 6LU. A minimum separation of 4LU between dialogue and background is recommended.
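
Since both recommendations are expressed as LRA, it may help to see how that figure is derived. The Python sketch below follows the gating steps described in EBU Tech 3342 (an absolute gate at -70 LUFS, a relative gate 20 LU below the power-averaged level, then the spread between the 10th and 95th percentiles of the remaining Short Term values) and compares the results with the DPP figures. It assumes you already have a series of Short Term loudness values sampled at a regular rate. The percentile interpolation and the function names are my own assumptions, and the 4 LU dialogue-to-background separation is not modelled because the guidance quoted above does not say how it should be measured.

```python
import math
from typing import Dict, Sequence, Union

ABSOLUTE_GATE_LUFS = -70.0   # blocks quieter than this are ignored entirely
RELATIVE_GATE_LU = -20.0     # relative gate offset used for LRA


def _percentile(sorted_values: Sequence[float], p: float) -> float:
    """Linearly interpolated percentile; the interpolation method is an assumption."""
    pos = (len(sorted_values) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def loudness_range(short_term_lufs: Sequence[float]) -> float:
    """LRA in LU from a series of Short Term loudness values."""
    above_abs = [v for v in short_term_lufs if v > ABSOLUTE_GATE_LUFS]
    if not above_abs:
        return 0.0
    # Relative gate: 20 LU below the power (not arithmetic) average of the survivors.
    mean_power = sum(10 ** (v / 10) for v in above_abs) / len(above_abs)
    relative_gate = 10 * math.log10(mean_power) + RELATIVE_GATE_LU
    gated = sorted(v for v in above_abs if v > relative_gate)
    # LRA is the spread between the 10th and 95th percentiles.
    return _percentile(gated, 95) - _percentile(gated, 10)


def check_dpp_lra(programme_st: Sequence[float], dialogue_st: Sequence[float],
                  programme_max: float = 18.0,
                  dialogue_max: float = 6.0) -> Dict[str, Union[float, bool]]:
    """Compare measured LRA with the DPP guidance (18 LU overall, 6 LU dialogue)."""
    prog = loudness_range(programme_st)
    dia = loudness_range(dialogue_st)
    return {
        "programme_lra": prog,
        "programme_ok": prog <= programme_max,
        "dialogue_lra": dia,
        "dialogue_ok": dia <= dialogue_max,
    }
```

Because the LRA ignores the loudest 5% and quietest 10% of the gated values, brief transients barely move it, while long quiet passages that still sit above the relative gate can widen it considerably.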

I have heard a number of people point the finger at the loudness delivery specs and say something like, “everything was OK before they were introduced”. But was it? Could it not be a coincidence? I believe the loudness normalisation delivery specs are not to blame. Around the time the loudness workflows were introduced, the production values for TV drama changed and TV drama took on film production workflows. Although that has brought huge benefits in storytelling, budgets and cinematography, when it comes to the sound it is my view that you cannot mix TV drama destined for the small screen, which is likely to be watched in small, noisy rooms, the same way as a theatrical mix.

In the comments on my article from last year, Are TV Mixes Getting Too Big For The Domestic Living Room?, community member Dan Smith said...

Here in Canada, the CBC and Radio Canada both require that the LRA is less than 10 and some request 8. Also, the program integrated AND the dialogue stem both must be -24LUFS. Lastly, your momentary must not exceed +10LU of your target.
So with a -24LUFS target here, your momentary must remain below -14LUFS. Arguably, the LRA and momentary are the most important as this bakes in the dynamics. All of this works really well to make consistent programming that is not overly dynamic for consumer systems with intelligible dialogue at all times!
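
The arithmetic in that comment is easy to automate. Here is a hedged Python sketch of a checker for the Canadian-style requirements as Dan describes them: programme and dialogue stem integrated loudness at -24 LUFS, LRA below 10 LU, and Momentary loudness no more than 10 LU above target. The measurement values are assumed to come from your loudness meter, the class and function names are hypothetical, and the ±1 LU tolerance on integrated loudness is a placeholder, since the comment does not give one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Measurements:
    programme_integrated: float  # LUFS, whole programme
    dialogue_integrated: float   # LUFS, dialogue stem only
    programme_lra: float         # LU
    max_momentary: float         # LUFS, highest Momentary reading


def check_canadian_style(m: Measurements, target: float = -24.0,
                         tolerance: float = 1.0,  # hypothetical tolerance
                         max_lra: float = 10.0,
                         momentary_headroom: float = 10.0) -> List[str]:
    """Return a list of failures; an empty list means the mix passes."""
    failures = []
    if abs(m.programme_integrated - target) > tolerance:
        failures.append("programme integrated loudness off target")
    if abs(m.dialogue_integrated - target) > tolerance:
        failures.append("dialogue stem integrated loudness off target")
    if m.programme_lra >= max_lra:  # the LRA must be *less than* the limit
        failures.append("LRA too wide")
    ceiling = target + momentary_headroom  # -24 + 10 = -14 LUFS
    if m.max_momentary > ceiling:
        failures.append("momentary loudness above ceiling")
    return failures
```

A broadcaster that requests an LRA of 8 would simply be checked with `max_lra=8.0`.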

I completely agree with the Canadians in specifying the loudness of the dialog stem. Dialog is key, and in my loudness planning I always prefer to set the dialog at around target loudness and then build everything around it. In fact, in my loudness training resources I recommend, if there is time, doing a complete pass working just on the dialog, getting it as close as possible to target loudness while NOT turning the mix into a sausage factory with no light or shade.

This is really important, and I would like to see amendments to the specs that include a dialog level as well as the overall programme level, dialog LRA as well as overall LRA recommendations, and a recommendation that the short term loudness at the junctions between programmes, or between programmes and adverts, should be very close to target loudness, to help reduce any jumps in loudness across those junctions.
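
If such a junction recommendation were adopted, a check for it could be as simple as the Python sketch below, which looks at the Short Term loudness readings over the opening and closing seconds of a programme. The function name, the -23 LUFS target, the 5-second window and the 2 LU tolerance are all hypothetical defaults chosen for illustration, not values from any spec.

```python
from typing import Sequence, Tuple


def junction_check(short_term: Sequence[float], rate_hz: float,
                   target: float = -23.0,     # hypothetical target, LUFS
                   window_s: float = 5.0,     # hypothetical junction length, seconds
                   tolerance_lu: float = 2.0  # hypothetical tolerance, LU
                   ) -> Tuple[bool, bool]:
    """Check whether the opening and closing Short Term readings sit near target.

    `short_term` holds Short Term loudness readings taken `rate_hz` times a second.
    Returns (opening_ok, closing_ok).
    """
    n = max(1, int(window_s * rate_hz))
    opening = short_term[:n]
    closing = short_term[-n:]
    opening_ok = all(abs(v - target) <= tolerance_lu for v in opening)
    closing_ok = all(abs(v - target) <= tolerance_lu for v in closing)
    return opening_ok, closing_ok
```

The quiet middle of a programme is deliberately ignored here; only the edges that butt up against the next piece of content are tested.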

However, LRA Isn't The Magic Bullet

That said, loudness range isn’t a magic bullet where fixing it fixes all our problems. For example, community member Oliver Lucas commented on my article Are TV Mixes Getting Too Big For The Domestic Living Room?

Very good points Mike, thanks. I think mixing in the right environment absolutely makes sense. From my experience the problem with LRA is, that you might have shows with a lot of parts that do not contain music or speech. A colleague recently mixed a drama for TV where this was the case. It was a 30 min episode with about 4-5 minutes of quiet atmospheres and light music. The LRA went off the charts, although there was enough dynamics limiting, especially on the dialog stem. What I am trying to say is - the numbers can be helpful and give a first impression, but more limiting would have destroyed the programme. The mix was just fine for TV, even with an LRA of 19.

Dialog Intelligibility Is Key

I am a firm believer that the dialog is key and needs to be intelligible. When you are telling a story, whether it's drama or documentary, any sound elements that are key to the narrative must be clearly audible. If they are not, you quickly lose the plot, quite literally. Now, I am not suggesting that we go back to actors enunciating every word like an elocution lesson; the dialog needs to be delivered in a style that fits the scene or it becomes unbelievable. But this is only one part of the problem.

Directors Know What Is Being Said So They Can Always Hear It

I am of the opinion that one of the biggest issues at play here is that everyone involved in the production knows what is being said, because they have lived with it through pre-production, script editing and shooting. They probably know the script as well as the actors who have to learn it, and this familiarity means they can hear the words even when they are not that intelligible. For example, during the shoot the director knows what is being said, so if the sound team ask for a retake it is likely to be received with a hard stare and "I can hear it, what's your problem?!" When we get to the dub and the director comes to sign off on a scene, again they know what is being said, and so they may ask for the FX and/or music to be lifted to increase the sense of drama in the scene.

Add to this that, from my perspective, TV drama is becoming more cinematic from a sound perspective, which doesn't translate to a domestic situation where neither the playback system nor the background noise of the room can be controlled, unlike a cinema, where there is end-to-end control. We must remember to consider how the content we create is going to be consumed and in what environment.

Mixing In Big Rooms

In my article Are TV Mixes Getting Too Big For The Domestic Living Room? I took an initial look at this issue. The National Association Of Broadcasters Engineering Handbook states that larger rooms with a higher reference SPL will yield mixes with a wide dynamic range, whereas smaller rooms with a lower SPL will yield more constrained mixes. So I asked...

Could the size of the mixing space be having an impact on TV mixes? If mixing TV content in larger theatrical spaces is producing mixes that are too dynamic, what size of room should we be mixing in?

Ten months on, having talked to a number of people about these issues, I acknowledge that one of the key reasons for working in a large room is that a growing number of people need to be in the room to sign off on the mix. Whether we like it or not, in a lot of cases facilities are not going to change mix room sizes, so we have to work out how to deal with it. That said, I am also aware of high-profile production facilities that choose to keep the room size smaller for broadcast mixes and to mix at 79 dB SPL.

But back to the big rooms…

What About The X Curve?

Another factor we haven’t explored is that theatrical rooms have theatrical speaker systems, with horn tweeters designed to project to the back of the theatre, and then there is the issue of the theatrical (X-curve) versus broadcast (linear) monitor response. We need to understand that in a big room designed for theatrical mixing, the size won't be the only thing affecting the mix!

I haven’t had an opportunity to do any tests personally, but I cannot help wondering what impact the theatrical speakers and the X-curve might have on intelligibility. My gut feeling is that mixing on a theatrical system with horn speakers will produce a mix that sounds duller when played on studio monitors or a home system.

Other Considerations When Mixing In Big Rooms

Perhaps consider having some kind of alternative monitoring system, such as a near-field system using studio monitors configured in the ITU layout.

The ITU speaker layout more closely reflects what the consumer’s system should be like. The differences between a cinema layout and an ITU layout will affect things like panning movements, especially front to back: there tends to be very little phantom imaging for a sound panned halfway between front and back on a cinema layout, whereas it will be more obvious on an ITU layout.

My preference is that mixes should be monitored through a near-field system. As well as, or instead of, near-field monitoring, also consider having something that mimics a TV speaker. What that looks like is a tough question, and something we will look at later, but it is another very useful way to check intelligibility.

I acknowledge that the downside of an ITU layout is that it is more of a challenge to fit a lot of people within that speaker configuration, which will be one of the reasons more and more TV is being mixed in bigger spaces to accommodate the larger number of people that need to be in the room.

Is The Customer Always Right?

That has to be the 6 million dollar question. It is all about people and more importantly the relationship you have with the people making the final decisions. Community member Mike commented on my article Are TV Mixes Getting Too Big For The Domestic Living Room? saying...

I've found it's a tight balance between all elements. It also depends on the type of show you're mixing. Most importantly, it's usually what the people sitting behind me want with regards to the final mix. I can only expertly guide them and make them aware of my limitations when it comes to how I need to deliver to the network. A lot of it is also perception. Help them believe some of these elements are louder without actually going beyond your limitations. It's tough being a psychologist, magician, artist, editor, mixer, and engineer all at the same time.

Realism - Good Or Bad?

I am also sensing that there is a growing trend towards more realism. Surely mixing a show with too much dynamic range, and with the music too loud, especially during dialog scenes, is an area where the mixer has to have a say in how it should play.

I do not believe you can mix TV shows with natural dynamics in the dialog. If you do, it makes the dialog harder to hear and understand. How you choose to reduce the dialog dynamic range is up to you: it can be with the faders, clip gain or compression. But restricting the dynamics is essential. Realism is just not possible, and I feel that the realism kick is flawed on so many levels. Consider the lighting, the way it's shot, how the story is put together; none of it is real, so why apply realism to the sound? It's bonkers.
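
For anyone using compression for this, the static curve of a basic hard-knee compressor shows how it narrows the spread of dialog levels. This Python sketch ignores attack, release and make-up gain, the function names are hypothetical, and the -30 dB threshold and 3:1 ratio are purely illustrative assumptions.

```python
from typing import List


def compressed_level(level_db: float, threshold_db: float = -30.0,
                     ratio: float = 3.0) -> float:
    """Static hard-knee compressor curve: no attack, release or make-up gain."""
    if level_db <= threshold_db:
        return level_db  # below threshold the level passes unchanged
    # Above threshold, every `ratio` dB in produces 1 dB out.
    return threshold_db + (level_db - threshold_db) / ratio


def level_spread(levels: List[float]) -> float:
    """Difference between the loudest and quietest level, in dB."""
    return max(levels) - min(levels)
```

With those settings, dialog levels between -40 and -20 dB come out between -40 and roughly -26.7 dB, so a 20 dB spread shrinks to about 13.3 dB; the same narrowing could equally be achieved with fader or clip gain moves.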

I believe that mixing TV drama like it's a feature is daft. For example, at night the consumer will almost certainly have the TV volume much lower, especially if they have young children, so all that quieter stuff won’t be heard, and if that includes quiet dialog, the narrative can get lost.

Coming back to the people and your relationship with your client, the simple answer is yes, the customer is always right, even when they are wrong. Perhaps I am fortunate, but as a rule I work with clients who respect my judgement and want and expect my input; that is why they hire me. But I appreciate that isn’t always the case. We should always advise as strongly as we feel able, but ultimately the client is always right, even when they are wrong.

The Broadcaster And The Consumer

The next stop is the broadcaster, and in the current climate I am using the term broadcaster loosely to include the likes of Netflix and Amazon Prime as well as broadcasters like the BBC and ABC, because although Netflix and the others don’t have any transmitters, they are effectively ‘the one’ delivering to ‘the many’ in the standard definition of a broadcaster.

Consumer Complaints To The Broadcaster

At the start of my loudness training resources, I quote the results of looking at one 40-day period over the Christmas holidays in 2010, before the loudness workflows came in, during which the BBC received a number of complaints relating to TV loudness issues.

  • 61% related to the background sound being too high, which is the issue of intelligibility.
  • 20% were about volume jumps between content, such as programme trailers or announcements being louder than the content around them. On commercial TV channels, the biggest issue was with adverts and trails being much louder than the programmes they sit in.
  • 19% were about the volume range within programs being too high and this often relates to dialog being quiet and indistinct and then suddenly everything gets much louder.

On other networks here in the UK, most of which are advert-funded, I understand that the issues were similar, although it would be fair to say that the core of the complaints was about adverts and the loudness wars, because ‘I want my advert louder than anyone else’s’.

What We Mix Should Be What The Consumer Hears

One of the underlying concepts with the BS1770 loudness workflows is that we mix the show and nothing happens to it all the way through to the consumer. In reality, it seems that broadcasters still have a black box or three in the way so that what the consumer listens to isn’t what we mixed in the studio.

A lot of my work in the UK is radio, and I am looking forward to the day when the loudness workflows are adopted there. At the moment, because of the FM loudness wars, depending on which network I am mixing for, I end up mixing around the transmission processing so that there is little the processing can do to mess up my mix, rather than mixing to suit the house style of that network. I am having to work around the processing rather than deliver the most appropriate mix, because radio here in the UK hasn't moved to loudness normalisation and I am still working in a peak normalisation world. In TV production, however, we have adopted loudness workflows.

No Black Boxes

In case you hadn't guessed already, I have to declare that I am not a fan of the trend to sell a black box that will solve all the broadcaster's loudness compliance issues. Although the technology is improving all the time, the nature of all of these boxes is that they are reactive: they react to the audio that is fed into them. There is no real look-ahead or ‘intelligence’. At the core of the loudness workflows must be the idea that we, as audio professionals, are best placed to make the judgement call about how a show should be mixed, taking into consideration the content, the style of the show, the network’s house style, the consumer, and the likely environment the consumer might be watching the show in. It is my view that these black boxes fly in the face of one of the key benefits of the loudness workflows, which is 'what we mix is what the consumer hears', with nothing added and nothing taken away.

But this relatively new digital end-to-end delivery chain still seems to have processors and encoders in it, and these units are controlled through browsers and menus, so it only takes one setting out of place to screw things up. With the introduction of surround there have been a number of issues where shows have been transmitted in the wrong format, with channels in the wrong order, or with flags not set correctly so the wrong audio has gone out, often resolved by the mix engineer calling the broadcaster to tell them what is wrong.

We need to press for these units to be taken out of the transmission chains once loudness normalisation workflows have been adopted. Maybe it will need the standards bodies to step in but we need these black boxes out.

Coming back to the delivery specs, I am pleased that all the delivery specs are based on one standard, BS1770, which makes programme delivery so much easier. In the UK, the broadcasters got together with the Digital Production Partnership and agreed one common UK delivery spec, and we are seeing the DPP teaming up with the ATSC to produce a similar unified spec for the US. I was hopeful that these unified specs would work across all networks and genres.

However, as we have shown in this article, a growing number of variations and recommendations are being added to these unified specs. Perhaps the way to handle house styles for different channels, and to reflect the variety of environments in which content is being consumed, is to have variations of the spec. For example, a speech-based station like BBC Radio 4 in the UK would have a much tighter loudness range than the classical music station BBC Radio 3. We might also need to look at variations of the spec for programme genres, so that a drama could have a slightly wider dynamic range than a live music entertainment show like X Factor.

The Effect Of Downmixing On Dynamic Range And Intelligibility

With the increase in multichannel content, first 5.1 and more recently the growth of Dolby Atmos content consumed in a domestic environment, broadcasters have had to wrestle with the thorny issue of backwards compatibility. What should be done for consumers still listening in stereo or mono, especially when they are still a significant majority of the audience? Multiple-format delivery is complex and expensive. For example, in the early days of live surround content, two simultaneous mixes were undertaken, one for surround and one for stereo, but this was an expense the broadcasters wanted to do away with. As a result, there is a growing trend for broadcasters to transmit only the 5.1 mix and downmix it to stereo, either at the transmission centre or within the consumer's equipment.

But are there any issues with downmixing? Does it affect the loudness and/or intelligibility of the content? There has been some academic research into whether downmixing has a negative impact on intelligibility.

Roger Dressler of RWD Consulting, LLC, who is now a consultant on consumer audio entertainment technologies, having been Director of Technology Strategy at Dolby, suggests...

that the down-mix process may distort the mix in such a way as to reduce intelligibility by altering the subjective balance of the mix.

A downmix has also been shown to reduce the Loudness Range of the derived stereo mix compared to the original 5.1 mix. Unsurprisingly, if the surround channels have a lot of content on them, the integrated loudness of the downmix can very easily get out of control.

Delivery specs require that the centre channel is reduced by 3 dB in the downmix, and while that may be technically correct, I wonder whether it's sonically the best thing to do, as there's a distinct acoustic difference between the discrete centre channel in 5.1 and the phantom mono centre of a stereo pair of speakers. When you mix in 5.1, do you monitor the stereo downmix, then go back and check, and maybe slightly tweak, the mix in 5.1? After all, perhaps more than 90% of viewers will be listening in stereo, which makes it important for us to check the downmix, whether we are required to deliver an LoRo or LtRt stereo mix, or it is going to be derived in the consumer's equipment.
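
For reference, here is a minimal Python sketch of a Lo/Ro downmix with the centre reduced by 3 dB. The surround coefficients, also -3 dB here, follow the common ITU-R BS.775-style default but are an assumption, as delivery specs and consumer decoders can differ; the LFE is simply discarded, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear amplitude gain."""
    return 10 ** (db / 20)


def loro_downmix(ch: Dict[str, List[float]], centre_db: float = -3.0,
                 surround_db: float = -3.0) -> Tuple[List[float], List[float]]:
    """Fold a 5.1 mix down to Lo/Ro stereo.

    The -3 dB centre follows the delivery-spec requirement; the -3 dB
    surround coefficient is an assumed default. The LFE channel is discarded.
    """
    c = db_to_gain(centre_db)
    s = db_to_gain(surround_db)
    lo = [l + c * cc + s * ls for l, cc, ls in zip(ch["L"], ch["C"], ch["Ls"])]
    ro = [r + c * cc + s * rs for r, cc, rs in zip(ch["R"], ch["C"], ch["Rs"])]
    return lo, ro


def peak(samples: List[float]) -> float:
    """Sample peak; a value above 1.0 means the downmix exceeds full scale."""
    return max(abs(x) for x in samples)
```

A centre-only line comes out at about 0.708 in each of Lo and Ro, so the combined power of the two speakers roughly matches the original centre channel, but the dialog is now a phantom image rather than coming from a single speaker. Busy surrounds, on the other hand, add straight into Lo and Ro, which is how the downmix can end up louder, or even over full scale.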

Other research has shown that a stereo system with a phantom centre compromises intelligibility. This effect is a result of the acoustical crosstalk that occurs when two identical signals arrive at the ear, one slightly delayed compared to the other. The resulting comb filtering cancels out some frequencies in the audio. Further research has shown a small but measurable improvement in intelligibility from using a central loudspeaker for speech instead of a phantom centre.
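
The frequencies affected can be estimated with a little arithmetic. If two equal-level copies of a signal arrive with a delay τ between them, the sum cancels at f = (2k + 1) / (2τ) for k = 0, 1, 2 and so on. The Python sketch below converts a path-length difference into that delay, assuming sound travels at 343 m/s, and lists the notch frequencies; the function name is hypothetical. It is an idealised model: at a real listener's ears the two arrivals are rarely equal in level, so the notches are partial dips rather than complete cancellations.

```python
from typing import List

SPEED_OF_SOUND_M_S = 343.0  # approximate, in air at around 20 °C


def comb_notches(path_difference_m: float, max_freq_hz: float = 8000.0) -> List[float]:
    """Notch frequencies when two equal copies of a signal sum with a delay.

    Idealised: assumes both arrivals have the same level, which a real
    listener's ears will rarely receive, so real notches are shallower.
    """
    if path_difference_m <= 0:
        raise ValueError("path difference must be positive")
    delay_s = path_difference_m / SPEED_OF_SOUND_M_S
    notches = []
    k = 0
    while True:
        # Cancellation occurs at odd multiples of 1 / (2 * delay).
        freq = (2 * k + 1) / (2 * delay_s)
        if freq > max_freq_hz:
            break
        notches.append(freq)
        k += 1
    return notches
```

For example, a path difference of 34.3 cm gives a delay of 1 ms and notches at 500 Hz, 1.5 kHz, 2.5 kHz and 3.5 kHz, right across the range where much of the consonant information in speech sits.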

Commenting on frequency response problems caused by signal path delays, David Clark, in the paper “Measuring Audible Effects of Time Delays in Listening Rooms,” presented at the 74th AES Convention, states that...

clearly the ‘phantom’ center is an entirely different listening experience than pure left or pure right.

This supports the view that having an actual, solid centre speaker rather than a phantom centre will have a positive impact on intelligibility, because there are no comb filter effects on the centre channel audio. It seems to me that the implications go beyond the need for a centre channel: diverging the dialog or commentary from the centre channel into the front left and right is going to have a negative impact on intelligibility, with up to three paths and their respective comb filtering all smearing the dialog. I believe this makes a very strong argument for not using divergence at all.

The Consumer's Equipment

We have finally got to the consumer's equipment. This is another area that has come in for a lot of criticism in the press and even at government level.

Tom Harper, the director of War and Peace, has said that, while he respects the views of sound recordists, in his opinion and experience, if there are audibility problems...

they arise at the broadcast and TV reception point, as the soundtrack is played out on reduced bandwidth to two tiny speakers.

As flat-screen plasma and LED TVs have become the norm, there is less and less room for the loudspeakers in the consumer's TV. Back in the days of CRT TVs, there was a good-sized cabinet that worked well with a reasonably sized speaker to produce reasonable sound, and there was a good chance the speaker was also forward-facing.

With flat-screen TVs there isn’t anywhere to put the speakers, so they are often tucked away around the back with very small drivers, and then we wonder why we get intelligibility complaints. But as I have reiterated many times before, the issues of dynamic range and intelligibility in TV sound are not down to just one thing. To reinforce the point that it isn't just about the speakers in the TV: in our living room we have a system with separate forward-facing monitor speakers, including a centre speaker, and we still have intelligibility issues.

This is a bigger issue than just the TV speakers. Using a reference TV in the dub is a useful part of the checking process, even assuming the dialog was well delivered and well recorded. You also have to take into consideration the environment in which the consumer is watching the show. There is no point having quiet, whispered dialog sitting so low in the mix that, when someone is watching at home with the washing machine running in the background, the ambient noise in the room masks it and the dialog is lost.

And finally, in looking at the consumer’s equipment, I wonder if a change in what consumers are choosing to use could be a part of this as well. I don't know for sure, but I would guess that more and more people are using sound bars, and the like, which have minimal controls and settings rather than fancy surround receivers, which often have features like "night mode" to compress the dynamic range as well as centre channel adjustment, enabling you to bring up the level of the centre channel to help improve dialog intelligibility.

Conclusions

As we bring this article to a close I wanted to bring together some conclusions we can draw from this study into dynamic range and intelligibility in TV mixes.

Starting back at the very beginning, we need to look at how actors are being trained now, to make sure they capture the techniques learnt by old-school actors before it is too late.

I believe that most of the issues with location sound could be solved before a frame is shot, if people made sensible decisions from the outset, including how the script can impact on intelligibility as we looked at in my article Intelligibility Explained - We Don’t Just Hear With Our Ears.

When it comes to the mix, the area where we do have some input, let's look at the loudness and dynamic range of the dialog stem, let's have dialog just on the centre channel, let's look at the LRA and see if we can come up with some genre-specific recommendations that will work within the unified standards, and finally let's take another look at the downmix issue to see if there is a better way.

As for music and SFX levels, it seems lots of directors want the music as loud as possible, for different reasons. Perhaps they have learned the dialogue, which means they can still 'hear' it, and for some, loud music compensates for bad storytelling.

In the current specs, there are no tools to restrict a really demanding and unconvinceable director. To resolve this, we could specify LRA limits, both overall and for the dialog, and perhaps add Short Term loudness limits to broadcasters' delivery specs for all content, not just short-form content as we have now in the EBU R128 spec.

But should we be tightening the delivery specs so that directors cannot do what they want to do? Surely it is for the people at the top of the food chain to make it clear all the way down the pecking order that intelligibility and appropriate dynamic range are important. That doesn't always have to be done with rules and regulations; is there not a place for education?

As I have said, this is a multi-issue problem and needs multiple solutions. Actors need to articulate clearly in a believable way, sound recordists need to be able to do their job well and directors need to respect their advice. Then when it comes to the edit and mix, directors need to make sure that they don't ask for music and other sounds to obscure the key narrative elements, and support the mixers when they say that it will not work for the consumer.

Directors may need reminding that the aim of any production is to tell a story in sound and vision, rather than making pictures with a bit of sound. Producers, directors and writers should all remember that ultimately we are making a product and if the customer cannot enjoy that product then the product isn’t going to sell and/or get the ratings.

Realism needs to be left at film or art school. What we do is not realistic; everything we do is false, so please let's not try to retain realism for the aural half of TV productions. After all, my Mum only gets to watch this once. Surely if people can’t hear, they will turn off?
