What Happens To My TV Mix After Delivery And Does It Matter?

Have you ever wondered what happens once you have delivered your mix to the broadcaster or OTT publisher? Do you know? Should you care? In this article, Damian Kearns asks Michael Nunan, Senior Manager for Broadcast Audio and Post Production Operations at Bell Media, to explain what happens next and why we should care.

Upstream Or Downstream

Something I learned a long time ago is that what happens upstream affects what winds up downstream. It’s a simple fact of existence. With this in mind, I often talk about audio routing in metaphorical terms, using city bus routes as an analogy.

5.1 Session in Pro Tools

Rivers And Streams

But in considering how to deliver multiple 5.1, stereo and mono stems inside the same template, to satisfy multiple distributors and broadcasters, I’ve begun to look at audio routing more as a collection of streams and rivers. The flow can be managed and shaped, but the audio itself must essentially remain what it is as separate elements merge or split to form the files I’m tasked with creating.

This is my philosophy: The stereo mix I deliver must contain all the same elements as the 5.1 mix upriver from it and in the same proportions, rather than be a parallel construct inside a separate routing structure. There are practical reasons for this:

  • I want the relationship between my audio elements to stay the same when I fold down a 5.1 Mix, M+E, or Mix Minus Narration.

  • The submixes, like the Music & Effects or Mix-Minus Narration files I create, should be as close to identical as possible to the master multichannel mix destined for broadcast or theatrical playback. For these submixes, I replicate my multichannel 5.1 mix processing exactly.

  • I want the 5.1 and stereo levels to read identically or as close to identical as the fold-down process will allow.

  • I want to pass even the most stringent delivery specifications out there, on the first try.

  • More importantly, I want my mix to be my mix, not my fix. I do not want anything coming back on me after I’ve delivered my files. 

Mix Window In a Pro Tools 5.1 Session

We Ask The Questions For You

I’ve never shied away from contacting broadcasters and distributors for clarification regarding their specific technical requirements. It’s been my experience that once that conversation has started, the mixing process has begun because I’m already mentally working out how to deliver what they require.

I’m not suggesting everyone inundate distributors and broadcasters like this all the time. I’ve only ever needed to have these conversations on a handful of occasions in the past 20 years, but it is sometimes the case that the wording of a delivery ‘spec’ can be a little hard to understand. 

To help unpack and explain delivery specs, I reached out to one of the top engineers in Canada, someone on both the creating and receiving ends of multichannel television audio content: Michael Francis Nunan.

Michael brings nearly 30 years of television and general audio expertise to this conversation about multichannel mix delivery. From large-scale live event mixing to promos, and from long-format TV production to post production audio editing and mixing, he has mastered end-to-end audio delivery.

He’s an expert in surround sound mixing, mentoring and supporting our industry on everything from ‘theatre scale’ Dolby Atmos array mixing to mixing mono radio commercials. As a sound technician, instructor and lecturer, Michael occupies a place at the very top of our profession here in North America. He is currently Senior Manager for Broadcast Audio and Post Production Operations at Bell Media.

With his extensive background in mind, we will explore, amongst other things, the central question of this article:

“What are the best practices for 5.1 to stereo fold down for television audio delivery?”

To answer this and other questions, we started the conversation at the stage of the sound-to-picture workflow where I’ve finished my 5.1/stereo mix, output my various deliverables and sent the files on to a broadcaster.

What Happens Next?

I started by asking Michael what happens at the end of my mix. What processes do my audio files undergo between final delivery and the ears of the target audience?

Michael: If you’ve done everything right, then there’s a pretty straight line between your mix and the audience. But the technical ecosystem that enables broadcasting has been built assuming that you haven’t done everything right!

As a minimum proposition, your mix will pass through a Loudness Processor, which is designed to ensure that the broadcaster cannot emit a signal that doesn’t comply with ATSC RP A/85. So, if you’ve delivered a mix that stays within the -24 LKFS (+/- 2) bounds laid out by A/85 (on a rolling 15-second window), then the processor should allow your mix to pass unmolested. But that’s a big if.
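
[Editorial note: you can sanity-check this yourself before delivery. Below is a minimal sketch of an integrated loudness check, assuming the open-source soundfile and pyloudnorm Python packages (pyloudnorm implements the ITU-R BS.1770 measurement that underlies A/85); the file name is hypothetical and this is not the broadcaster’s actual processing chain.]

```python
# A pre-delivery loudness sanity check - a sketch, not broadcast QC.
# Assumes the open-source soundfile and pyloudnorm packages.
import soundfile as sf
import pyloudnorm as pyln

data, rate = sf.read("final_mix_stereo.wav")  # hypothetical deliverable
meter = pyln.Meter(rate)                      # K-weighted BS.1770 meter
loudness = meter.integrated_loudness(data)    # integrated loudness, LKFS/LUFS

# ATSC A/85 target: -24 LKFS, with a +/- 2 tolerance
if abs(loudness + 24.0) <= 2.0:
    print(f"{loudness:.1f} LKFS - within A/85 tolerance")
else:
    print(f"{loudness:.1f} LKFS - expect the loudness processor to intervene")
```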

iZotope Insight - Check the Loudness Range

What About The Loudness Range?

The principal problem is Loudness Range. I think we’ll likely elaborate on this later, but it’s important to be able to objectively manage the subjective truth that not all mixes at a given loudness are going to “read” the same way. Dynamic Range is still an important consideration, and it isn’t directly addressed merely because we’ve transitioned from a level-based world into a loudness-based mixing regime.

Any idea of Range notwithstanding, the broadcaster working to ATSC A/85 is obliged to emit a signal that is at -24 LKFS (+/- 2 dB) or risk being fined by the regulator (CRTC in Canada, FCC in the USA), despite the fact that most content on most networks/channels was NOT created by that network. So the processors in the On-Air chain are an insurance policy of sorts.

As a final possibility: depending on whom you’ve delivered your mix to, and depending on their specific relationship with their BDUs (‘Broadcast Distribution Undertakings’, i.e. cable, satellite and over-the-top (OTT) distributors), it is always possible that one or more BDUs may also be passing the signal through an additional Loudness Processor.

[Editorial note: Michael and Damian are discussing workflows in the US and Canada. Other territories have different specs, and their regulators have different levels of sanctions they can impose on broadcasters. For example, not all regulators can fine a broadcaster for non-compliance. This also means that the use of Loudness Processor units is less prevalent in other territories.]

Quality Control

Damian: Before the mix is allowed to air, it has to pass Quality Control (QC). In the analogue days, this meant a person sitting in a room listening to the show and watching the meters. Many of us probably think that’s still how it’s done. Is it?

Michael: For many traditional broadcasters, yes. But increasingly, we’re seeing ‘automated’ QC systems, which attempt to exploit AI/ML (Artificial Intelligence/Machine Learning) to do faster-than-real-time QC. Regardless of human or machine, being subjected to QC shouldn’t be a worry if you’ve delivered a perfectly ‘on-spec’ programme.

Damian: Machines aren’t people, though, so are there any major differences between how a person might QC versus AI?

Michael: The major difference is context. AI/ML systems are effectively in their infancy. Necessarily, this means that many of them need to treat all content as equal. So a 30-second promo gets handled the same way that a 1-hour documentary does. A human operator, on the other hand, doesn’t need to be instructed that these 2 content forms are likely to create situations that need to be handled differently.

Is Everything Digital Now?

Damian: The broadcaster you work for, Bell Media, has a sizable presence across the Canadian spectrum. Can you tell me approximately how many TV channels the company operates? Are any of these channels still broadcasting analogue signals or is everything digital now?

Michael: 27 Specialty channels, 4 Pay TV services, and 3 conventional networks (25 local stations). From our perspective, everything is digital. That is, we haven’t emitted an analogue signal from network headquarters for many years.

That said, many smaller cable companies still have some analogue customers and we’re aware that some of our signals continue to be distributed in what is effectively NTSC (National Television System Committee, which set the analogue broadcast standard in North America until the digital broadcast age).

On a day-to-day level, in the audio department, that analogue layer is supported by the fact that we’re always obliged to create and deliver a conventional 2ch stereo mix, in addition to whatever higher-order ‘parent’ mixes - by which I mean multichannel audio - we might be creating.

Damian: So, some of the channels are still being distributed as NTSC analogue. Are those channels broadcasting the stereo downmixes printed by post audio mixers, or do they derive the stereo (2.0) from the 5.1 master mix?

Downmix Or Dedicated Stereo Mix?

Michael: Yes. Confused? Those 2 things must be the same thing. A delivered stereo mix absolutely must be created by a simple downmix of a 5.1-channel parent mix. If you do something ‘special’ to your 2mix (up to and including the very old-fashioned idea of doing a separate stereo mix!), then at best, only a few analogue-only (standard definition) consumers will hear it. But many, many people will listen to you in digital 2-channel stereo, despite the fact that they’re looking at an HD signal (regardless of whether that HD picture is being seen on a television, a computer screen or a mobile device). The only audio that accompanies the HD image is the 5.1 Full mix. So unless a viewer has a home theatre setup, they’ll be hearing a live metadata-sourced downmix of the 5.1 parent.

Damian: That’s a major revelation to any reader who labours under the assumption that their created 2.0 Mix is actually what the end user hears. It turns out, it’s not.

Back To Loudness Range

Let’s return to the dynamic range of a mix, often a contentious issue. Loudness range is the difference in LU (Loudness Units) between the quietest average LKFS/LUFS in the programme and the loudest average LKFS/LUFS. 

[Editorial note: LKFS and LUFS are the same thing. LKFS stands for Loudness K-weighted Full Scale and LUFS stands for Loudness Unit Full Scale. LKFS tends to be used in the ATSC standards and LUFS is used in the EBU (European Broadcasting Union) standards like EBU R128, which is the European equivalent of the ATSC A/85 standard used in the US and Canada.]
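
[Editorial note: for the curious, the EBU Tech 3342 Loudness Range (LRA) measurement can be sketched in a few lines of Python. It starts from short-term (3-second) loudness readings, gates out silence and very quiet passages, then takes the spread between the 10th and 95th percentiles of what remains. The readings below are made-up illustrative values, not real measurements.]

```python
# A minimal sketch of the EBU Tech 3342 Loudness Range calculation,
# starting from short-term (3 s) loudness readings in LUFS, as a
# meter such as iZotope Insight would display. Values are illustrative.
import numpy as np

short_term = np.array([-32.5, -28.1, -26.7, -24.3, -23.0, -22.4, -30.2])

# Stage 1: absolute gate drops readings below -70 LUFS (silence)
gated = short_term[short_term > -70.0]

# Stage 2: relative gate drops readings more than 20 LU below the
# power-averaged loudness of what survived stage 1
mean_power = np.mean(10.0 ** (gated / 10.0))
gated = gated[gated > 10.0 * np.log10(mean_power) - 20.0]

# LRA is the spread between the 10th and 95th percentiles, in LU
lra = np.percentile(gated, 95) - np.percentile(gated, 10)
print(f"Loudness Range: {lra:.1f} LU")  # ~8.5 LU for these values
```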

What’s an optimal loudness range for a mix, in your opinion?

Michael: Great question. This is, I think, the final frontier in the Broadcast Loudness War… and oddly, it’s the one aspect of the viewers’ total experience of loudness that isn’t detailed in our Broadcast Regulations.

More specifically, ATSC RP A/85 (and the Canadian CRTC documents that reference that standard) doesn’t provide recommendations for dynamic or loudness range. Weird. You can readily imagine 2 shows, perhaps an auto-racing program on a sports channel like TSN, and a Hollywood movie playing on another network channel like CTV, both with identical -24LKFS average loudness, but with wildly different Loudness Ranges. And two potentially wickedly different experiences for the viewer.

I’m sorry to say I don’t have an easy answer, other than to say that anyone who’s engaged in the art and craft of mixing television programs should start paying attention to the LU measurement on their shows, if only to develop an innate understanding of what various LUs ‘sound’ like. Because I suspect that eventually, specific LU requirements will start appearing on ‘Tech Spec’ documents from broadcasters.

If you really hold my feet to the fire, I’ll say that I think you need a very good reason to create a show with an LU higher than 8. 

Room Calibration And Mix Room Size

2012 Olympics Post Production

Damian: You and I have spoken casually before about room calibration and how it plays into achieving that target -24 LKFS and a manageable, comfortable loudness range. It’s been my experience that loud mixes are the result of monitors being calibrated too quietly, and overly dynamic mixes are the result of monitors being calibrated too loudly. What’s optimal here?

Michael: Or, said another way - ‘good mixes’ are the product of calibrated monitoring environments and monitoring levels that relate to the way people will actually listen to the show. Meanwhile, ‘bad mixes’ were probably really good mixes in the room in which they were mixed. But that room was very likely uncalibrated, and it doesn’t matter what the mix sounds like in the mix room - only what it sounds like once you’ve released it into the wild.

If you’re a music engineer, and you know that a mastering engineer with all of their tools and skills stands between your mix and the consumer, you can maybe ignore this advice. But if you’re not, then you’d better know - in a provable engineering sort of way - that your room is as accurate as you can make it.

Pro Tools at Banff

Damian: So for TV, what’s my target? Is it room size dependent?

Michael: Admittedly, this is a soap-box topic for me… but I’ll say this: Why mix TV shows in a room that bears no resemblance to the rooms in which people typically watch TV!? Notwithstanding the calibration issue, I think our rooms should be pretty good stand-ins for the average Living Room or Home Theatre.

Damian: My monitor levels are typically set to 76 dBSPL (C-weighted, slow) on nearfields and, depending on programme material, somewhere in the range of 76-78 dBSPL (C-weighted, slow) on my mains, since I occupy a decent-sized room. These are reasonable, comfortable levels that allow me to hear everything well, enable me to always hit my dialnorm target and keep my loudness range under 10 LU.

What About The LFE Channel?

A little off-topic, but since I have you here, let’s talk about the Low Frequency Effects (LFE) channel in the 5.1 Mix.

What are your thoughts on shouldering elements into the LFE versus effects cut specifically for it? How do I get this channel right so the majority of end listeners receive maximum impact and minimal annoyance? I also want to note that anything I send to the LFE has gone through slight compression and a filter that is set to around 80 Hz and completely kills everything above 120 Hz.

Michael: Simple answer: don’t use it. Unless you’re mixing an action movie, or are working to a ‘tech spec’ that insists on LFE content, there is no defensible reason to use the LFE channel on “normal” television content. I’d listen to an argument about drama (in which case creative intent rules the day), but for factual content, which is what most of us are consumed with each day - I’d say that 5.0 should be the standard operating procedure.

The LFE is not the Bass Channel. It’s also not the subwoofer channel. Additionally, the LFE channel is frequently dropped altogether in a stereo downmix - so there’s a good chance that a 2-channel listener will never hear that content anyway.

Damian: That’s an interesting perspective. I tend to cut directly for it, since some distributors and broadcasters do in fact require some LFE presence. But I’ll certainly keep your advice in mind in future. I think the key thing is not to put anything down there that anyone will miss in the stereo downmix.

Michael: Agreed. Rules are meant to be broken, but those exceptions tend to underline the rule - not invalidate it.
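
[Editorial note: for mixers whose delivery spec does demand LFE content, the kind of conditioning chain Damian describes is easy to reproduce. The sketch below uses SciPy; the 80 Hz corner and the 8th-order slope are assumptions chosen so that very little survives above 120 Hz, not a prescribed standard.]

```python
# A sketch of the LFE conditioning Damian describes: a steep low-pass
# around 80 Hz so that content above 120 Hz is effectively gone.
# Corner frequency and filter order are illustrative assumptions.
from scipy.signal import butter, sosfilt

def condition_lfe(lfe, sample_rate=48000, cutoff_hz=80.0, order=8):
    """Low-pass an LFE feed; an 8th-order Butterworth rolls off at
    roughly 48 dB per octave above the corner frequency."""
    sos = butter(order, cutoff_hz, btype="lowpass",
                 fs=sample_rate, output="sos")
    return sosfilt(sos, lfe)
```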

The Effect Of Lossy Compression

Damian: One of the things that some post mixers may not be aware of in the digital broadcast workflow is that the audio undergoes some sort of data compression. Dolby AC3 is the psychoacoustically based compression scheme used here in North America, or at least it has been until recently. Can you describe what happens to the 48kHz, 24-bit .wav files I deliver and how this happens?

Michael: A comprehensive answer to this is either a lengthy article in its own right or the bulk of a college semester! 

The short answer: the file/data compression itself is much less interesting than the fact that all of these delivery codecs are ‘lossy’. And those losses are, in and of themselves, much less important than the idea that the performance of an AC3 system (also EC3/DD+, AC4, etc.) is predicated on audio metadata, which governs the behaviour of downstream equipment (i.e. the Set-Top Box or AVR in the viewer’s home).

Understand Metadata

This stuff is not just a geeky detail. It’s everything! 

  • If you don’t understand metadata (what it is, and how it’s going to impact or influence an experience of your mix)...

  • and IF you’ve never had anything but satisfaction while listening to your mixes on Air…

  • then you’re incredibly lucky and should buy lottery tickets each week!

Damian: What kinds of metadata are we talking about? 

Michael: Everything in the Dolby universe. In a perfect world, every broadcaster would support fully dynamic or agile metadata - in which case the author of the mix could (and should!) also be the author of the metadata that governs its receipt. Alas, since that’s not the case at almost any conventional broadcaster, it remains that as mixers we need to know what metadata the broadcaster will apply to our mix; our responsibility is then to vet the effect of that metadata on our creations… and adapt them accordingly.
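
[Editorial note: to make this less abstract, here is an illustration of the kind of Dolby metadata parameters Michael is referring to. The parameter names (acmod, lfeon, dialnorm, cmixlev, surmixlev, dynrng) are standard AC3 terms, but the structure and values below are a hypothetical example for illustration - not a real encoder API and not a recommendation.]

```python
# Hypothetical illustration of typical AC3/DD+ broadcast metadata.
# Parameter names are standard Dolby terms; values vary by broadcaster.
broadcast_metadata = {
    "acmod": "3/2",             # channel mode: L, C, R, Ls, Rs
    "lfeon": True,              # LFE channel present in the encode
    "dialnorm": -24,            # dialogue level; drives playback attenuation
    "cmixlev": -3.0,            # centre level (dB) in a stereo downmix
    "surmixlev": -3.0,          # surround level (dB) in a stereo downmix
    "dynrng": "film standard",  # DRC profile applied by the decoder
}
# A decoder uses cmixlev/surmixlev to build the live Lo/Ro downmix
# Michael described earlier - which is why, as mixers, we need to
# know these values before we can vet how our 5.1 will fold down.
```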

Damian: I can see we’re going to need another article to delve into metadata as a subject in and of itself. 

What About Tone Files?

Moving on, I have to ask this: What is the point of tones at the top of files?

Michael: Increasingly, test tones printed on any digital format are really only useful as a kind of litmus test for the discipline and attention that went into creating the delivery.

Should ‘Shouldering’ Be Used?

Damian: Just delving into mix strategy a little here, since I know you’re not only management and a ‘tech guru,’ you’re still actively recording and mixing for TV. What does your mix philosophy have to say about ‘shouldering’ elements in a mix? Shouldering is the practice of panning elements so that the same element feeds multiple speakers at the same time. We’ll exclude the LFE since we’ve already spoken about it.

Michael: I’m not a huge fan, but I’m also painfully aware that sometimes you really don’t have a choice, in the sense that many times, programs would be desperately dull center-channel-only affairs without it. 

The truth is we’re often faced with scenes that are simply not dense enough to provide interesting complexity in the sound field. That said - pulling the panner ‘off the wall’ or adjusting Centre Percentage or Divergence is an easy, but lazy, way to get around the problem. I’d rather do something to manufacture some de-correlation between the original signal and a copy that I use for the ‘shouldering’.

At the end of the day - we’ve been working in 5.1 every day for 20 years, and it’s still a terrifying reality that most of our original scene content (and elemental sources like stock/library music and SFX) is mono or 2ch stereo. But that’s okay; the 5.1 sound field gives us 10 stereo pairs. That means there are 10 ‘phantom-centers’ in the room. 

Use ‘em! 
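
[Editorial note: Michael’s arithmetic checks out - the five full-range channels of a 5.1 field pair off in C(5,2) = 10 distinct ways, as this quick check shows.]

```python
# A quick check of the "10 stereo pairs" claim: five full-range
# speakers (LFE excluded) combine into C(5,2) = 10 distinct pairs,
# each capable of carrying its own phantom image.
from itertools import combinations

speakers = ["L", "C", "R", "Ls", "Rs"]
pairs = list(combinations(speakers, 2))
print(len(pairs))  # 10
print(pairs)
```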

But remember, it doesn’t matter what the mix sounds like in 5.1. It matters what it sounds like after it’s been downmixed to Lo/Ro stereo following downmix coefficients, which typically should be -3/-3/-10 (centre/surrounds/LFE), unless your broadcaster specifies different coefficients.

Damian: Hang on a minute! The -3 for the centre channel, I get, since it’s going to be split to the left and right stereo speakers. The -3 for the surrounds, again, I get, for the same reason. But the -10 for the LFE? This is a bit of a revelation. I was always under the impression the LFE is left out of stereo fold-downs, but now you seem to be suggesting that the set-top box or perhaps the TV itself is in fact combining the LFE into the stereo mix? So should I in fact be printing my stereo fold-down with this same logic, instead of having it muted?

Michael: The truth is that either way is right, and both are likely to happen to your 5.1 mix at some point in its life as a piece of content. Many downmix engines will indeed completely mute the LFE channel when summing toward a 2ch mix (recognizing that the rules are not necessarily the same for the 3 major variants of 2-channel delivery: Left-only, Right-only (Lo/Ro); Left-total, Right-total (Lt/Rt); and Left-binaural, Right-binaural (Lb/Rb)).

But at least some Dolby decoders (hardware or software, depending largely on their age) will include the LFE in downmixes, at a -10dB trim. Of course, a -10dB value for LFE is also an available metadata parameter that anyone downstream of you might program into a Dolby encoder.

My point is, if you’re mixing to 5.0 as I suggested earlier, then the entire issue is moot.
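
[Editorial note: to make the fold-down arithmetic concrete, here is a minimal sketch of an Lo/Ro downmix using the -3/-3/-10 coefficients discussed above, with a switch for the mute-versus-trim LFE treatments Michael describes. The channel naming and function shape are illustrative assumptions - always defer to your broadcaster’s specified coefficients.]

```python
# A sketch of an Lo/Ro fold-down with -3 dB centre, -3 dB surrounds
# and -10 dB LFE (or the LFE muted entirely, as many downmix engines
# do). Channel layout and names are assumptions, not a standard API.

def db_to_gain(db):
    return 10.0 ** (db / 20.0)

def lo_ro_downmix(L, R, C, LFE, Ls, Rs, include_lfe=False):
    c = db_to_gain(-3.0)                             # ~0.707
    s = db_to_gain(-3.0)                             # ~0.707
    lfe = db_to_gain(-10.0) if include_lfe else 0.0  # ~0.316, or muted

    Lo = L + c * C + s * Ls + lfe * LFE
    Ro = R + c * C + s * Rs + lfe * LFE
    return Lo, Ro

# Usage: feed per-channel sample values (floats or numpy arrays), e.g.
# lo, ro = lo_ro_downmix(L, R, C, LFE, Ls, Rs, include_lfe=True)
```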

Understand 5.1 Workflows First

Damian: I just went into shock a little bit. Is there anything we haven’t covered here that you would like to add?

Michael: 5.1 is a gateway format. If you get it right, then everything that comes next is an easy transition. If you get it wrong or stumble along with an incomplete understanding of the ecosystem, then even the small transition to 7.1 is going to be tough.

Meanwhile, things like Atmos, Auro3D, Higher-Order Ambisonics (HOA) and brain-busting stuff like head-tracked Binaural (for VR/Gaming) are likely to be pretty impenetrable. 5.1 delivered via Dolby AC3 seems like a very old and established idea - because it is - but just because it’s been around for a long time doesn’t mean everyone has mastered it. It’s worth making the investment.

Thank You

Damian: Michael, thanks so much for this. As with all our conversations, I learned something valuable this time. 

Michael: Thanks, Damian! This was fun. Let’s do it again soon. 

Takeaways

My main takeaway from this conversation, getting back to the water analogy, is that everything needs to flow through my individual 5.1 stems, then to my 5.1 Mix/M+E/Mix Minus Narration submaster faders and onward, intact, to the various fold-down busses.

All stereo and mono files ought to be treated as derivations of the 5.1 parent files, not as separate paths. So, for instance, if I want my stereo mix to sound ‘thicker,’ I had better apply any additional processing to my audio tracks, or 5.1 stems, or (less ideally) to the 5.1 Mix/M+E/Mix Minus submasters themselves, so that everything I print benefits from my creative or technical decisions.

As you’ve just read, there’s a lot to consider in the mix prior to delivering files to clients, distributors or broadcasters. It’s good to read and adhere to specifications carefully, and to ask a peer or technical wizard if necessary.

Beyond the creative, there are a lot of reasons to think carefully about how to craft the ideal experience for our audiences. What we want, at the end of the day, is to feel safe in the knowledge that our audio visions have been realized by the ears that hear our work.
