Production Expert


Loudness Standards For Music Discussed By Experts

If you are making content destined for any of the music streaming or on-demand services, then this article, based on a recent podcast with Bob Katz and Rob Byers, is for you.

Why should content creators and mastering engineers take an interest in the loudness of content on streaming services? Surely we can just mix the content and let the streaming services normalise it?

In this article, we aim to answer this question by looking in detail at the newly updated AES recommendations for audio streaming and on-demand services, with the help of Bob Katz and Rob Byers, two of the team responsible for the latest recommendation.

More About Our Guests Bob Katz and Rob Byers

Bob Katz has played the Bb clarinet since the age of 10 and has been an audio engineer since 1971. Currently president and mastering engineer at Digital Domain, Bob is an AES Life Fellow who has engineered three Grammy-winning albums and numerous nominees. He gives seminars worldwide in English, Spanish and French. He is the author of two books, one of which is considered the Mastering Engineer’s Bible, and he also holds a U.S. patent on his ambience extraction invention, which has been licensed to Weiss Engineering, Z-Systems and UAD. In his spare time, Bob is a columnist and has written numerous audio-related articles, reviews and a few AES papers.

Rob Byers is an audio engineer, field recordist, and mixer who loves working in audio! His time in broadcasting has allowed him to work with organizations like NPR, American Public Media, and the podcast Criminal. He has recorded in environments from well below zero to well below the surface of the earth, documented life-changing events like the aftermath of Hurricane Katrina, led live international broadcasts from foreign soil, and recorded artists from Lizzo to Yo-Yo Ma. Rob also trains audio journalists and podcasters and has written educational guides for producers of all experience levels. He is now the Director of Broadcast and Media Production at American Public Media.

The Conversation

Mike Thornton: It probably comes as no surprise that we're going to be discussing loudness here today, and how we can make sure that the content we create, whether it's music, speech or a combination of the two, sounds its best when it's delivered to consumers by one of the many streaming and on-demand services out there.

So, to the first question: why does loudness matter, and why, as content creators, should we care? Isn't that the job of the streaming services?

Bob Katz: It's not the loudness as much as the quality that counts. Consider a characteristic that we call the Peak to Loudness Ratio (PLR). Although it is not an infallible determination of quality, it is a pretty good indication that the transients are getting through, and if you squash the transients too much, then the quality suffers.

Mike Thornton: Bob, can you just explain what peak to loudness ratio means for those who may not have come across the term?

Bob Katz: We know how to measure loudness nowadays with a BS 1770 loudness meter, so first we measure the integrated loudness. If that loudness gets too close to full scale, or in many cases too close to -1 dBTP (TP stands for True Peak), then it probably means that a lot of compression or limiting has been applied, and that's an indication that you might start to lose transients and clarity. The ratio between the highest peak you hit and your integrated loudness is called the peak to loudness ratio.

[If you want to learn more about Peak to Loudness Ratio then check out our article You Can Now Put The Life Back Into Music When Preparing Tracks For Music Streaming Services.]
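[To make the arithmetic concrete: PLR is simply the true peak level minus the integrated loudness, both read from a BS 1770 meter. Here is a minimal sketch in Python; the function name and values are purely illustrative.]

```python
def peak_to_loudness_ratio(true_peak_dbtp: float, integrated_lufs: float) -> float:
    """PLR: the gap between the highest true peak and the integrated loudness.

    Both readings come from a BS 1770 loudness meter; this function just
    takes the difference between them.
    """
    return true_peak_dbtp - integrated_lufs

# A track peaking at -1 dBTP with an integrated loudness of -14 LUFS has a PLR of 13,
# while the same peak over a -9 LUFS master leaves a PLR of only 8 - a hint that
# heavy limiting may have squashed the transients.
print(peak_to_loudness_ratio(-1.0, -14.0))  # 13.0
print(peak_to_loudness_ratio(-1.0, -9.0))   # 8.0
```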

Mike Thornton: So for you, this is one of the key factors to take into consideration?

Bob Katz: Well, of course, I use my ears.

Mike Thornton: Point well made.

Bob Katz: When you look at a meter and you see the peak to loudness ratio going down as you start compressing, that's the moment to listen more carefully and consider whether you are over-compressing.

Mike Thornton: Rob, what about you? How would you answer this question?

Rob Byers: This is really important and I'm glad you're asking it. Loudness, to me, is all about consistency. It's all about the end-user, the listener, your audience, and providing them with a consistent experience. This paper, TD 1008, covers a number of areas: firstly music, but it also covers podcasting, and it addresses voice assistants on your smart speaker or on your phone.

It's all about giving our audience a consistent experience regardless of the platform they're on or the kind of content they're listening to. As a content creator, you want to care about loudness within the piece of work you're creating, like mixing a podcast or mixing an album. Of course, you are going to factor in the things Bob was talking about and create a consistent experience within the little world of that piece of content. But that piece of content is then going to go out into the wider world and will have to stack up against all the other content that's out there too.

This is what TD 1008 is trying to do: the enormous task of bringing some consistency to all the content that's out there, and with it a pleasurable listening experience for the audience.

Bob Katz: Let me jump in, because Mike asked, "isn't that the job of the streaming service?" Of course, the streaming services do the normalisation. But as creators we need to know what the final product is going to sound like, so we should have an idea of what the streaming service is going to do to our content.

Mike Thornton: Yes, and there is the challenge that some streaming services use different loudness normalisation targets to others. With services that use a higher target, say -14 LUFS, the peak to loudness ratio that Bob was talking about really starts to come into play, and we need to be aware of that when we're producing content. If you've got content made to, say, -18 LUFS and you want it to compete at -14 LUFS, you can just turn it up and push it into a limiter, but then exactly those peak to loudness ratio issues arise.

So is there anything we can bear in mind as creators to make sure our content is likely to play nicely on all these different platforms?

Bob Katz: First of all, TD 1008 is striving to get the different streaming services to conform to a single recommendation. I can't say standard, because it's not an official standard, so we'll call it a recommendation. Services like Apple and TIDAL are, at most, only a couple of dB apart, as long as Spotify's middle default setting of -14 LUFS is used. Then all Spotify has to do to arrive at TD 1008, for music tracks normalised as individual songs, or podcasts normalised to the same loudness, is to come down to -16 LUFS. So currently they are about 2 dB hot in that respect. But at the same time, the listener doesn't listen to the average.

If I look at an album, my ears go for the loudest song, and that's one of the reasons why we've set up album normalisation and set the loudest song to -14 LUFS with downward-only normalisation, as per the recommendations. So if I send a song in to Spotify at -15 LUFS and -1 dBTP, they should not raise it.

TIDAL and Apple should not raise it either, because that would bring the true peak above -1 dBTP, unless they were to add peak limiting, which we're trying to discourage.
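[As an illustration of the downward-only behaviour Bob describes, here is a minimal sketch assuming a -14 LUFS target and a -1 dBTP ceiling. The function is hypothetical and models only the gain decision, not any particular service's actual implementation.]

```python
def normalisation_gain_db(integrated_lufs: float,
                          true_peak_dbtp: float,
                          target_lufs: float = -14.0,
                          ceiling_dbtp: float = -1.0,
                          downward_only: bool = True) -> float:
    """Gain (in dB) a streaming service might apply to one track.

    With downward_only=True (the behaviour recommended here), quiet tracks
    are never turned up. Otherwise the gain is still capped by the true-peak
    headroom, so no extra peak limiting is needed.
    """
    gain = target_lufs - integrated_lufs        # positive would mean "turn it up"
    if downward_only:
        return min(gain, 0.0)                   # downward-only: never raise a track
    headroom = ceiling_dbtp - true_peak_dbtp    # room left before peaks hit the ceiling
    return min(gain, headroom)

# Bob's example: a song at -15 LUFS peaking at -1 dBTP should not be raised.
print(normalisation_gain_db(-15.0, -1.0))                       # 0.0
# Even without the downward-only rule, there is no true-peak headroom to raise it.
print(normalisation_gain_db(-15.0, -1.0, downward_only=False))  # 0.0
# A hot master at -9 LUFS / -1 dBTP is simply turned down by 5 dB.
print(normalisation_gain_db(-9.0, -1.0))                        # -5.0
```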

Mike Thornton: We've been talking about -14, -15 and -16 LUFS as where we're currently working, but of course that's significantly different to broadcast and OTT systems where, depending on what part of the world you're in, we are looking at around -23 or -24 LUFS. So I was pleased to see that the new AES TD 1008 recommendations acknowledge that the target we are working towards for everything, including music streaming, on-demand services and podcasts, as well as broadcast and OTT services, will be -23 or -24 LUFS at some point in the future.

However, although that is the target, we are not there yet. So can you remind us why we are having to use -14 to -16 LUFS for music streaming and on-demand services, and what's preventing us from making everything work to -23 or -24 LUFS straight away?

Rob Byers: It's the devices. There are still quite a few of what I would call legacy devices out there in the world that don't have the gain to play back content that's down around -24 LUFS.

Mike Thornton: Right. So it's legacy consumer devices that are the issue here, and we can't get to -24 LUFS until those legacy products are a relatively small part of the equation. Bob, do you think it's likely to be a progressive transition? So we might go down to -16, then -18, then -20 before getting to -24 LUFS?

Bob Katz: Well, actually the saviour on that issue will be metadata, because I could snap my fingers and say, boom, metadata is active, and the consumer might not even detect a difference if it's done properly. But this means that we are not only waiting for the devices to get more headroom and more gain, we're also waiting for them to become metadata compliant. The good news is that there's a standard for that, called CTA 2075, which specifies how devices should conform in reading metadata.

I don't want to get into metadata too much in this podcast, but just think: if it's implemented correctly, it could be the solution to the problem you're talking about, which is how you get from -16 to -24 without having to do it in little steps. I think metadata will solve it.

Mike Thornton: So effectively, the consumer's equipment will read the metadata flag and make adjustments at the playback end to match what that device is capable of handling. Is that fair?

Bob Katz: Yes, and if the device is really smart, it'll figure out where it was before the metadata and try to bring the device's target close to that original. There is a lot to deal with, including headroom: if you take a track at -24 LUFS and you bring it up to -18 LUFS internally in the device, are you going to get overloads and all kinds of other things?
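[To illustrate the idea in the simplest terms, here is a hypothetical sketch of device-side gain, not the actual CTA 2075 mechanism: the device reads the content's loudness from metadata and computes its own playback gain, capped by the clean headroom Bob mentions.]

```python
def device_playback_gain_db(content_lufs: float,
                            device_target_lufs: float,
                            available_headroom_db: float) -> float:
    """Gain a metadata-aware playback device might apply.

    content_lufs would come from loudness metadata carried with the stream;
    device_target_lufs is whatever the device (or its listening environment)
    wants. The gain is capped by the device's clean headroom, to avoid the
    internal overloads Bob raises.
    """
    gain = device_target_lufs - content_lufs
    return min(gain, available_headroom_db)

# A -24 LUFS track raised towards a -18 LUFS device target needs +6 dB,
# which only works if the device actually has 6 dB of clean gain in hand.
print(device_playback_gain_db(-24.0, -18.0, available_headroom_db=6.0))  # 6.0
print(device_playback_gain_db(-24.0, -18.0, available_headroom_db=3.0))  # 3.0
```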

Rob Byers: And if I may, if the devices are really smart, they could be aware of the environment they're in. They could dynamically adapt that target to the noise of a loud subway train, or the dishes being done in the background.

Mike Thornton: Would that bring in some dynamic or loudness range reduction, so that in a noisy environment it reduces the loudness range and there aren't elements of the content that are too quiet to be heard over the background noise?

Rob Byers: I think it would have to at that point. And of course, then you get into all sorts of questions about what sounds good, and what opinions are around doing that kind of thing to the content. But hopefully, in that hypothetical futuristic world, the consumer would still have some control over how that metadata is applied and how that dynamic range reduction is applied.

Are they able to turn it off? Is it on by default? Is it off by default? All those kinds of things will have to be addressed, but that's a world I'm really excited about. I want to be able to listen to my content wherever I am. Even if I'm doing the dishes, I want to be able to hear it and understand it. I want it to be intelligible, and that's sometimes hard right now, especially in the podcasting world, where the range of production quality is pretty wide. It can be really hard to hear some of that content while you're doing other things and not focusing solely on what you're listening to.

Bob Katz: Mike, the devil is in the details and it's going to take a long time, but as the transition happens, we will adapt.

Mike Thornton: What do you define as a long time? Because these days a year can be a long time, given the speed at which developments move. What are we talking about? More like five years?

Bob Katz: Probably five. It sounds like a good number.

Rob Byers: TD 1004 [the predecessor to TD 1008] came out in 2015 and took about a year to put together. TD 1008 took well over a year; the editing group spent many hours per week, for over a year, trying to draft it and get the language right. I would say four or five years, somewhere in that range, to come up with another document like this and release it to the world. But for now, we need to do the advocacy and bring the rest of the industry along with us with TD 1008.

Bob Katz: As I understand it, Spotify has been eagerly awaiting the release of TD 1008, because they would like to get on the bandwagon.

Rob Byers: That’s fantastic.

Mike Thornton: That's really good to hear. And of course, Bob, the AES seminar session you hosted at the virtual AES conference in October was one of those moments where you got a number of key people from around the industry into the same virtual room, and I certainly sensed real progress.

Bob Katz: Can you see my head nodding?

Mike Thornton: On radio, no ;). But certainly, as an onlooker, it was really gratifying to hear what all the different streaming services were doing, and it was also great to hear the desire for standardisation. Obviously that was understandable from Rob D’Amico from Avid, who are trying to deliver content to all these different streaming services with AvidPlay, but we also heard it from some of the other providers, didn't we?

Bob Katz: Well, Avid is an aggregator and we had 3 aggregators and 3 different streaming services represented in the room.

Mike Thornton: Indeed, including YouTube, which was great to hear from directly. Now that we've had a chance to recover from that, how do both of you feel about how that particular session went?

Rob Byers: I attended that session very much as an audience member and a listener, and from that perspective it was incredibly informative. It was a big moment for the audio industry, and for the music industry, to see representatives from those different services all in one conversation together, openly talking about their specs and what they hope for the future, and having a real conversation about some of the finer, more important points.

It's really nice to hear that they acknowledge many of the concerns that have been expressed over the years by content creators. And Bob, hats off to you for pulling that group together and creating a conversation like that.

Bob Katz: Wow. Thank you very much.

Mike Thornton: Bob, did that session exceed your expectations?

Bob Katz: I'm still high from it to tell you the truth.

Mike Thornton: Fair enough. Back to TD 1008. As someone who has been involved in the loudness world, certainly in the broadcast sector, since before loudness became a broadcast standard, I was very curious about this 2 LU offset between music and speech, because I'd always understood, and indeed taught, that BS 1770 measures perceived loudness irrespective of the content. In my head, that was the whole point of loudness normalisation, and now we appear to have what I might bluntly call a 2 LU fudge factor. Can you give us a bit more insight into how that came about?

Bob Katz: Mike, it doesn't really matter whether the error is in the measurement, the perception or the preference. I prefer to think that it has to do with the listener's personal preference not to hear speech as loud as music, and that's what it boils down to. I was in broadcast for many years, and when we did the Metropolitan Opera we had VU meters. The VU meter was hardly moving when the host was speaking, and when the music was playing the VU meter was pinning, and it sounded just fine.

So regardless of whether the meters are accurately determining the loudness, and I believe they probably are, our personal preference is to hear speech at a lower level than the music. Rob, what about you? You've been in the business for a long time.

Rob Byers: Yeah, well said, Bob. This part of the recommendation came from some deep research. We referenced a couple of different papers, and one of those papers was written by someone who was on the TD 1008 draft committee, Scott Norcross from Dolby. So there's research that supports this idea, much as Bob was saying.

[The 2 documents Rob refers to are…]

When we rolled out loudness for production at American Public Media back in around 2012 or 2013, I was anecdotally hearing back from engineers that they had this preference as well, and that they were noticing the difference, especially with the classical music programming they were working on. It was really clear then, and I don't mind going on record to say it: BS 1770 isn't perfect. You see a lot of chatter about that on online forums, and you hear people talking about it. But as Bob said, it really doesn't matter where the flaw is. It comes down to the listener's preference and what our ears are telling us.
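[As a rough illustration of how such an offset could be applied when choosing normalisation targets: the 2 LU figure comes from the recommendation, but the function below and the -16 LUFS music target are only an example.]

```python
def normalisation_target_lufs(is_speech: bool,
                              music_target_lufs: float = -16.0,
                              speech_offset_lu: float = 2.0) -> float:
    """Illustrative target selection with the speech/music offset.

    Listeners tend to prefer speech a couple of LU quieter than music that
    measures the same, so speech content gets a target 2 LU below music.
    """
    if is_speech:
        return music_target_lufs - speech_offset_lu
    return music_target_lufs

print(normalisation_target_lufs(is_speech=False))  # -16.0 (music)
print(normalisation_target_lufs(is_speech=True))   # -18.0 (speech, 2 LU lower)
```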

Mike Thornton: Yeah, certainly my experience is that when UK television moved over to BS 1770 from peak normalisation, one of the areas that appeared to challenge our folk most of all was what we call Light Entertainment here in the UK: shows with speech, music and audience response. That really proved to be a challenge because, much as you've described, at the end of a music performance we expect the audience applause to be louder than the music, as well as the speech in the interview segments.

Bob Katz: Correct.

Mike Thornton: So it's very interesting hearing you talk about this, and it is certainly making me take another look at this whole area, because the thing about the BS 1770 model is that it is trying as hard as possible to reflect perception, not just straight measurement.

Bob Katz: And in fact, Mike, integrated loudness potentially has a flaw as a concept, because if you have a song with large dynamic movement, the ear might react to the louder passages and perceive that song as being as loud as another song that is much more compressed. So integrated loudness is only an approximation.

Mike Thornton: I think we're all agreeing that, as you say, BS 1770 is not perfect, but certainly in my experience, even though it hasn't solved all the problems, it has made life a lot easier.

Rob Byers: Absolutely.

Mike Thornton: I would like to circle back to the album normalisation that Bob referred to earlier, because what we have got used to with loudness is now described as Track Normalisation: taking the integrated loudness of a single piece of content and making that -23 or -16 LUFS, whatever it happens to be, and then taking a different piece of content and normalising that to the same required loudness. But Album Normalisation takes a different approach?

Bob Katz: The artist's intent is the purpose of album normalisation. If the artist wants a song to be soft relative to the loud song, then we need to follow the artist's wishes, and it also sounds more natural to the listener. Eelco Grimm's research, and also some of my own informal research, seems to show that listeners prefer to hear the songs on an album at the relative levels they were produced at. And it may surprise those of you listening to this podcast, but track-by-track normalisation is compression. It's dynamic compression. You might ask: what do you mean, track normalisation is compression? If you bring up all the songs on an album so that the soft songs are as loud as the loud song, you're compressing the dynamic range of the album.

And that's why track normalisation is actually wrong. Very wrong.

Mike Thornton: I totally get it. Back in the day, we put a needle in the groove of a piece of vinyl and listened to an album, or more recently stuffed a CD in a slot and listened to the entire album as the artist intended it. But the reality of streaming services is that, more often than not, we're now listening to playlists that take individual tracks from different albums. So how does album normalisation fit into that model?

Bob Katz: One of the key tenets of TD 1008 is that album normalisation is a lot smarter than you might think. Album normalisation actually allows you to shuffle songs in a playlist and hear them out of their original context, in a new context that will reflect, for the most part, what the artist's intent would have been if the artist had invented that playlist.

Now, there are some imperfections, and I can explain them, but for the most part album normalisation is a form of artificial intelligence. Say I take the loudest track of a Beatles album and bring it up to -14 LUFS, then take the loudest track of a Sinatra album and make that -14 LUFS, and then allow all the rest of the songs on each album to keep that album's relative loudness [in other words, loading them into the streaming service using album normalisation]. Naturally, our ears have been, quote-unquote, normalised to the loudness of the loud songs on both of those albums, so when you play the other songs out of context, they still feel right at the same setting of your volume control.

Mike Thornton: So effectively, if there's a track on, say, that Beatles album which is 3 or 4 LU quieter than the loudest track on the album, and we play that quieter track in a playlist, it would still be played 3 or 4 LU quieter? So it reflects the artist's intent that that particular track shouldn't be played as loud as the loudest track on the album.

Bob Katz: Relative to the loud songs in that playlist, which have all been normalised to -14 LUFS, for example.
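[A minimal sketch of the difference between the two approaches, with illustrative loudness values: track normalisation computes a separate gain per track, while album normalisation computes one gain from the album's loudest track and applies it to every track, preserving the offsets Bob describes.]

```python
def album_normalisation_gains(track_lufs: list[float],
                              target_lufs: float = -14.0) -> list[float]:
    """One gain for the whole album, anchored to its loudest track.

    Every track gets the same gain, so the track-to-track offsets the
    artist intended are preserved.
    """
    album_gain = target_lufs - max(track_lufs)
    return [album_gain for _ in track_lufs]

def track_normalisation_gains(track_lufs: list[float],
                              target_lufs: float = -14.0) -> list[float]:
    """A separate gain per track, so every song lands at the target.

    This flattens the album's dynamics - Bob's point that track
    normalisation is effectively compression.
    """
    return [target_lufs - lufs for lufs in track_lufs]

album = [-14.0, -17.0, -18.0]            # loudest song already at -14 LUFS
print(album_normalisation_gains(album))  # [0.0, 0.0, 0.0]  offsets preserved
print(track_normalisation_gains(album))  # [0.0, 3.0, 4.0]  quiet songs pushed up
```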

Mike Thornton: I guess some of that is tied in with metadata, because presumably that offset figure needs to be logged in the metadata so that the system knows to play the track quieter.

Bob Katz: Right. Now I'm going to tell you where the system breaks down. Let's take a classical album played on a harpsichord, so there's no one loud song, because harpsichords don't have any dynamics and the harpsichord is supposed to be played softly. But the system will raise the harpsichord tracks to -14 LUFS and they will sound louder than Metallica! So there are issues, but if you have an album which has at least one loud song, and that song is meant to be played loudly, then album normalisation works.

Mike Thornton: That's really good news, because again we're making sure that the artist's intent makes it all the way to the consumer's ears.

Bob Katz: Well, we have to convince Spotify and Apple that album normalisation should also be on all the time, even in mixed playlists. I hope we'll get there, but a lot of people need to be educated on this concept. That's why I brought it up.

Mike Thornton: So again, it's another one of these work-in-progress situations.

Bob Katz: Right.

Rob Byers: Bob, you're saying it needs to be the default. Is that right?

Bob Katz: All the time. The only exception is this: let's take a DJ show where they're playing the greatest hits of all time, and one of them is Tracy Chapman's Fast Car, which is a nice folky kind of song. With that kind of playlist, everything is supposed to be the same loudness, like a radio station. So if it's a radio-style DJ show, then album normalisation should probably not be enabled.

Mike Thornton: So you effectively allow the content creators to set the feel and the integrated loudness for the whole radio programme, and that's it.

Bob Katz: Or, if they're played back automatically, then turn off album normalisation and turn on track normalisation for DJ style.

Rob Byers: That's what we've done with The Current, an alternative station at Minnesota Public Radio. Because of the kind of music The Current plays, its library features a very wide range of genres: one hour can have Sinatra in it, and then Lizzo comes up a few songs later. We took that entire music library eight years ago and downward normalised everything to -24 LUFS. It had some negative impacts, but it also had some very positive impacts in terms of automated playback, and it made the DJs' job quite a bit easier. They find themselves leaving the music fader at unity and no longer have to pump it all the way up when a song begins quietly, or it's Sinatra or acoustic music, et cetera. It's not perfect by any means, and I'd probably make some different decisions if we did it again now, but I think it underscores your point about those DJ styles.

Before we wrap, Mike, could I ask you a question? I'm going to put you on the spot here, maybe just a little bit. What target are you aiming for with this podcast?

Mike Thornton: This podcast is mixed to -16 LUFS with an LRA of around 6 LU. I didn't want to go as high as -14 LUFS, for all the reasons we've discussed regarding peak to loudness ratio. So this podcast, along with all our video and tutorial content, both on YouTube and in our premium tutorials, is done to -16 LUFS.

Bob Katz: Did you do a shootout to -18?

Mike Thornton: No. As 'Mr. Loudness' here on the team, I decided I didn't want to go up to -14 or even -11 LUFS. I thought, no, let's sit somewhere a little further down, and I settled on -16 LUFS, but I made that decision well before even TD 1004 came out. Obviously, as we progress, a little extra dynamic range would be very nice to have, but -16 LUFS is where I'm at.

Acknowledgements

Thank you to our guests Bob Katz and Rob Byers for sharing their thoughts on all things loudness for streaming and on-demand services.

We hope it will help you better understand how to prepare content for music and on-demand streaming services.
