Poor Intelligibility - Whose Fault Is It Anyway?

Recently on social media, I noticed another flurry of activity about poor intelligibility in TV drama, this time in Australia. The usual accusations were thrown around, ranging from the direction and acting, to the techniques used to capture the location sound, through to post production techniques, and even to the relatively new technical specifications that broadcasters and distributors worldwide have set.

This got me thinking. Since the previous round of debate, mainly in the UK, I've had the benefit of chatting with location sound crews, post production crews, directors and film editors about the subject. I thought I'd lend my perspective.

Don't Blame The ......?

It's very easy to jump on the side of whichever part of the production process you're most familiar with. Everyone from the sound crew to the creatives gets very defensive, and news media outlets with vested interests - those that also own media distribution outfits - use it as a handy stick to beat their competitors with. All this distracts from the fact that it's a group responsibility. I was going to start by exploring the technical aspects, but as I thought about it more I realised that the technical is interleaved with the creative.

Blame The Location Sound Crew?

Let's start with the location sound. For me this whole debate got kick-started 16 years ago with "Gosford Park", directed by Robert Altman. One scene in particular caught the eye - a big dinner sequence where multiple conversations take place at once. As I recall, the location sound was recorded multitrack across several Tascam DA-88 recorders - a very new technique in those days. Altman wanted to be able to choose, in post production, where our attention was drawn. This meant each actor had to have their own personal mic and individual recording track, so that in post you could switch between the conversations, many of them semi-improvised. Combining that with multi-camera filming allowed the director to completely restructure the scene in any way he desired in post production.

This presents the location sound crew with many challenges. Firstly, if they don't know which camera angle is going to be used, they have to take the safest option with conventional miking techniques - microphones on boom poles - to keep them out of shot and to avoid shadows being cast by the equipment. Being out of shot and shadow for every possible angle is not necessarily compatible with the optimum microphone position. And it's not just the microphones that have to be out of shot; it's the sound crew as well. This causes an over-reliance on personal mics and multitrack recording. I've heard some production sound mixers (as is their job title) complain, "I'm not a mixer anymore, just a recordist". Some big shows, like the US remake of "House Of Cards" or the film "Les Misérables", have used CGI techniques to "paint out" microphones, but this remains the preserve of high budget productions.

This is where best recording practice has been forced into a corner by creative desire. Which brings us to...

Blame The Director / Writer / Actor?

In a world where it's now possible to convey virtually anything visually, from the most far-fetched plots to microscopic detail in natural history, you can understand why directors and writers feel that the world is their oyster. If they want a scene like the famous one in "Gosford Park", they can have it. It's incredibly exciting for writers and actors - they get to convey things that previously we had only dreamt of. Back in the sixties and seventies, when television made its huge surge, there was a saying: "The Colour's Better On Radio".

Actors have a huge opportunity to explore different techniques to portray their characters. If this involves "realism" - delivering dialogue in a "realistic" way rather than in a way that can be heard at the back of a theatre - and it's appropriate to the script, why not?

I've worked on every type of production, from micro-budget internet one-off dramas to big US studio dramas. What I find with the former is that the show is usually someone's "baby". They've been intimately connected to the piece and view it as a work of art first. On the latter, I've found directorial involvement in post production is far less, and the show is generally run by the showrunner - in effect the lead executive producer.

On a 12+ part series, you don't get just one director looking after every episode. Because of this, the showrunner plays an important role in keeping an overview. They don't know each episode as intimately, but that's an advantage: they are detached enough to flag up intelligibility issues, and they are mindful that the show is ultimately a product which needs to make money. US shows are also far more likely to be written by a team rather than by the director, which reduces the chance of confirmation bias - believing the script is intelligible when in fact you're reciting it from memory and falsely assuming everyone else hears it with the same clarity.

Blame The Post Production Crew?

I was at a trade show a month ago and bumped into an old acquaintance who is a production sound mixer. He was about to take part in a panel discussion about this very subject. His take was, "a lot of the time I've captured a perfectly usable boom track - why hasn't it been used?"

I view my job as a re-recording mixer as bringing the director's vision to life using sound. If the director wants the dialogue to sound "close for far", as it were, then that's what we get. Generally, whenever I've attempted to use reverbs and EQ to give personal mic dialogue a more realistic perspective, I've been overruled by the director. For them the script absolutely is king, and if a character is delivering a line intimately while obviously far from the camera, that's the style the director wants, so that's what they get.

It's not my job to challenge the artistic direction - it's my job to deliver it and contribute towards it where appropriate. If I were to dig my heels in about intelligibility, it'd be like the location sound crew giving performance feedback to the actors on set. It's just something you do not do. Not if you want to continue working, that is.

I haven't even mentioned quality control yet. We now have to meet stringent loudness regulations, and due to the nature of these measurements, a high noise floor under dialogue forces the average loudness reading up. The problem is even worse with period pieces. Many locations, certainly in the UK, are situated close to busy motorways or flight paths, and there's no point shooting at a marvellous stately home if under every line of dialogue you can hear the eight-lane motorway half a mile away. We generally have to find the cleanest sound source and, by the nature of the beast, that is often the personal / radio mic.
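
To make the measurement point concrete, here's a minimal sketch - assuming Python with numpy and the third-party pyloudnorm library (a BS.1770-style loudness meter), using entirely made-up stand-in signals - of how a constant noise floor pushes an integrated loudness reading up, forcing the whole mix, dialogue included, back down to hit the spec:

```python
import numpy as np
import pyloudnorm as pyln  # third-party BS.1770-style loudness meter

rate = 48000
t = np.arange(rate * 10) / rate

# Stand-in "dialogue": a modulated mid-band tone at a modest level.
dialogue = 0.05 * np.sin(2 * np.pi * 300 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))

# Stand-in "motorway rumble": broadband noise sitting under every line.
noise = 0.02 * np.random.default_rng(0).standard_normal(len(t))

meter = pyln.Meter(rate)
print("dialogue only:     %.1f LUFS" % meter.integrated_loudness(dialogue))
print("dialogue + rumble: %.1f LUFS" % meter.integrated_loudness(dialogue + noise))
# The noisy version measures louder, so the mix must come down to meet
# the loudness target - and the dialogue sinks with it.
```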

Blame The Broadcasters?

Responding to public pressure, broadcasters and distributors have started implementing loudness regulations, which vary subtly around the world. Why? Because for an increasing percentage of broadcasters and distributors, their "network" is mainly a mechanism to deliver you advertising, with entertainment between the commercials, and we the consumers kept complaining that the commercials were engaged in a loudness war - one that was actually turning us away. So the broadcasters and distributors responded.
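
To make "vary subtly" concrete: Europe's EBU R128 and the US ATSC A/85 recommendation centre on different targets (-23 LUFS and -24 LKFS respectively). Here's a toy compliance check - the targets are the widely published figures, but the tolerances are simplified assumptions, as real delivery specs vary per broadcaster:

```python
# Toy compliance check. Targets are the widely published figures;
# the +/- tolerances are simplified assumptions for illustration.
TARGETS = {
    "EBU R128": (-23.0, 0.5),   # (target LUFS, tolerance in LU)
    "ATSC A/85": (-24.0, 2.0),
}

def compliant(measured_lufs: float, spec: str) -> bool:
    target, tol = TARGETS[spec]
    return abs(measured_lufs - target) <= tol

print(compliant(-25.8, "ATSC A/85"))  # True
print(compliant(-25.8, "EBU R128"))   # False - same mix, different spec
```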

The issue is that in today's world of worldwide franchises, you have to ensure maximum compatibility. Even if you are watching on a pay-per-view or subscription basis, and thus don't have to put up with commercials, the content you are watching may be viewed somewhere else with commercials in between. Broadcasters - or, as they should be known in this age of internet streaming, distributors - cannot afford to make multiple versions to suit every platform and territory. And with multiple versions comes the risk that the wrong one gets shown.

Blame Us?

As consumers we want convenience. We want content "on demand", when we want it, where we want it. When I was young, if I wasn't in front of a TV that could receive BBC1, at 17:05 on a Friday afternoon, I missed "Crackerjack". There wasn't anything I could do about it. I couldn't watch it on my iPad. I'm old enough that I couldn't even set a video recorder to catch it for me. But enough of me being a reminiscing old fart. Convenience has strings attached.

In order to enable you to watch your favourite show "on demand", or even live via a conventional aerial, an awful lot of technical "interference" has to take place, to both the sound and the picture. Let's be clear here: at no point, unless you're watching the master video file with WAV sound, will you ever hear or see it the way the re-recording mixer did. Both the sound and picture get heavily data-compressed using "lossy" algorithms - most commonly H.264 for the video and a variant of AAC for the sound.

A lossy audio codec reduces the data bandwidth needed by literally throwing away things it calculates you can't hear, then reconstructing an approximation on playback, with varying degrees of success. For a start, there is a noticeable effect on frequency response, particularly in the higher frequencies, which have an awful lot to do with clarity.
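
A rough way to hear this for yourself - a sketch assuming Python with numpy/scipy and an ffmpeg binary on the PATH, with "master.wav" as a placeholder file name - is to round-trip a mix through low bit rate AAC and compare the energy left in the top octaves:

```python
import subprocess
import numpy as np
from scipy.io import wavfile

def hf_energy(path, cutoff_hz=12000):
    """Sum of spectral magnitude above cutoff_hz (a crude clarity proxy)."""
    rate, data = wavfile.read(path)
    mono = data if data.ndim == 1 else data.mean(axis=1)
    spectrum = np.abs(np.fft.rfft(mono.astype(np.float64)))
    freqs = np.fft.rfftfreq(len(mono), d=1.0 / rate)
    return spectrum[freqs > cutoff_hz].sum()

# Encode to low bit rate AAC, then decode back to WAV for comparison.
subprocess.run(["ffmpeg", "-y", "-i", "master.wav",
                "-c:a", "aac", "-b:a", "64k", "broadcast.m4a"], check=True)
subprocess.run(["ffmpeg", "-y", "-i", "broadcast.m4a", "decoded.wav"], check=True)

before, after = hf_energy("master.wav"), hf_energy("decoded.wav")
print("high-frequency energy retained: %.0f%%" % (100 * after / before))
```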

Intelligibility is not entirely a sound issue. A few years back I mixed a drama series which, due to its subject matter, had a lot of dark scenes full of blues and reds - traditionally problem colours for TV. There was a scene where the camera entered a room on the opposite side from our main character and slowly moved closer while she recited a piece to camera. When I was working with the master HDCAM SR tape and listening to the lossless sound, all was good - no issues.

A few months later I saw it go out on air, on a standard definition channel, with Dolby Pro Logic sound instead of 5.1. I decoded the Pro Logic back out on my home cinema and watched it on my plasma TV, with my set top box upscaling the picture to 1080 HD. The first thing I noticed was a hell of a lot of burbling coming from the rears: artefacts caused by the lossy audio encoding were predominantly out of phase, so they popped up loudly in the surround speakers. What really caught my eye, though, was what the low bit rate H.264 video encoding had done to the actress's lips. It had turned them into a stationary bar of black pixels, as if we were disguising some profanity. That has an instant negative effect on intelligibility: suddenly I couldn't use my eyes to reinforce the dialogue, and the audio artefacts were distracting as well.
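
The burbling from the rears has a simple explanation: a passive matrix decoder derives its surround feed from the difference between the left and right channels, so anything out of phase between them steers to the rears. Here's a deliberately simplified sketch of that behaviour - not Dolby's actual Pro Logic decoder, which adds active steering, delay and band-limiting:

```python
import numpy as np

rng = np.random.default_rng(1)
in_phase = rng.standard_normal(48000)      # stands in for centred dialogue
out_of_phase = rng.standard_normal(48000)  # stands in for codec artefacts

# Matrix-encoded stereo pair: artefacts carried with opposite polarity.
lt = in_phase + out_of_phase
rt = in_phase - out_of_phase

front = lt + rt       # the difference cancels: artefacts vanish up front
surround = lt - rt    # the sum cancels: artefacts dominate the rears

print("front correlates with dialogue:     %.2f" % np.corrcoef(front, in_phase)[0, 1])
print("surround correlates with artefacts: %.2f" % np.corrcoef(surround, out_of_phase)[0, 1])
```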

So Who Do We Blame?

The answer is simple: nobody. Despite some people remembering the past through a rose-tinted filter - where all actors spoke in perfect "received pronunciation" and there was little use of Foley, sound effects and music - this is not a new syndrome. Intelligibility arguments have been going on since the dawn of recording technology.

We just have to remember one thing: a good mix doesn't suit all. The range of parameters an average production has to take into account these days far exceeds anything that came before. For a start, our attitude as consumers to the media we're watching has changed hugely. Now we get miffed if we actually have to be somewhere at a predefined time to see or hear something. We'd rather watch on a 5-inch screen, albeit in HD, when and where we want.

Intelligibility hasn't changed. We have. What needs to be reset is our attitude to what we're watching. You want to watch "Planet Earth"? Sit down with a cup of tea, turn off your phone and watch it in the best circumstances possible. Pay some respect to the people who have worked incredibly hard to bring you the programme. If you're not going to pay proper attention, I for one would rather you didn't bother.
