Production Expert


Personalised HRTFs: The Holy Grail For Modern Mixing

Partner Content

Recently we have heard a lot about personalised HRTFs and why we need them. But do we need them every time we work in binaural? In this article, we explore when it’s best to use a standard, generic HRTF and when it’s best to use a personalised HRTF when converting audio into binaural.

In our article What Are Custom HRTFs And Why They Matter?, Julian Rodgers said this…

“I used to run a music tech course and every year I’d be faced with a room full of new students with more enthusiasm than experience. There are a few audio crowdpleasers which I’ve found to be ideal material with which to grab the attention of people interested in audio who haven’t yet come across them. However, the best for getting an instant response was playing a binaural recording over headphones. Whoops of excitement would show me who was really “getting” it but more interesting to me were the people who were left underwhelmed. Why did some people experience an utterly convincing, immersive experience while some just heard a slightly odd stereo recording?”

What Julian’s students were experiencing, with most people bowled over by the experience but some left unimpressed, was an issue of translation. If your physiology differs too much from that of the dummy head microphone used to capture the binaural recording, the effect will be compromised.

If the crucial summations and cancellations to which your hearing is so finely tuned don’t fall in the right places, you are left with indistinct localisation and the 3D quality doesn’t occur. The filtering your anatomy applies to incoming sound is described by what is called a Head Related Transfer Function, or HRTF for short.


How Do We Hear?

Before we explain what an HRTF is, it will help to understand how we hear. The basic mechanism by which we locate sounds from left to right is well understood. Sounds coming from the right reach the right ear first and are a little louder there, while the head sits between the source and the left ear, shadowing it and making the sound there slightly quieter and duller. However, this doesn’t explain how we can perceive sounds to be coming from behind or above us. For this, we are going to turn to Wikipedia…

“Humans have just two ears, but can locate sounds in three dimensions – in range (distance), in direction above and below (elevation), in front and to the rear, as well as to either side (azimuth). This is possible because the brain, inner ear, and the external ears (pinna) work together to make inferences about location. This ability to localize sound sources may have developed in humans and ancestors as an evolutionary necessity, since the eyes can only see a fraction of the world around a viewer and vision is hampered in darkness, while the ability to localize a sound source works in all directions, to varying accuracy, regardless of the surrounding light.

Humans estimate the location of a source by taking cues derived from one ear (monaural cues), and by comparing cues received at both ears (difference cues or binaural cues). Among the difference cues are time differences of arrival and intensity differences. The monaural cues come from the interaction between the sound source and the human anatomy, in which the original source sound is modified before it enters the ear canal for processing by the auditory system.”
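To get a feel for the “time differences of arrival” mentioned in that quote, a common back-of-envelope estimate is Woodworth’s spherical-head formula, ITD ≈ (a/c)(θ + sin θ). This is a rough sketch only, and the head radius and speed of sound below are assumed textbook values, not measured ones:

```python
import math

def itd_seconds(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Woodworth's spherical-head estimate of interaural time difference.

    azimuth_deg: source angle from straight ahead (0 = front, 90 = side).
    Assumes a rigid spherical head of ~8.75 cm radius, a typical value.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

# A source directly to one side arrives roughly 0.65 ms earlier at the near ear
print(round(itd_seconds(90) * 1000, 2))
```

Sub-millisecond differences like these, together with level differences and the pinna’s spectral filtering, are the cues a binaural renderer has to reproduce.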

What Is An HRTF?

Our ears are unique to us. The precise shape of the outer ear, the pinna, varies from person to person, as do the shapes of our heads and torsos, all of which colour the sound our ears receive, and our hearing systems become very finely tuned to our specific physiology. This filtering is known as a Head Related Transfer Function, and it can be captured and used to process audio in real time, much as convolution reverb uses an impulse response. All these characteristics influence how (or whether) a listener can accurately tell what direction a sound is coming from.
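The convolution-reverb analogy can be sketched directly: convolving a dry mono signal with a left-ear and right-ear head related impulse response (HRIR) pair produces the binaural headphone feed for one source at one fixed position. The HRIRs below are toy placeholders invented for illustration, not real measurements:

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with an HRIR pair, as a binaural renderer
    would for a single static source (real-time engines interpolate and
    crossfade between HRIRs as source positions change)."""
    return fftconvolve(mono, hrir_left), fftconvolve(mono, hrir_right)

# Toy HRIRs for a source on the listener's right:
hrir_r_ear = np.array([1.0, 0.5, 0.1])       # near ear: early and loud
hrir_l_ear = np.array([0.0, 0.0, 0.4, 0.2])  # far ear: delayed, head-shadowed

mono = np.random.default_rng(0).standard_normal(48000)  # 1 s of noise at 48 kHz
left, right = render_binaural(mono, hrir_l_ear, hrir_r_ear)
```

The result is a two-channel signal in which the right ear hears the source earlier and louder than the left, which is exactly the cue set described above.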

Why Do We Need HRTFs?

Binaural audio used to be a niche area, but more recently it has come to the attention of a whole new type of user through its use in gaming and AR applications. Then there is Dolby Atmos. While ‘proper’ Atmos involves listening on a speaker array of 7.1.4 or higher, and much of the film and TV Atmos content out there will be consumed using some extremely clever soundbars, almost all Dolby Atmos for Music content will be heard over headphones as a binaural rendered version. This binaural rendering has developed over the last few years, and it is important to distinguish between Dolby Atmos and Apple’s Spatial Audio. The two are not the same, and there are some binaural issues which are specific to Spatial Audio and not to Atmos.

Why Do We Need Personalised HRTFs?

If we go back to Julian’s students, we can see that one generic HRTF is not going to be satisfactory for everyone; there isn’t a one-size-fits-all solution. Personalised HRTFs are going to be needed if end users are to enjoy a truly immersive audio experience using headphones.

When Should We Use Generic HRTFs Or Personalised HRTFs In Audio Production?

There is no doubt that for the best immersive audio experience, the end-user should be using a personalised HRTF (PHRTF) that takes into account their unique body and ear shape and how these affect the way they perceive and locate sound coming from different directions. That said, getting a PHRTF has not been a cheap exercise, so even for the end-user, generic HRTFs are going to be the order of the day for quite some time.

But for audio production workflows, when should we use a personalised HRTF and when should we use a generic one? To get a broader perspective we asked a number of users as well as Jeffrey Read, CEO of Perfect Surround Inc, the brand behind the Penteo upmix and downmix plugin.

Jeffrey, before we get into the nitty-gritty of using HRTFs, can you give us a brief overview of how Penteo is used? It seems to me that Penteo has gone beyond being just an upmix and downmix plugin and has become what many consider to be a universal format translator.

“You asked me to be brief so here are 6 ways that Penteo is typically used…

  1. Upmix when multi-tracks are unavailable, i.e. Stereo > 5.1 or 7.1.2.

  2. Mandatory format conversions, i.e. 7.1 Surround > 7.1.2 Atmos.

  3. Reuse existing stems, i.e. 5.1 Surround > any Atmos.

  4. Build the sound stage, freeing up more mix creativity time for greater immersive effects by the sound designer.

  5. Remastering/restoration into different formats.

  6. Compose, mix and monitor immersive formats via headphones.

It’s important to note that Penteo enhances and does not compete with existing immersive encoders, decoders and panners. Yes, we can both output Binaural, but both have a place for different functions.”

I was curious as to why tools like Penteo would be used for outputting a binaural render, especially when working with Dolby Atmos, rather than using the Dolby Atmos Renderer, so I asked Jeffrey why mixers would need a binaural output that doesn’t use Dolby’s renderer.

“There are 4 reasons why it is beneficial to have Binaural output from an upmixer versus (for example) the Dolby Atmos Renderer…

  1. Preview before rendering: When building the Atmos stage (lower ring), preview the mix in binaural to ensure high-quality feedback before loading everything into the renderer.

  2. For gaming: It can be beneficial to load binaural objects into the renderer rather than binaurally rendering from the renderer.

  3. Remote monitoring: Since the pandemic, directors, producers and other stakeholders have not always been able to sit in a studio. Penteo’s binaural output gives stakeholders a full preview experience via headphones wherever they are. As long as they have headphones, they can have an immersive experience.

  4. Composing directly in binaural: Composers often need to compose and mix in settings where full immersive studio setups are not available. Penteo allows work in any immersive format to be composed and monitored binaurally in any location, without the need for specialised monitoring.”

Jeffrey, you have been referring to Penteo in some of these examples. In your view, why should people consider using Penteo in their binaural workflows, using your generic HRTF?

“Penteo is the only upmixer that includes binaural output and Penteo is also the only upmixer that allows binaural to be outputted from any of the 31 different output formats that Penteo supports (all Surround, all Atmos, Auro3D, DTSX and Ambisonics formats up to 16 channels).”

3 Users Share Their Experience Of Binaural And HRTFs

That is the view of one plugin manufacturer, but we wanted to hear from some users as to why they use generic and personalised HRTFs in the production process, so we asked 3 users to share their thoughts and experiences…

Andrew Halasz

Andrew Halasz is a professor in the Cinema Arts Department at Point Park University in Pittsburgh, Pennsylvania. He coordinates the sound design concentration in the program. Professionally he worked as a music recording engineer and then post-production sound person for a decade or so prior to moving into teaching. He has spent the last five years doing creative work in virtual reality, having produced a VR documentary about music and the Holocaust entitled ‘By the Waters of Babylon’. This was originally formatted for a VR headset and recently was re-formatted for a planetarium dome exhibition.

His work in VR led him to Ambisonic recording and production, which in turn began a deep dive into immersive sound. As Atmos developed and became more available, he began working in this format as well, across creative work, professional development and curriculum creation for teaching. Monitoring binaurally became essential in these cases because VR is essentially headphone-based, head-tracked, non-static binaural listening, and because his university has not yet upgraded to an Atmos set-up, all his Atmos work is monitored virtually via binaural rendering.

Jesper Eriksson

Jesper Eriksson has been in the audio business in Scandinavia since the late 80s, starting in the analogue domain learning the tricks and tools of the trade. In the early 90s, he had the wonderful experience of converting to a file-based digital workflow with Digidesign’s Pro Tools. He has worked with every incarnation since and today works in Pro Tools Ultimate 2022.6 with full Dolby Atmos capability.

Over the years he has been involved in all levels of audio production: corporate, radio, film, TV and music recording and mixing; game sound, with both voice and sound FX creation; and DVD and Blu-ray sound remastering, restoration and encoding for both Dolby and DTS, on every audio platform.

Today most of his time is taken up with sound design and mixing for TV, Film and music as well as consulting work across the wide audio business.

Paul Hill

Paul Hill is a filmmaker and editor first, and since many of us need to wear multiple hats, Paul’s second hat is sound mixing. He works at the Wexner Center for the Arts at Ohio State University, which runs an artist residency hosting filmmakers and video artists and assisting them with their post-production needs, mostly editing but often finishing too. These projects range anywhere from documentaries to gallery installations to narrative features and shorts.

Now that you know a little more about these 3 users, onto the questions, and their answers…

How does binaural fit into your workflow?

AH: Binaural monitoring is used extensively in what I do.

Sound for VR - By nature, with virtual reality, you are listening to a binaural rendering. This work brought me into contact with many VR sound software solutions all employing binaural rendering. And of course, the final product will be experienced by the user as a head-tracked binaural mix.

Field recording - I do a great deal of field sound effects recording using a Core Sound 2nd order Ambisonic OctoMic. When editing/mastering these recordings, monitoring the encoded recording requires listening through a binaural decoder. I often decode these to other surround or immersive formats - 5.0, 7.0, Atmos - for use in conventional film sound design/mixing. After decoding the Ambisonic recording to the surround/immersive format, I will monitor a binaural virtualisation of that particular speaker set-up. Currently, the only physical set-up I have access to is the 5.1 mix room at my university (soon to be 7.1 at least, but hopefully Atmos), so a binaural preview is essential.

Teaching Immersive Sound - Binaural monitoring has been invaluable to teaching immersive sound. Because our classroom/labs do not have surround monitoring, when teaching the principles of immersive sound, I'm able to binaurally render the spatial panning/placement that I'm doing, or the surround/Atmos clip I'm demoing, with a tool such as Penteo 16 Pro+. I then send this to the students' computers via a browser-based streaming solution like Audiomovers Listento or Source Connect. The students will hear the spatial placement and panning of elements in their headphones. This is so much more effective than simply explaining the effects of immersive sound. Additionally, until my university gets an Atmos monitoring set-up, all Dolby Atmos learning/practice is done by monitoring virtually with Dolby's binaural renderer. In addition, students can switch between the binaural headphone monitoring and a live re-render to the room's 5.1 speakers.

JE: I use it for monitoring and checking mixes, so it is useful to have the personalised versions to double-check between speaker and headphone listening. With so many binaural formats in circulation, I see a future for personalised HRTFs.

Beyond monitoring, I use the Sennheiser AMBEO mic to record sound FX and backgrounds. I choose to keep the recording in the original format so I can jump between formats. This saves space and streamlines my workflow by being able to go from quad to any higher format all the way to Dolby Atmos.

Have you started to use personalised HRTFs?

AH: So far the extent of my experience with a personalised HRTF has been using Dolby's beta application to create a personalised HRTF for the Dolby Atmos Production Suite. I've looked into having a SOFA (Spatially Oriented Format for Acoustics) file of my personal HRTF created for some of the binaural renderers that support SOFA imports. I am looking into whether I can secure some research funds for this in the coming academic year.
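As an aside, a SOFA file stores a set of measured impulse responses together with the direction each was measured from, and a binaural renderer picks (or interpolates between) the measurements nearest the direction where it wants to place a sound. A minimal sketch of that nearest-direction lookup, using synthetic positions standing in for a real SOFA file’s SourcePosition data:

```python
import numpy as np

def nearest_hrir_index(source_positions_deg, azimuth, elevation):
    """Return the index of the measured direction closest to the target.

    source_positions_deg: (M, 2) array of [azimuth, elevation] in degrees,
    in the spirit of a SOFA file's SourcePosition variable (radius omitted
    here for simplicity).
    """
    az = np.radians(source_positions_deg[:, 0])
    el = np.radians(source_positions_deg[:, 1])
    t_az, t_el = np.radians(azimuth), np.radians(elevation)
    # Cosine of the great-circle angle between each measurement and target;
    # the largest cosine corresponds to the smallest angular distance.
    cos_angle = (np.sin(el) * np.sin(t_el)
                 + np.cos(el) * np.cos(t_el) * np.cos(az - t_az))
    return int(np.argmax(cos_angle))
```

A real renderer would then convolve the source with the impulse-response pair stored at that index; production implementations also interpolate between neighbouring measurements to avoid audible jumps as sources move.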

JE: I have been involved with the beta testing of the Dolby PHRTF Creator iOS app and I am finding that it is working very smoothly.

PH: I have not started using personalized HRTFs yet but hope to in the near future. That said, I don’t think it’s necessary for every application. I think binaural and 3D audio is truly amazing given that you only need to wear your headphones, which we all have. 3D imagery, by contrast, requires us to wear special glasses, which we don’t all have, making it more gimmicky in general and harder to catch on. Now we can listen to our favourite music in binaural with spatial audio because it’s easily accessible on streaming music platforms. I’ve also been exploring podcasts in binaural, but so far in my experience, they’re not very good.

If so, in what circumstances do you use personalised HRTFs and when do you use generic HRTFs?

AH: At this point, I only use the personalised HRTF in the Dolby Atmos Production Suite, but interestingly I haven't yet detected a significant difference between the generic one Dolby provides and my personal one. Regarding personalised vs. generic HRTFs, I think it was Julian who shared this apt analogy on a podcast… “A personalised HRTF is the equivalent of mixing in a calibrated room - as opposed to a generic HRTF where the room/speakers are not necessarily calibrated.” So, as with conventional speaker mixing, I would use the personalised HRTF for intensive mixing - really the bulk of the process - to work in a more “calibrated” monitoring situation. But just as one would run out to their car with a cassette to preview a studio mix, the generic HRTF is a great reference, allowing you to hear through an HRTF that the majority of the public might listen through.

Of course, everyone's anatomy is different. The listening experience with the generic HRTF will still be different for all, but it is getting closer. Technology may eventually bring about headphones that instantly make the measurement to create a personalised HRTF on-the-fly, but we are not there yet.

JE: I have my personalised HRTF in my Dolby RMU so I can flip between my personalised and Dolby’s generic HRTFs.

PH: I’m fairly new to binaural and Ambisonics in general. I probably use binaural more in a personal capacity at the moment. My first exposure to it was when I bought a pair of Sennheiser Ambeo binaural headphones/microphones that plug into an iPhone when they went on sale from $300 to $60. I was amazed at the quality and found myself recording everything with these. From there I learned that it’s possible to make binaural audio files.

When using generic HRTFs, which tools do you use to convert audio into binaural?

AH: Generally I use Penteo 16 Pro+ to monitor any of the immersive material I'm working on. With my VR work, I've used many solutions to ultimately provide a binaural rendering including Audio Ease, Blue Ripple and Noise Makers. When I work in Reaper, there are even more tools such as SPARTA, COMPASS, IEM (most free because they are the result of academic research) that provide binaural renderings but are only available as VST plugins.

Ultimately, I gravitated towards Penteo 16 Pro+ because it is a bit of an immersive sound Swiss Army knife. It allows decoding from Ambisonics to binaural or any immersive format I might want to export to. When decoding from Ambisonics to Atmos, for example, I will then bus to another instance of Penteo 16 Pro+ to monitor the Atmos decoding binaurally. Penteo also allows decoding from 2nd order Ambisonics to 7.1.4 or 7.1.6, which is great for ambiences. In this case, the height channels would live as objects and not in the 7.1.2 bed.

Penteo being AAX is great as it allows me to stay in Pro Tools. Because the majority of immersive tools that I have used are VSTs, they require me to work in a DAW that can host VSTs such as Reaper.

JE: I use my PHRTF in the Dolby renderer! When it comes to plugins, my workhorse is the Penteo 16 Pro+ with its own specific generic HRTF and it is in my template. I find that it is so easy to use and does a great job with all its possibilities across all formats, both in and out.

PH: Recently I did the sound mix for a 7-channel audio piece that was synced to a 3-channel video installation. However, the artist could not be present for the mix, so I needed a way for him to hear the mix as if he were standing in the gallery space surrounded by the 7 speakers, and binaural was the best way to simulate this. I have been using DearVR Pro to make 3D soundscapes and then making binaural exports using a generic HRTF.

Is there anything else you would like to add regarding the use of HRTFs or working in binaural?

AH: Being able to approximately virtualise a listening environment/speaker set-up is quite frankly awesome.

There are stories of mixing through the pandemic, where creative team members and producers were able to preview work by listening to binaural renderings of mixes. One studio, I believe, created a BRIR (binaural room impulse response) of one of their dub stages for binaural mix previews. I mixed a 5.1 film primarily through a binaural rendering during this time, only doing a final pass in a 5.1 room.

Not having room for a home theatre in my house, I'm able to enjoy fairly close approximations of spatial mixes with Apple AirPods Max, and this is also generally how I consume movies and streaming entertainment.

I think technology will continue to advance the possibilities of binaural listening - eventually allowing personalised HRTFs to be created by the devices we are listening on, making the experience more accurate to the creator's intentions.

JE: By keeping all my sound effects recordings in the native AMBEO 4-channel format, I can go up to any higher format using the Penteo 16 Pro+ plugin.

Production Workflows Need Generic HRTFs

Having heard from a manufacturer and a number of users, it is clear to me that if I were to use my personalised HRTF in the audio production process, say to output a listening copy of an immersive audio mix, I would be imposing my body and ears on my client. That would be no better, and more likely worse, than using a carefully designed generic HRTF, which is likely to be a much better average than one individual’s listening mechanism.

As Jeffrey Read from Perfect Surround Inc said earlier in the article, there are 4 reasons why it is beneficial to have binaural output from an upmixer versus (for example) the Dolby Atmos Renderer…

  1. Preview before Rendering

  2. For Gaming

  3. Remote monitoring

  4. Composing directly in Binaural

But even though the Dolby Atmos Renderer can now support personalised HRTFs, we should think carefully about when to use a PHRTF. As a mixer, if we need to work outside a space equipped with Dolby Atmos monitoring, switching to our own PHRTF should provide a better immersive experience on headphones. However, if we are delivering a listening mix from the renderer for a client, then unless we have the client’s PHRTF, we are much better off using Dolby’s generic HRTF. And if we haven’t hit the renderer yet, a tool like Penteo 16 Pro+ is great for outputting a binaural version using its generic HRTF.

The End User Should Benefit From Personalised HRTFs

There is no doubt that the immersive experience for the end-user would be much better if everyone could tap into a personalised HRTF rather than having to depend on a generic one-size-fits-all solution.

To that end, we are seeing growth in PHRTF solutions, with Genelec’s Aural ID among the early offerings, and more recently Dolby entered the game with a beta version of their Dolby Atmos Personalised Rendering, though this only supports the Dolby Renderer software. The price points of these options mean they will only be available to wealthy enthusiasts or professionals. But what about a solution for real end-users?

During the WWDC 2022 keynote address, Apple’s senior vice president of software engineering, Craig Federighi, briefly mentioned that in iOS 16, users will be able to use the iPhone's TrueDepth camera to scan their ears to create a personalised spatial audio profile. Although not specifically spelt out in the keynote, this is likely to be implemented as an HRTF.

Apple says that “the tuned spatial audio feature brings an even more precise listening experience”.

Personalised Spatial Audio will be available for AirPods 3, AirPods Pro, and AirPods Max users running iOS 16. It requires an iPhone with a TrueDepth camera to scan your face and each one of your ears. Interestingly, no iPad has this feature, even the ones with the TrueDepth camera.

Once you finish setting up Personalised Spatial Audio on your iPhone with AirPods 3, AirPods Pro or AirPods Max, apparently all other AirPods you own will be able to use this function. 9to5Mac has tried this out with a developer version of iOS 16 and this is what they found…

“With this new feature available, it was very noticeable how having a 3D mapping of your face helps Dolby Atmos with Spatial Audio technology. Although I already enjoyed this function, some songs and albums sounded echoed a bit, as if the lead singer was singing lower than the rest of the band. Now, with Personalized Spatial Audio, everything finally sounds in place. For those who have the original HomePod, the listening experience sounds similar, as if the song was finally in its full capacities.

While AirPods Max was the only AirPods that didn’t struggle with the “echoed” songs, AirPods 3 and AirPods Pro users, on the other hand, will benefit the most with Personalized Spatial Audio.”

If you are interested in trying this out, Personalized Spatial Audio is included in the public beta of iOS 16, out now, while the release for all users is expected later this autumn, when Apple ships iOS 16.

But it needs to be said that this implementation of a personalised HRTF is somewhat muddied by Apple’s use of their Spatial Audio version of binaural, and only time will tell whether it is better for end-users or causes more problems than it solves.

Conclusion

There you have it. HRTFs are here to stay. We need them to render binaural versions of immersive audio like Dolby Atmos. Both generic and personalised HRTFs have their place. We need generic HRTFs for the production process and also for providing clients with listening mixes for now. Maybe we will get to a point where we have clients’ individual PHRTFs on file so we can render out a personalised binaural version of a mix, but for now, we will need to use the generic HRTFs built into the tools we use like Penteo 16 Pro+ and the Dolby Atmos Production Suite.

For the end-user, PHRTFs will be the best option, but until they are generally available, here too generic HRTFs will still be the order of the day.
