The 5 Types Of Audio Defined In An ADM File

June 3, 2024 Julian Rodgers

In this article Julian explains the 5 types of audio defined in the ADM. While the ADM is very technical, the differences between these five types are interesting.

We frequently talk about ‘metadata’. Most of us understand that it means ‘data about data’, or to put it another way ‘information about information’. A common example is the metadata found in digital images, which can cover things like the geotagging which shows where the image was taken, the time and date, and also things like the camera used and its aperture settings. There’s a lot of data about the data.

I think it would be great if audio files contained information about what mic was used, but that’s not something I’d anticipate. However, the Audio Definition Model, which is perhaps best known in this blog as the ADM we use as a deliverable for Dolby Atmos mixes, contains a great deal of metadata. Most commonly we associate this with the panning metadata which allows Objects to be correctly rendered, but there is much more to object based audio, to metadata and to the Audio Definition Model than Atmos pan data.

An interesting example is one I found in a document from the EBU website which I used in the creation of my article on the ADM. It described the five ‘types’ of audio contained in the ADM. I wasn’t at all sure what a ‘type’ of audio could be, but the descriptions illustrated distinctions I was aware of but had never categorised clearly in my head. Here they are. I hope you find them as interesting as I did.

Channel-Based

We’re all used to this, and to the different widths channel based audio comes in. Audio of a particular width (mono, stereo, 5.1, 9.1.6 etc.) can be reproduced directly by a playback system with the same number of channels. If the number of channels differs, the audio can be processed to fit, for example by downmixing 5.1 to stereo.

Stereo is two channels, but which is which?

Each channel needs to be suitably labelled to make sure that, for example, the centre channel reaches the centre speaker rather than the rear left surround of a 5.1 playback system.
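To make the labelling point concrete, here is a minimal sketch of a 5.1 to stereo downmix working on one labelled frame of samples. The function name and the dictionary layout are my own for illustration, and the coefficient of about -3 dB for the centre and surrounds is one commonly used choice rather than the only one. Dropping the LFE is also an assumption.

```python
import math

def downmix_51_to_stereo(frame):
    """Fold one labelled 5.1 sample frame down to a (left, right) pair."""
    g = 1 / math.sqrt(2)  # about -3 dB; a common choice, not a fixed rule
    # The labels, not the channel order, decide where each signal goes.
    left = frame["L"] + g * frame["C"] + g * frame["Ls"]
    right = frame["R"] + g * frame["C"] + g * frame["Rs"]
    # The LFE channel is left out here (an assumption of this sketch).
    return left, right

# A frame with signal only in the centre channel
frame = {"L": 0.0, "R": 0.0, "C": 0.5, "LFE": 0.0, "Ls": 0.0, "Rs": 0.0}
print(downmix_51_to_stereo(frame))
```

Because the centre is shared equally between left and right, a centre-only signal ends up as a phantom centre in the stereo result. If the labels were wrong, that same signal could just as easily end up in one side only.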

Scene-Based

An example of scene based audio is Ambisonics. Each channel, instead of representing a speaker in the playback system, represents a defined aspect of the soundfield and is independent of the playback system used to reconstruct a representation of the soundfield.

The soundfield represents a complete picture of the audio reaching the point in space at which the audio is being captured. It is a spherical representation of the audio with all the directional information specific to the location at which the capture took place.

The number of channels captured dictates the spatial resolution of the sound. More channels bring more positional acuity. These are referred to as ‘Orders’ of Ambisonics. 1st order contains 4 channels, for 2nd order there are 5 additional components on top of the 1st order ones for a total of 9, and for 3rd order a further 7 components, for a total of 16 component channels.
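The channel counts above follow a simple pattern for full-sphere Ambisonics: order N needs (N + 1) squared channels, and each new order adds 2N + 1 components. A small sketch, with function names of my own choosing:

```python
def ambisonic_channels(order):
    """Total component channels for full-sphere Ambisonics of a given order."""
    return (order + 1) ** 2

def added_components(order):
    """Components that this order adds on top of the previous one."""
    return 2 * order + 1

for n in (1, 2, 3):
    print(n, added_components(n), ambisonic_channels(n))
```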

by Dr Franz Zotter zotter@iem.at CC BY-SA 3.0 or GFDL, via Wikimedia Commons

These Ambisonic signals are converted to channel based audio suitable for replay over a playback system of a given layout using a set of decoding equations. Ambisonics is at least superficially familiar to most audio professionals, but did you know it was an example of ‘scene’ based audio?

Object-Based

Objects in the Dolby Atmos Renderer

It was only when I was getting to grips with Dolby Atmos that I first came across the idea of Objects, and the metadata which accompanies them. It was the difference between Objects and Beds, and the difference between Dolby Atmos and conventional channel based surround like 5.1, which introduced me to the concept of an audio Object. The Atmos Object might be the best known but there are other types of Object. The word is used in two senses. The first is audio with accompanying positional metadata, as well as size and distance. But there are other ways Objects are used. For example, alternative language versions, or the crowd noise at a sporting event, might be presented as Objects in a broadcast, so that they can be switched between or have their level manipulated.
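As a rough illustration of both senses, an Object can be thought of as audio bundled with a few metadata fields. The sketch below is not the ADM’s own structure or Dolby’s. The class and field names are hypothetical, chosen only to mirror the position, size, distance and level ideas described above.

```python
from dataclasses import dataclass

@dataclass
class AudioObject:
    """Hypothetical object: audio samples plus descriptive metadata."""
    name: str
    samples: list            # mono audio samples
    azimuth: float = 0.0     # degrees (illustrative field names)
    elevation: float = 0.0   # degrees
    distance: float = 1.0
    size: float = 0.0
    gain_db: float = 0.0     # level the listener or broadcaster can adjust

def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear multiplier."""
    return 10 ** (gain_db / 20)

def apply_gain(obj):
    """Return the object's samples scaled by its gain metadata."""
    g = db_to_linear(obj.gain_db)
    return [s * g for s in obj.samples]

# Crowd noise at a sporting event, turned down by 20 dB
crowd = AudioObject("crowd", [0.5, -0.5], gain_db=-20.0)
print(apply_gain(crowd))
```

The audio itself is never changed. Only the metadata is, and the result is worked out at playback.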

Mid-Side Recording

Matrix-Based

You might have heard of a matrix being used to generate audio channels in a different format. One example you’ll be familiar with is Mid-Side. This has to be reconstructed to left/right stereo to be listened to correctly, which is achieved using simple maths in a decoding matrix. This is useful not only for clever mastering techniques or mic arrays but also in broadcast, for example FM radio broadcasts. Another example of matrixed audio is the downmixing of 5.1 audio to stereo using Lt/Rt.
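The ‘simple maths’ for Mid-Side is a sum and a difference. Here is a minimal sketch on single sample values. The function names are mine, and the scaling by a half on encode is one common convention, with others in use.

```python
def ms_encode(left, right):
    """Left/right to mid/side, using the half-sum and half-difference convention."""
    mid = (left + right) / 2
    side = (left - right) / 2
    return mid, side

def ms_decode(mid, side):
    """Mid/side back to left/right: the decoding matrix is a sum and a difference."""
    left = mid + side
    right = mid - side
    return left, right

mid, side = ms_encode(0.75, 0.25)
print(mid, side)             # mid and side signals
print(ms_decode(mid, side))  # back to the original left and right
```

Played back without decoding, the mid and side signals would simply come out of the left and right speakers, which is why the audio needs to be identified as matrixed.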

Binaural-Based

Neumann KU100 Binaural Mic

Binaural audio is far more familiar and mainstream than it used to be. What was once a niche technique for enthusiasts has become the de facto standard way for consumers to access Dolby Atmos and Spatial Audio music. The weakness of the format in years gone by was that it could only be properly experienced over headphones, making it of very limited use as a consumer format. That is no longer the issue it once was, because metadata can identify binaural based audio as only suitable for headphones.

How Is This Useful?

The ability of a playback system to access information about what kind of content is contained in the audio streams it has available and to use it appropriately is exactly the point of Audio Definition as found in the Audio Definition Model. From directing the correct channel’s content to the correct loudspeaker through to being able to create a convincing immersive sound field from a bunch of audio tracks and some metadata telling the renderer how to address however many speakers happen to be on that playback system, none of this would be possible without information about the information.
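To show how a playback system might act on this information, here is a minimal sketch which picks a handling strategy from the declared type. The type names used as values are the ADM typeDefinition names as I understand them from the ITU-R BS.2076 specification, so check them against the standard. The function, the enum and the returned descriptions are hypothetical.

```python
from enum import Enum

class AudioType(Enum):
    # Values are assumed ADM typeDefinition names (verify against the spec)
    CHANNEL = "DirectSpeakers"
    MATRIX = "Matrix"
    OBJECT = "Objects"
    SCENE = "HOA"
    BINAURAL = "Binaural"

def plan_playback(type_definition, on_headphones):
    """Describe what a hypothetical player would do with each type of audio."""
    kind = AudioType(type_definition)
    if kind is AudioType.BINAURAL:
        if on_headphones:
            return "play directly"
        return "not suitable for loudspeakers as-is"
    if kind is AudioType.CHANNEL:
        return "route each labelled channel to its speaker, or downmix"
    if kind is AudioType.SCENE:
        return "decode the soundfield for this speaker layout"
    if kind is AudioType.MATRIX:
        return "apply the decoding matrix"
    return "render using the object metadata"

print(plan_playback("HOA", on_headphones=False))
print(plan_playback("Binaural", on_headphones=True))
```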

I’d still like it if I could right click on an audio file and find out what mics had been used though…
