Livestream Audio Workflow - In Depth How To

In this article, Steven Demott describes his journey into livestreaming and gives a detailed walkthrough of the setup he’s using for high-quality online streaming of music performances.

Like many audio professionals, I have invested significantly in livestreaming over the past year. Up until that point I had been watching the livestream arena with interest, wondering if we should get the studio into livestreaming events, specifically concerts and other live music performances, but it wasn’t until lockdown happened that I dove in.

How Did This Happen?

I got into livestream audio last March, when I was approached by a client on a Wednesday with the request to have them set up and ready for livestreaming by that weekend. Yes, you read that right. I was asked to get a complete livestream setup happening in a matter of days. No problem, right?

Luckily I wasn’t tackling this alone. The team we assembled was diverse, bringing together expertise from many disciplines to pull this off.

Basic Livestream Flow

In general, to livestream you need to encode your audio and video and send it to a livestream server, which then distributes it for all the world to enjoy. In my case there were some very specific needs. Along with live music, we had live dialogue, multi-camera inputs, lower-third overlays, and pre-recorded audio and video playback. That’s a lot of sources.

OBS Studio

OBS (Open Broadcaster Software) Studio is a very robust piece of software that connects to just about any streaming server to deliver your livestream to the world. It allows you to incorporate several sources, divide sources into “scenes” that can be switched between with a click of the mouse or a user-defined keyboard shortcut, and manage and monitor your connection throughout the stream.
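Those scene switches don’t have to be manual, either. OBS exposes the same controls over its obs-websocket plugin; here’s a minimal Python sketch using the community obsws-python wrapper. The host, port, password, and scene name are all placeholders, and this assumes the websocket server is enabled in OBS’s settings:

```python
# Minimal sketch: switching OBS scenes programmatically over
# obs-websocket, via the community obsws-python wrapper.
# All connection details and scene names below are placeholders.
import obsws_python as obs

# Connect to the websocket server enabled in OBS Studio's settings.
client = obs.ReqClient(host="localhost", port=4455, password="change-me")

# Jump to a pre-built scene, just like a hotkey or mouse click would.
client.set_current_program_scene("Pre-Record Playback")
```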

As robust as OBS Studio is, there are a few downsides. The biggest is that it’s open-source software. That means a few things. First, there is no company to go to when you have a problem. You need to figure things out for yourself, and/or lean on the user community for answers to questions or to solve issues. Luckily, OBS Studio is extremely popular, so there is a large user community and a lot of tutorials available. Second, open-source software is typically pretty geeky. What I mean by that is that configuration can be more difficult, especially if you’re used to software that guides you through setups. The flip side is that you can typically configure every aspect of the software to suit your needs. And third, from an audio standpoint, you will find that OBS Studio’s audio meters are not super useful (more on that in a moment).

Those downsides aside, OBS Studio has been super stable through almost a year of heavy use.

Blackmagic Design ATEM Mini Pro

We chose the Blackmagic Design ATEM Mini Pro to do our camera/input switching. In all honesty, I had little input on this decision: one of our video guys already had this piece of hardware and, time being of the essence, we used it because we had it available and it worked.

The ATEM Mini Pro has four inputs, of which we use three. The first two are live camera feeds. The third is fed from a MacBook Pro running a piece of presentation software called ProPresenter. ProPresenter is how we kick off pre-recorded segments and other full-screen design elements.

We are not passing any audio through the ATEM Mini Pro. As much as I like this switcher, the audio in/out section is very mediocre. Even when asking Blackmagic Design about the input/output specs on the device, I got a lot of “I don’t know” answers about things most of us audio people find extremely important. I was not willing to trust the audio on this thing, especially with the quality I was expected to deliver.

Midas M32R Digital Mixing Console

All the audio comes into the Midas and then goes out to the OBS system. If I am being 100% honest, I have to admit that I don’t really like this board. It’s probably fine for a small PA setup or for stage monitors, but handling a complex mix for livestream has pushed this board to its limits. We have purchased a replacement system that will be far more robust, consisting of an iMac Pro, three Metric Halo ULN-8 3d interfaces, three Avid S1s, and an Avid Dock. The new system should be up and running in the next couple of weeks; we’re still waiting on delivery of some of the items.

Because the audio and video are traveling separate paths, you need to check the final output in OBS, and offset the audio to sync up with the video. You need to do this for each source path. Luckily, both of our live cameras are the same model (Sony FS7iii), so the delay is the same for both of them. Through measuring, I was able to determine that I needed to delay the audio for the camera feeds by 210ms to line up with the video as they hit OBS, and I needed to delay the pre-record content coming from ProPresenter by 183ms.

I determined the audio delay settings by recording the OBS feed directly in OBS with no delays on the audio channel. I recorded myself standing in front of the cameras clapping, then took that video into Pro Tools and measured the offset by selecting the start of the clap’s audio (tabbing to the transient) and shift-dragging until I saw my hands come together in the video. My selection length then gave me the audio offset for our live camera feeds.

At that point, I set the audio offset on the Midas channels (not in OBS) to the value determined in Pro Tools, and recorded myself clapping again. I did this for two reasons: it allowed me to see and hear the final delay setting to make sure it was correct, and I could use that recording to play through our ProPresenter system to find its audio offset using the same method I used for the cameras.
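If you’d rather script that measurement than eyeball it in a DAW, the same arithmetic is easy to sketch in Python. To be clear, this is an illustration, not my Pro Tools workflow; the file name is a placeholder, and it assumes the clap is the loudest transient in the capture:

```python
# Sketch of the clap-test math: find the clap in the recorded audio,
# compare it to the video frame where the hands meet, and report the
# offset in milliseconds. Assumes the clap is the loudest transient.
import numpy as np
import soundfile as sf

def clap_offset_ms(wav_path, video_clap_time_s):
    """Return how far the audio leads the video, in milliseconds."""
    audio, rate = sf.read(wav_path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)              # fold to mono
    clap_sample = int(np.argmax(np.abs(audio)))  # loudest transient
    audio_clap_time_s = clap_sample / rate
    # Positive result: the audio arrives early, so delay it this much.
    return (video_clap_time_s - audio_clap_time_s) * 1000.0

# Example: if the hands meet at 12.48s on the video timeline, something
# like clap_offset_ms("obs_clap_test.wav", 12.48) would land near the
# 210ms I measured for our camera path.
```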

Another aspect of our livestream setup was the need to hear the pre-recorded segments in the room, but not feed live audio back into the room during the live segments. This is because we are all in one very large room; I am not in an isolated area where I can make noise when we’re live. To accomplish this, I set up the M32R so that the main 2-bus feeds only the OBS system, and created a matrix output for the room fed only from the pre-record inputs, all post-fader. All live inputs are excluded from that matrix and never get fed back into the room’s speakers.

I also have each input type feeding its own mix bus, so that I can apply general EQ and compression settings on a per-source-type basis. For instance, all the headset mics feed a mix bus where I have a very general speech EQ setting, leaving the individual input channels free for voice-specific settings.

MOTU M2

We have two MOTU M2s helping us route audio. One of them takes the audio from the ProPresenter MacBook Pro and feeds it into 2 channels on the Midas. The other is connected to the OBS MacBook Pro and takes the Midas 2-bus output to feed our audio mix to the livestream.

Here’s a diagram of our whole system.

LUFS, LKFS, dBTP, WTF?

If you work in Audio Post, you are already comfortable with the idea of loudness standards. For others, streaming audio has made it something you’ve at least heard about, even if it hasn’t been incorporated into your workflows yet. The big deal we all need to be aware of now is how loud our mix “appears” to the human ear from beginning to end. This is LUFS (Loudness Units relative to Full Scale), which is the same as LKFS (Loudness, K-weighted, relative to Full Scale), by the way.

Some very smart people have figured out how to look at the overall loudness of a signal over a period of time and average it the way the human ear would perceive it. There are three main metrics used to convey this: momentary, short term, and integrated.

Momentary loudness is the rough equivalent of a peak/PPM meter for audio. It shows you that exact moment in time, and constantly changes. Short term loudness takes the average over a span of a few seconds (typically 3 seconds), and integrated loudness averages the loudness from the beginning to the end of the program material, whether that’s a 3-minute song or an hour-long livestream. If you want to know more about loudness we have lots of info on the site, and a good place to start is this article on the AES recommendations for loudness standards for streaming.
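To make the integrated measurement concrete, here is a minimal offline sketch using the free, open-source pyloudnorm Python library. This is not one of the meters in my rig, just an illustration of the concept, and the capture file name is a placeholder:

```python
# Minimal sketch: measuring integrated loudness (LUFS) of a capture
# file with pyloudnorm, an open-source ITU-R BS.1770 implementation.
import soundfile as sf
import pyloudnorm as pyln

data, rate = sf.read("stream_capture.wav")    # placeholder file name
meter = pyln.Meter(rate)                      # K-weighted BS.1770 meter
integrated = meter.integrated_loudness(data)  # averaged start to end
print(f"Integrated loudness: {integrated:.1f} LUFS")
```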

The general idea behind this is: if you keep your loudness consistent, the listener/viewer on the other end has a much better experience. That is especially true in a complex broadcast consisting of a full band at some moments, a single acoustic guitar and singer at another moment, and a single speaking voice at other times. Those are very different loudness levels, and you don’t want your audience having to adjust their volume from one segment to the next. That is not a good user experience, and they might not hang around. You want the experience to be seamless and natural, so you have to monitor your integrated loudness while keeping under the limit for peak signals (dBTP).

Orban Loudness Meter

dBTP (decibels True Peak) in this scenario is the absolute measurement of the highest point of your reconstructed sound. The key here is the reconstructed part of that sentence. That means we’re interested in the peak signal level once everything has been encoded by your streaming software, sent to the streaming server, and decoded and played back on the viewer’s end. You see, you need a meter that can figure out what that dBTP number will be on the other end of all this processing. That’s where the problem lies.
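The core of that idea can be sketched in a few lines of Python: oversample the signal so inter-sample peaks become visible, then take the maximum. Real dBTP meters follow the ITU-R BS.1770 true-peak algorithm; scipy’s resample_poly is a rough stand-in here, purely to illustrate the concept:

```python
# Rough sketch of true-peak estimation: upsample 4x so peaks that fall
# between samples show up, then report the maximum in dB. This is an
# illustration, not a compliant ITU-R BS.1770 true-peak meter.
import numpy as np
from scipy.signal import resample_poly

def rough_true_peak_dbtp(audio, oversample=4):
    upsampled = resample_poly(audio, oversample, 1, axis=0)
    peak = np.max(np.abs(upsampled))
    return 20.0 * np.log10(peak)  # dB relative to full scale
```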

As I said earlier, the meters in OBS Studio are not super useful. They seem to show some semblance of PPM, with an option to use more CPU power to show a “true” PPM meter (whatever that means). The problem with a PPM meter for livestreaming is that I really need to see average loudness over time (aka integrated loudness), not so much my peaks. And since we’re streaming to YouTube, we are trying to hit their loudness spec of -14 LUFS, with a maximum peak of -1 dBTP. I actually shoot for -2 dBTP to give myself some wiggle room.
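The math behind those targets is plain subtraction: how much gain gets me to -14 LUFS, and does the resulting peak stay under my -2 dBTP cushion? A quick sketch, with made-up measured values:

```python
# Hitting a loudness target is subtraction: gain to reach the target,
# then a check that the gained-up peak stays under the ceiling.
# The measured values below are made up for illustration.
target_lufs = -14.0
peak_ceiling_dbtp = -2.0       # my cushion under YouTube's -1 dBTP

measured_lufs = -17.3          # from a loudness meter
measured_peak_dbtp = -6.1      # from a true-peak meter

gain_db = target_lufs - measured_lufs           # +3.3 dB of makeup gain
projected_peak = measured_peak_dbtp + gain_db   # lands at -2.8 dBTP
assert projected_peak <= peak_ceiling_dbtp      # still under the cushion
```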

Enter the Orban Loudness Meter. Orban gives this super accurate meter away as a free download. I keep it running on the OBS system reading the output that OBS is sending to the streaming server.

When we move over to our new system I plan on using NUGEN’s MasterCheck on our mix system, since I know & trust it.

What’s Next?

This is far from over. Livestreaming seems like it’s here to stay. I’ve helped my client make a significant investment in new livestream equipment to up our production quality and carry us through the rest of the year and beyond. Some of the changes we have planned revolve around our new hardware setup. We’re also going to follow the lead of some of the bigger livestreamers and move the mixing into a DAW, where we’ll have access to more robust signal processing options.

In addition, we will be utilizing higher quality interfaces with onboard DSP that connect via an AoIP network and allow multiple source connections, simplifying the setup and removing several stress points.

I’m excited about all of this, because I continue to learn and grow as we progress. I will even be taking on an assistant, whom I will train. Passing on knowledge is as important as acquiring it. Those of us with experience must invest in the next generation.

