Voice processing for Podcasts and Voice-Overs

(Last Updated On: April 9, 2020)


Podcasts and voice-overs are growing fast in popularity. Online platforms have grown significantly in the past couple of years, with podcasts on all sorts of topics. Books are widely available as audiobooks these days, and a large portion of YouTube videos feature narration or voice-over.

We are mobile more than ever before; we spend our days in cars, trains and planes, where listening to books or podcasts makes a lot of sense. And podcast and video platforms have democratised media production: yes, you can start your own today with little investment if you have some good ideas.

This demand translates into a growing need to process voice in a professional manner, and that is the topic we cover here.

Where it all starts

A good microphone and an audio interface for your DAW are the starting point of a decent voice capture. They are not the main subject of this article, as we want to present an approach for processing the voice once it’s recorded. But the initial investment certainly needs to go into a decent microphone, a professional audio interface, and a reasonably noise-free recording environment.

A Rode Procaster and a Universal Audio Apollo are an integral part of our setup. In this first step, background noise can become an issue, so be sure to check our article on the subject.

The key to success here is not only having the right equipment, but also experimenting with it until you find a setup that works for your voice or the voice you are capturing. That probably deserves an article of its own. For now we will assume you’ve gone through this step and know how to capture your voice in a way that sounds decent to you.

The elements in the chain

So now you have your podcasts and voice-overs recorded, you put them side by side with a YouTube video from your favourite influencer and bam! Nowhere near as loud and clear… what is going on? Well, their voice-over, podcast or video has been mastered and processed for clarity and loudness. Let’s have a look at the typical elements we’ll use in the chain to improve your recordings:

  • Noise gates to get rid of undesired noise between sentences. Tricky animals, as too much gating can sound unnatural or lead to undesired side effects. We normally attempt background noise reduction first and leave noise gates as a last resort.
  • Equalisers to make the voice stand out. Different voices occupy slightly different parts of the spectrum. Quite often, the frequency response of the microphone being used will colour the voice capture in a way that doesn’t sound entirely natural, or doesn’t stand out. This is particularly important if you want to have background music as well.
  • De-essing and plosive removal. Quite extensive subjects in their own right, but essentially de-essing reduces or eliminates the excessive prominence of sibilant consonants. The pronunciation of “s”, “z”, “ch”, “j” and “sh” in English is the main cause, but it occurs in other languages as well. Plosives, the annoying little pops caused by the sudden release of air when we speak, can be tamed with a proper screen on your microphone; specific plugins will take care of whatever is left.
  • Voice levelling. You can in theory adjust gain manually across the audio track, but many modern plugins do this automatically against a target setting, adjusting gain dynamically and achieving results faster.
  • Compression or limiting. Yes, there will be loudness inconsistencies in the voice capture even after the leveller does its job, and compression ensures consistency across the length of the track. This is subjective; we, for example, prefer gentle compression that respects the original dynamics of the voice.

The order in which the above elements are used, and how heavily, varies from job to job.
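To make the gating idea concrete, here is a minimal noise-gate sketch in Python. It is an illustration only, not any plugin’s actual algorithm: samples below a threshold are attenuated, and a hold time keeps the gate open briefly so it doesn’t chatter mid-sentence. The function name and parameter values are our own, and real gates add smooth attack and release ramps, which we skip here.

```python
def noise_gate(samples, threshold=0.02, hold=4410, attenuation=0.0):
    """Gate mono float samples in [-1, 1]; hold is measured in samples."""
    out = []
    countdown = 0  # samples remaining before the gate is allowed to close
    for s in samples:
        if abs(s) >= threshold:
            countdown = hold             # signal present: (re)open the gate
        if countdown > 0:
            out.append(s)                # gate open: pass audio through
            countdown -= 1
        else:
            out.append(s * attenuation)  # gate closed: attenuate the noise
    return out

# Quiet hiss between sentences is silenced; the louder speech passes through.
hiss = [0.005, -0.004, 0.006]
speech = [0.3, -0.25, 0.28]
gated = noise_gate(hiss + speech, hold=2)
```

With `attenuation=0.0` the gate mutes completely; a small value such as `0.1` is often more natural, as it leaves a hint of room tone instead of hard silence.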

The goals

The goal is to achieve consistency of loudness, spectrum footprint, dynamics and intelligibility across the board. The loudness goals we need to meet will depend on the target platform.

For example, -14 LUFS (Loudness Units relative to Full Scale) is YouTube’s recommendation at the time of writing, while Apple Podcasts recommends -16 LUFS for podcast delivery; other platforms have similar targets.
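Hitting a platform target is then simple arithmetic: the gain needed is the target minus the measured integrated loudness, applied as a linear factor. This sketch assumes you have already measured the loudness with a LUFS meter; the function name is our own.

```python
def match_loudness_gain(measured_lufs, target_lufs):
    """Return (gain in dB, linear amplitude factor) to reach the target."""
    gain_db = target_lufs - measured_lufs
    return gain_db, 10 ** (gain_db / 20.0)

# A mix measuring -20 LUFS needs +6 dB (roughly 2x amplitude)
# to reach YouTube's -14 LUFS recommendation.
gain_db, factor = match_loudness_gain(-20.0, -14.0)
```

Note that boosting by that factor can push peaks over full scale, which is one reason a limiter usually sits at the end of the chain.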

We aim for common ground, but in some cases we might end up with different deliverables for different platforms, which is also fine.

While intelligibility can be measured objectively, it pays to listen on different speakers and headphones, and to ask people who are not familiar with the material to have a listen.

The spectrum footprint and dynamics can be checked with spectrum analysers and RMS and peak meters.

Last but not least, making sure we’re away from the clipping ceiling is also quite essential, as no one likes digital distortion.
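The peak and RMS checks above can be sketched in a few lines of Python. This is a simplified illustration with our own function names: it assumes mono float samples in [-1.0, 1.0], and real metering adds windowed measurement and true-peak oversampling, which we leave out.

```python
import math

def peak_dbfs(samples):
    """Highest sample level in dBFS (decibels relative to full scale)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def rms_dbfs(samples):
    """Average (RMS) level in dBFS, a rough proxy for perceived loudness."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

def clips(samples, ceiling=1.0):
    """True if any sample touches the ceiling, i.e. digital clipping."""
    return any(abs(s) >= ceiling for s in samples)

tone = [0.5, -0.5, 0.5, -0.5]  # constant amplitude at half scale
peak = peak_dbfs(tone)          # about -6 dBFS
safe = not clips(tone)          # comfortably below the ceiling
```

In practice a ceiling slightly below 1.0 (say -1 dBFS) leaves headroom for lossy encoding, which can push peaks up after delivery.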

The goal is having transparent, clear sounding podcasts and voice-overs, not winning loudness wars.

Tools of the trade

This process is done ‘in the box’ using plugins in the DAW itself. Many vendors have amazing plugins for the job, but it takes experience to know when and how to use them. Some of our favourite tools come from Waves, Accusonus, Universal Audio, Zynaptiq and Gullfoss.

Processing Voice with ERA and UA plugins
Using Gullfoss as a de-esser

Meters can be native to the DAW – Logic Pro X has a few good ones – but Waves also has interesting meters that we can use. With the meters we can ensure from an objective standpoint that we are not exceeding the recommendations for loudness, level and clipping.

Waves Meter

Some Examples

We’ve collected some examples of before and after processing here, using our own recording of the Harvard Sentences.

The processes we utilise are also present in our own podcast, Where Music Meets Technology.

In conclusion

In a world of growing demand for podcasts and voice-overs, the right tools and the right amount of experience are required to achieve results which will stand out yet sound natural.

Many variables will affect the quality of the output:

  • The person speaking,
  • Time of the day or season of the year,
  • The familiarity with the material,
  • The microphone and its positioning,
  • The room or ambience where the recording is captured,
  • The Audio Interface and the DAW,
  • The processing used after the recording.

Building experience with this takes patience and time. Editing and processing podcasts and voice-overs is one of the services we offer, so if you want to know more and get a quote, please reach out via our email.

If you want to exchange some ideas so you can try to do it on your own, it’s also fine. We would love to hear from you.

Did you like this article? Sign up to our newsletter to get awesome articles like this regularly.
