Auphonic Audio Examples

This page contains a short description and audio examples of our algorithms. Everything is done automatically – you can process the same unprocessed files with Auphonic yourself and will get the same results.

The following algorithms are discussed: the Adaptive Leveler, the Multitrack Algorithms, Global Loudness Normalization and the True Peak Limiter, Noise and Hiss Reduction, Hum Reduction, and Automatic Speech Recognition.

Each audio example is divided into multiple segments and is annotated with details about the algorithms (written above the waveform). Click inside the waveform to seek through the audio and zoom; alternatively, use the fullscreen button below the player to see more, and click on "Show Input" to compare with the input files.
We recommend listening with headphones so you can hear all the details!

Adaptive Leveler

The Auphonic Adaptive Leveler corrects level differences between speakers and between music and speech, and applies dynamic range compression to achieve a balanced overall loudness. In contrast to our Global Loudness Normalization Algorithms, which correct loudness differences between files, the Adaptive Leveler corrects loudness differences between segments within one file.
The algorithm was trained with over three years of audio files from our web service and keeps learning and adapting to new data every day!

We analyze an audio signal to classify speech, music and background segments and process them individually by:
  • Amplifying quiet speakers in speech segments to achieve equal levels between speakers.
  • Carefully processing music segments so that the overall loudness will be comparable to speech, but without changing the natural dynamics of music as much as in speech segments.
  • Classifying unwanted segments (noise, wind, breathing, silence etc.) and excluding them from amplification.
  • Automatically applying compressors and limiters to get a balanced final mix (see also Loudness Normalization and Compression).
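The segment-wise processing above can be sketched in a few lines. This is a toy illustration, not Auphonic's trained algorithm: the segment classes are given as input, RMS stands in for a proper loudness measure, and `target_rms` is an arbitrary illustrative value.

```python
import math

def rms(samples):
    """Root-mean-square level of one segment."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_segments(segments, target_rms=0.1):
    """Bring classified segments to a balanced level.

    `segments` is a list of (samples, kind) pairs, where kind is
    "speech", "music" or "noise" -- a stand-in for the classifier
    output described above.
    """
    leveled = []
    for samples, kind in segments:
        level = rms(samples)
        if kind == "speech" and level > 0:
            gain = target_rms / level             # full correction for speech
        elif kind == "music" and level > 0:
            gain = math.sqrt(target_rms / level)  # gentler: keep musical dynamics
        else:
            gain = 1.0                            # never amplify noise/silence
        leveled.append([s * gain for s in samples])
    return leveled
```

After leveling, both speech segments sit at the same RMS level, while the noise segment is left untouched, mirroring the classification step described above.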

Our Adaptive Leveler is most suitable for programs where dialog or speech is the most prominent content: podcasts, radio, broadcast, lecture and conference recordings, film and videos, screencasts etc.

The following example is a recording made with the internal microphone of a mobile phone (Samsung Google Galaxy Nexus). The Adaptive Leveler will apply dynamic range compression, amplify quiet speech and balance the volume between music and speech. Furthermore, our Noise Reduction Algorithms will remove broadband background noise.

1. Unprocessed Audio:

2. Processed with Adaptive Leveler:

3. Processed with Adaptive Leveler and Noise Reduction:

The second demonstration file is an excerpt of the Abenteuer Energiewende podcast at the Karlshochschule International University. The interview includes multiple speakers with noticeable level differences and background sounds.

1. Unprocessed Audio:

2. Processed with Adaptive Leveler:

A third example file is a report about the 30th Chaos Communication Congress from Freies Radio Blau Leipzig. It starts with a live conference recording that includes music which is too loud and has bad encoding quality, then continues with sections of loud speech (live conference recording) and quiet speech (studio recording, better quality).

1. Unprocessed Audio:

2. Processed with Adaptive Leveler and Noise Reduction:

Auphonic Multitrack Algorithms

The Auphonic Multitrack Algorithms use multiple parallel input audio tracks in one production: speech tracks recorded from multiple microphones, music tracks, remote speakers via phone, Skype, etc. Auphonic processes all tracks individually as well as in combination and creates the final mixdown automatically.

Using the knowledge from signals of all tracks allows us to produce much better results compared to our singletrack version:
  • The Multitrack Adaptive Leveler knows exactly which speaker is active in which track and can therefore produce a balanced loudness between tracks. Dynamic range compression is applied to speech only – music segments are kept as natural as possible.
  • Noise profiles are extracted in individual tracks for automatic Multitrack Noise and Hum Reduction.
Furthermore we have added two multitrack-only audio algorithms:
  • Adaptive Noise Gate / Expander:
    If audio is recorded with multiple microphones and all signals are mixed, the noise of all tracks will add up as well. The Adaptive Noise Gate decreases the volume of segments where a speaker is inactive, but does not change segments where a speaker is active. This results in much less noise in the final mixdown.
  • Crossgate:
    If one records multiple people with multiple microphones in one room, the voice of speaker 1 will also be recorded in the microphone of speaker 2. This crosstalk (spill), a reverb or echo-like effect, can be removed by the Crossgate, because we know exactly when and in which track a speaker is active.
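The gate idea can be sketched as follows, assuming the per-track speaker-activity decisions are already available (in Auphonic they come from the multitrack classifiers; here they are simply an input). A real gate would also smooth the gain transitions to avoid clicks, and the Crossgate applies the same cross-track activity knowledge.

```python
def adaptive_noise_gate(tracks, active, attenuation=0.1):
    """Decrease the volume where a track's speaker is inactive,
    leaving active segments untouched.

    `tracks` is a list of sample lists, `active` one per-sample
    boolean activity list per track. `attenuation` is an illustrative
    value, not Auphonic's actual gain curve.
    """
    gated = []
    for track, act in zip(tracks, active):
        gated.append([s if a else s * attenuation
                      for s, a in zip(track, act)])
    return gated
```

Summing the gated tracks instead of the raw tracks gives a mixdown in which the noise floors of inactive microphones no longer add up.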

Please see the Auphonic Multitrack Algorithms Documentation for detailed information about our multitrack algorithms.
If you want to use our multitrack version, please also read our Multitrack Best Practice to get some important practical tips!

To get an idea what our algorithms can do, listen to the Multitrack Audio Examples below: each file is divided into multiple segments, which are all annotated with details about the algorithms (written above the waveform). Click inside the waveform to seek and zoom within files and to compare segments; use the "Show Input" button to listen to the individual input tracks.

The first audio example consists of three speech and one music track from Operation Planlos. It demonstrates how the Multitrack Adaptive Leveler adjusts the loudness between speakers and music segments. Disturbing noises from inactive tracks are removed by the Adaptive Noise Gate.

1. Audio Mixdown without Adaptive Leveler:

2. Processed with Adaptive Leveler, Gate and Crossgate:

3. Processed with Adaptive Leveler, Gate, Crossgate and Noise Reduction:

Now an audio example from the Podcast with two female and one male speech track. The Adaptive Noise Gate removes most of the background noise from the other tracks; the rest is eliminated by our Noise Reduction algorithms.

1. Audio Mixdown and processed with Adaptive Leveler:

2. Processed with Adaptive Leveler, Gate and Crossgate:

3. Processed with Adaptive Leveler, Gate, Crossgate and Noise Reduction:

Example 3 is from the Bits Of Berlin podcast and contains four tracks (one music, one male speech and two female speech tracks), demonstrating the combination of all algorithms. The first segment in the music track (the intro) is classified as foreground, the second segment (at the end) as background music and therefore automatically gets a lower level during the mixdown.

1. Audio Mixdown without Adaptive Leveler:

2. Processed with Adaptive Leveler, Gate and Crossgate:

3. Processed with Adaptive Leveler, Gate, Crossgate and Noise Reduction:

In example 4, recorded in a very reverberant room at a conference by Das Sendezentrum, it is possible to hear the spill (crosstalk) between the three active microphones. The examples show how the Crossgate is able to decrease ambience and reverb.

1. Audio Mixdown without Adaptive Leveler:

2. Processed with Adaptive Leveler, Gate and Crossgate:

Example 5, a constructed excerpt from NSFW084, illustrates the parameter Fore/Background and our Automatic Ducking feature. It includes two male speech and one music track.
The default setting of parameter Fore/Background is Auto, which is shown in Audio Example 3.

1. Fore/Background parameter of music track set to Ducking:

2. Fore/Background parameter of music track set to Background:

3. Fore/Background parameter of music track set to Foreground:

Global Loudness Normalization and True Peak Limiter

Our Global Loudness Normalization Algorithms calculate the loudness of your audio and apply a constant gain to reach a defined target level in LUFS, so that multiple processed files have the same average loudness.
The loudness is calculated according to the latest broadcast standards and Auphonic supports loudness targets for television (EBU R128, ATSC A/85), radio and mobile (-16 LUFS: Apple Music, Google; AES Recommendation), Amazon Alexa, Youtube, Spotify, Tidal (-14 LUFS), Netflix (-27 LUFS) and more. Please see Audio Loudness Measurement and Normalization, Loudness Targets for Mobile Audio, Podcasts, Radio and TV and The New Loudness Target War for detailed information.
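The constant-gain step itself is simple to sketch: the difference between target and measured loudness in dB determines one linear gain factor for the whole file. Note that measuring the loudness value requires the full ITU-R BS.1770 algorithm (K-weighting, gating), which is not shown here; the measured value is assumed given.

```python
def normalization_gain(measured_lufs, target_lufs):
    """Linear gain factor that moves a file from its measured
    loudness to the target (e.g. -23 LUFS -> -16 LUFS is +7 dB)."""
    return 10 ** ((target_lufs - measured_lufs) / 20.0)

def normalize(samples, measured_lufs, target_lufs):
    """Apply one constant gain to the whole file, so that multiple
    processed files end up at the same average loudness."""
    g = normalization_gain(measured_lufs, target_lufs)
    return [s * g for s in samples]
```

Because the gain is constant, the dynamics within the file are untouched; only the average level shifts, which is exactly the difference from the segment-wise Adaptive Leveler.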

A True Peak Limiter, with 4x oversampling to avoid intersample peaks, is used to limit the final output signal and ensures compliance with the selected loudness standard.
We use a multi-pass loudness normalization strategy based on statistics from processed files on our servers, to more precisely match the target loudness and to avoid additional processing steps.
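A rough sketch of the intersample-peak detection behind a true peak limiter: interpolate between samples at 4x the rate and take the maximum. BS.1770 specifies a particular polyphase FIR interpolator for this; the Hann-windowed sinc below is a simplified stand-in.

```python
import math

def true_peak(samples, oversample=4, taps=16):
    """Estimate the true (intersample) peak by windowed-sinc
    interpolation at `oversample` times the original rate."""
    peak = max(abs(s) for s in samples)
    n = len(samples)
    for i in range(n - 1):
        for k in range(1, oversample):
            t = i + k / oversample               # fractional sample position
            acc = 0.0
            for j in range(i - taps + 1, i + taps + 1):
                if 0 <= j < n:
                    x = t - j                    # distance to neighbour, |x| < taps
                    w = 0.5 + 0.5 * math.cos(math.pi * x / taps)  # Hann window
                    # x is never 0 here because t is always fractional
                    acc += samples[j] * w * math.sin(math.pi * x) / (math.pi * x)
            peak = max(peak, abs(acc))
    return peak
```

A full-scale sine sampled at an unlucky phase illustrates why this matters: the sample peak is only about 0.71, while the true peak is close to 1.0 and would clip after digital-to-analog conversion if only sample peaks were limited.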

The following example is an unprocessed studio recording from Undsoversity and demonstrates various loudness targets from quiet to loud.

1. Unprocessed Audio:

2. Television US (ATSC A/85), -24 LUFS (no gate):

3. Television Europe (EBU R128), -23 LUFS:

4. Similar to ReplayGain, -18 LUFS:

5. Mobile Audio (similar to Sound Check), -16 LUFS:

6. Very loud, -13 LUFS:

Noise and Hiss Reduction

Our Noise Reduction Algorithms remove broadband background noise and hiss in audio files with slowly varying backgrounds:
First the audio file is analyzed and segmented into regions with different background noise characteristics, and a Noise Print is extracted in each region.
Then a classifier decides how much noise reduction is necessary in each region (because too much noise reduction might result in artifacts) and removes the noise from the audio signal automatically.
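The noise-print idea can be illustrated with a minimal spectral-subtraction sketch: average the magnitude spectrum of background-only frames, then attenuate bins of each frame that do not rise above that print. Everything here is illustrative (naive DFT, magnitude-only, per-bin gains instead of full resynthesis), not Auphonic's implementation; `amount` merely plays the role of the Noise Reduction Amount parameter mentioned above.

```python
import cmath, math

def dft_mag(frame):
    """Magnitude spectrum via a naive DFT (fine for tiny frames;
    real code would use an FFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) / n for k in range(n)]

def noise_print(noise_frames):
    """Average magnitude spectrum of frames classified as
    background-only: the noise print of one region."""
    n = len(noise_frames[0])
    mags = [dft_mag(f) for f in noise_frames]
    return [sum(m[k] for m in mags) / len(mags) for k in range(n)]

def subtraction_gains(frame, nprint, amount=1.0, floor=0.1):
    """Per-bin attenuation for one frame: bins that barely exceed the
    noise print are pulled down toward `floor` (a spectral floor that
    avoids musical-noise artifacts from over-subtraction)."""
    gains = []
    for m, p in zip(dft_mag(frame), nprint):
        gains.append(max(floor, (m - amount * p) / m) if m > 1e-12 else floor)
    return gains
```

In the sketch a bin dominated by speech keeps a gain near 1, while a bin that only contains the background tone is pushed to the floor.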

You can also manually set the parameter Noise Reduction Amount if you prefer more noise reduction or want to bypass our classifier.
However, be aware that this might result in artifacts!

Noise Reduction Usage Tips:
  • Leave the noise as natural and constant as it is; don't try to improve or hide it yourself!
  • Please do not use leveling or gain control before our noise reduction algorithms! The amplification will vary over time and we will no longer be able to extract constant noise prints.
    This means: no Levelator, and turn off automatic gain control in Skype, audio recorders, camcorders and other devices ...
  • No noise gates: we need the noise in quiet segments which noise gates try to remove!
  • Excessive use of dynamic range compression may be problematic, because noise prints in quiet segments get amplified.
  • Noise reduction might be problematic in recordings with lots of reverb, therefore try to keep the microphone close to your speakers!

Please also take a look at our Adaptive Leveler and Multitrack examples, which also demonstrate the Noise Reduction algorithms.

The first Noise Reduction audio example is a recording by James Schramko with broadband background noise. Listen with headphones to hear all details!

Example 1, Unprocessed Audio:

Example 1, Processed with Noise and Hum Reduction:

The second example is a female speaker (Diane Severson Mori) reading a poem by Bruce Boston.

Example 2, Unprocessed Audio:

Example 2, Processed with Noise and Hum Reduction:

Noise Reduction Example 3 (from the Attitude of Aggression wrestling podcast) contains parts of multiple recordings with different background noise characteristics. Auphonic will extract a noise print in each segment and will decide if and how much noise reduction is necessary.

Example 3, Without Noise and Hum Reduction:

Example 3, Processed with Noise and Hum Reduction:

Hum Reduction

The Auphonic Hum Reduction algorithms (included in Noise and Hum Reduction) identify and remove power line hum:
First the audio file is analyzed and segmented into regions with different hum characteristics, and in each region the hum base frequency (50Hz or 60Hz) and the strength of all its partials (100Hz, 150Hz, 200Hz, 250Hz, etc.) are classified.
Afterwards the base frequency and all partials are removed according to their strength with sharp filters and broadband noise reduction.
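A minimal sketch of this approach uses standard biquad notch filters (RBJ Audio EQ Cookbook coefficients) at the base frequency and its partials. The fixed Q and the unconditional removal of all partials are simplifications; as described above, the real algorithm adapts the filtering per region to each partial's strength and combines it with broadband noise reduction.

```python
import math

def notch_coeffs(freq, fs, q=5.0):
    """Biquad notch (RBJ audio EQ cookbook) centred on `freq` Hz."""
    w0 = 2 * math.pi * freq / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1.0, -2 * math.cos(w0), 1.0]
    a0 = 1 + alpha
    # normalise so the first feedback coefficient is 1
    return [x / a0 for x in b], [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]

def biquad(samples, b, a):
    """Direct-form-I biquad filter."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def remove_hum(samples, fs, base=50.0, partials=4):
    """Notch out the hum base frequency and its first partials
    (50, 100, 150, 200 Hz for a 50 Hz hum with partials=4)."""
    for k in range(1, partials + 1):
        b, a = notch_coeffs(k * base, fs)
        samples = biquad(samples, b, a)
    return samples
```

Running a pure 50 Hz hum through the cascade removes it almost completely after the filters settle, while a 1 kHz tone (standing in for speech energy) passes nearly unchanged.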

The following audio example by FMC contains a 60Hz power line hum with many partials (120Hz, 180Hz, 240Hz, 300Hz, etc.).

Example 1, Unprocessed Audio:

Example 1, Processed with Noise and Hum Reduction:

The second example, an excerpt from the Better PR Now Podcast, includes a female speaker and a changing 60Hz power line hum: our algorithms will detect where the hum is present and will only apply hum reduction if necessary.

Example 2, Without Noise and Hum Reduction:

Example 2, Processed with Noise and Hum Reduction:

Automatic Speech Recognition and Audio Search

Auphonic has built a layer on top of a few external speech recognition engines: our classifiers generate metadata during the analysis of an audio signal (identifying music segments, silence, multiple speakers, etc.) to divide the audio file into small, meaningful segments, which are then sent to the speech recognition engine. The external speech services support over 80 languages and return text results for all audio segments. Afterwards, we combine the results from all segments, assign meaningful timestamps, and add simple punctuation and structuring to the resulting text.

This is especially interesting in combination with our multitrack algorithms: then we can send audio segments from the individual, processed track of the current speaker (not the combined mix of all speakers). Hence we get much better speaker separation, which is very helpful if multiple speakers are active at the same time or if you use background music or sounds. In addition, we can automatically assign speaker names to all transcribed audio segments, so you know exactly who said what and when.

Automatic speech recognition is most useful for making audio searchable: although automatically generated transcripts won't be perfect and might be difficult to read (spoken text is very different from written text), they are very valuable if you are trying to find a specific topic within a one-hour audio file or the exact time of a quote in an audio archive.

We also include a complete Transcript Editor directly in our HTML output file, which displays word confidence values to instantly see which sections should be checked manually, supports direct audio playback, HTML/PDF/WebVTT export and allows you to share the editor with someone else for further editing.

Please see Automatic Speech Recognition for detailed information about our speech recognition integration.

Example 1 is the first 10 minutes of Common Sense 309 by Dan Carlin (male, English); link to the generated transcript with editor: HTML Transcript Editor.
Try to navigate within the audio file in the player below, search for Clinton, Trump, etc.

Example 1, English Speech Recognition:

The second example is a multitrack automatic speech recognition transcript from the first 20 minutes of TV Eye on Marvel - Luke Cage S1E1, link to the generated transcript with editor: HTML Transcript Editor.
As this is a multitrack production, the transcript and audio player include exact speaker names as well.
You can also see that the recognition quality drops if multiple speakers are active at the same time – for example at 01:04.

Example 2, English Multitrack Speech Recognition:

As a reminder that our integrated services are not limited to English speech recognition, the third example is in German. All features demonstrated in the previous two examples also work in over 80 languages.
Here we use automatic speech recognition to transcribe radio news from Deutschlandfunk (Deutschlandfunk Nachrichten vom 11. Oktober 2016, 15:00), link to the generated transcript with editor: HTML Transcript Editor.
As official newsreaders speak very clearly and in a structured manner, the recognition quality is also very high: try searching for Merkel, Putin, etc.

Example 3, German Speech Recognition:

Try our Algorithms with your own Audio Files!

Try Auphonic