The Auphonic Adaptive Leveler corrects level differences between speakers and
between music and speech, and applies dynamic range compression to
achieve a balanced overall loudness.
In contrast to our Global Loudness Normalization
Algorithms, which correct
loudness differences between files, the Adaptive Leveler
corrects loudness differences between segments in one file.
The algorithm was trained with over three years of audio files
from our web service
and keeps learning and adapting to new data every day!
We analyze an audio signal to classify speech, music and background
segments and process them individually by:
- Amplifying quiet speakers in speech segments to achieve equal levels between speakers.
- Carefully processing music segments so that the overall loudness is comparable to speech, but without changing the natural dynamics of music as much as in speech segments.
- Classifying unwanted segments (noise, wind, breathing, silence, etc.) and excluding them from being amplified.
- Automatically applying compressors and limiters to get a balanced final mix (see also Loudness Normalization and Compression).
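As a rough illustration of the leveling idea, here is a minimal Python sketch: measure the level of each classified speech segment and apply a per-segment gain toward a common target. The segment list and classifier labels are assumed to come from a prior analysis step; our production algorithm is considerably more involved.

```python
import numpy as np

def level_speech_segments(audio, segments, target_rms=0.1):
    """Apply a per-segment gain so each speech segment reaches target_rms.

    audio    -- 1-D numpy array of samples in [-1, 1]
    segments -- list of (start, end, label) sample indices from a
                hypothetical classifier; only 'speech' segments are leveled
    """
    out = audio.copy()
    for start, end, label in segments:
        if label != "speech":
            continue  # leave music/background segments untouched
        seg = out[start:end]
        rms = np.sqrt(np.mean(seg ** 2))
        if rms > 0:
            gain = min(target_rms / rms, 10.0)  # cap gain so noise is not blown up
            out[start:end] = seg * gain
    return np.clip(out, -1.0, 1.0)
```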
Our Adaptive Leveler is most suitable for programs
where dialog or speech is the most prominent content: podcasts, radio, broadcast,
lecture and conference recordings, films and videos, screencasts, etc.
The following example is a recording made with the internal microphone of a
mobile phone (Samsung Google Galaxy Nexus).
The Adaptive Leveler applies dynamic range compression,
amplifies quiet speech, and balances the volume between music and speech.
Furthermore, our Noise Reduction Algorithms
remove broadband background noise.
1. Unprocessed Audio:
2. Processed with Adaptive Leveler:
3. Processed with Adaptive Leveler and Noise Reduction:
The second demonstration file is an excerpt of the
Abenteuer Energiewende podcast at the Karlshochschule International University.
The interview includes multiple speakers with noticeable level differences and background sounds.
1. Unprocessed Audio:
2. Processed with Adaptive Leveler:
A third example file is a report about the
30th Chaos Communication Congress
on Freies Radio Blau Leipzig.
It starts with a live conference recording that includes overly loud music and has poor encoding quality, then continues with sections of loud speech (live conference recording) and
quiet speech (studio recording, better quality).
1. Unprocessed Audio:
2. Processed with Adaptive Leveler and Noise Reduction:
The Auphonic Multitrack Algorithms use multiple parallel input audio tracks in one production:
speech tracks recorded from multiple microphones, music tracks, remote speakers via phone, Skype, etc.
Auphonic processes all tracks individually as well as in combination and creates the final mixdown automatically.
Using the knowledge from the signals of all tracks allows us to produce much better results compared to our single-track version:
- The Multitrack Adaptive Leveler knows exactly which speaker is active in which track and can therefore produce a balanced loudness between tracks. Dynamic range compression is applied to speech only – music segments are kept as natural as possible.
- Noise profiles are extracted in individual tracks for automatic Multitrack Noise and Hum Reduction.
Furthermore we have added two multitrack-only audio algorithms:
- Adaptive Noise Gate / Expander: If audio is recorded with multiple microphones and all signals are mixed, the noise of all tracks adds up as well. The Adaptive Noise Gate decreases the volume of segments where a speaker is inactive, but does not change segments where a speaker is active. This results in much less noise in the final mixdown.
- Crossgate: If one records multiple people with multiple microphones in one room, the voice of speaker 1 will also be picked up by the microphone of speaker 2. This crosstalk (spill) causes a reverb- or echo-like effect, which the Crossgate can remove because we know exactly when and in which track a speaker is active.
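To illustrate the gating idea, here is a simplified single-track sketch in Python. Our Adaptive Noise Gate additionally uses the activity information from all tracks; this sketch only thresholds the short-term level of one track, and all parameter values are illustrative.

```python
import numpy as np

def adaptive_gate(track, rate, threshold_db=-45.0, attenuation_db=-20.0,
                  frame_ms=20):
    """Attenuate frames of a single track where the speaker appears inactive.

    track -- 1-D numpy array of samples in [-1, 1]
    rate  -- sample rate in Hz
    """
    frame = int(rate * frame_ms / 1000)
    gain = 10 ** (attenuation_db / 20)
    out = track.copy()
    for i in range(0, len(track) - frame, frame):
        seg = track[i:i + frame]
        level_db = 20 * np.log10(np.sqrt(np.mean(seg ** 2)) + 1e-12)
        if level_db < threshold_db:
            out[i:i + frame] = seg * gain  # speaker inactive: duck this frame
    return out
```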
Please see the
Auphonic Multitrack Algorithms Documentation
for detailed information about our multitrack algorithms.
If you want to use our multitrack version, please also read our
Multitrack Best Practice
to get some important practical tips!
To get an idea of what our algorithms can do, listen to the Multitrack Audio Examples below:
each file is divided into multiple segments, which are annotated with details about the algorithms (written above the waveform).
Click inside the waveform to seek/zoom in files and to compare segments, and use the "Show Input" button to listen to individual input tracks.
The first audio example consists of three speech tracks and one music track
from Operation Planlos.
It demonstrates how the Multitrack Adaptive Leveler adjusts the
loudness between speakers and music segments.
Disturbing noises from inactive tracks are removed by the Adaptive Noise Gate.
1. Audio Mixdown without Adaptive Leveler:
2. Processed with Adaptive Leveler, Gate and Crossgate:
3. Processed with Adaptive Leveler, Gate, Crossgate and Noise Reduction:
Next is an audio example from the
Einfacheinmal.de Podcast
with two female speech tracks and one male speech track.
The Adaptive Noise Gate is able to remove most of the background noise
from other tracks; the rest is eliminated by our Noise Reduction algorithms.
1. Audio Mixdown and processed with Adaptive Leveler:
2. Processed with Adaptive Leveler, Gate and Crossgate:
3. Processed with Adaptive Leveler, Gate, Crossgate and Noise Reduction:
Example 3 is from the
Bits Of Berlin podcast
and contains four tracks (one music, one male speech, and two female speech tracks); it demonstrates the combination of all algorithms.
The first segment in the music track (the intro) is classified as foreground,
the second segment (at the end) as background music and therefore automatically
gets a lower level during the mixdown.
1. Audio Mixdown without Adaptive Leveler:
2. Processed with Adaptive Leveler, Gate and Crossgate:
3. Processed with Adaptive Leveler, Gate, Crossgate and Noise Reduction:
In example 4, recorded in a very reverberant room at a conference by
Das Sendezentrum,
you can hear the spill (crosstalk) between the three active microphones.
The examples show how the Crossgate is able to decrease ambience and reverb.
1. Audio Mixdown without Adaptive Leveler:
2. Processed with Adaptive Leveler, Gate and Crossgate:
Example 5, a constructed excerpt from
NSFW084,
illustrates the Fore/Background parameter and our Automatic Ducking feature.
It includes two male speech tracks and one music track.
The default setting of the Fore/Background parameter is Auto,
as shown in
Audio Example 3.
1. Fore/Background parameter of music track set to Ducking:
2. Fore/Background parameter of music track set to Background:
3. Fore/Background parameter of music track set to Foreground:
Our Global Loudness Normalization Algorithms calculate the loudness of
your audio and apply a constant gain to reach a defined
target level in LUFS, so that multiple processed files have
the same average loudness.
The loudness is calculated according to the latest broadcast standards
and Auphonic supports loudness targets for
television (EBU R128, ATSC A/85),
radio and mobile (-16 LUFS: Apple Music,
Google;
AES Recommendation),
Amazon Alexa,
Youtube,
Spotify, Tidal (-14 LUFS),
Netflix (-27 LUFS)
and more.
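A constant-gain loudness normalization of this kind can be reproduced with open-source tools, for example with the pyloudnorm library. This is an independent sketch, not our production code, and the file names are placeholders:

```python
import soundfile as sf          # pip install soundfile
import pyloudnorm as pyln       # pip install pyloudnorm

data, rate = sf.read("input.wav")          # placeholder input file

meter = pyln.Meter(rate)                   # ITU-R BS.1770 loudness meter
loudness = meter.integrated_loudness(data) # integrated loudness in LUFS

# Apply a constant gain to hit a -16 LUFS target (mobile/podcast level).
normalized = pyln.normalize.loudness(data, loudness, -16.0)

sf.write("output.wav", normalized, rate)
```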
Please see
Audio Loudness Measurement and Normalization,
Loudness Targets for Mobile Audio, Podcasts, Radio and TV
and
The New Loudness Target War
for detailed information.
A True Peak Limiter, with 4x oversampling to avoid intersample peaks,
is used to limit the final output signal and to ensure compliance with the selected loudness standard.
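True peak measurement via oversampling can be sketched as follows. This is a simplified illustration; broadcast-grade limiters use carefully designed oversampling filters:

```python
import numpy as np
from scipy.signal import resample_poly

def true_peak_dbtp(audio, oversample=4):
    """Estimate the true peak level in dBTP by 4x oversampling.

    Intersample peaks can exceed the largest sample value, so the signal
    is upsampled before taking the maximum absolute amplitude.
    """
    upsampled = resample_poly(audio, oversample, 1)
    peak = np.max(np.abs(upsampled))
    return 20 * np.log10(peak + 1e-12)
```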
We use a multi-pass loudness normalization strategy based on statistics from
processed files on our servers, to more precisely match the target loudness
and to avoid additional processing steps.
The following example is an unprocessed studio recording from
Undsoversity
and demonstrates various loudness targets from quiet to loud.
1. Unprocessed Audio:
2. Television US (ATSC A/85), -24 LUFS (no gate):
3. Television Europe (EBU R128), -23 LUFS:
4. Similar to ReplayGain, -18 LUFS:
5. Mobile Audio (similar to Sound Check), -16 LUFS:
6. Very loud, -13 LUFS:
Our Noise Reduction Algorithms remove broadband background noise
and hiss from audio files with slowly varying backgrounds:
First the audio file is analyzed and segmented into regions with different background
noise characteristics, and a
Noise Print
is extracted from each region.
Then a classifier decides how much noise reduction is necessary in
each region (because too much noise reduction might result
in artifacts) and removes the noise from the audio signal automatically.
You can also manually set the
Noise Reduction Amount
parameter if you prefer more noise reduction or want to bypass our classifier.
However, be aware that this might result in artifacts!
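The classic building block behind this kind of processing is spectral subtraction with a noise print. Below is a heavily simplified sketch; our implementation additionally performs segmentation and uses a classifier to choose the reduction amount per region:

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(audio, rate, noise_print, amount=1.0):
    """Reduce stationary noise using the magnitude spectrum of a noise print.

    noise_print -- a short excerpt of the recording containing only noise
    amount      -- reduction strength; too much causes 'musical' artifacts
    """
    nperseg = 1024
    _, _, noise_spec = stft(noise_print, rate, nperseg=nperseg)
    noise_mag = np.mean(np.abs(noise_spec), axis=1, keepdims=True)

    f, t, spec = stft(audio, rate, nperseg=nperseg)
    mag, phase = np.abs(spec), np.angle(spec)
    # Subtract the noise estimate, keeping a small spectral floor.
    cleaned = np.maximum(mag - amount * noise_mag, 0.05 * mag)
    _, out = istft(cleaned * np.exp(1j * phase), rate, nperseg=nperseg)
    return out
```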
Noise Reduction Usage Tips:
- Leave the noise as natural and constant as it is; don't try to improve or hide it yourself!
- Please do not use leveling or gain control before our noise reduction algorithms! The amplification will vary over time, and we will no longer be able to extract constant noise prints. This means: no Levelator, and turn off automatic gain control in Skype, audio recorders, camcorders, and other devices ...
- No noise gates: we need the noise in quiet segments, which noise gates try to remove!
- Excessive use of dynamic range compression may be problematic, because noise prints in quiet segments get amplified.
- Noise reduction might be problematic in recordings with lots of reverb, so try to keep the microphone close to your speakers!
Please also take a look at our Adaptive Leveler
and Multitrack examples, which also demonstrate
the Noise Reduction algorithms.
The first Noise Reduction audio example is a recording by
James Schramko
with broadband background noise.
Listen with headphones to hear all details!
Example 1, Unprocessed Audio:
Example 1, Processed with Noise and Hum Reduction:
The second example features a female speaker
(Diane Severson Mori)
reading a poem by
Bruce Boston.
Example 2, Unprocessed Audio:
Example 2, Processed with Noise and Hum Reduction:
Noise Reduction Example 3 (from the
Attitude of Aggression
wrestling podcast) contains parts of multiple recordings with
different background noise characteristics.
Auphonic will extract a noise print from each
segment and decide whether and how much noise reduction is necessary.
Example 3, Without Noise and Hum Reduction:
Example 3, Processed with Noise and Hum Reduction:
The Auphonic Hum Reduction algorithms (included in
Noise and Hum Reduction)
identify and remove power line hum:
First the audio file is analyzed and segmented into regions with different hum characteristics,
and in each region the hum base frequency (50Hz or 60Hz) and the strength of all its
partials
(100Hz, 150Hz, 200Hz, 250Hz, etc.) are classified.
Afterwards the base frequency and all partials are removed, according to their
strength, with sharp filters and broadband noise reduction.
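The filtering stage can be illustrated as a cascade of notch filters at the base frequency and its partials. This is a simplified scipy sketch; the real algorithm additionally adapts per region and combines the sharp filters with broadband noise reduction:

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

def remove_hum(audio, rate, base_freq=60.0, n_partials=8, q=35.0):
    """Notch out a power line hum and its partials (60, 120, 180 Hz, ...).

    q controls the filter sharpness; a higher Q removes less of the
    surrounding signal but needs an accurate frequency estimate.
    """
    out = audio
    for k in range(1, n_partials + 1):
        freq = base_freq * k
        if freq >= rate / 2:       # stay below the Nyquist frequency
            break
        b, a = iirnotch(freq, q, fs=rate)
        out = filtfilt(b, a, out)  # zero-phase filtering
    return out
```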
The following audio example by
FMC
contains a 60Hz power line hum with many partials (120Hz,
180Hz, 240Hz, 300Hz, etc.).
Example 1, Unprocessed Audio:
Example 1, Processed with Noise and Hum Reduction:
The second example, an excerpt from the
Better PR Now Podcast,
includes a female speaker and a changing 60Hz power line hum:
our algorithms will detect where the hum is present and will only apply hum
reduction if necessary.
Example 2, Without Noise and Hum Reduction:
Example 2, Processed with Noise and Hum Reduction:
Auphonic has built a layer on top of a few
external speech recognition engines:
Our classifiers generate metadata during the analysis of an audio signal
(identifying music segments, silence, multiple speakers, etc.) to divide the audio file into
small and meaningful segments, which are sent to the speech recognition engine afterwards.
The external speech services support over 80 languages and return text results
for all audio segments.
Afterwards, we combine the results from all segments, assign meaningful timestamps,
and add simple punctuation and structure to the resulting text.
This is especially interesting in combination with our multitrack algorithms:
there we can send audio segments from the individual, processed track of the
current speaker (not the combined mix of all speakers) and hence get much
better speaker separation, which is very helpful if multiple speakers are
active at the same time or if you use background music/sounds.
In addition, we can automatically assign speaker names to all transcribed audio segments,
so you know exactly who said what and when.
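As an illustration of this layered approach, the per-segment workflow could look like the following sketch. All names here are hypothetical; recognize() stands in for any external speech recognition engine and is not a real API:

```python
# Hypothetical sketch of segment-wise transcription; not our internal code.

def transcribe_segments(segments, recognize):
    """Send each classified speech segment to an external engine and
    reassemble the results with timestamps and speaker names.

    segments  -- list of dicts with 'audio', 'start' (seconds), 'speaker',
                 produced by a prior analysis/classification step
    recognize -- callable mapping an audio segment to its text
    """
    transcript = []
    for seg in segments:
        text = recognize(seg["audio"])  # external engine call
        transcript.append({
            "time": seg["start"],
            "speaker": seg["speaker"],
            "text": text,
        })
    return transcript
```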
Automatic speech recognition is most useful to make audio searchable:
Although automatically generated transcripts won't be perfect and might be difficult to read
(spoken text is very different from written text),
they are very valuable if you try to find a specific topic within a one hour audio file
or the exact time of a quote in an audio archive.
We also include a complete Transcript Editor directly in our HTML output file. It displays word confidence values so you can instantly see which sections should be checked manually, supports direct audio playback and HTML/PDF/WebVTT export, and allows you to share the editor with someone else for further editing.
Please see
Automatic Speech Recognition
for detailed information about our speech recognition integration.
Example 1 is the first 10 minutes of Common Sense 309 by Dan Carlin (male, English);
link to the generated transcript with editor: HTML Transcript Editor.
Try navigating within the audio file in the player below and search for Clinton, Trump, etc.
Example 1, English Speech Recognition:
The second example is a multitrack automatic speech recognition transcript from the first 20 minutes of
TV Eye on Marvel - Luke Cage S1E1,
link to the generated transcript with editor: HTML Transcript Editor.
As this is a multitrack production,
the transcript and audio player include exact speaker names as well.
You can also see that the recognition quality drops if multiple speakers are active at the same time – for example at 01:04.
Example 2, English Multitrack Speech Recognition:
As a reminder that our integrated services are not limited to English speech recognition, the third example is in German. All features demonstrated in the previous two examples also work in over 80 languages.
Here we use automatic speech recognition to transcribe radio news from
Deutschlandfunk
(Deutschlandfunk Nachrichten vom 11. Oktober 2016, 15:00),
link to the generated transcript with editor: HTML Transcript Editor.
As professional newsreaders speak very clearly and in a structured manner, the recognition quality is also very high: try searching for Merkel, Putin, etc.
Example 3, German Speech Recognition: