Listen to audio examples and learn more about our algorithms.

Intelligent Leveler

Balances levels between speakers, music and speech – no compressor knowledge required.

Ensures all speakers sound equally loud

The Adaptive Leveler corrects level differences between speakers and between music and speech, and applies dynamic range compression to achieve a balanced overall loudness. We use classifiers to carefully process music segments and prevent amplification of unwanted noises.

Based on over five years of training with audio files from our web service, the algorithm keeps learning and adapting to new data every day.
It is most suitable for programs where dialog or speech is the most prominent content, such as podcasts, radio, broadcast, lecture and conference recordings, film and videos, screencasts, etc.
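
As a rough illustration of what correcting level differences between speakers means, the sketch below measures the RMS level of each speaker's segment in dBFS and computes the gain that would bring it to a common target. It is a deliberately simplified example and not the Adaptive Leveler itself, which also relies on classifiers and dynamic range compression. The function names, the -20 dBFS target and the sample values are assumptions made for illustration.

```python
import math


def rms_dbfs(samples):
    """Return the RMS level of float samples (range -1.0..1.0) in dBFS."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


def leveling_gains(segments, target_dbfs=-20.0):
    """Map each speaker to the gain in dB that moves its RMS level to the target."""
    gains = {}
    for speaker, samples in segments.items():
        level = rms_dbfs(samples)
        # Leave silent segments untouched instead of applying infinite gain.
        gains[speaker] = 0.0 if math.isinf(level) else target_dbfs - level
    return gains


# Hypothetical data: a loud speaker and one recorded 20 dB quieter.
segments = {
    "host": [0.5, -0.5, 0.5, -0.5],
    "guest": [0.05, -0.05, 0.05, -0.05],
}
gains = leveling_gains(segments)
```

In this toy case the guest needs 20 dB more gain than the host, so both end up at the same level after the gains are applied.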

Noise & Reverb Reduction

Define whether Auphonic should remove only static or also fast-changing noises and if we should keep or eliminate music.

Crystal-clear audio made easy

Do you want to eliminate any ambient sounds from your audio to get a clean speech file? Or do you just want to get rid of static background noises while keeping the singing bird outside your window on record? With our advanced AI denoising algorithms, you have the choice!

Our classifiers also detect segments containing music or breath sounds, so you can decide whether music is part of the content and should remain in your production, or whether you want to completely isolate the spoken content and remove all breaths and mouth noises.
For even greater speech intelligibility control, it is possible to separately adjust the amount of noise, reverb or breath reduction to strike the perfect balance between clarity and ambiance.

Filtering & AutoEQ

Removes unwanted frequencies and sibilance (De-Esser) and creates a clear, warm, and pleasant sound.

Perfectly balanced frequencies

Our AutoEQ algorithm automatically analyzes and optimizes the frequency spectrum of a voice recording, removing sibilance (De-Esser) and creating a clear, warm, and pleasant sound.

The equalization of multi-speaker audio can be complex and time-consuming, as each voice requires its own unique frequency spectrum equalization. Our AutoEQ simplifies this process by creating separate, time-dependent EQ profiles for each speaker, ensuring a consistent and pleasant sound output despite any changes in the voices during the recording.

Cut Filler Words and Silence

Automatically cut silent segments, pauses, and filler words like "ah", "uhm", "mh", or "ähm" in multiple languages.

Remove pauses and fillers in one click

A few seconds of silence can easily arise when equipment is readjusted or when speakers pause briefly to breathe or think. Many speakers also tend to fill these thinking pauses with "ah", "uhm", "mh", etc. to avoid awkward silence.
Whether it is silence or filler words, listeners usually do not enjoy listening to audio that carries no information.

Our automatic cutting algorithms reliably detect and remove silent segments and filler words. Simply enable the algorithms in your production without further settings to cut redundant filler content and achieve a high-quality listening experience. If you want to check and apply the cuts manually, we provide cut lists that you can import into your favorite audio/video editor.
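
To make the idea of a cut list concrete, here is a minimal sketch of threshold-based silence detection. It assumes the audio has already been reduced to one level value in dB per fixed-length frame, and it returns the start and end times of every silent run that is long enough to cut. This is not how Auphonic's algorithms work internally. The function name, the -50 dB threshold and the frame length are hypothetical, and filler word detection would need a speech model that is not shown here.

```python
def find_silence_cuts(frame_levels_db, frame_seconds=0.25,
                      threshold_db=-50.0, min_silence_seconds=0.5):
    """Return (start, end) times in seconds of silent runs long enough to cut."""
    cuts = []
    run_start = None
    # A loud sentinel frame at the end closes any trailing silent run.
    for i, level in enumerate(list(frame_levels_db) + [0.0]):
        if level < threshold_db:
            if run_start is None:
                run_start = i  # a silent run begins at this frame
        elif run_start is not None:
            duration = (i - run_start) * frame_seconds
            if duration >= min_silence_seconds:
                cuts.append((run_start * frame_seconds, i * frame_seconds))
            run_start = None
    return cuts


# Hypothetical per-frame levels in dB, one value per 0.25 s frame.
levels = [-20, -60, -60, -60, -18, -70, -22, -65, -65]
cuts = find_silence_cuts(levels)
```

The single quiet frame at 1.25 s is shorter than the minimum duration and is therefore kept, while the two longer silent runs are reported as cut candidates.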

Multitrack Algorithms

Process multiple tracks to create an optimized mixdown, featuring automatic ducking, noise gate and crosstalk removal.

More control with multiple tracks

Auphonic multitrack leverages multiple input audio files to produce a balanced, high-quality final mixdown. The algorithm processes individual and combined tracks, including speech tracks from multiple microphones, music tracks, and remote speakers via phone or Skype. This allows for a balanced loudness between tracks, with dynamic range compression applied only to speech segments and automatic ducking of music/FX tracks.
Denoising per track, adaptive noise gates and a crossgate decrease noise, crosstalk and reverb in the final mixdown by identifying when and in which track a speaker is active.

Auphonic's multitrack algorithm produces exceptional results, making it the go-to solution for audio professionals.

Loudness Specifications

Define a target loudness, true peak limit, MaxLRA and more for consistency across files and compliance with audio specs.

Produce for Netflix, Audible, Podcasts in one click

With Auphonic, you never again have to worry about the acceptance criteria of different platforms (Audible, Netflix, Spotify, podcasts, etc.) or broadcasters (EBU R128, ATSC A/85, radio and mobile, commercials).
You can define a set of target parameters (integrated loudness, true peak level, dialog normalization, MaxLRA, MaxM, MaxS), like -16 LUFS for podcasts, and we will produce the audio accordingly in one click.
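
The sketch below shows how such a set of target parameters could be represented and what the basic normalization arithmetic looks like: the gain in dB is the target integrated loudness minus the measured one, and the true peak after that gain must stay below the limit. The class and function names, the -1 dBTP limit and the measured values are assumptions for illustration, not Auphonic's settings or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoudnessTarget:
    integrated_lufs: float               # target integrated loudness
    true_peak_dbtp: float                # maximum allowed true peak level
    max_lra_lu: Optional[float] = None   # optional loudness range limit


def normalization_gain_db(measured_lufs, target):
    """Gain in dB that moves the measured integrated loudness to the target."""
    return target.integrated_lufs - measured_lufs


def peak_within_limit(measured_peak_dbtp, gain_db, target):
    """True if the true peak stays within the limit after applying the gain."""
    return measured_peak_dbtp + gain_db <= target.true_peak_dbtp


# Hypothetical podcast target and measurements of an input file.
podcast = LoudnessTarget(integrated_lufs=-16.0, true_peak_dbtp=-1.0)
gain = normalization_gain_db(measured_lufs=-23.0, target=podcast)
needs_limiting = not peak_within_limit(-6.0, gain, podcast)
```

Here the file needs 7 dB of gain to reach -16 LUFS, which would push its peak above the limit, so a plain gain change is not enough and a limiter would be required.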

Speech2Text & Automatic Shownotes

Multilingual speech-to-text with auto-generated shownotes and chapters displayed in a shareable transcript editor.

Make your audio searchable and accessible

Auphonic uses a multilingual Whisper model by OpenAI as a self-hosted speech recognition engine, including a shareable transcript editor that can easily be integrated into your post-production workflow at no extra cost. In addition to our Whisper engine, we have also integrated a wide range of popular external speech recognition services, including Amazon Transcribe, Google Cloud Speech API, wit.ai and Speechmatics.

Our Automatic Shownotes and Chapters feature gives you AI-generated summaries in multiple levels of detail and timestamped thematic sections that you can use as shownotes and chapters to boost your podcast's accessibility and search engine visibility.
For multitrack productions, each track is processed separately, so you get a detailed transcript showing which speaker is active at what exact time.

Video Support, Metadata & Chapters

We produce enhanced audio or video podcasts with chapters and waveform audiograms in all output formats you need.

All audio and video formats supported

Chapter marks or enhanced podcasts are used for quick navigation within audio files and can be entered directly in our web interface or imported from various sources such as text files or audio editors (DAWs).
Auphonic supports all common audio and video file formats, offers customized encoding settings, maps metadata tags to multiple output files and exports them to platforms such as SoundCloud, YouTube, and Spreaker.

Audiograms let you easily create shareable videos from audio-only productions: we generate a video from your audio file, with a dynamically generated waveform and your cover image or chapter images as the background.

In video productions, Auphonic extracts the audio track, processes and merges it with the original video track without any loss of image quality. You can export the processed video to YouTube or create an audio-only version for your podcast platform automatically.

Try our algorithms with your own audio files!

Frequently Asked Questions

You can use Auphonic for free for up to 2 hours of audio each month. That way, you can fully test our algorithms and services without any commitment.

Certain features, like batch productions or watch folders for workflow automation, are only available to premium users. You can find out more about our premium features on our pricing page.

We integrate with several tools that can be used to transfer files, automatically publish your productions or automate your workflow. Find out more about our integrations here.

Auphonic has a developer API that allows you to build custom integrations and applications that interact with our service. You can check it out here.

The time it takes for the AI algorithms to improve your audio file depends on the size and complexity of the file, but on average it is about 10% of the length of the audio. For example, if your audio file is one hour long, it would take about 6 minutes for our algorithms to improve it.
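
This rule of thumb can be written as a small estimate. The function name and the adjustable ratio parameter are hypothetical, and actual processing times vary with the file and the enabled algorithms.

```python
def estimate_processing_minutes(audio_minutes, ratio=0.10):
    """Estimate processing time as a fixed fraction of the audio length."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must not be negative")
    return audio_minutes * ratio


# A one-hour file, using the average ratio of about 10%.
one_hour_estimate = estimate_processing_minutes(60)
```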