New Feature:

In a previous blogpost we talked about the Opus codec, which offers very low bitrates. Another codec seeking to achieve even lower bitrates is Codec 2.

Codec 2 is designed for use with speech only, and although the bitrates are impressive, the results aren’t as clear as Opus, as you can hear in the following audio examples. However, there is some interesting work being done with Codec 2 in combination with neural networks (WaveNets) that is yielding great results.

Layers of a WaveNet neural network.

Background

Codec 2 is an open source codec designed for speech, and aims for bitrates between 700bps and 3200bps (bits per second).

The man behind it, David Rowe, is an electronic engineer currently living in South Australia. He started the project in September 2009, with the main aim of improving low-cost radio communication for people living in remote areas of the world. With this in mind, he set out to develop a codec that would significantly reduce file sizes and the bandwidth required when streaming.

Another motivation, according to David, was to be free from the patented technologies used by closed source codecs, which he believes “require expensive and awkward licenses and are stifling innovation”. His belief is that this work can be done without patent-protected codecs, so all his work is open source.

Potential Applications

Rowe’s perceived applications include VoIP trunking, voice over low bandwidth HF/VHF digital radio (especially for amateur radio, so as to avoid issues with the use of proprietary codecs), and developing world and remote area communications, including military, police and emergency services.

We’re interested here at Auphonic because of its potential for longer podcasts, presentations and audiobooks: it would allow for low storage requirements and minimize the effect of bad network connections.

How it Works

To achieve the low bitrates sought, speech has to be reduced to the smallest possible amount of data, which means minimizing the redundant information that is transmitted.

To do this, Codec 2 uses harmonic sinusoidal speech coding. This splits the speech into 10-30ms segments, called frames. Each frame is then analysed for the fundamental frequency (or pitch) and the number of harmonics that fit into a 4kHz bandwidth. Further, for each of the harmonics within the 4kHz range, the amplitude and phase are recorded.
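As a rough illustration of the numbers involved, the sketch below splits a signal into fixed-length frames and counts how many harmonics of a given pitch fit below 4 kHz. The 8 kHz sample rate, the 20 ms frame length and the function names are assumptions chosen for this example; it does not reproduce Codec 2's actual pitch estimator.

```python
SAMPLE_RATE = 8000    # Hz; assumed, consistent with a 4 kHz bandwidth
FRAME_MS = 20         # assumed frame length within the 10-30 ms range
BANDWIDTH_HZ = 4000


def split_into_frames(samples, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Cut a list of samples into consecutive frames, dropping any short tail."""
    frame_len = sample_rate * frame_ms // 1000
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def harmonics_in_band(pitch_hz, bandwidth_hz=BANDWIDTH_HZ):
    """Number of harmonics (pitch, 2*pitch, ...) at or below the bandwidth."""
    return int(bandwidth_hz // pitch_hz)


one_second = [0.0] * SAMPLE_RATE          # placeholder signal (silence)
frames = split_into_frames(one_second)
print(len(frames), "frames of", len(frames[0]), "samples each")
print(harmonics_in_band(110), "harmonics for a 110 Hz voice")
print(harmonics_in_band(220), "harmonics for a 220 Hz voice")
```

A lower-pitched voice has more harmonics inside the same bandwidth, so the number of amplitude and phase values per frame varies with the speaker.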

This information is then coded, and the decoder reconstructs the audio based on this data.
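The reconstruction step can be pictured as adding up one sinusoid per harmonic. The following is a minimal sketch under that simplification, with made-up amplitudes and phases and an assumed 8 kHz sample rate; the real decoder does considerably more than this.

```python
import math


def synthesize_frame(pitch_hz, amplitudes, phases, n_samples=160, sample_rate=8000):
    """Rebuild one frame as a sum of harmonics of the pitch.

    amplitudes[k] and phases[k] belong to harmonic k+1,
    whose frequency is (k+1) * pitch_hz.
    """
    frame = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = sum(a * math.cos(2 * math.pi * (k + 1) * pitch_hz * t + p)
                     for k, (a, p) in enumerate(zip(amplitudes, phases)))
        frame.append(sample)
    return frame


# Hypothetical parameters for a single 20 ms frame with three harmonics.
frame = synthesize_frame(pitch_hz=100.0,
                         amplitudes=[1.0, 0.5, 0.25],
                         phases=[0.0, 0.0, 0.0])
print(len(frame), "samples, first sample:", round(frame[0], 3))
```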

Codec 2 block diagrams: encoder (left) and decoder (right).
Figure from Rowetel.

Audio Examples and Comparison with other Codecs

Whilst it all sounds great in theory, how does the reality match up? Let’s have a listen.

Here is a short wav audio file:

intro-orig.wav - 1.3 MB (download):

Applying Codec 2 (without the WaveNet decoder) at the different rates available (3200bps, 2400bps, 1600bps, 1200bps and 700bps), we get:

3200bps (download):
2400bps (download):
1600bps (download):
1200bps (download):
700bps (download):
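For readers who want to produce files like these themselves, the Codec 2 source distribution includes command-line tools named c2enc and c2dec. The sketch below only builds the command lines. It assumes both tools are on the PATH, that the input is raw 16-bit mono audio sampled at 8 kHz, and that the installed version accepts these mode names (newer releases may name some modes differently). The file names are placeholders.

```python
import subprocess

MODES = ["3200", "2400", "1600", "1200", "700"]  # bitrates used in this post


def encode_command(mode, raw_in, c2_out):
    """Command line for c2enc: mode, raw input file, encoded output file."""
    return ["c2enc", mode, raw_in, c2_out]


def decode_command(mode, c2_in, raw_out):
    """Command line for c2dec: mode, encoded input file, raw output file."""
    return ["c2dec", mode, c2_in, raw_out]


def run_all(raw_in="intro-orig.raw", execute=False):
    """Build (and optionally run) an encode/decode pair for every mode."""
    commands = []
    for mode in MODES:
        encoded = f"intro-{mode}.c2"    # placeholder output names
        decoded = f"intro-{mode}.raw"
        commands.append(encode_command(mode, raw_in, encoded))
        commands.append(decode_command(mode, encoded, decoded))
    if execute:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


for cmd in run_all():
    print(" ".join(cmd))
```

With `execute=False` nothing is run, so the commands can be inspected before pointing them at real files.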

These examples show significantly reduced file sizes.
Putting that information more meaningfully in terms of how much storage you would need for an hour of audio:

  • At 3200bps, 1 hour of audio requires only 1.37MB (this would fit on one old 3½-inch floppy disk!)
  • A rate of 2400bps equates to 1.03MB/h
  • A rate of 1600bps equates to 0.69MB/h (Or approximately 2 hours of audio on one floppy disk!)
  • A rate of 1200bps equates to 0.51MB/h
  • A rate of 700bps equates to 0.3MB/h
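These per-hour figures follow directly from the bitrate. A quick check, assuming that "MB" in the list means 2^20 bytes (with that assumption the results agree with the list to within rounding):

```python
def megabytes_per_hour(bitrate_bps):
    """Storage for one hour of audio, in units of 2**20 bytes."""
    total_bytes = bitrate_bps * 3600 / 8   # 3600 seconds, 8 bits per byte
    return total_bytes / 2**20


for bps in (3200, 2400, 1600, 1200, 700):
    print(f"{bps} bps -> {megabytes_per_hour(bps):.2f} MB/h")
```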

So great compression, but the result is clearly not natural sounding.

As a comparison, here is the same audio as an 8kb/s MP3:

MP3 at 8 kb/s - 23kb file size (download):

The file size is significantly larger than with Codec 2 and the quality is arguably still not usable. You can clearly hear what is sometimes called sizzle: the weird metallic sounds you hear on low quality MP3s.

There is a final codec worth comparing, one that seems to capture the two ideals we want, usable quality and low bitrates: Opus.
Because of its convincing low-bitrate performance, Auphonic already offers Opus encoding all the way down to 6 kbps, the lowest bitrate that Opus supports.

Comparing Opus at this 6 kbps rate to the 8kbps MP3 shows a significant improvement - although slightly muffled, it still sounds natural:

Opus at 6kbps (download):

Returning to Codec 2, and purely as a bit of fun, here are some samples of Codec 2 on music! (Note that Codec 2 is not designed for music; it was only ever conceived for use on speech.)

Original file (download):
As an 8kbps MP3 (download):

I personally couldn’t listen to the MP3 at this rate, so let’s listen to what Codec 2 does!

Codec 2 at different bitrates:

3200bps (download):
2400bps (download):
1600bps (download):
1200bps (download):
700bps (download):

As you can hear, it is not suitable for this application at all!

Codec 2 and WaveNet

As we have heard, despite the impressive bitrates achieved, the end result is not very natural sounding.
However, where it starts to get more interesting is the work by W. Bastiaan Kleijn and colleagues, published on arXiv (hosted by Cornell University Library). They used Codec 2 running at 2400bps on the encoding side, but replaced the Codec 2 decoder with a WaveNet deep learning generative model (for more information see the paper Wavenet based low rate speech coding).

Here are some samples from the authors:

Codec Male Example
Original File
Codec 2
With WaveNet Decoder
Codec Female Example
Original File
Codec 2
With WaveNet Decoder

Compared to Codec 2, you can hear a significant increase in quality, and compared to the original, there is no significant decrease in quality.

David Rowe himself has stated that he considers the result to be “a game changer for low bit rate speech coding” and “as good as an 8000bps wideband speech codec”.

Conclusion

Whilst the (original) Codec 2 project represents very interesting work, it is limited, and the end result is not suited for podcasting. Also, as we heard in the audio examples, it can only be used for voice recordings, not music.

However, Codec 2 in combination with a WaveNet decoder improves the quality a lot, and the low bitrate (2400bps) would be extremely interesting for podcast and audiobook distribution as well: one hour of audio would require only 1.03MB of storage!

Auphonic will add support for Codec 2 output files when the WaveNet decoder is in a usable form. For now we have just added support for Codec 2 input files.



