Auphonic Blog: New Speechmatics API Integration and Speech Recognition Services Comparison

Speechmatics has released a new API with an enhanced transcription engine (2h free per month!), which is now integrated into the Auphonic Web Service.
In this blog post, we also compare the accuracy of all our integrated speech recognition services and present the results.

Automatic speech recognition is most useful for making audio searchable: even if automatically generated transcripts are not perfect and can be difficult to read (spoken text is very different from written text), they are very valuable if you want to find a specific topic within a one-hour audio file or need the exact time of a quote in an audio archive.
Currently, Auphonic supports the integration of the following four speech recognition services: Wit.ai, Google Cloud Speech, Amazon Transcribe, and Speechmatics.
All of these speech recognition services have been improving quickly, and we will do our best to keep you updated as they get closer to perfection.

Most recently, Speechmatics developed a new Enhanced Model, which we added to our production services. You can now choose between the Standard Model, with faster results and medium accuracy, and the Enhanced Model, with slower results but very good accuracy.
For each transcription model, you can process two hours of audio per month for free (4h free per month combined). If you exceed two hours per month for a model, you will be charged $1.25/h for the Standard Model and $1.90/h for the Enhanced Model. For high volumes, you can contact Speechmatics support about a discount.
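
To make the pricing concrete, here is a minimal sketch that estimates the monthly charge for one model from the figures quoted above. It assumes that billing is linear per hour beyond the free allowance and ignores volume discounts and any rounding Speechmatics may apply. The function and constant names are ours, for illustration only.

```python
# Free allowance and hourly prices as quoted in this post.
# Assumption: linear per-hour billing, no volume discount, no minute rounding.
FREE_HOURS_PER_MODEL = 2.0
PRICE_PER_HOUR_USD = {"standard": 1.25, "enhanced": 1.90}


def monthly_cost(hours_used: float, model: str) -> float:
    """Estimate the monthly charge in USD for a single Speechmatics model."""
    if hours_used < 0:
        raise ValueError("hours_used must not be negative")
    if model not in PRICE_PER_HOUR_USD:
        raise ValueError(f"unknown model: {model!r}")
    # Only the hours beyond the free allowance are billable.
    billable_hours = max(0.0, hours_used - FREE_HOURS_PER_MODEL)
    return round(billable_hours * PRICE_PER_HOUR_USD[model], 2)


if __name__ == "__main__":
    for hours, model in [(1.5, "standard"), (5, "standard"), (10, "enhanced")]:
        print(f"{hours}h on {model}: ${monthly_cost(hours, model):.2f}")
```

Because the free allowance applies per model, usage of the Standard and the Enhanced Model has to be estimated separately and then added up.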

How do other Speech Recognition Services compare to Speechmatics?

We compared the relative ASR (Automatic Speech Recognition) quality of all services in English and German. Note that 'best' only means the best of our integrated services.
As speech recognition services are evolving very quickly, this is just a snapshot and may change in the near future.

	Wit.ai	Google Speech API	Amazon Transcribe	Speechmatics Standard	Speechmatics Enhanced
Price	free, also for commercial use	1+1h free per month (Enhanced + Default Model), then ~$0.96-$2.16/h (depending on user settings)	1h free per month (first 12 months), then ~$1.44/h	2h free per month, then ~$1.25/h, much cheaper for high volumes	2h free per month, then ~$1.90/h, much cheaper for high volumes
ASR Quality English	basic	good (Enhanced Model)	very good	very good	best
ASR Quality German	basic	basic (Default Model)	very good	very good	best
Keyword Support	No	Yes	Yes	Yes	Yes
Word Timestamps and Confidence	No	No	Yes	Yes	Yes
Speed	fast	fast	much slower	medium	slower
Supported Languages	ar, bn, my, zh, nl, en, fi, fr, de, hi, id, it, ja, ca, ko, ms, ml, mr, pl, pt, ru, si, es, sv, tl, ta, th, tr, ur, vi	most languages supported! 138 languages and dialects (see: Google Language Support)	af-ZA, ar-AE, ar-SA, zh-CN, zh-TW, da-DK, nl-NL, en-AU, en-GB, en-IN, en-IE, en-NZ, en-AB, en-ZA, en-US, en-WL, fr-FR, fr-CA, fa-IR, de-DE, de-CH, he-IL, hi-IN, id-ID, it-IT, ja-JP, ko-KR, ms-MY, pt-PT, pt-BR, ru-RU, es-ES, es-US, ta-IN, te-IN, th-TH, tr-TR	ar, bg, yue, ca, hr, cs, da, nl, en, fi, fr, de, el, hi, hu, it, id, ja, ko, lv, lt, ms, cmn, no, pl, pt, ro, ru, sk, sl, es, sv, tr, uk	ar, bg, yue, ca, hr, cs, da, nl, en, fi, fr, de, el, hi, hu, it, id, ja, ko, lv, lt, ms, cmn, no, pl, pt, ro, ru, sk, sl, es, sv, tr, uk

Try out Speechmatics in Auphonic

1. Connect Speechmatics to your Auphonic Account

Enter a display name for the Speechmatics service in your Auphonic account.
Sign up for a Speechmatics Account.
On the Speechmatics page, go to “Manage Access” on the left, choose a name for your key, and click “Generate API Key”. This API Key will only be shown once, so make sure you keep it safe! Paste the generated API Key into the “API Key” form field in your Auphonic account.
For “Model Accuracy” please select “Standard Model” or “Enhanced Model”.

If you want to use both the Standard and the Enhanced Model of Speechmatics, you need to create two separate services (one for each model) in your Auphonic account!

2. Add Speechmatics to your Auphonic Production

Once your Speechmatics and Auphonic accounts are connected, you can create a preset or start a production directly, as usual.
In the “Speech Recognition” section, set “Service” to “Speechmatics”, select the language of your audio, optionally add “Keywords”, and you are ready to “Start Production”! The “Track Usage” menu in your Speechmatics account shows a detailed list of your usage for the current month. For more information, Speechmatics also provides a video tutorial about usage, limits, and billing.
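
If you prefer to script this step, the same settings (service, language, keywords) could be collected into a JSON request body along the following lines. This is only a sketch: the field names `speech_recognition`, `uuid`, `language`, and `keywords` reflect our assumption of how the Auphonic API expects these settings and should be checked against the API documentation. The UUID and the helper function are hypothetical placeholders, and no request is actually sent.

```python
import json
from typing import Dict, List, Optional


def build_production_payload(
    title: str,
    service_uuid: str,
    language: str,
    keywords: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Build a request body that selects a speech recognition service.

    Assumption: the key names below mirror the web form fields described
    in this post; verify them against the Auphonic API documentation.
    """
    speech_recognition: Dict[str, object] = {
        "uuid": service_uuid,   # the Speechmatics service connected in step 1
        "language": language,   # language of the audio, e.g. "en" or "de"
    }
    if keywords:
        speech_recognition["keywords"] = list(keywords)
    return {
        "metadata": {"title": title},
        "speech_recognition": speech_recognition,
    }


if __name__ == "__main__":
    payload = build_production_payload(
        title="Episode 42",
        service_uuid="YOUR-SPEECHMATICS-SERVICE-UUID",  # placeholder, not a real UUID
        language="en",
        keywords=["Auphonic", "Speechmatics"],
    )
    print(json.dumps(payload, indent=2))
```

Remember that each Auphonic service is tied to one model, so a script that switches between the Standard and the Enhanced Model would switch between two service UUIDs.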

3. Correct Results using the Auphonic Transcript Editor

Auphonic also includes a Transcript Editor directly in the HTML output file.
If you use Speechmatics or Amazon Transcribe, the editor displays word confidence values, so you can instantly see which sections should be checked manually.
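
The idea behind confidence-based review can be sketched in a few lines. The example below assumes a hypothetical word list in which each entry has a `word`, a `start` time in seconds, and a `confidence` between 0 and 1. This is not the format of any particular Auphonic output file, and the threshold of 0.6 is an arbitrary choice for illustration.

```python
from typing import Dict, List, Tuple, Union

Word = Dict[str, Union[str, float]]


def low_confidence_spans(
    words: List[Word], threshold: float = 0.6
) -> List[Tuple[float, str]]:
    """Group consecutive low-confidence words into spans to check manually.

    Returns a list of (start_time_in_seconds, text) tuples.
    """
    spans: List[Tuple[float, str]] = []
    current: List[Word] = []
    for word in words:
        if float(word["confidence"]) < threshold:
            current.append(word)
        elif current:
            # A confident word ends the current low-confidence span.
            spans.append(_to_span(current))
            current = []
    if current:
        spans.append(_to_span(current))
    return spans


def _to_span(words: List[Word]) -> Tuple[float, str]:
    """Combine a run of words into one (start, text) tuple."""
    return float(words[0]["start"]), " ".join(str(w["word"]) for w in words)


if __name__ == "__main__":
    # Hypothetical transcript data, for illustration only.
    transcript = [
        {"word": "welcome", "start": 0.0, "confidence": 0.98},
        {"word": "to", "start": 0.4, "confidence": 0.95},
        {"word": "off", "start": 0.6, "confidence": 0.41},
        {"word": "onyx", "start": 0.8, "confidence": 0.35},
        {"word": "podcast", "start": 1.3, "confidence": 0.92},
    ]
    for start, text in low_confidence_spans(transcript):
        print(f"check at {start:.1f}s: {text}")
```

Grouping neighbouring words keeps the review list short, since misrecognized names and technical terms often span several words in a row.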

Conclusion

Automatic Speech Recognition Services are evolving very quickly, and we've seen great improvements since our last comparisons in 2018 – especially in recognizing sloppy language, accents, and dialects.

With the new Enhanced Transcription Model by Speechmatics, we can now pass further improvements on to you at a very reasonable price (4h free per month), and we expect more improvements to follow soon.

Also, please let us know if you get different results comparing ASR services or if you compare services in other languages!
