Back in late 2016, we introduced Speech Recognition at Auphonic. This allows our users to create transcripts of their recordings, and more usefully, this means podcasts become searchable.
Now we integrated two more speech recognition engines: Amazon Transcribe and Speechmatics. Whilst integrating these services, we also took the opportunity to develop a complete new Transcription Editor:

Screenshot of our Transcript Editor with word confidence highlighting and the edit bar.
Try out the Transcript Editor Examples yourself!


The new Auphonic Transcript Editor is included directly in our HTML transcript output file, displays word confidence values to instantly see which sections should be checked manually, supports direct audio playback, HTML/PDF/WebVTT export and allows you to share the editor with someone else for further editing.

The new services, Amazon Transcribe and Speechmatics, offer transcription quality improvements compared to our other integrated speech recognition services.
They also return word confidence values, timestamps and some punctuation, which is exported to our output files.

The Auphonic Transcript Editor

With the integration of the two new services offering improved recognition quality and word timestamps alongside confidence scores, we realized that we could leverage these improvements to give our users easy-to-use transcription editing.
Therefore we developed a new, open source transcript editor, which is embedded directly in our HTML output file and has been designed to make checking and editing transcripts as easy as possible.

Main features of our transcript editor:
  • Edit the transcription directly in the HTML document.
  • Show/hide word confidence, to instantly see which sections should be checked manually (if you use Amazon Transcribe or Speechmatics as speech recognition engine).
  • Listen to audio playback of specific words directly in the HTML editor.
  • Share the transcript editor with others: as the editor is embedded directly in the HTML file (no external dependencies), you can just send the HTML file to some else to manually check the automatically generated transcription.
  • Export the edited transcript to HTML, PDF or WebVTT.
  • Completely useable on all mobile devices and desktop browsers.

Examples: Try Out the Transcript Editor

Here are two examples of the new transcript editor, taken from our speech recognition audio examples page:

1. Singletrack Transcript Editor Example
Singletrack speech recognition example from the first 10 minutes of Common Sense 309 by Dan Carlin. Speechmatics was used as speech recognition engine without any keywords or further manual editing.
2. Multitrack Transcript Editor Example
A multitrack automatic speech recognition transcript example from the first 20 minutes of TV Eye on Marvel - Luke Cage S1E1. Amazon Transcribe was used as speech recognition engine without any further manual editing.
As this is a multitrack production, the transcript includes exact speaker names as well (try to edit them!).

Transcript Editing

By clicking the Edit Transcript button, a dashed box appears around the text. This indicates that the text is now freely editable on this page. Your changes can be saved by using one of the export options (see below).
If you make a mistake whilst editing, you can simply use the undo/redo function of the browser to undo or redo your changes.


When working with multitrack productions, another helpful feature is the ability to change all speaker names at once throughout the whole transcript just by editing one speaker. Simply click on an instance of a speaker title and change it to the appropriate name, this name will then appear throughout the whole transcript.

Word Confidence Highlighting

Word confidence values are shown visually in the transcript editor, highlighted in shades of red (see screenshot above). The shade of red is dependent on the actual word confidence value: The darker the red, the lower the confidence value. This means you can instantly see which sections you should check/re-work manually to increase the accuracy.

Once you have edited the highlighted text, it will be set to white again, so it’s easy to see which sections still require editing.
Use the button Add/Remove Highlighting to disable/enable word confidence highlighting.

NOTE: Word confidence values are only available in Amazon Transcribe or Speechmatics, not if you use our other integrated speech recognition services!

Audio Playback

The button Activate/Stop Play-on-click allows you to hear the audio playback of the section you click on (by clicking directly on the word in the transcript editor).
This is helpful in allowing you to check the accuracy of certain words by being able to listen to them directly whilst editing, without having to go back and try to find that section within your audio file.

If you use an External Service in your production to export the resulting audio file, we will automatically use the exported file in the transcript editor.
Otherwise we will use the output file generated by Auphonic. Please note that this file is password protected for the current Auphonic user and will be deleted in 21 days.

If no audio file is available in the transcript editor, or cannot be played because of the password protection, you will see the button Add Audio File to add a new audio file for playback.

Export Formats, Save/Share Transcript Editor

Click on the button Export... to see all export and saving/sharing options:

Save/Share Editor
The Save Editor button stores the whole transcript editor with all its current changes into a new HTML file. Use this button to save your changes for further editing or if you want to share your transcript with someone else for manual corrections (as the editor is embedded directly in the HTML file without any external dependencies).
Export HTML / Export PDF / Export WebVTT
Use one of these buttons to export the edited transcript to HTML (for WordPress, Word, etc.), to PDF (via the browser print function) or to WebVTT (so that the edited transcript can be used as subtitles or imported in web audio players of the Podlove Publisher or Podigee).
Every export format is rendered directly in the browser, no server needed.

Amazon Transcribe

The first of the two new services, Amazon Transcribe, offers accurate transcriptions in English and Spanish at low costs, including keywords, word confidence, timestamps, and punctuation.

UPDATE 2019:
Amazon Transcribe offers more languages now - please see Amazon Transcribe Features!

Pricing
The free tier offers 60 minutes of free usage a month for 12 months. After that, it is billed monthly at a rate of $0.0004 per second ($1.44/h).
More information is available at Amazon Transcribe Pricing.
Custom Vocabulary (Keywords) Support
Custom Vocabulary (called Keywords in Auphonic) gives you the ability to expand and customize the speech recognition vocabulary, specific to your case (i.e. product names, domain-specific terminology, or names of individuals).
The same feature is also available in the Google Cloud Speech API.
Timestamps, Word Confidence, and Punctuation
Amazon Transcribe returns a timestamp and confidence value for each word so that you can easily locate the audio in the original recording by searching for the text.
It also adds some punctuation, which is combined with our own punctuation and formatting automatically.

The high-quality (especially in combination with keywords) and low costs of Amazon Transcribe make it attractive, despite only currently supporting two languages.
However, the processing time of Amazon Transcribe is much slower compared to all our other integrated services!

Try it yourself:
Connect your Auphonic account with Amazon Transcribe at our External Services Page.

Speechmatics

Speechmatics offers accurate transcriptions in many languages including word confidence values, timestamps, and punctuation.

Many Languages
Speechmatics’ clear advantage is the sheer number of languages it supports (all major European and some Asiatic languages).
It also has a Global English feature, which supports different English accents during transcription.
Timestamps, Word Confidence, and Punctuation
Like Amazon, Speechmatics creates timestamps, word confidence values, and punctuation.
Pricing
Speechmatics is the most expensive speech recognition service at Auphonic.
Pricing starts at £0.06 per minute of audio and can be purchased in blocks of £10 or £100. This equates to a starting rate of about $4.78/h. Reduced rate of £0.05 per minute ($3.98/h) are available if purchasing £1,000 blocks.
They offer significant discounts for users requiring higher volumes. At this further reduced price point it is a similar cost to the Google Speech API (or lower). If you process a lot of content, you should contact them directly at sales@speechmatics.com and say that you wish to use it with Auphonic.
More information is available at Speechmatics Pricing.

Speechmatics offers high-quality transcripts in many languages. But these features do come at a price, it is the most expensive speech recognition services at Auphonic.

Unfortunately, their existing Custom Dictionary (keywords) feature, which would further improve the results, is not available in the Speechmatics API yet.

Try it yourself:
Connect your Auphonic account with Speechmatics at our External Services Page.

What do you think?

Any feedback about the new speech recognition services, especially about the recognition quality in various languages, is highly appreciated.

We would also like to hear any comments you have on the transcript editor particularly - is there anything missing, or anything that could be implemented better?
Please let us know!



Recent entries