Audio examples

Speech2Text & Automatic Shownotes

Click inside the waveform to zoom and scroll through the audio. Each example is divided into multiple segments and annotated with details about the algorithms. We recommend listening with headphones so you can hear all the details!

Speech recognition example with automatic shownotes and chapters

Example 1 is the Lex Fridman Podcast #367 "Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI" (male, English) – link to the editable transcript and the autogenerated summaries, tags, and chapters: HTML Transcript Editor.
In addition to the speech recognition algorithm, this podcast is also processed by our Automatic Shownotes and Chapters Algorithm, which automatically summarizes and structures the content to create shownotes and chapters. All autogenerated data for this podcast can also be found below the audio player.

Try navigating within the audio file in the player, find the autogenerated chapters above the audio player, and search for Elon Musk, Trump, etc.

Example 1, English speech recognition with autogenerated shownotes and chapters:

Automatically generated chapters and summaries at multiple levels of detail:

Chapters:

0:00:00 Misunderstood and mocked org at the start
0:04:37 GPT-4 and the Continual Exponential Progress of AI
0:08:34 Human Feedback and Enormity of Pre-Training Data Set
0:10:06 Creation of GPT-4: Problem Solving and Pipeline
0:12:10 Measuring the Value of AI Models: Evals
0:14:08 GPT-4: From Facts to Wisdom and Reasoning Capabilities
0:17:50 Challenges in Generating Text with AI Models
0:20:49 Nuance and Individualism in Fascism
0:22:35 The Importance of AI Safety Considerations
0:25:00 Alignment and Capability are Close Vectors
0:26:54 Introducing GPT-4's "System Message" for Steerability
0:29:23 How GPT-4 and AI Language Models Will Change Programming
0:31:31 The Importance of AI Safety in OpenAI's Release
0:33:35 Aligning AI to Human Values and Preferences
0:35:42 Building a System with Customized Versions for Different Countries and Users
0:38:22 Adapting to the Egregiously Dumb Answers of GPT
0:39:57 Listening to Criticism and Handling Clickbait Headlines
0:42:03 Balancing User Control and System Guidance
0:47:00 Importance of Performance over Elegance in OpenAI
0:48:56 The Potential of GPT to Achieve Scientific Breakthroughs
0:52:05 Anxiety and Excitement of Programmers with GPT
0:54:04 Utopic Tech Bro: AI's Potential to Transform Our Lives
0:55:58 Limiting one-shot-to-get-it-right scenarios
0:58:06 Concerns about AI Takeoff Speed
1:00:01 Safe Quadrant for AGI Takeoff and Optimizing for Impact
1:02:32 The Definition of AGI Matters
1:04:30 What Would Conscious AI Look and Behave Like?
1:06:07 AI Model can Understand the Subjective Experience of Consciousness
1:07:54 Testing Emotion versus Facts
1:11:05 Prioritizing Safety under Pressure
1:12:38 OpenAI's Unusual Structure: Resisting Projects and Being Misunderstood
1:14:36 Non-Profit vs For-Profit: Pros and Cons
1:16:21 Collaboration to Minimize Scary Downsides
1:18:44 Distributing Power in OpenAI
1:22:25 Agreement and Disagreement with Elon Musk on AGI
1:23:59 Importance of Appreciating Hard Work
1:25:33 GPT Bias and Wokeness
1:27:05 Living in Bubbles and the Need to Connect with Users
1:31:36 Concerns of GPT's Increasing Intelligence
1:33:29 User-Centric Company: Talking to Users in Different Contexts
1:37:22 GPT Language Models and Replacing Jobs
1:40:11 WorldCoin and Universal Basic Income Study
1:42:41 Democratic Socialism and Resource Reallocation
1:44:26 The Hypothetical of Super Intelligent AGI and Centralized Planning
1:46:11 The Possibility of an Off Switch for AI Systems
1:47:55 Testing Dark Theories of the World
1:49:38 Epistemic Humility: The Terrifying Quest for Truth
1:51:42 Truth as a Collective Intelligence
1:54:06 Harmful Truths and Scientific Work
1:55:54 Preventing Hacking and Jailbreaking of GPT-4
1:59:08 Autonomy and Authority: The Key to High-Velocity Shipping
2:00:38 Approval of Every Single Hire at OpenAI
2:02:15 How Control Provisions Help to Develop AI Without Capitalist Imperative
2:03:46 Elon Musk: A Super Visionary
2:05:29 Satya Nadella's Thoughts on the Silicon Valley Bank Fiasco.
2:07:02 Full Deposit Guarantee to Avoid Doubt in Banks
2:09:11 The Upside of AGI and the Need to Deploy Early
2:11:20 Drawing Lines Between Tools and Creatures
2:13:20 The Importance of GPT-4's Conversational Style
2:15:36 Preparing for AGI
2:18:06 Advice for Young People and the Danger of Listening to Advice
2:19:36 Finding Happiness and Impact
2:22:20 Challenges and Progress of OpenAI
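
The chapter marks above follow a simple "H:MM:SS Title" layout, so they are easy to reuse outside the player. The following is a minimal sketch, assuming that plain-text layout; the names parse_chapters and chapter_at are hypothetical helpers written for this illustration and are not part of any Auphonic API.

```python
import re

# Matches "H:MM:SS Title" lines as they appear in the chapter list above.
# The layout is an assumption based on that list, not a documented format.
CHAPTER_RE = re.compile(r"^(\d+):([0-5]\d):([0-5]\d)\s+(.+)$")


def parse_chapters(text):
    """Return a list of (start_seconds, title) tuples from chapter lines."""
    chapters = []
    for line in text.splitlines():
        match = CHAPTER_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything that is not a chapter mark
        hours, minutes, seconds, title = match.groups()
        start = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        chapters.append((start, title))
    return chapters


def chapter_at(chapters, position_seconds):
    """Return the title of the chapter that contains a playback position."""
    current = None
    for start, title in chapters:
        if start > position_seconds:
            break
        current = title
    return current


sample = """\
0:00:00 Misunderstood and mocked org at the start
0:04:37 GPT-4 and the Continual Exponential Progress of AI
0:08:34 Human Feedback and Enormity of Pre-Training Data Set
"""

chapters = parse_chapters(sample)
print(chapters[1])               # (277, 'GPT-4 and the Continual Exponential Progress of AI')
print(chapter_at(chapters, 300))  # GPT-4 and the Continual Exponential Progress of AI
```

chapter_at assumes the chapters are sorted by start time, as they are in the list above.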

Long Summary:

The hosts of the podcast have a wide-ranging conversation covering multiple aspects of AI and its impact on society. Initially, they discuss the challenges faced by OpenAI in the beginning when AGI was not taken seriously, and the importance of having conversations about power, safety, and human alignment. They delve into the magic ingredient of reinforcement learning with human feedback that makes models like ChatGPT more useful. They discuss GPT-4 and how it is an early system that is slow and buggy, but points to something important in the evolution of AI. The hosts believe that building this technology in public is important as it allows for collaboration and feedback.

They touch upon the mystery of the compressed human knowledge, the difference between facts and wisdom, and whether GPT-4 may possess wisdom. They talk about the potential of AI, specifically GPT-10, and their excitement for it being a helpful tool in amplifying human abilities. The hosts also discuss the potential dangers of AGI taking over jobs, especially customer service jobs, and the implications of technological revolution on the dignity of work.

They discuss the potential pressure and biases that may arise in the development of AI technology. They also talk about the importance of feedback for improving their work, concerns over outside sources' influence, and the need for society to have input in the decision-making process. They explore the possibility of a super intelligent AGI being better than a liberal democratic system of multiple AGIs and the importance of humility in AI.

The hosts discuss the success and process of shipping AI-based products at OpenAI, the recent issue with Silicon Valley Bank, and the importance of deploying weak systems early. They also talk about their interest in GPT-4 powered pets and robots and speculate on the kind of interactions they would like to have with AGI. Quoting Alan Turing, they express excitement for what human civilization will accomplish in the future.

Brief Summary:

The hosts of the podcast have a conversation about AI and its impact on society. They discuss the challenges that OpenAI faced in the beginning and the importance of conversations around power, safety, and human alignment. They delve into the magic ingredient of reinforcement learning with human feedback that makes models like ChatGPT more useful. They also discuss the potential of GPT-4 and GPT-10 and the dangers of AGI taking over jobs.

The hosts talk about the potential pressures and biases that may arise in the development of AI technology, the importance of feedback, and society's need to have input in the decision-making process. They explore the possibility of a super intelligent AGI being better than a liberal democratic system of multiple AGIs and the importance of humility in AI. They also talk about their interest in GPT-4 powered pets and robots and speculate on the kind of interactions they would like to have with AGI.

Subtitle Summary:

Podcast hosts discuss OpenAI's challenges and AI's societal impact. They explore reinforcement learning, potential of GPT-4, and danger of AGI taking over jobs. They emphasize societal input and interest in GPT-4 powered pets and robots.

Tags:

podcast, AI, OpenAI, power, safety, human alignment, reinforcement learning, ChatGPT, GPT-4, GPT-10, AGI, jobs, biases, feedback, decision-making, super intelligence, liberal democratic system, humility, pets, robots, interactions

Multitrack speech recognition example

The second example is a multitrack automatic speech recognition transcript from the first 20 minutes of TV Eye on Marvel - Luke Cage S1E1 – link to the generated transcript with editor: HTML Transcript Editor.
As this is a multitrack production, the transcript and audio player include exact speaker names as well.
You can also see that the recognition quality drops if multiple speakers are active at the same time – for example at 01:04.

Example 2, English multitrack speech recognition:

German speech recognition example

As a reminder that our integrated services are not limited to English speech recognition, the third example is in German. All features demonstrated in the previous two examples also work in over 80 languages.
Here we use automatic speech recognition to transcribe radio news from Deutschlandfunk (Deutschlandfunk Nachrichten vom 11. Oktober 2016, 15:00) – link to the generated transcript with editor: HTML Transcript Editor.
Because official newsreaders speak clearly and in a very structured way, the recognition quality is also very high. Try searching for Merkel, Putin, etc.

Example 3, German speech recognition: