Speech2Text & Automatic Shownotes

Click inside the waveform to zoom and scroll through the audio. Each example is divided into multiple segments and annotated with details about the algorithms. We recommend listening with headphones so you can hear all the details!

Speech recognition example with automatic shownotes and chapters

Example 1 is the Lex Fridman Podcast #367 "Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI" (male, English) – link to the editable transcript and the autogenerated summaries, tags, and chapters: HTML Transcript Editor.
In addition to speech recognition, this podcast is also processed by our Automatic Shownotes and Chapters Algorithm, which automatically summarizes and structures the content to create shownotes and chapters. All autogenerated data for this podcast can also be found below the audio player.

Try navigating within the audio file in the player, find the autogenerated chapters above the audio player, and search for Elon Musk, Trump, etc.

Example 1, English speech recognition with autogenerated shownotes and chapters:

Automatically generated chapters and summaries at multiple levels of detail:

Chapters:

0:00:00 OpenAI's Early Days and Facing Mockery
0:04:36 GPT-4 as an Early AI System with Great Potential
0:08:11 The Science of Human Guidance and Incorporating Feedback
0:11:34 The Value and Utility of AI Models to Society
0:15:01 GPT-4's Reasoning Capability and Struggles
0:18:15 Discovering Strengths and Weaknesses through Public Development
0:21:20 Nuanced Hypotheses on COVID Virus Leak
0:25:00 RLHF Technique and Making GPT-4 Safer and More Aligned
0:28:02 Writing Great Prompts and Collaborating with GPT-4
0:31:30 The System Card and AI Safety Considerations
0:35:02 Ideal Scenario: Global Deliberation on AI Boundaries
0:38:55 Adapting to Egregiously Dumb Answers and Clickbait Journalism
0:42:03 Treating users like adults and exploring conspiracy theories
0:46:29 The role of size and parameters in AI systems
0:50:20 AI as an extension of human will and amplifying abilities
0:54:26 The Positive Trajectories and Potential Risks of AI
0:57:48 The Importance of Technical Alignment Work and New Tools
1:01:14 Contemplating GPT-4 as an AGI
1:02:48 Questioning the Future of AGI in Sci-Fi Books
1:03:26 Exploring the Nature of Consciousness
1:06:34 Consciousness as an Experience, Not an Emotion
1:10:45 The Need for Safety Controls in Open Source LLMs
1:14:35 Seeking a balance between non-profit and for-profit organizations
1:18:27 Deploying AGI to allow time for adaptation and regulation
1:18:43 Concerns about power and decision-making in AGI development
1:23:12 Elon Musk's early struggles and criticism in SpaceX
1:27:05 Craving Human Connection Outside the AI Bubble
1:30:22 Balancing External Pressures and the Role of OpenAI CEO
1:32:58 Disconnect from reality and impact of AGI
1:36:01 Nervousness increasing while using GPT language models
1:39:06 Moving towards better jobs and redefining work
1:48:26 Embracing the Dark Places, Finding Light in Humor
1:57:59 The Process of Shipping AI-based Products
2:05:09 Life Trajectory and Y Combinator's Influence
2:06:03 Incentive misalignment and response to bank crisis
2:09:07 Hope in economic shift and potential of AGI
2:13:19 Excitement about interactive GPT-4 powered pets and conversations
2:16:25 Confusion and Division in the Face of Technological Advancement

Long Summary:

In this discussion with Sam Altman, we explore the journey of OpenAI from facing skepticism to achieving remarkable advancements in AI, notably with the development of GPT-4. The conversation delves into the complexities, possibilities, and challenges in AI development, emphasizing the importance of ongoing dialogue surrounding ethics, alignment, and power dynamics in AI technology. We touch on the evolution of AI reasoning, biases in models, and the role of human feedback in enhancing usability and alignment, showcasing the nuanced responses and reasoning capabilities of models like GPT-4 that bridge the gap between human knowledge and AI intelligence.

The interviewee sheds light on the significance of AI safety considerations in the release of GPT-4, highlighting efforts to balance alignment and capability progress while emphasizing user steerability and safety measures like the RLHF approach. We delve into the ethical implications of AI advancements, the evolution from GPT-3 to GPT-4, and the importance of transparent and responsible AI development amid societal values and ethical considerations. The interviewee reflects on past misconceptions in media discussions about AI advancements, stressing the need for context and understanding in shaping perceptions about AI technology.

Our conversation extends to the intricacies of producing data like GPT-4, focusing on compressing humanity's text output and debating the role of large language models in achieving general intelligence. While considering critiques from figures like Noam Chomsky on the path to AGI, we emphasize the potential for AI to enhance societal well-being and human capabilities, contemplating the concept of AI consciousness, assessing it, and highlighting the necessity for continual research and improvement to ensure safe AI development in an evolving landscape.

We further explore the caution, transparency, and responsibilities involved in developing AGI to mitigate potential risks associated with super-intelligent systems. The dialogue dives into OpenAI's evolution, structural changes, interactions with key figures like Elon Musk, and navigating the complexities of power and responsibility in shaping the future of AI technology. Lastly, we reflect on the impact of AGI on society, user-centric company initiatives, uncertainties surrounding AI, shifts in economic and political systems with AI integration, and addressing challenges like truth, misinformation, censorship, hate speech, security, and ethical dilemmas posed by advanced AI models like GPT-4.

Brief Summary:

Join me in a conversation with Sam Altman as we explore OpenAI's journey to GPT-4 and discuss ethics, alignment, and power dynamics in AI development. We touch on AI reasoning, biases, and human feedback, emphasizing the importance of ongoing dialogue. Reflecting on AI safety, ethics, and responsible development, we highlight the societal impact of AGI and the challenges posed by advanced AI models like GPT-4.

Tags:

conversation, Sam Altman, OpenAI, GPT-4, ethics, alignment, power dynamics, AI reasoning, biases, AGI, societal impact

Multitrack speech recognition example

The second example is a multitrack automatic speech recognition transcript from the first 20 minutes of TV Eye on Marvel - Luke Cage S1E1 – link to the generated transcript with editor: HTML Transcript Editor.
As this is a multitrack production, the transcript and audio player include exact speaker names as well.
You can also see that the recognition quality drops if multiple speakers are active at the same time – for example at 01:04.

Example 2, English multitrack speech recognition:

German speech recognition example

As a reminder that our integrated services are not limited to English speech recognition, the third example is in German. All features demonstrated in the previous two examples also work in over 80 languages.
Here we use automatic speech recognition to transcribe radio news from Deutschlandfunk (Deutschlandfunk Nachrichten vom 11. Oktober 2016, 15:00) – link to the generated transcript with editor: HTML Transcript Editor.
Because professional newsreaders speak clearly and in a structured way, the recognition quality is very high: try searching for Merkel, Putin, etc.

Example 3, German speech recognition:
