Elevate Your Transcription Experience: ElevenLabs Unveils Innovative Speech-to-Text Model

Admin

ElevenLabs is an AI startup that recently secured $180 million in funding. Known for its advanced audio generation, the company has now introduced a new speech-to-text model named Scribe.

Valued at $3.3 billion, ElevenLabs has so far provided text-to-speech services to businesses, built on its wide range of voices. It is now entering speech recognition, where it will compete with Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI’s Whisper models.

Scribe supports over 99 languages at launch. According to the company, the model achieves a word error rate below 5% for more than 25 of them, including English (97% accuracy), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, and Vietnamese. The remaining languages are grouped into other accuracy categories, with varying levels of performance.
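For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a generic illustration of that definition, not ElevenLabs' benchmark code; the function name and the simple lowercase-and-split normalization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate(
        "the quick brown fox jumps over the lazy dog",
        "the quick brown fox jumped over the lazy dog",
    )
    print(f"WER: {wer:.1%}")
```

Under this definition, a WER below 5% means fewer than one word in twenty is substituted, dropped, or inserted. Reading "97% accuracy" as roughly 3% WER is a common shorthand, though the article does not say how the accuracy figure was computed.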

In benchmark tests, ElevenLabs claimed that Scribe surpassed Google Gemini 2.0 Flash and Whisper Large V3 across many languages.


ElevenLabs had previously built a speech-to-text component into its conversational AI platform, but Scribe is its first standalone speech-to-text model. In an interview with TechCrunch, CEO Mati Staniszewski emphasized the company’s goal of improving speech detection.

“We want to better understand conversations,” Staniszewski explained. “Many believe speech-to-text is a solved issue, but the reality is that many languages struggle. We can develop better models thanks to our in-house teams who can quickly review and annotate our data.”

Scribe includes speaker diarization, which identifies who is talking; timestamps for precise subtitles; and automatic tagging of non-speech sounds, such as audience laughter. Together, these let customers transcribe video content and add subtitles easily.
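To show how diarization labels and timestamps translate into subtitles, here is a minimal sketch that renders transcript segments in the widely used SubRip (.srt) format. The `Segment` structure, its field names, and the speaker labels are hypothetical and chosen for illustration; they are not Scribe's actual response format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; not Scribe's real output schema."""
    speaker: str   # diarization label, e.g. "speaker_0"
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # transcribed words, possibly with tagged sounds


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SubRip files."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SubRip cues with a speaker prefix."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"[{seg.speaker}] {seg.text}"
        )
    return "\n\n".join(cues) + "\n"


if __name__ == "__main__":
    demo = [
        Segment("speaker_0", 0.0, 2.4, "Welcome back to the show."),
        Segment("speaker_1", 2.4, 4.1, "Thanks for having me. (laughter)"),
    ]
    print(to_srt(demo))
```

Prefixing each cue with the speaker label is one design choice among several; a subtitle workflow could just as well drop the label or map it to a display name.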

Currently, Scribe works only with pre-recorded audio. A real-time version is on the way, which will enable meeting transcriptions and voice note-taking.

ElevenLabs is pricing Scribe at $0.40 per hour of audio transcribed. This rate is competitive, although some competitors offer lower prices with varying features.
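As a rough guide to what that rate means in practice, the sketch below estimates cost from audio duration. It assumes simple linear billing at $0.40 per hour with no minimum charge, rounding rules, or volume discounts, none of which the article specifies; the function and constant names are illustrative.

```python
from decimal import Decimal

# Rate quoted in the article; how usage is actually metered is an assumption.
PRICE_PER_HOUR = Decimal("0.40")


def estimate_cost(audio_seconds: int) -> Decimal:
    """Estimate transcription cost in dollars, rounded to the nearest cent."""
    hours = Decimal(audio_seconds) / Decimal(3600)
    return (hours * PRICE_PER_HOUR).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for label, seconds in [("90-minute podcast", 90 * 60),
                           ("250-hour archive", 250 * 3600)]:
        print(f"{label}: ${estimate_cost(seconds)}")
```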
