In late March 2024, OpenAI introduced a new AI tool called Voice Engine, which can mimic a person’s voice from just 15 seconds of their speech. Almost a year later, it’s still in a testing phase, and there’s no word on when, or if, it will be officially launched.
OpenAI seems cautious about fully releasing this tool. They might be worried about how it could be misused, or they could be looking to avoid regulatory issues. The company has faced criticism in the past for prioritizing flashy products over safety, often rushing to compete with others in the industry.
According to an OpenAI representative, the company is testing Voice Engine with a select group of “trusted partners.”
“We’re learning from how our partners are using the technology to improve its usefulness and safety,” the spokesperson shared. The tool has shown promise in areas like speech therapy, language learning, customer support, video games, and AI avatars.
Pushed back
Voice Engine powers the voices in OpenAI’s text-to-speech API and ChatGPT’s Voice Mode, generating natural-sounding speech from text, subject to some limits on content. Its release has faced delays from the beginning.
In a blog post from June 2024, OpenAI explained how Voice Engine works. The model predicts the sounds a speaker is likely to make for a given text transcript, taking into account different voices, accents, and speaking styles. This lets it generate not just speech from text, but also different “spoken utterances” that reflect how different speakers would read the same text.
OpenAI originally planned to make Voice Engine, then called Custom Voices, available on March 7, 2024. The plan was to give access first to around 100 selected developers, especially those focused on beneficial and innovative applications. Pricing had even been set: $15 per million characters for standard voices and $30 per million characters for high-definition voices.
However, at the last minute, OpenAI delayed the announcement. A few weeks later, they revealed Voice Engine, but there was no option to sign up. Only a small group of about 10 developers, who began working with OpenAI in late 2023, were granted access.
In their announcement, OpenAI mentioned wanting to discuss the responsible use of synthetic voices and how society can adapt to these advancements. They stated that, based on these discussions and testing outcomes, they would decide on a wider launch.
Long in the works
Voice Engine has been under development since 2022. OpenAI claimed they showcased the tool to high-level global policymakers in summer 2023, demonstrating both its potential and risks.
Several partners currently use Voice Engine, including Livox, a startup focused on helping people with disabilities communicate more easily. CEO Carlos Pereira noted that while they couldn’t integrate Voice Engine into a product due to its online requirement (many users lack internet access), the technology is impressive.
“The voice quality and ability to speak in various languages is unique,” Pereira said. “It’s one of the best tools for creating voices that I’ve encountered. We hope OpenAI creates an offline version soon.”
So far, Livox hasn’t been charged for using the tool, and Pereira hasn’t received any updates on when Voice Engine will be released more broadly.
In that June 2024 blog post, OpenAI hinted that it delayed Voice Engine partly due to concerns about potential misuse during the 2024 U.S. election. The model has safety features, including watermarking of generated audio so that its origin can be traced.
Developers must get explicit consent from original speakers before using Voice Engine, and they need to inform audiences that the voices are generated by AI. However, OpenAI hasn’t detailed how it enforces these rules, and monitoring compliance on a large scale could be quite complex.
OpenAI also suggested plans for a voice authentication system to verify speakers and a list to block the creation of voices too similar to famous individuals, but these projects are challenging. If mishandled, they could harm OpenAI’s reputation, especially given past criticisms about safety.
As voice cloning technology becomes more common, effective filtering and identity verification are essential. Voice cloning was among the fastest-growing scams in 2024. It has enabled various types of fraud and has been used to bypass security checks, while privacy and copyright protections have struggled to keep up. Malicious actors have also used voice cloning to create deepfakes of celebrities and politicians, which have been widely shared on social media.
OpenAI may release Voice Engine soon or may choose to keep it limited. Their ongoing cautious approach to this tool has resulted in one of the longest previews in the company’s history.