In recent years, the listening time required by a piece of AI to clone someone’s voice has been getting shorter and shorter.
It used to be minutes; now it’s just seconds.
OpenAI, the Microsoft-backed company behind the viral generative AI chatbot ChatGPT, recently revealed that its own voice-cloning technology requires just 15 seconds of audio material to reproduce someone’s voice.
In a post on its website, OpenAI shared a small-scale preview of a model called Voice Engine, which it’s been developing since late 2022.
To use Voice Engine, you feed it a minimum of 15 seconds of spoken material. You can then input text to create what OpenAI describes as “emotive and realistic” speech that “closely resembles the original speaker.”
OpenAI insists it is taking a “cautious and informed approach to a broader release due to the potential for synthetic voice misuse,” adding that it wants to “start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities.”
It added: “Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”
One of the misuses that OpenAI refers to is a scam that some criminals are already carrying out using similar technology that’s been publicly available for some time. It involves cloning a voice and then calling a friend or relative of that person to trick them into handing over cash via a bank transfer. There are also fears about how such technology might be used in the upcoming presidential election, an issue highlighted by a recent high-profile incident in which a robocall using a clone of President Joe Biden’s voice told people not to vote in January’s New Hampshire primary.
Another concern is how the rapidly improving technology will affect the livelihoods of voice actors, who fear that they’ll increasingly be asked to sign over the rights to their voice so that AI can be used to create a synthetic version. Compensation for such a contract is likely to be much lower than if the actor were asked to perform the job in person.
Looking at more positive deployments of the technology, OpenAI suggests that it could provide reading assistance to non-readers and children through natural-sounding, emotive voices “representing a wider range of speakers than what’s possible with preset voices.” It could also enable instant translation of videos and podcasts, something that Spotify is already trialing.
It could also be used to help patients who are gradually losing their voice through illness to continue communicating using what sounds like their own voice.
OpenAI has some examples of the AI-generated audio and the reference audio on its website, and we’re sure you’ll agree that they’re pretty extraordinary.
Not so many moons ago, Trevor moved from one tea-loving island nation that drives on the left (Britain) to another (Japan)…
The post OpenAI needs 15 seconds of audio for its AI to clone a voice | Digital Trends first appeared on www.digitaltrends.com