Films entirely created by artificial intelligence? After DALL-E and ChatGPT, Microsoft is developing an AI dedicated to sound

This new AI can imitate any voice, including its emotion and pitch, after listening to just three seconds of it.

Why is this important?

Led by ChatGPT, which could prove as revolutionary as the iPhone, artificial intelligence keeps spreading. This time, Microsoft is presenting its own AI dedicated to sound: VALL-E.

News: Microsoft’s love affair with AI continues. While ChatGPT could land in the Office suite, Microsoft is presenting its new AI: VALL-E.

The detail: On the VALL-E site, we learn that this AI was made possible thanks to “60,000 hours of speech in English, which is hundreds of times more than existing systems”.

  • This artificial intelligence uses a new method. Unlike other text-to-speech methods, which typically synthesize speech by manipulating waveforms, VALL-E generates discrete audio codec codes from text and acoustic samples. It analyzes how a person speaks and, using EnCodec, breaks that information down into discrete components called “tokens”. It then uses its training data to match what it “knows” about how that voice would sound if it spoke phrases other than those in the three-second sample.
  • Still according to the site, this new technology could be applied directly to various speech synthesis applications, such as TTS (text-to-speech, i.e. having a computer read text aloud, editor’s note), voice editing and content creation, in combination with other generative AI models like GPT-3.
  • Part of the dataset, as well as VALL-E’s renditions, are available on the site.
Overview of how VALL-E works – Source: VALL-E
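
To make the idea of “tokens” more concrete, here is a minimal Python sketch of how a stretch of audio can be turned into a sequence of discrete codes. It is a toy illustration only, not EnCodec and not VALL-E's code: the codebook, the frame length of two samples and the function names are all assumptions made for the example, whereas a real neural codec learns its codebooks from data.

```python
# Toy vector quantization: NOT EnCodec and NOT VALL-E's actual method.
# The codebook below is hand-written for illustration (hypothetical values);
# a real neural audio codec learns its codebooks during training.
CODEBOOK = [
    (-0.8, -0.8),
    (-0.3, -0.3),
    (0.0, 0.0),
    (0.3, 0.3),
    (0.8, 0.8),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tokenize(samples, codebook=CODEBOOK):
    """Cut the samples into frames and replace each frame with the index
    of the nearest codebook entry (its discrete "token")."""
    frame_len = len(codebook[0])
    tokens = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        distances = [squared_distance(frame, entry) for entry in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


def detokenize(tokens, codebook=CODEBOOK):
    """Rebuild an approximate waveform by looking each token up again."""
    samples = []
    for token in tokens:
        samples.extend(codebook[token])
    return samples


# Eight made-up audio samples, i.e. four frames of two samples each.
waveform = [0.75, 0.9, 0.05, -0.02, -0.35, -0.25, -0.7, -0.9]
tokens = tokenize(waveform)
approximation = detokenize(tokens)
print(tokens)
print(approximation)
```

Once audio is reduced to a sequence of integers like this, generating speech can be treated much like generating text, one token after another, which is the general idea behind the approach described above.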

The challenge: AI is becoming accessible in more and more areas.

  • For anything text-related, which can obviously be combined with sound, ChatGPT is the current reference.
  • For images, DALL-E can already generate them from text.
  • Put the three together and we could imagine a film, or any audiovisual content, made entirely by AI. Fortunately, some applications have already been created (even if they still need to be optimized, editor’s note) to distinguish human work from that of AI in general, and of ChatGPT in particular.
