Nvidia Unveils New AI Model Fugatto That Generates Audio from Text and Audio

Nvidia has unveiled Fugatto (Foundational Generative Audio Transformer Opus 1), a new generative AI model that generates or transforms any mix of music, voices and sounds described with prompts, using any combination of text and audio files as inputs. “While some AI models can compose a song or modify a voice, none have the dexterity of the new offering,” Nvidia said in a blog post on Monday.

What Can Fugatto AI Model Do?

Nvidia describes the model as a “Swiss Army knife for sound,” one that allows users to control the audio output using only text. Fugatto can create a music snippet from a text prompt, add instruments to or remove them from an existing song, change the accent or emotion in a voice, and even let people produce sounds never heard before, the company explained.

“We wanted to create a model that understands and generates sound like humans do,” said Rafael Valle, a manager of applied audio research at Nvidia.

Key Features of Fugatto

Fugatto supports numerous audio generation and transformation tasks. According to Nvidia, it is the first foundational generative AI model to showcase emergent properties (capabilities that arise from the interaction of its various trained abilities) and the ability to combine free-form instructions.

“Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale,” Valle added.