Nvidia Unveils New AI Model Fugatto That Generates Audio from Text and Audio

Nvidia said Fugatto is a foundational generative transformer model that builds on prior work in areas such as speech modeling, audio vocoding and audio understanding.

Highlights

  • Fugatto lets users control sound output with just text input.
  • Music producers, ad agencies, language tools, and game developers can all benefit from its capabilities.
  • Fugatto was trained on Nvidia's H100 GPUs and built by a global team of researchers.

Nvidia has unveiled a new generative AI model called Fugatto (Foundational Generative Audio Transformer Opus 1). It generates or transforms any mix of music, voices and sounds described in prompts, using any combination of text and audio files as input. "While some AI models can compose a song or modify a voice, none have the dexterity of the new offering," Nvidia said in a blog post on Monday.

What Can Fugatto AI Model Do?

Nvidia describes the model as a "Swiss Army knife for sound," one that lets users control the audio output using only text. Fugatto can create a music snippet from a text prompt, add or remove instruments in an existing song, change the accent or emotion in a voice and even let people produce sounds never heard before, the company explained.

"We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, a manager of applied audio research at Nvidia.

Key Features of Fugatto

Supporting numerous audio generation and transformation tasks, Fugatto is the first foundational generative AI model that showcases emergent properties — capabilities that arise from the interaction of its various trained abilities — and the ability to combine free-form instructions, Nvidia said.

"Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale," Valle added.

Potential Use Cases for Fugatto AI

According to Nvidia, music producers could use Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices and instruments. They could also add effects and enhance the overall audio quality of an existing track.

An ad agency could apply Fugatto to quickly target an existing campaign for multiple regions or situations, applying different accents and emotions to voiceovers.

Additionally, Nvidia says language learning tools could be personalised to use any voice a speaker chooses. Imagine an online course spoken in the voice of any family member or friend.

Video game developers could use the AI model to modify prerecorded assets in their title to fit the changing action as users play the game. Or, they could create new assets easily from text instructions and optional audio inputs.

The Technology Behind Fugatto

Nvidia said Fugatto is a foundational generative transformer model that builds on prior work in areas such as speech modeling, audio vocoding and audio understanding. Fugatto was built by a diverse team of researchers from around the world, including India, Brazil, China, Jordan and South Korea. "Their collaboration made Fugatto's multi-accent and multilingual capabilities stronger," said the company.

The full version of Fugatto has 2.5 billion parameters and was trained on a bank of Nvidia DGX systems equipped with 32 Nvidia H100 Tensor Core GPUs.
