Nvidia Unveils New AI Model Fugatto That Generates Audio from Text and Audio

Nvidia said Fugatto is a foundational generative transformer model that builds on prior work in areas such as speech modeling, audio vocoding and audio understanding.

Highlights

  • It allows users to manipulate sound output with just text input.
  • Music producers, ad agencies, language tools, and game developers can all benefit from its capabilities.
  • Fugatto was trained on Nvidia H100 GPUs and developed by a global team of researchers.

Nvidia has unveiled Fugatto (Foundational Generative Audio Transformer Opus 1), a new generative AI model that generates or transforms any mix of music, voices and sounds described with prompts, using any combination of text and audio files as input. "While some AI models can compose a song or modify a voice, none have the dexterity of the new offering," said Nvidia in a blog post on Monday.

What Can Fugatto AI Model Do?

Nvidia describes this model as a "Swiss Army knife for sound," one that allows users to control the audio output simply using text. Fugatto can create a music snippet based on a text prompt, remove or add instruments from an existing song, change the accent or emotion in a voice and even let people produce sounds never heard before, the company explained.

"We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, a manager of applied audio research at Nvidia.

Key Features of Fugatto

Fugatto supports numerous audio generation and transformation tasks. Nvidia said it is the first foundational generative AI model to showcase emergent properties (capabilities that arise from the interaction of its various trained abilities) and the ability to combine free-form instructions.

"Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale," Valle added.

Potential Use Cases for Fugatto AI

According to Nvidia, music producers could use Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices and instruments. They could also add effects and enhance the overall audio quality of an existing track.

An ad agency could apply Fugatto to quickly target an existing campaign for multiple regions or situations, applying different accents and emotions to voiceovers.

Additionally, Nvidia says language learning tools could be personalised to use any voice a speaker chooses. Imagine an online course spoken in the voice of any family member or friend.

Video game developers could use the AI model to modify prerecorded assets in their title to fit the changing action as users play the game. Or, they could create new assets easily from text instructions and optional audio inputs.

The Technology Behind Fugatto

Nvidia said Fugatto is a foundational generative transformer model that builds on prior work in areas such as speech modeling, audio vocoding and audio understanding. Fugatto was made by a diverse group of people from around the world, including India, Brazil, China, Jordan and South Korea. "Their collaboration made Fugatto's multi-accent and multilingual capabilities stronger," said the company.

The full version of Fugatto uses 2.5 billion parameters and was trained on a bank of Nvidia DGX systems equipped with 32 Nvidia H100 Tensor Core GPUs.

Reported By

Kirpa B is passionate about the latest advancements in Artificial Intelligence technologies and has a keen interest in telecom. In her free time, she enjoys gardening or diving into insightful articles on AI.
