Google's Revolutionary AI MusicLM Creates Music From Text
Google's Latest AI Breakthrough Could Revolutionize the Way We Make Music
The Gist
Google unveiled a new generative AI model called MusicLM.
MusicLM creates 24 kHz musical audio from text descriptions or hummed melodies.
MusicLM can generate music lasting several minutes and re-create specific instruments, musical genres, time periods and more.
Google's researchers are looking ahead to future improvements such as lyrics generation, improved text conditioning and vocal quality, and modeling of high-level song structure.
The creators of MusicLM acknowledge potential impacts such as copyright issues and cultural appropriation, and are withholding the code, with no plans to release the models at this point.
More Detail
Google researchers have unveiled a new generative AI model named MusicLM that generates 24 kHz musical audio from text descriptions and hummed melodies. MusicLM was trained on a vast dataset of unlabeled music, and the researchers also released MusicCaps, a public dataset of 5,521 music-text pairs. The system works in two stages: the first maps the conditioning input, a text caption or piece of audio, to coarse semantic tokens that capture the music's long-term structure, and the second generates fine-grained acoustic tokens from those semantic tokens, which are then decoded into the output waveform.
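To make the two-stage idea concrete, here is a minimal, purely illustrative Python sketch of that pipeline. The function names, token vocabularies, and sizes are assumptions chosen for readability; the stub functions stand in for the real neural components (a joint music/text encoder, two token-generation stages, and a neural audio codec) and are not Google's actual implementation.

```python
# Hypothetical sketch of a MusicLM-style two-stage pipeline.
# All names, vocabularies, and shapes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

SEMANTIC_VOCAB = 1024   # assumed size of the coarse semantic-token codebook
ACOUSTIC_VOCAB = 4096   # assumed size of the fine acoustic-token codebook
SAMPLE_RATE = 24_000    # MusicLM outputs 24 kHz audio

def embed_text(prompt: str) -> np.ndarray:
    """Stand-in for a joint music/text embedding model."""
    # Hash the prompt into a fixed-size pseudo-embedding for the sketch.
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).standard_normal(128)

def semantic_stage(text_embedding: np.ndarray, n_tokens: int) -> np.ndarray:
    """Stage 1: predict coarse semantic tokens (long-term structure) from the conditioning."""
    # A real model would be an autoregressive Transformer; here we just sample tokens.
    return rng.integers(0, SEMANTIC_VOCAB, size=n_tokens)

def acoustic_stage(semantic_tokens: np.ndarray, tokens_per_semantic: int = 4) -> np.ndarray:
    """Stage 2: predict fine-grained acoustic tokens conditioned on the semantic tokens."""
    return rng.integers(0, ACOUSTIC_VOCAB, size=len(semantic_tokens) * tokens_per_semantic)

def decode_audio(acoustic_tokens: np.ndarray, samples_per_token: int = 480) -> np.ndarray:
    """Stand-in for a neural codec decoder that turns acoustic tokens into a waveform."""
    return rng.standard_normal(len(acoustic_tokens) * samples_per_token).astype(np.float32)

if __name__ == "__main__":
    prompt = "a calming violin melody backed by a distorted guitar riff"
    text_emb = embed_text(prompt)
    semantic = semantic_stage(text_emb, n_tokens=50)
    acoustic = acoustic_stage(semantic)
    waveform = decode_audio(acoustic)
    print(f"Generated {len(waveform) / SAMPLE_RATE:.1f} s of {SAMPLE_RATE} Hz audio "
          f"from {len(semantic)} semantic and {len(acoustic)} acoustic tokens")
```

The point of the split is that the coarse semantic stage can plan melody and structure over long time spans cheaply, while the acoustic stage fills in timbre and detail, which is how the system can keep a piece coherent over several minutes.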
MusicLM outperforms previous AI music generators in audio quality and accuracy, as shown on Google's demo page. The examples showcase MusicLM's capabilities in creating audio from rich captions, long generation, story mode, text and melody conditioning, and matching the mood of image captions. Google also highlights MusicLM's ability to re-create specific instruments, genres, musician experience levels, places and time periods.
However, the creators of MusicLM acknowledge potential impacts such as copyright issues and cultural appropriation, and emphasize the need for further work to address these risks. Currently, Google has no plans to release MusicLM's models.
The researchers are already looking ahead to future improvements, including lyrics generation, text conditioning, vocal quality and high-level song structure. If those improvements arrive, anyone could create studio-quality music just by describing it, and it remains to be seen how that will impact the music industry.