Oct 8, 2025
Did you know you can wish for a song that doesn’t exist, and it will materialize out of thin air? Google researchers have developed an AI called MusicLM that takes text prompts as input and produces minutes-long musical pieces tailored to the specifications in those prompts. This is quite similar to how DALL-E generates images from text prompts.
MusicLM resembles other Large Language Models (LLMs) in that it relies on deep learning and natural language processing. To generate music, it learns hidden representations from a training dataset of music-text pairs provided by human experts. That’s not all: besides writing a musical score, it can recommend new chords for existing music or create a brand-new instrumental sound.
If, like me, you are wondering whether you need to convert your music to musical notation before MusicLM can work with it, you don’t. Simply feed raw audio into the model; the audio is converted into a series of discrete tokens for analysis, and those tokens are then used to produce new audio sequences. MusicLM is built on top of AudioLM, another Google project, which uses two tokenizers to extract information from the musical sequence.
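To make the idea of turning raw audio into discrete tokens a little more concrete, here is a minimal sketch. It is not how AudioLM's tokenizers work internally; it assumes a toy scheme in which the waveform is cut into fixed-size frames and each frame is replaced by the index of its nearest vector in a small codebook. The names `frame_signal`, `nearest_code`, `tokenize`, and the `CODEBOOK` values are all hypothetical and chosen purely for illustration.

```python
# Toy illustration only: real audio tokenizers use learned neural codecs,
# not a hand-written three-entry codebook.

def frame_signal(samples, frame_size):
    """Split samples into non-overlapping frames, dropping any leftover tail."""
    n_frames = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(n_frames)]


def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to the frame
    (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))


def tokenize(samples, codebook):
    """Map a waveform (list of floats) to a list of integer token ids."""
    frame_size = len(codebook[0])
    return [nearest_code(f, codebook) for f in frame_signal(samples, frame_size)]


# Hypothetical codebook: three 2-sample "prototype" frames.
CODEBOOK = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]

samples = [0.1, -0.1, 0.9, 1.1, -0.8, -1.2, 0.0]
tokens = tokenize(samples, CODEBOOK)
print(tokens)  # [0, 1, 2]
```

The point of the sketch is only that a continuous signal becomes a short sequence of integers, which is the kind of input a language-model-style network can work with.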
Once the audio signal is tokenized, MuLan computes a joint embedding of music and text, trained with a technique called contrastive loss. The joint embedding output is then passed on to three further stages.
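For readers unfamiliar with contrastive loss, the following is a minimal sketch of one common form, a symmetric softmax (InfoNCE-style) loss over a batch of paired embeddings. Whether MuLan uses exactly this formulation is an assumption on my part; the function names, the `temperature` value, and the two-dimensional example vectors are hypothetical. The idea is that each audio embedding should be more similar to its own text embedding than to any other text in the batch, and vice versa.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target index (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[target]


def contrastive_loss(audio_embs, text_embs, temperature=0.1):
    """Symmetric contrastive loss; pair i of audio and text is the positive match."""
    a = [l2_normalize(v) for v in audio_embs]
    t = [l2_normalize(v) for v in text_embs]
    n = len(a)
    # Cosine similarity of every audio clip with every text, scaled by temperature.
    logits = [[dot(a[i], t[j]) / temperature for j in range(n)] for i in range(n)]
    audio_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    text_to_audio = sum(
        cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (audio_to_text + text_to_audio) / 2


# Hypothetical embeddings: matched pairs should give a much lower loss.
audio = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[1.0, 0.0], [0.0, 1.0]]
swapped_text = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = contrastive_loss(audio, aligned_text)
loss_swapped = contrastive_loss(audio, swapped_text)
print(loss_aligned < loss_swapped)  # True
```

Minimizing a loss of this kind pulls matching music and text close together in the shared embedding space, which is what lets a text prompt stand in for a description of audio at generation time.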
MusicLM is quite flexible in its behaviour. It can act on paragraph-long descriptions that refer to a vibe, a genre, or specific instruments to include, or even on short phrases like “melodic metal”. There is also a story mode, in which the model morphs between a sequence of prompts.
One such sequence of prompts resulted in generated audio that you can listen to here. Or check out the dulcet tones of Blob Opera, built using MusicLM, here.
Google has been cautious with the model and has not released it to the general public, since similar generative AI technologies such as Stable Diffusion and Midjourney have been accused of “misappropriation of creative content”, that is, of violating copyright law by scraping artists’ work without their consent. The AI programming tool Copilot, developed jointly by Microsoft, GitHub, and OpenAI, is also being sued in a similar case. As the rise of generative AI accelerates, staying alert to pitfalls such as data privacy issues is always the right thing to do.