We propose SongBloom, a novel framework for full-length song generation that leverages an interleaved paradigm of autoregressive sketching and diffusion-based refinement. SongBloom employs an autoregressive diffusion model that combines the high fidelity of diffusion models with the scalability of language models. Specifically, it gradually extends a musical sketch from short to long and refines the details from coarse to fine-grained. The interleaved generation paradigm effectively integrates prior semantic and acoustic context to guide the generation process. Experimental results demonstrate that SongBloom outperforms existing methods across both subjective and objective metrics and achieves performance comparable to state-of-the-art commercial music generation platforms.
Demo page: https://cypress-yang.github.io/SongBloom_demo
ArXiv: https://arxiv.org/abs/2506.07634
The input is a .jsonl file, where each line is a JSON object. An example can be found at example/test.jsonl.
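For reference, the snippet below is a minimal sketch of how such a file could be assembled with the standard library. The field names (`idx`, `lyrics`, `prompt_wav`) and paths are illustrative assumptions; treat example/test.jsonl as the authoritative schema.

```python
import json

# Hypothetical fields -- check example/test.jsonl for the actual schema.
entries = [
    {
        "idx": "my_song_001",                   # identifier for the generated output
        "lyrics": "...",                        # lyrics following docs/lyric_format.md
        "prompt_wav": "prompts/style_ref.wav",  # path to the 10-second, 48kHz prompt clip
    },
]

# Write one JSON object per line.
with open("my_test.jsonl", "w", encoding="utf-8") as f:
    for entry in entries:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```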
The prompt wav should be a 10-second, 48kHz audio clip.
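If a reference clip does not already meet this requirement, it can be resampled and trimmed beforehand. The sketch below uses torchaudio as an assumed tool (any resampling utility works) and illustrative file paths.

```python
import torch
import torchaudio

TARGET_SR = 48_000            # 48 kHz, as required for the prompt
TARGET_LEN = 10 * TARGET_SR   # exactly 10 seconds of samples

# Load an arbitrary reference clip (path is illustrative).
wav, sr = torchaudio.load("my_reference.wav")

# Resample to 48 kHz if needed.
if sr != TARGET_SR:
    wav = torchaudio.functional.resample(wav, sr, TARGET_SR)

# Trim or zero-pad to exactly 10 seconds.
if wav.shape[-1] >= TARGET_LEN:
    wav = wav[..., :TARGET_LEN]
else:
    wav = torch.nn.functional.pad(wav, (0, TARGET_LEN - wav.shape[-1]))

torchaudio.save("prompts/style_ref.wav", wav, TARGET_SR)
```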
Details about the lyric format can be found in docs/lyric_format.md.
- Support Text Description
SongBloom (code and weights) is released under the Apache License 2.0.

