Generative vocal cleanup that runs 10.5x faster than real-time on iPhone 12 CPU


A 10.4M-parameter generative audio model for restoring degraded vocals in any situation. It runs 10.5x faster than real-time on an iPhone 12 CPU, outperforms all open-source models in subjective quality, and matches commercial models on singing voice restoration.
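To make the headline numbers concrete, here is a small arithmetic sketch. It assumes "10.5x faster than real-time" means the ratio of audio duration to processing time, and it assumes the weights are stored as 32-bit floats; neither is stated in the text above, and the function names are illustrative only.

```python
def processing_seconds(audio_seconds: float, speedup: float = 10.5) -> float:
    """Estimated wall-clock time to process a clip.

    Assumption: speedup = audio duration / processing time.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def weights_megabytes(num_params: float = 10.4e6, bytes_per_param: int = 4) -> float:
    """Approximate size of the weights in MB (10^6 bytes).

    Assumption: float32 storage (4 bytes per parameter).
    """
    return num_params * bytes_per_param / 1e6


# A 3-minute vocal take at the quoted 10.5x speedup.
print(f"{processing_seconds(180.0):.1f} s to process 180 s of audio")
print(f"{weights_megabytes():.1f} MB of weights at float32")
```

Under these assumptions, a 3-minute take would need roughly 17 seconds of CPU time, and the weights would occupy about 41.6 MB.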

Technical Report: Technical Report

HuggingFace Model: Hugging Face Model

Extreme Degradation Bench: Hugging Face Model


# Create a virtual environment
uv venv cleanup --python=3.10
source cleanup/bin/activate
uv pip install -r requirements.txt

Download the model checkpoint from Hugging Face and place it in the repository root directory.

wget https://huggingface.co/smulelabs/Smule-Renaissance-Small/resolve/main/smule-renaissance-small.pt
python main.py {path-to-input} -o {path-to-output} -c {path-to-checkpoint}
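For more than one file, the command above can be wrapped in a small batch helper. This is a minimal sketch, not part of the project: the helper names, the `*.wav` pattern, and the choice to reuse each input filename in the output directory are assumptions. Only the `main.py` positional input and the `-o` and `-c` flags come from the command shown above.

```python
import subprocess
from pathlib import Path


def build_command(input_path, output_path, checkpoint):
    """Build the documented invocation: python main.py <input> -o <output> -c <checkpoint>."""
    return ["python", "main.py", str(input_path),
            "-o", str(output_path), "-c", str(checkpoint)]


def plan_batch(input_dir, output_dir, checkpoint, pattern="*.wav"):
    """Return one command per matching file, in sorted order.

    Assumption: inputs are .wav files and each output keeps the input's name.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    return [build_command(src, output_dir / src.name, checkpoint)
            for src in sorted(input_dir.glob(pattern))]


def run_batch(commands):
    """Run each planned command, stopping at the first failure."""
    for cmd in commands:
        # cmd[4] is the output path; make sure its directory exists.
        Path(cmd[4]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

A typical use would be `run_batch(plan_batch("raw_vocals", "restored", "smule-renaissance-small.pt"))`, run from the repository root inside the activated environment. The directory names here are placeholders. Calling `plan_batch` alone gives a dry run that lists the commands without executing them.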