Stable Diffusion 3.5 Flash


SketchX, University of Surrey · Stability AI

We present SD3.5-Flash, a few-step distillation framework that enables high-quality rectified flow generation on consumer hardware. While rectified flow models achieve exceptional quality through extensive multi-step refinement, their computational demands render advanced generative AI largely inaccessible. Our approach addresses fundamental challenges of adapting distribution matching distillation to flow models in few-step regimes, where standard re-noising processes create unstable gradients that systematically degrade quality. We introduce timestep sharing, which computes distribution objectives using intermediate trajectory samples rather than corrupted re-noised versions, providing stable gradients essential for robust few-step training. Our split-timestep fine-tuning technique resolves the capacity-quality tradeoff by temporarily expanding model capacity during training through specialized timestep branches. Combined with comprehensive pipeline optimizations including text encoder restructuring and intelligent quantization, our system generates high-resolution images in under one second while operating within 8GB memory constraints. Through extensive evaluation including large-scale user studies, we demonstrate that SD3.5-Flash consistently outperforms existing few-step methods while maintaining teacher model quality standards, making advanced generative AI truly accessible for practical deployment.
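The contrast between re-noising and timestep sharing can be illustrated with a toy one-dimensional sketch (our own simplification, not the paper's implementation): a few-step student already produces intermediate samples along its trajectory, so the distribution objective can be evaluated at those on-trajectory points instead of corrupting the final output with fresh noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def student_trajectory(z, timesteps):
    # Toy few-step "rectified flow" student: Euler steps toward a fixed
    # target (zero), standing in for a learned velocity field.
    target = np.zeros_like(z)
    xs = [z]
    x = z
    for i in range(len(timesteps) - 1):
        dt = timesteps[i] - timesteps[i + 1]
        v = (target - x) / max(timesteps[i], 1e-8)  # toy velocity estimate
        x = x + dt * v
        xs.append(x)
    return xs

timesteps = [1.0, 0.75, 0.5, 0.25, 0.0]  # a 4-step student schedule
z = rng.standard_normal(8)
traj = student_trajectory(z, timesteps)

t = 0.5
x0 = traj[-1]

# Standard re-noising: corrupt the final sample with *fresh* noise at t,
# which injects randomness unrelated to the student's own trajectory.
renoised = (1 - t) * x0 + t * rng.standard_normal(8)

# Timestep sharing: reuse the intermediate sample the student already
# produced at t, so the objective sees a consistent, on-trajectory point.
shared = traj[timesteps.index(t)]
```

The distribution-matching loss would then be evaluated at `shared` rather than `renoised`; in this toy setup `shared` is a deterministic function of the trajectory, whereas `renoised` varies with every fresh noise draw.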

Performance Comparison

Quantization Comparison

We introduce the SD3.5-Flash suite of models, preferred by users over all other models across a variety of consumer compute budgets while offering comparable latency and memory requirements. Bubble size indicates VRAM occupied for GPUs and pipeline size on disk for mobile devices. We compute Elo ratings from human rankings of generated image quality across models.
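For reference, a standard Elo update from a single pairwise preference looks like the following (a generic formulation; the study's exact rating procedure and K-factor may differ):

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """One Elo update after a pairwise comparison.

    score_a is 1.0 if model A's image was preferred, 0.0 if model B's
    was, and 0.5 for a tie.
    """
    # Expected score of A under the logistic Elo model.
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# Two equally rated models; A's image is preferred once.
print(elo_update(1000.0, 1000.0, 1.0))  # -> (1016.0, 984.0)
```

Iterating this update over many annotator judgments yields per-model ratings that can be compared across compute budgets.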

Inference Latency


Comparing inference latency of SD3.5-Flash models for different devices with VRAM / unified memory below device names.

User Studies

Human Evaluation Results

Results from user studies (124 annotators, 507 prompts, 4 seeds) comparing SD3.5-Flash models against other state-of-the-art few-step generation methods. Results show consistent preference for SD3.5-Flash across different evaluation criteria.

On-Device Demo

Real-time iPhone (A17) demo of 512px image generation, demonstrating the speed and efficiency of SD3.5-Flash on mobile hardware via live screen recording.

API Usage

import getpass
import requests

# To get your API key visit https://platform.stability.ai/account/keys
STABILITY_KEY = getpass.getpass('Enter your API Key')

def send_generation_request(host, params):
    headers = {
        "Accept": "image/*",
        "Authorization": f"Bearer {STABILITY_KEY}"
    }

    # Attach optional image/mask files as multipart form data.
    files = {}
    image = params.pop("image", None)
    mask = params.pop("mask", None)
    if image is not None and image != '':
        files["image"] = open(image, 'rb')
    if mask is not None and mask != '':
        files["mask"] = open(mask, 'rb')
    if len(files) == 0:
        files["none"] = ''

    response = requests.post(host, headers=headers, files=files, data=params)
    if not response.ok:
        raise Exception(f"HTTP {response.status_code}: {response.text}")
    return response

# Usage example
response = send_generation_request(
    "https://api.stability.ai/v2beta/stable-image/generate/sd3",
    {
        "prompt": "towering storm clouds over the ocean at sunset",
        "aspect_ratio": "1:1",
        "model": "sd3.5-flash"
    }
)

if response.headers.get("finish-reason") == 'CONTENT_FILTERED':
    raise Warning("Generation failed NSFW classifier")

with open("generated.jpg", "wb") as f:
    f.write(response.content)

Qualitative Comparisons

Qualitative Comparison 1

Qualitative Comparison 2

BibTeX

@misc{bandyopadhyay2025sd35flash,
  title={SD3.5-Flash: Distribution-Guided Distillation of Generative Flows},
  author={Hmrishav Bandyopadhyay and Rahim Entezari and Jim Scott and Reshinth Adithyan and Yi-Zhe Song and Varun Jampani},
  year={2025},
  eprint={2509.21318},
  archivePrefix={arXiv}
}