Open-source image generation with style codes (--sref)


Huijie Liu1,2, Shuhao Cui1, Haoxiang Cao1,3, Shuai Ma2, Yue Yu1, Kai Wu1,†, Guoliang Kang2,†

1 Kuaishou Technology, 2 Beihang University, 3 South China Normal University

† Co-corresponding authors

[Figure: CoTyle technical showcase]

Abstract

Innovative visual stylization is a cornerstone of artistic creation, yet generating and representing novel styles remains a persistent challenge. Existing generative methods often rely on style images, lengthy textual descriptions, or parameter-efficient fine-tuning (PEFT) to guide models toward specific styles. However, these methods struggle to create novel styles and require complex representations to convey stylistic information. In this paper, we affirm that a style is worth one numerical code by introducing a novel task, code-to-style image generation, which produces images with novel, consistent visual styles conditioned solely on a style code. To date, this capability has only been explored by industry (e.g., Midjourney), with no open-source research from the academic community. Specifically, we first train a discrete style codebook to extract style representations from reference images. We then train a text-to-image diffusion model (T2I-DM) conditioned on the codebook's output, enabling it to generate images in the style of the reference. Using the style codebook, we encode a large set of style images into indices and train an autoregressive model on these index sequences to model their distribution; this model acts as the style generator. During inference, a numerical code deterministically samples a novel sequence of indices from the autoregressive model, and this sequence conditions the diffusion process to generate images in the corresponding style. Unlike existing methods, our approach offers unparalleled simplicity and diversity, unlocking a vast space of reproducible styles from minimal input. Extensive experiments validate that CoTyle effectively turns a single code into a powerful style controller, demonstrating that a style is worth one code.
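To make the discrete style codebook concrete, here is a minimal sketch of a VQ-style nearest-neighbour quantizer that maps continuous style features to codebook embeddings and integer indices. The style encoder, codebook size, and number of style tokens are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn


class StyleCodebook(nn.Module):
    """Toy discrete style codebook (vector-quantization style)."""

    def __init__(self, num_codes: int = 1024, dim: int = 256):
        super().__init__()
        # Each row of the embedding table is one discrete style entry.
        self.codes = nn.Embedding(num_codes, dim)

    def forward(self, style_feats: torch.Tensor):
        """Quantize (batch, tokens, dim) style features to codebook entries."""
        flat = style_feats.reshape(-1, style_feats.shape[-1])        # (B*T, D)
        dists = torch.cdist(flat, self.codes.weight)                 # (B*T, K)
        indices = dists.argmin(dim=-1)                               # (B*T,)
        quantized = self.codes(indices).view_as(style_feats)         # (B, T, D)
        # Straight-through estimator so gradients flow to the style encoder.
        quantized = style_feats + (quantized - style_feats).detach()
        return quantized, indices.view(style_feats.shape[:-1])


# Toy usage: 4 style tokens per image from a hypothetical style encoder.
feats = torch.randn(2, 4, 256)
style_emb, style_idx = StyleCodebook()(feats)
print(style_emb.shape, style_idx.shape)  # (2, 4, 256) and (2, 4)
```

In this sketch, the integer indices are what an autoregressive style generator would be trained to model, while the continuous embeddings are what would condition the diffusion model, mirroring the pipeline described in the abstract.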

Method

In this paper, we adapt a text-to-image diffusion model (T2I-DM) to perform code-to-style image generation: given a numerical style code, the system generates images in a specific, consistent style. As shown in the figure, our method, CoTyle, comprises three main components. We begin by training a discrete style codebook on pairs of style images; the codebook extracts style embeddings and discrete indices from a reference image. Using these style embeddings, we train a T2I-DM capable of generating images that share the style of the reference image. Finally, we train an autoregressive model to generate style indices, unlocking code-to-style image generation, in which the numerical code serves as the sampling seed.

[Figure: CoTyle technical showcase]
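As a rough sketch of the inference path described above, the snippet below seeds a sampler with the numerical style code, autoregressively draws a sequence of style indices, maps them to embeddings via the codebook, and hands the embeddings to the diffusion model as a style condition. `style_generator`, `codebook`, and `t2i_dm` are hypothetical stand-ins for the trained components; no public API is implied.

```python
import torch


@torch.no_grad()
def code_to_style_image(style_code: int, prompt: str,
                        style_generator, codebook, t2i_dm,
                        seq_len: int = 4):
    """Generate an image whose style is determined solely by `style_code`."""
    # 1) The numerical style code acts as a deterministic sampling seed.
    gen = torch.Generator().manual_seed(style_code)

    # 2) Autoregressively sample a sequence of style indices.
    tokens = torch.zeros(1, 0, dtype=torch.long)          # empty prefix
    for _ in range(seq_len):
        logits = style_generator(tokens)                  # (1, vocab_size)
        probs = torch.softmax(logits, dim=-1)
        nxt = torch.multinomial(probs, 1, generator=gen)  # (1, 1)
        tokens = torch.cat([tokens, nxt], dim=1)

    # 3) Map the indices back to continuous style embeddings.
    style_emb = codebook(tokens)                          # (1, seq_len, dim)

    # 4) Condition the text-to-image diffusion model on the style embeddings.
    return t2i_dm(prompt=prompt, style_condition=style_emb)


# Toy stand-ins just to exercise the control flow end to end.
vocab_size, dim = 1024, 256
dummy_generator = lambda prefix: torch.ones(1, vocab_size)   # uniform next-token logits
dummy_codebook = torch.nn.Embedding(vocab_size, dim)
dummy_t2i = lambda prompt, style_condition: style_condition.mean()
print(code_to_style_image(42, "a quiet harbor at dawn",
                          dummy_generator, dummy_codebook, dummy_t2i))
```

Because the only source of randomness is the generator seeded by the code, the same code always reproduces the same index sequence, which is what makes a style reproducible from a single number.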

Our main contributions can be summarized as:

  1. We introduce code-to-style image generation, a novel task which enables the creation of diverse, consistent visual styles conditioned solely on a numerical code.
  2. We propose CoTyle, a framework that achieves code-to-style generation by learning a discrete style codebook and an autoregressive style generator.
  3. We extend CoTyle beyond code-to-style generation to also support generation conditioned on a reference image as well as style interpolation (see the sketch after this list).
  4. We conduct extensive experiments that demonstrate the effectiveness of CoTyle. The results validate that a single code can serve as a powerful, compact style controller, unlocking a vast space of reproducible novel styles.
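A minimal sketch of how style interpolation could be realized, assuming the continuous style embeddings of two codes are linearly blended before conditioning the T2I-DM; the blending scheme and tensor shapes are illustrative assumptions rather than the paper's exact procedure.

```python
import torch


def interpolate_styles(style_emb_a: torch.Tensor,
                       style_emb_b: torch.Tensor,
                       num_steps: int = 5):
    """Linearly blend two style embeddings into a sequence of intermediates."""
    alphas = torch.linspace(0.0, 1.0, num_steps)
    return [(1 - a) * style_emb_a + a * style_emb_b for a in alphas]


# Toy usage: random stand-ins for the style embeddings of two codes.
emb_a, emb_b = torch.randn(1, 4, 256), torch.randn(1, 4, 256)
for emb in interpolate_styles(emb_a, emb_b):
    ...  # each blended `emb` would condition the T2I-DM to render an intermediate style
```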

Comparisons

Comparisons with Midjourney:

[Figure: CoTyle technical showcase]

Comparisons with previous methods:

[Figure: CoTyle technical showcase]

Style Interpolation

[Figure: CoTyle technical showcase]

BibTeX

If you find this project useful for your research, please consider citing our paper.

@misc{liu2025styleworthcodeunlocking,
  title={A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space},
  author={Huijie Liu and Shuhao Cui and Haoxiang Cao and Shuai Ma and Kai Wu and Guoliang Kang},
  year={2025},
  eprint={2511.10555},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.10555},
}