Generate character-consistent images with a single reference


Train character LoRAs from a single reference image and generate character-consistent images across diverse scenes.


  • Character Sheet Generation: Generate a diverse character sheet from a single reference image
  • Automatic Captioning: Generate detailed captions for training images
  • LoRA Training: Train high-quality character LoRAs
  • Easy Inference: Generate images of your character in various scenarios

Train LoRA

Requirements:

  • Python 3.10 or higher
  • GPU with at least 48GB VRAM
  • At least 60GB RAM
  • At least 100GB of free disk space
  1. Clone the repository:

    git clone https://github.com/RishiDesai/CharForge.git
    cd CharForge
  2. Set these API keys and variables in your .env file, and add funds to the corresponding accounts where appropriate:

    HF_TOKEN
    HF_HOME
    CIVITAI_API_KEY
    TOGETHER_API_KEY
    FAL_KEY
    OPENAI_API_KEY
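For reference, a filled-in .env might look like this (all values below are placeholders, not real keys):

    HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxx
    HF_HOME=/path/to/huggingface_cache
    CIVITAI_API_KEY=xxxxxxxxxxxxxxxxxxxx
    TOGETHER_API_KEY=xxxxxxxxxxxxxxxxxxxx
    FAL_KEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxx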
  3. Log into Hugging Face and accept their terms of service to download Flux.1-dev
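For example, you can authenticate with the Hugging Face CLI (it reads your token interactively or from HF_TOKEN), then accept the license on the model page at https://huggingface.co/black-forest-labs/FLUX.1-dev:

    huggingface-cli login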

  4. Run the setup script

    This will:

    • Install submodules including ComfyUI, all required ComfyUI custom nodes, LoRACaptioner, MVAdapter, and ai-toolkit.

    • Install the following ComfyUI custom nodes:
      • comfyui_essentials
      • comfyui-advancedliveportrait
      • comfyui-ic-light
      • comfyui-impact-pack
      • comfyui-custom-scripts
      • rgthree-comfy
      • comfyui-easy-use
      • comfyui-impact-subpack
      • was-node-suite-comfyui
      • ComfyUI_UltimateSDUpscale
      • ComfyUI-PuLID-Flux-Enhanced
      • comfy-image-saver
      • ComfyUI-Image-Filters
      • ComfyUI-Detail-Daemon
      • ComfyUI-KJNodes
    • Download all necessary models to HF_HOME

    • Set up the character sheet generation pipeline

  5. Activate the virtual environment:

    source .venv/bin/activate

1. Train a Character LoRA

python train_character.py --name "character_name" --input "path/to/reference_image.png"
All training options:
python train_character.py \
    --name "character_name" \
    --input "path/to/reference_image.png" \
    [--work_dir WORK_DIR] \
    [--steps STEPS] \
    [--batch_size BATCH_SIZE] \
    [--lr LEARNING_RATE] \
    [--train_dim TRAIN_DIM] \
    [--rank_dim RANK_DIM] \
    [--pulidflux_images PULID_FLUX_IMAGES]
  • --name (str): Character name (used for folder and model naming)
  • --input (str): Path to input image
  • --work_dir (str, optional): Working directory (defaults to ./scratch/{name}/)
  • --steps (int, optional): Number of training steps (default: 800)
  • --batch_size (int, optional): Training batch size (default: 1)
  • --lr (float, optional): Learning rate (default: 8e-4)
  • --train_dim (int, optional): Training image dimension (default: 512)
  • --rank_dim (int, optional): LoRA rank dimension (default: 8)
  • --pulidflux_images (int, optional): Number of Pulid-Flux images to include (default: 0)
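For example, a run that trains longer and mixes in a few PuLID-Flux images might look like this (the character name and path are placeholders):

python train_character.py \
    --name "jade_ranger" \
    --input "./refs/jade_ranger.png" \
    --steps 1200 \
    --pulidflux_images 4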

This command will:

  1. Generate a character sheet from your input image
  2. Caption the generated images
  3. Train a LoRA on Flux.1-dev using the generated dataset

2. Generate Images with Your Character LoRA

python test_character.py --character_name "character_name" --prompt "A detailed prompt here"
All inference options:
python test_character.py \
    --character_name "character_name" \
    --prompt "A detailed prompt here" \
    [--work_dir WORK_DIR] \
    [--lora_weight LORA_WEIGHT] \
    [--test_dim TEST_DIM] \
    [--do_optimize_prompt/--no_optimize_prompt] \
    [--output_filenames FILE1 FILE2 ...] \
    [--batch_size BATCH_SIZE] \
    [--num_inference_steps STEPS] \
    [--fix_outfit/--no_fix_outfit] \
    [--safety_check/--no_safety_check] \
    [--face_enhance/--no_face_enhance]
  • --character_name (str): Name of the character (used to find LoRA and work_dir)
  • --prompt (str): The prompt to use for generation
  • --work_dir (str, optional): Working directory (defaults to ./scratch/{character_name}/)
  • --lora_weight (float, optional): LoRA strength (default: 0.73)
  • --test_dim (int, optional): Image width/height (default: 1024)
  • --do_optimize_prompt / --no_optimize_prompt: Whether to optimize the prompt using LoRACaptioner (default: enabled)
  • --output_filenames (str, optional): Filenames for output images (space separated list)
  • --batch_size (int, optional): Number of images to generate (default: 4)
  • --num_inference_steps (int, optional): Steps for generation (default: 30)
  • --fix_outfit / --no_fix_outfit: Use the reference image flag in prompt optimization (default: disabled)
  • --safety_check / --no_safety_check: Run safety checks on generated images (default: enabled)
  • --face_enhance / --no_face_enhance: Enable or disable face enhancement (default: disabled)
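For example, generating two images at a slightly higher LoRA weight might look like this (placeholder name and prompt):

python test_character.py \
    --character_name "jade_ranger" \
    --prompt "reading a book in a cozy cafe, warm evening light" \
    --lora_weight 0.8 \
    --batch_size 2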

This command will:

  1. Load your LoRA and generate image(s) from your prompt
  2. Optionally optimize the prompt, run FaceEnhance on the outputs, and apply a safety check

Note: The first run of train_character.py and test_character.py will take longer, as the remaining models are downloaded.

  • The training script runs a ComfyUI server ephemerally.
  • All character images and character data are saved in ./scratch/{character_name} for easy access and organization.
  • fal.ai is used for upscaling and generating PuLID-Flux images; Together AI is used for image captioning and prompt optimization (via LoRACaptioner); GPT-4o is used for generating prompts for PuLID-Flux.
  • The character sheet generation is partly based on Mickmumpitz's Flux character consistency workflow, specifically the image upscaling, facial expressions, and lighting conditions.
  • Sections of the workflow were broken into modular pieces; I used the ComfyUI-to-Python-Extension to re-engineer components for efficiency and functionality.
  • The character sheet includes multi-view images, varied facial expressions, lighting conditions, and (optionally) PuLID-Flux images.
  • Images are autocaptioned using LoRACaptioner.
  • LoRA is trained using ai-toolkit.
  • Inference is handled by diffusers, with some speed improvements from the Modal Flux inference guide; a rough sketch of this flow follows below.
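In rough terms, the diffusers side of inference looks like this minimal sketch (not the actual test_character.py; the LoRA path and filename below are assumptions):

    import torch
    from diffusers import FluxPipeline

    # Load the Flux.1-dev base model (cached under HF_HOME).
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")

    # Attach the trained character LoRA (path and filename assumed).
    pipe.load_lora_weights(
        "./scratch/character_name", weight_name="character_name.safetensors"
    )
    pipe.fuse_lora(lora_scale=0.73)  # mirrors the default --lora_weight

    image = pipe(
        "character_name reading a book in a cozy cafe",
        height=1024, width=1024,  # mirrors the default --test_dim
        num_inference_steps=30,   # mirrors the default --num_inference_steps
        guidance_scale=3.5,
    ).images[0]
    image.save("output.png")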
Findings

  • Training: A LoRA rank of 8 and a resolution fixed to 512x512 strike the right balance of quality and speed.
    • The entire training pipeline takes 30-40 minutes on a single L40S.
  • Inference: A resolution of 1024x1024 and a LoRA weight of 0.65-0.85 give the best results.
    • Batch size of 4 takes 60 seconds on 1 L40S if the models are loaded in memory, 120 seconds otherwise.
    • If FaceEnhance is enabled, you will likely need more than 48GB VRAM.
Customization

  • Training Parameters: You can modify training parameters by passing the relevant CLI arguments to train_character.py, or by editing the YAML config scripts/character_lora.yaml.
  • Public LoRA Serving: Use python scripts/serve_lora.py to serve LoRA weights via a FastAPI server, making them publicly accessible (e.g., for fal.ai inference); a rough sketch of the idea follows this list.
  • Run ComfyUI Server: Use python scripts/run_comfy.py to launch a ComfyUI server, useful for doing inference manually.
  • Symlink LoRAs for ComfyUI: Use bash scripts/symlink_loras.sh to symlink trained LoRA weights from scratch/{character_name}/ to the ComfyUI LoRA directory for easy access.
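As a sketch of the serving idea mentioned above (this is not the actual scripts/serve_lora.py; the endpoint and file layout are assumptions):

    from pathlib import Path

    import uvicorn
    from fastapi import FastAPI, HTTPException
    from fastapi.responses import FileResponse

    app = FastAPI()
    SCRATCH = Path("./scratch")

    @app.get("/loras/{character_name}")
    def get_lora(character_name: str):
        # Serve a trained LoRA file so a remote service (e.g., fal.ai) can fetch it.
        weights = SCRATCH / character_name / f"{character_name}.safetensors"
        if not weights.exists():
            raise HTTPException(status_code=404, detail="LoRA not found")
        return FileResponse(weights, media_type="application/octet-stream")

    if __name__ == "__main__":
        uvicorn.run(app, host="0.0.0.0", port=8000)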
Troubleshooting

  • Model download issues: Check your Hugging Face or CivitAI credentials.
  • Out of memory: Use batch_size=1 for GPUs with less than 48GB VRAM (see the example below).
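For example (placeholder name and prompt):

python test_character.py \
    --character_name "jade_ranger" \
    --prompt "walking through a rainy market" \
    --batch_size 1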