Generate character-consistent images with a single reference


Train character LoRAs from a single reference image and generate character-consistent images across diverse scenes.


  • Character Sheet Generation: Generate a diverse character sheet from a single reference image
  • Automatic Captioning: Generate detailed captions for training images
  • LoRA Training: Train high-quality character LoRAs
  • Easy Inference: Generate images of your character in various scenarios

Train LoRA

Requirements:

  • Python 3.10 or higher
  • GPU with at least 48GB VRAM
  • At least 60GB RAM
  • At least 100GB of free disk space
  1. Clone the repository:

    git clone https://github.com/RishiDesai/CharForge.git
    cd CharForge
  2. Set these API keys and variables in your .env file, and add funds to the corresponding accounts where appropriate:

    HF_TOKEN
    HF_HOME
    CIVITAI_API_KEY
    TOGETHER_API_KEY
    FAL_KEY
    OPENAI_API_KEY
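For reference, a filled-in .env might look like this (all values below are placeholders, not real keys):

    HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxx
    HF_HOME=/path/to/huggingface_cache
    CIVITAI_API_KEY=xxxxxxxxxxxxxxxxxxxx
    TOGETHER_API_KEY=xxxxxxxxxxxxxxxxxxxx
    FAL_KEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxx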
  3. Log into Hugging Face and accept their terms of service to download Flux.1-dev
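For example, you can authenticate with the Hugging Face CLI (it reads your token interactively or from HF_TOKEN), then accept the license on the model page at https://huggingface.co/black-forest-labs/FLUX.1-dev:

    huggingface-cli login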

  4. Run the setup script

    This will:

    • Install submodules including ComfyUI, all required ComfyUI custom nodes, LoRACaptioner, MVAdapter, and ai-toolkit.

    • Install the following ComfyUI custom nodes:
      • comfyui_essentials
      • comfyui-advancedliveportrait
      • comfyui-ic-light
      • comfyui-impact-pack
      • comfyui-custom-scripts
      • rgthree-comfy
      • comfyui-easy-use
      • comfyui-impact-subpack
      • was-node-suite-comfyui
      • ComfyUI_UltimateSDUpscale
      • ComfyUI-PuLID-Flux-Enhanced
      • comfy-image-saver
      • ComfyUI-Image-Filters
      • ComfyUI-Detail-Daemon
      • ComfyUI-KJNodes
    • Download all necessary models to HF_HOME

    • Set up the character sheet generation pipeline

  5. Activate the virtual environment:

    source .venv/bin/activate

1. Train a Character LoRA

python train_character.py --name "character_name" --input "path/to/reference_image.png"
All training options:
python train_character.py \
    --name "character_name" \
    --input "path/to/reference_image.png" \
    [--work_dir WORK_DIR] \
    [--steps STEPS] \
    [--batch_size BATCH_SIZE] \
    [--lr LEARNING_RATE] \
    [--train_dim TRAIN_DIM] \
    [--rank_dim RANK_DIM] \
    [--pulidflux_images PULID_FLUX_IMAGES]
  • --name (str): Character name (used for folder and model naming)
  • --input (str): Path to input image
  • --work_dir (str, optional): Working directory (defaults to ./scratch/{name}/)
  • --steps (int, optional): Number of training steps (default: 800)
  • --batch_size (int, optional): Training batch size (default: 1)
  • --lr (float, optional): Learning rate (default: 8e-4)
  • --train_dim (int, optional): Training image dimension (default: 512)
  • --rank_dim (int, optional): LoRA rank dimension (default: 8)
  • --pulidflux_images (int, optional): Number of Pulid-Flux images to include (default: 0)
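For example, a run that trains longer and mixes in a few PuLID-Flux images might look like this (the character name and path are placeholders):

python train_character.py \
    --name "jade_ranger" \
    --input "./refs/jade_ranger.png" \
    --steps 1200 \
    --pulidflux_images 4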

This command will:

  1. Generate a character sheet from your input image
  2. Caption the generated images
  3. Train a LoRA on Flux.1-dev using the generated dataset

2. Generate Images with Your Character LoRA

python test_character.py --character_name "character_name" --prompt "A detailed prompt here"
All inference options:
python test_character.py \
    --character_name "character_name" \
    --prompt "A detailed prompt here" \
    [--work_dir WORK_DIR] \
    [--lora_weight LORA_WEIGHT] \
    [--test_dim TEST_DIM] \
    [--do_optimize_prompt/--no_optimize_prompt] \
    [--output_filenames FILE1 FILE2 ...] \
    [--batch_size BATCH_SIZE] \
    [--num_inference_steps STEPS] \
    [--fix_outfit/--no_fix_outfit] \
    [--safety_check/--no_safety_check] \
    [--face_enhance/--no_face_enhance]
  • --character_name (str): Name of the character (used to find LoRA and work_dir)
  • --prompt (str): The prompt to use for generation
  • --work_dir (str, optional): Working directory (defaults to ./scratch/{character_name}/)
  • --lora_weight (float, optional): LoRA strength (default: 0.73)
  • --test_dim (int, optional): Image width/height (default: 1024)
  • --do_optimize_prompt / --no_optimize_prompt: Whether to optimize the prompt using LoRACaptioner (default: enabled)
  • --output_filenames (str, optional): Filenames for output images (space separated list)
  • --batch_size (int, optional): Number of images to generate (default: 4)
  • --num_inference_steps (int, optional): Steps for generation (default: 30)
  • --fix_outfit / --no_fix_outfit: Use the reference image flag in prompt optimization (default: disabled)
  • --safety_check / --no_safety_check: Run safety checks on generated images (default: enabled)
  • --face_enhance / --no_face_enhance: Enable or disable face enhancement (default: disabled)
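For example, generating two images at a slightly higher LoRA weight might look like this (placeholder name and prompt):

python test_character.py \
    --character_name "jade_ranger" \
    --prompt "reading a book in a cozy cafe, warm evening light" \
    --lora_weight 0.8 \
    --batch_size 2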

This command will:

  1. Load your LoRA and generate image(s) from your prompt
  2. Optionally optimize the prompt, run FaceEnhance on the outputs, and apply a safety check

Note: The first run of train_character.py and test_character.py will take longer, as the remaining models are downloaded.

  • The training script runs a ComfyUI server ephemerally.
  • All character images and character data are saved in ./scratch/{character_name} for easy access and organization.
  • fal.ai is used for upscaling and generating PuLID-Flux images; Together AI is used for image captioning and prompt optimization (via LoRACaptioner); GPT-4o is used for generating prompts for PuLID-Flux.
  • The character sheet generation is partly based on Mickmumpitz's Flux character consistency workflow, specifically the image upscaling, facial expressions, and lighting conditions.
  • Sections of the workflow were broken into modular pieces; I used the ComfyUI-to-Python-Extension to re-engineer components for efficiency and functionality.
  • The character sheet includes multi-view images, varied facial expressions, lighting conditions, and (optionally) PuLID-Flux images.
  • Images are autocaptioned using LoRACaptioner.
  • LoRA is trained using ai-toolkit.
  • Inference is handled by diffusers, with some speed improvements from the Modal Flux inference guide; a rough sketch of this flow follows below.
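In rough terms, the diffusers side of inference looks like this minimal sketch (not the actual test_character.py; the LoRA path and filename below are assumptions):

    import torch
    from diffusers import FluxPipeline

    # Load the Flux.1-dev base model (cached under HF_HOME).
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")

    # Attach the trained character LoRA (path and filename assumed).
    pipe.load_lora_weights(
        "./scratch/character_name", weight_name="character_name.safetensors"
    )
    pipe.fuse_lora(lora_scale=0.73)  # mirrors the default --lora_weight

    image = pipe(
        "character_name reading a book in a cozy cafe",
        height=1024, width=1024,  # mirrors the default --test_dim
        num_inference_steps=30,   # mirrors the default --num_inference_steps
        guidance_scale=3.5,
    ).images[0]
    image.save("output.png")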
Findings

  • Training: A LoRA rank of 8 and a resolution fixed to 512x512 strike the right balance of quality and speed.
    • The entire training pipeline takes 30-40 minutes on a single L40S.
  • Inference: A resolution of 1024x1024 and a LoRA weight of 0.65-0.85 give the best results.
    • Batch size of 4 takes 60 seconds on 1 L40S if the models are loaded in memory, 120 seconds otherwise.
    • If FaceEnhance is enabled, you will likely need more than 48GB VRAM.
Customization

  • Training Parameters: You can modify training parameters by passing the relevant CLI arguments to train_character.py, or by editing the YAML config scripts/character_lora.yaml.
  • Public LoRA Serving: Use python scripts/serve_lora.py to serve LoRA weights via a FastAPI server, making them publicly accessible (e.g., for fal.ai inference); a rough sketch of the idea follows this list.
  • Run ComfyUI Server: Use python scripts/run_comfy.py to launch a ComfyUI server, useful for doing inference manually.
  • Symlink LoRAs for ComfyUI: Use bash scripts/symlink_loras.sh to symlink trained LoRA weights from scratch/{character_name}/ to the ComfyUI LoRA directory for easy access.
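As a sketch of the serving idea mentioned above (this is not the actual scripts/serve_lora.py; the endpoint and file layout are assumptions):

    from pathlib import Path

    import uvicorn
    from fastapi import FastAPI, HTTPException
    from fastapi.responses import FileResponse

    app = FastAPI()
    SCRATCH = Path("./scratch")

    @app.get("/loras/{character_name}")
    def get_lora(character_name: str):
        # Serve a trained LoRA file so a remote service (e.g., fal.ai) can fetch it.
        weights = SCRATCH / character_name / f"{character_name}.safetensors"
        if not weights.exists():
            raise HTTPException(status_code=404, detail="LoRA not found")
        return FileResponse(weights, media_type="application/octet-stream")

    if __name__ == "__main__":
        uvicorn.run(app, host="0.0.0.0", port=8000)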
Troubleshooting

  • Model download issues: Check your Hugging Face or CivitAI credentials.
  • Out of memory: Use batch_size=1 for GPUs with less than 48GB VRAM (see the example below).
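For example (placeholder name and prompt):

python test_character.py \
    --character_name "jade_ranger" \
    --prompt "walking through a rainy market" \
    --batch_size 1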