Run ML experiments automatically, with Claude Code analyzing the results and improving them iteratively.
You need an environment with working Python, PyTorch and Claude Code, plus optionally CUDA and perhaps a venv. Also, read the WARNING below.
- Create an experiment directory with your idea:
- Copy and run the automation script:
That's it! Claude will create a plan, write code, run training, and iterate.
Check the logs/ directory to monitor progress.
WARNING: running runner.py automatically executes AI-generated code. This template does NOT include any kind of sandboxing; responsibility for sandboxing and for any outcomes rests entirely with the person running the script. The script is provided for educational purposes only.
This is a proof of concept. It can:
- train a small NN from scratch
- train a small transformer on synthetic data
- fine-tune an existing pre-trained transformer
- take an existing pre-trained transformer and extend it, e.g. by adding or modifying a module
The most ambitious experiment I have tried so far injected memory into a pre-trained transformer using a gating MLP.
The training script's running time is limited to 2 hours in runner.py. Feel free to change this if you feel ambitious.
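How runner.py enforces the 2-hour limit is not shown here. As a minimal sketch, assuming the runner launches train.py as a subprocess, Python's `subprocess.run(timeout=...)` kills the child once the limit is hit, and each iteration's stdout and stderr can be captured into logs/. The function name `run_training` and the log file names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical constant mirroring the 2-hour cap described above.
TRAIN_TIMEOUT_S = 2 * 60 * 60


def run_training(iteration, cmd=None, log_dir="logs", timeout_s=TRAIN_TIMEOUT_S):
    """Run one training iteration, saving its stdout/stderr under log_dir.

    Returns the process exit code, or -1 if the time limit was hit.
    """
    if cmd is None:
        cmd = [sys.executable, "train.py"]
    logs = Path(log_dir)
    logs.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one stdout and one stderr file per iteration.
    out_path = logs / f"iter_{iteration}_stdout.log"
    err_path = logs / f"iter_{iteration}_stderr.log"
    with open(out_path, "w") as out, open(err_path, "w") as err:
        try:
            # On timeout, subprocess.run kills the child, waits for it,
            # and then raises TimeoutExpired.
            result = subprocess.run(cmd, stdout=out, stderr=err, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return -1
    return result.returncode
```

Raising or lowering `timeout_s` is the equivalent of editing the limit in runner.py.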
- IDEA.md - Your experiment description (required)
- runner.py - Automation script
- CLAUDE.md - Instructions for Claude
- PLAN.md - Created by Claude
- train.py - Created/modified by Claude (main training script)
- logs/ - Training outputs (stdout, stderr for each iteration)
- status.json - Current experiment status and iteration counter
- REPORT.md - Final results
- watchdog.py - Optional safety script to prevent runaway experiments
When enabled, allows Claude to use pip for Python package installation:
- Adds pip install capability to Claude's allowed tools
- Useful for experiments that need additional libraries
- Standard Python package manager
Example:
When enabled, allows Claude to use uv for fast Python package installation:
- Adds uv pip install capability to Claude's allowed tools
- Useful for experiments that need additional libraries
- Much faster than regular pip for dependency resolution
Example:
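The pip and uv options both work by extending the set of tools Claude is allowed to use. The sketch below is a hypothetical illustration of that mechanism, not runner.py's actual option handling. It assumes the runner passes permission rules to the Claude Code CLI via `--allowedTools`, using prefix rules of the form `Bash(<command prefix>:*)`. The function names and the `base_tools` contents are made up for illustration.

```python
def build_allowed_tools(base_tools, enable_pip=False, enable_uv=False):
    """Return the permission rules to grant Claude (hypothetical helper).

    base_tools stands in for whatever the runner already allows.
    """
    tools = list(base_tools)
    if enable_pip:
        # Prefix rule: allows any Bash command starting with "pip install".
        tools.append("Bash(pip install:*)")
    if enable_uv:
        # Same idea for uv's pip-compatible interface.
        tools.append("Bash(uv pip install:*)")
    return tools


def build_claude_command(prompt, allowed_tools):
    # -p runs Claude Code non-interactively with the given prompt;
    # --allowedTools takes one or more permission rules.
    return ["claude", "-p", prompt, "--allowedTools", *allowed_tools]
```

Keeping these as opt-in rules means Claude cannot install packages unless you explicitly enable it.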
To prevent experiments from running forever:
The watchdog will create a REPORT.md to stop the experiment if:
- Running longer than 24 hours
- More than 20 iterations
- No progress for 2+ hours
Note: the watchdog currently cannot actually kill a process. It is more a proof of concept of a watchdog than a real one.
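The three stop conditions can be sketched as a single check. This is a minimal sketch of the logic, not the code in watchdog.py. It assumes the caller already knows the experiment's start time, iteration count and time of last progress as Unix timestamps. The function names and the REPORT.md wording are hypothetical.

```python
import time
from pathlib import Path

MAX_RUNTIME_S = 24 * 3600   # running longer than 24 hours
MAX_ITERATIONS = 20         # more than 20 iterations
MAX_IDLE_S = 2 * 3600       # no progress for 2+ hours


def stop_reason(start_time, iterations, last_progress, now=None):
    """Return why the experiment should stop, or None to keep going."""
    now = time.time() if now is None else now
    if now - start_time > MAX_RUNTIME_S:
        return "running longer than 24 hours"
    if iterations > MAX_ITERATIONS:
        return f"more than {MAX_ITERATIONS} iterations"
    if now - last_progress >= MAX_IDLE_S:
        return "no progress for 2+ hours"
    return None


def request_stop(reason, report_path="REPORT.md"):
    """Write REPORT.md to signal a stop. Does NOT kill any process."""
    path = Path(report_path)
    if path.exists():
        return False  # a report already exists; leave it untouched
    path.write_text(f"# Experiment stopped by watchdog\n\nReason: {reason}\n")
    return True
```

Like the watchdog described above, this only writes a file; actually terminating a running train.py would need the process's PID.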
- Start with a clear, focused idea in IDEA.md
- Specify constraints (GPU memory, time limits)
- Watch the first iteration to ensure it's on track
- Check logs/ if something goes wrong
- Always use fp32 unless you specifically need fp16/bf16 (this reduces the risk of NaN losses)
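Even in fp32, a loss can diverge. A cheap guard is to stop as soon as the loss becomes NaN or infinite, so the failure shows up clearly in logs/ instead of wasting the rest of the time budget. This is a minimal sketch; `check_loss` and `NonFiniteLossError` are hypothetical names, not part of train.py.

```python
import math


class NonFiniteLossError(RuntimeError):
    """Raised when the training loss becomes NaN or infinite."""


def check_loss(loss, step):
    # math.isfinite is False for NaN, +inf and -inf.
    if not math.isfinite(loss):
        raise NonFiniteLossError(f"non-finite loss {loss!r} at step {step}")
    return loss
```

In a PyTorch loop this would be called with a plain float, e.g. `check_loss(loss.item(), step)`.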
The experiment creates status.json with:
Check this file to see current progress without interrupting the experiment.
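Reading status.json from a separate process does not touch the running experiment. The sketch below assumes only that the file holds a JSON object; it does not rely on specific field names, since those are not listed here. It treats a missing or half-written file as "not available yet". The function names are hypothetical.

```python
import json
from pathlib import Path


def read_status(path="status.json"):
    """Load status.json, or return None if it is missing or mid-write."""
    try:
        return json.loads(Path(path).read_text())
    except FileNotFoundError:
        return None
    except json.JSONDecodeError:
        # The runner may be rewriting the file right now; try again later.
        return None


def format_status(status):
    if status is None:
        return "status.json not available yet"
    # Print whatever keys the runner wrote, in a stable order.
    return "\n".join(f"{key}: {value}" for key, value in sorted(status.items()))
```

Running this in a loop, or from a second terminal, gives a read-only view of progress.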
- "IDEA.md not found": create this file first.
- "Claude Code CLI not found": install it from https://github.com/anthropics/claude-code.
- Training keeps failing: add constraints to IDEA.md (smaller model, less memory).
- NaN losses: ensure fp32 is used, and reduce the learning rate.
- No progress: check logs/runner.log and status.json.
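The first two problems can be caught before launching anything. This is a minimal preflight sketch, not runner.py's own check. It assumes the Claude Code CLI is installed as an executable named `claude` on PATH; `preflight` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def preflight(exp_dir="."):
    """Return a list of problems that would stop an experiment from starting."""
    problems = []
    if not (Path(exp_dir) / "IDEA.md").is_file():
        problems.append("IDEA.md not found: create this file first")
    # shutil.which returns None when no 'claude' executable is on PATH.
    if shutil.which("claude") is None:
        problems.append("Claude Code CLI not found")
    return problems
```

An empty list means both prerequisites are in place.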
To resume after interruption:
To use existing code:
To monitor without interrupting:
The goal is to show that this is possible.
- TensorBoard support (and maybe wandb?)
- Save PIDs to status.json