Orchestrate multiple Claude Code agents working in parallel to improve your codebase through automated bug fixing or systematic best practices implementation
Claude Code Agent Farm is a powerful orchestration framework that runs multiple Claude Code (cc) sessions in parallel to systematically improve your codebase. It supports multiple technology stacks and workflow types, allowing teams of AI agents to work together on large-scale code improvements.
- 🚀 Parallel Processing: Run 20+ Claude Code agents simultaneously (up to 50 with max_agents config)
- 🎯 Multiple Workflows: Bug fixing, best practices implementation, or coordinated multi-agent development
- 🤝 Agent Coordination: Advanced lock-based system prevents conflicts between parallel agents
- 🌐 Multi-Stack Support: 34 technology stacks including Next.js, Python, Rust, Go, Java, Angular, Flutter, C++, and more
- 📊 Smart Monitoring: Real-time dashboard showing agent status and progress
- 🔄 Auto-Recovery: Automatically restarts agents when needed
- 📈 Progress Tracking: Git commits and structured progress documents
- ⚙️ Highly Configurable: JSON configs with variable substitution
- 🖥️ Flexible Viewing: Multiple tmux viewing modes
- 🔒 Safe Operation: Automatic settings backup/restore, file locking, atomic operations
- 🛠️ Development Setup: 24 integrated tool installation scripts for complete environments
- Python 3.13+ (managed by uv)
- tmux (for terminal multiplexing)
- Claude Code (claude command installed and configured)
- git (for version control)
- Your project's tools (e.g., bun for Next.js, mypy/ruff for Python)
- direnv (optional but recommended for automatic environment activation)
- uv (modern Python package manager)
The agent farm requires a special cc alias to launch Claude Code with the necessary permissions:
This alias will be configured automatically by the setup script.
The setup script will:
- Check and install missing prerequisites
- Create a Python 3.13 virtual environment
- Install all dependencies
- Configure the cc alias
- Set up direnv for automatic environment activation
- Handle both bash and zsh shells automatically
The project includes a comprehensive modular system for setting up development environments:
Run the interactive menu:
Or run specific setups directly:
- Python FastAPI (setup_python_fastapi.sh)
  - Python 3.12+, uv, ruff, mypy, pre-commit, ipython
- Go Web Apps (setup_go_webapps.sh)
  - Go 1.23+, golangci-lint, air, migrate, mockery, Task, swag
- Next.js (setup_nextjs.sh)
  - Node.js 22+, Bun, pnpm, TypeScript, ESLint, Prettier
- SvelteKit/Remix/Astro (setup_sveltekit_remix_astro.sh)
  - Extends Next.js setup with Vite, Playwright, Vitest, Biome
- Rust Development (setup_rust.sh)
  - Rust toolchain, cargo tools, web & system programming tools
- Java Enterprise (setup_java_enterprise.sh)
  - Java 21 LTS, SDKMAN, Gradle 8.11+, Maven 3.9+, JBang
- Bash/Zsh Scripting (setup_bash_zsh.sh)
  - Shell development tools and best practices
- Cloud Native DevOps (setup_cloud_native_devops.sh)
  - Docker, Kubernetes, Terraform, cloud tools
- GenAI/LLM Ops (setup_genai_llm_ops.sh)
  - ML/AI development tools and frameworks
- Data Engineering (setup_data_engineering.sh)
  - Data processing and analytics tools
- Serverless Edge (setup_serverless_edge.sh)
  - Serverless and edge computing tools
- Terraform Azure (setup_terraform_azure.sh)
  - Terraform, Azure CLI, infrastructure tools
- Angular (setup_angular.sh)
  - Node.js, Angular CLI, TypeScript, testing tools
- Flutter (setup_flutter.sh)
  - Flutter SDK, Dart, Android Studio, development tools
- React Native (setup_react_native.sh)
  - React Native CLI, mobile development tools
Additional setup scripts are available for:
- PHP/Laravel (setup_php_laravel.sh)
- C++ Systems (setup_cpp_systems.sh)
- Solana/Anchor (setup_solana_anchor.sh)
- Ansible (setup_ansible.sh)
- LLM Dev Testing (setup_llm_dev_testing.sh)
- LLM Eval Observability (setup_llm_eval_observability.sh)
- Kubernetes AI (setup_kubernetes_ai_inference.sh)
- 🎨 Interactive & Safe: Colorful prompts, always asks before installing
- 🔍 Smart Detection: Checks existing installations to avoid conflicts
- 🛡️ Non-Destructive: Won't overwrite configurations without permission
- 🐚 Shell Agnostic: Works with both bash and zsh
- 📊 Progress Tracking: Shows what's installed and what's pending
This project consists of two independent scripts that work together:
This is the main orchestrator that does all the heavy lifting:
- Creates and manages tmux sessions with multiple panes
- Generates the problems file by running configured commands
- Launches Claude Code agents in each tmux pane
- Monitors agent health (context usage, work status, errors)
- Auto-restarts agents when they complete tasks or hit issues
- Runs monitoring dashboard in the tmux controller window
- Handles graceful shutdown with Ctrl+C
- Manages settings backup/restore to prevent corruption
- Implements file locking for concurrent access safety
- Writes monitor state to JSON file for external monitoring
You run this script and it stays running (unless using --no-monitor mode). The monitoring dashboard is displayed in the tmux session's controller window, not in the launching terminal.
This is an optional convenience tool for viewing the tmux session:
- It does NOT interact with the Python script
- Run it in a separate terminal to peek at agent activity
- Provides different viewing modes (grid, focus, split)
- Just a wrapper around tmux commands for convenience
- Automatically suggests font size adjustments for many agents
Think of it like this:
- Python script = Your car engine (does all the work)
- Shell script = Your dashboard camera (lets you see what's happening)
There's a hidden command for running just the monitor display:
This reads the monitor state file and displays the dashboard without launching agents.
- Separation of Concerns: Core logic (Python) vs viewing utilities (shell)
- Flexibility: You can monitor agents without the viewer script
- Independence: Either script can be used without the other
Agents work through type-checker and linter problems in parallel:
- Runs your configured type-check and lint commands
- Generates a combined problems file
- Agents select random chunks to fix
- Marks completed problems to avoid duplication
- Focuses on fixing existing issues
- Uses instance-specific seeds for better randomization
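The chunk selection described above can be sketched in Python. This is a minimal illustration, not the farm's actual code: `pick_chunk` and `mark_completed` are hypothetical helper names, and the `[COMPLETED]` tag and per-agent seed are taken from the workflow description.

```python
import random

COMPLETED_TAG = "[COMPLETED]"  # marker taken from the workflow description


def pick_chunk(lines, chunk_size, seed):
    """Return (start, end) of a random chunk that still has open problems.

    Hypothetical helper: `seed` stands in for the instance-specific seed,
    so different agents tend to start on different chunks.
    """
    rng = random.Random(seed)
    # Candidate chunk starts, aligned to chunk_size boundaries.
    starts = list(range(0, len(lines), chunk_size))
    rng.shuffle(starts)
    for start in starts:
        end = min(start + chunk_size, len(lines))
        if any(COMPLETED_TAG not in line for line in lines[start:end]):
            return start, end
    return None  # every problem is already marked as completed


def mark_completed(lines, start, end):
    """Return a copy of `lines` with the chunk tagged so other agents skip it."""
    return [
        f"{COMPLETED_TAG} {line}"
        if start <= i < end and COMPLETED_TAG not in line
        else line
        for i, line in enumerate(lines)
    ]
```

Seeding a private `random.Random` per agent keeps each agent's choices reproducible while still spreading agents across the file.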
Agents systematically implement modern best practices:
- Reads a comprehensive best practices guide
- Creates a progress tracking document (@<STACK>_BEST_PRACTICES_IMPLEMENTATION_PROGRESS.md)
- Implements improvements in manageable chunks
- Tracks completion percentage for each guideline
- Maintains continuity between sessions
- Supports continuing existing work with special prompts
The most sophisticated workflow option transforms the agent farm into a coordinated development team capable of complex, strategic improvements. Notably, this feature is implemented entirely through the prompt file. No orchestration code is needed; the LLM (particularly Opus 4) is capable enough to understand the protocol and follow it reliably on its own.
This workflow implements a distributed coordination protocol that allows multiple agents to work on the same codebase simultaneously without conflicts. The system creates a /coordination/ directory structure in your project:
- Unique Agent Identity: Each agent generates a unique ID (agent_{timestamp}_{random_4_chars})
- Work Claiming Process: Before starting any work, agents must:
  - Check the active work registry for conflicts
  - Create a lock file claiming specific files and features
  - Register their work plan with detailed scope information
  - Update their status throughout the work cycle
- Conflict Prevention: The lock file system prevents multiple agents from:
  - Modifying the same files simultaneously
  - Implementing overlapping features
  - Creating merge conflicts or breaking changes
  - Duplicating completed work
- Smart Work Distribution: Agents automatically:
  - Select non-conflicting work from available tasks
  - Queue work if their preferred files are locked
  - Handle stale locks (>2 hours old) intelligently
  - Coordinate through descriptive git commits
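In the farm itself the agents carry out this protocol by following the prompt, not by running code. Purely as an illustration of the claiming logic, here is a Python sketch under stated assumptions: the lock directory, the `.lock` file naming, and the JSON fields (`agent`, `files`, `claimed_at`) are all hypothetical.

```python
import json
import os
import random
import string
import time
from pathlib import Path

STALE_AFTER_SECONDS = 2 * 60 * 60  # locks older than 2 hours count as stale


def new_agent_id():
    """Build an ID in the agent_{timestamp}_{random_4_chars} shape."""
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=4))
    return f"agent_{int(time.time())}_{suffix}"


def try_claim(lock_dir, agent_id, files, now=None):
    """Try to claim `files`; return True if a lock file was created."""
    now = time.time() if now is None else now
    lock_dir = Path(lock_dir)
    lock_dir.mkdir(parents=True, exist_ok=True)

    # Refuse the claim if a fresh lock already covers any requested file.
    for lock_path in list(lock_dir.glob("*.lock")):
        lock = json.loads(lock_path.read_text())
        if now - lock["claimed_at"] > STALE_AFTER_SECONDS:
            lock_path.unlink()  # abandoned work: drop the stale lock
            continue
        if set(lock["files"]) & set(files):
            return False

    payload = json.dumps({"agent": agent_id, "files": list(files), "claimed_at": now})
    try:
        # O_EXCL makes creation fail if this agent's lock file already exists.
        fd = os.open(lock_dir / f"{agent_id}.lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(payload)
    return True
```

Note that the check and the lock creation are two separate steps, so two agents could still race between them. A stricter implementation would serialize claims through a single registry lock.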
This coordination system solves several critical problems:
- Eliminates Merge Conflicts: Lock-based file claiming ensures clean parallel development
- Prevents Wasted Work: Agents check completed work log before starting
- Enables Complex Tasks: Unlike simple bug fixing, agents can tackle strategic improvements
- Maintains Code Stability: Functionality testing requirements prevent breaking changes
- Scales Efficiently: 20+ agents can work productively without stepping on each other
- Business Value Focus: Requires justification and planning before implementation
- Stale Lock Detection: Automatically handles abandoned work after 2 hours
- Emergency Coordination: Alert system for critical conflicts
- Progress Transparency: All agents can see what others are working on
- Atomic Work Units: Each agent completes full features before releasing locks
- Detailed Planning: Agents must create comprehensive plans before claiming work
This workflow excels at:
- Large-scale refactoring projects
- Implementing complex architectural changes
- Adding comprehensive type hints across a codebase
- Systematic performance optimizations
- Multi-faceted security improvements
- Feature development requiring coordination
To use this workflow, specify the cooperating agents prompt:
The project includes pre-configured support for:
- Next.js - TypeScript, React, modern web development
- Angular - Enterprise Angular applications
- SvelteKit - Modern web framework
- Remix/Astro - Full-stack web frameworks
- Flutter - Cross-platform mobile development
- Laravel - PHP web framework
- PHP - General PHP development
- Python - FastAPI, Django, data science workflows
- Rust - System programming and web applications
- Rust CLI - Command-line tool development
- Go - Web services and cloud-native applications
- Java - Enterprise applications with Spring Boot
- C++ - Systems programming and performance-critical applications
- Bash/Zsh - Shell scripting and automation
- Terraform/Azure - Infrastructure as Code
- Cloud Native DevOps - Kubernetes, Docker, CI/CD
- Ansible - Infrastructure automation and configuration management
- HashiCorp Vault - Secrets management and policy as code
- GenAI/LLM Ops - AI/ML operations and tooling
- LLM Dev Testing - LLM development and testing workflows
- LLM Evaluation & Observability - LLM evaluation and monitoring
- Data Engineering - ETL, analytics, big data
- Data Lakes - Kafka, Snowflake, Spark integration
- Polars/DuckDB - High-performance data processing
- Excel Automation - Python-based Excel automation with Azure
- PostgreSQL 17 & Python - Modern PostgreSQL 17 with FastAPI/SQLModel
- Serverless Edge - Edge computing and serverless
- Kubernetes AI Inference - AI inference on Kubernetes
- Security Engineering - Security best practices and tooling
- Hardware Development - Embedded systems and hardware design
- Unreal Engine - Game development with Unreal Engine 5
- Solana/Anchor - Blockchain development on Solana
- Cosmos - Cosmos blockchain ecosystem
- React Native - Cross-platform mobile development
Each stack includes:
- Optimized configuration file
- Technology-specific prompts
- Comprehensive best practices guide (35 guides total)
- Appropriate chunk sizes and timing
Create your own configuration:
- tech_stack: Technology identifier (one of 34 supported stacks)
- problem_commands: Commands for type-checking, linting, and testing
- best_practices_files: Guides to copy to the project
- chunk_size: How many lines/changes per agent iteration (varies by stack: 20-75)
- prompt_file: Which prompt template to use (37 available)
- agents: Number of agents to run (default: 20)
- max_agents: Maximum allowed agents (default: 50)
- auto_restart: Enable automatic agent restart
- context_threshold: Restart when context drops below this %
- git_branch: Optional specific branch to commit to
- git_remote: Remote to push to (default: origin)
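To make the option list concrete, the following Python sketch builds a config as a dict and serializes it to JSON. The key names come from the list above; every value, the layout of `problem_commands`, and the guide path are assumptions for illustration, and `validate` is a hypothetical helper rather than part of the farm.

```python
import json

# Hypothetical example values; only the key names come from the option list.
example_config = {
    "tech_stack": "python",
    "problem_commands": {  # assumed layout: one command per check
        "type_check": ["mypy", "."],
        "lint": ["ruff", "check", "."],
    },
    "best_practices_files": ["best_practices_guides/EXAMPLE_GUIDE.md"],  # placeholder path
    "chunk_size": 50,
    "prompt_file": "prompts/default_prompt_python.txt",
    "agents": 20,
    "max_agents": 50,
    "auto_restart": True,
    "context_threshold": 20,
    "git_branch": None,
    "git_remote": "origin",
}


def validate(config):
    """Return a list of human-readable problems found in the config."""
    problems = []
    if config["agents"] > config["max_agents"]:
        problems.append("agents exceeds max_agents")
    if config["chunk_size"] <= 0:
        problems.append("chunk_size must be positive")
    if not 0 < config["context_threshold"] < 100:
        problems.append("context_threshold must be between 0 and 100")
    return problems


config_json = json.dumps(example_config, indent=2)  # text to save as a .json config
```

Validating before launch catches mistakes such as requesting more agents than `max_agents` allows.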
All configuration options can be overridden via CLI:
The system includes specialized prompts for all workflows and tech stacks:
- default_prompt.txt - Generic bug fixing
- default_prompt_nextjs.txt - Next.js specific
- default_prompt_python.txt - Python specific
- bug_fixing_prompt_for_nextjs.txt - Advanced Next.js fixing
- cooperating_agents_improvement_prompt_for_python_fastapi_postgres.txt - Multi-agent coordination system
- default_best_practices_prompt.txt - Generic implementation
- continue_best_practices_prompt.txt - Continue existing work
- default_best_practices_prompt_nextjs.txt - Next.js 15
- default_best_practices_prompt_angular.txt - Angular
- default_best_practices_prompt_sveltekit.txt - SvelteKit
- default_best_practices_prompt_remix_astro.txt - Remix/Astro
- default_best_practices_prompt_flutter.txt - Flutter
- default_best_practices_prompt_laravel.txt - Laravel
- default_best_practices_prompt_php.txt - PHP
- default_best_practices_prompt_python.txt - Python/FastAPI
- default_best_practices_prompt_rust_web.txt - Rust web apps
- default_best_practices_prompt_rust_system.txt - Rust systems
- default_best_practices_prompt_rust_cli.txt - Rust CLI tools
- default_best_practices_prompt_go.txt - Go applications
- default_best_practices_prompt_java.txt - Java enterprise
- default_best_practices_prompt_cpp.txt - C++ systems
- default_best_practices_prompt_bash_zsh.txt - Shell scripting
- default_best_practices_prompt_terraform_azure.txt - IaC
- default_best_practices_prompt_cloud_native_devops.txt - DevOps
- default_best_practices_prompt_ansible.txt - Ansible automation
- default_best_practices_prompt_vault.txt - HashiCorp Vault
- default_best_practices_prompt_genai_llm_ops.txt - AI/ML ops
- default_best_practices_prompt_llm_dev_testing.txt - LLM development
- default_best_practices_prompt_llm_eval_observability.txt - LLM evaluation
- default_best_practices_prompt_data_engineering.txt - Data pipelines
- default_best_practices_prompt_data_lakes.txt - Data lakes
- default_best_practices_prompt_polars.txt - Polars/DuckDB
- default_best_practices_prompt_excel.txt - Excel automation
- default_best_practices_prompt_serverless_edge.txt - Edge computing
- default_best_practices_prompt_kubernetes_ai.txt - Kubernetes AI
- default_best_practices_prompt_security.txt - Security engineering
- default_best_practices_prompt_hardware.txt - Hardware development
- default_best_practices_prompt_unreal.txt - Unreal Engine
- default_best_practices_prompt_solana.txt - Solana blockchain
- default_best_practices_prompt_cosmos.txt - Cosmos blockchain
- default_best_practices_prompt_react_native.txt - React Native
Prompts support dynamic variables:
- {chunk_size} - Replaced with configured chunk size
Example in prompt:
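As a rough sketch of how such a substitution could work (not the farm's actual implementation), the Python below replaces `{name}` placeholders using a regex. `render_prompt` is a hypothetical helper, and leaving unknown placeholders untouched is an assumed design choice.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_prompt(template, variables):
    """Replace {name} placeholders that have a known value; leave others alone."""

    def substitute(match):
        name = match.group(1)
        # Unknown placeholders are kept verbatim instead of raising an error.
        return str(variables[name]) if name in variables else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


print(render_prompt("Fix about {chunk_size} problems per pass.", {"chunk_size": 50}))
# → Fix about 50 problems per pass.
```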
- Problem Generation: Runs type-check, lint, and test commands
- Agent Launch: Starts N agents in tmux panes
- Task Distribution: Each agent selects random problem chunks
- Conflict Prevention: Marks completed problems with [COMPLETED]
- Progress Tracking: Commits changes and tracks error reduction
- Guide Distribution: Copies best practices guides to project
- Progress Document: Agents create/update tracking document
- Systematic Implementation: Works through guidelines incrementally
- Accurate Tracking: Maintains honest completion percentages
- Session Continuity: Progress persists between runs
- Settings Backup: Automatically backs up Claude settings before starting
  - Creates timestamped backups in ~/.claude/backups/
  - Keeps last 10 backups with automatic rotation
  - Full backup option with --full-backup flag
- Settings Restore: Restores from backup if corruption detected
  - Automatic detection of settings errors
  - Seamless restoration during agent startup
- File Locking: Uses file locks to prevent concurrent access issues
  - Lock files in ~/.claude/.agent_farm_launch.lock
  - 30-second stale lock detection and cleanup
  - Prevents concurrent Claude launches that could corrupt settings
- Permission Management: Automatically fixes file permissions
  - Sets 600 permissions on settings.json
  - Sets 700 permissions on .claude directory
  - Ensures proper file ownership
- Atomic Operations: Uses atomic file operations for safety
- Emergency Cleanup: Handles unexpected exits gracefully
  - Cleans up tmux sessions
  - Removes lock files
  - Deletes state files
- Launch Locking: Prevents concurrent Claude launches with lock files
- Dynamic Stagger: Adjusts launch delays based on error detection
  - Doubles stagger time when corruption detected
  - Gradual increase based on agent count
- Agent Limits: Enforces max_agents limit (default: 50)
- Instance Randomization: Adds unique seeds to each agent for better work distribution
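The timestamped backup with rotation mentioned above could look roughly like this in Python. It is a sketch under assumptions: the `settings_<stamp>.json` naming scheme and the `backup_settings` helper are hypothetical, and only the "keep the last 10" rule comes from the description.

```python
import shutil
import time
from pathlib import Path

MAX_BACKUPS = 10  # the farm keeps the last 10 backups


def backup_settings(settings_path, backup_dir, stamp=None):
    """Copy the settings file into backup_dir and prune the oldest backups."""
    settings_path, backup_dir = Path(settings_path), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = stamp or time.strftime("%Y%m%d_%H%M%S")
    target = backup_dir / f"settings_{stamp}.json"  # hypothetical naming scheme
    shutil.copy2(settings_path, target)

    # Timestamped names sort chronologically, so the oldest come first.
    backups = sorted(backup_dir.glob("settings_*.json"))
    for old in backups[:-MAX_BACKUPS]:
        old.unlink()
    return target
```

Sorting by filename works here only because the timestamp format is fixed-width and ordered from year down to second.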
The Python script includes a real-time monitoring dashboard that shows:
- Agent Status: Working, Idle, Context Low, Error, Disabled
- Context Usage: Percentage of agent's context window used
- Last Activity: Time since the agent last did something
- Last Error: Most recent error message (if any)
- Session Stats: Total restarts, uptime, active agents
- Cycle Count: Number of work cycles completed
The monitoring dashboard runs in the tmux controller window:
- 🟡 starting - Agent initializing
- 🟢 working - Actively processing
- 🔵 ready - Waiting for input
- 🟡 idle - Completed work
- 🔴 error - Problem detected
- ⚫ unknown - State unclear
When --auto-restart is enabled:
- Monitors agent health continuously
- Restarts agents that hit errors or go idle
- Implements exponential backoff to prevent restart loops
  - Initial wait: 10 seconds
  - Doubles with each restart (max 5 minutes)
- Tracks restart count per agent
- Disables agents after max_errors threshold
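The backoff schedule above (10 seconds initially, doubling per restart, capped at 5 minutes) works out as follows. `restart_delay` is a hypothetical helper name, and counting restarts from zero is an assumption.

```python
INITIAL_WAIT_SECONDS = 10
MAX_WAIT_SECONDS = 5 * 60  # cap of 5 minutes


def restart_delay(restart_count):
    """Seconds to wait before a restart, given how many restarts came before."""
    return min(INITIAL_WAIT_SECONDS * 2 ** restart_count, MAX_WAIT_SECONDS)


print([restart_delay(n) for n in range(7)])
# → [10, 20, 40, 80, 160, 300, 300]
```

With these numbers the cap is reached on the sixth restart, after which every further restart waits the full 5 minutes.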
The system writes monitor state to .claude_agent_farm_state.json in the project directory. This file contains:
- Agent statuses and health metrics
- Session information
- Runtime statistics
Structure:
External tools can read this file to monitor the farm's progress.
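A small external monitor might look like the Python sketch below. The file name comes from the text above, but the JSON layout is an assumption: the code expects a top-level `agents` mapping whose entries each carry a `status` field, which may not match the real structure.

```python
import json
from collections import Counter
from pathlib import Path

STATE_FILE = ".claude_agent_farm_state.json"


def summarize(project_dir):
    """Count agents per status.

    Assumes a hypothetical {"agents": {agent_id: {"status": ...}}} layout.
    """
    state = json.loads((Path(project_dir) / STATE_FILE).read_text())
    agents = state.get("agents", {})
    return Counter(info.get("status", "unknown") for info in agents.values())
```

Calling `summarize(".")` from the project root would then return something like a count of working, idle, and errored agents, provided the assumed layout holds.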
- Verify cc alias: alias | grep cc
- Test Claude Code manually: cc
- Check API key configuration
- Increase --wait-after-cc timing
- Use --full-backup flag if settings corruption suspected
- Validate JSON syntax
- Ensure all paths are correct
- Check command availability (mypy, ruff, etc.)
- Verify best practices guides exist
- Each agent uses ~500MB RAM
- Reduce agent count if needed
- Monitor with htop
- Check available disk space for logs
- Respect max_agents limit (default: 50)
- System automatically backs up settings
- Restores from backup on error detection
- Manual restore: Check ~/.claude/backups/
- Use --full-backup for comprehensive backup
- State File: Check .claude_agent_farm_state.json for agent status
- Lock Files: Look for .agent_farm_launch.lock in ~/.claude/
- Backup Directory: ~/.claude/backups/ contains settings backups
- Emergency Cleanup: Ctrl+C triggers graceful shutdown
- Manual tmux: tmux kill-session -t claude_agents to force cleanup
- Define your tech stack config (see 34 examples)
- Create appropriate prompts (follow 37 existing patterns)
- Add best practices guides (optional, see 35 examples)
- Configure problem commands (type-check, lint, test)
- Set appropriate chunk sizes (20-75 based on complexity)
- Test with small agent counts first
- Start small (5-10 agents) and scale up
- Maximum 50 agents by default (configurable via max_agents)
- Increase stagger time for many agents
- Consider running in batches for 50+ agents
- Use --no-monitor for headless operation
- Monitor system resources (RAM, CPU)
- Adjust chunk sizes based on performance
Configure custom git branches and remotes in your config:
- Chunk Size: Smaller chunks (20-30) for complex tasks, larger (50-75) for simple fixes
  - Recommended sizes by stack: Python (50), Next.js (50), Rust (30), Go (40), Java (35)
- Stagger Time: Increase for many agents or slow systems
  - Default 10s prevents settings corruption
  - Automatically doubles on error detection
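- Context Threshold: Agents restart when remaining context drops below this percentage, so higher values restart agents sooner and lower values (15-20%) let them run longer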
- Idle Timeout: Adjust based on task complexity
- Check Interval: Balance between responsiveness and CPU usage
- Max Agents: Increase beyond 50 for powerful systems
- Wait After CC: Default 15s ensures Claude is fully ready
  - Increase if seeing startup failures
- All long-running operations can be interrupted with Ctrl+C
- Graceful shutdown preserves work in progress
- Emergency cleanup on unexpected exits
- Detects multiple error conditions:
  - Settings corruption
  - Authentication failures
  - Welcome/setup screens
  - Command not found errors
  - Parse errors (TypeError, SyntaxError, JSONDecodeError)
  - Login prompts and API key issues
- Automatic recovery attempts before disabling agents
- Preserves other working agents during recovery
- {chunk_size} - Replaced with configured chunk size
- Supports regex patterns for flexible prompt templates
- Only allows letters, numbers, hyphens, and underscores
- Prevents tmux errors from invalid characters
- Intelligently waits for shell prompts before sending commands
- --fast-start flag skips prompt detection for faster launches
- Handles both bash and zsh prompts
- Interruptible confirmation prompts (Ctrl+C uses default)
- Safe defaults for all destructive operations
- Clear messaging for all user interactions
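The session-name rule listed above (letters, numbers, hyphens, and underscores only) can be expressed as a single regex check. This is an illustrative sketch; `is_valid_session_name` is a hypothetical helper, and rejecting the empty string is an assumption.

```python
import re

# Letters, digits, hyphens, and underscores; at least one character.
VALID_SESSION_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_session_name(name):
    """Return True if `name` uses only characters allowed in session names."""
    return VALID_SESSION_NAME.fullmatch(name) is not None


print(is_valid_session_name("claude_agents"))  # → True
print(is_valid_session_name("my.session"))     # → False
```

Characters such as `.` and `:` are worth rejecting because tmux uses them as separators when addressing windows and panes.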
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests if applicable
- Update documentation
- Submit a pull request
- Create config file in configs/ (34 examples to follow)
- Add prompts in prompts/ (37 examples available)
- Write best practices guide in best_practices_guides/ (35 examples)
- Add setup script in tool_setup_scripts/ (24 examples)
- Test thoroughly with various project types
- Update this README with your addition
Created by Jeffrey Emanuel
MIT License - see LICENSE file
- Always backup your code before running
- Review changes before committing
- Start with few agents to test
- Monitor first runs to ensure proper behavior
- Check resource usage for large agent counts
- Verify cc alias is properly configured
- Ensure git is configured with proper credentials
- Respect agent limits (default max: 50)
- Claude settings are automatically backed up and restored
- Lock files prevent concurrent launches and corruption
- State files enable external monitoring tools
- Monitor state file (.claude_agent_farm_state.json) for external integrations
- tmux session logs for debugging agent issues
- Git commit history for tracking improvements
- Manual settings restore from ~/.claude/backups/
- Lock file cleanup: rm ~/.claude/.agent_farm_launch.lock
- Emergency session cleanup: tmux kill-session -t claude_agents
- Use SSDs for better file I/O performance
- Allocate 500MB RAM per agent
- Consider network bandwidth for API calls
- Monitor CPU usage with htop during runs
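For a quick capacity estimate based on the 500MB-per-agent guideline, a back-of-the-envelope calculation in Python is enough. `agents_that_fit` is a hypothetical helper; it ignores CPU, disk, and API rate limits.

```python
def agents_that_fit(free_ram_mb, per_agent_mb=500, max_agents=50):
    """Largest agent count that fits in free RAM, capped at max_agents."""
    return max(0, min(free_ram_mb // per_agent_mb, max_agents))


print(agents_that_fit(16_000))  # → 32
print(agents_that_fit(64_000))  # → 50
```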
Happy farming! 🚜 May your code be clean and your agents productive.
| Category | Stacks | Examples |
|---|---|---|
| Web Development | 8 | Next.js, Angular, Flutter, Laravel, React Native |
| Systems & Languages | 7 | Python, Rust, Go, Java, C++ |
| DevOps & Infrastructure | 6 | Terraform, Kubernetes, Ansible |
| Data & AI | 8 | GenAI/LLM, Data Lakes, PostgreSQL 17, Polars |
| Specialized | 5 | Security, Hardware, Blockchain |
| Total | 34 | |

| Resource | Count |
|---|---|
| Configuration Files | 37 |
| Prompt Templates | 37 |
| Best Practices Guides | 35 |
| Tool Setup Scripts | 24 |