Show HN: An AI agent that debugs your LLM app and submits pull requests
Kaizen Agent is an AI debugging engineer that continuously tests, analyzes, and improves your AI agents and LLM applications. It runs multiple tests in parallel, analyzes failures, automatically fixes code and prompts, and opens pull requests with the improvements, all powered by AI.
Here's how it works at a high level:
- **Parallel Testing**: Runs multiple test cases simultaneously across your AI agents
- **Multiple Input Types**: Supports strings, dictionaries, objects, and inline objects as inputs
- **Dynamic Loading**: Automatically imports dependencies and referenced files
- **Intelligent Detection**: Uses AI to analyze test failures and identify root causes
- **Context Understanding**: Examines code, prompts, and test outputs to understand issues
- **Pattern Recognition**: Identifies common problems in AI agent implementations
- **Code Improvements**: Automatically fixes code issues, bugs, and logic problems
- **Prompt Optimization**: Improves prompts for better AI agent performance
- **Best Practices**: Applies AI development best practices and patterns
- **Multiple Output Evaluation**: Evaluates return values, variables, and complex outputs
- **LLM-based Assessment**: Uses AI to assess the quality and correctness of responses
- **Continuous Improvement**: Iteratively improves until tests pass
5. Integration & Deployment
- **Pull Request Creation**: Automatically creates PRs with fixes and improvements
- **Version Control**: Integrates with GitHub for seamless deployment
- **Documentation**: Updates documentation and comments as needed
This workflow ensures your AI agents are robust, reliable, and continuously improving through automated testing and fixing cycles.
```bash
git clone https://github.com/Kaizen-agent/kaizen-agent.git
cd kaizen-agent
pip install -e ".[dev]"
```
Before using Kaizen Agent, you need to set up your environment variables for API access.
Required Environment Variables
- **GOOGLE_API_KEY** (Required): Your Google AI API key for LLM operations
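For example, you might export the key for the current shell session or store it in a `.env` file in your project directory, as the example walkthroughs below do (the value shown is a placeholder):

```bash
# Export for the current shell session
export GOOGLE_API_KEY=your_google_api_key_here

# Or store it in a .env file next to your test configuration
echo "GOOGLE_API_KEY=your_google_api_key_here" > .env
```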
Kaizen Agent comes with two example agents to help you get started quickly. These examples demonstrate how to test AI agents and LLM applications.
Example 1: Summarizer Agent
The summarizer agent demonstrates basic text summarization functionality:
Navigate to the example:
```bash
cd test_agent/summarizer_agent
```
Set up your environment:
```bash
# Create .env file with your Google API key
echo "GOOGLE_API_KEY=your_google_api_key_here" > .env
```
Run the tests:
```bash
# From the summarizer_agent directory
kaizen test-all --config test_config.yaml --auto-fix
```
Example 2: Email Agent
The email agent demonstrates email improvement functionality:
Navigate to the example:
```bash
cd test_agent/email_agent
```
Set up your environment:
```bash
# Create .env file with your Google API key
echo "GOOGLE_API_KEY=your_google_api_key_here" > .env
```
Run the tests:
```bash
# From the email_agent directory
kaizen test-all --config test_config.yaml --auto-fix
```
Creating Your Own Test Configuration
Navigate to your project directory:
```bash
cd path/to/your/ai-agent-project
```
Create a test configuration file (YAML):
```yaml
name: My AI Agent Test Suite
file_path: path/to/your/agent.py
description: "Test suite for my AI agent"

# Test steps
steps:
  - name: Test Case 1
    input:
      method: run
      input: "test input"
    description: "Test basic functionality"
  - name: Test Case 2
    input:
      method: process_data
      input: {"data": [1, 2, 3]}
    description: "Test data processing"
```
Run tests with auto-fix:
```bash
# From your project directory
kaizen test-all --config test_config.yaml --auto-fix --create-pr
```
```bash
# Run all tests in configuration
kaizen test-all --config <config_file> [options]

# Check environment setup
kaizen setup check-env [--features core github optional]

# Test GitHub access and permissions
kaizen test-github-access --config <config_file> [--repo owner/repo]

# Comprehensive GitHub access diagnostics
kaizen diagnose-github-access --config <config_file> [--repo owner/repo]
```
```bash
# Navigate to the directory containing your config file first
cd path/to/your/project

# Then run the kaizen command
kaizen test-all --config <config_file> [--auto-fix] [--create-pr] [--max-retries <n>] [--base-branch <branch>] [--save-logs] [--verbose]
```
Options:
- `--config, -c`: Path to the test configuration file (required)
- `--auto-fix`: Enable automatic code fixing
- `--create-pr`: Create a pull request with fixes
- `--max-retries`: Maximum number of fix attempts (default: 1)
- `--base-branch`: Base branch for the pull request (default: main)
- `--pr-strategy`: Strategy for when to create PRs (default: ANY_IMPROVEMENT)
- `--test-github-access`: Test GitHub access before running tests
- `--save-logs`: Save detailed test logs in JSON format
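Putting several of these options together, a typical run that auto-fixes failures, opens a pull request, and saves logs might look like this (the config file name is just an example):

```bash
kaizen test-all \
  --config test_config.yaml \
  --auto-fix \
  --create-pr \
  --max-retries 3 \
  --base-branch main \
  --save-logs
```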
```bash
# Navigate to the directory containing your config file first
cd path/to/your/project

# Test GitHub access with config file
kaizen test-github-access --config test_config.yaml

# Test specific repository
kaizen test-github-access --repo owner/repo-name --base-branch main

# Comprehensive diagnostics
kaizen diagnose-github-access --repo owner/repo-name
```
The test configuration file (YAML) supports the following structure:
```yaml
name: Test Suite Name
file_path: path/to/main/code.py
description: "Description of the test suite"

# Package dependencies to import before test execution
dependencies:
  - "requests>=2.25.0"
  - "pandas==1.3.0"
  - "numpy"

# Local files to import (relative to config file location)
referenced_files:
  - "utils/helper.py"
  - "models/data_processor.py"

# Files that should be fixed if tests fail
files_to_fix:
  - "main_code.py"
  - "utils/helper.py"

# Test configuration
agent_type: "default"
auto_fix: true
create_pr: false
max_retries: 3
base_branch: "main"
pr_strategy: "ANY_IMPROVEMENT"

# Test regions to execute
regions:
  - "test_function"
  - "test_class"

# Test steps
steps:
  - name: Test Case Name
    input:
      method: run
      input: "test input"
    description: "Description of the test case"
    timeout: 30
    retries: 2

# Evaluation criteria
evaluation:
  evaluation_targets:
    - name: summary_text
      source: variable
      criteria: "Should include key insights from the data"
      description: "Summary should highlight important patterns"
      weight: 1.0
    - name: return
      source: return
      criteria: "Should be a dictionary with 'status' and 'results' keys"
      description: "Return value should have expected structure"
      weight: 1.0

# Metadata
metadata:
  version: "1.0.0"
  author: "Test Author"
  created_at: "2024-01-01T00:00:00Z"
```

### Configuration Fields
- `name`: Name of the test suite
- `file_path`: Path to the main code file
- `description`: Description of the test suite
- `dependencies`: List of package dependencies
- `referenced_files`: List of local files to import
- `files_to_fix`: List of files that should be fixed if tests fail
- `agent_type`: Type of agent to use (default: "default")
- `auto_fix`: Whether to enable automatic fixing
- `create_pr`: Whether to create pull requests
- `max_retries`: Maximum number of fix attempts
- `base_branch`: Base branch for pull requests
- `pr_strategy`: Strategy for when to create PRs
- `regions`: List of code regions to test
- `steps`: List of test steps
- `evaluation`: Evaluation criteria and settings
- `metadata`: Additional metadata

### Defining Code Regions

To test specific parts of your code, you need to define regions using special comment markers. Kaizen Agent will only test the code within these marked regions.

#### Region Markers

Use these comment markers to define testable regions in your Python code:
```python
# kaizen:start:{region_name}
# Your code here - this will be tested
class MyAgent:
    def __init__(self):
        self.name = "My Agent"

    def process_data(self, data):
        # This method will be tested
        return {"status": "success", "data": data}
# kaizen:end:{region_name}
```
Example: Complete Agent with Regions
```python
import json
from typing import Dict, Any

# kaizen:start:customer_support_agent
class CustomerSupportAgent:
    def __init__(self):
        self.issue_analysis = ""
        self.improvement_recommendations = ""
        self.analysis_results = {}

    def analyze_customer_issues(self, *inputs):
        """Analyze customer issues with multiple inputs."""
        # Process different input types
        user_query = None
        customer_data = None
        customer_feedback = None
        for input_item in inputs:
            if isinstance(input_item, str):
                user_query = input_item
            elif isinstance(input_item, dict):
                customer_data = input_item
            elif hasattr(input_item, 'text'):
                customer_feedback = input_item

        # Set variables that will be tracked for evaluation
        self.issue_analysis = "Customers are experiencing performance issues."
        self.improvement_recommendations = "Implement performance optimization."
        self.analysis_results = {
            "satisfaction_score": 2.5,
            "main_issues": ["performance", "stability"],
            "recommendations": ["optimize code", "add monitoring"]
        }

        return {
            "status": "completed",
            "analysis": self.issue_analysis,
            "recommendations": self.improvement_recommendations,
            "details": self.analysis_results
        }
# kaizen:end:customer_support_agent

# This code outside the region won't be tested
def utility_function():
    return "This won't be tested"
```
In your YAML configuration, reference the region name:
```yaml
# Test regions to execute
regions:
  - "customer_support_agent"  # Matches the region name in your code
```
- **Descriptive Names**: Use clear, descriptive region names (e.g., `customer_support_agent`, `data_processor`)
- **Complete Classes**: Include entire classes or functions within a single region
- **Avoid Nesting**: Don't nest regions within other regions
- **Clean Boundaries**: Place markers at logical code boundaries (class/function level)
- **Consistent Naming**: Use the same naming convention across your codebase
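For example, two sibling regions defined at function level, each with a descriptive name and no nesting (the function names here are illustrative, not part of Kaizen Agent):

```python
# kaizen:start:data_processor
def process_records(records):
    # The whole function sits inside a single region
    return [r for r in records if r.get("valid")]
# kaizen:end:data_processor

# kaizen:start:report_generator
def generate_report(results):
    # A sibling region; regions are never nested inside one another
    return {"count": len(results), "results": results}
# kaizen:end:report_generator
```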
```yaml
steps:
  - name: User Creation Test
    input:
      method: create_user
      input: {"name": "John Doe", "email": "[email protected]"}
    expected_output:
      status: "success"
      user_id: "user_123"
      message: "User created successfully"
    description: "Test user creation API response"
```
3. Advanced Evaluation Targets
```yaml
evaluation:
  evaluation_targets:
    - name: summary_text
      source: variable
      criteria: "Should include clarification about the compound's instability"
      description: "The summary text should explain stability concerns"
      weight: 1.0
    - name: return
      source: return
      criteria: "Should be a dictionary with 'status' and 'summary' keys"
      description: "The return value should have the expected structure"
      weight: 1.0
```
Multiple Inputs & Outputs
Kaizen Agent supports advanced multiple inputs and multiple outputs evaluation, making it perfect for complex agent workflows and multi-step processes.
Kaizen Agent supports four types of inputs that can be combined in any configuration:
```yaml
steps:
  - name: String Input Test
    input:
      method: process_text
      input:
        - name: user_query
          type: string
          value: "What are the main issues customers are reporting?"
```
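Dictionary and inline-object inputs follow the same pattern; below is a brief sketch based on the comprehensive example later in this section (the method name, class path, and values are illustrative):

```yaml
steps:
  - name: Mixed Input Test
    input:
      method: analyze_customer_issues
      input:
        - name: customer_data
          type: dict
          value:
            customer_id: "CUST123"
            satisfaction_score: 2.5
        - name: customer_feedback
          type: inline_object
          class_path: "input_types.CustomerFeedback"
          attributes:
            text: "Product is too slow and crashes frequently"
            priority: "high"
```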
Evaluate multiple outputs from your agents including return values and specific variables:
```yaml
evaluation:
  evaluation_targets:
    - name: issue_analysis
      source: variable
      criteria: "Should identify the root cause of customer complaints"
      description: "The analysis should explain why customers are dissatisfied"
      weight: 1.0
    - name: recommended_action
      source: variable
      criteria: "Should suggest specific improvements or solutions"
      description: "The recommendation should be actionable and specific"
      weight: 1.0
    - name: return
      source: return
      criteria: "Should be a dictionary with 'status' and 'summary' keys"
      description: "The return value should have the expected structure"
      weight: 1.0
```
Here's a comprehensive example showing multiple inputs and outputs:
```yaml
name: Customer Support Analysis Test
agent_type: dynamic_region
file_path: support_agent.py
description: Test customer support analysis with multiple input types and output evaluation
evaluation:
  evaluation_targets:
    - name: issue_analysis
      source: variable
      criteria: "Should identify the main customer pain points and their causes"
      description: "Analysis should cover customer satisfaction factors"
      weight: 1.0
    - name: improvement_recommendations
      source: variable
      criteria: "Should provide specific actionable recommendations for improvement"
      description: "Recommendations should be practical and business-focused"
      weight: 1.0
    - name: return
      source: return
      criteria: "Should return a structured response with 'status', 'analysis', and 'recommendations' fields"
      description: "Return value should be well-structured for API consumption"
      weight: 1.0
regions:
  - SupportAgent
max_retries: 2
files_to_fix:
  - support_agent.py
steps:
  - name: Complex Customer Query
    description: Test handling of mixed input types with multiple outputs
    input:
      file_path: support_agent.py
      method: analyze_customer_issues
      input:
        # String inputs
        - name: user_query
          type: string
          value: "What are the main issues customers are reporting?"
        # Dictionary inputs
        - name: customer_data
          type: dict
          value:
            customer_id: "CUST123"
            product: "Premium Widget"
            subscription_tier: "Enterprise"
            support_tickets: 3
            satisfaction_score: 2.5
        # Inline object inputs (recommended)
        - name: customer_feedback
          type: inline_object
          class_path: "input_types.CustomerFeedback"
          attributes:
            text: "Product is too slow and crashes frequently"
            tags: ["performance", "stability", "frustration"]
            priority: "high"
            source: "support_ticket"
    evaluation:
      type: llm
```
Agent Implementation Example
To use multiple inputs and outputs, implement your agent like this:
```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CustomerFeedback:
    text: str
    tags: List[str]
    priority: Optional[str] = None
    source: Optional[str] = None

    def to_dict(self) -> dict:
        return {
            'text': self.text,
            'tags': self.tags,
            'priority': self.priority,
            'source': self.source
        }

class SupportAgent:
    def __init__(self):
        # Variables that will be tracked for evaluation
        self.issue_analysis = ""
        self.improvement_recommendations = ""
        self.analysis_results = {}

    def analyze_customer_issues(self, *inputs):
        """Analyze customer issues with multiple inputs."""
        # Process different input types
        user_query = None
        customer_data = None
        customer_feedback = None
        for input_item in inputs:
            if isinstance(input_item, str):
                user_query = input_item
            elif isinstance(input_item, dict):
                customer_data = input_item
            elif hasattr(input_item, 'text'):  # CustomerFeedback object
                customer_feedback = input_item

        # Set variables that will be tracked for evaluation
        self.issue_analysis = "Customers are experiencing performance issues and system crashes, leading to low satisfaction scores."
        self.improvement_recommendations = "Implement performance optimization and add crash recovery mechanisms. Consider upgrading server infrastructure."
        self.analysis_results = {
            "satisfaction_score": 2.5,
            "main_issues": ["performance", "stability"],
            "recommendations": ["optimize code", "add monitoring", "improve error handling"]
        }

        # Return structured result
        return {
            "status": "completed",
            "analysis": self.issue_analysis,
            "recommendations": self.improvement_recommendations,
            "details": self.analysis_results
        }
```
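To sanity-check this agent outside of Kaizen Agent, you can call it directly with the same kinds of inputs the test configuration supplies; this sketch uses only the classes defined above and no Kaizen APIs:

```python
# Plain-Python smoke test of the agent defined above
agent = SupportAgent()
feedback = CustomerFeedback(
    text="Product is too slow and crashes frequently",
    tags=["performance", "stability"],
    priority="high",
)
result = agent.analyze_customer_issues(
    "What are the main issues customers are reporting?",  # string input
    {"customer_id": "CUST123", "satisfaction_score": 2.5},  # dict input
    feedback,  # object input
)
print(result["status"], result["analysis"])
```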
The --save-logs option allows you to save detailed test execution logs in JSON format for later analysis and debugging.
```bash
# Navigate to the directory containing your config file first
cd path/to/your/project

# Run tests with detailed logging
kaizen test-all --config test_config.yaml --save-logs

# Combine with other options
kaizen test-all --config test_config.yaml --auto-fix --create-pr --save-logs
```
When --save-logs is enabled, two files are created in the test-logs/ directory: