Compare LLMs in one click - an open-source tool for evaluating and comparing Large Language Model responses across providers, with latency, cost, and quality metrics.
- Parallel Comparison: Test multiple LLM models simultaneously
- Comprehensive Metrics: Track latency, token usage, and cost
- Quality Scoring: Built-in scoring for length simplicity, readability, and JSON validity
- Cost Transparency: Real-time pricing comparison across providers
- Extensible: Easy to add new LLM providers
- Node.js 18+
- npm/yarn
- API keys for LLM providers (OpenAI, Anthropic)
- Clone the repository

  ```bash
  git clone https://github.com/your-org/duelr.git
  cd duelr
  ```

- Install dependencies (e.g. `npm install` or `yarn install`)
- Set up environment variables

  ```bash
  cp .env.example .env.local
  ```

  Add your API keys to `.env.local`:

  ```
  OPENAI_API_KEY=your_openai_api_key_here
  ANTHROPIC_API_KEY=your_anthropic_api_key_here
  ```

- Start the development server (e.g. `npm run dev`)
- Open your browser and navigate to http://localhost:3000
- Enter your prompt in the text area
- Select the models you want to compare (OpenAI GPT-4o, Claude Sonnet 4, etc.)
- Click "Run Comparison" to execute parallel requests
- Review results in side-by-side cards showing:
- Response text with copy button
- Latency measurements
- Token usage and costs
- Quality scores (simplicity, readability, JSON validity)
- Frontend: Next.js 15 with React 19, Tailwind CSS, Shadcn/ui
- API Routes: Next.js API routes for LLM integrations
- Providers: Modular provider system (OpenAI, Anthropic)
- Scoring: Built-in heuristic algorithms for response evaluation
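Under the hood, a comparison is a single request from the client to the compare route, which fans out to each selected provider in parallel. The sketch below is only illustrative: the request body and result fields are assumptions, and the authoritative types live in lib/types.ts.

```ts
// Rough client-side sketch of invoking the compare route.
// The body and result shapes are illustrative; see lib/types.ts for the real types.
async function runComparison(prompt: string, models: string[]) {
  const res = await fetch("/api/compare", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, models }),
  });
  if (!res.ok) throw new Error(`Comparison failed with status ${res.status}`);
  // Expect one entry per model: response text, latency, token usage, cost, and quality scores.
  return res.json();
}
```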
- ✅ OpenAI: GPT-4o, GPT-4o-mini, GPT-4.1-mini
- ✅ Anthropic: Claude Haiku 3.5, Claude Sonnet 4, Claude Opus 4
- 🚧 Groq: Coming soon
- 🚧 Mistral: Coming soon
- Length Simplicity: tokens ÷ sentences - measures verbosity
- Readability: Flesch reading ease score - proxy for clarity
- JSON Validity: whether the response parses as valid JSON (for structured output prompts)
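A rough TypeScript sketch of these heuristics is shown below. The tokenization and syllable counting are deliberately naive stand-ins; the app's actual scoring code lives in the repository and may count tokens differently.

```ts
// Naive sketches of the three scoring heuristics; illustrative only.

// Length simplicity: tokens ÷ sentences (lower = more concise).
export function lengthSimplicity(text: string): number {
  const tokens = text.trim().split(/\s+/).filter(Boolean).length;
  const sentences = Math.max(1, text.split(/[.!?]+/).filter((s) => s.trim()).length);
  return tokens / sentences;
}

// Flesch reading ease (higher = easier to read, roughly 0-100).
export function readability(text: string): number {
  const sentences = Math.max(1, text.split(/[.!?]+/).filter((s) => s.trim()).length);
  const words = text.trim().split(/\s+/).filter(Boolean);
  const wordCount = Math.max(1, words.length);
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  return 206.835 - 1.015 * (wordCount / sentences) - 84.6 * (syllables / wordCount);
}

// Very rough syllable estimate based on vowel groups.
function countSyllables(word: string): number {
  const groups = word.toLowerCase().match(/[aeiouy]+/g);
  return Math.max(1, groups ? groups.length : 0);
}

// JSON validity: pass/fail parse check for structured-output prompts.
export function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}
```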
- Create a new provider file in lib/providers/
- Implement the LLMResponse interface (see the sketch after this list)
- Add provider configuration to lib/types.ts
- Update the API route in app/api/compare/route.ts
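In practice, a provider is a small async function that calls the vendor's API and normalizes the result. The sketch below is a minimal example only: the endpoint, response parsing, and field names are assumptions, and the real LLMResponse interface in lib/types.ts is authoritative.

```ts
// lib/providers/my-provider.ts (sketch only; align field names with the real
// LLMResponse interface defined in lib/types.ts).
import type { LLMResponse } from "@/lib/types"; // "@/" path alias assumed

export async function callMyProvider(
  model: string,
  prompt: string,
  apiKey: string
): Promise<LLMResponse> {
  const start = Date.now();

  // Replace with the provider's actual SDK or REST endpoint.
  const res = await fetch("https://api.example-provider.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
  const data = await res.json();

  return {
    provider: "my-provider",
    model,
    text: data.choices?.[0]?.message?.content ?? "",
    promptTokens: data.usage?.prompt_tokens ?? 0,
    completionTokens: data.usage?.completion_tokens ?? 0,
    latencyMs: Date.now() - start,
  };
}
```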
Update the pricing table in lib/types.ts:
```ts
export const DEFAULT_PRICING: PricingTable = {
  "your-provider:model-name": 0.001, // USD per 1M tokens
  // ... other models
};
```
Cost = (prompt_tokens + completion_tokens) / 1_000_000 * price_per_1M_tokens
- 🟢 Green: < $0.001 per request
- 🟡 Yellow: $0.001 - $0.01 per request
- 🔴 Red: > $0.01 per request
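As a sketch, the cost formula above maps directly onto the DEFAULT_PRICING table; the helper name and import path below are illustrative:

```ts
import { DEFAULT_PRICING } from "@/lib/types"; // "@/" path alias assumed

// Illustrative helper: prices in DEFAULT_PRICING are USD per 1M tokens.
export function estimateCost(
  modelKey: string, // e.g. "your-provider:model-name"
  promptTokens: number,
  completionTokens: number
): number {
  const pricePerMillion = DEFAULT_PRICING[modelKey] ?? 0;
  return ((promptTokens + completionTokens) / 1_000_000) * pricePerMillion;
}

// Example: 500 prompt + 300 completion tokens at $2.50 per 1M tokens ≈ $0.002.
```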
- Length Simplicity: Lower = more concise
- Readability: Higher = easier to read (0-100 scale)
- JSON Validity: Pass/fail for structured outputs
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
We welcome contributions! Please see our Contributing Guide for details.
If you find Duelr useful, please consider:
- Starring the repository
- Reporting bugs and issues
- Suggesting new features
- Contributing code improvements
Built with ❤️ by the open source community

