Show HN: PreflightLLMCost – Try to Predict LLM API Costs Before Execution


A cost forecasting tool for LLM API calls, implementing research-based prediction algorithms to estimate token usage and costs before execution.

This project addresses the challenge of unpredictable LLM API costs in large-scale applications. It is not perfect, but it is an attempt to push preflight cost estimation forward using insights from recent academic research on predicting LLM response length.

The tool implements a three-tier prediction system that combines heuristic analysis, statistical modeling, and research-informed algorithms to provide cost estimates with confidence intervals.
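The cascade idea can be sketched as follows. This is an illustrative model of how a tiered predictor might be wired together, not PreflightLLMCost's actual internals: each tier returns an estimate plus a confidence score, and the highest tier that clears a threshold wins, with the heuristic tier as the unconditional fallback.

```python
# Hypothetical sketch of a tiered prediction cascade (names are illustrative,
# not the project's real API). Tiers are ordered from most to least
# sophisticated; each returns (estimated_tokens, confidence in [0, 1]).

def cascade_predict(prompt, tiers, min_confidence=0.7):
    for name, predict in tiers:
        estimate, confidence = predict(prompt)
        if confidence >= min_confidence:
            return name, estimate
    # Fall back to the last (heuristic) tier unconditionally
    name, predict = tiers[-1]
    return name, predict(prompt)[0]

# Toy tiers: tier 3 abstains (low confidence), tier 2 is confident enough.
tiers = [
    ("tier3_hidden_state", lambda p: (210.0, 0.4)),
    ("tier2_regression",   lambda p: (160.0, 0.8)),
    ("tier1_heuristic",    lambda p: (float(120 + 2 * len(p.split())), 1.0)),
]

method, tokens = cascade_predict("Summarize the attached report", tiers)
print(method, tokens)  # tier2_regression 160.0
```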

The implementation draws on several recent papers on LLM response length prediction.

Input Template → Template Sampler → Tokenizer → Prediction Engine → Cost Calculator → Output

| Component | Purpose | Implementation |
| --- | --- | --- |
| Template Sampler | Generate prompt variations | Jinja templates + CSV/JSON data |
| Tokenizer Engine | Count tokens accurately | tiktoken with model-specific encoders |
| Prediction Engine | Estimate completion length | 3-tier cascade system |
| Statistical Analysis | Quantify uncertainty | Bootstrap confidence intervals |
| Pricing Engine | Calculate costs | Multi-provider pricing with auto-updates |
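The first two pipeline stages can be sketched like this. The helper names are illustrative, a crude whitespace split stands in for tiktoken's model-specific encoders, and plain `{{var}}` substitution stands in for Jinja rendering:

```python
# Sketch of the sampling + counting stages (illustrative only): render a
# template against several variable rows, then count prompt tokens per sample.

def render(template, variables):
    # Minimal {{var}} substitution standing in for Jinja rendering
    for key, value in variables.items():
        template = template.replace("{{" + key + "}}", str(value))
    return template

def count_tokens(text):
    # Placeholder for tiktoken.encoding_for_model(...).encode(...)
    return len(text.split())

rows = [{"topic": "AI"}, {"topic": "vector databases"}]
samples = [render("Write an analysis of {{topic}}", row) for row in rows]
counts = [count_tokens(s) for s in samples]
print(samples)  # ['Write an analysis of AI', 'Write an analysis of vector databases']
print(counts)   # [5, 6]
```

The per-sample token counts then feed the prediction and pricing stages downstream.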

Tier 1: Enhanced Heuristics

  • Response type classification (8 categories)
  • Length complexity analysis
  • Controlled variance injection

Tier 2: Emergent Regression

  • Multi-dimensional feature extraction
  • L2-regularized optimization
  • Historical data learning

Tier 3: Hidden State Analysis

  • Global attribute encoding
  • LDPE-inspired corrections
  • Weighted confidence scoring
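One plausible reading of "weighted confidence scoring" (an assumed combination rule, not the tool's documented one) is blending per-tier estimates in proportion to their confidence:

```python
# Sketch of confidence-weighted blending of tier estimates (hypothetical rule).

def weighted_estimate(estimates):
    """estimates: list of (tokens, confidence) pairs from the active tiers."""
    total = sum(conf for _, conf in estimates)
    return sum(tokens * conf for tokens, conf in estimates) / total

combined = weighted_estimate([(150.0, 0.6), (180.0, 0.3), (120.0, 0.1)])
print(round(combined, 1))  # 156.0
```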
```shell
git clone https://github.com/aatakansalar/preflightllmcost
cd preflightllmcost
pip install -e .
```

```shell
pip install preflightllmcost
```
```shell
# Simple prediction
preflightllmcost predict "Summarize {{content}}" \
  --variables '{"content": "sample text"}' \
  --model gpt-3.5-turbo

# CSV data processing
preflightllmcost predict "Analyze {{task}}" \
  --data examples/sample_data.csv \
  --model gpt-4 \
  --budget 10.00

# Enhanced prediction with academic features
preflightllmcost predict "Reason through {{problem}}" \
  --variables '{"problem": "complex analysis"}' \
  --tier2 --tier3 \
  --accuracy 0.10 \
  --confidence 0.99
```
```python
from preflightllmcost import CostPredictor, TemplateConfig, PredictionConfig

# Configuration
template_config = TemplateConfig(
    template="Write analysis of {{topic}}",
    variables={"topic": ["AI", "ML", "DL"]},
    sample_count=50
)
prediction_config = PredictionConfig(
    model="gpt-4-turbo",
    use_tier2=True,   # Enable regression
    use_tier3=True,   # Enable hidden state analysis
    accuracy_target=0.15
)

# Execute prediction
predictor = CostPredictor(prediction_config)
report = predictor.predict_cost(template_config, budget=5.0)
print(f"Expected cost: ${report.cost_usd.mean:.6f}")
print(f"Budget exceeded: {report.budget_exceeded}")
```
```
Cost Prediction Report
┌───────────────────┬───────────┬───────────┬────────────────────────┬────────────┐
│ Metric            │ Mean      │ Std Dev   │ 95% CI                 │ Worst Case │
├───────────────────┼───────────┼───────────┼────────────────────────┼────────────┤
│ Prompt Tokens     │ 45.2      │ 8.1       │ (42.1, 48.3)           │ 61.4       │
│ Completion Tokens │ 152.8     │ 23.4     │ (144.2, 161.4)         │ 199.6      │
│ Cost (USD)        │ $0.000891 │ $0.000127 │ ($0.000851, $0.000931) │ $0.001145  │
└───────────────────┴───────────┴───────────┴────────────────────────┴────────────┘
Model: gpt-3.5-turbo | Method: enhanced_heuristic | Budget: ✅ Within limits
```
```json
{
  "cost_usd": {
    "mean": 0.000891,
    "confidence_interval": [0.000851, 0.000931],
    "worst_case": 0.001145
  },
  "prediction_method": "enhanced_heuristic",
  "budget_exceeded": false
}
```
  • Variables: JSON object with fixed or list values
  • CSV/JSON Files: Real data for template rendering
  • Variable Lengths: Synthetic text generation by character count
  • Accuracy Target: MAPE threshold (0.08-0.25)
  • Confidence Level: Statistical confidence (0.95-0.999)
  • Tier Selection: Enable/disable prediction methods
  • Bootstrap Samples: Statistical robustness (200-1000)
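The bootstrap option above can be sketched in a few lines. This is a generic percentile bootstrap over per-sample cost estimates (illustrative numbers, not the tool's code):

```python
# Percentile-bootstrap confidence interval over per-sample cost estimates.
import random

def bootstrap_ci(values, n_resamples=1000, level=0.95, seed=42):
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(values) for _ in values]
        means.append(sum(resample) / len(resample))
    means.sort()
    lo = means[int((1 - level) / 2 * n_resamples)]
    hi = means[int((1 + level) / 2 * n_resamples) - 1]
    return lo, hi

costs = [0.00081, 0.00092, 0.00088, 0.00095, 0.00079, 0.00090]
lo, hi = bootstrap_ci(costs)
print(f"95% CI: (${lo:.6f}, ${hi:.6f})")
```

More resamples tighten the estimate of the interval endpoints at the cost of runtime, which is what the 200-1000 knob trades off.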
```shell
preflightllmcost models   # List all supported models and pricing
```

Providers: OpenAI, Anthropic, Google
Auto-pricing: Weekly updates from vendor APIs

```yaml
# GitHub Actions
- name: Cost validation
  run: |
    preflightllmcost predict "Process {{item}}" \
      --data batch_data.csv \
      --budget 100.00 \
      --model gpt-4
  # Exits with code 1 if budget exceeded
```
| Metric | Target |
| --- | --- |
| Processing Speed | <300 ms for 1,000 rows |
| Accuracy (Standard) | ≤25% MAPE |
| Accuracy (Enhanced) | ≤15% MAPE |
| Precision Control | <3 token variance |
| Memory Usage | O(n) complexity |

The tool maintains local caches in ~/.preflightllmcost/:

  • prices.yaml - Model pricing data (auto-updated)
  • history.db - Historical usage for regression learning
```python
response_types = {
    "reasoning": ["step by step", "analyze", "because"],
    "json": ["json", "format", "structure"],
    "summary": ["summarize", "brief", "overview"],
    # ... 8 total categories
}
```
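One plausible way to use a keyword table like this (a sketch, not the project's actual classifier, which covers 8 categories and likely more signals): score each category by keyword hits and fall back to a default when nothing matches.

```python
# Keyword-hit classification sketch over a truncated version of the table above.
response_types = {
    "reasoning": ["step by step", "analyze", "because"],
    "json": ["json", "format", "structure"],
    "summary": ["summarize", "brief", "overview"],
}

def classify(prompt, default="generic"):
    text = prompt.lower()
    scores = {
        category: sum(kw in text for kw in keywords)
        for category, keywords in response_types.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

print(classify("Summarize this article briefly"))                 # summary
print(classify("Explain step by step and analyze the tradeoffs")) # reasoning
print(classify("Hello there"))                                    # generic
```

Each category would then map to a typical completion-length distribution for the Tier 1 estimate.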
  • Bootstrap Confidence Intervals: Non-parametric estimation
  • Multi-metric Validation: MAPE + variance stability
  • Worst-case Analysis: Conservative μ + 2σ projections
  • Adaptive Regression: Improves with historical data
  • Feature Engineering: Multi-dimensional token relationships
  • Correlation Thresholds: Quality-controlled model selection
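The μ + 2σ projection above can be checked against the completion-token row of the sample report (mean 152.8, std dev 23.4, worst case 199.6):

```python
# Conservative worst-case projection: mean plus two standard deviations.
mu, sigma = 152.8, 23.4          # completion-token stats from the sample report
worst_case = mu + 2 * sigma
print(round(worst_case, 1))      # 199.6, matching the report's Worst Case column
```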
  • Predictions are estimates based on statistical patterns
  • Accuracy depends on prompt similarity to training patterns
  • New model variants may require calibration period
  • Complex reasoning tasks show higher variance

MIT License - See LICENSE file for details.
