A cost forecasting tool for LLM API calls, implementing research-based prediction algorithms to estimate token usage and costs before execution.
This project addresses the challenge of unpredictable LLM API costs in large-scale applications. The estimates are not exact, but they advance preflight cost estimation by applying insights from recent academic research on LLM response length prediction.
The tool implements a three-tier prediction system that combines heuristic analysis, statistical modeling, and research-informed algorithms to provide cost estimates with confidence intervals.
The implementation draws from several key papers:
- Response Length Perception and Sequence Scheduling (Zheng et al., 2023) - 86% throughput improvement through length perception
- Emergent Response Planning in LLMs (Dong et al., ICML 2025) - Hidden state encoding of global response attributes
- Precise Length Control in Large Language Models (Butcher et al., 2024) - LDPE achieving <3 token precision
- Zero-Shot Strategies for Length-Controllable Summarization (Retkowski & Waibel, NAACL 2025) - Length approximation strategies
| Component | Purpose | Technology |
|---|---|---|
| Template Sampler | Generate prompt variations | Jinja templates + CSV/JSON data |
| Tokenizer Engine | Count tokens accurately | tiktoken with model-specific encoders |
| Prediction Engine | Estimate completion length | 3-tier cascade system |
| Statistical Analysis | Quantify uncertainty | Bootstrap confidence intervals |
| Pricing Engine | Calculate costs | Multi-provider pricing with auto-updates |
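A minimal sketch of how these components could fit together. All names here are illustrative, not the project's actual API; a crude whitespace tokenizer stands in for tiktoken, and the prediction step is a placeholder for the 3-tier cascade:

```python
from dataclasses import dataclass

@dataclass
class Estimate:
    prompt_tokens: int
    predicted_completion_tokens: int
    cost_usd: float

def count_tokens(text: str) -> int:
    # Stand-in for tiktoken's model-specific encoders: a whitespace
    # split, good enough to show the data flow end to end.
    return len(text.split())

def predict_completion(prompt_tokens: int) -> int:
    # Placeholder for the 3-tier cascade: assume the completion
    # is roughly twice the prompt length (an arbitrary guess).
    return 2 * prompt_tokens

def estimate_cost(prompt: str, in_price: float, out_price: float) -> Estimate:
    """Prices are USD per 1M tokens (illustrative values)."""
    p = count_tokens(prompt)
    c = predict_completion(p)
    cost = (p * in_price + c * out_price) / 1_000_000
    return Estimate(p, c, cost)

est = estimate_cost("Summarize this document in one paragraph.", 2.50, 10.00)
print(est.prompt_tokens, est.predicted_completion_tokens)
```

In the real pipeline the tokenizer and predictor are pluggable, which is what lets each tier be enabled or disabled independently.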
Tier 1: Enhanced Heuristics
- Response type classification (8 categories)
- Length complexity analysis
- Controlled variance injection
Tier 2: Emergent Regression
- Multi-dimensional feature extraction
- L2-regularized optimization
- Historical data learning
Tier 3: Hidden State Analysis
- Global attribute encoding
- LDPE-inspired corrections
- Weighted confidence scoring
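The cascade above can be organized as a fall-through chain, where each tier returns an estimate only when it has enough signal. This is a hypothetical sketch (function names, the history threshold, and the length ratios are all assumptions, not the project's real logic):

```python
from typing import Callable, Optional

def tier3_hidden_state(prompt: str) -> Optional[float]:
    # Requires access to model hidden states; unavailable for
    # closed API models, so this tier frequently returns None.
    return None

def tier2_regression(prompt: str, history_size: int = 0) -> Optional[float]:
    # L2-regularized regression needs historical data to fit.
    if history_size < 30:  # illustrative minimum-sample threshold
        return None
    return 120.0  # a fitted prediction would go here

def tier1_heuristics(prompt: str) -> float:
    # Always available: scale by prompt length.
    # The 1.5 ratio is an illustrative guess, not a calibrated value.
    return 1.5 * len(prompt.split())

def predict_length(prompt: str) -> float:
    tiers: list[Callable[[str], Optional[float]]] = [
        tier3_hidden_state, tier2_regression, tier1_heuristics,
    ]
    for tier in tiers:
        est = tier(prompt)
        if est is not None:
            return est
    raise RuntimeError("tier1 should always produce an estimate")

print(predict_length("Explain bootstrap confidence intervals"))  # falls through to tier 1
```

The fall-through design means a prediction is always produced, with the more research-informed tiers taking over as hidden-state access or history becomes available.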
- Variables: JSON object with fixed or list values
- CSV/JSON Files: Real data for template rendering
- Variable Lengths: Synthetic text generation by character count
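The fixed-or-list variable expansion could work like this. The project uses Jinja templates; `string.Template` stands in here to keep the sketch dependency-free:

```python
import itertools
from string import Template

def sample_prompts(template: str, variables: dict) -> list[str]:
    """Expand a template over the cross-product of list-valued variables.

    Scalar values are held fixed; list values generate one variant each.
    """
    keys = list(variables)
    value_lists = [v if isinstance(v, list) else [v] for v in variables.values()]
    return [
        Template(template).substitute(dict(zip(keys, combo)))
        for combo in itertools.product(*value_lists)
    ]

prompts = sample_prompts(
    "Summarize the $doc_type in $tone tone.",
    {"doc_type": "report", "tone": ["formal", "casual"]},
)
print(prompts)
# → ['Summarize the report in formal tone.', 'Summarize the report in casual tone.']
```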
- Accuracy Target: MAPE threshold (0.08-0.25)
- Confidence Level: Statistical confidence (0.95-0.999)
- Tier Selection: Enable/disable prediction methods
- Bootstrap Samples: Statistical robustness (200-1000)
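The MAPE accuracy target can be checked with a few lines (mean of |actual − predicted| / |actual| over the evaluation set):

```python
def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error; actual values must be non-zero."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

# A run meets the standard accuracy target when MAPE <= 0.25.
score = mape([100, 200, 400], [90, 230, 380])
print(round(score, 4))  # → 0.1
```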
Providers: OpenAI, Anthropic, Google
Auto-pricing: Weekly updates from vendor APIs
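The cost calculation itself is a simple per-token multiplication. The prices and model names below are placeholders for illustration; real values come from the auto-updated pricing cache:

```python
# Per-1M-token prices in USD. Placeholder numbers only — actual
# pricing is loaded from the auto-updated prices.yaml cache.
PRICES = {
    ("openai", "gpt-4o"): {"input": 2.50, "output": 10.00},
    ("anthropic", "claude-sonnet"): {"input": 3.00, "output": 15.00},
}

def call_cost(provider: str, model: str,
              prompt_tokens: int, completion_tokens: int) -> float:
    p = PRICES[(provider, model)]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

print(f"${call_cost('openai', 'gpt-4o', 1200, 400):.6f}")
```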
| Metric | Target | Status |
|---|---|---|
| Processing Speed | <300ms for 1000 rows | ✅ |
| Accuracy (Standard) | ≤25% MAPE | ✅ |
| Accuracy (Enhanced) | ≤15% MAPE | ✅ |
| Precision Control | <3 token variance | ⚪ |
| Memory Usage | O(n) complexity | ✅ |
The tool maintains local caches in `~/.preflightllmcost/`:
- `prices.yaml` - Model pricing data (auto-updated)
- `history.db` - Historical usage for regression learning
- Bootstrap Confidence Intervals: Non-parametric estimation
- Multi-metric Validation: MAPE + variance stability
- Worst-case Analysis: Conservative μ + 2σ projections
- Adaptive Regression: Improves with historical data
- Feature Engineering: Multi-dimensional token relationships
- Correlation Thresholds: Quality-controlled model selection
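The bootstrap interval and the conservative μ + 2σ projection can be sketched with the standard library. The sample data, the 95% level, and the fixed seed below are illustrative choices:

```python
import random
import statistics

def bootstrap_ci(samples, n_boot=1000, level=0.95, seed=0):
    """Non-parametric confidence interval for the mean via bootstrap resampling."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(samples, k=len(samples)))
        for _ in range(n_boot)
    )
    lo = means[int((1 - level) / 2 * n_boot)]
    hi = means[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

# Observed completion lengths (tokens) from repeated sampling — made-up data.
lengths = [180, 210, 195, 240, 172, 205, 188, 230, 199, 215]
lo, hi = bootstrap_ci(lengths)
mu, sigma = statistics.fmean(lengths), statistics.stdev(lengths)
worst_case = mu + 2 * sigma  # conservative projection described above
print(f"95% CI: [{lo:.1f}, {hi:.1f}], worst case: {worst_case:.1f}")
```

Resampling with replacement avoids any normality assumption, which is why the interval is described as non-parametric.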
- Predictions are estimates based on statistical patterns
- Accuracy depends on prompt similarity to training patterns
- New model variants may require calibration period
- Complex reasoning tasks show higher variance
MIT License - See LICENSE file for details.


