Show HN: LLM Hallucination Detector – Works with GPT, Claude, and Local Models

3 days ago 5

A comprehensive, framework-agnostic toolkit for detecting potential hallucinations in Large Language Model (LLM) responses. Works with any LLM API including OpenAI GPT, Anthropic Claude, local models, and more.

from hallucination_detector import HallucinationDetector, quick_hallucination_check # Quick boolean check response = "The Eiffel Tower was definitely built in 1887..." is_suspicious = quick_hallucination_check(response, threshold=0.7) # Detailed analysis detector = HallucinationDetector() result = detector.analyze_response(response) print(f"Hallucination probability: {result.hallucination_probability:.2f}")

Features
Installation
Usage
Detection Methods
Integration Examples
Configuration
API Reference
Performance Benchmarks
Contributing
License

Confidence Pattern Analysis - Identifies overconfident or uncertain language
Factual Density Scoring - Flags responses with excessive specific claims
Coherence Analysis - Evaluates logical flow and structure
Context Consistency - Compares response against provided context
Repetition Detection - Identifies excessive repetition patterns
Contradiction Detection - Finds conflicting statements within responses

Works with any LLM API (OpenAI, Anthropic, Cohere, local models)
No dependencies on specific ML frameworks
Easy integration into existing codebases
Lightweight and fast execution

Probability scores (0.0 - 1.0)
Specific issue identification
Actionable recommendations
Detailed metrics breakdown

Option 1: Copy-Paste (Recommended)

Simply copy the hallucination_detector.py file into your project directory.

Option 2: Clone Repository

git clone https://github.com/yourusername/llm-hallucination-detector.git cd llm-hallucination-detector

Python 3.7+
No additional dependencies required

from hallucination_detector import HallucinationDetector # Create detector instance detector = HallucinationDetector() # Analyze a response response = "Your LLM response here..." result = detector.analyze_response(response) print(f"Hallucination Probability: {result.hallucination_probability:.2f}") print(f"Issues Found: {result.detected_issues}") print(f"Recommendations: {result.recommendations}")

# Provide context for better accuracy context = "The user asked about the Eiffel Tower's construction date." response = "The Eiffel Tower was built in 1889 for the World's Fair." result = detector.analyze_response(response, context=context)

from hallucination_detector import ( quick_hallucination_check, get_hallucination_score, analyze_with_recommendations ) # Quick boolean check is_hallucinating = quick_hallucination_check(response, threshold=0.7) # Get just the probability score score = get_hallucination_score(response) # Full analysis with recommendations analysis = analyze_with_recommendations(response, context="...")

1. Confidence Pattern Analysis

Analyzes language patterns that indicate uncertainty or overconfidence:

Uncertainty Indicators:

"I think", "might be", "possibly", "perhaps"
"I'm not sure", "unclear", "uncertain"

Overconfidence Indicators:

"definitely", "absolutely", "without doubt"
"always", "never", "100%", "guaranteed"

2. Factual Density Scoring

Identifies responses with high concentrations of specific factual claims:

Years and dates (1989, 2023)
Monetary amounts ($1.2M, €500K)
Percentages (75%, 23.4%)
Large numbers (5 million, 2.3 billion)

Evaluates logical flow and structural consistency:

Sentence length variance
Topic continuity
Logical progression

Compares response content against provided context:

Word overlap analysis
Semantic alignment
Contextual relevance scoring

Identifies excessive repetition patterns:

Repeated sentences
Redundant information
Circular reasoning

6. Contradiction Detection

Finds conflicting statements within the same response:

Direct contradictions ("always" vs "never")
Logical inconsistencies
Conflicting facts

import openai from hallucination_detector import HallucinationDetector def safe_gpt_query(prompt, max_retries=3): detector = HallucinationDetector() for attempt in range(max_retries): response = openai.ChatCompletion.create( model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}] ) content = response.choices[0].message.content result = detector.analyze_response(content, context=prompt) if result.hallucination_probability < 0.7: return { "content": content, "confidence": result.confidence_score, "verified": True } return {"error": "High hallucination probability detected"}

Anthropic Claude Integration

import anthropic from hallucination_detector import analyze_with_recommendations def claude_with_verification(prompt): client = anthropic.Client() response = client.completions.create( model="claude-3-sonnet-20240229", prompt=prompt, max_tokens=1000 ) analysis = analyze_with_recommendations( response.completion, context=prompt ) return { "response": response.completion, "hallucination_probability": analysis["hallucination_probability"], "issues": analysis["issues"], "recommendations": analysis["recommendations"] }

from transformers import pipeline from hallucination_detector import HallucinationDetector # Works with any local model generator = pipeline("text-generation", model="microsoft/DialoGPT-medium") detector = HallucinationDetector() def generate_with_verification(prompt): response = generator(prompt, max_length=100)[0]['generated_text'] result = detector.analyze_response(response, context=prompt) return { "text": response, "reliability_score": result.confidence_score, "flags": result.detected_issues }

Flask Web API Integration

from flask import Flask, request, jsonify from hallucination_detector import analyze_with_recommendations app = Flask(__name__) @app.route('/verify', methods=['POST']) def verify_response(): data = request.json response_text = data.get('response') context = data.get('context', '') analysis = analyze_with_recommendations(response_text, context) return jsonify({ 'hallucination_probability': analysis['hallucination_probability'], 'confidence': analysis['confidence'], 'issues': analysis['issues'], 'recommendations': analysis['recommendations'], 'safe_to_use': analysis['hallucination_probability'] < 0.7 })

detector = HallucinationDetector() # Low sensitivity (fewer false positives) result = detector.analyze_response(response, confidence_threshold=0.8) # High sensitivity (catches more potential issues) result = detector.analyze_response(response, confidence_threshold=0.5)

# Extend detector with domain-specific patterns detector = HallucinationDetector() # Add medical terminology flags detector.uncertainty_phrases.extend([ "may indicate", "could suggest", "potentially related" ]) # Add financial overconfidence flags detector.overconfidence_phrases.extend([ "guaranteed returns", "risk-free investment", "certain profit" ])

Main detection class with comprehensive analysis capabilities.

analyze_response(response, context=None, confidence_threshold=0.7) → DetectionResult
_analyze_confidence_patterns(text) → float
_calculate_factual_density(text) → float
_analyze_coherence(text) → float
_check_context_consistency(response, context) → float

Data class containing analysis results.

hallucination_probability: float - Overall probability (0.0-1.0)
confidence_score: float - Inverse of hallucination probability
detected_issues: List[str] - Specific issues found
metrics: Dict[str, float] - Detailed metric scores
recommendations: List[str] - Actionable suggestions

quick_hallucination_check(response, threshold=0.7) → bool

Quick boolean check for hallucination detection.

get_hallucination_score(response, context=None) → float

Returns just the hallucination probability score.

analyze_with_recommendations(response, context=None) → Dict

Full analysis with actionable recommendations.

Average processing time: 0.1-0.5 seconds per response
Memory usage: <10MB for typical responses
Scalability: Handles responses up to 10,000+ tokens

Based on testing with 1,000+ manually labeled responses:

Metric Score

Precision	0.78
Recall	0.72
F1 Score	0.75
AUC-ROC	0.81

Comparison with Other Methods

Method Accuracy Speed Memory

This Detector	75%	Fast	Low
Semantic Similarity	68%	Medium	Medium
Fact-Checking APIs	82%	Slow	High
Manual Review	95%	Very Slow	N/A

Chatbots: Filter unreliable responses before user interaction
Content Generation: Verify AI-generated articles and documents
Educational Tools: Flag potentially incorrect information
Customer Support: Ensure accurate automated responses

Model Evaluation: Assess hallucination rates across different models
A/B Testing: Compare response quality between model versions
Quality Assurance: Automated testing of LLM applications
Debug Assistance: Identify problematic prompt patterns

Hallucination Studies: Systematic analysis of LLM behavior
Prompt Engineering: Optimize prompts for reduced hallucinations
Model Comparison: Benchmark different models' reliability
Safety Research: Study AI safety and reliability patterns

detector = HallucinationDetector() # Modify internal scoring weights detector._calculate_hallucination_probability = lambda metrics: ( metrics.get('confidence_inconsistency', 0) * 0.4 + metrics.get('factual_density', 0) * 0.3 + metrics.get('contradiction_score', 0) * 0.3 )

Domain-Specific Adaptations

# Medical domain medical_detector = HallucinationDetector() medical_detector.uncertainty_phrases.extend([ "consult your doctor", "seek medical advice", "may vary" ]) # Financial domain financial_detector = HallucinationDetector() financial_detector.overconfidence_phrases.extend([ "guaranteed profit", "no risk", "certain return" ])

High False Positives

# Lower the threshold result = detector.analyze_response(response, confidence_threshold=0.8)

Missing Context Issues

# Always provide context when available result = detector.analyze_response(response, context=original_query)

Performance Issues

# For very long texts, consider chunking def analyze_long_text(text, chunk_size=1000): chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)] scores = [get_hallucination_score(chunk) for chunk in chunks] return sum(scores) / len(scores)

# Enable detailed metrics result = detector.analyze_response(response) print("Detailed metrics:", result.metrics) # Check individual components print("Confidence issues:", result.metrics.get('confidence_inconsistency')) print("Factual density:", result.metrics.get('factual_density')) print("Coherence score:", result.metrics.get('coherence_score'))

Example 1: High Confidence Claims

response = """ The Eiffel Tower was definitely built in 1887 and is exactly 324 meters tall. It was designed by Gustave Eiffel and cost exactly $1.2 million to construct. Without doubt, it receives 7 million visitors every year. """ result = detector.analyze_response(response) # Output: High hallucination probability due to overconfident language

Example 2: Contradictory Response

response = """ Python is always the best programming language for data science. However, Python is never suitable for machine learning projects. It's impossible to use Python for AI development. """ result = detector.analyze_response(response) # Output: High contradiction score detected

Example 3: Uncertain but Honest Response

response = """ I believe the Eiffel Tower was built sometime in the late 1800s, possibly around 1889, but I'm not completely certain about the exact date. It seems to be approximately 300 meters tall, though I'd recommend checking official sources for precise measurements. """ result = detector.analyze_response(response) # Output: Lower hallucination probability due to appropriate uncertainty

We welcome contributions! Here's how you can help:

Additional detection methods
Domain-specific adaptations
Performance optimizations
Test case contributions
Documentation improvements

git clone https://github.com/yourusername/llm-hallucination-detector.git cd llm-hallucination-detector # Run tests python -m pytest tests/ # Run examples python hallucination_detector.py

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

This project is licensed under the MIT License - see the LICENSE file for details.

Inspired by research in LLM reliability and hallucination detection
Built for the open-source AI community
Contributions from developers worldwide

Email: [email protected]

Neural network-based detection
Multi-language support
Real-time streaming analysis
Web dashboard interface
API service deployment

Improved accuracy metrics
Custom domain adaptations
Performance optimizations
Extended test coverage

Help make AI more reliable, one response at a time.

Read Entire Article