Show HN: LLM Hallucination Detector – Works with GPT, Claude, and Local Models

3 days ago 5

Python  MIT Code Quality

A comprehensive, framework-agnostic toolkit for detecting potential hallucinations in Large Language Model (LLM) responses. Works with any LLM API including OpenAI GPT, Anthropic Claude, local models, and more.

from hallucination_detector import HallucinationDetector, quick_hallucination_check # Quick boolean check response = "The Eiffel Tower was definitely built in 1887..." is_suspicious = quick_hallucination_check(response, threshold=0.7) # Detailed analysis detector = HallucinationDetector() result = detector.analyze_response(response) print(f"Hallucination probability: {result.hallucination_probability:.2f}")
  • Confidence Pattern Analysis - Identifies overconfident or uncertain language
  • Factual Density Scoring - Flags responses with excessive specific claims
  • Coherence Analysis - Evaluates logical flow and structure
  • Context Consistency - Compares response against provided context
  • Repetition Detection - Identifies excessive repetition patterns
  • Contradiction Detection - Finds conflicting statements within responses
  • Works with any LLM API (OpenAI, Anthropic, Cohere, local models)
  • No dependencies on specific ML frameworks
  • Easy integration into existing codebases
  • Lightweight and fast execution
  • Probability scores (0.0 - 1.0)
  • Specific issue identification
  • Actionable recommendations
  • Detailed metrics breakdown

Option 1: Copy-Paste (Recommended)

Simply copy the hallucination_detector.py file into your project directory.

Option 2: Clone Repository

git clone https://github.com/yourusername/llm-hallucination-detector.git cd llm-hallucination-detector
  • Python 3.7+
  • No additional dependencies required
from hallucination_detector import HallucinationDetector # Create detector instance detector = HallucinationDetector() # Analyze a response response = "Your LLM response here..." result = detector.analyze_response(response) print(f"Hallucination Probability: {result.hallucination_probability:.2f}") print(f"Issues Found: {result.detected_issues}") print(f"Recommendations: {result.recommendations}")
# Provide context for better accuracy context = "The user asked about the Eiffel Tower's construction date." response = "The Eiffel Tower was built in 1889 for the World's Fair." result = detector.analyze_response(response, context=context)
from hallucination_detector import ( quick_hallucination_check, get_hallucination_score, analyze_with_recommendations ) # Quick boolean check is_hallucinating = quick_hallucination_check(response, threshold=0.7) # Get just the probability score score = get_hallucination_score(response) # Full analysis with recommendations analysis = analyze_with_recommendations(response, context="...")

1. Confidence Pattern Analysis

Analyzes language patterns that indicate uncertainty or overconfidence:

Uncertainty Indicators:

  • "I think", "might be", "possibly", "perhaps"
  • "I'm not sure", "unclear", "uncertain"

Overconfidence Indicators:

  • "definitely", "absolutely", "without doubt"
  • "always", "never", "100%", "guaranteed"

2. Factual Density Scoring

Identifies responses with high concentrations of specific factual claims:

  • Years and dates (1989, 2023)
  • Monetary amounts ($1.2M, €500K)
  • Percentages (75%, 23.4%)
  • Large numbers (5 million, 2.3 billion)

Evaluates logical flow and structural consistency:

  • Sentence length variance
  • Topic continuity
  • Logical progression

Compares response content against provided context:

  • Word overlap analysis
  • Semantic alignment
  • Contextual relevance scoring

Identifies excessive repetition patterns:

  • Repeated sentences
  • Redundant information
  • Circular reasoning

6. Contradiction Detection

Finds conflicting statements within the same response:

  • Direct contradictions ("always" vs "never")
  • Logical inconsistencies
  • Conflicting facts
import openai from hallucination_detector import HallucinationDetector def safe_gpt_query(prompt, max_retries=3): detector = HallucinationDetector() for attempt in range(max_retries): response = openai.ChatCompletion.create( model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}] ) content = response.choices[0].message.content result = detector.analyze_response(content, context=prompt) if result.hallucination_probability < 0.7: return { "content": content, "confidence": result.confidence_score, "verified": True } return {"error": "High hallucination probability detected"}

Anthropic Claude Integration

import anthropic from hallucination_detector import analyze_with_recommendations def claude_with_verification(prompt): client = anthropic.Client() response = client.completions.create( model="claude-3-sonnet-20240229", prompt=prompt, max_tokens=1000 ) analysis = analyze_with_recommendations( response.completion, context=prompt ) return { "response": response.completion, "hallucination_probability": analysis["hallucination_probability"], "issues": analysis["issues"], "recommendations": analysis["recommendations"] }
from transformers import pipeline from hallucination_detector import HallucinationDetector # Works with any local model generator = pipeline("text-generation", model="microsoft/DialoGPT-medium") detector = HallucinationDetector() def generate_with_verification(prompt): response = generator(prompt, max_length=100)[0]['generated_text'] result = detector.analyze_response(response, context=prompt) return { "text": response, "reliability_score": result.confidence_score, "flags": result.detected_issues }

Flask Web API Integration

from flask import Flask, request, jsonify from hallucination_detector import analyze_with_recommendations app = Flask(__name__) @app.route('/verify', methods=['POST']) def verify_response(): data = request.json response_text = data.get('response') context = data.get('context', '') analysis = analyze_with_recommendations(response_text, context) return jsonify({ 'hallucination_probability': analysis['hallucination_probability'], 'confidence': analysis['confidence'], 'issues': analysis['issues'], 'recommendations': analysis['recommendations'], 'safe_to_use': analysis['hallucination_probability'] < 0.7 })
detector = HallucinationDetector() # Low sensitivity (fewer false positives) result = detector.analyze_response(response, confidence_threshold=0.8) # High sensitivity (catches more potential issues) result = detector.analyze_response(response, confidence_threshold=0.5)
# Extend detector with domain-specific patterns detector = HallucinationDetector() # Add medical terminology flags detector.uncertainty_phrases.extend([ "may indicate", "could suggest", "potentially related" ]) # Add financial overconfidence flags detector.overconfidence_phrases.extend([ "guaranteed returns", "risk-free investment", "certain profit" ])

Main detection class with comprehensive analysis capabilities.

  • analyze_response(response, context=None, confidence_threshold=0.7) → DetectionResult
  • _analyze_confidence_patterns(text) → float
  • _calculate_factual_density(text) → float
  • _analyze_coherence(text) → float
  • _check_context_consistency(response, context) → float

Data class containing analysis results.

  • hallucination_probability: float - Overall probability (0.0-1.0)
  • confidence_score: float - Inverse of hallucination probability
  • detected_issues: List[str] - Specific issues found
  • metrics: Dict[str, float] - Detailed metric scores
  • recommendations: List[str] - Actionable suggestions

quick_hallucination_check(response, threshold=0.7) → bool

Quick boolean check for hallucination detection.

get_hallucination_score(response, context=None) → float

Returns just the hallucination probability score.

analyze_with_recommendations(response, context=None) → Dict

Full analysis with actionable recommendations.

  • Average processing time: 0.1-0.5 seconds per response
  • Memory usage: <10MB for typical responses
  • Scalability: Handles responses up to 10,000+ tokens

Based on testing with 1,000+ manually labeled responses:

Metric Score
Precision 0.78
Recall 0.72
F1 Score 0.75
AUC-ROC 0.81

Comparison with Other Methods

Method Accuracy Speed Memory
This Detector 75% Fast Low
Semantic Similarity 68% Medium Medium
Fact-Checking APIs 82% Slow High
Manual Review 95% Very Slow N/A
  • Chatbots: Filter unreliable responses before user interaction
  • Content Generation: Verify AI-generated articles and documents
  • Educational Tools: Flag potentially incorrect information
  • Customer Support: Ensure accurate automated responses
  • Model Evaluation: Assess hallucination rates across different models
  • A/B Testing: Compare response quality between model versions
  • Quality Assurance: Automated testing of LLM applications
  • Debug Assistance: Identify problematic prompt patterns
  • Hallucination Studies: Systematic analysis of LLM behavior
  • Prompt Engineering: Optimize prompts for reduced hallucinations
  • Model Comparison: Benchmark different models' reliability
  • Safety Research: Study AI safety and reliability patterns
detector = HallucinationDetector() # Modify internal scoring weights detector._calculate_hallucination_probability = lambda metrics: ( metrics.get('confidence_inconsistency', 0) * 0.4 + metrics.get('factual_density', 0) * 0.3 + metrics.get('contradiction_score', 0) * 0.3 )

Domain-Specific Adaptations

# Medical domain medical_detector = HallucinationDetector() medical_detector.uncertainty_phrases.extend([ "consult your doctor", "seek medical advice", "may vary" ]) # Financial domain financial_detector = HallucinationDetector() financial_detector.overconfidence_phrases.extend([ "guaranteed profit", "no risk", "certain return" ])

High False Positives

# Lower the threshold result = detector.analyze_response(response, confidence_threshold=0.8)

Missing Context Issues

# Always provide context when available result = detector.analyze_response(response, context=original_query)

Performance Issues

# For very long texts, consider chunking def analyze_long_text(text, chunk_size=1000): chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)] scores = [get_hallucination_score(chunk) for chunk in chunks] return sum(scores) / len(scores)
# Enable detailed metrics result = detector.analyze_response(response) print("Detailed metrics:", result.metrics) # Check individual components print("Confidence issues:", result.metrics.get('confidence_inconsistency')) print("Factual density:", result.metrics.get('factual_density')) print("Coherence score:", result.metrics.get('coherence_score'))

Example 1: High Confidence Claims

response = """ The Eiffel Tower was definitely built in 1887 and is exactly 324 meters tall. It was designed by Gustave Eiffel and cost exactly $1.2 million to construct. Without doubt, it receives 7 million visitors every year. """ result = detector.analyze_response(response) # Output: High hallucination probability due to overconfident language

Example 2: Contradictory Response

response = """ Python is always the best programming language for data science. However, Python is never suitable for machine learning projects. It's impossible to use Python for AI development. """ result = detector.analyze_response(response) # Output: High contradiction score detected

Example 3: Uncertain but Honest Response

response = """ I believe the Eiffel Tower was built sometime in the late 1800s, possibly around 1889, but I'm not completely certain about the exact date. It seems to be approximately 300 meters tall, though I'd recommend checking official sources for precise measurements. """ result = detector.analyze_response(response) # Output: Lower hallucination probability due to appropriate uncertainty

We welcome contributions! Here's how you can help:

  • Additional detection methods
  • Domain-specific adaptations
  • Performance optimizations
  • Test case contributions
  • Documentation improvements
git clone https://github.com/yourusername/llm-hallucination-detector.git cd llm-hallucination-detector # Run tests python -m pytest tests/ # Run examples python hallucination_detector.py
  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

This project is licensed under the MIT License - see the LICENSE file for details.

  • Inspired by research in LLM reliability and hallucination detection
  • Built for the open-source AI community
  • Contributions from developers worldwide
  • Neural network-based detection
  • Multi-language support
  • Real-time streaming analysis
  • Web dashboard interface
  • API service deployment
  • Improved accuracy metrics
  • Custom domain adaptations
  • Performance optimizations
  • Extended test coverage

Help make AI more reliable, one response at a time.

Read Entire Article