Extremely lightweight universal grammar implementation with provable recursion
A mathematically rigorous, recursively complete language model that fits in under 50kB with zero runtime dependencies. Built on Chomsky's Minimalist Grammar theory with formal verification and empirical validation.
- Our Model: 0.05 MB (50 KB)
- GPT-3: 700,000 MB
- Ratio: We're 14,000,000x smaller!
Yet we still provide provable recursion, next-token prediction, and formal verification. See how we did it →
🧮 Mathematically Proven: Formal proofs of recursive capability using Coq
⚡ Ultra-Lightweight: Complete implementation under 50kB binary
🔬 Scientifically Validated: Tested with standard linguistic benchmark suites
🏗️ Universal Grammar: Based on Chomsky's Minimalist Grammar theory
♾️ Provably Recursive: Generates a^n b^n patterns, proving non-regularity
🤖 Probabilistic Language Model: Next-token prediction with formal guarantees
- Rust 1.70+ (for compilation)
- Git (for cloning)
- Python 3.8+ (optional, for probabilistic language model)
Read First: Recursive Language Overview
- What is recursion in language?
- Why does it matter?
- How does this implementation work?
Next: Chomsky's Mathematical Proofs
- The 1956 proof that changed linguistics
- How finite-state grammars fail
- Why recursion is mathematically necessary
Then: Formal Language Theory
- Grammar hierarchies and automata
- Minimalist Grammar operations
- Complexity theory and parsing
Implementation: Atomic Language Model
Validation: NLP Verification Methods
- Agreement test suites
- Colorless green tests
- Performance benchmarking
Advanced: Machine Verification
- Coq proof development
- Mechanized theorem proving
- Mathematical rigor
This project demonstrates the full journey from mathematical theory to practical implementation:
📖 The Recursive Story - The complete narrative connecting all pieces
We've extended the atomic language model with probabilistic next-token prediction capabilities while maintaining all formal guarantees:
- 🎲 Weighted Grammar Rules: Each production has learned probabilities
- 🔮 Next-Token Prediction: Monte Carlo sampling for language modeling
- 🔄 Hybrid Architecture: Combines Rust validation with Python inference
- 🌐 REST API: Flask server for easy integration
- 📦 Still Ultra-Light: <100kB total with all features
This extension bridges formal grammar theory with practical NLP applications, creating the world's smallest formally verified language model.
- ✅ Formal Specification: Complete mathematical definition of operations
- ✅ Coq Proofs: Machine-verified theorems about recursive properties
- ✅ Non-regularity Proof: Constructive demonstration via a^n b^n generation
- ✅ Complexity Bounds: Polynomial parsing with exponential generation capacity
- ✅ Merge Operation: Merge(α:=_X β, X:γ) = ⟨X, [], [α, γ]⟩
- ✅ Move Operation: Implements wh-movement and feature checking
- ✅ Feature System: Categories, selectors, and movement features
- ✅ Minimalist Compliance: Based on Chomsky's Minimalist Program
- ✅ Zero Dependencies: No runtime requirements
- ✅ Size Optimized: ~35kB binary with full functionality
- ✅ Memory Efficient: <256kB peak usage for 20-word sentences
- ✅ Fast Parsing: Polynomial-time complexity O(n³)
- ✅ Agreement Tests: Subject-verb agreement across center-embedding (Linzen et al. 2016)
- ✅ Colorless Green: Syntactic processing with semantic anomalies (Gulordava et al. 2018)
- ✅ Recursive Capability: Demonstrates unbounded center-embedding
- ✅ Performance Metrics: Comprehensive benchmarking framework
This implementation demonstrates that:
- Recursive universal grammar is implementable in extremely constrained environments
- Mathematical theory and practical engineering can be unified effectively
- Formal verification and empirical testing provide complementary validation
- Chomsky's insights about recursion remain relevant for modern AI systems
We welcome contributions! See Contributing Guidelines
Great starting points:
- 📖 Improve documentation and examples
- 🧪 Add more linguistic test cases
- ⚡ Optimize performance and memory usage
- 🔬 Extend Coq formalization
- 🌍 Test on additional languages
If you use this work in research, please cite:
MIT License - see LICENSE for details.
Built on the mathematical foundations of:
- Noam Chomsky: Recursive language theory and Minimalist Grammar
- Edward Stabler: Formal implementation of Minimalist Grammars
- Linzen et al.: Agreement test methodology
- Gulordava et al.: Colorless green evaluation framework
Glossary of Concepts → A comprehensive A-Z guide to all key terms and concepts in the documentation
Built with mathematical rigor. Validated through empirical testing. Optimized for practical use.
.png)

