What if probability didn't have to be calculated—what if it could flow?
Galton Lab is a research playground that reimagines how neural networks make predictions. Instead of computing probability distributions the traditional way (softmax over thousands of options), we let probability flow through learned geometric landscapes—like water finding its way downhill.
🎮 Try the Interactive Demos — Learn the concepts from physics to transformers and beyond in your browser!
Think about how a Galton board works: you drop a ball, it bounces off pegs, and eventually lands in a bucket. The pattern of where balls land creates a probability distribution through physics, not arithmetic.
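To make that concrete, here is a tiny, self-contained simulation of a classical Galton board (plain NumPy, separate from this repo's code): every ball bounces randomly left or right at each peg row, and the landing histogram traces out a binomial bell curve.

```python
import numpy as np

# Classical Galton board: probability emerges from many random bounces, not arithmetic.
rng = np.random.default_rng(0)
n_balls, n_rows = 10_000, 12

# Each ball steps left (-1) or right (+1) at every peg row; its bucket is the running sum.
steps = rng.choice([-1, 1], size=(n_balls, n_rows))
buckets = steps.sum(axis=1)

# The landing pattern approximates a binomial (near-Gaussian) distribution.
values, counts = np.unique(buckets, return_counts=True)
for v, c in zip(values, counts):
    print(f"bucket {v:+3d}: {'#' * int(60 * c // counts.max())}")
```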
We're applying this idea to machine learning:
- Traditional approach: compute a softmax over every token in the vocabulary, then sample from the resulting distribution.
- Our approach: drop probes into a learned geometric landscape and let them flow toward likely tokens; where they land is the distribution.
When a model is very confident ("The capital of France is ___"), probes converge quickly → fast prediction. When uncertain, probes spread out → the model automatically takes more time. No manual tuning required—it emerges from the physics.
You can literally see confidence by watching how probes move. Tight convergence = confident. Spread out = uncertain.
Instead of opaque probability numbers, you get trajectories you can visualize. You can watch probability mass flow toward the winning token and understand why it won.
No need to compute probabilities for every token in your vocabulary. The geometry guides probes to likely regions automatically.
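As a rough sketch of that confidence readout (the helper below is illustrative, not part of the library), you can summarize where a batch of probes landed and how concentrated they are:

```python
import numpy as np

def probe_confidence(final_buckets: np.ndarray) -> dict:
    """Summarize how concentrated a batch of probes is (illustrative helper)."""
    counts = np.bincount(final_buckets)
    mass = counts / counts.sum()                 # empirical distribution over buckets/tokens
    top = int(mass.argmax())
    nonzero = mass[mass > 0]
    entropy = float(-(nonzero * np.log(nonzero)).sum())
    return {"top_bucket": top, "top_mass": float(mass[top]), "entropy": entropy}

# Tight convergence reads as confident; spread-out probes read as uncertain.
print(probe_confidence(np.array([3, 3, 3, 3, 3, 3, 3, 2])))   # high top_mass, low entropy
print(probe_confidence(np.array([0, 1, 2, 3, 4, 5, 6, 7])))   # low top_mass, high entropy
```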
The core insight—probability as geometric flow—applies far beyond token prediction. Anywhere you use softmax to make categorical choices, you can replace it with learned flow fields.
Working Examples:
- 🖼️ Image Classification — CNN → 2D flow field → digits
- 🔍 Attention Mechanism — Replace softmax(QK^T) with probe routing
- 🎮 RL Policies — Action selection through geometric flow
See examples/ for runnable code with visualizations, and docs/use-cases.md for 8+ application domains.
The pattern is universal: wherever a model makes a categorical choice, the softmax can be swapped for a learned flow field, and you get uncertainty quantification, interpretability, and adaptive compute for free.
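Here is a minimal, hypothetical sketch of that swap in plain NumPy: per-class scores shape a 1D potential, probes slide downhill in it, and the fraction of probes landing near each class plays the role of the softmax output. Nothing here is the project's actual API; it only illustrates the pattern.

```python
import numpy as np

def flow_sample(scores: np.ndarray, n_probes: int = 512, steps: int = 50,
                lr: float = 0.1, noise: float = 0.05, seed: int = 0) -> np.ndarray:
    """Hypothetical flow-based stand-in for softmax (not the project's API).

    Class k sits at position k on a line; higher scores carve deeper basins.
    Returns the fraction of probes that end up nearest each class.
    """
    rng = np.random.default_rng(seed)
    centers = np.arange(len(scores), dtype=float)
    x = rng.uniform(-0.5, len(scores) - 0.5, size=n_probes)   # probes start spread out
    for _ in range(steps):
        diffs = centers[None, :] - x[:, None]                 # (n_probes, n_classes)
        # Pull toward each class center, weighted by its score and a Gaussian window.
        x += lr * (scores[None, :] * diffs * np.exp(-0.5 * diffs**2)).sum(axis=1)
        x += noise * rng.normal(size=n_probes)
    landed = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    return np.bincount(landed, minlength=len(scores)) / n_probes

scores = np.array([2.0, 0.5, 0.1, 1.0])
print("softmax:", np.round(np.exp(scores) / np.exp(scores).sum(), 3))
print("flow   :", flow_sample(scores))
```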
Discrete Galton boards: the intuitive starting point
Digital versions of physical Galton boards with learnable "pegs" that guide probes left or right. Simple to understand, easy to visualize, and surprisingly effective for small vocabularies.
- Probes drop through a grid of learned biases
- Each row nudges probes toward likely tokens
- Adaptive: stops when one bucket gets enough mass
- Hierarchical variants for scaling to larger vocabularies
Files: src/galton_lab/board.py, experiments/hierarchical_compare.py
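A toy version of the idea described above, with a fixed bias per row and an early-stopping rule on bucket mass (illustrative only; the real implementation lives in src/galton_lab/board.py and differs in detail):

```python
import numpy as np

def drop_probes(row_biases: np.ndarray, n_buckets: int, n_probes: int = 256,
                stop_mass: float = 0.7, seed: int = 0):
    """Toy discrete Galton board with adaptive early stopping (illustrative only).

    row_biases[r] in (0, 1) is the probability of bouncing right at row r; in the
    real system these biases are learned from context rather than fixed.
    """
    rng = np.random.default_rng(seed)
    pos = np.full(n_probes, n_buckets // 2)                    # probes start in the middle
    for r, p_right in enumerate(row_biases):
        bounce = np.where(rng.random(n_probes) < p_right, 1, -1)
        pos = np.clip(pos + bounce, 0, n_buckets - 1)
        mass = np.bincount(pos, minlength=n_buckets) / n_probes
        if mass.max() >= stop_mass:                            # confident -> stop early
            return mass, r + 1
    return np.bincount(pos, minlength=n_buckets) / n_probes, len(row_biases)

# Strongly biased rows converge in a few steps; rows near 0.5 use the whole board.
dist, rows_used = drop_probes(np.full(12, 0.9), n_buckets=9)
print(f"stopped after {rows_used} rows:", dist.round(2))
```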
Continuous ODE samplers: the scalable evolution
When discrete boards hit their limits, we move to continuous flow. Probes now follow smooth trajectories on a ring (torus topology), guided by a learned velocity field.
- Represents probability as a flow through continuous space
- Uses ODEs (Ordinary Differential Equations) integrated with RK2
- Learned using neural SDFs (Signed Distance Fields)
- Scales to real vocabularies while staying differentiable
Files: src/galton_lab/ode/, galton/train.py
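A hedged sketch of the continuous version in NumPy: probes on a ring follow a velocity field integrated with an RK2 (midpoint) step. The toy field below is hand-written; in Galton Lab the field would come from a learned neural SDF conditioned on context.

```python
import numpy as np

def velocity(theta: np.ndarray) -> np.ndarray:
    """Toy velocity field on the ring [0, 2π) that attracts probes to θ* = π/2.
    In Galton Lab this field would come from a learned network, not a formula."""
    return -np.sin(theta - np.pi / 2)

def rk2_flow(theta: np.ndarray, dt: float = 0.1, steps: int = 100) -> np.ndarray:
    """Integrate probe angles with a midpoint (RK2) step, wrapping around the ring."""
    two_pi = 2 * np.pi
    for _ in range(steps):
        k1 = velocity(theta)
        k2 = velocity((theta + 0.5 * dt * k1) % two_pi)
        theta = (theta + dt * k2) % two_pi
    return theta

rng = np.random.default_rng(0)
probes = rng.uniform(0, 2 * np.pi, size=1000)      # probes start uniformly on the ring
final = rk2_flow(probes)

# Map ring positions to 8 token bins and read off the induced distribution.
bins = (final / (2 * np.pi) * 8).astype(int) % 8
print(np.bincount(bins, minlength=8) / len(bins))
```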
- Context composers (src/galton_lab/composers.py): Map input context → probability landscapes
- Training tools (galton/train.py, tests/): GPU-ready training loops with warm-start presets
- Visualizations (src/galton_lab/visualize.py): Watch probability flow in real-time
- Documentation (Galton.md, docs/char32_ode_warmstart.md): Deep dives into the theory and practice
See it in action (discrete boards) by running one of the scripts in experiments/, for example experiments/hierarchical_compare.py.
Train a model (continuous ODE sampler):
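For example (assuming galton/train.py is run directly as a script; the flags are explained in the table below):

```bash
python galton/train.py \
  --sampler ode \
  --device auto \
  --amp \
  --warm-start-preset char32 \
  --auto-handoff
```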
| Flag | Description |
|------|-------------|
| `--device auto` | Use GPU if available, else CPU |
| `--amp` | Mixed-precision training (faster on GPU) |
| `--sampler ode` | Use continuous flow instead of the discrete board |
| `--warm-start-preset char32` | Use proven initialization for character models |
| `--auto-handoff` | Automatically transition from warm-start to the sharpening phase |
| `--compile` | JIT compile with PyTorch 2.0+ (even faster) |
Discrete Galton Board:
Continuous ODE Sampler:
- Warm Start: Begin with soft, spread-out probability landscapes
  - High sigma (σ=0.9) for wide Gaussian windows
  - Directional bias to break symmetry
  - Knowledge distillation from a simple "teacher" model
- Auto Handoff: The system detects when the model is confident (sketched in code below)
  - Monitors the margin (gap between the top choices)
  - Checks whether the target token's probability is sufficient
- Sharpening: Tighten the focus
  - Reduce sigma (σ=0.5) for narrower peaks
  - Remove the training wheels (bias, distillation)
  - Pure cross-entropy optimization
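A minimal sketch of the handoff check (thresholds and tensor shapes below are illustrative, not the repo's actual configuration):

```python
import torch

def should_hand_off(probs: torch.Tensor, targets: torch.Tensor,
                    margin_min: float = 0.2, target_prob_min: float = 0.5) -> bool:
    """Decide whether to leave the warm-start phase (illustrative thresholds).

    `probs` holds per-token probabilities (however they were produced); hand off
    once the top-1 vs. top-2 margin and the target token's mass are both large enough.
    """
    top2 = probs.topk(2, dim=-1).values
    margin = (top2[..., 0] - top2[..., 1]).mean()
    target_prob = probs.gather(-1, targets.unsqueeze(-1)).mean()
    return margin.item() >= margin_min and target_prob.item() >= target_prob_min

# Phase schedule: soft landscapes first, then sharpen once the model is confident.
sigma, use_bias, use_distillation = 0.9, True, True           # warm start
probs = torch.rand(32, 100).softmax(dim=-1)                   # stand-in for the model's output
targets = torch.randint(0, 100, (32,))
if should_hand_off(probs, targets):
    sigma, use_bias, use_distillation = 0.5, False, False     # sharpening: drop the training wheels
```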
- Scale to production vocabularies (10k-50k tokens) using hierarchical routing
- Integrate with real transformers as a drop-in softmax replacement
- Stochastic variants (SDEs) for better exploration during training
- Comparative benchmarks against standard sampling methods
- Language models with built-in uncertainty quantification
- Reinforcement learning with interpretable policy flows
- Structured generation where grammar rules shape the probability landscape
- Any domain with periodic structure (audio, time series, molecular conformations)
- Can geometric flow matching replace all categorical distributions?
- Does this connect to diffusion models, optimal transport, or energy-based learning?
- Can we prove convergence guarantees for the adaptive compute property?
- Interactive Demos — Two interactive journeys:
- Foundation Demo — Physics to transformers (6 stages)
- Beyond LLMs Demo — Image classification, attention, RL (4 stages)
- examples/ — Runnable Python examples: image classification, attention, RL policies
- docs/use-cases.md — 8+ application domains beyond language models
- Galton.md — The complete origin story: from 4am idea to working prototype
- docs/char32_ode_warmstart.md — Deep technical dive on the continuous ODE sampler
- experiments/ — Additional experiments with visualizations
- tests/ — Unit tests that double as usage examples
This is an active research project. We welcome:
- Experiments — Try it on new tasks and share results
- Visualizations — Make the flow more intuitive
- Theory — Connect to related mathematical frameworks
- Critique — Tell us where this breaks or why it won't scale
Open an issue to discuss ideas or submit a PR with improvements.
If you build on this work, please cite:
MIT — See LICENSE file for details.
"In a world full of edges, be a torus."
