Watch the entire English language blossom from Wiktionary + Google Books N-grams, rendered as a living, breathing prefix galaxy.
Overview
What you’re seeing is a timelapse of English vocabulary growth from 1800 to 2019. Each node represents a letter prefix, and its size reflects how many words with that prefix have appeared up to that year. The layout is stable over time so your eye can track change.
We built this from two public datasets. Wiktionary provides a list of English lemmas. Google Books 1-grams provides yearly counts and volumes. We combine them to estimate a robust first year for each word, then accumulate by prefixes up to six letters.
A lemma is the canonical dictionary form of a word — the base form that stands in for all of its inflected variants (e.g. *run* for *runs*, *ran*, *running*).
English Lexicon Time Machine visualizes this evolution end to end: it combines Wiktionary lemmas with Google Books N-gram data to trace when each word first appeared, then renders that growth as a radial prefix trie expanding across the decades.
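The prefix-accumulation step described above can be sketched in a few lines. This is a minimal illustration, not the project's actual build_prefix_trie.py; the function name and toy data are assumptions:

```python
from collections import Counter

def prefix_counts(first_years, up_to_year, max_len=6):
    """Count how many words with each letter prefix (1..max_len chars)
    have first appeared on or before `up_to_year`."""
    counts = Counter()
    for word, year in first_years.items():
        if year <= up_to_year:
            for n in range(1, min(max_len, len(word)) + 1):
                counts[word[:n]] += 1
    return counts

# Toy data: word -> inferred first year
sample = {"cat": 1800, "car": 1850, "care": 1900, "dog": 1820}
counts = prefix_counts(sample, 1850)
print(counts["ca"])  # "care" is excluded: it first appears after 1850
```

Running this cumulatively for each frame year is what makes the node sizes grow over time.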
Key Features
- Zero-config setup – ./setup.sh creates the virtualenv, downloads every dataset, caches the expensive intermediate steps, and produces the final MP4/GIF output
- Radial growth cinematics – the trie erupts from the core alphabet, framing decades of linguistic evolution as a neon fractal
- Repeatable science – every artifact (lemmata, first-year inference, trie counts, layouts) checkpoints to disk and into a reusable tarball for instant re-renders
- Battle-tested – streams all 26 full 1-gram shards (a–z), handles the 1.4 GB Wiktionary dump, and renders 220 frames at 1080p
Quickstart
Running ./setup.sh will:
- Create/upgrade venv/ with Python 3
- Download Wiktionary + Google Books 1-gram shards (a–z)
- Extract English lemmas, infer first-use years, aggregate prefix counts
- Render 220 radial frames (outputs/frames/frame-0000.png → frame-0219.png)
- Encode outputs/english_trie_timelapse.mp4 and a share-ready GIF
Rerun the script anytime; artifact caching lets subsequent passes skip straight to rendering.
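The first-use-year inference step has to be robust against OCR noise in the N-gram data (a single stray hit in 1643 should not count as a word's debut). A plausible heuristic is to take the first year that begins a sustained run of usage; the actual thresholds in src/ingest/ngram_first_year.py may differ, and the function name and parameters below are assumptions:

```python
def robust_first_year(yearly, min_count=5, min_run=3):
    """Infer a word's first year of sustained use from {year: count} data.
    Returns the first year that starts `min_run` consecutive recorded years,
    each with at least `min_count` occurrences; None if no such run exists."""
    years = sorted(yearly)
    for i, y in enumerate(years):
        window = years[i:i + min_run]
        if (len(window) == min_run
                and all(years[i + j] == y + j for j in range(min_run))
                and all(yearly[yr] >= min_count for yr in window)):
            return y
    return None

# A lone 1801 hit is ignored; sustained use begins in 1843.
counts = {1801: 1, 1843: 7, 1844: 9, 1845: 6, 1846: 2}
print(robust_first_year(counts))  # 1843
```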
Pipeline Architecture
| Stage | Script | Output |
|---|---|---|
| Lemma extraction | src/ingest/wiktionary_extract.py | artifacts/lemmas/lemmas.tsv |
| First-year inference | src/ingest/ngram_first_year.py | artifacts/years/first_years.tsv |
| Prefix aggregation | src/build/build_prefix_trie.py | artifacts/trie/prefix_counts.jsonl |
| Layout generation | src/viz/layout.py | artifacts/layout/prefix_positions.json |
| Frame rendering | src/viz/render_frames.py | outputs/frames/ |
| Encoding | src/viz/encode.py | outputs/english_trie_timelapse.mp4 + .gif |
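The layout stage is what keeps the animation stable over time: each prefix gets fixed polar coordinates that never depend on the year being rendered, so nodes only grow, never move. One way to achieve this is to derive the angle from the prefix's alphabetical position and the radius from its length. This is a sketch of the idea, not the actual src/viz/layout.py:

```python
import math
import string

def prefix_position(prefix, ring_gap=1.0):
    """Map a prefix to stable polar coordinates: angle from its alphabetical
    position (read as a base-26 fraction of the circle), radius from its
    length. The same prefix always lands at the same point in every frame."""
    frac = 0.0
    for i, ch in enumerate(prefix):
        frac += string.ascii_lowercase.index(ch) / 26 ** (i + 1)
    angle = 2 * math.pi * frac
    radius = ring_gap * len(prefix)
    return radius * math.cos(angle), radius * math.sin(angle)

x, y = prefix_position("a")  # angle 0: lies on the positive x-axis
```

Deeper prefixes land on outer rings near their parent's angle, which is what produces the "erupting from the core alphabet" look.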
Render Only (after initial run)
Use flags such as --min-radius, --max-radius, --base-edge-alpha, or --start-progress to tune the visualization.
Neo4j Integration
Load artifacts/years/first_years.tsv to explore the word data in Neo4j (compatible with both Community and Enterprise editions).
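One simple way to get the TSV into Neo4j is to generate Cypher MERGE statements from it and pipe them into cypher-shell. The graph model below (a Word node with word and firstYear properties) is an assumption, not a schema the project prescribes:

```python
def tsv_to_cypher(lines):
    """Turn first_years.tsv rows (word<TAB>year) into Cypher MERGE
    statements, one Word node per row."""
    stmts = []
    for line in lines:
        word, year = line.rstrip("\n").split("\t")
        stmts.append(
            f"MERGE (:Word {{word: '{word}', firstYear: {int(year)}}});"
        )
    return stmts

print(tsv_to_cypher(["serendipity\t1754"])[0])
# MERGE (:Word {word: 'serendipity', firstYear: 1754});
```

For real data you would also want to escape apostrophes in words (or switch to parameterized queries via the official neo4j Python driver) before feeding the statements to the database.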
Documentation
- Getting Started – Quick setup guide
- Methodology – How the visualization works
- Step-by-Step Guide – Detailed instructions for each stage
- Advanced Tuning – Parameter customization options
- Interpreting Results – Understanding the visualization
- Troubleshooting – Common issues and solutions
- GitHub Organization – More helpful resources
- GitHub Repository – Source code and issues
- X Community – Join discussions on Knowledge Graphs, GNNs, and Graph Databases