UK Postcode Binary Encoder

2 hours ago 1

A high-performance, ultra-compact binary format for UK postcode geolocation data. Compresses 1.79M postcodes from ~50MB CSV to just 6.2MB binary (88% compression) with 920K+ lookups/second and O(1) exact lookups.

🌐 Try the Live Demo →

The implementation is split into focused, reusable classes:

  • PostcodeEncoder - Main orchestrator class that coordinates the encoding process
  • PostcodeNormalizer - Handles postcode validation, normalization, and inward ID encoding
  • BitmapManager - Manages 4000-bit presence bitmaps and rank directories for each outward
  • BinaryFileWriter - Handles the binary file format layout and writing
  • BitWriter - Utility for bit-packing coordinate deltas
  • Ultra-Compact: 88% compression ratio (50MB → 6.2MB) with 7.8/8.0 entropy efficiency
  • Blazing Fast: 920,000+ lookups/second with <1ms per lookup latency
  • O(1) Lookups: Constant-time exact postcode lookup via bitmap + rank directory + bit extraction
  • Efficient Queries: Fast prefix enumeration of all postcodes in an outward (e.g., "ST6")
  • Memory Optimized: <10MB memory footprint, designed for direct memory mapping
  • High Precision: 1e-4° coordinate precision with <1m average error (0.35m typical)
  • Type Safe: Full TypeScript implementation with comprehensive error handling
import { PostcodeEncoder } from "./src/PostcodeEncoder"; const encoder = new PostcodeEncoder("postcodes.csv"); encoder.build("postcodes.bin");

Querying (Binary Database)

import * as fs from "fs"; import { PostcodeClient } from "./src/PostcodeClient"; // Load the binary database const buffer = fs.readFileSync("postcodes.pcod"); const client = new PostcodeClient(buffer); // Exact postcode lookup (O(1)) const result = client.lookup("ST6 5AA"); if (result) { console.log(`${result.postcode}: ${result.lat}, ${result.lon}`); } // Enumerate all postcodes in an outward (prefix query) const postcodes = client.enumerateOutward("ST6"); console.log(`Found ${postcodes.length} postcodes in ST6`); // Get database statistics const stats = client.getStats(); console.log( `Database: ${stats.totalPostcodes} postcodes, ${stats.fileSize} bytes` ); // Check if postcode exists const isValid = client.isValidPostcode("M1 1AA"); // Find outwards by prefix const manchesterOutwards = client.findNearbyOutwards("M");

This project uses the ONS Postcode Database that can be downlaod from https://geoportal.statistics.gov.uk/datasets/e14b1475ecf74b58804cf667b6740706, Once you have downloaded the dataset you can put the full ONS_{month}_{year}_UK.csv file in the data/ directory and run the following command to extract just the valid postcodes, lat and long values into a file called postcodes.csv, this file is what you can then use to build the binary database

  1. Ensure that you have a file called postcodes.csv in the root directory, it should contain 3 columns only
  2. Execute the yarn run build command, you should see an output like { totalOutwards: 2943, totalPostcodes: 1790884, fileSize: 6509848 } and a new file called postcodes.pcod apear in the root of the directory.

Performance & Compression

Metric Value Details
Input Size ~50MB Raw CSV with 1.79M postcodes
Output Size 6.2MB Binary format (.pcod file)
Compression Ratio 88% 8:1 compression ratio
Entropy Efficiency 7.8/8.0 Near-optimal bit utilization
Coordinate Precision 1e-4° ~11m resolution at UK latitudes
Average Error 0.35m Quantization error vs source
Operation Performance Details
Exact Lookup 920K+ ops/sec O(1) constant time
Latency <1ms Per lookup on M3 Mac
Memory Usage <10MB Total runtime footprint
Prefix Query Variable Depends on outward size
Build Time 1.65s For 1.79M postcodes

Below is the output from yarn ts-node scripts/build-and-test.ts on a Mac Pro M3, which builds the database and validates every postcode lookup:

🏗️ Building postcode database... ✅ Build completed in 1.65s 🧪 Loading database and starting tests... 🔍 Testing postcode lookups... ============================================================ POSTCODE DATABASE TEST REPORT ============================================================ 📊 PROCESSING STATISTICS: Total rows processed: 1,790,884 Valid rows: 1,790,884 Invalid rows: 0 Invalid rate: 0.00% 🔍 LOOKUP STATISTICS: Successful lookups: 1,790,884 Failed lookups: 0 Lookup success rate: 100.00% 📍 COORDINATE ACCURACY: Coordinate mismatches: 0 Accuracy rate: 100.00% Max coordinate error: 0.66m Average coordinate error: 0.35m ⏱️ PERFORMANCE: Build time: 1.65s Test time: 1.95s Total time: 3.59s Lookup rate: 920,763 lookups/sec

The binary format achieves exceptional compression through:

  • Bit-packed coordinate deltas: Variable-width encoding based on sector ranges
  • Adaptive storage modes: Bitmap vs list encoding based on density
  • Quantized coordinates: 1e-4° precision (sufficient for postcode accuracy)
  • Entropy optimization: 7.8/8.0 bits/byte efficiency (97.5% of theoretical maximum)

Entropy Distribution:

  • 95.8% of data chunks achieve >7.5 bits/byte entropy
  • Coordinate data sections: 7.8 bits/byte (highly compressed)
  • Index sections: 6.3 bits/byte (structured but efficient)

Comparison with Traditional Approaches

Approach File Size Lookup Speed Memory Usage Precision
This Binary Format 6.2MB 920K ops/sec <10MB 0.35m avg
Raw CSV 50MB ~1K ops/sec* 50MB+ Source precision
SQLite Database ~50MB ~10K ops/sec* 50MB+ Source precision
JSON Format ~120MB ~5K ops/sec* 120MB+ Source precision
Compressed CSV (gzip) ~30MB ~500 ops/sec* 50MB+ Source precision

*Estimated performance for linear/indexed searches without specialized optimization

Key Advantages:

  • 8× smaller than raw CSV
  • 92× faster than typical database lookups
  • 5× smaller than compressed alternatives
  • Zero dependencies - pure binary format
  • Memory mappable - no loading overhead
  • Deterministic performance - O(1) guaranteed
+======================================================================================+ | PCDB v3 FILE | +======================================================================================+ | HEADER (32 bytes) | | 0x00 magic: 'PCDB' (char[4]) | | 0x04 version: u8 (=3) | | 0x05 flags: u8 (=0) | | 0x06 outwardCount: u16 | | 0x08 totalUnitCount: u32 | | 0x0C latOffset: i32 (global min latInt) | | 0x10 lonOffset: i32 (global min lonInt) | | 0x14 reserved[12] (zeros) | +======================================================================================+ | OUTWARD INDEX TABLE (9 bytes x outwardCount) | | | | Entry i (9 bytes): | | +--------------------------------------+ | | | outwardCode: char[4] (ASCII, NUL-pad) | | | | sectorCount: u8 | | | | sectorIndexOffset: u32 (ABSOLUTE) ---+-----> points to start of Outward Block i | | +--------------------------------------+ | | | +======================================================================================+ | OUTWARD BLOCK i (variable) | | (begins at index[i].sectorIndexOffset, ABSOLUTE) | | | | +--------------------------------------------------------------------------------+ | | | SECTOR TABLE (14 bytes x sectorCount) | | | | | | | | SectorEntry s (14 bytes): | | | | +-----------------------------------------------------------------------+ | | | | | sectorNumber : u8 (0..9) | | | | | | unitCount : u16 | | | | | | unitsRelOff : u24 (LE) -------------------------------+------------+ | | | | | (RELATIVE to Outward Block|start) | | | | | baseLatStored: i24 (LE) = sector.latMin - header.latOffset | | | | | | baseLonStored: i24 (LE) = sector.lonMin - header.lonOffset | | | | | | flagsAndBits : u16 (LE) | | | | | | bit0 = 1 (coordinates are bit-packed) | | | | | | bit1 = 1:list mode | 0:bitmap mode | | | | | | bits2..6 = bitsLat (5 bits) | | | | | | bits7..15 = bitsLon (9 bits) | | | | | +-----------------------------------------------------------------------+ | | | +--------------------------------------------------------------------------------+ | | | | +--------------------------------------------------------------------------------+ | | | UNIT DATA BLOBS (concatenated, one per sector, same order as sector table) | | | | | | | | Blob for SectorEntry s at: (OutwardBlockStart + unitsRelOff[s]) | | | | +-----------------------------+ immediately followed by +-------------+ | | | | | UNIT PRESENCE |---------------------------->| COORD STREAM | | | | | +-----------------------------+ +-------------+ | | | | | | | | UNIT PRESENCE (choose ONE): | | | | * Bitmap mode: EXACTLY 85 bytes (680 bits). Bit set => unitIndex present. | | | | byte = unitIndex/8, bit = unitIndex%8. | | | | * List mode: Varint sequence of unit indices using delta coding: | | | | first value = absolute first unitIndex (varint) | | | | subsequent values = delta to previous index (varint) | | | | (Encoder picks list if total varint bytes < 85) | | | | | | | | COORD STREAM (bit-packed, no per-value padding): | | | | For each of unitCount units (in ascending unitIndex order): | | | | write latDelta : unsigned, width = bitsLat | | | | write lonDelta : unsigned, width = bitsLon | | | | Total length = ceil(unitCount x (bitsLat + bitsLon) / 8) bytes | | | | | | | | Reconstruct per unit: | | | | latInt = header.latOffset + baseLatStored + latDelta | | | | lonInt = header.lonOffset + baseLonStored + lonDelta | | | | lat = latInt / 100000.0 | | | | lon = lonInt / 100000.0 | | | +--------------------------------------------------------------------------------+ | | | +======================================================================================+ | OUTWARD BLOCK i+1 ... (repeat) | +======================================================================================+
  • All multi-byte integers are LITTLE-ENDIAN.
  • Coordinates are quantized: lat/lon x 100000 -> integer (latInt/lonInt).
  • Ordering:
    • Outwards sorted lexicographically
    • Sectors sorted by sectorNumber (0-9)
    • Units sorted by unitIndex (ascending)
  • Offsets:
    • sectorIndexOffset is ABSOLUTE (file offset).
    • unitsRelOff is RELATIVE to the start of its Outward Block (i.e., sector table start).
  • Capacity:
    • Bitmap presence uses 85 bytes = 680 bits (code implements 680 bits).
  • bitsLat/bitsLon are minimal widths for sector-local deltas:
    • latDelta = unit.latInt - sector.latMin
    • lonDelta = unit.lonInt - sector.lonMin

🚀 Live Demo: https://robertpitt.github.io/postcode-db/

A modern web interface is available in the playground/ directory. This provides a user-friendly way to search postcodes through a web browser.

# Install dependencies yarn install # Start development server yarn dev # Build for production yarn build:web

Visit http://localhost:3000 to use the web interface.

  • Real-time search: Type partial or full postcodes for instant results
  • Responsive design: Works on desktop and mobile
  • Modern UI: Built with Tailwind CSS and Font Awesome icons
  • Map integration: Direct links to Google Maps
  • GitHub Pages ready: Automatic deployment workflow included

The project includes a GitHub Actions workflow that automatically:

  1. Builds the postcode database
  2. Compiles the web interface
  3. Deploys to GitHub Pages

Simply push to the main branch to trigger deployment.

Read Entire Article