TeraMD: A complete Markdown parser in ~1100 lines of Python

A lightweight, single-file, recursive descent Markdown parser, built for fun. It converts Markdown text into a structured Abstract Syntax Tree (AST) and can generate HTML output from that tree. Custom syntax is easy to add.

TeraMD supports a comprehensive set of Markdown syntax elements:

  • Headings (# to ######) - All six levels of ATX-style headings
  • Paragraphs - Standard text blocks with inline formatting
  • Horizontal Rules - Thematic breaks using ---
  • Lists - Both ordered (1., 2., etc.) and unordered (*, -) with arbitrary nesting
  • Blockquotes - Single and nested blockquotes using >
  • Code Blocks - Both fenced (```) and indented (4+ spaces) code blocks with language specification
  • Tables - GitHub Flavored Markdown (GFM) style tables with alignment support
  • Math Blocks - LaTeX-style math using $$...$$ or \[...\] delimiters
  • Emphasis - Italic text using *text*
  • Strong - Bold text using **text**
  • Inline Code - Code spans using `code`
  • Inline Math - Mathematical expressions using $...$ or \(...\)
  • Links - Standard links with [text](url) syntax
  • Images - Image embedding with ![alt](src) syntax
  • Footnotes - Inline footnote references [^label] with definitions [^label]: content
  • Escape Sequences - Backslash escaping for special characters
  • HTML Generation - Built-in HTML emitter with proper escaping and semantic markup
  • Position Tracking - Every AST node includes position information for debugging

The parser is organized as a three-stage pipeline:

  1. Lexer - Tokenizes input text into a stream of typed tokens (symbols, text, digits, etc.)
  2. Parser - Builds an Abstract Syntax Tree from the token stream using recursive descent parsing
  3. Emitter - Traverses the AST to generate HTML
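
To make the three stages concrete, here is a minimal toy sketch of a lexer, a recursive descent parser, and an HTML emitter. It is not TeraMD's code: the names `Token`, `Node`, `lex`, `parse_inline`, `parse`, and `emit` are hypothetical, and it handles only headings, paragraphs, and `*emphasis*`. Each node carries a line and column, in the spirit of the position tracking described above.

```python
import html
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    kind: str   # "STAR" or "TEXT"
    value: str
    col: int    # column offset within the source line


@dataclass
class Node:
    kind: str               # "document", "heading", "paragraph", "emphasis", "text"
    line: int               # 1-based source line (0 for the document root)
    col: int = 0
    text: str = ""
    level: int = 0          # heading level, 1..6
    children: list["Node"] = field(default_factory=list)


# Stage 1: lexer. Split text into '*' symbols and runs of other characters.
def lex(text: str, offset: int = 0) -> list[Token]:
    tokens = []
    for m in re.finditer(r"\*|[^*]+", text):
        kind = "STAR" if m.group() == "*" else "TEXT"
        tokens.append(Token(kind, m.group(), offset + m.start()))
    return tokens


# Stage 2: parser. parse_inline calls itself when it meets an opening '*'.
# Simplification: an unclosed '*' is treated as emphasis up to end of line.
def parse_inline(tokens, line, i=0, inside_em=False):
    nodes = []
    while i < len(tokens):
        tok = tokens[i]
        if tok.kind == "STAR":
            if inside_em:
                return nodes, i + 1          # closing star ends this emphasis
            children, i = parse_inline(tokens, line, i + 1, inside_em=True)
            nodes.append(Node("emphasis", line, tok.col, children=children))
        else:
            nodes.append(Node("text", line, tok.col, text=tok.value))
            i += 1
    return nodes, i


def parse(source: str) -> Node:
    doc = Node("document", 0)
    for line_no, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue                          # blank lines separate blocks
        m = re.match(r"(#{1,6}) +(.*)", line)
        if m:
            children, _ = parse_inline(lex(m.group(2), m.start(2)), line_no)
            doc.children.append(
                Node("heading", line_no, level=len(m.group(1)), children=children)
            )
        else:
            children, _ = parse_inline(lex(line), line_no)
            doc.children.append(Node("paragraph", line_no, children=children))
    return doc


# Stage 3: emitter. Walk the tree and escape text content on the way out.
def emit(node: Node) -> str:
    inner = "".join(emit(child) for child in node.children)
    if node.kind == "text":
        return html.escape(node.text)
    if node.kind == "emphasis":
        return f"<em>{inner}</em>"
    if node.kind == "heading":
        return f"<h{node.level}>{inner}</h{node.level}>\n"
    if node.kind == "paragraph":
        return f"<p>{inner}</p>\n"
    return inner                              # document root


if __name__ == "__main__":
    print(emit(parse("# Hi *there*\n\na < b")))
```

Keeping the stages separate means a new inline construct needs only a new token kind, one extra branch in the parser, and one extra branch in the emitter.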

The entire parser is contained in a single Python file (teramd.py) with no external dependencies beyond the standard library.

    from teramd import TeraMDParser, emit_html

    # Parse Markdown text into AST
    parser = TeraMDParser()
    document = parser.parse(markdown_text)

    # Generate HTML
    html_output = emit_html(document)

The repository includes a demonstration:

  • demo.md - Showcase document demonstrating all supported Markdown features
  • demo.py - Script that parses the demo file and generates styled HTML output

Running the demo script (demo.py) parses demo.md and generates demo.html, a complete HTML document that includes CSS styling and loads KaTeX to render mathematical expressions.
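
The contents of demo.py are not shown here, so the following is only a hypothetical sketch of how an emitted HTML fragment could be wrapped into a standalone page with some CSS and KaTeX's auto-render extension. The function name `wrap_document`, the styles, and the pinned KaTeX version and CDN base URL are assumptions, not taken from the repository.

```python
import html

# Assumed CDN location for KaTeX; pin whichever version you actually use.
KATEX_BASE = "https://cdn.jsdelivr.net/npm/katex@0.16.9/dist"


def wrap_document(body_html: str, title: str = "Demo") -> str:
    """Wrap an HTML fragment in a standalone page with basic CSS and KaTeX."""
    # Doubled braces in the <style> block are literal braces inside the f-string.
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>{html.escape(title)}</title>
<link rel="stylesheet" href="{KATEX_BASE}/katex.min.css">
<script defer src="{KATEX_BASE}/katex.min.js"></script>
<script defer src="{KATEX_BASE}/contrib/auto-render.min.js"
        onload="renderMathInElement(document.body);"></script>
<style>
  body {{ max-width: 48rem; margin: 2rem auto; font-family: sans-serif; }}
  pre  {{ background: #f4f4f4; padding: 0.75rem; overflow-x: auto; }}
</style>
</head>
<body>
{body_html}
</body>
</html>
"""


if __name__ == "__main__":
    # In a real script, body_html would be the emitter's output for demo.md.
    print(wrap_document("<p>Hello</p>", title="TeraMD demo"))
```

Whether math renders depends on the emitter leaving math delimiters in the page that the auto-render script recognizes, which this sketch does not verify.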