The interesting architecture of crt.sh (2018)


A while back I wrote myself a little dashboard for monitoring TLS certificates for my domains. Right now it works by talking to https://crt.sh/. Sometimes this works great, but sometimes crt.sh is really slow. Plus, it’s another thing that could be compromised.
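For context, here is a minimal sketch of the kind of query such a dashboard might make. It is not my actual dashboard code. It assumes crt.sh's `output=json` query mode and that each returned entry carries `name_value` and `not_after` fields with ISO-style timestamps; the helper names (`build_url`, `fetch`, `expiring`) are made up for illustration.

```python
import json
from datetime import datetime, timedelta
from urllib.parse import urlencode
from urllib.request import urlopen

CRT_SH = "https://crt.sh/"


def build_url(domain):
    """Build a crt.sh search URL that asks for JSON output (assumed query mode)."""
    return CRT_SH + "?" + urlencode({"q": domain, "output": "json"})


def fetch(domain, timeout=30):
    """Fetch and decode the entries for a domain. crt.sh can be slow, so set a timeout."""
    with urlopen(build_url(domain), timeout=timeout) as resp:
        return json.load(resp)


def expiring(entries, now, within_days=30):
    """Return (name, not_after) pairs for certificates expiring within the window."""
    cutoff = now + timedelta(days=within_days)
    found = []
    for entry in entries:
        # Assumed field names and timestamp format; adjust if the payload differs.
        not_after = datetime.fromisoformat(entry["not_after"])
        if now <= not_after <= cutoff:
            found.append((entry["name_value"], not_after))
    return sorted(found, key=lambda pair: pair[1])
```

Something like `expiring(fetch("example.com"), datetime.utcnow())` would then list the certificates to worry about, as long as crt.sh answers before the timeout.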

So, I started looking at how crt.sh works. It’s kinda cool.

There are only 3 separate processes:

  • Cron
    • ct_monitor is a program that uses libcurl to get CT log changes and libpq to put them into the database.
  • PostgreSQL
    • certwatch_db is the core web application, written in PL/pgSQL. It even includes the HTML templating and query parameter handling. Of course, there are a couple of things not entirely done in pgSQL…
    • libx509pq adds a set of x509_* functions callable from pgSQL for parsing X509 certificates.
    • libcablintpq adds the cablint_embedded(bytea) function to pgSQL.
    • libx509lintpq adds the x509lint_embedded(bytea,integer) function to pgSQL.
  • Apache HTTPD
    • mod_certwatch is a pretty thin wrapper that turns every HTTP request into an SQL statement sent to PostgreSQL, via…
    • mod_pgconn, which manages PostgreSQL connections.

The interface exposes HTML, ATOM, and JSON. All from code written in SQL.
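To give a feel for the pattern, here is a toy sketch of "the database renders the page, and the web layer is a thin wrapper". It is not crt.sh's code: it swaps in SQLite (via Python's built-in sqlite3 module) for PostgreSQL, and the `certificate` table, the `q` parameter, and the function names are all hypothetical.

```python
import sqlite3


def make_db():
    """Create a tiny in-memory database with a hypothetical certificate table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE certificate (id INTEGER PRIMARY KEY, common_name TEXT)")
    db.executemany(
        "INSERT INTO certificate (id, common_name) VALUES (?, ?)",
        [(1, "example.com"), (2, "www.example.com"), (3, "example.org")],
    )
    return db


# The "template" lives in SQL: the query escapes the values and returns
# a finished HTML fragment as a single string.
PAGE_SQL = """
SELECT '<ul>' || COALESCE(group_concat(
           '<li>' || id || ': ' ||
           replace(replace(replace(common_name, '&', '&amp;'), '<', '&lt;'), '>', '&gt;') ||
           '</li>', ''), '') || '</ul>'
FROM (SELECT id, common_name FROM certificate
      WHERE common_name LIKE :pattern ORDER BY id)
"""


def handle_request(db, query_params):
    """Thin wrapper: map request parameters to one SQL statement, return its output."""
    pattern = query_params.get("q", "%")
    (html,) = db.execute(PAGE_SQL, {"pattern": pattern}).fetchone()
    return html
```

Calling `handle_request(make_db(), {"q": "example.org"})` returns a ready-made `<ul>` fragment. The wrapper never touches the HTML itself; it only passes parameters in and hands the database's answer back, which is roughly the division of labor described above.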

And then I guess it’s behind an nginx-based load-balancer or somesuch (based on the 504 Gateway Timeout messages it’s given me). But that’s not interesting.

The actual website is run from a read-only slave of the master DB that the ct_monitor cron-job updates, which makes several security considerations go away and makes horizontal scaling easy.

Anyway, I thought it was neat that so much of it runs inside the database; you don’t see that terribly often. I also thought the little shims to make that possible were neat. I didn’t get deep enough into it to end up running my own instance or clone, but I thought my notes on it were worth sharing.
