Wikidata is the largest structured open knowledge dataset on the web. It consists of items (indexed by Qid) and properties (indexed by Pid). Knowledge about an item is represented via statements, whose basic structure is: Subject(item), Predicate(property) and Object(item or value). For example, one of Tim Berners-Lee(Q80)'s employers(P108) was CERN(Q42944).
Since the items of interest are spatiotemporal, they should have both location-related and date-related properties:
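To make the statement structure concrete, here is a minimal sketch of reading the object of a statement from an entity in Wikidata's JSON format. The entity below is hand-written and trimmed to the fields used here, and `item_objects` is a hypothetical helper, not part of this project's import code.

```python
# Trimmed, hand-written entity in the shape of Wikidata's JSON format.
entity = {
    "id": "Q80",
    "claims": {
        "P108": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P108",
                    "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "id": "Q42944"},
                    },
                },
                "rank": "normal",
            }
        ]
    },
}


def item_objects(entity, pid):
    """Return the Qids of all item-valued statements for property `pid`."""
    qids = []
    for statement in entity.get("claims", {}).get(pid, []):
        snak = statement["mainsnak"]
        if snak.get("snaktype") != "value":
            continue  # "somevalue" / "novalue" snaks carry no datavalue
        datavalue = snak["datavalue"]
        if datavalue["type"] == "wikibase-entityid":
            qids.append(datavalue["value"]["id"])
    return qids


print(item_objects(entity, "P108"))  # ['Q42944']
```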
- coordinate(P625), location(P276), street(P669), admin(P131), juri(P1001), country(P17)
- start time(P580), end time(P582), point in time(P585)
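The selection rule above can be sketched as a small predicate. This assumes "both" means at least one property from each list, which is one reading of the rule; `is_spatiotemporal` is a hypothetical name, and the entity is expected in Wikidata's JSON shape with a `claims` mapping keyed by Pid.

```python
LOCATION_PROPS = {"P625", "P276", "P669", "P131", "P1001", "P17"}
DATE_PROPS = {"P580", "P582", "P585"}


def is_spatiotemporal(entity):
    """True if the entity has at least one location and one date property."""
    claims = entity.get("claims", {})
    has_location = not LOCATION_PROPS.isdisjoint(claims)
    has_date = not DATE_PROPS.isdisjoint(claims)
    return has_location and has_date


# Only the presence of the keys matters for this check.
print(is_spatiotemporal({"claims": {"P625": [], "P585": []}}))  # True
print(is_spatiotemporal({"claims": {"P625": []}}))              # False
```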
Items of no historical significance are ignored. These are mainly static objects (like buildings and roads) and recurring events (like sports and ceremonies). Most items' classes(P31) are also remapped to a limited set of classes to make them more manageable.
The JSON dump (doc) is used for import. Importing from the gzip-compressed dump takes more than 1.6 hours, because decompression is the main bottleneck. Converting the dump to zstd (level 6) cuts the time to less than 50 minutes on my potato PC (i3-3220, 8 GB RAM, 7200 RPM HDD).
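As an illustration of streaming the dump without loading it whole, here is a minimal sketch. It relies on the layout of the Wikidata JSON dump, which is a single JSON array with one entity per line. The function names are hypothetical, and it reads the gzip variant with the standard library only; reading the zstd variant would need a zstd binding instead of `gzip`.

```python
import gzip
import json


def iter_entities(lines):
    """Yield one parsed entity per dump line, skipping the array brackets."""
    for line in lines:
        line = line.strip().rstrip(",")  # each entity line ends with a comma
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def read_gzip_dump(path):
    """Stream entities from a gzip-compressed dump at `path`."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from iter_entities(f)


# Tiny in-memory stand-in for the dump layout.
sample = ["[\n", '{"id": "Q1"},\n', '{"id": "Q2"}\n', "]\n"]
print([e["id"] for e in iter_entities(sample)])  # ['Q1', 'Q2']
```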
Imported items are stored in an R-tree index inside a small SQLite database, with each item's coordinates and dates indexed as ranges. An R-tree index can query all of these ranges at once.
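A minimal sketch of such an index, using SQLite's R*Tree module through Python's `sqlite3`, is shown below. The table name, column names, and sample rows are made up for illustration and are not this project's schema. It also assumes the SQLite build has the R*Tree module enabled. Note that the plain `rtree` table stores coordinates as 32-bit floats (the `rtree_i32` variant stores 32-bit integers), so large integer dates may be rounded outward slightly.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory 3-dimensional R-tree: longitude, latitude, date."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE VIRTUAL TABLE items USING rtree("
        "id, min_lon, max_lon, min_lat, max_lat, min_date, max_date)"
    )
    db.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return db


def query(db, lon, lat, date):
    """Return ids whose ranges overlap the (low, high) pairs in all dimensions."""
    sql = (
        "SELECT id FROM items WHERE "
        "max_lon >= ? AND min_lon <= ? AND "
        "max_lat >= ? AND min_lat <= ? AND "
        "max_date >= ? AND min_date <= ? ORDER BY id"
    )
    return [row[0] for row in db.execute(sql, (*lon, *lat, *date))]


# Made-up rows: (id, lon range, lat range, date range as YearMMDD integers).
db = build_index([
    (1, 6.05, 6.05, 46.23, 46.23, 19890301, 19890331),
    (2, 139.69, 139.69, 35.68, 35.68, 16030101, 18681231),
])

# One query constrains the bounding box and the time window together.
print(query(db, (5, 7), (45, 47), (19800101, 19991231)))  # [1]
```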
Dates are stored as YearMMDD₁₀ (a decimal integer) in int32. Negative dates represent BCE (before the common era). Zero is not used. To keep dates monotonically increasing, the month and day of negative dates are reversed within each year, as -Year(12-MM)(31-DD)₁₀.
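The encoding can be sketched as follows. This reads the trailing "10" in the format as a base-10 marker, so the stored value is the plain decimal number YearMMDD; that reading is an assumption. `encode_date` is a hypothetical helper that takes BCE years as negative integers (for example, -44 for 44 BCE).

```python
def encode_date(year, month, day):
    """Encode a date as a sortable integer; negative `year` means BCE."""
    if year == 0:
        raise ValueError("year zero is not used")
    if year > 0:
        return year * 10000 + month * 100 + day
    # BCE: reverse month and day so later dates give larger (less negative) values.
    return -((-year) * 10000 + (12 - month) * 100 + (31 - day))


print(encode_date(2024, 3, 15))  # 20240315
print(encode_date(-44, 3, 15))   # -440916
print(encode_date(-44, 3, 16))   # -440915
```

With this reversal, 15 March 44 BCE sorts before 16 March 44 BCE, and 31 December 44 BCE (-440000) sorts before 1 January 43 BCE (-431130).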
The resulting database is loaded directly into the browser using wasm.
Frontend-related source code is in this repository.
Import-related source code is located here.

