Tree Walk Format (TWF): a flat file format to describe your family tree


twf is a Perl script that turns a family tree, documented in a flat file using what I'll call the tree walk format, into a groff/dot diagram. The groff/dot output generated by twf can then be post-processed (using dot or any of the other tools in the groff tool set) into more human-readable formats such as PDF, which show the relationships within the family tree in diagrammatic form.
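To make the pipeline concrete, here is a minimal Python sketch of what emitting a dot description from parent/child pairs could look like. It assumes the dot mentioned above is the Graphviz dot language; the `to_dot` helper and the graph name `family` are hypothetical and do not reflect the actual output of the twf script.

```python
from typing import Iterable, Tuple


def to_dot(edges: Iterable[Tuple[str, str]]) -> str:
    """Render (parent, child) pairs as a Graphviz digraph (illustrative only)."""
    def quote(name: str) -> str:
        # Quote node names so spaces are allowed; escape embedded double quotes.
        return '"' + name.replace('"', '\\"') + '"'

    lines = ["digraph family {"]
    for parent, child in edges:
        lines.append(f"    {quote(parent)} -> {quote(child)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot([("Abraham", "Isaac"), ("Isaac", "Esau"), ("Isaac", "Jacob")]))
```

Saved to a file such as family.dot, text like this can be rendered with Graphviz, for example `dot -Tpdf family.dot -o family.pdf`.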

An example file (abe.twf) shows the input format used by the twf script. It depicts the relationships documented in the Book of Genesis in the Bible, starting from Adam.

Running make in this directory generates a PDF file from abe.twf. It documents a minimal, long line of descent from Adam all the way to Abraham's family tree, as per the Book of Genesis in the Bible.

An informal grammar for the tree walk format is shown below. It resembles YACC/Bison notation, with some additions borrowed from Perl6/Raku, such as the '%%' delimiter and the "rule" notation:

    // a "tree walk format"/twf input file consists of ...
    twf     : ( family '\n' )+                    // ... one or more families, listed one per line
    family  : ( person %% '|')+                   // a family consists of persons delimited by '|'
    family  : include filename.twf                // pull data from the named file (in twf format)
    person  : name ( ',' age )? op                // a person has a name and an optional age
    person  : '?'                                 // a person's name may be unknown
    person  : '-'                                 // birth order of remaining children is unknown
    age     : '?' | number '?'?                   // the age of the person may be unknown ...
                                                  // ... which is denoted by a '?' or it could be a known number
                                                  // which may be suffixed with a '?' to denote if it is doubtful
    number  : [0-9]+                              // age (at time of death) is just a whole number
    name    : realname ( \s '(' nickname ')' )?   // a person may have a different
                                                  // optional earlier name and/or a nickname
    realname: [^\,\(\#\|]+                        // names can have anything including spaces ...
    nickname: [^\,\(\#\|\)]+                      // .. except some special characters
    op      : [\.\!\^]?                           // the '.' operator denotes no descendants
                                                  // a cut/'!' operator indicates a back reference
                                                  // a forward pointer/'^' operator indicates that
                                                  // the person is resolved later, not immediately
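As an illustration of the grammar, the following Python sketch parses a single family line into its persons. It is not the twf script's implementation: the `Person` class, `parse_person`, and `parse_family` are hypothetical names, the sample line is made up for this example rather than taken from abe.twf, and `include` lines are not handled. Because a realname may itself contain '.', '!' or '^', the sketch matches the name lazily and treats a trailing operator character as the op, which is an assumption about how the ambiguity is resolved.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One person token: name, optional "(nickname)", optional ",age", optional op.
PERSON_RE = re.compile(
    r"""
    ^\s*
    (?P<realname>[^,(#|]+?)               # lazy, so a trailing op is not swallowed
    (?:\s*\((?P<nickname>[^,(#|)]+)\))?   # optional "(nickname)"
    \s*
    (?:,\s*(?P<age>\?|[0-9]+\??))?        # '?', a number, or a number plus '?'
    \s*
    (?P<op>[.!^])?                        # '.', '!' or '^'
    \s*$
    """,
    re.VERBOSE,
)


@dataclass
class Person:
    name: str                       # "?" = unknown person, "-" = birth-order marker
    nickname: Optional[str] = None
    age: Optional[int] = None       # None when absent or written as '?'
    age_doubtful: bool = False      # True for a number followed by '?'
    op: str = ""                    # "", ".", "!" or "^"


def parse_person(token: str) -> Person:
    token = token.strip()
    if token in ("?", "-"):
        # The two special single-character person forms in the grammar.
        return Person(name=token)
    m = PERSON_RE.match(token)
    if m is None:
        raise ValueError(f"not a valid person: {token!r}")
    age_text = m.group("age")
    age, doubtful = None, False
    if age_text is not None and age_text != "?":
        doubtful = age_text.endswith("?")
        age = int(age_text.rstrip("?"))
    return Person(
        name=m.group("realname").strip(),
        nickname=m.group("nickname"),
        age=age,
        age_doubtful=doubtful,
        op=m.group("op") or "",
    )


def parse_family(line: str) -> List[Person]:
    # A family is a list of persons delimited by '|'.
    return [parse_person(tok) for tok in line.split("|")]


if __name__ == "__main__":
    # Hypothetical sample line, not copied from abe.twf.
    for person in parse_family("Abraham (Abram),175?|Isaac,180|Ishmael,137."):
        print(person)
```

The sketch only tokenizes a line; it does not decide how the persons in a family relate to each other, and it does not resolve '!' or '^' references.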

Note that in genealogy research, the age of the deceased may be unknown, and sometimes disputed or unbelievable (such as the ages listed for Abraham and his family). An unknown age is written as '?', and a doubtful age as the number followed by '?'.

When a marriage between relatives occurs (for example, when Isaac marries his cousin Rebekah), the family "tree" is, mathematically, no longer a tree but a more general graph in which a person can be reached by more than one path. To handle such cases, the cut/'!' operator denotes that the current referent may already be embedded in the stack due to an earlier reference in the tree, and so resolves the looping that could otherwise result. And yes, the cut/'!' operator is "borrowed" from Prolog, although no backtracking is involved here, so it is not used in quite the same sense as in Prolog.

In some cases, relationships are complicated enough (for example, see Lot's family tree) that the cut/'!' operator needs a paired forward pointer/'^' reference operator as well. This operator postpones the lookup of the person in the immediately following family entry, deferring it until a later cut/'!' operator resolves it in the stack.

Turning the twf format into the better-known GEDCOM format is left as an exercise for the reader. The primary (only?) advantage of twf over GEDCOM is that it is meant to be hand-editable in any text editor of your choice.

Some background and context regarding the creation of this file format is at: https://ces.mataroa.blog/blog/twf_ftwmd
