A CRDT-Based Messenger in 12 Lines of Bash Using a Synced Folder

4 hours ago 2
mkdir -p $(dirname $0)/data; cd data

print_messages() {
clear
cat $(ls -tr | tail -n30)
printf "\033[31m$USER:\033[0m "
}
export -f print_messages

watchexec 2> /dev/null -- bash -c "print_messages" &

while read text; do
printf "\033[31m$USER:\033[0m $text\n\n" > "$(uuidgen)"
done

Put that script inside a folder, share the folder with someone via Syncthing or Dropbox or whatever, run it, and you should get this:


This is hardly a Discord killer, but as far as messengers go there are some interesting properties:

  1. There is no central authority or server that "owns" the messages
  2. An offline machine can write new messages that will propagate once it's back online
  3. All participating machines will show the same messages in the same order once they're synced, no matter what

There's nothing really novel about those three things; that's what you get out of the box with Conflict Free Replicated Data Types (CRDTs). So my goal with this blog post is to plant the seed in your mind that CRDTs are just generally cool, and they are very simple.

And even though this little messenger is kinda toy-ish, it's completely solid and I use it to communicate with a (equally nerdy) friend of mine. I've used the same technique to create a time tracker that I can use on different machines without every worrying about being online or things getting out of sync. We're obviously relying on a file sync program to do some heavy lifting here, but because the data is "conflict free", something as simple as rsync or scp would work (and always work) just fine.

The Bash Script

There's not much to it, so I'll run through it quick.

mkdir -p $(dirname $0)/data; cd data

Create a data directory (if needed) and move into it.

print_messages() {
clear
cat $(ls -tr | tail -n30)
printf "\033[31m$USER:\033[0m "
}

Clear the screen, print the contents of the last 30 messages, and then print the mike: prompt. The gross looking \033[31m stuff is just ANSI escape codes to set and unset the color.

Some Bash nonsense to "export a function". Otherwise the watchexec subprocess can't see it.

watchexec 2> /dev/null -- bash -c "print_messages" &

Start up watchexec. Send it's stderr output into /dev/null so it doesn't bother us. Whenever it sees file changes, reprint the messages. The & symbol makes it run in the background so our script can do other things.

BTW, I used watchexec to watch for file changes because it works on Termux, which lets me use this on Android. If you want to use fswatch (which seems more common) instead, replace that line with this:

print_messages
fswatch -o . | while read -r event; do
print_messages
done &

while read text; do
printf "\033[31m$USER:\033[0m $text\n\n" > "$(uuidgen)"
done

Read user input. When they hit Enter, put whatever they wrote into a file. Critically, use a Universally Unique Identifier for that file.

So basically, stuff messages into files that all have UUIDs. If you look inside data you'll see this:

$ ls data
035aa216-3e23-4921-8d14-b79bdc150232 5d07ed32-8f0c-4c88-9a93-f12606d57ea1
04455d8b-b58a-40da-a01a-7631e90ccbd8 6187df26-8a6e-4729-9553-ffe1acf0d45f
1d621700-322e-4ba9-9d66-16a739838adf 650b8c73-9ef5-40e5-8014-a8d97d617f1f

Why This Works

Using UUIDs means that any machine can create files without having to worry about another machine creating an identically-named file and causing a conflict, which would break our whole system.

We can't delete files, because if a machine deletes a file and then tries to sync, it won't be able to tell if it deleted something or if it's just talking to a machine that has a new file (we actually could get away with this because Syncthing keeps a local database to log file deletions, but that's cheating, and simpler tools like scp definitely don't. Plus there's better ways anyway).

Lastly, after two machines exchange files, it's critical that they both can display messages in the same order. Using ls -tr to order the files actually works perfectly, because -tr (order by time, reversed) uses the file modification date, and that gets preserved when copying the files. It's technically possible to create files with the same modification date on two different machines and therefore have an arbitrary ordering, but at least on Linux with most modern filesystems you get billionth-of-a-second granularity, which is more than fine. On a filesystem like FAT32 with 2 second granularity this would very much be a problem.

So, those 3 properties mean that we have created CRDT. CRDTs are just data structures that:

  1. Can be replicated across an arbitrary number of nodes
  2. Can be modified concurrently
  3. Will always converge to the same thing after nodes sync with each other

Specifically, we've created a grow-only set. If we ignored the contents of each file we could still count them with something like ls -1 | wc -l, and that would be an even simpler CRDT called a grow-only counter.

That's what I used in the timer-tracker thing I mentioned earlier. Just add a file with a UUID into a directory called 25_minute_pomodoros, and now you're counting pomodoros.

Edits and Deletions

So an obvious problem is that you can't edit or delete a message. And, in fact, it's fundamental to the design that once you create a new file, you absolutely do not mess with it.

To get around that, you just create more files. So in the Pomodoro example, there's a folder called 25_minute_pomodoros_deletions. If I decide that I want to decrement my Pomodoro counter, just touch 25_minute_pomodoros_deletions/$(uuidgen). Then subtract the number of files in 25_minute_pomodoros_deletions from 25_minute_pomodoros. This is called a positive-negative counter.

For messages, rather than just putting the plain text contents in each file, we could do more structured data like:

message:Hey it's mike what's up?

or

delete:2880dbc8-a2c6-43c0-8f88-e0fb2672755c

or

edit:2880dbc8-a2c6-43c0-8f88-e0fb2672755c:Hey, it's Mike what's up???

We'd then have to actually inspect the contents of each messsage and decide if it should be displayed or if it affects a previous message (so we're well beyond 12 lines of bash at this point) but it doesn't change anything about the properties of the system. Any machine can make those changes freely, and messages will always be rendered the same way.

The Takeaway

The important concept here is that data is stored in one of these very simple CRDT models, and you can use that basic model to "render" whatever data you want.

Flat files and uuidgen is enough to implement the data structure (not saying you should, but cleary you can). The sync part is what's mind blowing. You can sync arbitrarily complex data between an arbitrary number of devices without knowing anything about it. rsync or scp could easily handle this job.

If we were doing the same thing in a more sane way (like, say, storing these messages in a local sqlite database), you can still pump messages between machines without any care for what's in them or what they mean.

Even if you want a dedicated server, the server does not need to know how to render them, so the entirely of the server logic can be: Hey, let's compare messages. Please give me the ones that I'm missing. Here's the ones that you're missing. And you have one endpoint: /sync.

I've been building things with CRDTs for a while now and have developed a real love for what they let you do. I'd love to talk more about them soon, but for now, I hope that's at least a fun introduction for anyone who isn't familiar yet. I really think they're being slept on and I hope more people start using them.


A Little Note: If you actually want to play with this and you're using Syncthing, messages are kinda slow by default. There's a setting in ~/.config/syncthing/config.xml called fsWatcherDelayS. Set it to "1" for the folder you're keeping messages in and it will be much faster. If you're using Google Drive or Dropbox or whatever, you're on your own.

Read Entire Article