
Introduction
In my last blog post, I went over multiple serialization solutions for Ruby, including two JSON-based flavors (Oj and the standard library JSON gem) alongside CBOR and MessagePack as binary alternatives.
If you recall my verdict, MessagePack came out on top for encoded size and encoding performance, while the JSON gem was the best when it came to decoding performance. That left the question of which serializer to use for your apps with the infamous “it depends” answer.
Today I am introducing a preview release of TinyBits, my take on serializing JSON-like objects in a schema-less binary format.
What is TinyBits
TinyBits is a C library that implements a serializer and a de-serializer between a specific set of data types and a binary format. The supported data types are:
- Integers with int64 capacity
- Floating point numbers using IEEE 754 double precision (including NaN, +/-INF)
- Strings
- Blobs (strings with binary data)
- Booleans
- NULL values
- Arrays
- Maps (key/value pairs)
- Datetime (stored as seconds since epoch, with fractional data and a time zone offset)
TinyBits packs those values in a very tight binary format, achieved mostly due to:
- Single byte type headers
- Integer compression using the SQLite4 (yes SQLite4, not SQLite3) varint format.
- Floating point compression via scaling + varint storage.
- String deduplication using backward referencing (this is very effective with arrays of maps with similar keys).
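To give a feel for the integer compression item, here is a minimal Ruby sketch of the SQLite4 varint scheme for unsigned values. It is only an illustration: the method names are made up for this post, it ignores negative numbers, and it does not show how TinyBits combines the varint with its type header.

```ruby
# SQLite4-style varint for non-negative integers up to 2**64 - 1.
# Illustrative only; varint_encode / varint_decode are hypothetical names.
def varint_encode(v)
  raise ArgumentError, "out of range" unless v.between?(0, 2**64 - 1)
  if v <= 240
    [v].pack("C")                                          # 1 byte
  elsif v <= 2287
    [241 + (v - 240) / 256, (v - 240) % 256].pack("C2")    # 2 bytes
  elsif v <= 67_823
    [249, (v - 2288) / 256, (v - 2288) % 256].pack("C3")   # 3 bytes
  else
    # Prefix byte 250..255 followed by a 3..8 byte big-endian integer.
    nbytes = (v.bit_length + 7) / 8
    [247 + nbytes].pack("C") + [v].pack("Q>")[-nbytes, nbytes]
  end
end

# Returns [value, bytes_consumed].
def varint_decode(bytes)
  a0 = bytes.getbyte(0)
  if a0 <= 240
    [a0, 1]
  elsif a0 <= 248
    [240 + 256 * (a0 - 241) + bytes.getbyte(1), 2]
  elsif a0 == 249
    [2288 + 256 * bytes.getbyte(1) + bytes.getbyte(2), 3]
  else
    nbytes = a0 - 247
    value = bytes.byteslice(1, nbytes).bytes.inject(0) { |acc, b| (acc << 8) | b }
    [value, nbytes + 1]
  end
end

puts varint_encode(31).unpack1("H*")   # => 1f
puts varint_encode(557).unpack1("H*")  # => f23d
```

Small values, which are the common case in most documents, fit in a single byte, and anything up to 67,823 fits in three.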
Design Tradeoffs
TinyBits makes the following design tradeoffs:
- Documents with repeated strings, like arrays of hashes (e.g. returned from a database) will have the best size reduction
- Size reduction is balanced to not affect encoding performance very much, even if we end up slightly larger
- Decoding performance is prioritized over encoding performance
- Small to medium sized documents are prioritized over larger ones
Currently a Ruby extension of TinyBits is available as a Ruby gem. Source code is available here. The C implementation can be found here.
Usage
TinyBits is straightforward to use. You just need to install the gem.
Then you can either use the class methods
Or you can use the faster object interface
If you use the packer/unpacker objects to pack/unpack individual documents then you don’t need to reset the packer between invocations.
Compactness
Let’s look at an example to see how compact TinyBits can be compared to other solutions.
This simple document translates to the following sizes:
| JSON | 163 bytes |
| CBOR | 134 bytes |
| MessagePack | 133 bytes |
| TinyBits | 80 bytes |
As can be seen, TinyBits was able to compress the above document greatly since it had multiple repeated strings, integer and floating point numbers.
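The effect of repeated strings can be illustrated with a toy model. The sketch below is not the TinyBits wire format: it emits symbolic tokens instead of bytes, and `dedup_tokens` is a hypothetical name. It only shows the general idea that a string seen before can be replaced by a short reference to its first occurrence.

```ruby
# Toy illustration of string deduplication with back references.
# Tokens stand in for encoded bytes; this is NOT how TinyBits lays out data.
def dedup_tokens(obj, seen = {}, out = [])
  case obj
  when Hash
    out << [:map, obj.size]
    obj.each do |key, value|
      dedup_tokens(key, seen, out)
      dedup_tokens(value, seen, out)
    end
  when Array
    out << [:array, obj.size]
    obj.each { |value| dedup_tokens(value, seen, out) }
  when String
    if seen.key?(obj)
      out << [:ref, seen[obj]]      # already emitted: point back to it
    else
      seen[obj] = seen.size         # remember the index of the first occurrence
      out << [:str, obj]
    end
  else
    out << [:value, obj]
  end
  out
end

rows = [
  { "name" => "alice", "role" => "admin" },
  { "name" => "bob",   "role" => "admin" }
]
tokens = dedup_tokens(rows)
puts tokens.count { |type, _| type == :ref }  # => 3
```

In the second row, both keys and the repeated "admin" value collapse into references, which is why arrays of similar hashes benefit the most.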
Needless to say, if the document had less redundancy, the difference would be less pronounced, for example:
Which translates to the following sizes:
| JSON | 50 bytes |
| CBOR | 48 bytes |
| MessagePack | 48 bytes |
| TinyBits | 35 bytes |
The difference between TinyBits and the other binary formats here is due to the presence of the value 3.1, which takes 9 bytes (1 tag + 8) for CBOR and MessagePack to store, while it takes only 2 bytes for TinyBits to store.
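To make the 2-byte figure concrete, here is a minimal sketch of the scaling idea. It assumes the encoder looks for the smallest power of ten that turns the float into an exact integer, then stores that integer as a varint behind a one-byte header. `scale_float` and `max_decimals` are hypothetical names used for illustration; the real header layout and scale limit in TinyBits may differ.

```ruby
# Hypothetical sketch of "scaling + varint storage" for floats.
# Returns [scaled_integer, decimals], or nil when the value should fall back
# to a raw 8-byte double.
def scale_float(value, max_decimals: 4)
  return nil unless value.finite?
  (0..max_decimals).each do |decimals|
    factor = 10**decimals
    scaled = (value * factor).round
    # Accept the scale only if it fits in int64 and round-trips exactly.
    return [scaled, decimals] if scaled.abs < 2**63 && scaled.to_f / factor == value
  end
  nil
end

p scale_float(3.1)          # => [31, 1]
p scale_float(55.7)         # => [557, 1]
p scale_float(0.1 + 0.2)    # => nil
p [3.1].pack("G").bytesize  # => 8
```

With 3.1 reduced to the integer 31, a one-byte varint plus a one-byte header is enough, compared with a tag plus the 8 raw bytes of the double.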
This is what the data looks like in hex for the three binary formats:
| CBOR | a467757365725f6964076573636f7265fb4008cccccccccccd626870fb404bd9999999999a646e616d656564726f6964 |
| MessagePack | 84a7757365725f696407a573636f7265cb4008cccccccccccda26870cb404bd9999999999aa46e616d65a564726f6964 |
| TinyBits | 1447757365725f6964874573636f7265211f42687021f23d446e616d654564726f6964 |
Notice how similar the CBOR and MessagePack sequences are. The TinyBits sequence is very similar too, differing mainly in the middle, where the numbers 3.1 and 55.7 are encoded.
If we take our data set from the last blog, which had 11 different document types, 10 records each, these are average document sizes when serialized using the different serializers:
Fast Encoding
TinyBits strives to streamline the encoding process, and it manages to do so. Of course there are some overheads associated with string deduplication and integer/floating point compression, but these were kept to a minimum to achieve speeds that are on par with, or even better than, other binary encoders.
These are the average encoding rates of the 11 document types mentioned above
| Oj | 63,872 docs/sec |
| JSON | 73,877 docs/sec |
| CBOR | 105,488 docs/sec |
| MessagePack | 124,081 docs/sec |
| TinyBits | 135,054 docs/sec |

As can be seen, for the test data set, TinyBits is the fastest encoder, with a ~9% advantage over MessagePack, while doing a lot more work to compress and deduplicate.
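For readers who want to produce comparable docs/sec figures on their own data, a rate like the ones above can be measured with a small timing loop. This is a generic sketch, not the harness used for the numbers in this post: `docs_per_second` is a hypothetical helper, the sample documents are made up, and the standard library JSON encoder stands in for whichever serializer is being tested.

```ruby
require "json"

# Hypothetical helper: runs the block once per document, `iterations` times
# over the whole set, and returns the observed rate in documents per second.
def docs_per_second(docs, iterations: 100)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { docs.each { |doc| yield doc } }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  (docs.size * iterations) / elapsed
end

# Made-up sample documents, only for illustration.
docs = Array.new(10) { |i| { "id" => i, "name" => "user#{i}", "score" => i * 1.5 } }

rate = docs_per_second(docs) { |doc| JSON.generate(doc) }
puts "JSON.generate: #{rate.round} docs/sec"
```

Swapping the block body for another encoder gives a like-for-like comparison on the same documents, although a dedicated benchmarking gem would handle warm-up and variance better.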
Fast Decoding
As a nice side effect of its compact encoded size, the decoding process is much faster, and thanks to string deduplication, TinyBits doesn’t need to create many Ruby strings. As a result we see the following decoding performance.
| Oj | 25,281 docs/sec |
| JSON | 40,590 docs/sec |
| CBOR | 32,126 docs/sec |
| MessagePack | 35,428 docs/sec |
| TinyBits | 55,880 docs/sec |

In this test, the binary formats generally lag behind the standard library JSON parser. But TinyBits beats the JSON parser in decoding, delivering ~38% more performance.
Compressibility
Encoding to smaller sizes is one thing; ultimately delivering the smallest size is another. Case in point, MessagePack usually encodes to a smaller size than JSON, which is great if you are sending the data as is. If you try to compress the data further though, it turns out that in most cases, especially when using Zstd, JSON compresses to a smaller absolute size. This is also true of CBOR.
TinyBits, on the other hand, stays very competitive with JSON compressibility, trading blows with it over the documents in the data set and edging it slightly on average. If you are using LZ4 specifically, then TinyBits is always more compressible than JSON.
Here are the average sizes of the documents when compressed using LZ4 and Zstd
| Format | Raw | LZ4 | Zstd |
| JSON | 3,096 bytes | 1,339 bytes | 952 bytes |
| CBOR | 2,625 bytes | 1,294 bytes | 992 bytes |
| MessagePack | 2,239 bytes | 1,294 bytes | 1,012 bytes |
| TinyBits | 1,458 bytes | 1,160 bytes | 942 bytes |
Note that for these documents, raw TinyBits, without any compression applied, is just 53% larger than the smallest compressed format (JSON + Zstd).
Memory Usage
As we have seen before, most of these serializers are memory efficient. Still, the TinyBits Ruby extension allocates the least amount of memory of all the serializers tested, even if by a small margin:
| Oj | 3.21 KB/doc |
| JSON | 4.19 KB/doc |
| CBOR | 2.81 KB/doc |
| MessagePack | 2.74 KB/doc |
| TinyBits | 1.53 KB/doc |

| Oj | 11.13 KB/doc |
| JSON | 10.55 KB/doc |
| CBOR | 16.46 KB/doc |
| MessagePack | 10.5 KB/doc |
| TinyBits | 9.62 KB/doc |
An All Rounder
As we have seen, TinyBits, at least for the provided data set, offers the best encoding and decoding performance while delivering the smallest encoded sizes, without sacrificing compressibility in the process.

In summary TinyBits is:
- ~9% faster than the fastest existing schema-less encoder (MessagePack)
- ~38% faster than the fastest existing schema-less decoder (stdlib JSON)
- All while producing over 35% reduction in size compared to the most compact format (MessagePack)
- Not to mention that it compresses as well as, if not slightly better than, JSON, which is usually the most compressible format
End to end performance
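To see what this means for data transmission, consider the time it takes to encode, transmit, and decode an average document (assuming a 100Mbps connection). All times are in microseconds per document:
| Format | Encode | Transmit | Decode | Total |
| JSON | 13.54 | 236.2 | 24.64 | 274.37 |
| CBOR | 9.48 | 200.29 | 31.13 | 240.89 |
| MessagePack | 8.06 | 201.32 | 28.23 | 237.60 |
| TinyBits | 7.40 | 111.24 | 17.9 | 136.54 |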

In this (hypothetical) scenario, TinyBits delivers roughly 43% time savings over the second best alternative (MessagePack).
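The TinyBits figures in this comparison can be reproduced from the rates and sizes reported earlier. The sketch below is plain arithmetic; the one assumption is that "100Mbps" is read as 100 × 1024 × 1024 bits per second, which is the reading that matches the transmit time shown for TinyBits.

```ruby
# TinyBits figures taken from the tables in this post.
ENCODE_RATE = 135_054   # docs/sec
DECODE_RATE = 55_880    # docs/sec
AVG_SIZE    = 1_458     # bytes per document, uncompressed

# Assumption: 1 Mbps = 1024 * 1024 bits/sec (this reproduces the table value).
LINK_BITS_PER_SEC = 100 * 1024 * 1024

encode_us   = 1_000_000.0 / ENCODE_RATE
transmit_us = AVG_SIZE * 8 * 1_000_000.0 / LINK_BITS_PER_SEC
decode_us   = 1_000_000.0 / DECODE_RATE
total_us    = encode_us + transmit_us + decode_us

puts format("encode %.2f + transmit %.2f + decode %.2f = %.2f us",
            encode_us, transmit_us, decode_us, total_us)
# => encode 7.40 + transmit 111.24 + decode 17.90 = 136.54 us
```

Transmission dominates the total, which is why the smaller encoded size matters more here than the raw encode and decode rates.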
Bonus Feature: Multi-Object Packing (experimental)
The TinyBits Ruby gem comes with a feature that was purpose built for my use case, but I believe it could be useful for others as well.
- The packer can stitch multiple objects into the same buffer
- The objects can be added to the buffer at separate points in time (e.g. once each is generated)
- The objects will all share the same deduplication dictionary
- The unpacker can unpack these objects one after the other
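Conceptually, the behavior listed above can be modeled with a toy packer and unpacker that share one string table across every object in the stream. The classes below are a made-up illustration in plain Ruby, using symbolic tokens instead of bytes. None of these names come from the TinyBits gem, and the real implementation works on a binary buffer.

```ruby
# Toy model only: ToyStreamPacker / ToyStreamUnpacker are hypothetical names.
class ToyStreamPacker
  attr_reader :stream

  def initialize
    @strings = {}  # string => index, shared by every object pushed
    @stream = []   # flat token stream holding all objects back to back
  end

  # Appends one object to the stream; can be called at any point in time.
  def push(obj)
    case obj
    when Hash
      @stream << [:map, obj.size]
      obj.each { |key, value| push(key); push(value) }
    when Array
      @stream << [:array, obj.size]
      obj.each { |value| push(value) }
    when String
      if @strings.key?(obj)
        @stream << [:ref, @strings[obj]]
      else
        @strings[obj] = @strings.size
        @stream << [:str, obj]
      end
    else
      @stream << [:value, obj]
    end
    self
  end
end

class ToyStreamUnpacker
  def initialize(stream)
    @tokens = stream.dup
    @strings = []  # rebuilt in the same order the packer assigned indexes
  end

  def finished?
    @tokens.empty?
  end

  # Reads exactly one object from the front of the stream.
  def pop
    type, arg = @tokens.shift
    case type
    when :map   then Array.new(arg) { [pop, pop] }.to_h
    when :array then Array.new(arg) { pop }
    when :str
      @strings << arg
      arg
    when :ref   then @strings[arg]
    else arg
    end
  end
end

packer = ToyStreamPacker.new
packer.push({ "name" => "alice", "role" => "admin" })
packer.push({ "name" => "bob", "role" => "admin" })  # added later, reuses strings

unpacker = ToyStreamUnpacker.new(packer.stream)
objects = []
objects << unpacker.pop until unpacker.finished?
```

The second object is encoded almost entirely as references to strings introduced by the first, yet each object can still be read back on its own, in order.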
I had a specific need for this feature as I wanted to:
- Capture objects on the fly as they are being generated, without copying them or keeping them around
- Benefit from shared strings across those different objects
- Still be able to unpack each object individually on the other end
Here’s an example code snippet
State Of Development
The TinyBits C library and the format spec are almost finalized. Currently only an (experimental) Ruby extension is available. The Python extension is also progressing nicely, and work is underway to produce extensions for other languages and platforms. Of course any help in that regard will be greatly appreciated.
Conclusion
Born from a real need to encode->transmit->decode schema-less data efficiently, TinyBits turns out to be the most space efficient and generally the fastest of the existing options for serializing schema-less data. Give it a try and see for yourself!


