Serialization has always been a strong point of Rust. In particular, Serde was available well before Rust 1.0.0 was released (though the derive macro was unstable until 1.15.0). The idea behind it is to use traits to decouple the objects to (de)serialize from the serialization format — a very powerful idea. Format writers only need to implement Serde’s (de)serializer traits, and users can #[derive(Serialize, Deserialize)] to get serialization for their objects, regardless of the format.

Of course, there are format-specific crates, such as those for protocol buffers, bincode, FlatBuffers, etc. Those can offer good compile-time and runtime performance, but they lock the data into their respective protocols, though often with implementations available in other languages. For many uses, and especially in polyglot environments, this is an acceptable tradeoff.

In this guide, we’ll zoom in on both kinds of frameworks, considering API usability and performance. While I’m sure you’ll find plenty of value in examining this juxtaposition, make no mistake: we are comparing apples to oranges.

For our benchmark, we’ll use this relatively simple data structure (please don’t use this for anything in production):

pub enum StoredVariants {
    YesNo(bool),
    Small(u8),
    Signy(i64),
    Stringy(String),
}

pub struct StoredData {
    pub variant: StoredVariants,
    pub opt_bool: Option<bool>,
    pub vec_strs: Vec<String>,
    pub range: std::ops::Range<usize>,
}

The benchmark will then serialize and deserialize a slice of those StoredDatas. We’ll measure the time it takes to compile the benchmark, as well as the time to bytes and back.

Serde, the incumbent serialization/deserialization library, is elegant, flexible, fast to run, and slow to compile.

The API has three facets. The Serialize and Deserialize traits bridge the formats to the data structures the user might want to serialize and deserialize. You rarely need to implement those traits manually since Serde comes with powerful derive macros that allow many tweaks to the output format. For example, you can rename fields, define defaults for missing values on deserialization, and more. Once you have derived or implemented those traits, you can use the format crates, such as serde_json or bincode, to (de)serialize your data.
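
For example, a derived implementation with a couple of such tweaks might look like this (a minimal sketch; the type and field names are made up for illustration):

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct Example {
    // appears as "kind" in the serialized output instead of "variant"
    #[serde(rename = "kind")]
    pub variant: String,
    // falls back to Default::default() if the field is missing on deserialization
    #[serde(default)]
    pub retries: u32,
}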

The third facet is the Serializer and Deserializer traits that format crates need to implement to (de)serialize arbitrary data structures. This means reducing the problem of serializing N data structures to M formats from M × N to M + N.

Because Serde relies heavily on monomorphisation to facilitate great performance despite its famous flexibility, compile time has been an issue from the beginning. To counter this, multiple crates have appeared, from miniserde, to tinyserde, to nanoserde. The idea behind these tools is to use runtime dispatch to reduce code bloat due to monomorphisation.

Our benchmark will serialize our data structure to a Vec<u8> with sufficient capacity, so that allocating the output does not disturb the measurement, and then deserialize it back.

The serde_json crate allows serialization to and deserialization from JSON, which is plain text and thus (somewhat) readable, at the cost of some overhead during parsing and formatting.

Serializing can be done with to_string, to_vec, or to_writer, with _pretty variants that write out nicely formatted instead of minified JSON. For deserializing, serde_json has from_reader, from_str, and from_slice. serde_json also has its own Value type, with to_value and from_value functions to convert to and from it.
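
Put together, the round trip might look like this (a minimal sketch, assuming StoredData derives Serialize and Deserialize; the capacity of 4096 is an arbitrary stand-in for “large enough”):

fn json_round_trip(data: &StoredData) -> serde_json::Result<StoredData> {
    // preallocate so that growing the output buffer doesn't disturb the measurement
    let mut buf = Vec::with_capacity(4096);
    serde_json::to_writer(&mut buf, data)?;
    serde_json::from_slice(&buf)
}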

Of course, it’s not the fastest approach. Serializing StoredData takes about a quarter of a microsecond on my machine. Deserializing takes less than half a microsecond. The minified data takes up 100 bytes. Prettified, it becomes 153 bytes. The overhead will vary depending on the serialized types and values. For example, a boolean might ideally take up 1 bit, but it will take at least 4 bytes to serialize to JSON (true, not counting the key).

YAML’s name is a recursive acronym: YAML Ain’t Markup Language. Like JSON, it is a textual format with bindings in multiple languages, but it’s very idiosyncratic. Serializing takes about 3 microseconds and deserializing about 7, so it’s more than 10 times slower than JSON for this particular workload. The size is 99 bytes, comparable with JSON.

Using YAML probably only makes sense if your data is already in YAML.

Rusty Object Notation (RON) is a new, Rust-derived textual format. It’s a wee bit terser than JSON at 91 bytes, but slower to work with. Serialization took a tad more than half a microsecond; deserializing took roughly two microseconds. This is slower than JSON, but not egregiously so.

Like serde_json, bincode also works with Serde to serialize or deserialize any types that implement the respective traits. Since bincode uses a binary format, there are obviously no String-related methods. serialize creates a Vec<u8>, and serialize_into takes a writer (anything implementing Write) along with the value to serialize.

Because Write is implemented for a good deal of types (notably &mut Vec<u8>, File, and TcpStream), this simple API gives us plenty of bang for the buck. Deserialization works similarly: deserialize takes a &[u8], and deserialize_from takes any type that implements Read.
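
In code, the bincode round trip might look like this (again a sketch, assuming the derived Serde traits on StoredData):

fn bincode_round_trip(data: &StoredData) -> bincode::Result<StoredData> {
    // note the &mut below: Write is implemented for &mut Vec<u8>, not Vec<u8>
    let mut buf = Vec::with_capacity(4096);
    bincode::serialize_into(&mut buf, data)?;
    bincode::deserialize(&buf)
}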

A small downside of all this genericity is that when you get the arguments wrong, the type errors may be confusing. Case in point: when I wrote the benchmark, I forgot a &mut at one point and had to look up the implementors of the Write trait before I found the solution.

Performance-wise, it is, predictably, faster than serde_json. Serializing took roughly 35 nanoseconds and deserializing a bit less than an eighth of a microsecond. The serialized data ended up taking 58 bytes.

MessagePack is another binary format with multiple language bindings. It prides itself on being very terse on the wire, which our benchmark case validated: the data serialized to just 24 bytes.

The Rust implementation of the MessagePack protocol is called rmp, which also works with serde. The interface (apart from the small differences in naming) is the same as the above. Its thriftiness when it comes to space comes with a small performance overhead compared to bincode. Serializing takes a tenth of a microsecond, while deserializing comes at roughly one-third of a microsecond.
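
One way to drive rmp-serde looks roughly like this (a sketch, assuming the derived Serde traits on StoredData; from_read_ref is the deserialization entry point in the version benchmarked here):

use serde::Serialize;

fn msgpack_round_trip(data: &StoredData) -> Result<StoredData, rmp_serde::decode::Error> {
    let mut buf = Vec::with_capacity(4096);
    // rmp-serde's Serializer wraps any type that implements Write
    data.serialize(&mut rmp_serde::Serializer::new(&mut buf))
        .expect("MessagePack serialization failed");
    rmp_serde::from_read_ref(&buf)
}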

jsonway provides an interface to construct a serde_json::Value. However, there is no derive macro, so we have to implement the serialization by hand:

struct DataSerializer<'a> {
    data: &'a StoredData,
}

impl jsonway::Serializer for DataSerializer<'_> {
    fn root(&self) -> Option<&str> { None }

    fn build(&self, json: &mut jsonway::ObjectBuilder) {
        let data = &self.data;
        use StoredVariants::*;
        // the enum becomes a single-key object, mirroring serde's
        // externally tagged enum representation
        json.object("variant", |json| match data.variant {
            YesNo(b) => json.set("YesNo", b),
            Small(u) => json.set("Small", u),
            Signy(i) => json.set("Signy", i),
            Stringy(ref s) => json.set("Stringy", s),
        });
        // leave out the key entirely if the Option is None
        match data.opt_bool {
            Some(true) => json.set("opt_bool", true),
            Some(false) => json.set("opt_bool", false),
            None => {}
        };
        json.array("vec_strs", |json| {
            json.map(data.vec_strs.iter(), |s| s[..].into())
        });
        json.object("range", |json| {
            json.set("start", data.range.start);
            json.set("end", data.range.end);
        });
    }
}

Note that this will only construct a serde_json::Value, which is pretty fast (to the tune of only a few nanoseconds), but not exactly a serialized object. Serializing this cost us about 5 millis, which is far slower than using serde_json directly.

Concise Binary Object Representation (CBOR) is another binary format that in our test came out on the larger side, taking 72 bytes. Serialization was speedy enough at roughly 140 nanoseconds, but deserialization was, unexpectedly, slower at almost half a microsecond.
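
serde_cbor follows the familiar pattern (a sketch, assuming the derived Serde traits):

fn cbor_round_trip(data: &StoredData) -> Result<StoredData, serde_cbor::Error> {
    let bytes = serde_cbor::to_vec(data)?;
    serde_cbor::from_slice(&bytes)
}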

The postcard crate was built for use in embedded systems. At 41 bytes, it’s a good compromise between size and speed: at 60 nanoseconds to serialize and 180 nanoseconds to deserialize, it’s roughly 1.5x slower than bincode, at roughly 70 percent of the message size.
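
Befitting its embedded focus, postcard can also serialize into a fixed, caller-provided buffer (a sketch, assuming the derived Serde traits; the 128-byte buffer is an assumed upper bound for our data):

fn postcard_round_trip(data: &StoredData) -> Result<StoredData, postcard::Error> {
    // a fixed stack buffer instead of a heap allocation
    let mut buf = [0u8; 128];
    let used = postcard::to_slice(data, &mut buf)?;
    postcard::from_bytes(used)
}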

The relatively fast serialization and the thrifty format are a natural fit for embedded systems. MessagePack might overtax the embedded CPU, whereas we often have a beefier machine to deserialize the data.

FlexBuffers is a FlatBuffers-derived, multilanguage, binary, schemaless format. In this benchmark, it performed even worse than RON for serialization and worse than JSON for deserialization. That said, the format is as compact as Postcard with 41 bytes.

I would only use this if I had to work with a service that already uses it. For a freely chosen polyglot format, both JSON and msgpack best it in every respect.

There are other Serde-based formats, but those are mainly to interface to existing systems — e.g., Python’s Pickle format, Apache Hadoop’s Avro, or DBus’ binary wire format.

From Google comes a polyglot serialization format with bindings to C, C++, Java, C#, Go, Lua, and Rust, among others. To bring them all together, FlatBuffers has its own interface definition language, which you’ll have to learn (I had to learn it while writing this).

It’s got structs, enums (which are C-style enums with a single value each), unions, and, for some reason, tables. Interestingly, the latter are FlatBuffers’ main way to define types. They work like Rust structs in which every field is optional; structs are the same, only with non-optional fields. This is done to facilitate interface evolution.

Apart from that, the basic types are mostly there, only with different names:

Rust                         FlatBuffers
u8, u16, u32, u64            uint8, uint16, uint32, uint64
i8, i16, i32, i64            int8, int16, int32, int64
bool                         bool
String                       string
[T; N], e.g., [String; 5]    [T:N], e.g., [string:5]
Vec<T>                       [T]

FlatBuffer’s unions are akin to Rust enums but they always have a NONE variant and all variants must be tables, presumably to allow for changes later on.

Below is the StoredData from above as a FlatBuffers schema, which goes into a stored_data.fbs file. The fbs extension stands for “FlatBuffer Schema.”

file_identifier "SDFB";

table Bool {
    value: bool;
}

table Uint8 {
    value: uint8;
}

table Int64 {
    value: int64;
}

table String {
    value: string;
}

union StoredVariants {
    Bool,
    Uint8,
    Int64,
    String
}

struct Range {
    start: uint64;
    end: uint64;
}

table StoredData {
    variant_content: StoredVariants;
    opt_bool: bool;
    vec_strs: [String];
    range: Range;
}

root_type StoredData;

FlatBuffers appears to lack a direct representation of pointer-sized integers (usize) as well as of Ranges, so in this example, I picked uint64 for the integer values and a two-field struct for the Range. This is less than ideal on 32-bit machines. The documentation also tells us that FlatBuffers stores integer values as little-endian, so on big-endian machines, this will cost some performance. But that’s the price to pay for being widely applicable.

Compiling this to Rust code requires the flatc compiler, which is available as a Windows binary. It may also be in your Linux distribution’s repository; otherwise, it can be built from source. To build it, you’ll need Bazel, a build tool developed by Google.

After installing that, you can clone the FlatBuffers repo (I used the 1.12.0 release) and build it with bazel build flatc.

Once the build is done, the executable will be at bazel-out/k8-fastbuild/bin/flatc. Putting it on the $PATH allows the following command line.

$ flatc --rust stored_data.fbs

Now we have a stored_data_generated.rs file we can include! in our code. Obviously, this is not our original data structure, but for the sake of comparability, we’ll benchmark the serialization and deserialization via this slightly modified type. Deserialization is basically a memcpy, so it appears nearly free. However, that is misleading, since the accessors do all the actual work. To account for this, I added code to convert FlatBuffers’ types into our own type on deserialization.

The Abomonation crate’s name is a slight misnomer, because it really is an abomination. Using it is definitely unsafe, and likely unsound. Using it for anything but benchmarking to measure the maximum theoretical performance of serialization and deserialization is downright inadvisable. You have been warned.

Abomonation does not use Serde; instead, it has its own Abomonation trait for both serializing and deserializing data. What it really does is basically a memcpy, plus fixing up the occasional pointer so it can handle things like Vec and String. However, for now, it lacks handling of BTreeMap, HashMap, VecDeque, and other standard data structures — including Range, which we use in our StoredData. I cloned the repository and set up a pull request to implement the Abomonation trait for Range. Until it’s merged, I’ll use my own branch in this benchmark.

For deserialization, we need to keep the data alive as long as we want to use the decoded values because Abomonation won’t copy the memory — it’ll opt to reuse it in place. This also means we have to copy the data on each benchmark run; otherwise, we would corrupt the data.
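
In use, that looks roughly like this (a sketch, assuming StoredData implements Abomonation, which requires the Range branch mentioned above; note the unsafe blocks):

use abomonation::{decode, encode};

fn abomonation_round_trip(data: &StoredData) {
    let mut bytes = Vec::new();
    // encode is unsafe: it writes out the raw bytes of the value, pointers included
    unsafe { encode(data, &mut bytes) }.expect("write failed");
    // decode fixes the pointers up in place; the returned reference borrows
    // from `bytes`, which is why the benchmark copies the buffer on every run
    if let Some((back, rest)) = unsafe { decode::<StoredData>(&mut bytes) } {
        assert!(rest.is_empty());
        let _roundtripped: &StoredData = back;
    }
}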

While the resulting data takes up 116 bytes, it is indeed very fast. Serialization takes a bit more than 15ns, and deserialization takes just a smidgen more than 10ns, even with the additional copy.

Again, please be warned before using it in production. For many types, it is actually unsound, and every time you use it, the coding gods kill a sweet little crab. Or something.

The following table shows all formats with the serialized size and rounded time to serialize/deserialize (measured on my machine — your mileage will vary).

Format        Crate version   Bytes   Time to serialize   Time to deserialize
json          1.0.57          100     250ns               450ns
yaml          0.8.13          99      3µs                 7µs
ron           0.6.0           91      650ns               2µs
bincode       1.3.1           58      35ns                120ns
msgpack       0.14.4          24      100ns               200ns
cbor          0.11.1          72      140ns               460ns
postcard      0.5.1           41      60ns                180ns
flexbuffers   0.1.1           41      1.5µs               750ns
flatbuffers   0.6.1           104     180ns               120ns
abomonation   master*         116     15ns                10ns

*With an additional change to allow for serializing Ranges

Serialization really is a strong point of Rust. The libraries are mature and fast.

I feel I should sing Serde’s praise here. It’s a great piece of engineering and I highly recommend it.

Let’s quickly summarize what we learned about the other choices:

  • For your choice of format, if you need fast serialization and deserialization, bincode is the best you can do
  • For the smallest possible serialized size, MessagePack is the format to beat, though you pay with more runtime on deserialization
  • Postcard offers a good compromise between size and speed that allows for embedded usage
  • FlatBuffers are complex, feel decidedly un-Rust-y, and take up more space than they should. Unless you use the schema definition in multiple languages, there is really no reason to use it. Even then, JSON is faster
  • JSON is the fastest of the three readable formats, which makes sense since it has seen wide industry usage and benefits from SIMD optimizations

Regarding maturity, only bincode and JSON are marked with a 1.* major version number. Still, there’s a tendency in the Rust world to be very careful — perhaps even too careful — when it comes to going 1.0, so this doesn’t say much about the actual maturity of the crates.

I found all of the Serde-based crates easy to use, though more consistency of interfaces between the crates wouldn’t hurt. The benchmark serializes with to_writer, serialize_into, serialize(Serializer::new(_)), to_slice, or to_bytes and deserializes with from_slice, from_bytes, from_read_ref, or deserialize.

My benchmark code is available on GitHub. If you find problems or improvements, feel free to send me an issue or PR.