By Ugorji Nwoke   14 Dec 2014   /blog   technology go-codec

Serialization In Go

Part of the go-codec series. Source at http://github.com/ugorji/go

For data transfer between systems to occur, the sending side must encode the data structures into a stream of bytes, and the receiving side must efficiently decode the stream of bytes into a representative data structure.

Go has efficient and extensive support for this. The standard library provides support for the following general-purpose encodings:

  • gob
  • json

In addition, the Go Authors and the community at large provide libraries for the following popular encodings:

  • msgpack
  • cbor
  • protocol buffers
  • cap’n’proto
  • binc

Let’s compare these in the table below:

Encoding           Binary   Streaming   Mandatory Codegen Phase   Symbols
json               N        Y           N                         N
msgpack            Y        N           N                         N
cbor               Y        Y           N                         N
protocol buffers   Y        N           Y                         Y
cap’n’proto        Y        N           Y                         N
binc               Y        N           N                         Y
gob                Y        N           N                         N

Each column in the table above shows an orthogonal way of comparing encoding formats.

Binary Vs Text

Text encodings can be read and written directly by humans. JSON is a perfect example.

The format of a binary encoding is usually much simpler. Binary encodings do not need separators between values in the stream. They do not need delimiters to separate key-value mappings from sequences, etc.

Because of this, binary encodings have the following advantages:

  • More efficient to parse, in terms of CPU and memory usage
  • More efficient to write
  • More compact
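The compactness point can be illustrated with a minimal sketch. It assumes a toy fixed-width binary layout (4 bytes per value, big-endian), which is not the wire format of any encoding in the table; the function names encodeText and encodeBinary are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// encodeText encodes the values as a JSON array, a text encoding that
// needs brackets and commas to delimit and separate the values.
func encodeText(values []uint32) ([]byte, error) {
	return json.Marshal(values)
}

// encodeBinary writes each value as a fixed-width 4-byte big-endian
// integer. This toy layout needs no separators between values.
func encodeBinary(values []uint32) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.BigEndian, values); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	values := []uint32{1000000, 2000000, 3000000}

	text, err := encodeText(values)
	if err != nil {
		panic(err)
	}
	bin, err := encodeBinary(values)
	if err != nil {
		panic(err)
	}

	fmt.Printf("text:   %d bytes\n", len(text))
	fmt.Printf("binary: %d bytes\n", len(bin))
}
```

For these three values the JSON form takes 25 bytes and the fixed-width form takes 12. Real binary formats typically add type tags and use variable-width integers, so actual sizes will differ.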

Some would argue that the “compactness” argument can be mitigated by using post-compression. This is true, but that increases the CPU and memory usage when encoding into and decoding from a stream.

Streaming support

Streaming support refers to the ability to encode a sequence of items or key-value pairs into a stream, or decode the same sequence from a stream, without knowing the number of elements in the sequence.

This is required for proper memory management. The encoder need not know the full number of elements before encoding starts. On the other side, the decoder need not reserve a large amount of memory before decoding starts.

Because they use separators and delimiters, many text formats support streaming implicitly. However, many binary formats (e.g. messagepack) do not support streaming natively.
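As an illustration of streaming with a text format, here is a minimal sketch using the standard library's json.Decoder to read an array one element at a time, without knowing its length up front. The helper names sumStream and sumString are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// sumStream decodes a JSON array of numbers one element at a time.
// It never needs to know how many elements the array holds.
func sumStream(r io.Reader) (sum float64, n int, err error) {
	dec := json.NewDecoder(r)

	// Consume the opening '[' delimiter.
	if _, err = dec.Token(); err != nil {
		return 0, 0, err
	}
	// More reports whether another element precedes the closing ']'.
	for dec.More() {
		var v float64
		if err = dec.Decode(&v); err != nil {
			return 0, 0, err
		}
		sum += v
		n++
	}
	// Consume the closing ']' delimiter.
	if _, err = dec.Token(); err != nil {
		return 0, 0, err
	}
	return sum, n, nil
}

// sumString is a small convenience wrapper for in-memory input.
func sumString(s string) (float64, int, error) {
	return sumStream(strings.NewReader(s))
}

func main() {
	sum, n, err := sumString(`[1, 2, 3.5, 4]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(sum, n) // prints "10.5 4"
}
```

The decoder holds only one element in memory at a time, which is the memory-management benefit described above.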

Mandatory Code Generation Phase

Some encoding formats pride themselves on requiring a schema, to dictate the structure of the data and ensure that binary compatibility is maintained as the data changes.

I believe this was a great idea at one time. However, many languages have strict type systems that can enforce the schema without an external compiler being required to specify the format.

This was a motivating factor in creating gob, Go’s native binary format provided with the standard library.
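A minimal sketch of what this looks like with encoding/gob: the Go type definition acts as the schema, and no external schema file or code generation step is involved. The Point type and the roundTrip helper are hypothetical names used only for illustration.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Point is a hypothetical type. Its Go definition is the only schema.
type Point struct {
	X, Y  int
	Label string
}

// roundTrip encodes a Point with gob and decodes it back.
func roundTrip(in Point) (Point, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return Point{}, err
	}
	var out Point
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return Point{}, err
	}
	return out, nil
}

func main() {
	out, err := roundTrip(Point{X: 3, Y: 4, Label: "corner"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", out)
}
```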

Symbols

Symbols are a way of de-duplicating values (especially strings) which repeat a lot in the stream.

Consider a key-value map which has the same set of keys for each object in the stream.

Without symbols, the keys will be repeated unnecessarily, leading to increased CPU and wall time during encoding and decoding.

With symbols, each distinct key is written to the stream once and mapped to an integer symbol, and the symbol is put in the stream wherever that key would otherwise have been repeated. This reduces the length of the encoded byte stream, at the expense of slightly increased encoding time, because the encoder must look up the integer symbol mapped to each string value.
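The idea can be sketched with a toy symbol table. This is only an illustration of the technique: the token type and both functions are hypothetical, and the layout is not the wire format of binc or any other encoding mentioned here.

```go
package main

import "fmt"

// token is one entry in a toy encoded stream. A definition carries the
// string and assigns it an ID; a reference carries only the ID.
type token struct {
	Def  bool
	ID   int
	Text string
}

// encodeWithSymbols emits each distinct string once as a definition and
// replaces every later occurrence with a reference to its integer ID.
func encodeWithSymbols(keys []string) []token {
	ids := make(map[string]int)
	out := make([]token, 0, len(keys))
	for _, k := range keys {
		// This map lookup is the extra encoding cost of using symbols.
		if id, ok := ids[k]; ok {
			out = append(out, token{ID: id})
			continue
		}
		id := len(ids)
		ids[k] = id
		out = append(out, token{Def: true, ID: id, Text: k})
	}
	return out
}

// decodeWithSymbols rebuilds the original strings from the token stream.
func decodeWithSymbols(tokens []token) []string {
	table := make(map[int]string)
	out := make([]string, 0, len(tokens))
	for _, t := range tokens {
		if t.Def {
			table[t.ID] = t.Text
		}
		out = append(out, table[t.ID])
	}
	return out
}

func main() {
	keys := []string{"name", "age", "name", "age", "name", "age"}
	tokens := encodeWithSymbols(keys)
	fmt.Println(decodeWithSymbols(tokens))
}
```

In this sketch only the first "name" and the first "age" carry string data; the remaining four tokens carry just an integer.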

Standard Library

The standard library provides support for gob and json.

The arguments against gob are:

  • gob is not portable or usable outside the Go runtime.
    This means you cannot interoperate with Python, Java, C or other languages.
  • gob does not perform well when there are only a small number of elements to encode or decode.
    This is because gob takes time to write out a description of all structures in the stream a priori, before it starts encoding the stream itself.
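The up-front cost in the second point can be observed with a small sketch: encode the same value twice on one gob stream and compare how many bytes each call adds. The Record type and the encodedSizes helper are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Record is a hypothetical type used only for this measurement.
type Record struct {
	ID   int
	Name string
}

// encodedSizes encodes v twice on a single gob stream and returns the
// number of bytes added by the first and by the second Encode call.
func encodedSizes(v Record) (first, second int, err error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)

	// The first Encode also writes the description of the Record type.
	if err = enc.Encode(v); err != nil {
		return 0, 0, err
	}
	first = buf.Len()

	// The second Encode writes only the value.
	if err = enc.Encode(v); err != nil {
		return 0, 0, err
	}
	second = buf.Len() - first
	return first, second, nil
}

func main() {
	first, second, err := encodedSizes(Record{ID: 1, Name: "a"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("first: %d bytes, second: %d bytes\n", first, second)
}
```

The first call is noticeably larger because it carries the type description. That overhead is amortized over a long stream but dominates a short one.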

There are other high-quality encodings with high-quality libraries available, if gob does not fit your use-case.

Features possible due to serialization: RPC

Serialization formats enable more efficient remote procedure calls in Go.

Go has a net/rpc package which allows you to use any serialization format of your choosing.

There are implementations for:

  • cbor
  • messagepack
  • binc
  • gob
  • json
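As a minimal sketch of plugging a codec into net/rpc, the example below uses the standard library's net/rpc/jsonrpc codec over an in-memory net.Pipe connection. The Args and Arith types and the callMultiply helper are hypothetical; a codec for another format would be passed to ServeCodec and NewClientWithCodec in the same way, assuming it implements rpc.ServerCodec and rpc.ClientCodec.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
	"net/rpc/jsonrpc"
)

// Args and Arith are hypothetical names for this sketch.
type Args struct{ A, B int }

type Arith struct{}

// Multiply has the shape net/rpc requires: an exported method with two
// arguments, the second a pointer, and an error return value.
func (Arith) Multiply(args *Args, reply *int) error {
	*reply = args.A * args.B
	return nil
}

// callMultiply runs one RPC over an in-memory connection using JSON.
func callMultiply(a, b int) (int, error) {
	srv := rpc.NewServer()
	if err := srv.Register(Arith{}); err != nil {
		return 0, err
	}

	clientConn, serverConn := net.Pipe()

	// The codec decides the serialization format on each side.
	go srv.ServeCodec(jsonrpc.NewServerCodec(serverConn))
	client := rpc.NewClientWithCodec(jsonrpc.NewClientCodec(clientConn))
	defer client.Close()

	var product int
	err := client.Call("Arith.Multiply", &Args{A: a, B: b}, &product)
	return product, err
}

func main() {
	product, err := callMultiply(6, 7)
	if err != nil {
		panic(err)
	}
	fmt.Println(product) // prints "42"
}
```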

Conclusion

The Go ecosystem provides a number of high-quality packages for popular encoding formats. These packages have been extensively tested and used by companies that trust the libraries not to introduce dirty data into their systems.

Go forth and transfer your data, knowing that the Go community has your back.

© Ugorji Nwoke