By Ugorji Nwoke   Thu, 30 May 2013 10:49:00 -0700   /blog   technology

Announcing Binc data interchange format

Binc is a lightweight, compact, limitless, schema-free, precise, binary, high-performance, feature-rich, language-independent, multi-domain, extensible data interchange format for structured data.

See the format documented at http://www.ugorji.net/project/binc

UPDATE:

See Announcement of enhancements. Highlights:

  1. Binc spec is now stored at https://github.com/ugorji/binc
  2. Binc now supports symbols, compact variable-length integers and compact floats
  3. Encoded size is now 25% less than in v0.1.0 for representative datasets. v0.1.0 size was already lower than all compared encodings.
  4. Performance is still better than compared encodings.

Let’s talk about each of these descriptions one by one:

  1. Lightweight/Compact:
    In tests, Binc encoding has been shown to take up less than 60% of the size of JSON, BSON and other lightweight encodings.

    Care was taken to support compact encodings for common values. For example, signed integers from -1 to 16, booleans, and other special values are encoded with only one byte. For small containers, the size (length) is encoded into a single byte.

  2. Limitless:
    Binc allows for extremely high precision integers (up to 2^15 bits of precision), both signed and unsigned, and the full spectrum of IEEE 754 floating point types (including decimals, extended precision binary floats, etc). Maps and arrays can have lengths that fit into an unsigned 64-bit integer value.

  3. Schema-Free:
    Just like JSON, a schema is not required. This is conceptually an advantage over others like Protocol Buffers, Thrift, etc., which require a schema and a compilation step before use.

  4. Precise:
    Binc aims to remove all ambiguity in the format. There are distinct signed and unsigned integers, distinct precisions, distinct Unicode string encodings (utf8, utf16LE, utf16BE, utf32LE, utf32BE), a distinct bytearray (binary) type, etc.

  5. Binary:
    Binc is a binary encoding format. This affords significant benefits in space (encoding size) and time (real and CPU time spent encoding and decoding).

  6. High-Performance:
    By “packing” the bits intelligently while still allowing easy traversal, high encoding and decoding performance is achieved. We have tests that show encoding and decoding the same structure taking less than 40% of the time that JSON takes.

  7. Feature-Rich:
    By not taking a lowest-common-denominator approach, the codec can represent a larger surface area, even beyond the types natively supported by any one target language. JSON, for example, has types which are limited to what JavaScript supports. Binc goes beyond that to support arbitrary precision signed and unsigned integers, all IEEE 754-2008 floating point types, very large arrays and maps (with sizes up to the maximum value of an unsigned 64-bit integer), special values like NaN, +/- Infinity, etc. Binc also supports rich timestamps (with timezone data, DST flag, nanosecond precision) using only 4-14 bytes.

  8. Language-Independent:
    Binc is not limited by any specific language. Instead, implementations are free to expose the extent of their support.

  9. Multi-Domain Use:
    By supporting arbitrary precision integers, it is a good fit for scientific data interchange. By supporting precise decimal types, it is a good fit for financial data interchange. Different domains would require different levels of support.

  10. Extensible:
    Binc natively supports user-defined extensions. This allows users to transfer custom types and expose how they will be encoded and decoded.
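
As a small illustration of the precision problem that distinct integer and binary types are meant to avoid, the sketch below round-trips two values through Go's standard encoding/json package and decodes them without a schema. It uses only the Go standard library and says nothing about the Binc wire format; roundTrip is a helper name introduced here for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTrip encodes v as JSON and decodes it into an untyped value,
// the way a schema-free consumer would see it.
func roundTrip(v interface{}) (interface{}, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	var out interface{}
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	// 2^53 + 1 cannot be represented exactly as a float64,
	// which is what untyped JSON numbers decode to in Go.
	big := uint64(1)<<53 + 1
	n, err := roundTrip(big)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T %.0f\n", n, n) // float64 9007199254740992

	// A byte slice is written as a base64 string and comes back
	// as a plain string; the binary type is no longer visible.
	b, err := roundTrip([]byte("hi"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T %v\n", b, b) // string aGk=
}
```

The integer comes back one lower than it went in, and the byte slice is indistinguishable from an ordinary string. A format with distinct unsigned integer and bytearray types does not have to make either compromise.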

The decision to create Binc was not made lightly. A lot of analysis of features and performance went into it. Other schema-free binary codecs were evaluated before Binc was created. These include:

  1. bson: verbose format with features in use only by mongodb
  2. bjson: simplistic, lacks features
  3. ubjson: stays too true to json. lacks extensions, binary support
  4. msgpack: lacks timestamp, binary and extensions
  5. tnetstrings: simplistic and lacking features
  6. smile: complex. lacking features
  7. binary plist: simplistic and lacking features
  8. protocol buffers, thrift, avro: require schema and pre-compilation step

In particular, my application use-case required extreme compactness and high encoding/decoding performance without compression. I also required precise support for timestamps, user-defined extensions, and distinct binary and string types. None of these encodings supported these features natively.

The closest match was msgpack, which I had standardized on. I engaged its community and author to add timestamp support and distinct binary and string types. However, after a few months of work, progress halted and could not be restarted (see https://github.com/msgpack/msgpack/issues/128).

In any case, I believe Binc has significant features beyond those provided by msgpack, and stands tall on its own.

We implemented a Binc encoder/decoder using the same high-performance codec library used to build the de facto, best-performing msgpack encoder/decoder for the Go language, and ran extensive benchmarks against other encoders. The results are reproduced below, and show savings of roughly 40% in data size and 60% in encoding and decoding time relative to JSON.

..............................................
Benchmark: 
    Struct recursive Depth:             1
    ApproxDeepSize Of benchmark Struct: 4758
Benchmark One-Pass Run:
       msgpack: len: 1504
          binc: len: 1508
           gob: len: 1908
          json: len: 2402
     v-msgpack: len: 1536
          bson: len: 3009
..............................................
Benchmark__Msgpack__Encode     50000         60824 ns/op
Benchmark__Msgpack__Decode     10000        115119 ns/op
Benchmark__Binc_____Encode     50000         55140 ns/op
Benchmark__Binc_____Decode     10000        112132 ns/op
Benchmark__Gob______Encode     10000        143350 ns/op
Benchmark__Gob______Decode      5000        434248 ns/op
Benchmark__Json_____Encode     10000        157298 ns/op
Benchmark__Json_____Decode      5000        303729 ns/op
Benchmark__Bson_____Encode     10000        174250 ns/op
Benchmark__Bson_____Decode     10000        223602 ns/op
Benchmark__VMsgpack_Encode     20000         80438 ns/op
Benchmark__VMsgpack_Decode     10000        157330 ns/op
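
The headline ratios can be checked directly from the figures above. The short Go program below is only arithmetic on numbers copied from this benchmark run; percentOf is a helper name introduced here for illustration and is not part of any library.

```go
package main

import "fmt"

// percentOf returns a expressed as a percentage of b.
func percentOf(a, b float64) float64 {
	return a / b * 100
}

func main() {
	// Encoded sizes (len) and ns/op figures copied from the benchmark output above.
	fmt.Printf("size:   binc is %.1f%% of json\n", percentOf(1508, 2402))
	fmt.Printf("encode: binc is %.1f%% of json\n", percentOf(55140, 157298))
	fmt.Printf("decode: binc is %.1f%% of json\n", percentOf(112132, 303729))
	fmt.Printf("size:   binc is %.1f%% of msgpack\n", percentOf(1508, 1504))
}
```

For this run, that works out to roughly 63% of the JSON size and 35-37% of the JSON encode and decode time, with the encoded size essentially level with msgpack.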

We hope you find good use for the Binc format.

If you are looking for a high-performance library for it, please check out the Go Library codec at https://github.com/ugorji/go/tree/master/codec#readme . You can find API docs for it at http://godoc.org/github.com/ugorji/go/codec .

© Ugorji Nwoke