By Ugorji Nwoke   Tue, 27 Sep 2011 17:15:00 -0700   /blog   appengine geek golang technology

Datastore Enhancement for GO Language Runtime in Google App Engine

This attempts to make a case for datastore enhancements in the GO Language Runtime of App Engine.

Quoting from http://code.google.com/appengine/:

Google App Engine enables you to build and host web apps on the same
systems that power Google applications. App Engine offers fast
development and deployment; simple administration, with no need to
worry about hardware, patches or backups; and effortless
scalability.

And paraphrasing from http://golang.org/

Go is expressive, concise, clean, and efficient, with
sophisticated concurrency primitives and a novel type system which
enables flexible and modular program construction. It's a fast,
statically typed, compiled language that feels like a dynamically
typed, interpreted language, complete with garbage collection and
run-time reflection. GO is the bees-knees.

I spent the last year building an app on the Java Runtime, hoping the GO Language Runtime would become available. It became available in July, and I started working with it.

It is a much better fit for new application development IMHO. Among the many reasons are:

  1. I had to worry less about creating abstractions and solutions, and could just focus on the task at hand. The natural primitives and modern bundled API’s make me less reliant on 3rd party code or custom built solutions.
  2. It is conceivably more performant than the Java Runtime (since it uses goroutines, which could scale better than pure OS threads). I look at the way things like NodeJS scale, and think GO’s model could get me closer to that.
  3. The runtime is leaner and meaner (more CPU space, more RAM space, less runtime overhead ==> more requests handled per instance)
  4. It has all the modern features I really need: closures, first-class functions, extensive type system, conversions, clean syntax, and simple yet sophisticated builtins for concurrency and messaging.
  5. Programming is WAY more fun and productive (it’s not even close)

GO is truly an extremely delightful language, and I am very excited to hop on it.

There are some features I have built over the datastore on the Java Runtime, which also exist bundled on the Python runtime, and which I have come to depend on. I’ll enumerate them, and then try to explain and make a case for each. I would love for the App Engine GO Team to discuss these, and see if they could/would be implemented natively in the GO SDK.

Datastore Features

The features are enumerated below:

  1. “Optional” Integrated caching: L1 (request-scoped, in-process) and L2 (Memcache)
  2. Embedded Types (stored as . separated columns)
    Including support for storing maps of primitives to primitives (2 columns: fieldName and fieldName_)
  3. Alternate Datastore Column Names for fields
  4. Callbacks: preSave, postLoad. Also allow the app to reject a load/save request
  5. Functions for decisions: store/index this property?
  6. Polymorphic Queries

I’ll describe and make a case for each one below.

(1) “Optional” Integrated caching: L1 (request-scoped, in-process) and L2 (Memcache)

Caching is now much more important with the new billing structure. It would be nice if it was transparent, in such a way that the SDK “could” check caches (request-scoped in-process, and longer-lived memcache) before checking the datastore, for GET’s and also for PUT’s and DELETE’s. Folks could “configure” L1 and/or L2 caches for specific structs, and just depend on the SDK API’s to do it transparently.

Without the SDK providing it, almost everyone will create a custom solution, which would end up being wrapper functions around most of those SDK API’s.

Queries and transactional GETs would bypass the caches, and transactional PUTs/DELETEs would clear the affected cache entries.
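As a rough sketch of the read-through behaviour described above (not the SDK's API), the lookup order L1, then L2, then datastore could look like this. The `Store` interface, `MapStore` and `Cached` are hypothetical names, and in-memory maps stand in for memcache and the datastore.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is returned when a key is absent from a store.
var ErrNotFound = errors.New("not found")

// Store is a hypothetical key/value interface standing in for the
// datastore or memcache; it is not part of the App Engine SDK.
type Store interface {
	Get(key string) (string, error)
	Put(key, value string) error
	Delete(key string) error
}

// MapStore is an in-memory Store used here as a stand-in backend.
type MapStore struct {
	data map[string]string
	gets int // number of Get calls, to show which tier served a read
}

func NewMapStore() *MapStore { return &MapStore{data: map[string]string{}} }

func (m *MapStore) Get(key string) (string, error) {
	m.gets++
	v, ok := m.data[key]
	if !ok {
		return "", ErrNotFound
	}
	return v, nil
}

func (m *MapStore) Put(key, value string) error { m.data[key] = value; return nil }
func (m *MapStore) Delete(key string) error     { delete(m.data, key); return nil }

// Cached checks L1, then L2, then the backing store, filling the
// faster tiers on the way back.
type Cached struct {
	L1, L2, DB Store
}

func (c *Cached) Get(key string) (string, error) {
	if v, err := c.L1.Get(key); err == nil {
		return v, nil
	}
	if v, err := c.L2.Get(key); err == nil {
		c.L1.Put(key, v)
		return v, nil
	}
	v, err := c.DB.Get(key)
	if err != nil {
		return "", err
	}
	c.L2.Put(key, v)
	c.L1.Put(key, v)
	return v, nil
}

// Put writes through to the store and refreshes both caches.
func (c *Cached) Put(key, value string) error {
	if err := c.DB.Put(key, value); err != nil {
		return err
	}
	c.L2.Put(key, value)
	return c.L1.Put(key, value)
}

// Delete removes the key from the store and evicts it from both caches.
func (c *Cached) Delete(key string) error {
	if err := c.DB.Delete(key); err != nil {
		return err
	}
	c.L2.Delete(key)
	return c.L1.Delete(key)
}

func main() {
	db := NewMapStore()
	c := &Cached{L1: NewMapStore(), L2: NewMapStore(), DB: db}
	c.Put("user:1", "alice")
	v, _ := c.Get("user:1")
	fmt.Println(v, db.gets) // the read is served from L1, so db.gets stays 0
}
```

A real version would also need per-struct configuration and the transactional rules, which this sketch leaves out.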

(2) Embedded Types (stored as . separated columns)

Imagine a struct like:

type A struct { A1 int; A2 int; B1 B }
type B struct { Ball1 int; Ball2 bool }
var a A

Imagine you want to query on A.B1.Ball1. It would be nice to store the columns with dot-separated keys, i.e. an entity A could have columns: A1, A2, B1.Ball1, B1.Ball2.

The solution will also support a slice of B. Suppose we have:

type A struct { A1 int; A2 int; B1 []B }

The columns are then stored as slices ie the column types for A1, A2, B1.Ball1, B1.Ball2 would be int, int, []int, []bool respectively.

The embedded-types structure stored is only applicable where there’s a need to index and query on them. Where there’s no need to index or query, they could be stored easily as blobs (using gob encoding).
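To make the column layout concrete, here is a minimal sketch that flattens a struct into dot-separated column names using reflection. `Flatten` is a hypothetical helper, not an SDK function, and for brevity it collects slice columns as []interface{} rather than typed slices such as []int. It only handles exported fields.

```go
package main

import (
	"fmt"
	"reflect"
)

type B struct {
	Ball1 int
	Ball2 bool
}

type A struct {
	A1 int
	A2 int
	B1 []B
}

// Flatten walks a struct value and returns dot-separated column names
// mapped to values. A slice of structs becomes one slice per leaf
// field, matching the column layout described above.
func Flatten(v interface{}) map[string]interface{} {
	out := map[string]interface{}{}
	flatten("", reflect.ValueOf(v), out, false)
	return out
}

func flatten(prefix string, v reflect.Value, out map[string]interface{}, multi bool) {
	switch {
	case v.Kind() == reflect.Struct:
		for i := 0; i < v.NumField(); i++ {
			name := v.Type().Field(i).Name
			if prefix != "" {
				name = prefix + "." + name
			}
			flatten(name, v.Field(i), out, multi)
		}
	case v.Kind() == reflect.Slice && v.Type().Elem().Kind() == reflect.Struct:
		for i := 0; i < v.Len(); i++ {
			flatten(prefix, v.Index(i), out, true)
		}
	case multi:
		// Leaf inside a slice of structs: append to that column's slice.
		list, _ := out[prefix].([]interface{})
		out[prefix] = append(list, v.Interface())
	default:
		out[prefix] = v.Interface()
	}
}

func main() {
	a := A{A1: 1, A2: 5, B1: []B{{Ball1: 7, Ball2: true}, {Ball1: 9, Ball2: false}}}
	for name, val := range Flatten(a) {
		fmt.Println(name, val) // map iteration order is not fixed
	}
}
```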

In addition, it would be nice to store maps of primitives to primitives with support for querying. For example:

type A struct { A1 int; A2 int; B map[string]bool }

We should be able to search for where A1 is 1, A2 is 5, and the map has a mapping for ‘goog’.
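One way to picture the two-column idea (fieldName and fieldName_) is as a pair of parallel multi-valued columns. In the sketch below, the assignment of keys to fieldName and values to fieldName_ is an assumption, and `MapColumns` and `HasKey` are hypothetical helpers rather than SDK functions.

```go
package main

import (
	"fmt"
	"sort"
)

// MapColumns splits a map into two parallel multi-valued columns:
// name holds the keys and name_ holds the values at the same positions.
// Which column holds which is an assumption made for this sketch.
func MapColumns(name string, m map[string]bool) map[string]interface{} {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys) // fixed order so keys and values line up predictably
	vals := make([]bool, len(keys))
	for i, k := range keys {
		vals[i] = m[k]
	}
	return map[string]interface{}{name: keys, name + "_": vals}
}

// HasKey mimics an equality filter on the multi-valued key column.
func HasKey(cols map[string]interface{}, name, key string) bool {
	keys, _ := cols[name].([]string)
	for _, k := range keys {
		if k == key {
			return true
		}
	}
	return false
}

func main() {
	cols := MapColumns("B", map[string]bool{"goog": true, "aapl": false})
	fmt.Println(cols["B"], cols["B_"])     // [aapl goog] [false true]
	fmt.Println(HasKey(cols, "B", "goog")) // true
}
```

Searching for "the map has a mapping for 'goog'" then reduces to an equality filter on the key column.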

(3) Alternate Datastore Column Names for fields

With App Engine, the column names are replicated many many times during the storage of a single entity (during each index write, during entity write, during each composite index permutation write, etc). This can add significantly to the cost of storage. It was standard practice to define short alternate names for the datastore to use.

At the minimum, the SDK should respect this configuration during reads and writes. As a bonus, it can also respect it during queries (so that all code will just use the field names, not the configured alternate names).
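A sketch of how this could be configured with struct tags follows. The `ds` tag key and the helpers `ColumnName`, `ToColumns` and `QueryColumn` are made up for illustration; they are not part of the SDK.

```go
package main

import (
	"fmt"
	"reflect"
)

// Account uses a hypothetical `ds` struct tag to give each field a
// short column name. The tag key is made up for this example.
type Account struct {
	LastModifiedTime int64  `ds:"lm"`
	DisplayName      string `ds:"dn"`
	Balance          int    // no tag: falls back to the field name
}

// ColumnName returns the short name from the `ds` tag, or the field
// name itself when no tag is set.
func ColumnName(f reflect.StructField) string {
	if short := f.Tag.Get("ds"); short != "" {
		return short
	}
	return f.Name
}

// ToColumns maps a struct's exported fields to their column names.
func ToColumns(v interface{}) map[string]interface{} {
	rv := reflect.ValueOf(v)
	out := make(map[string]interface{}, rv.NumField())
	for i := 0; i < rv.NumField(); i++ {
		out[ColumnName(rv.Type().Field(i))] = rv.Field(i).Interface()
	}
	return out
}

// QueryColumn translates a field name used in application code into the
// stored column name, so queries can keep using field names.
func QueryColumn(v interface{}, field string) (string, bool) {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return "", false
	}
	return ColumnName(f), true
}

func main() {
	a := Account{LastModifiedTime: 42, DisplayName: "ugorji", Balance: 10}
	fmt.Println(ToColumns(a))
	col, _ := QueryColumn(a, "DisplayName")
	fmt.Println(col) // dn
}
```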

(4) Callbacks: preSave, postLoad. Also allow the app to reject a load/save request

The SDK should call methods on structs just after loading, and before saving. These methods should return an error which should signal to the SDK that it should not proceed completely with these entities. During a save, those entities should not be saved and should be returned in a MultiError. During a load, those should not be returned normally, but as part of a MultiError.

This allows several things:

  1. Modify entities after loading to compute extra properties which are not stored but evaluated at runtime (e.g. loadTime, isLoadedFromDatastore, etc)
  2. Modify entities before saving to reset some properties (e.g. lastModifiedTime, etc)
  3. Inform the SDK that a certain entity is dirty and should not be stored in the datastore
  4. Signal that a certain entity is dirty and should be treated carefully after loading
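The save side of this could be sketched as below. `PreSaver`, `PostLoader`, `MultiError` and `SaveAll` are hypothetical names defined here for illustration, and `put` stands in for the actual datastore write; a load wrapper would mirror `SaveAll` using `PostLoad`.

```go
package main

import (
	"errors"
	"fmt"
)

// PreSaver and PostLoader are hypothetical interfaces; the SDK would
// call them if an entity implements them.
type PreSaver interface {
	PreSave() error
}

type PostLoader interface {
	PostLoad() error
}

// MultiError holds one error slot per entity; nil means success.
type MultiError []error

func (m MultiError) Error() string {
	n := 0
	for _, e := range m {
		if e != nil {
			n++
		}
	}
	return fmt.Sprintf("%d of %d entities rejected", n, len(m))
}

// SaveAll runs PreSave on each entity, passes only the accepted ones to
// put, and reports the rejected ones in a MultiError.
func SaveAll(entities []interface{}, put func(interface{}) error) error {
	errs := make(MultiError, len(entities))
	failed := false
	for i, e := range entities {
		if p, ok := e.(PreSaver); ok {
			errs[i] = p.PreSave()
		}
		if errs[i] == nil {
			errs[i] = put(e)
		}
		if errs[i] != nil {
			failed = true
		}
	}
	if failed {
		return errs
	}
	return nil
}

type Note struct {
	Text         string
	LastModified int64
}

// PreSave stamps the entity and rejects empty notes.
func (n *Note) PreSave() error {
	if n.Text == "" {
		return errors.New("empty note")
	}
	n.LastModified = 1317168900 // fixed value, to keep the example deterministic
	return nil
}

func main() {
	var stored []interface{}
	put := func(e interface{}) error { stored = append(stored, e); return nil }
	err := SaveAll([]interface{}{&Note{Text: "hi"}, &Note{}}, put)
	fmt.Println(len(stored), err) // 1 1 of 2 entities rejected
}
```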

(5) Functions for decisions: store/index this property?

Storage is expensive.

Indexing makes it proportionally more expensive. The number of write operations for each entity store is 1 + 2 × (number of indexes). A single entity update with 4 properties could cost 1 write operation or 25 write operations (the latter if all 4 properties are indexed ascending and descending, counting both the delete and put index operations, and you have two composite indexes).

To alleviate this:

  1. Some entities should not be stored if they have errors.
  2. Some properties in them should not be indexed if we’re never going to query on those values.

For example, we may only need to query on a boolean property if it is true. Or only need to query on a numeric property if it is over a threshold (e.g. 100). In these scenarios, we only want to index the property if its value is true or >100 respectively. A simple true|false configuration will not suffice. We need to evaluate the decision at runtime just before the entity is stored.

The callbacks defined above could help here.
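A runtime index decision could be sketched as a per-entity hook. Everything here (`Property`, `Indexer`, `ShouldIndex`, `Order`) is a hypothetical shape chosen for illustration, using the boolean and threshold scenarios from the text.

```go
package main

import "fmt"

// Property is a hypothetical name/value pair with a per-save index flag.
type Property struct {
	Name    string
	Value   interface{}
	NoIndex bool
}

// Indexer is a hypothetical hook: it reports whether the named property
// should be indexed for this particular entity value.
type Indexer interface {
	ShouldIndex(property string) bool
}

type Order struct {
	Flagged bool
	Total   int
}

// ShouldIndex indexes Flagged only when true and Total only above 100,
// the two scenarios described above.
func (o Order) ShouldIndex(property string) bool {
	switch property {
	case "Flagged":
		return o.Flagged
	case "Total":
		return o.Total > 100
	}
	return true
}

// Properties builds the property list for an Order, consulting the hook
// at save time rather than a static true/false setting.
func Properties(o Order) []Property {
	props := []Property{{Name: "Flagged", Value: o.Flagged}, {Name: "Total", Value: o.Total}}
	var ix Indexer = o
	for i := range props {
		props[i].NoIndex = !ix.ShouldIndex(props[i].Name)
	}
	return props
}

func main() {
	for _, p := range Properties(Order{Flagged: false, Total: 250}) {
		fmt.Printf("%s indexed=%v\n", p.Name, !p.NoIndex)
	}
	// Flagged indexed=false
	// Total indexed=true
}
```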

(6) Polymorphic Queries

Polymorphic storage allows you to store different structs in the same datastore kind, and use a discriminator column to find and load the appropriate struct. This simulates efficient JOINs (since it’s really just different entities stored in the same table). A solution like this is also natural for a datastore like BigTable (since entities need not be uniform).

With the general model of the SDK (where a struct is passed to the GET request), it is a bit less natural to implement. A possible solution may involve passing in a nil interface{}, so that the API would look for a discriminator column and determine the type of struct to return. (The JSON package does something like this in its Unmarshal function, where passing in a pointer to a nil interface{} lets it create and return a value of a different type.)
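A discriminator-based load could be sketched with a small type registry. In this example JSON stands in for the stored entity, and `Dog`, `Cat`, `registry` and `Load` are hypothetical names; nothing here is the SDK's API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entities of different struct types share one kind; the "Type"
// column is the discriminator. All names here are hypothetical.
type Dog struct {
	Type string
	Name string
}

type Cat struct {
	Type  string
	Lives int
}

// registry maps a discriminator value to a constructor for its struct.
var registry = map[string]func() interface{}{
	"dog": func() interface{} { return new(Dog) },
	"cat": func() interface{} { return new(Cat) },
}

// Load reads the discriminator first, then decodes the stored row into
// the matching struct. JSON stands in for the stored entity.
func Load(row []byte) (interface{}, error) {
	var head struct{ Type string }
	if err := json.Unmarshal(row, &head); err != nil {
		return nil, err
	}
	newFn, ok := registry[head.Type]
	if !ok {
		return nil, fmt.Errorf("unknown discriminator %q", head.Type)
	}
	v := newFn()
	if err := json.Unmarshal(row, v); err != nil {
		return nil, err
	}
	return v, nil
}

func main() {
	rows := []string{`{"Type":"dog","Name":"Rex"}`, `{"Type":"cat","Lives":9}`}
	for _, r := range rows {
		v, err := Load([]byte(r))
		fmt.Printf("%T %+v %v\n", v, v, err)
	}
}
```

The caller gets back an interface{} and uses a type switch or assertion to recover the concrete struct.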

UPDATES

The App Engine team informed me (via the groups discussion at https://groups.google.com/d/msg/google-appengine-go/b8gkgnN0L1Q/q95QC4mD6zcJ) that they already had two of them on the short-term roadmap:

  1. Alternate Datastore Column Names for fields
  2. Functions for decisions: store/index this property?

These are two of the more important ones. The others can be worked around using application-defined convention and wrapper methods that expect the convention. I’ve built support for all the other 4 features for my application in about 400 lines of GO code, which doesn’t duplicate but builds upon and depends on support provided by the SDK (including the 2 features on the roadmap). This shows that the team made the right decision in picking these 2 to support from the jump.

“If you’re having code problems, I feel bad for you son. I got 99 problems but JAVA ain’t one … GO!!!”

© Ugorji Nwoke