By Ugorji Nwoke   Wed, 16 Nov 2011 10:06:00 -0800

Go App Engine datastore operations design

The Go App Engine datastore's Load/Save path uses goroutines and channels to iterate over entity properties, which adds overhead.

Background
With GAE 1.6.0, support for indexed properties, hooks, etc. was introduced, with a nice, elegant design: a PropertyLoadSaver interface that uses channels as iterators.

I noticed that after updating my code to use datastore.PropertyList, some of my application requests started taking about twice as long as before. While I was still using datastore.Map on the new SDK, my requests took roughly the same time as before.

On digging further, I found the following in the implementation:

appengine/datastore/load.go
  func loadEntity(dst interface{}, src *pb.EntityProto) ...
      c := make(chan Property, 32)
      errc := make(chan os.Error, 1)
      go protoToProperties(c, errc, src)
appengine/datastore/save.go
  func saveEntity(defaultAppID string, key *Key, src interface{}) ...
      c := make(chan Property, 32)
      donec := make(chan struct{})
      go func() { ... }()

That is, for each entity (analogous to a row in a table), the SDK creates and uses:

 1 goroutine and 2 channels.

The deprecated datastore.Map retrieval bypasses this channel/goroutine dance, which is why my response times did not change until I switched to datastore.PropertyList.

Concerns:

Can we do without the goroutines/channels, especially in the API? Keeping them out of the API would allow different implementations.

Alternative solution using iterators
An alternative, equally elegant solution would just use iterators:

  • type PropertyIterator interface:
    Next() (Property, os.Error) // to signal the end, os.EOF/datastore.Done is returned
  • type PropertyLoadSaver interface:
    PropertyLoad(p PropertyIterator) os.Error
    PropertySave() (PropertyIterator, os.Error)
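A minimal, runnable sketch of these proposed interfaces, written for current Go (so `error` stands in for `os.Error`). `Property`, the `Done` sentinel (playing the role of os.EOF/datastore.Done), the `sliceIter` helper and the `Person` type are all hypothetical stand-ins, not SDK types.

```go
package main

import (
	"errors"
	"fmt"
)

// Property is a simplified, hypothetical stand-in for datastore.Property.
type Property struct {
	Name  string
	Value interface{}
}

// Done is a hypothetical sentinel playing the role of os.EOF/datastore.Done.
var Done = errors.New("datastore: done")

// PropertyIterator yields properties one at a time; Next returns Done at the end.
type PropertyIterator interface {
	Next() (Property, error)
}

// PropertyLoadSaver is the proposed iterator-based interface.
type PropertyLoadSaver interface {
	PropertyLoad(p PropertyIterator) error
	PropertySave() (PropertyIterator, error)
}

// sliceIter is a minimal, hypothetical PropertyIterator over a slice.
type sliceIter struct {
	props []Property
	i     int
}

func (s *sliceIter) Next() (Property, error) {
	if s.i >= len(s.props) {
		return Property{}, Done
	}
	p := s.props[s.i]
	s.i++
	return p, nil
}

// Person is a hypothetical user type implementing PropertyLoadSaver.
type Person struct {
	Name string
	Age  int64
}

// PropertyLoad pulls properties until the iterator reports Done.
func (x *Person) PropertyLoad(it PropertyIterator) error {
	for {
		p, err := it.Next()
		if err == Done {
			return nil
		}
		if err != nil {
			return err
		}
		switch p.Name {
		case "Name":
			x.Name, _ = p.Value.(string)
		case "Age":
			x.Age, _ = p.Value.(int64)
		}
	}
}

// PropertySave hands back an iterator; how it is backed is an implementation detail.
func (x *Person) PropertySave() (PropertyIterator, error) {
	return &sliceIter{props: []Property{
		{Name: "Name", Value: x.Name},
		{Name: "Age", Value: x.Age},
	}}, nil
}

var _ PropertyLoadSaver = (*Person)(nil)

func main() {
	src := &Person{Name: "Ada", Age: 36}
	it, _ := src.PropertySave()
	var dst Person
	if err := dst.PropertyLoad(it); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dst.Name, dst.Age) // Ada 36
}
```

`Person` never sees a channel or a goroutine; whether the iterator is backed by a slice or a channel is left to whoever supplies it.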

Possible implementations of PropertyIterator:

  • type PropertyIteratorFunc func() (Property, os.Error)
    Next() (Property, os.Error) // simply calls the function itself
  • type PropertyList []Property:
    Iterator() PropertyIterator // actually returns a PropertyIteratorFunc
  • type ChanPropertyIterator chan Property: // implements PropertyIterator itself
    Next() (Property, os.Error) // receives from the channel; returns os.EOF/datastore.Done once it is closed
    // optional: for people who prefer goroutine/channel allocation over slice allocation
    // this would work like today's solution, but without being exposed in the API
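A self-contained sketch of those three implementations in current Go. As before, `Property` and `Done` are hypothetical stand-ins (for datastore.Property and os.EOF/datastore.Done), and `collect` is a hypothetical helper used only to drain an iterator for the demo.

```go
package main

import (
	"errors"
	"fmt"
)

// Property is a simplified, hypothetical stand-in for datastore.Property.
type Property struct {
	Name  string
	Value interface{}
}

// Done is a hypothetical end-of-iteration sentinel.
var Done = errors.New("datastore: done")

type PropertyIterator interface {
	Next() (Property, error)
}

// PropertyIteratorFunc adapts a plain function to PropertyIterator.
type PropertyIteratorFunc func() (Property, error)

// Next simply calls the function itself.
func (f PropertyIteratorFunc) Next() (Property, error) { return f() }

type PropertyList []Property

// Iterator walks the list with a closure: no goroutines or channels involved.
func (l PropertyList) Iterator() PropertyIterator {
	i := 0
	return PropertyIteratorFunc(func() (Property, error) {
		if i >= len(l) {
			return Property{}, Done
		}
		p := l[i]
		i++
		return p, nil
	})
}

// ChanPropertyIterator is the optional channel-backed implementation.
type ChanPropertyIterator chan Property

// Next receives from the channel and returns Done once it is closed.
func (c ChanPropertyIterator) Next() (Property, error) {
	p, ok := <-c
	if !ok {
		return Property{}, Done
	}
	return p, nil
}

// collect drains any PropertyIterator (hypothetical helper for the demo).
func collect(it PropertyIterator) ([]Property, error) {
	var out []Property
	for {
		p, err := it.Next()
		if err == Done {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, p)
	}
}

func main() {
	list := PropertyList{{Name: "A", Value: 1}, {Name: "B", Value: 2}}
	fromList, _ := collect(list.Iterator())

	// The same consumer also works with a goroutine feeding a channel.
	c := make(chan Property, 32)
	go func() {
		defer close(c)
		for _, p := range list {
			c <- p
		}
	}()
	fromChan, _ := collect(ChanPropertyIterator(c))

	fmt.Println(len(fromList), len(fromChan)) // 2 2
}
```

The consumer (`collect`) is identical in both cases, which is the point: the choice between slice and channel stays out of the interface.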

Since the Go runtime on App Engine is still experimental, a contained API change like this should be acceptable.

But RPC dominates the overhead per request. Why focus on goroutine/channel use?

Definitely, RPC time will dominate the overhead of one goroutine and two channels. However, we’re talking about potentially hundreds or thousands of goroutines per request (one per “row” returned by, or sent to, the API call). E.g. a GET that returns 100 entities creates 100 goroutines and 200 channels to service that single API call. And these goroutines and channels have nothing to do with concurrency: they’re only being used as iterators.

Also, in our own application code we still optimize (especially our exported APIs), even though we know RPC overhead will overshadow those gains.

Main Concern: Implementation bleeds into the API
My main concern is that the implementation bleeds into the API. With iterators, the SDK could still use channels and a goroutine internally, and change that later, without application developers having to know about it.

The alternative proposed above shows how this can be done using iterators. It’s trivial to implement in Go, and it keeps what the current design aims for without restricting the implementation:

  • Objects don’t need to exist longer than needed to populate the fields
  • Intermediate state is supported
  • No need to pass around []Property for a large entity

Moreover, the API is not tied to an implementation, so it can be implemented with goroutines/channels or with a list. User code that passes a PropertyLoadSaver can use whatever is most applicable or optimized for its use case. For example, in my own code I can pass a PropertyList into each call and not incur the goroutine/channel overhead at all.
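A sketch of how user code could get the benefits listed above under the proposed API. It repeats the same simplified, hypothetical `Property`, `Done`, `PropertyIterator` and `PropertyIteratorFunc` definitions so it stands alone, and adds a hypothetical `Readings` type: `PropertySave` generates one property at a time from a closure, so no `[]Property` is built for a large entity, and the closure's index is the intermediate state.

```go
package main

import (
	"errors"
	"fmt"
)

// Property is a simplified, hypothetical stand-in for datastore.Property.
type Property struct {
	Name  string
	Value interface{}
}

// Done is a hypothetical end-of-iteration sentinel.
var Done = errors.New("datastore: done")

type PropertyIterator interface {
	Next() (Property, error)
}

type PropertyIteratorFunc func() (Property, error)

func (f PropertyIteratorFunc) Next() (Property, error) { return f() }

// Readings is a hypothetical user type that may hold many values.
type Readings struct {
	Values []float64
}

// PropertySave streams one property per reading instead of building a
// []Property up front; the index i is intermediate state kept in the closure.
func (r *Readings) PropertySave() (PropertyIterator, error) {
	i := 0
	return PropertyIteratorFunc(func() (Property, error) {
		if i >= len(r.Values) {
			return Property{}, Done
		}
		p := Property{Name: "Reading", Value: r.Values[i]}
		i++
		return p, nil
	}), nil
}

// PropertyLoad appends each reading as it arrives; each Property lives only
// long enough to populate the field.
func (r *Readings) PropertyLoad(it PropertyIterator) error {
	r.Values = r.Values[:0]
	for {
		p, err := it.Next()
		if err == Done {
			return nil
		}
		if err != nil {
			return err
		}
		if v, ok := p.Value.(float64); ok && p.Name == "Reading" {
			r.Values = append(r.Values, v)
		}
	}
}

func main() {
	src := &Readings{Values: []float64{1.5, 2.5, 4}}
	it, _ := src.PropertySave()
	var dst Readings
	if err := dst.PropertyLoad(it); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dst.Values) // [1.5 2.5 4]
}
```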

Have others solved similar problems using goroutines/channels? Where?
Using goroutines/channels as iterators does not seem to be done in other, similar places:

  • See datastore.Query whose iteration doesn’t expose goroutines/channels
  • See exp/sql/driver whose iteration doesn’t expose goroutines/channels (just a Next([]interface{}) method)

What is the performance overhead (load on CPU, RAM) with this? Does it scale?

When I first noticed this, I ran some rudimentary tests to find the maximum number of goroutines I could create on my machine and how many resources they took.

In summary: on a 2.0 GHz core, I could start at most 5e5 (500,000) goroutines that did essentially nothing (beyond that, I got errors). RAM usage was 2.0 GB.
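The original test program is not reproduced here. As a rough, hypothetical reconstruction of that kind of experiment (not the author's code), the sketch below parks `n` idle goroutines and reports how many are alive and how much extra memory the runtime obtained from the OS (`runtime.MemStats.Sys`). Results depend heavily on Go version, OS and hardware, so none are claimed.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spawnIdle starts n goroutines that block until released. It returns the
// live goroutine count while they are parked, the growth in memory obtained
// from the OS (bytes), and a func that releases them and waits for exit.
func spawnIdle(n int) (live int, sysDelta uint64, release func()) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	stop := make(chan struct{})
	var started, finished sync.WaitGroup
	started.Add(n)
	finished.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer finished.Done()
			started.Done()
			<-stop // park: the goroutine does nothing until released
		}()
	}
	started.Wait()

	runtime.ReadMemStats(&after)
	live = runtime.NumGoroutine()
	if after.Sys > before.Sys {
		sysDelta = after.Sys - before.Sys
	}
	release = func() {
		close(stop)
		finished.Wait()
	}
	return live, sysDelta, release
}

func main() {
	const n = 10000 // the original experiment went as far as 500,000
	live, delta, release := spawnIdle(n)
	fmt.Printf("goroutines alive: %d, extra memory from the OS: ~%d KiB\n", live, delta/1024)
	release()
}
```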

An App Engine instance is a 600 MHz single core with a 128 MB memory limit. That’s less than a third of the CPU and 1/16 of the memory. (Even my Nexus One has far more resources than that.)

In summary, a 2.0 GHz machine with 2 GB of RAM produced at most 500,000 goroutines. I wonder how many a 600 MHz, 128 MB App Engine instance would accommodate.
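A naive back-of-the-envelope answer, assuming memory is the only limit and that usage scales linearly from the measurement above (neither is guaranteed, and an instance cannot hand all 128 MB to idle goroutines):

```latex
\frac{2.0\,\text{GB}}{500{,}000\ \text{goroutines}} \approx 4\,\text{KB per goroutine}
\qquad\Rightarrow\qquad
\frac{128\,\text{MB}}{4\,\text{KB}} \approx 32{,}000\ \text{goroutines (upper bound)}
```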

I suspect a few thousand goroutines on such a tiny “computer” (600 MHz, 128 MB) would tax the system. However, it’s really easy to get into that situation with the current design. If most of the time is spent on RPC (I/O) and CPU load is low, Go can easily serve a large number of concurrent requests. 50 concurrent requests, each retrieving 200 entities, can mean up to 10,000 goroutines (plus 20,000 channels) at the same time, just to serve API requests, imposed by the SDK runtime (i.e. not application code that we can control or tune). In this scenario, the runtime imposes an overhead that does not seem necessary.
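The counting behind that scenario, as a tiny hypothetical helper that assumes exactly one goroutine and two channels per entity, as in the load.go excerpt:

```go
package main

import "fmt"

// sdkOverhead returns how many goroutines and channels the channel-based
// load path creates, assuming one goroutine and two channels per entity.
func sdkOverhead(requests, entitiesPerRequest int) (goroutines, channels int) {
	goroutines = requests * entitiesPerRequest
	channels = 2 * goroutines
	return goroutines, channels
}

func main() {
	g, c := sdkOverhead(50, 200)
	fmt.Println(g, c) // 10000 20000

	g, c = sdkOverhead(1, 100) // the single GET of 100 entities
	fmt.Println(g, c)          // 100 200
}
```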

If we expect most people to pass a PropertyList to calls to GetXXX or PutXXX, then the goroutine/channel machinery is completely redundant.

Also, remember that each goroutine allocates an initial stack of 4 KB, so every goroutine has a memory-allocation cost that becomes non-trivial under load.
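Applying that 4 KB figure (stacks only; channels and other runtime bookkeeping are ignored), first as a check against the earlier measurement, then to the 50-request scenario:

```latex
500{,}000 \times 4\,\text{KB} = 2{,}000{,}000\,\text{KB} = 2.0\,\text{GB}
\quad\text{(matches the measured RAM usage)}

10{,}000 \times 4\,\text{KB} = 40{,}000\,\text{KB} = 40\,\text{MB} \approx 31\%\ \text{of a 128 MB instance}
```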

The rudimentary Go code used to run this test is available:

  • Shared online: a Go file you can download, compile, and run on your computer.
  • On the Go Playground: note that you need to run it on your local computer.


© Ugorji Nwoke