Wednesday, January 21, 2009

Using AppEngine -- Or Similar Datastore -- To Integrate Complex Legacy Data Formats

I gave a lightning talk last night to the SF Bay Area App Engine Developers group, showing some work I've been doing to represent gnarly legacy records in AppEngine so as to maintain source fidelity, minimize upfront analysis, and make the records easy to integrate with other systems.

I had started with an XML record that I wanted to parse and represent in the datastore -- without knowing which tags and structures would be present, since this format had, ahem, evolved to obscurity over time, as often happens with real-world legacy records.

Before I talk about my approach, here's why I thought this effort might be interesting to the group: a lot of data formats share XML's tree structure. Examples range from C structs and file blocks that include a header telling which types to cast the next n bytes to (and so on inside of those) ... to mainframe "structured data" records I've encountered, which consist of nested records, parsed recursively, with their meanings occasionally opaque, lost to history, or belonging to some partner company.

My approach -- which is simply to create a mapping of how to assemble and disassemble the records -- enables a record to be stored in a single App Engine entity. It is stored not as an opaque block (or blob), but with fine-grained, addressable fields that are easy to talk to using the GAE Datastore API.

In my case, since my original was XML, I created a mechanism similar to a tiny subset of XPath describing the sequence of tags where a data element lived -- but with the characters changed so that it would be Python and GAE-friendly. That is, instead of "/foo/bar[2]/baz" I used _Foo_Bar__2_Baz.
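As a rough sketch of that renaming (the function name `path_to_key` and the exact rules are my assumptions, inferred only from the single example above), a path can be converted step by step: each tag is capitalized and prefixed with an underscore, and a positional index becomes a double-underscore suffix.

```python
import re


def path_to_key(path):
    """Turn a simple path like /foo/bar[2]/baz into a Python-friendly key.

    Hypothetical helper: only plain tag steps with an optional numeric
    index are supported, mirroring the one example in the post.
    """
    key = ""
    for step in path.strip("/").split("/"):
        match = re.fullmatch(r"([^\[\]]+)(?:\[(\d+)\])?", step)
        if match is None:
            raise ValueError("unsupported path step: %r" % step)
        tag, index = match.groups()
        # "/foo" becomes "_Foo"
        key += "_" + tag[:1].upper() + tag[1:]
        # "[2]" becomes "__2"
        if index:
            key += "__" + index
    return key


example_key = path_to_key("/foo/bar[2]/baz")
```

Note that a scheme like this gets ambiguous if the original tag names contain underscores themselves, so a real mapping might need an escaping rule.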

This let me "flatten" the XML into a set of key-value pairs, while allowing that the XML might contain arbitrary structures injected by others ... and that I might want to inject my own extra structures. This arrangement is perfect for the Expando models in App Engine Datastore, or any similar store (e.g. Hypertable, which is modeled after Bigtable, or Microsoft SQL Data Services, which uses SQL Server 2008's sparse columns to similar effect).
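Here is a minimal sketch of the flattening step using only the Python standard library. The function name `flatten`, the sample XML, and the choice to add an index suffix only when sibling tags repeat are all assumptions for illustration; attributes and mixed content are ignored, and this is not the code from the talk.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def flatten(element, prefix="", index=None):
    """Return {flattened_key: text} for every element that carries text."""
    key = prefix + "_" + element.tag[:1].upper() + element.tag[1:]
    if index is not None:
        key += "__%d" % index
    pairs = {}
    text = (element.text or "").strip()
    if text:
        pairs[key] = text
    # Count sibling tags so repeated ones get a 1-based position suffix.
    totals = Counter(child.tag for child in element)
    seen = Counter()
    for child in element:
        seen[child.tag] += 1
        child_index = seen[child.tag] if totals[child.tag] > 1 else None
        pairs.update(flatten(child, key, child_index))
    return pairs


# Made-up sample record, not a real legacy format.
SAMPLE = "<foo><bar><baz>a</baz></bar><bar><baz>b</baz></bar><qux>c</qux></foo>"
flat = flatten(ET.fromstring(SAMPLE))
```

Because unknown tags simply turn into additional keys, nothing has to be known about the record layout up front. The resulting dictionary is the kind of sparse, open-ended property set that a schemaless model can hold.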

So now I can store and retrieve my records. Any fields/subrecords which I understand and care about, I can easily work with from other systems, by mapping to the appropriate "key" in the stored record.

For example, if I'm storing a bunch of catalog data, and another system just cares about enumerating each "Product" with "Name" and "Price," then I can create a facade or wrapper in GAE that maps, say, Price to _Strange_Old_Way_To_Represent_Current_Price, and we're all set.
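A facade along those lines could look like the sketch below. The class name `ProductFacade`, the stored key used for Name, and the sample record are hypothetical; only the Price key comes from the example above. A plain dictionary stands in for the stored record.

```python
class ProductFacade:
    """Expose friendly field names over a flattened legacy record."""

    # Friendly name -> stored key. The Name key is invented for this sketch.
    FIELD_MAP = {
        "Name": "_Catalog_Item_Desc",
        "Price": "_Strange_Old_Way_To_Represent_Current_Price",
    }

    def __init__(self, record):
        self._record = record

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self._record[self.FIELD_MAP[name]]
        except KeyError:
            raise AttributeError(name)


# Stand-in for a stored record; fields we don't understand stay untouched.
record = {
    "_Catalog_Item_Desc": "Widget",
    "_Strange_Old_Way_To_Represent_Current_Price": "19.99",
    "_Some_Opaque_Partner_Field": "x",
}
product = ProductFacade(record)
summary = (product.Name, product.Price)
```

The other system sees only Name and Price, while the opaque fields ride along in the stored record without anyone having to interpret them.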

To be sure, there could be performance issues if you tried to use this to create arbitrary queries and reports against the data. That's not really the purpose and, in my experience, when there are no "shortcuts" for processing these legacy records, the business folks aren't used to being able to build an OLAP cube out of them anyway. (They probably have a batch or offline extraction process.)

Nonetheless, it's another tool in our chest when we need to work with systems and data that have been out in enough real-world battles to come home scarred with lots of cruft.

