Wednesday, January 21, 2009

Using AppEngine -- Or Similar Datastore -- To Integrate Complex Legacy Data Formats

I gave a lightning talk last night to the SF Bay Area App Engine Developers group, showing some work I've been doing to represent gnarly legacy records in App Engine so as to maintain source fidelity, minimize upfront analysis, and make them easy to integrate with other systems.

I had started with an XML record that I wanted to parse and represent in the datastore -- without knowing which tags and structures would be present, since this format had, ahem, evolved to obscurity over time, as often happens with real-world legacy records.

Before I talk about my approach, here's why I thought this effort might be interesting to the group: a lot of data structures have a tree structure in common with XML. Examples range from C structs and file blocks that include a header telling which types to cast the next n bytes to (and so on inside of those) ... to mainframe "structured data" records I've encountered, which consist of nested records, parsed recursively, with their meanings occasionally opaque, lost to history, or belonging to some partner company.

My approach -- which is simply to create a mapping of how to assemble and disassemble the records -- enables a record to be stored in a single App Engine record. But not as a block (or blob) -- rather with fine-grained addressable fields that are easy to talk to using the GAE Datastore API.

In my case, since my original was XML, I created a mechanism similar to a tiny subset of XPath describing the sequence of tags where a data element lived -- but with the characters changed so that it would be Python and GAE-friendly. That is, instead of "/foo/bar[2]/baz" I used _Foo_Bar__2_Baz.
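Here is a minimal sketch of that renaming, not the exact code from the talk. The function name path_to_key is made up for illustration, and it assumes the only path features in play are plain tag names and an optional numeric index in square brackets:

```python
import re

# One path step: a tag name, optionally followed by a [n] index.
_STEP = re.compile(r"(\w+)(?:\[(\d+)\])?")


def path_to_key(path):
    """Turn a simple path like '/foo/bar[2]/baz' into '_Foo_Bar__2_Baz'."""
    parts = []
    for step in path.strip("/").split("/"):
        match = _STEP.fullmatch(step)
        if match is None:
            raise ValueError("unsupported path step: %r" % step)
        tag, index = match.groups()
        # '/' becomes '_' and the first letter of the tag is capitalized.
        part = "_" + tag[0].upper() + tag[1:]
        # '[n]' becomes '__n', so no brackets end up in the key.
        if index is not None:
            part += "__" + index
        parts.append(part)
    return "".join(parts)


print(path_to_key("/foo/bar[2]/baz"))
```

Anything fancier (attributes, namespaces, predicates other than a position) is rejected rather than guessed at, which seems like the safer default for records nobody fully understands.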

This let me "flatten" the XML into a set of key-value pairs, while allowing that the XML might contain arbitrary structures injected by others ... and that I might want to inject my own extra structures. This arrangement is perfect for the Expando models in App Engine Datastore, or any similar store (e.g. Hypertable, which is modeled after Bigtable, or Microsoft SQL Data Services, which uses SQL Server 2008's sparse columns to similar effect).
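The flattening step might look roughly like the sketch below. It uses the standard library's ElementTree and returns a plain dict, so it stays independent of any particular datastore API; writing the pairs onto an Expando entity is left out. The helper names are hypothetical, and the numbering rule (the first sibling with a given tag gets no suffix, later ones get __2, __3, ...) is an assumption on my part. Attributes and namespaces are ignored here.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def _key_part(tag, position):
    # Same convention as _Foo_Bar__2_Baz: capitalized tag, '__n' for repeats.
    part = "_" + tag[:1].upper() + tag[1:]
    if position > 1:
        part += "__%d" % position
    return part


def _walk(element, key, pairs):
    # Record this element's own text, if it has any.
    text = (element.text or "").strip()
    if text:
        pairs[key] = text
    # Count same-named siblings so each child gets a distinct key.
    seen = Counter()
    for child in element:
        seen[child.tag] += 1
        _walk(child, key + _key_part(child.tag, seen[child.tag]), pairs)


def flatten(xml_text):
    """Return a dict mapping flattened keys to the text found at each path."""
    root = ET.fromstring(xml_text)
    pairs = {}
    _walk(root, _key_part(root.tag, 1), pairs)
    return pairs


sample = "<foo><bar><baz>a</baz></bar><bar><baz>b</baz><qux>c</qux></bar></foo>"
for key, value in sorted(flatten(sample).items()):
    print(key, "=", value)
```

Nothing in the walk depends on knowing the tags ahead of time, so structures injected by someone else simply show up as extra keys.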

So now I can store and retrieve my records. Any fields/subrecords which I understand and care about, I can easily work with from other systems, by mapping to the appropriate "key" in the stored record.

For example, if I'm storing a bunch of catalog data, and another system just cares about enumerating each "Product" with "Name" and "Price," then I can create a facade or wrapper in GAE that maps, say, Price to _Strange_Old_Way_To_Represent_Current_Price, and we're all set.
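A facade like that can be very small. In the sketch below, the class name ProductFacade, the FIELD_MAP table, and the legacy key used for the name are all invented for illustration; only the price key comes from the example above. The stored record is stood in for by a plain dict of flattened keys.

```python
class ProductFacade(object):
    """Expose friendly attribute names over a flattened legacy record."""

    # Friendly name -> flattened legacy key. The 'name' key is hypothetical.
    FIELD_MAP = {
        "name": "_Old_Product_Name_Field",
        "price": "_Strange_Old_Way_To_Represent_Current_Price",
    }

    def __init__(self, record):
        self._record = record

    def __getattr__(self, attr):
        # Only called when normal lookup fails, i.e. for the mapped names.
        try:
            key = self.FIELD_MAP[attr]
        except KeyError:
            raise AttributeError(attr)
        return self._record.get(key)


record = {
    "_Old_Product_Name_Field": "Widget",
    "_Strange_Old_Way_To_Represent_Current_Price": "9.99",
    "_Some_Field_Nobody_Understands": "X7",
}
product = ProductFacade(record)
print(product.name, product.price)
```

The other system only ever sees name and price; fields nobody understands stay in the record untouched.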

To be sure, there could be performance issues if you tried to use this to create arbitrary queries and reports against the data. That's not really the purpose and, in my experience, if there are no "shortcuts" to processing these legacy records, then the business folks are not used to being able to make an OLAP cube out of them either. (They probably have a batch or offline extraction process.)

Nonetheless, it's another tool in our chest when we need to work with systems and data that have been out in enough real-world battles to come home scarred with lots of cruft.
