Wednesday, May 24, 2006

It's Easier Getting Them Out (OO Part 2)

When my wife and I were preparing for the birth of our son, we attended some prenatal classes. One of the instructor's favorite (overused) jokes was that, as far as babies go, "it's easier putting them in than getting them out."

When it comes to objects and relational data stores, the opposite is true. Most developers can -- and have already needed to -- learn plenty of query description mechanisms (SQL, OPath and variants, JDOQL, LINQ, and a hundred product-specific ones). Learning how to get data out of a store and put it into an object, XML schema, structure, or whatever, is just not that big a deal.
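To make the "getting them out" direction concrete, here is a minimal sketch in Python using the standard sqlite3 module. The `customer` table, the `Customer` class, and the `load_customers` helper are all made up for illustration; the only point is how little code it takes to turn query results into objects.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical domain object; field order matches the SELECT below.
    id: int
    name: str


def load_customers(conn, min_id):
    """Run a plain SQL query and wrap each row in a Customer."""
    rows = conn.execute(
        "SELECT id, name FROM customer WHERE id >= ? ORDER BY id",
        (min_id,),
    )
    return [Customer(*row) for row in rows]


# An in-memory database with a hand-written schema, standing in for an
# existing store that we only need to read from.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)

customers = load_customers(conn, 1)
```

Note that the schema here was written by hand, before any object existed. That is exactly why this direction is the easy one.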

The hard part is putting 'em in. Actually, there are two hard parts.

The lesser hard part is taking data and moving it into a well-defined persistent store. This needs to be done while avoiding egregious transaction, concurrency, and performance problems on the DB side and also not requiring a lot of tuning and locking hints from the developer.

The greater hard part is taking a model built on OO semantics (using a language, UML, etc.) and automagically figuring out how to create a schema and persist the model into a relational store in a way that (1) does not create all sorts of concurrency and performance problems; (2) is transparent enough that DBA tools can be used for administrative, maintenance, and infrastructural reporting needs; and (3) is amenable to refactoring of the OO model, since in this case the OO model is prior to the persistence, and we recognize the benefits of "refactoring mercilessly" on the model side.
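For contrast, here is a toy sketch of the object-model-first direction, again in Python with sqlite3. Everything in it is an assumption made for illustration: the `Invoice` class, the `SQL_TYPES` mapping, the `create_table_sql` and `save` helpers, and the convention that a field named `id` becomes the primary key. It is not how any of the products mentioned in this post work.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

# Hypothetical mapping from Python field types to SQL column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


@dataclass
class Invoice:
    # The object model comes first; no schema has been written by hand.
    id: int
    customer: str
    total: float


def create_table_sql(cls):
    """Derive a CREATE TABLE statement from a dataclass definition."""
    columns = []
    for f in fields(cls):
        column = f"{f.name} {SQL_TYPES[f.type]}"
        if f.name == "id":  # assumed convention, not a general rule
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {cls.__name__.lower()} ({', '.join(columns)})"


def save(conn, obj):
    """Insert one object into the table derived from its class."""
    names = [f.name for f in fields(obj)]
    placeholders = ", ".join("?" for _ in names)
    table = type(obj).__name__.lower()
    conn.execute(
        f"INSERT INTO {table} ({', '.join(names)}) VALUES ({placeholders})",
        astuple(obj),
    )


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(Invoice))
save(conn, Invoice(1, "Acme", 9.5))
```

Even this toy shows where point (3) bites: rename `total` to `amount` on the class and the generated DDL changes, but a table that already exists does not, and nothing here knows how to migrate it. Inheritance, references between objects, and collections are not handled at all.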

I am glad to see continued work on this problem. But I want to see more discussion right up front on exactly which models have priority in each of these schemes, so that developers can more efficiently choose the right approach for a given project. For example, Rails gives priority to a database schema; EJB 3, Hibernate and the WilsonORMapper take a middle position requiring modest work on both sides of the divide; gives priority to the object model. All good products for the right problem, none perfect.

Microsoft's white papers on ADO vNext suggest avoiding the issue by abstracting up a level: defining models in terms of a more abstract entity definition system. In exchange for defining domain models using Yet Another System, we would be given tools to seamlessly match these entities to objects, on the one side, and storage mechanisms like RDBMSs on the other side. Both the OO languages and the DB become second-class citizens to the entity model. Since I'm writing applications in these OO languages, I'm not sure I'm ready for that, but I'll wait and see.

Meantime, if we're not clearer on what we're doing -- putting an object model into a database versus the entirely different affair of pulling database records into objects -- we're going to continue to see grand new schemes that never really take hold.
