Monday, May 19, 2008

mod_ndb: Wicked REST for My[Giant]SQL Cluster

I caught a great presentation at CommunityOne a couple of weeks ago, and haven't had a chance to write about it until now.

In a nutshell, mod_ndb is an Apache module that allows a limited set of parametrizable MySQL queries to be automagically exposed as REST services via Apache. There are three things I left out of that sentence for clarity that make this uber-cool:

  1. The operations actually run against MySQL Cluster, a high-performance, shared-nothing (code for "you don't need to manage a SAN and shared filesystem") scale-out system.
  2. In exchange for not having full SQL capability, these services interact with MySQL Cluster using the cluster's native NDB API for maximum performance.
  3. The latest version of MySQL Cluster, 5.1, has significant technical advantages over the earlier but still impressive 4.1 and 5.0. These include the ability to store more data on disk instead of dedicated RAM, and this collection of improvements. (In that doc, where you see references to 'carrier grade edition' note also that, according to the CommunityOne talk, future versions of MySQL Cluster will be on a unified codebase from that carrier grade version.)
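
To make the "limited set of parametrizable queries" idea concrete, here is a minimal sketch of the general pattern in Python. It is not mod_ndb's actual configuration or API: the `/ndb/user` path, the `USERS` table, and the `handle_get` function are all made up for illustration, and an in-memory dict stands in for the cluster. The point is only that each URL maps to one fixed lookup with a known parameter, rather than to arbitrary SQL.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical in-memory stand-in for a cluster table keyed by primary key.
USERS = {
    1: {"id": 1, "name": "alice"},
    2: {"id": 2, "name": "bob"},
}

# Hypothetical whitelist: each exposed path maps to one fixed lookup and
# the single parameter it accepts. Anything else is rejected.
ENDPOINTS = {
    "/ndb/user": ("id", USERS),
}


def handle_get(url):
    """Return (status, body) for a GET against one of the fixed endpoints."""
    parts = urlsplit(url)
    endpoint = ENDPOINTS.get(parts.path)
    if endpoint is None:
        return 404, None  # path is not one of the exposed queries
    param, table = endpoint
    params = parse_qs(parts.query)
    if set(params) != {param}:
        return 400, None  # unknown, missing, or empty parameter
    try:
        key = int(params[param][0])
    except ValueError:
        return 400, None  # key must be an integer
    row = table.get(key)
    return (200, row) if row is not None else (404, None)
```

Calling `handle_get("/ndb/user?id=1")` returns the row for key 1 with a 200 status, while an unlisted path or an unexpected parameter is refused outright. Giving up ad hoc queries in exchange for a small set of fixed, fast lookups is the same trade-off described in point 2 above.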

I've liked MySQL Cluster since its debut, and I'm thrilled to see this evolution.

As for "let's download this right now and set it up," it is true that in the four years or so since that debut, the niche for scale-out clustering has narrowed:

On one hand, machines and storage have continued to get cheaper and faster, with the result that a data set and transaction load that might once have required a cluster of 2-4 dual-proc servers, and the associated hassle, can now be handled by a single server with a couple of fat quad-core Xeons. The single OS and single filesystem simplify things vastly, to the benefit of databases like SQL Server, which have not invested in a scale-out strategy.

On the other hand, applications with a need for super-massive data that can live with some latency and a lightly structured, don't-call-me-relational data model can pay as they go for that scale-out capability with Amazon SimpleDB, Microsoft SSDS, etc.

Still, there is a well-defined middle section for a product like MySQL Cluster (and its arch-nemesis, Oracle RAC):

  • big data sets and/or large-scale OLTP,
  • plus a highly structured relational data model (Microsoft has asserted that SSDS will move toward full relational capabilities in the longer-term roadmap, but inasmuch as the initial service is still in beta, I don't count that),
  • plus legal or business requirements to keep the data in-house (not in the cloud).

Or you can flip it around the other way: if your app doesn't need a cluster like this (or something similar), what are you really doing, anyway?
