Monday, January 26, 2009

Is that Service Really a Scalable Cloud or Just Full-Service Web Hosting?

A lot of cloud stacks, or cloud app platforms, promise scalability for your app, "With a little EC2 in every box!" (TM). There is a big catch and a little catch, though, and if your app gets big, either or both of these may be a deal-breaker.

First, and most important: Running a vanilla RDBMS (e.g. MySQL) in a VM somewhere does not make it magically scalable. Read that sentence one more time.

Some cloud offerings integrate tightly with the traditional sort of DB instance you might attach to your web app on a single server. Examples include Heroku, which applies your Rails migrations to a PostgreSQL instance, and Stax, which offers MySQL.

The great thing about these environments is that they don't require significant changes to your standard app built on their supported platforms (mostly Rails and Java variants). Upload, minimal admin, and IJW (it just works).

That's turn-key, full-service web hosting, right there. It's beautiful -- in fact, in an OO and Rails course I wrote, I chose Heroku for deployment as a way to let students get something up and running on the web without getting into the operations/deployment/tuning aspects of Rails, which deserve their own course.

But if your app gets large -- or just uses large datasets -- the database is rapidly going to be a bottleneck. Scaling out an app logic tier to a dozen EC2 instances automatically may sound good, but it won't do a thing for a DB-bound app (it may make it worse). And these databases don't scale out without a little architecture, planning, configuration -- all of the things which these cloud platforms are designed to avoid. And which, on some platforms, you cannot do at all.
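To make the bottleneck point concrete, here is a minimal back-of-envelope sketch. It assumes a simple two-tier model in which every request hits the database once; the function name and all of the numbers are made up for illustration and are not measurements from any of the platforms mentioned.

```python
def max_throughput(app_instances, rps_per_instance, db_max_rps):
    """Requests/sec the whole stack can sustain.

    Toy model (an assumption, not a benchmark): the app tier scales
    linearly with instance count, while a single database tops out
    at db_max_rps. The slower tier sets the limit.
    """
    app_capacity = app_instances * rps_per_instance
    return min(app_capacity, db_max_rps)


if __name__ == "__main__":
    # Hypothetical figures: 100 req/s per app instance, 250 req/s for the DB.
    for n in (1, 2, 4, 12):
        print(n, "instances ->", max_throughput(n, 100, 250), "req/s")
```

With these invented numbers, throughput stops growing at 250 requests/sec once a third instance is running; the other nine of a dozen instances add cost but no capacity.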

For example, so far as I can tell, on Heroku or Stax there is no way to even configure multiple servers and replication, which is just a minimum starting point for scaling a DB to multiple machines. Stax may allow for a logical sharding setup, but it's not clear how one would control which VMs and disks the databases run on. Rightscale seems like the kind of firm which would specialize in the management scripts / meta-API that one would need to automate sharding, but the sharding option doesn't appear in any of the models on their website. With replication, which Rightscale does offer (though they're not exactly an app platform, more an infrastructure play), you arrive at a picture that is still limited.

Other cloud platforms offer datastores specifically designed to scale out, including Google App Engine, 10gen, and others. These platforms offer a non-relational or pseudo-relational datastore, with different data access APIs and a variety of restrictions relative to what you may be used to. These datastores are architected to scale easily, but there are real tradeoffs that must be considered. In fact, if you don't know these tradeoffs cold, you are not the right person to be making this platform decision. Get on Craigslist and hire (or borrow) someone who knows this stuff.

The other catch is that whichever approach you choose, these vendors are offering you convenience, some outsourced operations management, and (in some tiers) elasticity and scalability ... but they are not offering cheap compute cycles. That is, if you know you'll need a large, predictable amount of raw compute time, then know also that you're paying a premium to do that computation in one of these environments.

A friend who has designed, built and operated feature film renderfarms for a number of studios confirmed that he has, on a semi-regular basis, analyzed the costs of remote VM-based datacenters (e.g. EC2) compared to their physical ones. Because the studios use these machines intensely, and are consistently consuming raw compute power, the local physical servers have always made more sense.

What does this have to do with your web app and datastore? Well, suppose you have designed your app to leverage a scalable datastore. Such a datastore may not be tunable, may not be fast, and may require you to do certain operations in code which traditionally are done in the DB. You may never see these slow queries or operations ... until they show up in your bill. That is, if the system is truly elastic and scalable, it will apply resources as needed to handle your work. If your query or sort or filter takes a lot of CPU cycles, the cycles will be made (almost) instantly available, so the user always sees your app perform well. And then you'll pay for all those cycles or instances at the end of the month.
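A rough sketch of how an invisible slow operation turns into a visible line item. It assumes a platform that bills purely per CPU-hour consumed; the price, request volume, and per-request CPU times are hypothetical placeholders, not any vendor's actual rates.

```python
def monthly_cpu_cost(cpu_seconds_per_request, requests_per_month,
                     price_per_cpu_hour):
    """Monthly bill for CPU time under a pure pay-per-CPU-hour model.

    All inputs are illustrative assumptions; real platforms use their
    own metering and pricing rules.
    """
    cpu_hours = cpu_seconds_per_request * requests_per_month / 3600
    return cpu_hours * price_per_cpu_hour


if __name__ == "__main__":
    requests = 10_000_000   # hypothetical monthly traffic
    price = 0.10            # hypothetical dollars per CPU-hour
    # Same query, done cheaply vs. with a sort/filter done in app code.
    for label, cpu_s in (("efficient", 0.05), ("sort in code", 0.5)):
        cost = monthly_cpu_cost(cpu_s, requests, price)
        print(f"{label}: ${cost:,.2f} per month")
```

In this model the bill scales linearly with CPU time per request, so an operation that is ten times slower costs ten times as much, even though an elastic platform keeps the user from ever noticing the difference.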

Either way, there is no free lunch on the data persistence side. Which is not in itself a reason to avoid cloud environments. But it should be a bigger part of the conversation than it is today. And it absolutely must be part of the conversation, if larger businesses are going to move their services into the cloud.

