Friday, December 07, 2007

MozyHome Backup Fails to Backup Designated Files

I have been testing Mozy's MozyHome remote backup product and have found that it sometimes ignores new or changed files that are in the backup set and marked to be backed up. Sometimes it "discovers" these files (or changes) days or weeks later and backs them up; other times, if I modify a file "nearby" (relative to the way my backup set is constructed), it suddenly discovers all the other changed files and backs them up. Still other times, no amount of poking or prodding will make it back up these files.

This is a serious problem. After all, the raison d'être of the product is backup. Imagine that a home user or a business installs this solution, marks files to be backed up, observes that the backup process is running successfully and then -- after a data disaster -- learns that, well, some of the files are backed up and some simply are not.

If Mozy were a fly-by-night outfit, one might say that better due diligence is required in choosing a backup provider. But with its recent acquisition by storage giant EMC and its global contract with GE, Mozy appears to be a solid company.

The software, though not perfect, basically works. The backup and restore are straightforward. You can encrypt locally with your own key so that no one but you can ever decrypt your data in the event of a breach (although interestingly, the file names are not encrypted, so don't count on hiding the existence of my_illegal_off_balance_sheet_transactions.xls, or gifts_to_my_mistresses.doc).

But we're talking about the number one, sine qua non, only -- really -- important use case for backup. It has to back files up. Or at least, if it doesn't, it needs to tell you what failed, when, and why.

When I first discovered this problem, I realized that publicizing it could hurt Mozy's business. So instead of blogging, I contacted them directly to learn more. Unfortunately, after a few back-and-forths (I ran their diagnostic tools, sent them the reports, and explained that I was not going to give their support personnel full remote access to my box without more information), they have gone radio silent.

As a software engineer, I am fully aware that this problem could be a strange corner case. Perhaps it is so narrow that it never affects anyone besides me. But I doubt it. And, in any case, this problem is severe enough that it warrants a little investigation to determine its breadth.

Why do I doubt it is an unusual corner-case failure? Simply because my configuration of the service is so "typical." I'm running XP SP2 with NTFS on a well-maintained, modern machine, on a secure home network behind NAT, with no strange services or applications of any kind running (e.g., the kind that might hook and hack kernel file system operations, or leverage alternate data streams). The affected files are not under unusually named file paths, nor do they have any funky attributes set on them.

The only non-default things I am doing are using my own encryption key and adding a few files outside the "My Documents" tree to my backup set. My conclusion is that whatever glitch is causing the software to miss files on my file system is likely also missing files on other people's file systems. And they have no idea.

Perhaps this problem doesn't affect other users, but without trying to verify the bug, how can Mozy know? In my last email to them, I specifically asked whether their QA team had even attempted to replicate the bug. Had they tried and failed to reproduce it? Fair enough; maybe I could offer some help. But if they haven't tried, it makes you wonder: what bug report could possibly be a higher priority? Maybe if their app runs off the rails and reformats your drive, that's a higher priority. But barring active destruction of your data, or a major security bug that could compromise you to a third party, I can't think of anything.

The ultimate problem here is not even an engineering problem. Yes, there's a bug in the software, but there are bugs in almost all software. Rather, it's a process problem. How does the QA process work? How does customer support work? What steps do you take if someone reports the Really Big Bug? Is the right thing to assume they're a crackpot? Can you afford to do that and not even look into it? (Hint: No.)

</end of regular post> <free:bonus>

Since I'm really not out to get these guys, and would ideally like to help, I'm gonna offer a free first step: it's really easy to tell after the fact whether this bug is manifesting, because the set of files actually backed up simply doesn't match the local description of the backup sets. So an easy diagnostic is to write each of these two sets out as a file list and diff them. If there are deltas, you've got a problem.
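The diff itself is trivial. Here is a minimal Python sketch, assuming you have already exported two plain-text listings with one path per line: one of the files selected in the local backup sets, and one of the files the service reports as backed up (for example, copied from the restore view). The function names and listing format are hypothetical; nothing here uses a Mozy API. Paths are compared case-insensitively on the assumption that both listings come from NTFS, where case normally doesn't matter.

```python
def parse_listing(text: str) -> dict[str, str]:
    """Map each path in a one-path-per-line listing to its original spelling.

    Keys are case-folded because NTFS paths are normally case-insensitive
    (an assumption about how the two listings are produced).
    """
    paths = (line.strip() for line in text.splitlines())
    return {p.casefold(): p for p in paths if p}


def diff_listings(local_text: str, backed_up_text: str) -> tuple[list[str], list[str]]:
    """Return (missing, extra) between the local backup set and the backed-up files."""
    local = parse_listing(local_text)
    remote = parse_listing(backed_up_text)
    # Selected for backup locally, but never uploaded.
    missing = sorted(local[k] for k in local.keys() - remote.keys())
    # Uploaded at some point, but no longer in the local backup set.
    extra = sorted(remote[k] for k in remote.keys() - local.keys())
    return missing, extra


if __name__ == "__main__":
    # Hypothetical listings; in practice you would read them from files.
    local = "C:/Docs/a.doc\nC:/Docs/b.xls\n"
    backed_up = "c:/docs/A.doc\n"
    missing, extra = diff_listings(local, backed_up)
    print("missing:", missing)  # missing: ['C:/Docs/b.xls']
    print("extra:", extra)      # extra: []
```

Note that a file whose newer version was silently skipped would still appear in both listings, so a name-only diff like this catches files that were never uploaded at all; catching skipped changes would also need per-file modification times.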

So... you push out an update to the client that creates a list of the files in the active backup sets and sends it over to QA as the last step of the online backup process. Then QA just has to generate a matching file list from the Mozy metadata store (matched on the account id [email address] and either the date/time or the id number of the backup) and compare the two.
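On the QA side, the comparison might look something like the Python sketch below. The `Manifest` record, its fields, and the in-memory dictionary standing in for the metadata store are all hypothetical; the only idea taken from the proposal above is that the client's list and the server's record are matched on the account id plus a backup id (or date/time). Recording a last-modified time per file lets the same check flag changed files whose new versions never made it to the server, not just files that are absent outright.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    """Hypothetical record of the files covered by one backup run."""
    account_id: str  # e.g. the account's email address
    backup_id: str   # backup id number (or a date/time string)
    # path -> last-modified time, in seconds since the epoch
    files: dict[str, float] = field(default_factory=dict)


@dataclass
class Discrepancy:
    missing: list[str]  # in the client's backup set, absent on the server
    stale: list[str]    # on the server, but older than the client's copy


def compare(client: Manifest,
            store: dict[tuple[str, str], Manifest]) -> Discrepancy:
    """Match the client manifest to the server record by (account, backup)
    and report files that were skipped or whose changes were never uploaded."""
    server = store.get((client.account_id, client.backup_id))
    if server is None:
        # No server record for this run: every file counts as missing.
        return Discrepancy(missing=sorted(client.files), stale=[])
    missing = sorted(p for p in client.files if p not in server.files)
    stale = sorted(p for p, mtime in client.files.items()
                   if p in server.files and server.files[p] < mtime)
    return Discrepancy(missing=missing, stale=stale)


if __name__ == "__main__":
    client = Manifest("user@example.com", "1042", {
        "C:/Docs/a.doc": 200.0,  # modified locally after the server's copy
        "C:/Docs/b.xls": 150.0,  # never uploaded
        "C:/Docs/c.txt": 100.0,  # in sync
    })
    store = {
        ("user@example.com", "1042"): Manifest("user@example.com", "1042", {
            "C:/Docs/a.doc": 100.0,
            "C:/Docs/c.txt": 100.0,
        }),
    }
    result = compare(client, store)
    print(result.missing)  # ['C:/Docs/b.xls']
    print(result.stale)    # ['C:/Docs/a.doc']
```

Any non-empty `missing` or `stale` list for a run is exactly the kind of delta that would justify a bug report, and it arrives with the account and backup id attached.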

1 comment:

Anonymous said...

I have the same problem with mozy, also on a 'vanilla' WinXP SP2 machine. So, you are not alone. I went back and forth with tech support with no admission of a problem and therefore obviously no solution. I agree that this is of MAJOR significance which they have been treating too lightly. If no solution arrives soon, then I'm off to look at alternatives.