The story had been brewing for a week or more: T-Mobile Sidekick users started reporting, around a week ago Friday, that their information stored online (email, contacts, calendar, etc.) was inaccessible. T-Mobile acknowledged the problem that Sunday, but it wasn’t until this past weekend that T-Mobile posted a message to Sidekick customers portending doom:
Regrettably, based on Microsoft/Danger’s latest recovery assessment of their systems, we must now inform you that personal information stored on your device – such as contacts, calendar entries, to-do lists or photos – that is no longer on your Sidekick almost certainly has been lost as a result of a server failure at Microsoft/Danger. That said, our teams continue to work around-the-clock in hopes of discovering some way to recover this information. However, the likelihood of a successful outcome is extremely low.
Sidekick enthusiast site Hiptop3.com (Hiptop is the name of the Danger developer platform, running on Sun and Java) speculated on the causes of the failure, although we haven’t seen confirmation of this:
Currently the rumor with the most weight is as follows:
Microsoft was upgrading their SAN (Storage Area Network aka the thing that stores all your data) and had hired Hitachi to come in and do it for them. Typically in an upgrade like this, you are expected to make backups of your SAN before the upgrade happens. Microsoft failed to make these backups for some reason. We’re not sure if it was because of the amount of data that would be required, if they didn’t have time to do it, or if they simply forgot. Regardless of why, Microsoft should know better. So Hitachi worked on upgrading the SAN and something went wrong, resulting in it’s destruction. Currently the plan is to try to get the devices that still have personal data on them to sync back to the servers and at least keep the data that users have on their device saved.
Now there is no doubt that the fault here lies directly with Microsoft: they own Danger, they’re responsible, end of story. However, numerous tweets and blog posts we’ve seen, including the Hiptop3 post, seem to take “owning the company” to mean “running Microsoft software”, and from what we can tell, that doesn’t seem to be the case. According to Danger’s developer website, all development is in Java, on Sun systems, using Apache:
All end-user applications are written in Java, as is the overwhelming majority of the high-level operating system. Arguably, Danger has the premiere "Java Operating System" on the market today.
While this doesn’t prove that Danger wasn’t running on Microsoft server software (as opposed to Microsoft-owned servers), it seems doubtful that a complete Java shop wouldn’t use open source storage as well. Nothing suggests that the failure was caused specifically by the operating system, but it seems premature to blame Microsoft server architecture or software design; if anything, what we know points a finger at Danger’s (open source) system. Even if, as Hiptop3 suggests, Microsoft “failed to make these backups for some reason”, there certainly should have been an ongoing backup solution in place. Indeed, one could speculate that the supposed SAN upgrade was meant to move the data onto a system that could be backed up more reliably, and that it went horribly wrong.
We asked Microsoft for comment on the whole ordeal, but received only the message T-Mobile posted in response. We’re afraid that Microsoft is going to have to do better than that. No matter what platform Danger was running on, what caused the outage, or what happened to the backup plan, this reflects poorly on Microsoft.
We need to hear the whole story on this one, Microsoft.