
RIM’s Flaw: Single Point of Failure

I became a Research in Motion (RIM) BlackBerry user a little over a year ago, and overall I remain as happy with my BlackBerry ‘Bold’ 9000 as I was when I wrote my positive review of the device. It’s a good phone, and it has treated me well, far better than the AT&T 8525 (HTC TyTN HERM100) that preceded it. The hardware has held up through heavy use with only minimal damage, and the software remains solid and crashes only on very rare occasions.

But while I have very few complaints about the phone itself, one of my biggest qualms about the BlackBerry universe before getting one remains my biggest problem today. I hate having to rely on a middle-man to pass emails from my server(s) to my phone. Phones from Apple, Palm, HTC, Motorola, and others all manage to communicate directly with mail servers, but RIM insists on passing all BlackBerry email traffic through their North American data center in Canada. The RIM servers stand between my phone and my email, polling for messages on the servers and pushing them out to my phone.

Usually this works okay. In day-to-day use, what does it matter what route your email takes to get to your phone? My objection is that it adds another potential point of failure to the system. With any other smartphone, if your wireless carrier provides a working data connection and your email servers are up, your email will work. With a BlackBerry, you need both of those plus a working RIM data center. This adds an extra, unnecessary place where the system can break down.

That’s why I, along with probably millions of other BlackBerry users, didn’t have email for over eight hours yesterday. My email servers were working fine, the AT&T data connection was working fine, but RIM’s data center was down. Worse, BlackBerry phones pass regular Internet traffic through RIM’s servers too—many of us weren’t getting email, and couldn’t get on the web either!

The iPhone people all had a good laugh at our expense . . . laugh it up, touch-screeners. I’m sure you’ll finish typing up your one-liner BlackBerry jokes someday ;-).

When RIM first brought their iconic BlackBerry to the market, this centralized architecture made sense. People didn’t have ‘raw’ Internet access on their phones. There had to be a middle-man. Today, however, there is absolutely no valid technical reason whatsoever that my phone can’t talk directly to my mail servers, or to the Internet as a whole.

Look, outages happen—especially in complex, technical systems. I get it. I work on complex systems day-in and day-out and I know that they simply can’t be perfect. But any complex system should be designed to have the fewest possible single points of failure. The system used by every other smartphone works, in whole or in part, in every situation except when the mobile carrier doesn’t provide the user with a working data connection. Oh, sure, an email server might go down . . . but a downed email server doesn’t kill your connection to your other mail servers, or block your ability to surf the web. Only one situation can stop you in your tracks completely.

BlackBerry users have twice as many points of complete failure. We lose our email and Internet entirely if our carrier doesn’t provide a working data connection (just like everybody else), but we also lose it when RIM has a system meltdown. Worse, when RIM has a meltdown it affects everybody regardless of carrier. Yesterday’s outage killed data services for BlackBerry users in North America entirely, whether they be on AT&T, Verizon, T-Mobile, Sprint, or one of the smaller carriers.
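To put rough numbers on that, here is a small sketch of how availability compounds when every link in a chain has to be up at once. The 99.9% figures are purely illustrative assumptions of mine, not measured uptime for any carrier or for RIM, and the function name is made up for this example.

```python
from math import prod


def chain_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up at once.

    Assumes the components fail independently of one another.
    """
    return prod(components)


# Illustrative figures only (assumptions, not real uptime data).
carrier = 0.999  # carrier data connection
relay = 0.999    # extra relay data center in the path

direct = chain_availability(carrier)          # phone -> carrier -> Internet
relayed = chain_availability(carrier, relay)  # phone -> carrier -> relay -> Internet

print(f"direct:  {direct:.4%}")
print(f"relayed: {relayed:.4%}")
```

Under those assumed numbers, adding one more mandatory hop roughly doubles the expected downtime, which is the whole complaint in a nutshell.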

RIM has their reasons for sticking with this architecture, but I think it’s time to reconsider. The whole point of this Internet thing (pay attention now) is redundancy and reliability—the elimination of single failure points, not the creation of new ones. At a bare minimum, here are the things I’d really like RIM to do:

  1. If a BlackBerry phone loses connection with RIM’s data center, it should automatically (and immediately) start passing web traffic directly over the carrier’s network. This is a no-brainer. RIM outages shouldn’t stop me from surfing the web. Period.
  2. Email is trickier, but I’d like to see BlackBerry phones get some rudimentary email polling capability built into the device (e.g., checking mail servers directly every 15 minutes). This could be set up by RIM to kick in only when the phone loses connection to RIM’s system for over 30 minutes, and turn back off when the connection to RIM is reestablished.
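The two items above boil down to a small piece of decision logic. What follows is only a sketch of that logic under my own assumptions: the class, method names, and the idea of feeding it a "relay reachable" flag are all hypothetical, and none of this reflects how BlackBerry devices or RIM's software actually work. The 30-minute and 15-minute values come straight from the list.

```python
from dataclasses import dataclass
from typing import Optional

RELAY_GRACE_SECONDS = 30 * 60    # relay must be down this long before direct polling
POLL_INTERVAL_SECONDS = 15 * 60  # gap between direct mail checks once polling starts


@dataclass
class FallbackPolicy:
    """Hypothetical on-device policy; times are plain seconds on any clock."""

    relay_down_since: Optional[float] = None
    last_poll: Optional[float] = None

    def update(self, now: float, relay_reachable: bool) -> None:
        """Record the latest reachability check for the relay data center."""
        if relay_reachable:
            # Relay is back: drop out of fallback mode entirely.
            self.relay_down_since = None
            self.last_poll = None
        elif self.relay_down_since is None:
            # First failed check: remember when the outage started.
            self.relay_down_since = now

    def web_route(self) -> str:
        """Item 1: web traffic goes direct as soon as the relay is unreachable."""
        return "relay" if self.relay_down_since is None else "direct"

    def should_poll_mail(self, now: float) -> bool:
        """Item 2: poll mail servers directly only during a long outage."""
        if self.relay_down_since is None:
            return False
        if now - self.relay_down_since <= RELAY_GRACE_SECONDS:
            return False
        if self.last_poll is not None and now - self.last_poll < POLL_INTERVAL_SECONDS:
            return False
        self.last_poll = now  # a poll is due; note when it happened
        return True


policy = FallbackPolicy()
policy.update(now=0, relay_reachable=False)
print(policy.web_route())                 # web traffic switches right away
print(policy.should_poll_mail(now=600))   # ten minutes in: too early to poll
print(policy.should_poll_mail(now=1900))  # past 30 minutes: first direct poll
```

The point of keeping web and email separate is that rerouting web traffic is cheap and can happen instantly, while direct mail polling costs battery and data, so it waits until the outage is clearly more than a blip.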

Implementing these kinds of fail-safe, redundant systems would allow BlackBerry users to continue about their business largely unimpeded when outages occur. In fact, if the system is designed well, users wouldn’t even know an outage happened! This would save us from major frustration, and save RIM from widespread public embarrassment. Win-win.

Scott Bradford has been putting his opinions on his website since 1995, before most people knew what a website was. He has been a professional web developer in the public and private sectors for over twenty years. He is an independent constitutional conservative who believes in human rights and limited government, and a Catholic Christian whose beliefs are summarized in the Nicene Creed. He holds a bachelor’s degree in Public Administration from George Mason University. He loves Pink Floyd and can play the bass guitar . . . sort of. He’s a husband, pet lover, amateur radio operator, and classic AMC/Jeep enthusiast.