Updated: Nov 22, 2021
In Banff’s splendid National Park, an iridescent gemstone better known as Lake Louise lies nestled beneath granite tines. There, on its northeast shore, stands one of the world’s great hotels, the Fairmont Chateau.
Several years ago my wife and I were in Canada exploring the Banff/Jasper region. When we got to Lake Louise, we initially bypassed the hotel in favor of a long hike. But by day’s end, tired and contented, we sat in the lounge sipping wine, sharing a plate of hors d’oeuvres, and looking wistfully at the receding glacier on the far shore.
We had not planned on staying, but after a second glass of wine, spending $400 on an evening’s lodging seemed like a most excellent idea. At the reception desk, however, we encountered a small, annoyed crowd and the faces of distressed employees. They were so very, very sorry, they said, but were unable to help anyone at the moment because the hotel’s computer was down. And sadly, they had no idea when service might be restored.
So there we all stood, in the lobby of a world-class hotel, staffed by hundreds of competent and capable people, and not a single one knew which rooms were occupied, which were reserved, and which might be available. And no amount of waving an American Express card was going to change that.
Ever hopeful, we lingered over coffee for another hour, and finally left.
Banff National Park, Alberta, Canada. Image by Alberto Aldana, 2004
What became clear to me in that moment is this: from a customer’s perspective, there is no such thing as high availability; there is only availability or the absence of it.
The reality is that, regardless of the nature of your business, there will be any number of occasions when computers either fail or must be temporarily removed from service. IT managers typically view such disruptive events as problems needing solutions. But that is a costly misnomer. Treating service disruption as a “problem” means that little will be done about it until it actually occurs, like a tire you can safely ignore until it goes flat.
Disruptions in computer availability are not “problems” in the most fundamental sense. They are in fact symptoms of a much larger, more basic, and far more prevalent problem: failure to meet expectations. If you run a business that depends on computers for its success and survival (in other words, just about anything other than palm reading), the public expectation is that your business will operate flawlessly every minute of every year. And that expectation begins with your own employees and managers.
The second thing that became clear is that there is no point in trying to meet expectations after they have already been shattered, just as there is no point in performing quality checks on products after they come off the assembly line. The quality must be built in. In the case of computer operations, the choice is to either provide sufficient safeguards, or endure costly disruptions and loss of customer confidence. But trying to meet expectations after failing to do so is like trying to renegotiate an agreement after you have already broken it, which is not a very effective strategy.
Historically, system integrity was protected by doing backups. But traditional backups are paradoxical. On the one hand, they are designed to ensure the availability of essential data; but on the other, they take that data out of service while the backup is being performed. And, if the system had to be restored to a previous backup, some data would invariably be lost. I have no doubt that the Fairmont Chateau had excellent backups, but in that particular instance, they were of little use.
Valley of the Ten Peaks and Moraine Lake, Banff National Park, Canada. Mountains from left to right: Tonsa (3057 m), Mount Perren (3051 m), Mount Allen (3310 m), Mount Tuzo (3246 m), Deltaform Mountain (3424 m), Neptuak Mountain (3233 m)
Still, the occasional hardware failure, however costly and inconvenient, pales next to the challenges presented by disaster recovery. Consider the natural disasters that occurred in 2016 alone. Winter storm Jonas dumped record amounts of snow on the East Coast of the U.S. Seven states reported in excess of 30 inches of accumulation, with one location in North Carolina topping out at 66 inches. Fifty-five deaths were reported, with economic losses estimated at up to $3 billion.
There were major earthquakes in New Zealand, Italy, Taiwan, Japan, and Indonesia. Wildfires raged in California and Tennessee, where over 1,600 structures were lost. There was flooding in Louisiana, which received over 20 inches of rain in 72 hours. And Hurricane Matthew tore through the Western Atlantic, killing 603 and causing an estimated $15 billion in damage.
These were just the major weather and geological events. Many lesser events caused localized damage and disruption. In a hierarchy of data center concerns, recovery is climbing ever closer to the top as large-scale disasters—previously considered 100-year weather events—now occur with alarming regularity. And while it is tempting to believe that global warming will plateau at some manageable level, there is no current evidence to support that. Weather-related disasters are increasing in number and severity and are likely to continue doing so. Therefore, prudence alone dictates that planning for disaster recovery should be an urgent and crucial concern for IT management.
At a minimum, the essence of a well-run data center is the ability to provide continuity without interruption, and replication without loss. To sustain those goals, the capacity to monitor remote locations, preferably from the device of your choosing, may also be essential, especially when technical support resides off site.
Water reflection at Two Jack Lake, a subsidiary part of Lake Minnewanka, with Mt. Rundle in the background.
As for my Canadian holiday, before leaving Lake Louise I grabbed my cell phone and searched for alternative lodging. We found a place that looked inviting, so I called to see if they had an available room. The voice on the phone bid me good evening and asked if I could hold.
Yes, I could. And yes, I did.
I listened to unmemorable music longer than I wanted to, until a recorded voice spoke the seven most annoying words in the English language: “Your call is very important to us…” It’s a lie based on the improper use of technology.
Rosabeth Moss Kanter said: “Confidence isn’t optimism or pessimism, and it’s not a character attribute. It’s the expectation of a positive outcome.” That, precisely, is what technology—when properly managed—should provide.
This article was written by Victor Rozek, an award-winning columnist who writes for IT Jungle The Four Hundred. #victor4maxava