Dave's WebLog

Welcome to my WebLog. Hopefully this will be my little corner of the web where I can be honest and say what I want to say. Please bear that in mind. This is me.


Disclaimer: This weblog in no way reflects the views of my employer. This weblog, like any other opinion piece, is intended to be read as such and not taken seriously at all.

NOTICE!


I have found a new place to do my Blog (HERE - davespace.blog-city.com). I will try to keep this somewhat up to date so I can continue to post in other Blogger Blogs but there are no guarantees.

Quotes:


Arnold Ross: “Think deeply of simple things.”

Hanlon's Razor: “Never attribute to malice that which is adequately explained by stupidity.”

Gunther Rall: “So every airplane has some problems in some areas, and if you know it, you can overcome it.”

Location: Moncton, New Brunswick, Canada

Friday, July 30, 2004

Nothing like it..

There is nothing like the pressure and stress of a post-install outage investigation. There's all that finger-pointing, blaming, scurrying around and covering your butt; it almost makes it a sport (except without the high salaries, and with not nearly enough fans cheering you on).

One of the biggest issues is the outage call. For us it involves about 20 managers from all levels of the organization on one call at the same time. Add to the mix an unhealthy helping of developers who are occupied with covering their butts and actually trying to solve the problem presented to them. The bigger problem is that most of the managers are pissed and they want someone's ass right then. Rather than wait patiently, stalking their prey, they demand to know right then who and what caused the outage and what can be done to fix it in the future.

I'm sorry. Piss off.

My solution to this mess is to have two calls going at the same time: the political call and the resolution call. The key is to have someone experienced in crisis management handling the in-between stuff. Let me give you an example. The outage today was caused by an unknown problem in the code that was installed last weekend. My boss and I are on the political call, fielding questions and blame as best we can. At the same time, the developers are working together through instant messaging and phone calls to investigate, troubleshoot and resolve the problem. On the political call, my boss and I intercept questions and orders to get the developers on the call and, instead, act as intermediaries between the two.

Q: How long until a fix is in place?
*pause as I ask over my shoulder to the huddled group*
A: 15 minutes.

Q: What caused the outage?
*pause as I listen in on the conversation the developers are having and ask a quick question of someone who is involved in the discussion but not active at a keyboard or explaining something to the group*
A: Problem with one of the database tables and the load script.

This leaves the troubleshooting team free of the politics of a dozen miffed managers and the finger-pointing that is sure to follow. It keeps the team focused and the information flowing through the pipeline in an orderly manner. Requests and responses all go down one managed channel.
