Troubleshooting Methodology 2: Scoping the problem

The following Einstein quote is probably apocryphal, but that doesn’t make it any less useful: “If I had an hour to solve a problem and my life depended on the solution, I would spend the first 55 minutes determining the proper question to ask, for once I know the proper question, I could solve the problem in less than five minutes.”

Scoping the problem, or determining the proper question to ask, is the most important part of the whole troubleshooting process. If you don’t know what the problem is, how can you fix it? And if you don’t define the problem, how do you know it really is a problem? That second question sounds a little odd, but it matters: the “issue” could just be a configuration that is never going to work.

A long time ago I was a broadband support tech at an ISP. Wireless routers were new on the market and just being rolled out. Soon after I started supporting broadband (previously I had supported only dial-up), I had a call with a customer who couldn’t get their internet working. After about 5 minutes of struggling to get a handle on the problem and checking some things on the computer, I went back to the beginning and asked them where their router was. The response was, “Oh that thing? That’s in the shed. Why would I need it? The internet is wireless.”

This one has always stuck in my head. If I had nailed down what the issue was at the beginning, the solution would have come faster and with far less messing around. It also illustrates that sometimes the issue is not a technical fault but a configuration problem: the setup the customer had was obviously never going to work. Correct scoping helps catch these too, and helps keep you clear of rabbit holes.

For scoping a problem I generally have a list of questions that I run through. Not every question applies to every case, but the list is a good starting point.

The List:

  • What is happening? By this I don’t mean the overall issue; I mean the symptoms of the issue. A computer blue screening would be an obvious example of a symptom rather than an issue.
  • Has this setup ever worked? Is it a new configuration or something that has been in place for some time?
  • If it has worked before, when was the last time it worked?
  • What is the configuration trying to accomplish?
  • How many issues are they experiencing? If more than one, are they all different or related?
  • How many users is this affecting? How many servers is this affecting?
  • How severe is the problem? System crashes, slowness, application crashes, or just a vague feeling of unease?
  • If intermittent, how frequent or random is it?
  • Is it reproducible?
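
If you keep ticket notes in a script or tool, the list above could be captured as a small checklist that tells you which questions are still open. What follows is only a rough sketch in Python, not part of my original process: the names `ScopingQuestion`, `SCOPING_QUESTIONS`, and `open_questions`, along with the answer keys, are made up for illustration, and the rule that the "last time it worked" question only applies after a "yes" to "has it ever worked" is my own reading of the list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ScopingQuestion:
    key: str     # hypothetical short name used to store the answer
    prompt: str  # the question as asked
    # Optional predicate: the question is only asked when this returns True.
    applies: Optional[Callable[[Dict[str, str]], bool]] = None


# The scoping list from this post, in order (keys are illustrative).
SCOPING_QUESTIONS: List[ScopingQuestion] = [
    ScopingQuestion("symptoms", "What is happening (symptoms, not the overall issue)?"),
    ScopingQuestion("ever_worked", "Has this setup ever worked?"),
    ScopingQuestion(
        "last_worked",
        "When was the last time it worked?",
        # Assumption: only meaningful once we know the setup has worked before.
        applies=lambda a: a.get("ever_worked", "").strip().lower() == "yes",
    ),
    ScopingQuestion("goal", "What is the configuration trying to accomplish?"),
    ScopingQuestion("issue_count", "How many issues, and are they related?"),
    ScopingQuestion("affected", "How many users and servers are affected?"),
    ScopingQuestion("severity", "How severe is the problem?"),
    ScopingQuestion("frequency", "If intermittent, how frequent or random is it?"),
    ScopingQuestion("reproducible", "Is it reproducible?"),
]


def open_questions(answers: Dict[str, str]) -> List[str]:
    """Return the prompts that apply but have no non-empty answer yet."""
    remaining = []
    for question in SCOPING_QUESTIONS:
        if question.applies is not None and not question.applies(answers):
            continue  # skip questions that do not apply to this case
        if not answers.get(question.key, "").strip():
            remaining.append(question.prompt)
    return remaining


if __name__ == "__main__":
    # Example notes from the start of a call.
    notes = {"symptoms": "No internet on any device", "ever_worked": "no"}
    for prompt in open_questions(notes):
        print("-", prompt)
```

The point of the sketch is simply that unanswered questions stay visible until you deal with them, which is what keeps you from skipping straight to checking things on the computer.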

This was the second post in a series of 4 on troubleshooting. The remaining two posts will cover the data collection, results, and analysis parts of the process. Hopefully others will find this method as useful as I have over the years.