Why Do We Overcomplicate IT Service Management?
Not too long ago, in the IT service management (ITSM) minefield that is LinkedIn groups, a service delivery manager asked:
“When is an incident not an incident?”
The service delivery manager’s dilemma, in more detail, was that:
“I have a situation where a regular outage occurs on a critical service on a daily basis, the service self-recovers, there is no workaround, no fix and no understanding of how to resolve it yet all that is known is that it’s an issue. Opinions are split as to whether this should be managed as an ongoing major incident, should be problem-managed, or managed by the service manager.”
I read through the many comments and suggestions that people had added to the post, and what struck me was how many people seemed to be having the same difficulty: defining the difference between an incident, a major incident, and a problem. I have to admit, though, that this particular scenario is an interesting one and easy to get confused by.
Then, to further complicate the situation, we also have service requests to add to the mix.
Please Stop the Incident vs. Problem Merry-Go-Round, I Want to Get Off
It seems like a daily occurrence for incident vs. problem questions to be posted in LinkedIn groups, along with questions such as “What’s the difference between an incident and a service request?” More often than not, it’s the simplest service requests that cause the most confusion. For example, “Is a password reset an incident or a service request?”
On the surface these should be easy things to categorize, but opinions differ so greatly that these types of questions return again and again. It can be exasperating for LinkedIn group members. In fact, it causes almost as much annoyance as when someone throws out the word-hand-grenade that is “We’re implementing ITIL” (and I’d need a whole new blog to cover that puppy).
Thinking Pragmatically About ITSM Definitions
I understand why people want industry definitions and want to categorize things into neat bundles, but does it need to be an industry-standard definition? After all, if these worked, we would all be running with the ITIL (the ITSM best practice framework) definitions and LinkedIn groups would be virtual ghost towns.
So take a step back from various and potentially contradictory industry definitions to think about this logically. I agree that these are things that need to be defined and agreed within your organization – there is a need for consistency and understanding. However, surely it doesn’t matter what is agreed as long as everyone knows the internal definitions and the differences, and then deals with the issues encountered or service requests as agreed. Plus, the agreed definitions don’t need to be complicated.
Keep It Simple Stupid – The KISS Principle
When talking about simplicity, I like to borrow a line that is often attributed to a pretty clever guy, Albert Einstein:
“If you can’t explain it to a six year old, you don’t understand it yourself.”
Here’s an extremely simple definition set that I use as a guide:
- Incident: Something bad that isn’t supposed to happen, an IT issue, but one that the business can temporarily live with (notice that I deliberately say “business” and not “end user”; I’ll come back to this later).
- Problem: A recurring incident, or one that affects multiple end users, where the cause needs to be investigated in order to deliver more than a temporary fix.
- Major incident: An IT issue where the adverse impact is unacceptable to the business.
- Service request: Something that someone needs that they don’t already have.
I did warn you that they are simple!
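To show how little logic these definitions need, here is a minimal Python sketch of them. Everything in it is an assumption made for illustration: the `Ticket` fields, the `classify` function, and the choice to let a problem sit alongside an incident or major incident rather than replace it. It is not taken from ITIL or from any ITSM tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SERVICE_REQUEST = "service request"
    INCIDENT = "incident"
    MAJOR_INCIDENT = "major incident"
    PROBLEM = "problem"


@dataclass
class Ticket:
    # Hypothetical fields; your organization decides how each one is answered.
    something_is_broken: bool        # False: someone needs something they don't have
    business_can_live_with_it: bool  # agreed with the business, not guessed by IT
    recurring: bool = False
    affected_users: int = 1


def classify(ticket: Ticket) -> list[Category]:
    """Apply the four simple definitions to a ticket."""
    if not ticket.something_is_broken:
        return [Category.SERVICE_REQUEST]

    # The impact decides between an incident and a major incident.
    if ticket.business_can_live_with_it:
        categories = [Category.INCIDENT]
    else:
        categories = [Category.MAJOR_INCIDENT]

    # Recurring or multi-user issues also need their cause investigated.
    # Raising the problem in addition is an assumption of this sketch.
    if ticket.recurring or ticket.affected_users > 1:
        categories.append(Category.PROBLEM)
    return categories


# The daily outage from the LinkedIn question: critical service, recurring.
daily_outage = Ticket(
    something_is_broken=True,
    business_can_live_with_it=False,
    recurring=True,
)
print([c.value for c in classify(daily_outage)])
```

Under these assumptions, the daily outage is handled as a major incident and is also raised as a problem. Whether a password reset counts as "something is broken" is left to whoever fills in the ticket, and that is exactly the kind of decision your organization has to make for itself.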
Achieving Internal Consistency Is What Really Matters
I know there are some potential overlaps between the definitions that your organization will need to discuss internally and resolve. If we return to the password reset example, I would class it as an incident, but some would still argue that this meets the “something that someone needs that they don’t already have” definition of a service request. The end user needs to access the network but they don’t have access to it because they have forgotten their password.
However, in my opinion, it really doesn’t matter how you decide to categorize password resets as long as the decision is made, and everyone is informed of the decision and how to move forward with it. Consistency is key!
Agreed Definitions Need to Be Agreed by the Right People
Another important factor to consider here is ensuring that you’ve had appropriate discussions with business colleagues to agree on things. IT can’t differentiate between an incident and major incident if it doesn’t know what is acceptable and unacceptable to business operations.
For instance, IT might think that a certain system going down at midnight on a Sunday is an issue that can wait until Monday morning. But the 24/7 care team, which IT didn’t even know existed and which needs access to medical records around the clock because a patient’s ability to breathe is at risk, will strongly disagree as they fight to save the patient’s life. If only IT had consulted its business colleagues, a major incident, and a life-or-death situation, would have been avoided here.
So that’s my brief view on how to start to uncomplicate ITSM. What would you add?