How DevOps Brings High Velocity to ITSM
“The harder I practice, the luckier I get.”
– Gary Player, professional golfer
At the core of DevOps is learning. DevOps stands on the shoulders of many giants, one of which is the Toyota Way. From it, DevOps borrows concepts such as Kata, which helps people, teams, and organizations to improve, adapt, innovate, and achieve whatever they set out to do.
One of the ways in which DevOps turns thinking, such as Toyota Kata, into reality is to put learning and experimentation at the heart of everything it does, with the belief that this is the way to quality. This has led to something called “continuous delivery,” through which high-performing organizations can also operate at high velocity. In the recent State of DevOps Report 2015, by Puppet Labs, this high performance coupled with high velocity is measured as:
- More deployments
- Faster deployments
- More successful deployments, and
- Faster repair times
The DevOps changes that create high-velocity, high-performing organizations happen in the IT service management (ITSM)/operations areas of change, configuration, release, incident, and problem management. But three alignment activities are needed to link DevOps and ITSM together, and without that alignment, performance will not be as strong.
In this blog I want to explain how best to link DevOps and ITSM together, through:
- Having DevOps-minded product teams
- Using DevOps to change CCR (change, configuration, and release)
- Using DevOps to change incidents and problems
Plus, the anti-patterns of doing DevOps with ITSM.
1. Have DevOps-Minded Product Teams
Start to think of CCR as a value stream per product, instead of big, hairy functions that all products flow through. For example, if you have two systems, one called Order-to-Cash and one called Banking WebApp, have two separate product teams that are responsible for executing CCR in a DevOps manner, reporting all the usual metrics and using the same tools as the ITSM team (just in a different way).
One of the biggest barriers to high performance is high lead times, which are a common unwanted by-product of multiple handover points. This is resolved by having teams aligned to products (think left-to-right horizontal) instead of functional process areas (think vertical silos with gaps and handover points that create delays and mistakes).
2. Use DevOps to Change CCR
The most important thing a DevOps-minded product team will do for CCR is to reduce batch sizes, limit work in progress (WIP), and make changes more frequently. In doing so, they will develop a very robust CCR process that can be relied upon.
Breaking a product into loosely coupled architecture (e.g. microservices) is something a DevOps-minded product team might do (depending on the application), because any product changes can then be limited to specific parts of the product without breaking the whole service. They can also introduce “graceful degradation” techniques to help protect the customer from issues, and to manage dangerous scenarios such as cascading failures.
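As a rough illustration of graceful degradation, here is a minimal Python sketch. The recommendations service, the function names, and the default value are all hypothetical, invented only for this example. The idea is that when one loosely coupled component fails, the product falls back to a simpler answer instead of passing the failure on to the customer.

```python
# Hypothetical default shown to the customer when the real service is down.
DEFAULT_RECOMMENDATIONS = ["bestsellers"]


def fetch_recommendations(customer_id):
    """Stand-in for a call to a separate (hypothetical) microservice.

    It always fails here, to simulate an outage of that one component.
    """
    raise TimeoutError("recommendation service unavailable")


def recommendations_with_fallback(customer_id, fetch=fetch_recommendations):
    """Return live recommendations, or a safe default if the service fails."""
    try:
        return fetch(customer_id)
    except (TimeoutError, ConnectionError):
        # Degrade gracefully: the page still renders, just less personalized.
        return DEFAULT_RECOMMENDATIONS


if __name__ == "__main__":
    print(recommendations_with_fallback("customer-1"))
```

The failure stays contained in one small part of the product, which is the same property that makes small, frequent changes safer.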
The DevOps-minded product team will also increase the velocity of CCR by reducing change batch sizes and decreasing the dependencies between them, while increasing the frequency of change. Small regular changes applied to small areas of the customer-facing product cause less damage and are easier and faster to fix.
Conversely, “big ball of mud” changes done twice a year cause outages that are hard to fix. Anyone who has worked in large, complex, enterprise IT organizations will be familiar with this mode of operation, which is being debunked by the more scientific approach of DevOps.
However, just applying DevOps to CCR is not enough, because DevOps people need to embrace and learn from incidents and problems.
3. Use DevOps to Change Incidents and Problems
IT incidents and problems are traditionally frowned upon as failures, causing organizational fear and resistance to change, and effectively pouring concrete over people, process, and technology.
Some believe that perfection is having zero incidents and zero problems. But I believe that this is inhumane and fantastical thinking: inhumane because individuals are rarely the sole cause of incidents in complex, fragile systems; and fantastical because complex systems are inherently imperfect.
A DevOps-minded product team will embrace incidents and problems as opportunities to exercise the core of DevOps – learning. KPIs such as mean time between failures (MTBF) and mean time to recovery (MTTR) are important, but it is vital not to use them for punishment. Instead, they should be used for organizational alignment (with everyone focused on the same KPIs) and for common goals (something to aim for, to measure gaps, and to keep improving).
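To make the two KPIs concrete, here is a minimal Python sketch of how they might be calculated from incident records. The incident timestamps are made up for illustration, and the definitions used are one common choice rather than the only one: MTTR is the average time from outage to restoration, and MTBF is the average uptime between one restoration and the next outage.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (time service went down, time service was restored).
incidents = [
    (datetime(2015, 11, 2, 9, 0), datetime(2015, 11, 2, 9, 30)),
    (datetime(2015, 11, 9, 14, 0), datetime(2015, 11, 9, 15, 0)),
    (datetime(2015, 11, 20, 22, 0), datetime(2015, 11, 20, 22, 15)),
]


def mean_time_to_recovery(incidents):
    """Average outage duration across all incidents."""
    durations = [restored - down for down, restored in incidents]
    return sum(durations, timedelta()) / len(durations)


def mean_time_between_failures(incidents):
    """Average uptime between a restoration and the next outage.

    Needs at least two incidents, since it measures the gaps between them.
    """
    ordered = sorted(incidents)
    gaps = [
        next_down - prev_restored
        for (_, prev_restored), (next_down, _) in zip(ordered, ordered[1:])
    ]
    return sum(gaps, timedelta()) / len(gaps)


if __name__ == "__main__":
    print("MTTR:", mean_time_to_recovery(incidents))
    print("MTBF:", mean_time_between_failures(incidents))
```

Tracked per product team over time, the trend in these numbers matters more than any single value, which fits their use for alignment and improvement instead of punishment.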
The DevOps-minded product team will seek to use incident and problem management activity for two things:
- Telemetry of incidents, providing data about system-wide behavior including the process to deliver CCR, not just the product itself. For instance, successful organizations can point to visual incident data showing when failed releases caused service outages.
- Problem resolution will feed into design and building quality into the system. This is in effect the Toyota Andon Cord, which stops the production line until a fix is identified and built in, preventing further occurrences of an issue or issues.
Anti-Patterns of Doing DevOps with ITSM
The biggest challenges to bringing velocity to ITSM, and therefore the barriers to being a high-performing organization, are all surmountable, as evidenced by the growing list of successful companies that use DevOps and ITSM. A number of these common barriers have emerged, and were explained (with humor) at the 2015 DevOps Enterprise Summit (DOES15) by Ross Clanton and Mark Peterson of the US retailer Target.
Their humorous take on DevOps and ITSM points to a number of anti-patterns, i.e. the things that your organization shouldn’t be doing:
- Ignore customer satisfaction and needs, let techies drive the answer.
- Functional silos are best, get cheaper labor and economies of scale with service pools and optimize costs.
- Limit collaboration and communication; believe that it is cheaper to pass tickets and forms around rather than to have humans work together.
- A focus on local optimization of silos and individuals is more important (who cares if we sub-optimize the whole system?).
- Optimize around projects; don’t let high-performing individuals and teams get bogged down in production support.
- Hyper-specialize roles and use Taylorism to break down activities into small components with many hand-offs.
- Roll up features and functions into large batch sizes (change bundles).
- Have lengthy approval processes via large committees.
- Don’t learn from history and others outside your own organization.
- Let HR dictate structures; don’t allow cross-team comms such as ChatOps.
- Highly customize incentives for individuals such as leaders.
- Throw money at problems before you finally scrap it.
- Work long hours with lots of motion (heroes).
- Rely on top-down cascading messaging to change culture.
- Repeat the same problems without learning.
- Focus on the long-term plan over short-term success and building on it.
- Engineers should be considered artists and true beauty happens at the command line.
- Avoid risk by eliminating change.
So how many of these does your organization unfortunately suffer from? And what are you doing to prevent these anti-patterns and to bring high velocity to ITSM?