Establish Change Risk Metrics to Drive IT Agility
Risks in the production environment can have major consequences for users and the bottom line. Change risk is a particularly significant driver of production risk. Modern organizations are strongly invested in improving agility, so when systems fail, the failures affect not only operations and profits but also the overall pace at which changes can be implemented and managed.
In many organizations, the change advisory board (CAB) oversees all changes with the goal of reducing change-related incidents. Unfortunately, CABs are often overwhelmed by the sheer volume of changes they must review and approve. Ideally, the CAB should have an analytical way to identify, assess, quantify, and respond efficiently to the various risks that could enter the production environment.
Uncovering KPIs that reveal the hows and whys of managing IT production risk
To make objective decisions, CABs should review data from the actual applications, tools, and programs essential to the production environment, including the services that are being monitored and maintained.
Aggregating data from across these types of sources allows the creation of highly informative KPIs. Ideally, these KPIs reveal the most critical information CABs and IT leaders need to understand the current risks facing their production environment.
Metrics employed in measuring production stability include:
- Performance degradation
- Application errors
- Customer-reported change-related incidents
- Autogenerated change-related incidents
Metrics and KPIs become particularly powerful when they are combined to form a higher-order KPI, sometimes called a derivative. These bring together more variables and data to offer complex information in a synthesized way and tell a story about production risks.
An example expressive KPI for managing IT production risk: “Change risk credit score”
One example is a behavioral KPI that can be referred to as a “change risk credit score.” This type of KPI was developed by a Numerify customer who was a director of service management for a leading healthcare service provider. The team struggled with change activities, which caused 2 out of every 3 problems they identified.
They initially hypothesized that these problems were driven by emergency-related changes, but the data showed the real driver was human-related factors. This discovery led them to create a “change risk credit score” that could present an objective marker of relative risk and establish accountability for the corresponding change manager.
The score was created using each change manager’s own historical performance data, including their change success rate, the number of changes that caused outages over a period, and the number of open problems remaining. The score ranged from 300 to 850, not unlike a FICO score. Each change manager had access to the score and could raise it by improving the underlying metrics: closing out more open problems, going longer between outages, and achieving a higher change success rate.
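To make the idea concrete, here is a minimal sketch of how a score like this could be assembled from the three inputs named above. The article does not disclose the customer's actual formula, so the weights, the caps, and every name in the block (`ChangeManagerHistory`, `change_risk_credit_score`, `WEIGHTS`, and so on) are illustrative assumptions. Only the 300 to 850 range and the three input metrics come from the description.

```python
from dataclasses import dataclass

# Score range described in the article (similar to a FICO score).
SCORE_MIN, SCORE_MAX = 300, 850

# ASSUMPTION: these weights and caps are invented for illustration only.
WEIGHTS = {"success_rate": 0.5, "outages": 0.3, "open_problems": 0.2}
OUTAGE_CAP = 10    # outage-causing changes at or above this count score zero
PROBLEM_CAP = 20   # open problems at or above this count score zero


@dataclass
class ChangeManagerHistory:
    """Hypothetical record of one change manager's history over a period."""
    changes_total: int
    changes_successful: int
    outage_causing_changes: int
    open_problems: int


def change_risk_credit_score(history: ChangeManagerHistory) -> int:
    """Map a manager's history onto the 300-850 range (higher is better)."""
    if history.changes_total <= 0:
        raise ValueError("at least one change is needed to compute a score")

    # Each component is normalized to 0..1, where 1 is the best outcome.
    success = history.changes_successful / history.changes_total
    outages = 1 - min(history.outage_causing_changes, OUTAGE_CAP) / OUTAGE_CAP
    problems = 1 - min(history.open_problems, PROBLEM_CAP) / PROBLEM_CAP

    quality = (
        WEIGHTS["success_rate"] * success
        + WEIGHTS["outages"] * outages
        + WEIGHTS["open_problems"] * problems
    )
    quality = max(0.0, min(1.0, quality))  # guard against rounding drift
    return round(SCORE_MIN + quality * (SCORE_MAX - SCORE_MIN))


if __name__ == "__main__":
    # Invented example: the same manager before and after closing 5 problems.
    before = ChangeManagerHistory(40, 36, 2, 10)
    after = ChangeManagerHistory(40, 36, 2, 5)
    print(change_risk_credit_score(before), change_risk_credit_score(after))
```

Because each component moves the score in one direction only, a manager can see which behavior to change to raise it, which is the accountability property the article describes. A real implementation would need weights calibrated against observed incident data.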
Implementing this expressive KPI dramatically cut down on the discussion time during meetings and the amount of money and effort IT leaders needed to address change-related issues.
This example shows that people respond strongly to narratives, so KPIs that reveal cause-and-effect relationships or illustrate the full nature of change risks can earn rapid, collective buy-in.
Using AI-powered analytics to discover deeply buried insights in IT production environment data
Artificial intelligence can significantly enhance analytics by surfacing KPIs that would be impractical to uncover through traditional methods such as pivot tables or spreadsheets.
Predictive models provide information that can be used to proactively address production risks rather than reactively respond once an outage or adverse event has already occurred. They can assess production risk by examining KPIs and automatically selecting the most relevant attributes to illustrate the factors that can make those KPIs rise or fall.
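As a rough illustration of attribute selection, the sketch below ranks the attributes of past changes by how strongly each one correlates with a change-related incident. This is a plain Pearson correlation ranking, a much simpler stand-in for the predictive models the article refers to, and it is not a description of any product's method. The attribute names and the sample records are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def rank_attributes(records, outcome_key):
    """Return (attribute, |correlation|) pairs, strongest first."""
    outcome = [r[outcome_key] for r in records]
    attributes = [k for k in records[0] if k != outcome_key]
    scores = {
        a: abs(pearson([r[a] for r in records], outcome)) for a in attributes
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # ASSUMPTION: invented change records; 1 means true, 0 means false.
    changes = [
        {"is_emergency": 1, "manager_prior_failures": 0, "caused_incident": 0},
        {"is_emergency": 0, "manager_prior_failures": 3, "caused_incident": 1},
        {"is_emergency": 1, "manager_prior_failures": 4, "caused_incident": 1},
        {"is_emergency": 0, "manager_prior_failures": 1, "caused_incident": 0},
    ]
    for name, strength in rank_attributes(changes, "caused_incident"):
        print(f"{name}: {strength:.2f}")
```

Correlation only flags candidates for investigation. It does not establish cause, and a production model would also need to handle categorical attributes, interactions between attributes, and far larger samples.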
Dashboards make IT production risks visible and put them in perspective
Visualization of data, metrics, and KPIs makes all of these activities much more efficient by presenting information in an actionable way. A dashboard presents information in a highly visible and objective single source of truth for all key stakeholders, driving unity even across complex operations.
Higher-performing dashboards allow IT teams to explore data: make comparisons, develop ad-hoc KPIs, or drill down into specific incident clusters for more information. Specific analyses can be applied to trends or current metric curves to answer new questions as they emerge.
The healthcare service provider client referred to this capability as akin to unifying all of the disparate perspectives and informational views into a single pane of glass that gave everyone the same view. All this information is sourced from dozens, sometimes hundreds, of different activities and systems of record, compressing them into a single, ever-evolving object of value that can make production risks more visible than ever.
Perhaps more importantly, capabilities such as predictive models and risk scoring provide prescriptive information: they tell IT production risk management teams not only that they need to respond but potentially how to respond, based on a change-related incident’s likely impact and its likelihood of occurring. This information is vital because it accelerates the processes needed to keep organizations agile while lowering the chances that they will break something critical through change deployment and other production activities.
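One common way to combine impact and likelihood is to multiply them into a single exposure figure and use it to decide which changes deserve CAB attention. The sketch below assumes that approach. The `PendingChange` fields, the `triage` function, and the review threshold are hypothetical, and the probabilities would in practice come from a predictive model rather than being typed in by hand.

```python
from dataclasses import dataclass


@dataclass
class PendingChange:
    """Hypothetical pending change awaiting review."""
    change_id: str
    incident_probability: float  # 0..1, e.g. output of a predictive model
    impact: float                # estimated cost of an incident, any unit


def risk_exposure(change: PendingChange) -> float:
    """Expected impact: likelihood of an incident times its likely impact."""
    if not 0.0 <= change.incident_probability <= 1.0:
        raise ValueError("incident_probability must be between 0 and 1")
    return change.incident_probability * change.impact


def triage(changes, review_threshold):
    """Split changes into (needs_review, fast_track), highest exposure first."""
    ranked = sorted(changes, key=risk_exposure, reverse=True)
    needs_review = [c for c in ranked if risk_exposure(c) >= review_threshold]
    fast_track = [c for c in ranked if risk_exposure(c) < review_threshold]
    return needs_review, fast_track


if __name__ == "__main__":
    # ASSUMPTION: invented changes and an arbitrary threshold of 8.
    queue = [
        PendingChange("CHG-A", 0.5, 100.0),
        PendingChange("CHG-B", 0.1, 50.0),
        PendingChange("CHG-C", 0.9, 10.0),
    ]
    review, fast = triage(queue, review_threshold=8.0)
    print([c.change_id for c in review], [c.change_id for c in fast])
```

Routing only the high-exposure changes to full review is one way to relieve a CAB that is overwhelmed by volume, while low-exposure changes move through a lighter approval path.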
Learn more about how analytics and AI can reduce production risks and change-related problems in our webinar “Make IT Change Management Smarter and Faster with Artificial Intelligence (AI),” featuring speaker Charles Betz, Principal Analyst at Forrester.