Having worked in web IT operations for some years, I have had to interact with and lead initiatives with a number of network operations center (NOC) organizations. What has consistently struck me is the focus on a few statistics related to immediate problem resolution, while other managerial and operational metrics that could be used to understand where problems originate and how to eliminate them are ignored. Here is a list of key performance indicators (KPIs) that I feel every NOC should track to help its managerial staff make better decisions.
Incident / Ticket Management
- Total Incidents
- Incidents open
- Incidents closed
- Incidents reopened
- Mean time to notify
- Mean time to isolate
- Mean time to resolve
- Avg incident create to resolve duration
- Avg incident resolve to close duration
- % duplicate incidents
- % incidents auto generated
- % incidents auto resolved
- % incidents caused by changes
- % incidents caused by change management issues
- % incidents causing changes
- % incidents created by Tier 2
- % incidents created by Tier 3
- % incidents escalated
- % incidents linked to problems
- % incidents linked to testing errors
- % incidents misrouted
- % incidents caused by data integrity issues
- % changes audited with errors
- % changes causing incidents
- % incidents linked to lack of training
- % incidents received by email
- % incidents received by phone
- % incidents resolved by Tier 1
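As a rough illustration of how a few of the incident metrics above could be derived from ticket data, here is a minimal Python sketch. The `Incident` fields, the `incident_kpis` function, and the sample tickets are all hypothetical; a real ticketing system will export different field names, and "open" is simplified here to mean "not yet resolved".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    # Hypothetical fields; map them to whatever your ticketing system exports.
    created: datetime
    resolved: Optional[datetime] = None
    reopened: bool = False
    escalated: bool = False
    caused_by_change: bool = False


def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, or 0.0 when there is nothing to count."""
    return 100.0 * part / whole if whole else 0.0


def incident_kpis(incidents: list[Incident]) -> dict:
    total = len(incidents)
    # Create-to-resolve duration for every incident that has been resolved.
    durations = [i.resolved - i.created for i in incidents if i.resolved is not None]
    mean_resolve = sum(durations, timedelta()) / len(durations) if durations else None
    return {
        "total_incidents": total,
        # Simplification: an incident counts as open until it has a resolve time.
        "incidents_open": total - len(durations),
        "mean_time_to_resolve": mean_resolve,
        "pct_reopened": pct(sum(i.reopened for i in incidents), total),
        "pct_escalated": pct(sum(i.escalated for i in incidents), total),
        "pct_caused_by_changes": pct(sum(i.caused_by_change for i in incidents), total),
    }


# Made-up sample data, for illustration only.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Incident(t0, t0 + timedelta(hours=2), escalated=True),
    Incident(t0, t0 + timedelta(hours=4), reopened=True, caused_by_change=True),
    Incident(t0),  # still unresolved
    Incident(t0, t0 + timedelta(hours=3)),
]
kpis = incident_kpis(sample)
print(kpis)
```

The same pattern extends to the other percentage metrics in the list: each one is a count of tickets carrying some flag, divided by the total.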
Change Management
- Changes created
- % changes rejected
- Changes implemented
- % high risk changes
- Avg change implementation time
- % emergency changes
- % changes implemented within target time – critical
- % changes implemented within target time – high
- % changes implemented within target time – medium
- % change failures due to change management review issues
- % changes – post implementation feedback
- % changes – post implementation review
- % changes – process compliance
- % changes audited
- % changes audited with errors
- % changes causing incidents
- % changes causing problems
- % changes implemented without back-out plan
- % changes implemented without testing
- % changes requiring scheduled outages
- % changes specified inaccurately
- % changes with incorrect data
- % changes without sign-off
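Several of the change metrics above depend on linking change records to the incidents they caused. The Python sketch below shows one possible way to compute a few of them; the `Change` record, its field names, and the sample data are assumptions made for illustration, not the schema of any particular change management tool.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    # Hypothetical change record; field names are illustrative only.
    change_id: str
    emergency: bool = False
    has_backout_plan: bool = True
    tested: bool = True
    # IDs of incidents that were traced back to this change.
    linked_incidents: list[str] = field(default_factory=list)


def change_kpis(changes: list[Change]) -> dict:
    total = len(changes)

    def pct(predicate) -> float:
        # Share of changes matching the predicate, as a percentage.
        if not total:
            return 0.0
        return 100.0 * sum(1 for c in changes if predicate(c)) / total

    return {
        "changes_implemented": total,
        "pct_emergency": pct(lambda c: c.emergency),
        "pct_without_backout_plan": pct(lambda c: not c.has_backout_plan),
        "pct_without_testing": pct(lambda c: not c.tested),
        "pct_causing_incidents": pct(lambda c: len(c.linked_incidents) > 0),
    }


# Made-up sample data, for illustration only.
changes = [
    Change("CHG-1"),
    Change("CHG-2", emergency=True, has_backout_plan=False),
    Change("CHG-3", linked_incidents=["INC-17"]),
    Change("CHG-4", has_backout_plan=False, tested=False),
    Change("CHG-5"),
]
change_stats = change_kpis(changes)
print(change_stats)
```

Storing the incident IDs on the change record, rather than a simple yes/no flag, keeps the link auditable in both directions.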
Tier 1 – Service Desk
- % calls abandoned
- % calls answered
- % registered support users logging service requests
- % users registered for IT support
- Avg call abandonment time
- Avg call answer time
- Avg call talk time
- Avg cost per call
- Avg service requests per user
- Avg daily incidents handled per service desk agent
- Number of incoming calls
- Customer satisfaction – service support
- Surveys sent
- % survey response rate – service support
- % calls converted to service requests
- % Tier 1 support positions unfilled
- % Tier 1 support trained
- % IT support with industry certification
- % IT support staff turnover
Routine Service Requests
- Service requests created
- % service requests resolved within target
- Avg service request create to resolve duration
- % service requests auto generated
- % service requests caused by changes
- % service requests caused by data integrity issues
- % service requests caused by viruses
- % service requests dispatched
- % service requests linked to lack of training
- % service requests received by email
- % service requests received by telephone
- % service requests resolved by first level support
- Avg service request resolve to close duration
- Avg service request cost
- % calls converted to service requests
- % service requests resolved on initial contact
- % service requests resolved by second level support
- % service requests resolved by third level support
- % service requests resolved by knowledge base
- % service requests escalated
- % service requests misrouted
- % service requests reopened
- % service requests void
- % service requests requiring onsite support
- % duplicate service requests
This list is by no means complete, but even in this limited form it can be especially powerful if the KPIs are measured per engineer, tier, group, and/or subsystem for additional analysis detail. Even so, it is a long list; it will have to be adapted to your needs and possibly integrated automatically into your monitoring and ticketing systems to minimize intrusiveness and maximize acceptance and compliance. It should provide a starting point for any team that wants a better understanding of its mission, and for the company as a whole, which needs to isolate, fix, and reduce operational problems more quickly in both the short and long term.
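To make the per-engineer, per-tier breakdown concrete, here is a small Python sketch that averages a metric over any chosen dimension. The `mean_by` helper, the record keys (`tier`, `engineer`, `minutes_to_resolve`), and the sample tickets are hypothetical and stand in for whatever your ticketing system actually exports.

```python
from collections import defaultdict
from statistics import mean


def mean_by(records: list[dict], dimension: str, metric: str) -> dict:
    """Average `metric` for each distinct value of `dimension`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record[metric])
    return {key: mean(values) for key, values in groups.items()}


# Made-up sample tickets, for illustration only.
tickets = [
    {"tier": "Tier 1", "engineer": "alice", "minutes_to_resolve": 30},
    {"tier": "Tier 1", "engineer": "bob", "minutes_to_resolve": 50},
    {"tier": "Tier 2", "engineer": "carol", "minutes_to_resolve": 120},
    {"tier": "Tier 1", "engineer": "alice", "minutes_to_resolve": 40},
]

by_tier = mean_by(tickets, "tier", "minutes_to_resolve")
by_engineer = mean_by(tickets, "engineer", "minutes_to_resolve")
print(by_tier)
print(by_engineer)
```

Because the dimension is just a key name, the same helper can slice by group or subsystem without any change to the code.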

