How do you set up an effective IT incident management process?
From a blocked printer output to an application that is out of service, your IT system may experience many incidents of varying criticality. That is why it is important to implement an incident management process.
But how can you make sure your incident management procedure performs well? What resolution steps should you define, and how should you determine the role of each player in the process? Is it possible to provide a satisfactory solution for the user, in line with your SLA (Service Level Agreement) and within reasonable timescales?
To help you achieve greater efficiency and consistency, this Appvizer article explains the principles and stages of incident management as described by the ITIL framework, and reminds you of the benefits of working this way.
What is IT incident management?
IT incident management is most often based on the ITIL (Information Technology Infrastructure Library) framework.
But what exactly is ITIL 🤔?
Developed in the 1980s by the UK government's Central Computer and Telecommunications Agency (CCTA), whose responsibilities later passed to the Office of Government Commerce, ITIL is a set of publications describing best practices for managing IT services. Its aim is to provide methodological support for IT professionals, with a focus on continuous improvement.
ITIL covers several themes (organization of the information system, configuration management, change management, etc.), including incident management, which it builds on the following definition:
An incident is defined as any event which is not part of the standard operation of a service and which causes, or may cause, an interruption or reduction in the quality of that service.
💡 This definition encompasses different types of incidents:
- software or application incidents, for example:
  - a program error slowing down the user,
  - an application slowdown, etc.
- hardware incidents, for example:
  - a blocked printer output,
  - a hard disk that is nearly full, etc.
- service requests, for example:
  - a forgotten password,
  - a request for specific documentation, etc.
Incident management VS problem management
Incident management is often confused with problem management. Yet they involve different procedures.
According to ITIL, problem management aims to:
Minimize the negative impact on business of incidents and problems caused by errors in the IT infrastructure, and prevent the recurrence of incidents induced by these errors.
➡️ In other words, problem management is more proactive, while incident management is more reactive.
Nevertheless, the two processes work in parallel: problem management relies in part on identifying recurring incidents.
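Spotting recurring incidents can start as simply as counting how often the same kind of incident affects the same service. The Python sketch below is illustrative only: the incident log, its (id, category, service) layout and the threshold of 3 occurrences are assumptions, not ITIL requirements.

```python
from collections import Counter

# Hypothetical incident log: (incident_id, category, affected_service)
incidents = [
    ("INC-001", "connectivity", "VPN"),
    ("INC-002", "printer", "Floor 2 printer"),
    ("INC-003", "connectivity", "VPN"),
    ("INC-004", "connectivity", "VPN"),
    ("INC-005", "application", "CRM"),
]


def recurring_incidents(log, threshold=3):
    """Return (category, service) pairs seen at least `threshold` times.

    The default threshold of 3 is an arbitrary assumption; each
    organization sets its own.
    """
    counts = Counter((category, service) for _, category, service in log)
    return {key: n for key, n in counts.items() if n >= threshold}


print(recurring_incidents(incidents))  # {('connectivity', 'VPN'): 3}
```

Each pair returned is a candidate for a problem management investigation rather than another one-off fix.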
Why is incident management important?
A standardized process for managing your incidents brings your company numerous benefits 🤩:
- it limits the impact of incidents, sometimes critical, on the company and its business, and does so more quickly;
- it greatly simplifies the procedure by avoiding, for example, back-and-forth emails;
- it identifies recurring incidents, enabling the problem management process mentioned above to be deployed;
- it improves the quality of the business knowledge base, thanks to the creation of incident handling databases;
- it brings transparency to incident resolution within the organization;
- it increases user satisfaction and the productivity of everyone in the company.
☝️ Keep in mind that an incident management process goes beyond simply resolving IT problems. It provides solid support for the company's business functions by reducing the number of slowdowns or stoppages that affect sales.
Example of a 5-step incident management procedure
#1 Identify and record the incident
To begin with, you need to identify the incident, specifying:
- its name and number,
- the identity of the person responsible,
- the date on which the incident occurred,
- and above all its characteristics (nature, severity and impact on operations).
👉 E.g.: a server breakdown affecting several departments will be considered a major incident, while a connection problem at a single workstation will be considered less critical.
It is up to the responsible department to record these details in the tool of its choice (software, spreadsheet, form, etc.) and to pass the incident on to the support teams responsible for handling it, in line with the procedure.
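Whatever the medium, the record holds the fields listed above. As an illustration, here is a minimal Python sketch of such a record; the field names, the three-level `Severity` scale and the `INC-` numbering are hypothetical choices, not a format imposed by ITIL or by any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Severity(Enum):
    # Hypothetical three-level scale; organizations define their own.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class IncidentRecord:
    number: str          # e.g. "INC-0042" (hypothetical numbering scheme)
    name: str
    responsible: str     # person responsible for the incident
    occurred_at: datetime
    nature: str          # e.g. "hardware", "software", "service request"
    severity: Severity
    impact: str          # short description of the impact on operations


record = IncidentRecord(
    number="INC-0042",
    name="File server down",
    responsible="j.doe",
    occurred_at=datetime(2024, 3, 4, 9, 15),
    nature="hardware",
    severity=Severity.HIGH,
    impact="Several departments cannot access shared files",
)
print(record.number, record.severity.name)  # INC-0042 HIGH
```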
#2 Incident classification and analysis
The incident is then classified according to the order of priority defined in advance by your organization, based, for example, on the impact on the business and the urgency of the situation.
👉 E.g.: a network failure could be classified as a "connectivity" incident, with a "high" severity level if it paralyzes the entire company.
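A common way to turn impact and urgency into a priority is a lookup matrix. The sketch below assumes a three-level scale for both criteria and a priority from 1 (most urgent) to 5; these values are illustrative, and each organization defines its own.

```python
# Hypothetical 3x3 matrix: (impact, urgency) -> priority, 1 being most urgent.
PRIORITY_MATRIX = {
    ("high", "high"): 1,
    ("high", "medium"): 2,
    ("medium", "high"): 2,
    ("high", "low"): 3,
    ("medium", "medium"): 3,
    ("low", "high"): 3,
    ("medium", "low"): 4,
    ("low", "medium"): 4,
    ("low", "low"): 5,
}


def priority(impact: str, urgency: str) -> int:
    """Look up the priority for an (impact, urgency) pair."""
    return PRIORITY_MATRIX[(impact, urgency)]


# Network failure paralyzing the whole company: high impact, high urgency
print(priority("high", "high"))   # 1
# Connection problem on a single workstation
print(priority("low", "medium"))  # 4
```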
At the same time, an initial analysis is carried out to determine the possible causes of the incident. Diagnostic tools or previous experience can be drawn on for this assessment.
☝️ Note that if this is a service request, you must follow the associated procedure.
#3 Investigating and diagnosing the incident
All information relating to the incident is analyzed, with the aim of resolving it and restoring service within the required timeframe. The teams in charge of this work use a variety of methods, from log analysis to real-time testing.
👉 E.g.: if a server goes down, the team will consult event logs for critical errors, or use monitoring tools to check hardware performance.
Be aware that the first level of support is sometimes unable to resolve the incident. This triggers an escalation: responsibility for resolving it is transferred to the next support level.
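Escalation rules are often time-based: if an incident is not resolved within a given time at one level, it moves up. This Python sketch assumes three support levels and arbitrary time limits (30 minutes at level 1, 4 hours at level 2); `LEVEL_TIME_LIMITS` and `next_level` are hypothetical names used for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical time allowed at each support level before escalating.
LEVEL_TIME_LIMITS = {
    1: timedelta(minutes=30),
    2: timedelta(hours=4),
    3: None,  # last level: no further escalation
}


def next_level(current_level: int, assigned_at: datetime,
               now: datetime, resolved: bool) -> int:
    """Return the support level that should own the incident now."""
    limit = LEVEL_TIME_LIMITS[current_level]
    if resolved or limit is None:
        return current_level
    if now - assigned_at > limit:
        return current_level + 1  # time limit exceeded: escalate
    return current_level


start = datetime(2024, 3, 4, 9, 0)
print(next_level(1, start, start + timedelta(minutes=45), resolved=False))  # 2
print(next_level(1, start, start + timedelta(minutes=10), resolved=False))  # 1
```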
#4 Incident resolution and return to service
Incident resolution takes various forms:
- the incident is repaired immediately. It has been resolved and operations are back to normal;
- a workaround has been found. Incident management aims first and foremost at restoring service rapidly: even if the workaround is not perfect, the process is respected as long as it makes the situation "acceptable".
☝️ Note that if the root cause of an incident is unknown, or if several incidents seem to share the same origin, it is recommended to initiate a problem management process. Remember that incident and problem management flows often intersect.
#5 Closing the incident
To close an incident properly, the teams in charge of the process take a number of actions:
- they take care to record all the details of the incident and the time spent on it. ☝️ This documentation is used to create a history that can be consulted to improve protocols in the future;
- they inform the user of the resolution;
- they ensure that all solution details are clear and legible.
This level of detail reduces the risk of conflict between different stakeholders.
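The closure steps above can be enforced as a simple checklist before an incident is marked as closed. This is a minimal sketch assuming a hypothetical `Incident` structure with a free-text resolution note, a time-spent counter and a user-notification flag; ITSM tools implement this in their own ways.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical structure used only for this illustration.
    number: str
    resolution_notes: str = ""
    minutes_spent: int = 0
    user_notified: bool = False
    status: str = "resolved"


def missing_closure_items(incident: Incident) -> list[str]:
    """List the closure steps that have not been completed yet."""
    missing = []
    if not incident.resolution_notes.strip():
        missing.append("document the resolution")
    if incident.minutes_spent <= 0:
        missing.append("record the time spent")
    if not incident.user_notified:
        missing.append("inform the user")
    return missing


def close(incident: Incident) -> None:
    """Close the incident only when every closure step is done."""
    missing = missing_closure_items(incident)
    if missing:
        raise ValueError(f"Cannot close {incident.number}: " + ", ".join(missing))
    incident.status = "closed"


inc = Incident("INC-0042", resolution_notes="Replaced faulty power supply",
               minutes_spent=95)
print(missing_closure_items(inc))  # ['inform the user']
inc.user_notified = True
close(inc)
print(inc.status)  # closed
```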
Stakeholders in incident management
Different stakeholders are involved in incident management. While they differ from one organization to another, a few basic roles can be identified:
- The requester/user: reports the incident, clearly specifying what it is. The technical team may also contact them at the end of the process to answer questions.
- The different levels of support: depending on their level, the support teams provide the solutions needed to resolve the incident, and sometimes reassign the unresolved incident to the next level up.
- The Incident Manager: ensures that incident management runs properly, plans the procedure and may recommend areas for improvement.
- The process owner: assumes overall responsibility for the incident management process within the company. They may also be responsible for defining KPIs (Key Performance Indicators).
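Two KPIs that process owners often track are the mean time to resolve (MTTR) and the share of incidents resolved within the SLA target. The sketch below computes both; the sample incidents and the 4-hour SLA target are made-up values for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical resolved incidents: (opened_at, resolved_at)
resolved = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 10, 0)),   # 1 h
    (datetime(2024, 3, 4, 11, 0), datetime(2024, 3, 4, 14, 0)),  # 3 h
    (datetime(2024, 3, 5, 8, 0), datetime(2024, 3, 5, 16, 0)),   # 8 h
]

SLA_TARGET = timedelta(hours=4)  # assumed SLA resolution target


def mean_time_to_resolve(pairs):
    """Average resolution time over all incidents."""
    durations = [end - start for start, end in pairs]
    return sum(durations, timedelta()) / len(durations)


def sla_compliance(pairs, target):
    """Fraction of incidents resolved within the target time."""
    within = sum(1 for start, end in pairs if end - start <= target)
    return within / len(pairs)


print(mean_time_to_resolve(resolved))                 # 4:00:00
print(f"{sla_compliance(resolved, SLA_TARGET):.0%}")  # 67%
```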
10 best practices for incident management
To better prepare yourself to manage IT incidents and minimize their impact on your organization's operations, we recommend you follow these 10 best practices:
- ✅ Train staff. Make sure the support team is well trained on procedures and tools. The aim is to ensure both rapid and accurate diagnosis.
- ✅ Prioritize effectively. Establish clear criteria to intelligently prioritize incidents according to their severity or impact on the business.
- ✅ Establish rigorous documentation. Document every stage of resolution, from diagnosis to corrective action, for effective follow-up and future learning.
- ✅ Communicate transparently. Communicate clearly and regularly with stakeholders to keep them informed of incident status and actions taken.
- ✅ Implement a validation process. Before closing any incident, validate the resolution with users. This confirms that their problems have been fully resolved.
- ✅ Carry out a post-incident analysis. Review significant incidents once they are closed to identify their root causes as well as potential areas for improvement.
- ✅ Update the knowledge base. Regularly update the knowledge base with incident resolution information, again to help resolve similar incidents in the future.
- ✅ Automate repetitive tasks. Use automation to manage routine tasks, such as incident triage. The time saved will enable the team to concentrate on more complex problems.
- ✅ Think "continuous improvement". Carry out regular audits of your incident management procedure, with the aim of identifying opportunities for improvement.
- ✅ Use an incident management tool. This is undoubtedly the most important tip! Indeed, by investing in a robust incident management system (ITSM in particular), you track and document all incidents centrally.
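As an example of automating a routine task, a first triage pass can route tickets by keyword before a person reviews them. This is a deliberately naive sketch: the categories and keyword lists are assumptions, and ITSM tools typically provide their own, more robust routing features.

```python
# Hypothetical keyword rules, checked in order; the first match wins.
TRIAGE_RULES = [
    ("connectivity", ("network", "vpn", "wifi", "connection")),
    ("hardware", ("printer", "disk", "screen", "keyboard")),
    ("service request", ("password", "access request", "documentation")),
]


def triage(description: str) -> str:
    """Return the first category whose keywords appear in the description."""
    text = description.lower()
    for category, keywords in TRIAGE_RULES:
        if any(keyword in text for keyword in keywords):
            return category
    return "unclassified"  # left for manual triage


print(triage("I forgot my password"))              # service request
print(triage("The printer on floor 2 is jammed"))  # hardware
print(triage("CRM is very slow today"))            # unclassified
```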
The right tools for incident management
Now that you've got a clearer picture of incident management, you may be wondering how you can put all these recommendations into practice. Can you see yourself applying your incident management procedure using an Excel spreadsheet or a conventional project management tool?
Fortunately, specific software has been developed to support your teams at every stage of the incident management procedure.
To help you, discover our selection ✔️:
- Jira. Developed by Atlassian, the Jira ticketing tool standardizes the processing of tickets opened when an incident is reported.
  😀 Why Jira?
  - create tickets with a precise level of information (description, severity level, etc.) and follow all the processes required to manage them;
  - easily classify and prioritize bugs, and assign them to the right employee or department;
  - integrate your tickets into a ready-made workflow, or customize one to suit your needs and processes.
- NinjaOne. NinjaOne is a complete IT asset management solution for SMEs, mid-sized companies and large corporations.
  😀 Why NinjaOne?
  - centrally and proactively monitor your entire IT infrastructure to detect incidents as early as possible;
  - automatically and reliably apply the necessary patches to all your endpoints;
  - store all standardized, structured process documentation within the platform.
- Octopus. Octopus is ITSM (Information Technology Service Management) software.
  😀 Why Octopus?
  - benefit from a tool developed in line with ITIL best practices: your teams can apply them naturally without having to master them fully beforehand;
  - easily manage requests from your users, whether incidents or service requests;
  - improve preventive action, thanks to a database that manages all aspects of your information systems' configuration.
- Splunk Enterprise Security. Splunk Enterprise Security is a SIEM (Security Information and Event Management) solution designed to help you strengthen the security of your IT systems and manage incidents.
  😀 Why Splunk Enterprise Security?
  - benefit from an analytics-driven solution that streamlines cybersecurity-related tasks;
  - get real-time information through customized dashboards and views;
  - detect incidents faster and take preventive action.
What does IT incident management mean?
Incident management, as standardized by ITIL, is a procedure that you should quickly integrate into your information system, as it promises to provide a clear and rapid response in the event of a setback.
What's more, it gradually leads to a reduction in the number of incidents by feeding your problem management processes, and thus your preventive actions.
And the good news is that everyone benefits from implementing such a working method:
- technical teams work more efficiently and transparently;
- users are less affected by bugs and more satisfied with your product;
- the company incurs fewer losses in the event of a critical incident.
Finally, it's worth remembering that good incident management goes hand in hand with the use of relevant tools, which support your process and save your teams precious time.