It’s a peaceful fall day – the wind is blowing through the colorful trees, the birds are flying south overhead, and all seems right with the world. But I know better. I know that somewhere out there, a company’s storage usage is getting close to exceeding its threshold. Another’s network equipment is not responding. And worse still, another business’s internet has just gone down. Somewhere out there, a company’s team is freaking out, unable to get into any of their systems. Work is halted, people are upset, pitchforks are raised. Downtime means lost time, money, and sanity.
I know this because I work in Managed Services and I see the constant flow of alerts and alarms that are triggered by our event management and monitoring software. When something bad happens to an individual user or an entire company’s IT, we’re the ones they turn to for help.
One common challenge that those who work in Managed Services face is what’s called alert fatigue.
The never-ending stream of alerts that notify the team when there’s a problem can be overwhelming. Team members responsible for handling alerts often experience longer response times, burnout, dissatisfaction with their work, and anxiety about failing to understand the real impact of an issue. Eventually, every alert may become meaningless because the monitoring software has “cried wolf” so many times.
As technology requirements and capabilities grow and evolve, the systems responsible for checking their health must do so as well. This means more tools and alerts, followed by new processes and procedures to handle them. The teams responsible for handling these can get overwhelmed by the volume of alerts and ultimately become desensitized to them. This leads to alerts that do matter getting missed because they are lost in the volume of events that are not actionable.
Think of being in a meeting room where several independent conversations are going on at once. Where do you start? How do you prioritize what you tackle first?
Four Steps to Tackle Alerts and Defeat Fatigue
If you’re in the habit of fighting endless fires, it’s time to refocus your efforts by developing a strategic process around alert management that reflects the new business realities of today. Here are four simple steps you can start taking to help reduce the likelihood of alert fatigue.
- Identify Redundant Alerts
The primary culprit of alert fatigue is sending the same meaningless, non-actionable alert over and over. Focus on the noisiest alerts first.
- Give Alerts Context
For alerts to have meaning, they must have context. What data points must be collected together to fully understand the alert? Where did the issue originate, and what other systems were impacted? Adding context helps you gather all the clues needed to find the root cause, and it makes the process faster.
- Consolidate Alerts into a Single Pane of Glass
Designating a single place for all team members to review active alerts ensures everyone is looking at the same data, no matter where it came from. This improves collaboration among all contributors, so that redundant alerts do not fatigue the team and alerts continue to have context and actions associated with them.
- Correlate Alerts with Business Impact
Put in the extra effort to document which end-user applications or experiences were negatively affected by the alert. Doing so will give your alerts more context over time. This information will allow your support teams to improve their alert processes and procedures, ultimately leading to a better user experience.
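To make the first step a little more concrete, here is a minimal Python sketch of how you might find your noisiest alerts and collapse repeats of the same alert. It is an illustration only, not how any particular monitoring product works: the alert fields (`time`, `host`, `check`), the sample data, the function names, and the 30-minute window are all assumptions I made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical alert records; the field names and values are assumptions.
alerts = [
    {"time": datetime(2024, 10, 1, 9, 0), "host": "nas-01", "check": "disk_usage"},
    {"time": datetime(2024, 10, 1, 9, 2), "host": "sw-02", "check": "ping"},
    {"time": datetime(2024, 10, 1, 9, 3), "host": "fw-01", "check": "wan_link"},
    {"time": datetime(2024, 10, 1, 9, 5), "host": "nas-01", "check": "disk_usage"},
    {"time": datetime(2024, 10, 1, 9, 10), "host": "nas-01", "check": "disk_usage"},
    {"time": datetime(2024, 10, 1, 9, 20), "host": "fw-01", "check": "wan_link"},
    {"time": datetime(2024, 10, 1, 9, 45), "host": "nas-01", "check": "disk_usage"},
]


def alert_key(alert):
    """Two alerts count as 'the same' if they share a host and a check."""
    return (alert["host"], alert["check"])


def noisiest(alerts, top=3):
    """Rank (host, check) pairs by how often they fired."""
    return Counter(alert_key(a) for a in alerts).most_common(top)


def suppress_repeats(alerts, window=timedelta(minutes=30)):
    """Keep an alert only if the same key was not already kept within `window`."""
    last_kept = {}  # key -> time of the most recent alert we kept
    kept = []
    for alert in sorted(alerts, key=lambda a: a["time"]):
        key = alert_key(alert)
        previous = last_kept.get(key)
        if previous is None or alert["time"] - previous >= window:
            kept.append(alert)
            last_kept[key] = alert["time"]
    return kept


if __name__ == "__main__":
    for key, count in noisiest(alerts):
        print(key, count)
    print(len(suppress_repeats(alerts)), "of", len(alerts), "alerts kept")
```

With this sample data, the disk usage alert on `nas-01` comes out as the noisiest, and suppression cuts seven alerts down to four. In practice the right key and window depend on your environment, so treat both as starting points to tune rather than recommendations.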
Managed Services is not for the faint of heart. Engineers must always think strategically and seek the root cause of each problem. Investigating each alert individually, and answering as many open questions about it as you can to put it into context, will save time in the long run. It will also help you feel less like a robot executing mundane tasks and more like what you are – a problem-solving engineer. For me, working in Managed Services is extremely rewarding – there is always a puzzle that needs to be solved, and I’m always stretching myself to find new solutions.
Businesses that are thinking about partnering with a Managed Services Provider (MSP) should ensure that the MSP follows defined processes and procedures and applies root-cause analysis to solve problems. That way, they won’t be affected by alert fatigue either.