Cloud monitoring solutions have become very powerful of late. When you think back to the old view that CPU, load average, and disk size were enough, the agent-based insights that are now possible are truly remarkable. All major cloud vendors have their own monitoring solutions, which are complemented by a range of third-party vendor offerings. They all deliver powerful insights through a scalable design in which metrics and logs feed both monitoring dashboards and data repositories. Conditional configuration lets monitoring alert you in a number of ways when something has gone wrong and, if you configure it to, when it has recovered. I reviewed a pair of alerts this morning illustrating this very point. The capability of monitoring should not be underestimated, but its power depends on how you design, configure, and implement your monitoring solution. Here are some thoughts on this challenge, which often receives little attention in project requirements.
Firstly, do the project requirements specify what is expected of the monitoring solution? If so, do they set objectives such as 'a monitoring solution to measure infrastructure health to x level'? If they do, include them in your monitoring solution considerations, as you now have strategic guidance on how detailed you need to get.
Is your cloud infrastructure supporting resources that are considered revenue-earning? If so, you should invest in designing a detailed, agent-based monitoring solution to mitigate the risk of application failure and the revenue loss that follows.
Is your cloud architecture monolithic or microservice-based? Also, are you designing a monitoring solution for a highly available environment or for one with lower availability requirements? The wider the surface area, the more you need to monitor. The greater the business impact of losing a resource's functionality, the more detailed and deep your monitoring solution needs to be.
Does your company have an on-call team, or does it operate during office hours only? This is critical in determining the level of automation you deploy in your solution and how your alerting patterns are set. For example, auto-remediation of CPU above 90% in an office-hours-only environment may cause revenue-generating sessions to drop if automation is configured to restart the alarming server. A CPU above 90% combined with a load average that exceeds the number of virtual cores may be a more discerning condition to require before a Lambda function triggers a server restart.
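The compound condition described above can be sketched as a small predicate. This is a minimal sketch, assuming the CPU percentage, 1-minute load average, and vCPU count have already been collected by your monitoring agent; `HostSample` and `should_auto_restart` are hypothetical names, and the call to a Lambda function (or any other remediation hook) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSample:
    """One metrics datapoint for a server (hypothetical shape, not a vendor schema)."""
    cpu_percent: float  # CPU utilisation, 0-100
    load_avg_1m: float  # 1-minute load average
    vcpu_count: int     # number of virtual cores on the host


def should_auto_restart(sample: HostSample, cpu_threshold: float = 90.0) -> bool:
    """Return True only when BOTH conditions hold:
    CPU is above the threshold AND the load average exceeds the vCPU count.
    High CPU alone (a busy but coping server) does not trigger a restart.
    """
    cpu_breached = sample.cpu_percent > cpu_threshold
    load_saturated = sample.load_avg_1m > sample.vcpu_count
    return cpu_breached and load_saturated


if __name__ == "__main__":
    busy_but_coping = HostSample(cpu_percent=95.0, load_avg_1m=3.2, vcpu_count=4)
    saturated = HostSample(cpu_percent=95.0, load_avg_1m=6.5, vcpu_count=4)
    print(should_auto_restart(busy_but_coping))  # False: load is within core capacity
    print(should_auto_restart(saturated))        # True: both conditions breached
```

Keeping the decision in a plain function like this makes the rule easy to test before it is wired to any vendor's alarm or automation service.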
All cloud vendors offer managed automation features and products. Are you comfortable designing a monitoring element knowing that its implementation may need to be redeveloped if you ever move to another provider?
Do you know your time to detect, time to notify, and time to remediate? Are they detailed in the project requirements or in company policy? Once you know these values or expectations, is your monitoring solution configured with the appropriate mix of lead (metrics) and lag (logging) indicators? Both data workflows provide powerful insights when included in a monitoring solution. When deployed correctly, they underpin site reliability and support company policy on digital product quality, measured against these key performance indicators. These targets can heavily influence how you set up, configure, and handle your monitors and their alerts.
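To make these values concrete, here is a minimal sketch that splits one incident into the three intervals and compares them against targets. The milestone names, the choice to measure each interval from the previous milestone, and the target values are all assumptions for illustration; your company policy may define them differently (for example, measuring time to remediate from detection rather than from notification).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IncidentTimeline:
    started: datetime     # when the fault actually began
    detected: datetime    # when a monitor first breached its condition
    notified: datetime    # when the alert reached a person or automation
    remediated: datetime  # when service was restored


def response_times(t: IncidentTimeline) -> dict[str, timedelta]:
    """Split an incident into three intervals, each measured from the previous milestone."""
    return {
        "time_to_detect": t.detected - t.started,
        "time_to_notify": t.notified - t.detected,
        "time_to_remediate": t.remediated - t.notified,
    }


def breached_targets(times: dict[str, timedelta], targets: dict[str, timedelta]) -> list[str]:
    """Return the names of intervals that exceeded their (hypothetical) target."""
    return [name for name, value in times.items() if value > targets[name]]


if __name__ == "__main__":
    incident = IncidentTimeline(
        started=datetime(2024, 1, 1, 9, 0),
        detected=datetime(2024, 1, 1, 9, 4),
        notified=datetime(2024, 1, 1, 9, 5),
        remediated=datetime(2024, 1, 1, 9, 35),
    )
    targets = {
        "time_to_detect": timedelta(minutes=5),
        "time_to_notify": timedelta(minutes=2),
        "time_to_remediate": timedelta(minutes=20),
    }
    times = response_times(incident)
    print(breached_targets(times, targets))  # ['time_to_remediate']
```

Tracking the three intervals separately shows which part of the chain (detection, notification, or remediation) needs attention, rather than a single end-to-end figure.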
As always, there is more to do in monitoring projects. I have walked into cloud projects where a third-party monitoring solution was chosen through corporate decision-making but was not the most desirable technically. However, with a considered view of requirements and expectations, I was able to design and configure the most effective solution for the circumstances of the project. I addressed the risk posed by the digital assets in scope through that lens, keeping objectives in mind as I worked towards the best possible outcome for the company.
To not miss out on any updates on my availability, tips on related areas, or anything of interest to all, sign up for one of my newsletters in the footer of any page on Maolte. I look forward to us becoming pen pals!