Cloud Monitoring is at the Heart of Your SaaS Operations
In today’s cloud environment, an idea can quickly turn into software. A small developer can build and run an application on a PaaS platform or on an Amazon cloud micro-instance, and pitch the idea to potential users at very low cost. Product operations can easily track and support the first users, making sure that the service is available and fits their needs. The R&D team of the newly established start-up can work directly on the production servers. The new global SaaS grows at a steady pace, adding more and more customers and, with them, more users around the globe.
In such a scenario, the R&D team typically finds itself enlisted to perform the operational tasks that keep the system live. The R&D manager deploys a monitoring system, and the team tries to solve every uptime issue, producing more and more operations automation scripts. At the end of the day, company management realizes that product development suffers when 30%-60% of R&D effort is invested in uptime maintenance and monitoring. And does the R&D team like to wake up in the middle of the night? The obvious answer is no!
“Animoto.com average of 5,000 users a day spiked to 750,000 in three days. At one point, 25,000 people used Animoto in one hour.”
The cloud, and the Amazon AWS cloud in particular, offers a practically “endless amount of resources,” and its elasticity can support the exponential growth of a new online web service. A new SaaS organization must plan ahead for the good times so that it can accommodate rapid growth as its customer base expands.
It is important to define your monitoring system in advance, including its alerts, priorities, categories, constraints, and so on. In most cases, however, the relevant alerts are defined “on the fly,” each one starting with a request to add an alert: for example, when the DB service, the billing system, or the main page goes down, or when a particularly important customer has a problem.
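To make “define your alerts in advance” concrete, here is a minimal Python sketch of an alert catalog written down before launch, giving each alert a priority and a category and covering the examples above (DB service, billing system, main page, important customer). All rule names, metric names, thresholds, and the three-level priority scale are hypothetical placeholders, not part of any particular monitoring product; in practice these definitions would live in your monitoring tool's own configuration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class Priority(IntEnum):
    """Hypothetical three-level scale: a lower value means more urgent."""
    P1 = 1  # wake the on-duty person now
    P2 = 2  # handle during working hours
    P3 = 3  # review in the next refinement cycle


@dataclass(frozen=True)
class AlertRule:
    name: str
    category: str       # e.g. "database", "billing", "frontend", "customer"
    priority: Priority
    metric: str         # hypothetical metric name reported by your collectors
    threshold: float    # the rule fires when metric >= threshold


# Hypothetical catalog, defined before launch instead of growing alert by alert.
ALERT_CATALOG: List[AlertRule] = [
    AlertRule("db-down", "database", Priority.P1, "db_failed_health_checks", 3),
    AlertRule("billing-errors", "billing", Priority.P2, "billing_error_rate", 0.05),
    AlertRule("main-page-down", "frontend", Priority.P1, "home_page_failed_checks", 2),
    AlertRule("key-customer-errors", "customer", Priority.P1, "key_customer_errors", 1),
]


def firing_alerts(metrics: Dict[str, float]) -> List[AlertRule]:
    """Return the rules whose metric meets its threshold, most urgent first."""
    # A metric that was not reported never fires its rule.
    fired = [rule for rule in ALERT_CATALOG
             if metrics.get(rule.metric, float("-inf")) >= rule.threshold]
    # sorted() is stable, so rules of equal priority keep their catalog order.
    return sorted(fired, key=lambda rule: rule.priority)
```

Sorting by priority lets a single on-duty person see the P1 issues first, while the thresholds become KPIs you can refine after each incident.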
So what are the first steps to make sure you stay on top of things and are prepared for your own “Animoto great day”?
1 – Know and Measure Your Monitoring KPIs – Don’t wait until after your great day to discover what you should have been measuring (don’t worry, you will still find a few more afterwards). Thresholds should build in a margin for extra risk, for example, the time it takes to launch new instances. Besides learning about issues as they occur, you will also collect important after-the-fact data that you can investigate and learn from.
2 – Someone on Duty – You probably don’t have everything automated yet, so make sure someone on your team is fully dedicated to being on duty. Don’t work in panic mode: arm that person with specific scripted procedures based on what-if scenarios.
3 – Know Your Application Capabilities – What are your application’s limits with regard to the number of requests, storage, file management, and so on?
4 – Data Backups – Make sure backups are in place for your critical data and your application. Any backup automation must also include monitoring of the backup processes themselves.
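Item 1 says thresholds should account for extra risk such as the time it takes to launch new instances. One way to reason about this, sketched below in Python under simplifying assumptions (a roughly linear growth rate, a known fleet capacity, and a fixed launch lead time, all hypothetical inputs), is to fire the alert early enough that new capacity is ready before load crosses a safety margin. With capacity $C$, safety margin $m$, growth rate $g$, and launch time $T$, the alert level is $m \cdot C - g \cdot T$ rather than $m \cdot C$.

```python
def alert_threshold(capacity: float, growth_per_min: float,
                    launch_minutes: float, safety_margin: float = 0.8) -> float:
    """Load level at which to alert so that new instances are up in time.

    capacity:        load the current fleet can handle (e.g. requests/min)
    growth_per_min:  how fast load is rising (requests/min, per minute)
    launch_minutes:  lead time to launch and warm up new instances
    safety_margin:   fraction of capacity we never want to exceed
    """
    # Load expected to arrive while new instances are still launching;
    # a flat or shrinking load needs no extra headroom.
    headroom_needed = max(growth_per_min, 0.0) * launch_minutes
    return safety_margin * capacity - headroom_needed


def should_alert(current_load: float, capacity: float, growth_per_min: float,
                 launch_minutes: float, safety_margin: float = 0.8) -> bool:
    """True when current load has reached the launch-time-adjusted threshold."""
    return current_load >= alert_threshold(
        capacity, growth_per_min, launch_minutes, safety_margin)


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 req/min capacity, load rising by
    # 300 req/min every minute, 5 minutes to launch new instances.
    print(alert_threshold(10_000, 300, 5))      # about 6,500 rather than 8,000
    print(should_alert(7_000, 10_000, 300, 5))  # True
```

The linear projection is deliberately crude. A spike can grow faster than linearly, which argues for a more conservative margin and for revisiting these numbers with the after-the-fact data mentioned above.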
Automation is important; however, bear in mind that it actually extends the need for monitoring. In practice, monitoring usually takes place after the fact: because the application is dynamic, monitoring is “running behind,” and alerts are added only after a problem has occurred. New SaaS vendors must implement minimum viable monitoring before they launch their next marketing campaign, and run a refinement cycle before the one after that. Still, automation is not always the answer. Sometimes, for a deeper investigation, you will prefer a manual process, either because the event occurs rarely or because you need tight control and automating it would take a great deal of effort.
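Since automation extends the need for monitoring, an automated backup job needs a watchdog of its own. The Python sketch below assumes a hypothetical `BackupRecord` that the backup job writes after each run (completion time, size, success flag); it flags backups that never reported, failed, came out empty, or are older than an allowed age. The record fields, backup names, and checks are illustrative assumptions, not any particular backup tool's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class BackupRecord:
    """Hypothetical status record the backup job writes after each run."""
    finished_at: datetime  # timezone-aware completion time
    size_bytes: int
    succeeded: bool


def check_backups(latest: Dict[str, Optional[BackupRecord]],
                  max_age: timedelta,
                  now: Optional[datetime] = None) -> List[str]:
    """Return one alert message per backup that needs attention.

    latest maps each backup name to its most recent record, or to None
    if the job has never reported for that backup.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    alerts = []
    for name, rec in latest.items():
        if rec is None:
            alerts.append(f"{name}: no backup recorded")
        elif not rec.succeeded:
            alerts.append(f"{name}: last backup failed")
        elif rec.size_bytes == 0:
            alerts.append(f"{name}: last backup is empty")
        elif now - rec.finished_at > max_age:
            alerts.append(f"{name}: last backup is older than {max_age}")
    return alerts


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    latest = {
        "orders-db": BackupRecord(now - timedelta(hours=10), 5_000_000, True),
        "billing-db": BackupRecord(now - timedelta(days=2), 7_000_000, True),
        "uploads": None,
    }
    for msg in check_backups(latest, max_age=timedelta(hours=24), now=now):
        print(msg)  # in production, route this to the on-duty person instead
```

Checking freshness and size, not just the job's exit status, catches the quiet failure mode where a scheduled backup simply stops running.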
Monitoring and overall uptime operations are often perceived as finished once they are deployed. In reality, they are driven by SaaS product releases. Releases are frequent, and so are the resulting bugs, alerts, and support calls. The uptime system therefore needs to be enhanced and changed continuously to support new features and the new application behaviors that come with demand growth. Uptime operations should also maintain a road map that includes specific immediate fixes and ongoing mini-projects. Unlike a traditional on-premises IT deployment, in SaaS the vendor is liable and needs to react immediately, be transparent, and show improvement.
About the Author
Avi Shalisman, CEO & Co-founder of MoovingON LTD, has over 25 years of experience managing complex services in mobile, billing, and telecom. He has vast experience with production environment architecture and ongoing operational management, from work processes, monitoring, and application support down to the infrastructure level. Avi leads MoovingON, which provides UPTIME management as a managed service, including a 24/7 NOC, Tier 1 & 2 support, and an end-to-end operational package for startups.