AWS Cloud Best Practice: Introduction to High Availability Deployment
This article is a quick introduction on designing to achieve High Availability (HA) with your AWS infrastructure architecture. We’ll start with a simple (and vulnerable) architecture design and work our way up to a complex multi-region setup. How far you go up these steps is up to you, depending on your uptime requirements (or SLAs) and budget.
Deploying applications on AWS can appear daunting at first; there are lots of new terms and concepts to familiarize yourself with. Before you launch your first instance on AWS, it's worth understanding the core Amazon cloud service options and how they relate to availability.
After tracking more than $200,000,000 in AWS EC2 spend, Datapipe Cloud Reports has found that more than 35% of its users operate an AWS cloud that is highly vulnerable to outages. Many users had improperly configured Elastic Load Balancers (ELBs), and as much as 40% of users' cloud data wasn't backed up. High availability is therefore crucial when deploying applications. This article will explain how Cloud Reports analyzes your availability levels, but first let's talk about best practices for deploying applications in the cloud.
(Disclaimer: I work for MadeiraCloud, a visual tool to assist you with designing and provisioning your application architecture on AWS, as shown in some of the following graphics.)
So how many data centers does the Amazon cloud have worldwide? A common answer is the seven regions pictured above, but this is not completely accurate. You should choose a region based on its proximity to most of your customers, or on legal or other specific requirements, but regions are really just geographic groupings of data centers.
Availability Zones (AZs) are more like independent data centers than regions are, in that they are distinct locations within a region that are engineered to not share a single point of failure (SPOF). So a better way to think about it is that Amazon cloud services has 18 data centers, which are then grouped by geographical proximity to provide low latency and inexpensive connectivity to other Availability Zones in the same AWS region.
An interesting note is that the AZ names Amazon shows you are not universal across accounts. For example, if you and I both launch instances in the US East region, we know that they are both in Virginia, so if they need to communicate, latency and cost are low. However, if we both launch instances in "US-East-1A", they are not necessarily in the same physical AZ: each account gets its own mapping of AZ names. This makes no difference within a single account, but it does affect latency and data transfer charges between two instances owned by different users, since what both call "US-East-1A" may be two different zones.
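The per-account shuffle can be pictured with a toy sketch. Everything here (the zone names, the hashing scheme) is invented for illustration; AWS's real mapping is internal and not derived this way:

```python
import hashlib

# Invented physical zones and the logical names accounts see.
PHYSICAL_ZONES = ["phys-1", "phys-2", "phys-3", "phys-4"]
LOGICAL_NAMES = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d"]

def az_mapping(account_id):
    """Deterministic per-account shuffle of logical AZ names onto
    physical zones -- a toy stand-in for what AWS does internally."""
    digest = hashlib.sha256(account_id.encode()).digest()
    # Sort physical-zone indices by per-account hash bytes to get a
    # stable, account-specific permutation.
    order = sorted(range(len(PHYSICAL_ZONES)), key=lambda i: digest[i])
    return {LOGICAL_NAMES[i]: PHYSICAL_ZONES[order[i]]
            for i in range(len(order))}

alice = az_mapping("111111111111")
bob = az_mapping("222222222222")
# Both accounts see a zone called "us-east-1a", but it may point at
# different physical zones:
print(alice["us-east-1a"], bob["us-east-1a"])
```

The takeaway is simply that the same logical name resolves consistently *within* one account, but means nothing across accounts.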
Let’s take a look at how all this affects you when deploying your application to maximize availability:
Below we have an architectural design view of a simple application: just one web server, one application server, and one database server. The orange box shows that all the EC2 and RDS instances are within the same availability zone, and hence the same AWS region. This is the simplest and cheapest way to set up an application with these basic requirements; however, if any one node goes down, so does your application!
So let's improve the resilience of the application by adding Auto Scaling (the blue box) to each tier, with an Elastic Load Balancer (ELB) distributing traffic across the web tier, and a slave database set up to automatically sync data, ready to take over if anything happens to the master.
This is better. If any individual node goes down in any of the tiers, there is a replacement automatically on hand. However, it’s still inside the same AZ of the same region, so if the AZ goes down, so does your application!
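The "replacement automatically on hand" behavior can be sketched as a toy model. This is not AWS code (the class and method names are invented); it just shows the reconciliation idea behind an auto-scaling group:

```python
class AutoScalingTier:
    """Toy model of one auto-scaled tier: failed nodes are replaced
    to keep the group at its desired capacity."""

    def __init__(self, name, desired_capacity):
        self.name = name
        self.desired_capacity = desired_capacity
        self.healthy_nodes = desired_capacity

    def node_fails(self):
        # A node in this tier goes down.
        self.healthy_nodes = max(0, self.healthy_nodes - 1)

    def reconcile(self):
        # The scaling group launches replacements until the desired
        # capacity is restored; returns how many it launched.
        launched = self.desired_capacity - self.healthy_nodes
        self.healthy_nodes = self.desired_capacity
        return launched

web = AutoScalingTier("web", desired_capacity=3)
web.node_fails()
web.node_fails()
print(web.healthy_nodes)   # only 1 healthy node left...
print(web.reconcile())     # ...so 2 replacements are launched
print(web.healthy_nodes)   # back to the desired capacity
```

The real service does this continuously via health checks, but the invariant is the same: the group converges back to its desired capacity after failures.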
AWS Availability Zones rarely go down, but it does happen. To protect against the failure of an entire AZ, we can duplicate our architecture across two AZs so that we have a whole backup ready to go in another AZ should anything happen.
We can use the AWS Elastic Load Balancer (ELB) to distribute traffic across the two AZs, and although our master DB is still in one AZ, it is now syncing with slaves in both AZs, ready to take over should anything happen to either one.
This is much, much better. Because of the way AWS has engineered AZs to share no SPOF, the odds of both AZs going down are very low. However, they are both inside the same region, which is in the same geographic area, so a catastrophic event such as an earthquake could potentially take down both. In the unlikely event that the region goes down, so does your application!
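A back-of-the-envelope calculation shows why the two-AZ setup is so much safer. The failure probability below is an invented illustrative number, and the independence assumption is exactly what AWS's no-shared-SPOF engineering is meant to provide:

```python
# Illustrative numbers only: suppose a single AZ is unavailable with
# probability 0.001 over some period, and AZ failures are independent
# (the point of engineering AZs to share no SPOF).
p_az_down = 0.001

# One-AZ deployment: the app is down whenever that AZ is down.
p_single = p_az_down

# Two-AZ deployment: the app is down only if BOTH AZs fail at once.
p_dual = p_az_down ** 2

print(p_single)  # one in a thousand
print(p_dual)    # roughly one in a million
```

Squaring a small probability is why duplicating across AZs buys so much availability for "only" double the cost, and it only holds to the extent that the failures really are independent, which is why a region-wide event is the remaining risk.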
If we really want to design for the ultimate in high availability, we need to reproduce our architecture across multiple regions and use a DNS service, which AWS handily provides with Route 53, to route between the ELBs in each region. We can also use CloudFront to serve our content from any of 34 Edge locations worldwide. Now, from an architectural point of view, we’re nearly bulletproof.
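The routing decision Route 53 makes with a failover policy can be sketched in a few lines. The hostnames and record structure below are invented for illustration; real Route 53 failover records work via health checks attached to DNS records:

```python
# Toy sketch of DNS failover routing in the style of Route 53's
# failover policy: serve the primary region while it passes health
# checks, and fall back to the secondary when it doesn't.
# (All endpoint names and record fields here are invented.)

def resolve(records, health):
    """Return the endpoint of the first healthy record, primary first."""
    for record in sorted(records, key=lambda r: r["priority"]):
        if health.get(record["endpoint"], False):
            return record["endpoint"]
    return None  # total outage: no region is healthy

records = [
    {"endpoint": "elb.us-east-1.example.com", "priority": 1},  # primary
    {"endpoint": "elb.eu-west-1.example.com", "priority": 2},  # secondary
]

# Normal operation: traffic goes to the primary region's ELB.
print(resolve(records, {"elb.us-east-1.example.com": True,
                        "elb.eu-west-1.example.com": True}))

# The primary region fails its health check: DNS fails over.
print(resolve(records, {"elb.us-east-1.example.com": False,
                        "elb.eu-west-1.example.com": True}))
```

In production the same idea can also be expressed with weighted or latency-based routing to keep both regions active rather than idle.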
The key to high availability is reducing the number of single points of failure (SPOF), but note that with every step we roughly doubled the architectural requirements, and hence the cost (if everything is running all the time). It is a lot cheaper to add 1% of uptime to a 95% SLA than it is to add 0.09% to a 99.9% SLA. Cloud application (SaaS) vendors need to pay very close attention to the additional resources invested to support a 99.9XX…% uptime SLA, and perhaps build that cost into their pricing plans.
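It helps to make those SLA percentages concrete as downtime budgets. A quick calculation of how many minutes of downtime each level permits per year:

```python
# How much downtime does each SLA level actually allow per year?
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def allowed_downtime_minutes(sla_percent):
    """Minutes per year the service may be down while meeting the SLA."""
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)

for sla in (95.0, 99.0, 99.9, 99.99):
    print(f"{sla}% uptime allows "
          f"~{allowed_downtime_minutes(sla):,.0f} min/year of downtime")
```

Going from 95% to 99% frees you from more than 20,000 minutes of permitted downtime a year, while going from 99.9% to 99.99% shaves off under 500, yet the latter step typically costs far more infrastructure, which is the asymmetry the paragraph above describes.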
Of all these topics, cloud deployment is naturally the most mature. One of its most important aspects is designing for failure: architecting robust deployments to meet your HA needs. Shlomo Swidler's famous Ten Cloud Design Patterns talk is a must-watch, along with AWS's Miles Ward on the same subject. Netflix (the experts at HA) and SmugMug also offer great advice on how they rode out the 2011 AWS outage.
This is not a new problem caused by the cloud, but actually a new solution. Amazon's regions are all built the same way, making it very easy to scale applications across Availability Zones and regions. This provides new opportunities for highly available applications that were a lot harder to implement with traditional hosting. Just spending a little time learning how to create HA applications on AWS can greatly increase your uptime and customer satisfaction.
Monitoring Cloud Availability
Datapipe Cloud Reports continuously analyzes your baseline disaster recovery and identifies if best practices have been implemented. Additionally, it recommends AWS features and best practices that can help you reach optimal availability, increase outage protection and ensure a quick recovery. Furthermore, Cloud Reports reviews the status of your instances and identifies unhealthy and overloaded instances that are vulnerable to outages. Having a clear picture of your cloud availability will provide you with the knowledge and tools to protect your cloud from outages and vulnerabilities.
For IT Managers
Datapipe Cloud Reports helps you see whether you are operating at your usage equilibrium. We also help you make better decisions by giving you visibility into your AWS cloud costs, assets, and risks. Sign up today and enjoy:
- Comprehensive visibility into your cloud operations.
- Faster problem resolution – identify issues and prioritize your activities by severity.
- Time saved with actionable operational insights.
About the author:
Originally from the UK, Daniel is now based in Beijing, where he is the CEO and Co-Founder of MadeiraCloud. He has worked as a manager, developer, and designer in London and New York for a wide range of companies, with a focus on the IT and telecommunications industries. MadeiraCloud is the Visual Integrated Management Environment for managing applications on the public cloud.
Keywords: amazon cloud services, cloud deployment, amazon elb, amazon elastic load balancer, EC2, RDS, Auto-scaling, HA, High availability, amazon cloud usage, Amazon AWS best practices, aws regions, AWS Availability zones, aws cloud, aws ec2 architecture