5 Challenges for an AWS EC2 Backup Solution
When you keep important business data on your EC2 servers, you need a backup and disaster recovery (DR) solution, just as you would for any server. For operational backup, the most efficient and effective approach in EC2 is to use EBS snapshots. These are the equivalent of hardware snapshots in a traditional data center. In this post we will discuss the challenges you encounter when you use EBS snapshots.
EBS snapshots are provided by AWS as an infrastructure capability. You can take a snapshot of a volume by using the AWS Management Console or the API. There are a few scripts circulating on the web that can take snapshots, some of which even freeze the XFS file system while doing so. Some users run them from cron jobs on Linux or from the Task Scheduler on Windows to automate the backup process. There are not many other solutions available.
Although you can use any software backup solution, snapshots are faster and they don’t affect your instance’s performance. EBS snapshots are incremental, which improves not only their performance but also the cost of storing them. Because they are lightweight, you can take snapshots much more frequently than you would with a file-based backup solution. The snapshots are stored in S3, which gives them excellent durability. Furthermore, AWS recently released a new feature, “EBS Snapshot Copy,” which allows EBS snapshots to be copied between regions. This gives them extra value as a disaster recovery solution, as well as a way to migrate volumes between regions.
Let’s look at the challenges of backing up with EBS snapshots:
1 - Management
Management of snapshots becomes a challenge in larger cloud environments. When the number of snapshots you manage reaches the hundreds (or even fewer), it is important not to lose your way. If you have many snapshots and it’s difficult to tell which ones should be deleted and which ones to use for a critical recovery operation, you have a problem. If you are keeping many snapshots because you’re not sure whether they can be deleted, it’s not only confusing; it needlessly increases your AWS bill. What if you have different instances that require different backup policies? What if some instances need to be backed up daily and keep data available for 30 days, while other instances need to be backed up every 2 hours and keep data available for 2 weeks? When something changes in the environment (e.g. a new instance or a new volume on an instance), it is important to be able to adjust easily.
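As a rough illustration of per-instance policies and retention, the sketch below models the two example policies from the paragraph above and lists the snapshots that have outlived their retention period. The class names, policy names and snapshot IDs are all hypothetical, and the records are plain in-memory objects rather than anything returned by AWS.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Policy:
    """Hypothetical backup policy: how often to snapshot, how long to keep."""
    interval: timedelta
    retention: timedelta


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record describing one snapshot of one volume."""
    snapshot_id: str
    volume_id: str
    policy_name: str
    started: datetime


# The two example policies: daily for 30 days, every 2 hours for 2 weeks.
POLICIES = {
    "daily-30d": Policy(interval=timedelta(days=1), retention=timedelta(days=30)),
    "2h-2w": Policy(interval=timedelta(hours=2), retention=timedelta(weeks=2)),
}


def expired(snapshots, now):
    """Return the snapshots that are older than their policy's retention."""
    return [
        s for s in snapshots
        if now - s.started > POLICIES[s.policy_name].retention
    ]


now = datetime(2013, 1, 31, 12, 0)
snaps = [
    Snapshot("snap-a", "vol-1", "daily-30d", now - timedelta(days=31)),
    Snapshot("snap-b", "vol-1", "daily-30d", now - timedelta(days=3)),
    Snapshot("snap-c", "vol-2", "2h-2w", now - timedelta(days=15)),
    Snapshot("snap-d", "vol-2", "2h-2w", now - timedelta(hours=2)),
]
to_delete = [s.snapshot_id for s in expired(snaps, now)]
print(to_delete)  # ['snap-a', 'snap-c']
```

Keeping the policy name on each snapshot record is one way to make the "can this be deleted?" question answerable without guesswork.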
2 - Frequent Backup
If you are able to manage and control an environment with many snapshots, you can take snapshots much more frequently without the risk of getting lost under an unmanageable mountain of snapshots. When you take frequent snapshots, you reduce risk by ensuring that the data will be recent when you need to recover, and thus you minimize your RPO (Recovery Point Objective).
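To make the link between snapshot frequency and RPO concrete, here is a small sketch that computes the worst-case window of lost data from a list of snapshot times. The function name and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def worst_case_rpo(snapshot_times, now):
    """Longest gap between consecutive snapshots, including the time
    elapsed since the latest one. Requires at least one snapshot."""
    points = sorted(snapshot_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


now = datetime(2013, 1, 31, 12, 0)
two_hourly = [now - timedelta(hours=h) for h in (8, 6, 4, 2)]
daily = [now - timedelta(days=d) for d in (3, 2, 1)]

print(worst_case_rpo(two_hourly, now))  # 2:00:00
print(worst_case_rpo(daily, now))       # 1 day, 0:00:00
```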
3 - Monitoring (or Visibility)
Another extension of the ability to manage a large and complex environment is the ability to monitor your backups and make sure everything is going as planned. You need to know what’s really going on. If the backup of some instance stops running for some reason, when will you find out about it (hopefully before a recovery is necessary)? You need to be able to determine quickly and easily that all your snapshots are running correctly and according to plan. If your database fails to freeze and the snapshots are not consistent, you want to know about it, and you want an easy way to understand what went wrong. If something has changed in your environment, you need to find out and correct the backup configuration as soon as possible and with minimal difficulty.
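One simple monitoring check is to flag every volume whose last successful snapshot is older than its schedule allows. The sketch below assumes you already record the time of each volume's last successful snapshot somewhere. The function name, the 30-minute grace period and the volume IDs are illustrative assumptions.

```python
from datetime import datetime, timedelta


def stale_volumes(last_success, intervals, now, grace=timedelta(minutes=30)):
    """Return the IDs of volumes whose latest successful snapshot is overdue.

    last_success: volume ID -> time of the last successful snapshot
    intervals:    volume ID -> expected time between snapshots
    """
    overdue = []
    for volume_id, interval in intervals.items():
        last = last_success.get(volume_id)  # None if never backed up
        if last is None or now - last > interval + grace:
            overdue.append(volume_id)
    return sorted(overdue)


now = datetime(2013, 1, 31, 12, 0)
intervals = {
    "vol-1": timedelta(days=1),
    "vol-2": timedelta(hours=2),
    "vol-3": timedelta(hours=2),
}
last_success = {
    "vol-1": now - timedelta(hours=20),  # still inside the daily window
    "vol-2": now - timedelta(hours=5),   # missed at least one 2-hour backup
    # vol-3 has no successful snapshot at all
}
alerts = stale_volumes(last_success, intervals, now)
print(alerts)  # ['vol-2', 'vol-3']
```

Driving the check from the list of volumes that should be backed up, rather than from the snapshots that exist, is what catches a volume that was never backed up in the first place.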
4 - Application Support
EBS snapshots are by nature “crash-consistent.” They present the image of the volume exactly as it was at the point in time of the snapshot, but there are no guarantees about the state of the system and applications at that time. It is exactly like the image of a physical disk at the moment someone pulled out the power cord. When you later boot the system, you can’t guarantee that everything will work correctly. In many cases, you will want to move your application into “backup mode,” also called “application quiescence.” This means you “tell” your application that it’s about to be backed up just before you take a snapshot. Your application then momentarily “freezes” its activity (it makes sure the data on the volume is consistent by flushing caches, closing files, etc.). At this point, the snapshots are started. Immediately after they start, you “thaw” or “unfreeze” your application and allow it to continue working. The freeze usually lasts no more than 1-2 seconds, so applications can keep working and serving requests without failing or closing any sessions or connections.
Sometimes you will want to perform additional operations on your applications after the backup has finished consistently and successfully. This is a good time to truncate a database’s transaction logs and prevent them from growing unnecessarily and consuming excessive storage space. You want your backup solution to be flexible enough to support such operations easily and to support different kinds of environments and applications. A short list of applications that should be supported includes MySQL, PostgreSQL, MongoDB, XFS (a file system can be viewed as an application) and probably VSS (Volume Shadow Copy Service) for Windows servers, to support SQL Server, SharePoint, etc.
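The freeze, snapshot, thaw sequence described above can be sketched as a context manager that always thaws the application, even if starting a snapshot fails. This is only an outline: the `freeze`, `thaw` and `start_snapshot` callables are hypothetical placeholders for whatever your application and snapshot tooling actually provide (a file system freeze or a database lock, for example), and here they only record what was called.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(freeze, thaw):
    """Freeze the application, then thaw it again no matter what happens."""
    freeze()
    try:
        yield
    finally:
        thaw()


def consistent_backup(volume_ids, freeze, thaw, start_snapshot):
    """Start one snapshot per volume while the application is frozen.

    Only the start of each snapshot needs to fall inside the frozen
    window; the application is thawed as soon as all have started.
    """
    with quiesced(freeze, thaw):
        return [start_snapshot(volume_id) for volume_id in volume_ids]


# Stand-in callables that just record the order of events.
events = []


def fake_start_snapshot(volume_id):
    events.append(f"snapshot {volume_id}")
    return f"snap-for-{volume_id}"


snapshot_ids = consistent_backup(
    ["vol-1", "vol-2"],
    freeze=lambda: events.append("freeze"),
    thaw=lambda: events.append("thaw"),
    start_snapshot=fake_start_snapshot,
)
print(events)
# ['freeze', 'snapshot vol-1', 'snapshot vol-2', 'thaw']
```

Putting the thaw in a `finally` clause keeps a failed snapshot call from leaving the application frozen.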
5 - Rapid and Easy Recovery
Backup solutions are usually good enough when you don’t need to perform recovery. They’re like insurance policies: they’re only put to the test when something bad happens. When you need to recover data, you will not want to start looking through dozens or hundreds of snapshots in your AWS console to figure out which ones you need. You don’t want to start working out which snapshots belong to which volumes and instances, and which would be the best ones to choose. You also don’t want to start digging for the configuration (e.g. instance type, security groups, key pairs, tags) you need to launch your instances, which may have crashed and may no longer exist. Maybe you need the most recent consistent backup because a freeze failed? Maybe you need the most recent consistent backup from last Wednesday? There are many possible scenarios. When the situation is stressful, you don’t want to make mistakes, and mistakes are highly probable with manual processes. In the “moment of truth,” when you need to recover critical business data, you want to perform the recovery as quickly as possible and without errors.
To answer all the challenges of managing backup in an EC2 environment, you need the right solution. Such a solution should allow you to easily manage and monitor the backup of all your instances and volumes, and it should support consistent backups, through file system freezing and application-consistent backup. It should be easy to deploy and manage, provide automatic retention management (deletion of old snapshots), and help you recover complete instances and separate volumes quickly and easily.
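Choosing a recovery point becomes mechanical if each backup is recorded together with whether it was consistent. The sketch below picks the most recent consistent snapshot of a volume at or before a given time, which covers both the "freeze failed" and the "last Wednesday" scenarios. The record type, its field names and the sample data are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BackupRecord:
    """Hypothetical catalog entry for one snapshot."""
    snapshot_id: str
    volume_id: str
    started: datetime
    consistent: bool  # True if the application freeze succeeded


def pick_recovery_point(records, volume_id, not_after):
    """Return the newest consistent snapshot of the volume taken at or
    before not_after, or None if there is no such snapshot."""
    candidates = [
        r for r in records
        if r.volume_id == volume_id and r.consistent and r.started <= not_after
    ]
    return max(candidates, key=lambda r: r.started, default=None)


records = [
    BackupRecord("snap-1", "vol-1", datetime(2013, 1, 29, 2, 0), True),
    BackupRecord("snap-2", "vol-1", datetime(2013, 1, 30, 2, 0), True),
    BackupRecord("snap-3", "vol-1", datetime(2013, 1, 30, 14, 0), False),
    BackupRecord("snap-4", "vol-1", datetime(2013, 1, 31, 2, 0), True),
    BackupRecord("snap-5", "vol-2", datetime(2013, 1, 31, 2, 0), True),
]

latest = pick_recovery_point(records, "vol-1", datetime(2013, 1, 31, 12, 0))
# End of Wednesday, January 30: snap-3 is skipped because its freeze failed.
wednesday = pick_recovery_point(records, "vol-1", datetime(2013, 1, 30, 23, 59))
print(latest.snapshot_id, wednesday.snapshot_id)  # snap-4 snap-2
```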
About the Author
Uri Wolloch is the founder & CTO of N2W Software. He has over 15 years of software development experience working at various companies in different roles. In the past 10 years, Uri’s professional focus has been on IT infrastructure software and storage. Uri has worked as a software architect at IBM Tivoli, focused on data protection software in physical and virtual environments. In 2011, he founded N2W Software, a company providing IT solutions for cloud environments. N2W Software’s new solution, Cloud Protection Manager (CPM), is a comprehensive backup and recovery solution for Amazon EC2, fit for enterprise organizations as well as smaller companies.
Cloud Protection Manager (CPM) allows EC2 users to use EBS snapshots as their means of backup and recovery for EC2 instances.