Case Study: PostgreSQL DB Replication Between AWS Regions
In this article I describe how we built a redundant PostgreSQL database on the Amazon cloud, using EBS snapshots as backups, to provide disaster recovery (DR) for the PostgreSQL database server behind one of our customers’ mobile applications.
PostgreSQL 9.1 includes new capabilities for fast asynchronous replication between a master and its slaves: the master streams new data to the currently available slaves as it is written. This version also significantly improves WAL (Write-Ahead Log) processing, which speeds up replication and makes it possible to launch new slave servers quickly.
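To make the master/slave streaming setup more concrete, here is a minimal sketch that renders the configuration lines a PostgreSQL 9.1 streaming-replication pair typically needs. The setting names (`wal_level`, `max_wal_senders`, `wal_keep_segments`, `hot_standby`, and the `recovery.conf` keys `standby_mode`, `primary_conninfo`, `trigger_file`) are standard 9.1 parameters; the function names, host names, replication user, network range and numeric sizes are hypothetical and not taken from this project. The slave's data directory would first be seeded from a copy of the master (for example an EBS snapshot or `pg_basebackup`).

```python
# Minimal sketch of the settings behind PostgreSQL 9.1 streaming replication.
# Host names, the replication user and the numeric sizes are hypothetical.

def master_postgresql_conf(max_wal_senders: int = 3, wal_keep_segments: int = 256) -> str:
    """postgresql.conf lines for the master."""
    return "\n".join([
        "wal_level = hot_standby",                  # log enough WAL for read-only slaves
        f"max_wal_senders = {max_wal_senders}",      # max concurrent streaming connections
        f"wal_keep_segments = {wal_keep_segments}",  # keep 16 MB WAL segments for lagging slaves
    ])

def master_pg_hba_line(user: str, slave_cidr: str) -> str:
    """pg_hba.conf entry allowing a slave to connect for replication."""
    return f"host    replication    {user}    {slave_cidr}    md5"

def slave_postgresql_conf() -> str:
    """postgresql.conf line that lets the slave answer read-only queries."""
    return "hot_standby = on"

def slave_recovery_conf(master_host: str, user: str, trigger_file: str) -> str:
    """recovery.conf for a 9.1 slave; the password is expected in the .pgpass file."""
    return "\n".join([
        "standby_mode = 'on'",
        f"primary_conninfo = 'host={master_host} port=5432 user={user}'",
        f"trigger_file = '{trigger_file}'",  # creating this file promotes the slave
    ])

if __name__ == "__main__":
    print(master_postgresql_conf())
    print(master_pg_hba_line("replicator", "10.0.0.0/16"))  # hypothetical user and network
    print(slave_postgresql_conf())
    print(slave_recovery_conf("master.example.com", "replicator", "/tmp/postgresql.trigger"))
```

Generating the files from one place keeps the master and every slave consistent when a new replica is added.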
One of the main lessons learned from the well-known Amazon cloud outage in April 2011 was that there are interdependencies between AWS Availability Zones (AZs) located in the same region. We also learned that Amazon maintains a strict separation between the data centers of different regions, so that each region behaves as a completely separate cloud. To improve availability and be able to offer a competitive SLA, you might want to plan a cross-region or even cross-cloud DR architecture. To meet a strict SLA for one of our customers’ mobile applications, we implemented both cross-region and cross-cloud replication (with a replica maintained on Rackspace). Note: don’t forget that costs play a huge role in how you implement and maintain your application’s DR cloud architecture.
The diagram below illustrates our implementation of PostgreSQL master-slave binary replication between AWS regions (supported by PostgreSQL 9.1). With the improvements in the new PostgreSQL version, you can run multiple slave DB replicas on other cloud servers. The basic requirements for this are:
- The master and all slaves must run PostgreSQL 9.1.
- The master and all slaves must use the same architecture, i.e., all 64-bit or all 32-bit systems.
- The whole master server has to be replicated; you can’t replicate just one table.
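The first two requirements can be checked mechanically before a new slave is attached. A minimal sketch, assuming each server is described by the value of `SHOW server_version_num` (for example `90104` for 9.1.4) and its word size; the `ServerInfo` type and the function names are hypothetical. Different minor releases of 9.1 pass the check, since only the major version has to match.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServerInfo:
    """Hypothetical description of one database server."""
    server_version_num: int  # output of SHOW server_version_num, e.g. 90104 for 9.1.4
    pointer_bits: int        # 64 or 32

def major_version(version_num: int) -> tuple:
    """Split a pre-10 version number into its major version: 90104 -> (9, 1)."""
    return (version_num // 10000, (version_num // 100) % 100)

def can_stream_from(master: ServerInfo, slave: ServerInfo) -> bool:
    """True if the slave has the same major version and architecture as the master."""
    same_major = major_version(master.server_version_num) == major_version(slave.server_version_num)
    same_arch = master.pointer_bits == slave.pointer_bits
    return same_major and same_arch

if __name__ == "__main__":
    print(can_stream_from(ServerInfo(90104, 64), ServerInfo(90106, 64)))  # True
    print(can_stream_from(ServerInfo(90104, 64), ServerInfo(90104, 32)))  # False
```

The third requirement (whole-cluster replication) needs no check: binary streaming replication always copies the entire cluster.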
We use Amazon EBS snapshots to set up new slaves quickly. Our master databases replicate to backup slaves whose EBS volumes are snapshotted frequently; from these snapshots we can quickly launch a new read-only slave and then promote it to master. To maintain cross-cloud DR, we simply maintain backup images. One backup slave runs on Rackspace, giving us a copy of the data outside the AWS cloud. New data is replicated continuously, in near real time. Because this replication is asynchronous, it tolerates WAN latency: network lag delays the slaves but does not affect the performance of the master server.
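The restore step starts by choosing which snapshot to launch from. As a minimal sketch, the hypothetical helper below picks the newest completed snapshot from a list shaped like the `Snapshots` entries returned by boto3's EC2 `describe_snapshots` (keys `SnapshotId`, `State`, `StartTime`). In practice the list could come from a call such as `describe_snapshots(OwnerIds=["self"], Filters=[{"Name": "volume-id", "Values": [...]}])`, and the chosen snapshot would then be turned into a volume with `create_volume` in the DR region (after `copy_snapshot` if it was taken in another region). This is an illustration, not the exact tooling used in this project.

```python
from datetime import datetime, timezone
from typing import Optional

def newest_completed_snapshot(snapshots: list) -> Optional[dict]:
    """Return the most recent snapshot whose State is 'completed', or None.

    Each item is expected to look like an entry of the "Snapshots" list in
    boto3's EC2 describe_snapshots() response (SnapshotId, State, StartTime).
    """
    completed = [s for s in snapshots if s.get("State") == "completed"]
    return max(completed, key=lambda s: s["StartTime"], default=None)

if __name__ == "__main__":
    snaps = [  # hypothetical data
        {"SnapshotId": "snap-aaa", "State": "completed",
         "StartTime": datetime(2012, 6, 1, 10, 0, tzinfo=timezone.utc)},
        {"SnapshotId": "snap-bbb", "State": "completed",
         "StartTime": datetime(2012, 6, 1, 11, 0, tzinfo=timezone.utc)},
        {"SnapshotId": "snap-ccc", "State": "pending",  # still in progress, not usable yet
         "StartTime": datetime(2012, 6, 1, 12, 0, tzinfo=timezone.utc)},
    ]
    print(newest_completed_snapshot(snaps)["SnapshotId"])  # snap-bbb
```

Skipping `pending` snapshots matters: a snapshot that is still in progress cannot yet be used to create a volume.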
PostgreSQL 9.1’s replication capabilities make a reliable disaster recovery solution possible. In case of a disaster, for example if all AWS services on the east coast (the Virginia region) are affected and it is unclear how long a full AWS recovery will take, we can re-launch the whole server infrastructure and all related operations on the west coast (the Oregon region) within around 30 minutes, without losing data or performance.
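Because the replication is asynchronous, a slave can trail the master slightly, so before promoting a slave in the DR region it is worth checking how far behind it is. A minimal sketch, assuming the timestamp comes from running `SELECT pg_last_xact_replay_timestamp();` on the slave (available from PostgreSQL 9.1; it returns NULL until a transaction has been replayed); the function names and the 30-second budget are hypothetical. On an idle master this value grows even when nothing is missing, so treat it as an upper bound. If the check passes, the slave can be promoted with `pg_ctl promote` or by creating its `trigger_file`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Run on the slave: commit time of the last replayed transaction (or NULL).
LAG_QUERY = "SELECT pg_last_xact_replay_timestamp();"

def replay_lag(last_replay: Optional[datetime], now: datetime) -> Optional[timedelta]:
    """Time since the slave last replayed a transaction; None if unknown."""
    if last_replay is None:
        return None
    return now - last_replay

def safe_to_promote(last_replay: Optional[datetime], now: datetime,
                    max_lag: timedelta = timedelta(seconds=30)) -> bool:
    """Hypothetical policy: promote only if the measured lag is within max_lag."""
    lag = replay_lag(last_replay, now)
    return lag is not None and lag <= max_lag

if __name__ == "__main__":
    now = datetime(2012, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(safe_to_promote(now - timedelta(seconds=5), now))   # True
    print(safe_to_promote(now - timedelta(minutes=10), now))  # False
```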
Data transfer is an important cost factor. Because the AWS regions and the Rackspace cloud are completely independent clouds, it is crucial to measure outgoing data traffic in order to control its cost. For this purpose we use the Nagios Check_traffic_limit plugin to define daily and monthly traffic limits for the relevant servers. Nagios sends email alerts if, for any reason, our EC2 servers generate unusually high outgoing traffic.
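The logic of such a limit check is simple. The sketch below is not the Check_traffic_limit plugin’s code; it is a hypothetical Python function that compares bytes sent today and this month against configured limits and returns a standard Nagios plugin exit code (0 = OK, 1 = WARNING, 2 = CRITICAL) plus a one-line status message. How the byte counters are collected and the 80% warning threshold are assumptions.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes

def check_traffic(sent_today: float, sent_month: float,
                  daily_limit: float, monthly_limit: float,
                  warn_ratio: float = 0.8) -> tuple:
    """Return (exit_code, message) for outgoing traffic against both limits (bytes)."""
    worst = OK
    parts = []
    for label, used, limit in (("daily", sent_today, daily_limit),
                               ("monthly", sent_month, monthly_limit)):
        ratio = used / limit
        if ratio >= 1.0:
            state = CRITICAL        # limit exceeded
        elif ratio >= warn_ratio:
            state = WARNING         # approaching the limit
        else:
            state = OK
        worst = max(worst, state)   # overall status is the worst of the two checks
        parts.append(f"{label} {used / 1e9:.1f}/{limit / 1e9:.1f} GB")
    name = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[worst]
    return worst, f"TRAFFIC {name} - " + ", ".join(parts)

if __name__ == "__main__":
    code, msg = check_traffic(9e9, 50e9, daily_limit=10e9, monthly_limit=100e9)
    print(code, msg)  # 1 TRAFFIC WARNING - daily 9.0/10.0 GB, monthly 50.0/100.0 GB
```

A real plugin would print the message and exit with the code (for example via `sys.exit(code)`) so that Nagios can raise the email alert.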
We estimate that this configuration provides availability of at least 99.9% (“three nines”), i.e., a maximum downtime of about 43 minutes per month, while keeping an acceptable balance between cost and availability. Maintaining availability is a crucial point, especially in the cloud: the cloud service provider must keep iterating on improvements while maintaining good visibility into the environment.
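The “about 43 minutes per month” figure follows directly from the availability target. A small worked sketch, assuming a 30-day month (the function name is illustrative):

```python
def allowed_downtime_minutes(availability: float, period_days: float = 30.0) -> float:
    """Maximum downtime, in minutes, that still meets the availability target."""
    minutes_in_period = period_days * 24 * 60
    return (1.0 - availability) * minutes_in_period

if __name__ == "__main__":
    print(round(allowed_downtime_minutes(0.999), 1))            # 43.2 (30-day month)
    print(round(allowed_downtime_minutes(0.999, 365) / 60, 2))  # 8.76 (hours per year)
```

Each extra “nine” divides the budget by ten: 99.99% leaves only about 4.3 minutes per month, which is well below the roughly 30-minute cross-region re-launch time described above.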
I invite you to send me your thoughts and feedback. Also, feel free to contact me for more information and help with your cloud DR plan and implementation.
About the Author
Aleksandar N., IT Consultant, Cloud & Linux Expert at WhiteCity Soft LLC.
Aleksandar is responsible for staging and production environment monitoring, performance, deployment, security, reliability, availability, and capacity. He works closely and effectively with development teams and has extensive experience organizing system operations and development processes and moving them to the Amazon AWS and Rackspace clouds.
Specialties: Linux servers, cloud environment administration and management, smart startups, web business and enterprise cloud computing solutions, high-performance sites.