Getting Started with Amazon Glacier, Part 1
AWS S3 is one of the largest data storage solutions on the internet, holding more than 1 trillion objects. It is highly reliable and provides scalable options for storing objects. AWS S3 is ideally suited to data that is used frequently, e.g., uploading, sharing or accessing images, document files, static media, videos, and so on. Combined with CloudFront, S3 delivers data with low latency.
The use cases above are good fits for S3, which provides virtually unlimited, scalable, reliable and secure data storage. The cases listed below are different, and they share a few common factors: (1) the storage is large, (2) the storage keeps growing but is accessed relatively infrequently, and (3) the storage must be secure and reliable.
Now consider a few cases:
- You have some data and your company or government rules require you to store it for longer periods. This data may never be accessed or viewed, unless there is a planned audit or verification need.
- You have a huge mail archive, which is required for reference only. At more than 1 TB in size, it is a huge in-house storage cost for your organization because you have to back it up, and ensure its availability and data security.
- You have a huge stack of guidance documents and they are required only once a year for reference.
- You have documents of former (retired/resigned) employees. These documents are for internal HR system references. They might be used only when an employee is rehired or when a reference check is required.
- You have a big SCM (supply chain management) system, which handles millions of transactions every day. The system generates thousands of POs every day, and you need to archive them in a storage system for auditing at the end of each month.
With the above three factors in mind, AWS has introduced a new offering called Amazon Glacier. Think of a glacier: a large, persistent body of ice that accumulates over many years or centuries. AWS Glacier follows a similar concept and is ideally suited when you need to store data files/objects, collected over years, in a reliable place.
Amazon Glacier is an extremely low-cost storage service that provides secure, durable, and flexible storage for data backup and archiving. It enables customers to offload the administrative burdens of operating and scaling storage to AWS, so that they don’t have to worry about capacity planning, hardware provisioning, data replication, hardware failure detection and repair, or time-consuming hardware migrations.
So you may ask: if S3 provides similar functionality, why offer AWS Glacier as an additional option? Glacier is a very cost-effective option for storing objects that are rarely accessed but kept for long periods of time. Here is a brief comparison between S3 and Glacier:
Reference Point | AWS S3 | AWS Glacier |
Storage Cost | $0.125/GB up to 1 TB; $0.095/GB from 50 to 450 TB | $0.01 per GB for all sizes [almost 10 times less than S3] |
Data Request Cost | PUT and GET requests: $0.01 per 1,000 requests | UPLOAD and RETRIEVAL requests: $0.050 per 1,000 requests |
Data Request Cost (other requests) | LIST and other requests: $0.01 per 1,000 requests (DELETE is free) | LISTVAULTS, GETJOBOUTPUT, DELETE: no cost (free up to 5% of monthly storage) |
Max Size of single object | 5 TB | 40 TB (using multipart upload) |
Storage Method | Storage in Unique Bucket | Stores archives (objects) in Vault. |
Number of buckets/vaults | Max 100 in US Standard Region | Max 1000 vaults/region |
Access (Data retrieval) | In seconds | In 3-5 hours |
Best suited for | Frequently accessed objects | Rarely accessed objects that must be stored for long periods |
Based on the above, we tried to compare estimates of storing data in Glacier and S3 using the AWS Calculator:
[Figure: AWS Calculator estimate of S3 storage cost (without free usage)]
[Figure: AWS Calculator estimate of AWS Glacier storage cost (without free usage)]
Based on these calculations, AWS Glacier is the clear winner from a cost perspective. This does not mean Glacier can simply replace S3, because access time matters: Glacier makes objects available for retrieval only after 3-5 hours. If you need immediate access to objects, S3 is the solution for you. Amazon Glacier is a solution for organizations that need to retain data easily and cost-effectively for months, years, or decades.
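To make the storage-cost gap concrete, here is a minimal sketch that uses only the per-GB prices from the comparison table. It assumes those prices are charged per GB per month, and it covers only the first S3 tier (up to 1 TB), since the table does not list every tier. The function and constant names are illustrative and not part of any AWS tool.

```python
# Prices copied from the comparison table; assumed to be USD per GB per month.
S3_FIRST_TB_PRICE = 0.125  # S3 tier "up to 1 TB"
GLACIER_PRICE = 0.01       # Glacier, all sizes

GB_PER_TB = 1024


def storage_cost(size_gb, price_per_gb):
    """Return the storage-only cost in USD for size_gb gigabytes."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return round(size_gb * price_per_gb, 2)


def compare_first_tb(size_gb):
    """Compare S3 and Glacier storage cost for up to 1 TB of data.

    Request and retrieval charges are deliberately ignored.
    """
    if size_gb > GB_PER_TB:
        raise ValueError("this sketch only covers the first S3 tier (<= 1 TB)")
    return {
        "s3": storage_cost(size_gb, S3_FIRST_TB_PRICE),
        "glacier": storage_cost(size_gb, GLACIER_PRICE),
        "price_ratio": round(S3_FIRST_TB_PRICE / GLACIER_PRICE, 1),
    }


if __name__ == "__main__":
    print(compare_first_tb(GB_PER_TB))
```

For a full 1 TB (1,024 GB) this works out to 1,024 x 0.125 = $128.00 on S3 versus 1,024 x 0.01 = $10.24 on Glacier, before any request or retrieval charges.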
Some basic features of Amazon Glacier:
Vault:
A vault is a container for storing archives. When you create a vault, you specify a name and select an AWS region where you want to create the vault. Each vault resource has a unique address like https://<region-specific endpoint>/<account-id>/vaults/<vaultname>
So a vault is like a bucket but, unlike a bucket, its name doesn’t have to be unique across all of AWS, because a vault is named and accessed per account (and per region).
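The address pattern above can be sketched as a small helper. This is an illustration only: the function name, the account IDs and the vault name are made up, and the endpoint shown is assumed to be the region-specific endpoint for US East.

```python
def vault_uri(endpoint, account_id, vault_name):
    """Build a vault address following the pattern
    https://<region-specific endpoint>/<account-id>/vaults/<vaultname>.
    """
    for label, value in (
        ("endpoint", endpoint),
        ("account_id", account_id),
        ("vault_name", vault_name),
    ):
        # Reject empty parts and parts that would add extra path segments.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"https://{endpoint}/{account_id}/vaults/{vault_name}"


if __name__ == "__main__":
    endpoint = "glacier.us-east-1.amazonaws.com"  # assumed US East endpoint
    # Two hypothetical accounts can use the same vault name without a clash,
    # because the account ID is part of the address.
    print(vault_uri(endpoint, "111122223333", "examplevault"))
    print(vault_uri(endpoint, "444455556666", "examplevault"))
```

Because the account ID is part of the path, the two addresses differ even though the vault name is the same.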
Archive:
An archive is a base unit of storage in Amazon Glacier. It can be any data (such as a photo, video, or document). Each archive has a unique ID and an optional description. Archive IDs are unique within a vault.
Job:
Retrieving an archive and retrieving a vault inventory (the list of archives) are asynchronous operations in Amazon Glacier. First you initiate a job, and then you download the job output after Amazon Glacier completes the job. With Amazon Glacier, your data retrieval requests are queued, and most jobs take about four hours to complete.
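The initiate-then-download flow can be sketched as follows. This assumes a boto3-style Glacier client (for example, one created with `boto3.client("glacier")`), which the article itself does not mention; the `retrieve_archive` helper and its polling interval are illustrative choices, not an official recipe.

```python
import time


def retrieve_archive(client, vault_name, archive_id,
                     poll_seconds=900, sleep=time.sleep):
    """Initiate an archive-retrieval job, wait for it, and return the bytes.

    `client` is assumed to behave like a boto3 Glacier client.
    accountId="-" means "the account that owns the credentials".
    """
    # Step 1: initiate the job. Glacier queues it and returns a job ID.
    job = client.initiate_job(
        accountId="-",
        vaultName=vault_name,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )
    job_id = job["jobId"]

    # Step 2: poll until the job has finished (typically hours later).
    while True:
        status = client.describe_job(
            accountId="-", vaultName=vault_name, jobId=job_id
        )
        if status["Completed"]:
            break
        sleep(poll_seconds)

    if status["StatusCode"] != "Succeeded":
        raise RuntimeError(f"job {job_id} ended as {status['StatusCode']}")

    # Step 3: download the job output, which is the archive content.
    output = client.get_job_output(
        accountId="-", vaultName=vault_name, jobId=job_id
    )
    return output["body"].read()
```

Polling works, but with jobs that take hours it is wasteful; the notification mechanism described in the next section is usually the better way to learn that a job has finished.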
Notification:
Because jobs take hours to complete, Glacier integrates jobs with AWS SNS. You can set one notification configuration per vault to receive information such as job completion status.
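As a rough sketch of how a vault's notification configuration might be set, the helper below again assumes a boto3-style Glacier client passed in by the caller. The helper name is hypothetical, and the SNS topic ARN in the usage note is a placeholder.

```python
def enable_job_notifications(client, vault_name, sns_topic_arn):
    """Attach one notification configuration to a vault.

    `client` is assumed to behave like a boto3 Glacier client. The
    configuration asks for a message on the given SNS topic whenever an
    archive-retrieval or inventory-retrieval job completes.
    """
    config = {
        "SNSTopic": sns_topic_arn,
        "Events": [
            "ArchiveRetrievalCompleted",
            "InventoryRetrievalCompleted",
        ],
    }
    client.set_vault_notifications(
        accountId="-",  # "-" means the account that owns the credentials
        vaultName=vault_name,
        vaultNotificationConfig=config,
    )
    return config
```

A call might look like `enable_job_notifications(client, "examplevault", "arn:aws:sns:us-east-1:111122223333:glacier-jobs")`, where the vault name and topic ARN are placeholders.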
Supported regions:
Currently, Glacier is supported in only five regions: US East, US-West-1, US-West-2, EU and Asia Pacific (Tokyo).
Now that you are a bit acquainted with Glacier, we will delve further into Glacier Features in the next part.
About the Author
Taral Shah
Cloud architect for more than 2 years, with around 12 years of IT experience. His area of focus is Amazon Cloud, and he has written a couple of white papers on AWS. He is responsible for designing and migrating highly available, scalable applications on the cloud. In the past he has worked as a consultant, developer, technical leader, project leader and account manager with various global clients.