(Almost) Endless Space for a Spectacular Price: A Case for Simple Storage Service (S3)
Do you remember the last time you got stuck with a tough decision, like which backup NOT TO DELETE, so you could leave space for something else that suddenly gained a sense of urgency?
Or perhaps you simply wished it could be easy to provision and share files (regardless of size) not only across the local network, but across the whole Internet – at the same time having critical aspects like Availability, Replication, Scalability, Authorization, Auditing, and Authentication covered, but without the need to configure and manage a web server? And what if this web server could have almost infinite disk space?
It’s time to say goodbye to your storage nightmares, and start daydreaming with Amazon Web Services Simple Storage Service (S3). S3 is here and very real. In this blog we’ll introduce you to the Simple Storage Service and its basic concepts, and provide some guidance regarding its adoption.
So, what is S3?
Launched in 2006, the Amazon Simple Storage Service (S3) was one of the first Amazon Web Services products offered. It addresses a very recurring problem in an elegant fashion: managing and serving objects across the Web, as a cloud-based service.
S3 broke the rules by introducing a disruptive business model: S3 charges are calculated from your on-demand resource consumption, metered on a continuous basis, just like a utility bill. This way, there is no need to pick a Service Plan, and you end up paying only for what you really used. You might recall that back in 2006 Dropbox didn’t exist yet, and even Virtual Private Servers (VPS) were little known and expensive. If you wanted something similar, your only option was to run an FTP server and keep your URLs in order – problems that no longer exist with S3.
What does S3 offer to my applications?
AWS S3 features:
- Management of Buckets and Resources (see below), at the API level as well as through a friendly UI console
- Full Support for Security, supporting Access Policies for Authentication, Authorization, Auditing and Delegation
- Options to host your data in several geographic regions, with your data automatically replicated across multiple facilities within each region
- Full HTTP Support not just for the API, but for the contents as well
- BitTorrent support
- Ability to handle objects of up to 5 TB each
- A range of storage options, including both Server-Side-Encryption and Reduced Redundancy Storage (a cheaper approach)
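To make a couple of these features concrete, here is a minimal sketch in Python using boto3, the AWS SDK for Python (my choice of SDK – the bucket names, keys, and helper names are illustrative, and the upload functions assume AWS credentials are already configured):

```python
def object_url(bucket, key):
    """Virtual-hosted-style URL an S3 resource is served from over plain HTTP."""
    return f"https://{bucket}.s3.amazonaws.com/{key}"

def upload_encrypted(bucket, key, path):
    """Upload a file with Server-Side-Encryption enabled."""
    import boto3  # deferred import: the URL helper above needs no AWS setup
    s3 = boto3.client("s3")
    with open(path, "rb") as body:
        s3.put_object(Bucket=bucket, Key=key, Body=body,
                      ServerSideEncryption="AES256")

def upload_cheap(bucket, key, path):
    """Upload a file under Reduced Redundancy Storage, the cheaper option."""
    import boto3
    s3 = boto3.client("s3")
    with open(path, "rb") as body:
        s3.put_object(Bucket=bucket, Key=key, Body=body,
                      StorageClass="REDUCED_REDUNDANCY")

print(object_url("backups", "2012/daily.tar.gz"))
# https://backups.s3.amazonaws.com/2012/daily.tar.gz
```

Note how the “Full HTTP Support for the contents” point plays out: every object you upload is addressable by a plain URL, no web server required.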
It is important to mention what S3 is not:
- S3 is not a database
- It is not a Content Delivery Network
- It is not a file system
Regardless of what S3 is not, there are several solutions that turn S3 into a more complete offering, like Apache Hive, CloudFront, and others. We will discuss this further on in this article.
How does S3 compare against other solutions?
It depends on what your workload needs. There is a clear tradeoff between speed/price and available resources. Let’s consider the Memory Pyramid:
Its purpose is to help you decide which kind of memory is best to use, given how much data you have to keep. Clearly, S3 is positioned where you need lots of space and pricing and availability become the main concerns, such as for:
- Log Files
- Media Assets (photo collections)
- Online Daily/Differential Backup Storage
Nonetheless, a distinction must be made: while S3 could work as a Content-Delivery Network (CDN), it was not designed as such. If you want CDN functionality to coordinate mass delivery of static assets to a globally distributed audience, I advise you to consider using it via AWS CloudFront: it builds upon S3 and adds reverse proxy functionality, using not only the AWS data center infrastructure but an additional network of Edge Locations.
How Does S3 Work?
The AWS cloud’s wide range of services and concepts might be difficult for a newcomer to grasp. Recognizing this, AWS introduced the Simple Icons, a set of stencils for their products that establishes a common visual language for describing AWS services together:
In the picture above, the S3 endpoint is reached via the AWS Cloud; if I ask which data I’m keeping there, it returns the two buckets I already have: pets and backups. In AWS, each piece of my content (the blue dots) is called a “Resource”, and is identified by a Key, which is unique within its S3 Bucket. A bucket, in turn, carries a symbolic name – unique across all AWS customer accounts – and represents a named collection of resources.
Suppose I took a couple of pictures of my pets on my phone and I’d like to share them using S3. As S3 is a web service, I will need to call it either through its REST interface or as a SOAP service in the AWS Cloud. In the pets bucket, there are three resources. Remember, each has a unique key (its “filename,” in a sense).
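To make the bucket/key split concrete, here is a small sketch (the function name and example URIs are mine, purely for illustration) that takes the kind of s3:// URI you would pass to command-line tools and splits it into the bucket and the unique key:

```python
def split_s3_uri(uri):
    """Split an s3://bucket/key URI into its (bucket, key) pair.

    The key is everything after the bucket name, slashes included --
    S3 itself sees it as one flat, unique name within the bucket.
    """
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key

print(split_s3_uri("s3://pets/rex-playing.jpg"))
# ('pets', 'rex-playing.jpg')
print(split_s3_uri("s3://backups/2012/01/daily.tar.gz"))
# ('backups', '2012/01/daily.tar.gz')
```

Notice in the second example that the slashes stay inside the key: to S3 the whole string after the bucket is just one name.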
S3 reflects what Amazon – the store – learned while building high-scalability solutions. Internally it is based on Dynamo (the “grandfather” of DynamoDB), using a network of servers working together to address replication and redundancy, bringing SLA-based guarantees into play. The same architecture also underpins SimpleDB and DynamoDB. Beyond that, almost every AWS service with “Elastic” in its name is an S3 client by definition.
Within a bucket, resource keys may contain slashes, but these are not directories in the filesystem sense – they simply make it easier to list child resources. The AWS Management Console, however, does a great deal of work abstracting slash-separated keys into directories to ease navigation.
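A quick sketch of how that abstraction works: given a flat list of keys, listing with a "/" delimiter groups them the way the console (and the ListObjects API, via its Prefix and Delimiter parameters) does. The keys below are made up for illustration:

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Mimic S3's ListObjects: return (objects, common_prefixes).

    Keys containing the delimiter after the prefix are collapsed into
    a single "directory-like" common prefix; the rest come back as
    plain objects. S3 itself stores everything in one flat namespace.
    """
    objects, prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:  # delimiter found: fold into a common prefix
            prefixes.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return objects, sorted(prefixes)

keys = ["pets/rex.jpg", "pets/mel.jpg", "backups/2012/jan.tgz", "readme.txt"]
print(list_with_delimiter(keys))
# (['readme.txt'], ['backups/', 'pets/'])
print(list_with_delimiter(keys, prefix="pets/"))
# (['pets/rex.jpg', 'pets/mel.jpg'], [])
```

The "common prefixes" are what the console renders as folders – there is no directory object behind them at all.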
Giving S3 a Try
If you just want to learn basic S3-fu, I suggest you read Amazon Web Services’ Getting Started Guide for S3. It covers the essential 80% of use cases, including source code examples.
Once you understand the basics, you might consider S3 in your daily routine. In most cases, you can combine S3-focused solutions like s3cmd and CloudBerry Explorer (my favorite) into your DevOps utility belt, or simply abstract S3 as a local filesystem (using JungleDisk or s3fs), which I recommend only for personal usage and backups.
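If you go the scripted-backup route, one small detail you will bump into is mapping local file paths onto S3 keys. A sketch of one such mapping (my own convention, not something s3cmd or the other tools prescribe):

```python
import posixpath

def path_to_key(local_path, prefix="backups"):
    """Turn a local file path into an S3 key under a given prefix.

    Normalizes Windows-style separators and strips a leading './' so
    the resulting key reads like a clean pseudo-directory path.
    """
    norm = local_path.replace("\\", "/")
    if norm.startswith("./"):
        norm = norm[2:]
    parts = [p for p in norm.split("/") if p not in ("", ".")]
    return posixpath.join(prefix, *parts)

print(path_to_key("./var/log/syslog.1"))
# backups/var/log/syslog.1
print(path_to_key("photos\\rex.jpg", prefix="pets"))
# pets/photos/rex.jpg
```

Keeping the keys slash-separated like this means the Management Console will render your backups as browsable folders for free.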
About the Author
Aldrin Leal, Cloud Architect and Partner at ingenieux
Aldrin Leal works as an Architect and QA Consultant, especially for Cloud and Big Data cases. Besides his years spent in the trenches on projects across the Telecom, Aerospace, Government, and Mining segments, he is passionate about meeting new paradigms and figuring out how to bring them into new and existing endeavours.
Keywords: Amazon AWS elastic cloud services, S3 Bucket, Scalability, Performance, Best Practice, CDN, CloudFront, S3 Storage, AWS Management Console, SLA, High Availability, AWS Dynamo, AWS Glacier, EBS, Cache Memory