Enable CloudFront for Your Application’s Non-Dynamic Content
CloudFront is Amazon’s Content Delivery Network, a service that aims to speed up delivery of content to users in different geographies. It gives developers access to a worldwide infrastructure that minimizes latency by serving content from the edge location closest to the end user.
This article describes two basic use cases for a CDN serving non-dynamic content in a typical web application and provides CloudFront-specific configuration examples.
How does CloudFront work?
CloudFront is an origin-pull Content Delivery Network that acts as a caching reverse proxy. It sits between your end users and your ELB/EC2 instances (or even non-AWS web servers) or S3 bucket, referred to from now on as the Origin. The first time a user requests a specific file, CloudFront goes back to the Origin, retrieves the file, serves it to the user, and caches it. Subsequent requests for the same file are faster and put no extra load on the Origin, because the file does not need to be retrieved again. CloudFront also serves the cached content from the edge location nearest to the end user, which reduces latency.
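To make the origin-pull idea concrete, here is a minimal sketch of a pull-through cache in Python. It is a toy model only, not how CloudFront is implemented: the `EdgeCache` class and the `fake_origin` function are hypothetical names invented for illustration, and expiry and geography are left out.

```python
from typing import Callable, Dict


class EdgeCache:
    """Toy origin-pull cache: fetch from the Origin on a miss, serve from memory on a hit."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # how many times we had to go back to the Origin

    def get(self, path: str) -> bytes:
        if path not in self._store:
            # Cache miss: retrieve the file from the Origin once and remember it.
            self.origin_requests += 1
            self._store[path] = self._fetch(path)
        # Cache hit (or freshly filled entry): no extra load on the Origin.
        return self._store[path]


def fake_origin(path: str) -> bytes:
    # Stand-in for the S3 bucket or web server acting as the Origin.
    return f"contents of {path}".encode()


edge = EdgeCache(fake_origin)
edge.get("/image1.jpg")  # first request: pulled from the Origin
edge.get("/image1.jpg")  # second request: served from the cache
print(edge.origin_requests)
```

Only the first request for a given path reaches the Origin; every later request for that path is answered from the cache.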
In our example, images and their thumbnails are stored in an S3 bucket. CloudFront makes it easy to create a new distribution that uses that S3 bucket as its Origin. The new distribution is accessible via a unique CloudFront hostname (e.g. http://d1j27wc9h1fy07.cloudfront.net).
For a more professional result, you can create an alias for the CloudFront hostname (e.g., images.mydomain.com). A file named image1.jpg stored in the S3 bucket above is then accessible directly from S3 at http://mymediafiles.s3.amazonaws.com/image1.jpg, or via CloudFront at http://images.mydomain.com/image1.jpg or http://d1j27wc9h1fy07.cloudfront.net/image1.jpg.
To achieve better page load times, a common technique is to use multiple aliases for the same CloudFront distribution.
This trick works around the limit that most modern web browsers place on the number of files downloaded in parallel from a single hostname.
For this to work, make sure each alias has a CNAME entry in your DNS configuration (e.g. Route 53) that points to the domain name CloudFront assigned to your distribution.
Obviously, you need to make your application aware of the above scheme and introduce logic that refers to each file via one of the above aliases.
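One possible shape for that application logic is sketched below in Python. The alias hostnames in `CDN_ALIASES` and the helper name `cdn_url` are assumptions made up for this example; substitute the CNAMEs you actually configured. The sketch picks the alias from a checksum of the file path, so a given file always maps to the same hostname and browser caches stay effective.

```python
import zlib

# Hypothetical aliases; replace with the CNAMEs set up for your own distribution.
CDN_ALIASES = [
    "images1.mydomain.com",
    "images2.mydomain.com",
    "images3.mydomain.com",
]


def cdn_url(path: str, aliases=CDN_ALIASES) -> str:
    """Return a CDN URL for `path`, always choosing the same alias for the same file."""
    normalized = path.lstrip("/")
    # crc32 is stable across processes and deployments, unlike the built-in
    # hash() for strings, which is randomized per interpreter run.
    index = zlib.crc32(normalized.encode("utf-8")) % len(aliases)
    return f"http://{aliases[index]}/{normalized}"


print(cdn_url("image1.jpg"))
print(cdn_url("thumbnails/image1.jpg"))
```

Choosing the alias at random on each page view would also spread requests across hostnames, but the same file would then be downloaded again under a different URL, which defeats browser caching.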
Here we would set up a new CloudFront distribution with its origin set to static.mydomain.com.
We mentioned that CloudFront also acts as a caching proxy, but for how long does it cache the various files before it tries to retrieve them from the Origin again?
By default, CloudFront objects expire after 24 hours, but CloudFront also respects the Cache-Control max-age directive or the Expires header if you set them on the Origin. So, how can you do that?
If your Origin(s) are running Apache, you can add the following to your .htaccess file to set 30 days as the value of the max-age directive:
Header set Cache-Control "max-age=2592000, public"
If your Origin is an S3 bucket, you can also specify cache control headers for your objects by setting them as object metadata.
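For the S3 case, one way to set the header at upload time is through the AWS SDK for Python (boto3), sketched below. The helper names are hypothetical, the 30-day value simply mirrors the Apache example, and the upload function assumes boto3 is installed and AWS credentials are configured. `upload_file` and its `ExtraArgs` keys `CacheControl` and `ContentType` are part of boto3's S3 client.

```python
import mimetypes

THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60  # 2592000, same value as the Apache example


def cache_control_value(max_age_seconds: int = THIRTY_DAYS_SECONDS) -> str:
    """Build the Cache-Control header value to store with the object."""
    return f"max-age={max_age_seconds}, public"


def upload_extra_args(filename: str, max_age_seconds: int = THIRTY_DAYS_SECONDS) -> dict:
    """Build the ExtraArgs dict passed to boto3's upload_file."""
    content_type, _ = mimetypes.guess_type(filename)
    args = {"CacheControl": cache_control_value(max_age_seconds)}
    if content_type:
        args["ContentType"] = content_type
    return args


def upload_with_cache_control(filename: str, bucket: str, key: str) -> None:
    """Upload a local file to S3 with a Cache-Control header attached."""
    import boto3  # imported lazily so the helpers above work without the AWS SDK

    s3 = boto3.client("s3")
    s3.upload_file(filename, bucket, key, ExtraArgs=upload_extra_args(filename))
```

With the bucket from the earlier example, a call would look like `upload_with_cache_control("image1.jpg", "mymediafiles", "image1.jpg")`.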
Alternatively, you can specify a minimum time that CloudFront keeps an object in its cache. This minimum overrides smaller values set at the Origin (or the 24-hour default if your Origin does not set cache control directives).
These settings matter not only for the optimal use of CloudFront but also for the performance experienced by returning visitors to your web site, since the same headers govern how browsers cache your objects. If your objects rarely change, it is good practice to set long expiry values.
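The expiry rules described in this section can be summarized in a few lines of Python. This is a simplified model of the behavior as the article describes it, not CloudFront's actual algorithm: it only looks at the max-age directive (the Expires header is ignored), and the function names are invented for the example.

```python
from typing import Optional

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # the 24-hour default mentioned above


def parse_max_age(cache_control: Optional[str]) -> Optional[int]:
    """Extract max-age (in seconds) from a Cache-Control header value, if present."""
    if not cache_control:
        return None
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            return int(value.strip())
    return None


def effective_ttl(cache_control: Optional[str], minimum_ttl: int = 0) -> int:
    """Seconds an object stays cached under this simplified model."""
    origin_ttl = parse_max_age(cache_control)
    if origin_ttl is None:
        # The Origin did not say anything: fall back to the default.
        origin_ttl = DEFAULT_TTL_SECONDS
    # A configured minimum overrides any smaller value.
    return max(origin_ttl, minimum_ttl)


print(effective_ttl("max-age=2592000, public"))    # Origin asks for 30 days
print(effective_ttl(None))                         # no header: 24-hour default
print(effective_ttl("max-age=60", minimum_ttl=3600))  # minimum wins over 60 s
```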
Invalidating items before they expire
So let’s assume you refer to a CSS file like this:
<link rel="stylesheet" type="text/css" href="homepage.css?version=592" />
On your next deployment, you can increment the dummy version identifier:
<link rel="stylesheet" type="text/css" href="homepage.css?version=593" />
With query string forwarding enabled (the Forward Query parameters option), CloudFront “sees” the above as two different files and serves the correct version to end users.
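The effect of that option can be pictured as a change in the key under which a file is cached. The Python sketch below is an illustration only, with a hypothetical `cache_key` function; it does not reproduce how CloudFront computes its keys internally.

```python
from urllib.parse import urlsplit


def cache_key(url: str, forward_query_strings: bool) -> str:
    """Return the key a cache would use for `url` in this simplified model."""
    parts = urlsplit(url)
    if forward_query_strings and parts.query:
        # The query string is part of the key: each version is a distinct object.
        return f"{parts.path}?{parts.query}"
    # The query string is ignored: every version collapses onto the same object.
    return parts.path


old = "/homepage.css?version=592"
new = "/homepage.css?version=593"

print(cache_key(old, True) == cache_key(new, True))    # False: two cached objects
print(cache_key(old, False) == cache_key(new, False))  # True: one cached object
```

With forwarding disabled, both URLs map to the same cached object, so the stale stylesheet would keep being served until it expires.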
Of course, you don’t want to manually update every place in your code where that CSS file is referenced. It is a good idea to embed this logic into your application framework and turn the version ID into a global configuration variable. Then you only need to make a single change in a config file every time you deploy (or, even better, automate this as a post-deployment hook that increments the value).
<link rel="stylesheet" type="text/css" href="homepage.css?version=<?php echo $version_id;?>" />
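A post-deployment hook that increments the value could be as small as the Python sketch below. The config format (a line of the form `version_id = 592`) and the function names are assumptions made for this example; adapt the pattern to wherever your framework actually stores the version.

```python
import re
from pathlib import Path

# Matches a line such as "version_id = 592" (hypothetical config format).
VERSION_PATTERN = re.compile(r"^(version_id[ \t]*=[ \t]*)(\d+)[ \t]*$", re.MULTILINE)


def bump_version(config_text: str) -> str:
    """Return config_text with the first version_id entry incremented by one."""

    def _increment(match):
        return f"{match.group(1)}{int(match.group(2)) + 1}"

    new_text, count = VERSION_PATTERN.subn(_increment, config_text, count=1)
    if count == 0:
        raise ValueError("no version_id entry found")
    return new_text


def bump_version_file(path: str) -> None:
    """Rewrite the config file at `path` with the incremented version."""
    config = Path(path)
    config.write_text(bump_version(config.read_text()))


print(bump_version("version_id = 592\n"))
```

Calling `bump_version_file` on the config file from your deployment script keeps the version change out of manual checklists.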
What we described above is applicable to most web applications out there. For more advanced scenarios, it might be worth checking out CloudFront’s advanced features for dynamic content, which promise to accelerate delivery of dynamic and personalized content as well.
About the Author:
Andreas is the CTO and co-founder of Spitogatos.gr / HomeGreekHome.com (a high-traffic real estate portal in Greece). His background includes 5 years of consulting at Accenture NL, and he is the organizer of Greece’s AWS User Group.