Enable CloudFront for Your Application’s Non-Dynamic Content

CloudFront is Amazon’s Content Delivery Network, a service that aims to speed up delivery of content to users in different geographies. It gives developers access to a worldwide infrastructure that minimizes latency by serving content from the edge location closest to the end user.

This article describes two basic use cases for a CDN serving non-dynamic content in a typical web application, and provides CloudFront-specific configuration examples.

How does CloudFront work?

CloudFront is an origin-pull Content Delivery Network that acts as a caching reverse proxy. It sits between your end users and your ELB/EC2 instances (or even non-AWS web servers) or S3 bucket, referred to from now on as the Origin. Without getting too technical, the high-level idea is that the first time a user requests a specific file, CloudFront goes back to the Origin, retrieves the file, serves it to the end user, and caches it. Subsequent requests for the same file are faster and put no extra load on the Origin, because the file does not need to be retrieved again. CloudFront not only caches the content, it also serves it from the edge location nearest to the end user, which translates into reduced latency.
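The origin-pull idea can be sketched in a few lines of Python. This is only a toy model of the behaviour described above, not CloudFront's implementation; the Origin and EdgeCache classes are hypothetical names invented for the illustration.

```python
class Origin:
    """Stand-in for an S3 bucket or web server; counts how often it is hit."""

    def __init__(self, files):
        self.files = files
        self.requests = 0

    def fetch(self, path):
        self.requests += 1
        return self.files[path]


class EdgeCache:
    """Toy origin-pull cache: fetch on the first request, then serve from memory."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, path):
        if path not in self.store:
            # Cache miss: go back to the Origin and keep a copy.
            self.store[path] = self.origin.fetch(path)
        # Cache hit (or freshly filled entry): the Origin is not contacted again.
        return self.store[path]


origin = Origin({"/image1.jpg": b"jpeg-bytes"})
edge = EdgeCache(origin)
first = edge.get("/image1.jpg")   # retrieved from the Origin
second = edge.get("/image1.jpg")  # served from the cache
print(origin.requests)
```

Both requests return the same bytes, but the Origin is contacted only once. Expiry is deliberately left out of this sketch.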

Our example

Let’s assume we have a typical web application that needs to serve static files, e.g., JavaScript and CSS files, as well as images, thumbnails and other types of media.

Serving Images

In our example, images and their thumbnails are stored in an S3 bucket. CloudFront makes it extremely easy to create a new distribution that uses that S3 bucket as its Origin. The new distribution is accessible via a unique CloudFront hostname (e.g. http://d1j27wc9h1fy07.cloudfront.net).


For a more professional result, you can create an alias (CNAME) for the CloudFront hostname (e.g., images.mydomain.com). A file named image1.jpg stored in the above S3 bucket (mymediafiles in this example) is now accessible directly from S3 at http://mymediafiles.s3.amazonaws.com/image1.jpg, or via CloudFront at http://images.mydomain.com/image1.jpg (or http://d1j27wc9h1fy07.cloudfront.net/image1.jpg).

To achieve better page load times, a common technique is to actually use multiple aliases for the same CloudFront distribution - for example:

media1.mydomain.com
media2.mydomain.com
media3.mydomain.com

This trick works around the limit that most modern web browsers apply to the number of files downloaded in parallel from a single hostname. You can read more about it here:
http://developer.yahoo.com/performance/rules.html#split

For this to work, you need to make sure the above CNAME entries have been set in your DNS configuration (e.g. Route 53) and point to the domain name assigned by CloudFront to your distribution.

Obviously, you need to make your application aware of the above scheme and introduce logic that refers to each file via one of the above aliases.
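One possible way to implement that logic is sketched below in Python. It assumes the three aliases from the example above; the media_url helper is a hypothetical name. The alias is derived from a checksum of the file path rather than chosen at random, so a given file is always referenced through the same hostname and is not cached several times under different URLs.

```python
import zlib

# Aliases taken from the example above; all point to the same distribution.
ALIASES = ["media1.mydomain.com", "media2.mydomain.com", "media3.mydomain.com"]


def media_url(path, aliases=ALIASES):
    """Map a file path to one alias, always the same alias for the same path."""
    normalized = path.lstrip("/")
    # crc32 is stable across processes and machines, unlike Python's hash().
    index = zlib.crc32(normalized.encode("utf-8")) % len(aliases)
    return "http://{}/{}".format(aliases[index], normalized)


for name in ["image1.jpg", "image2.jpg", "thumbs/image1.jpg"]:
    print(media_url(name))
```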

Serving CSS and JavaScript Files

For the sake of our example, let’s assume that your CSS and JavaScript files are stored on the web servers and served via their own hostname (e.g., static.mydomain.com) through Amazon’s Elastic Load Balancer.

Here we would set up a new CloudFront distribution with its origin set to static.mydomain.com.

 

Content Expiry

We mentioned that CloudFront also acts as a caching proxy, but for how long does it cache the various files before it tries to retrieve them from the Origin again?

By default, CloudFront objects expire after 24 hours, but CloudFront also respects the Cache-Control max-age directive or the Expires header if you set them on the Origin. So, how can you do that?

If your Origin(s) are running Apache, you can add the following to your .htaccess file to set 30 days as the value of the max-age directive:

<FilesMatch "\.(css|js|jpg|jpeg|png|gif)$">
Header set Cache-Control "max-age=2592000, public"
</FilesMatch>
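The max-age value is expressed in seconds, so 30 days becomes 30 × 24 × 60 × 60 = 2592000. If you generate the header from application code instead of .htaccess, a small helper along these lines could do the arithmetic. It is a sketch only, and cache_control is a hypothetical name.

```python
from datetime import timedelta


def cache_control(days, public=True):
    """Build a Cache-Control header value with max-age given in days."""
    seconds = int(timedelta(days=days).total_seconds())
    directives = ["max-age={}".format(seconds)]
    if public:
        directives.append("public")
    return ", ".join(directives)


print(cache_control(30))  # max-age=2592000, public
```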

If your Origin is an S3 bucket, you can also specify cache control headers for your objects as described here:

http://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html

Alternatively, you can specify a minimum time that CloudFront keeps an object in its cache. This overrides smaller values that might be set at the Origin (or the default of 24 hours if your Origin does not set cache control directives).
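The interaction between the Origin's max-age, the 24-hour default and the minimum time can be modelled as follows. This is a simplified reading of the rules described in this article, not CloudFront's actual algorithm; edge_cache_seconds and its parameters are hypothetical names.

```python
DEFAULT_TTL = 24 * 60 * 60  # the 24-hour default mentioned above, in seconds


def edge_cache_seconds(origin_max_age, minimum_ttl=0):
    """Approximate how long the CDN keeps an object before going back to the Origin.

    origin_max_age is None when the Origin sends no cache control directives.
    """
    if origin_max_age is None:
        origin_max_age = DEFAULT_TTL
    # A configured minimum overrides any smaller value coming from the Origin.
    return max(origin_max_age, minimum_ttl)


print(edge_cache_seconds(None))                      # no headers: default applies
print(edge_cache_seconds(3600, minimum_ttl=86400))   # minimum overrides 1 hour
print(edge_cache_seconds(2592000, minimum_ttl=86400))  # 30 days is kept as-is
```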

 

These settings are important not only for the optimal use of CloudFront; they also affect performance for returning visitors to your web site, since the same headers govern how browsers cache your objects. If your objects rarely change, it is good practice to set long expiry values.

Invalidating items before they expire

Because JavaScript and CSS files are part of your (hopefully version-controlled) application code, a problem arises every time you modify your application: old versions of the modified files are still (i) in the local cache of a returning visitor’s browser and (ii) on the CDN itself. You therefore need a method to easily invalidate them every time you deploy a new version of your application, without compromising the long expiry values advised in the previous paragraph.

CloudFront’s “Forward Query parameters” functionality is very useful here. By enabling this configuration option and adding a dummy query string parameter to the URL of each CSS/JavaScript file, you can essentially “break” client and CDN caching whenever you want, without having to wait for the content to expire.

So let’s assume you refer to a CSS file like this:

<link rel="stylesheet" type="text/css" href="homepage.css?version=592" />

On your next deployment you can increment the dummy version identifier:

<link rel="stylesheet" type="text/css" href="homepage.css?version=593" />

With the Forward Query parameters option, CloudFront “sees” the above as two different files and will serve the correct version to the end users.

Of course, you don’t want to have to manually edit each part of your code where that CSS file is referenced. It is a good idea to embed this logic into your application framework and turn the version id into a global configuration variable. Then, you only need to make a single change in a config file every time you deploy (or even better, automate this as a post-deployment hook that increments the value).

For example:

<link rel="stylesheet" type="text/css" href="homepage.css?version=<?php echo $version_id;?>" />
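The same idea, including the post-deployment hook that increments the value, could look roughly like this in Python. It is a sketch under assumptions: the version is stored in a JSON config file under a version_id key, and versioned_url and bump_version are hypothetical helper names.

```python
import json
from urllib.parse import urlencode


def versioned_url(path, version):
    """Append the cache-breaking version parameter to a static file URL."""
    return "{}?{}".format(path, urlencode({"version": version}))


def bump_version(config_path):
    """Post-deployment hook: increment version_id in a JSON config file."""
    with open(config_path) as fh:
        config = json.load(fh)
    config["version_id"] = int(config.get("version_id", 0)) + 1
    with open(config_path, "w") as fh:
        json.dump(config, fh)
    return config["version_id"]


print(versioned_url("homepage.css", 593))  # homepage.css?version=593
```

Templates then call versioned_url with the configured value instead of hard-coding the number, and the deployment script calls bump_version once per release.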

 

What we described above is applicable to most web applications out there. For more advanced scenarios, it might be worth checking out CloudFront’s advanced features for dynamic content, which promise to accelerate delivery of dynamic and personalized content as well.



About the Author:

Andreas Chatzakis

Andreas is the CTO and co-founder of Spitogatos.gr / HomeGreekHome.com (a high traffic real estate portal in Greece). His background includes 5 years of consulting @ Accenture NL and he is the organizer of Greece’s AWS Usergroup.

Keywords: Amazon AWS elastic cloud services, Content Delivery Network, Best Practice, CDN, CloudFront, ElastiCache, S3 Storage, Elastic Load Balancer, Caching.

There are 2 comments.

Gianfranco Palumbo —

Is it really that useful to hold the images, css and js on S3 buckets instead of the webserver if they are being cached with CloudFront?

It seems difficult to manage new versions of .css and .js files with CloudFront, as you have to manually invalidate these objects.

    Andreas Chatzakis —

    Sure, the above is just an example. With CloudFront you can define whatever origin is more convenient, and you could indeed continue storing css/js files on origin servers instead of S3 if you prefer. And if your files have long expiry values, your web servers would not receive significant load for serving those files either way.

    S3 still has several benefits (e.g. high availability & data durability built in, no need to deploy the same files on multiple web servers etc.), so many people choose to use it in the above use case (and not only for user-uploaded content).

    As for the invalidation, you are right. This is a challenge you face with any type of caching on any part of your stack (memcache, reverse caching proxy, CDN, browser cache etc.) any time you need to invalidate a cached entry before its expiration.

    If you build your app around it though (e.g. make the cache-breaking URL parameter a configurable variable), you can semi or fully automate this as part of your deployment process or script.
