Better "Origin Request" Function For Cloudfront Hosted Static Pages

Published by Karol Majta on 17th Apr 2018

Tiny Endian

Recently I've seen more and more static sites, such as blogs and company landing pages, hosted on S3 behind CloudFront. While the CloudFront + S3 combo is very cost-efficient and, for typical use cases, never buckles under heavy traffic, setting it up correctly is not trivial. If you're using a static site generator like Pelican, Jekyll or Hugo, or any kind of isomorphic single page application, and plan to host it on the AWS S3/CloudFront combo, this might be a read for you.

What comes out of the box?

With an S3 bucket alone, configured to act as an HTTP server, you get a lot of bang for your buck. The server is pretty reliable and lets you set a name for the index document. This makes files like /index.html or /article/some-title/index.html reachable under nice urls like / or /article/some-title/, respectively.

However, the longer you use S3, the more limitations you'll discover, and for some use cases they are simply prohibitive. Firstly, you cannot serve your content over HTTPS from an S3 website endpoint. While this might be just fine for tech-savvy users, your customers might be put off by the lack of the "green lock", which does nothing to boost trust in your brand. On top of that, if your site starts receiving traffic from around the world, users may experience significant latency, because an S3 bucket physically lives in a single AWS region.

That's when you decide it's time to move to CloudFront.

Overcoming the initial bumps

The first discovery that most people make when moving from S3 to CloudFront is that directory indexes stop working. This case is so common that there is even an official AWS Blog write-up on the topic.

While that article does a great job of explaining how to use Lambda@Edge with CloudFront, it provides only the bare minimum of a sane static site setup: it makes nice urls like /article/some-title/ correctly point to the index.html files they contain.

The static page setup

For most serious static sites, nice canonical urls alone are not enough. They are just one of several things you want in place for an SEO-friendly setup:

  1. Your site should be browsable using "user friendly" urls as well as regular file urls. Resources (e.g. images) and non-index html files should be available only under urls like /some/path/file.png or /some/article.html, but index html files should be accessible both at /some/article/index.html and at the "user friendly" /some/article/.

    This is the only point that is covered in the official AWS tutorial.

  2. Even though you publish only canonical urls ending with a slash, like /my/article/, it is highly likely that web crawlers or users sharing links will start using urls of the form /my/article. It would be a shame if such traffic hit 404 Not Found errors instead of reaching your content. To sum up:

    Urls like /some/example/path should return an HTTP 301 (Moved Permanently) status code with the Location header set to /some/example/path/.

  3. Sooner or later most static sites need to handle query parameters, be it for analytics or for affiliate/campaign tracking. While CloudFront's cache ignores query parameters by default (a request for /example?query=123 reaches the origin as /example), it is possible to write the Lambda@Edge origin request function so that the redirect from /example?query=123 correctly carries the query parameters over to /example/?query=123.
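To make the three rules concrete, they can be condensed into a single decision function. This is only an illustrative sketch: `decide` is a hypothetical name that does not appear in the gist, and it assumes the path and the query string arrive separately (as `uri` and `querystring` do on a Lambda@Edge request), with the query string given without its leading `?`.

```javascript
// Hypothetical helper (not part of the gist): what should a request for `uri`
// turn into? `querystring` is passed without its leading '?'.
// Assumption: a path whose last segment contains a dot is a file.
function decide(uri, querystring = '') {
  const isFile = /\/[^/]+\.[^/]+$/.test(uri);

  // Rules 2 and 3: extensionless paths without a trailing slash get a
  // permanent redirect to the slash-terminated url, query string included.
  if (!isFile && !uri.endsWith('/')) {
    const location = querystring ? `${uri}/?${querystring}` : `${uri}/`;
    return { action: 'redirect', status: 301, location };
  }

  // Rule 1: directory urls are served from their index.html, files as-is.
  const key = uri.endsWith('/') ? `${uri}index.html` : uri;
  return { action: 'serve', key };
}

decide('/some/path/file.png');      // → { action: 'serve', key: '/some/path/file.png' }
decide('/some/article/');           // → { action: 'serve', key: '/some/article/index.html' }
decide('/some/article/index.html'); // → { action: 'serve', key: '/some/article/index.html' }
decide('/some/example/path');       // → { action: 'redirect', status: 301, location: '/some/example/path/' }
decide('/example', 'query=123');    // → { action: 'redirect', status: 301, location: '/example/?query=123' }
```

Writing the rules down as a pure function like this makes it easy to check them against a list of urls before touching any AWS configuration.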

The proper "Origin Request" Lambda

Before we dissect the lambda code in depth, here it is in full for the more impatient readers. It is also available at https://gist.github.com/karolmajta/6aad229b415be43f5e0ec519b144c26e:

'use strict';

const pointsToFile = uri => /\/[^/]+\.[^/]+$/.test(uri);

exports.handler = (event, context, callback) => {

  // Extract the request from the CloudFront event that is sent to Lambda@Edge
  const request = event.Records[0].cf.request;

  // Extract the URI from the request
  const oldUri = request.uri;
  let newUri;

  if (!pointsToFile(oldUri) && !oldUri.endsWith('/')) {
    // Build the canonical url with a trailing slash, preserving the query string
    newUri = request.querystring ? `${oldUri}/?${request.querystring}` : `${oldUri}/`;
    return callback(null, {
      body: '',
      status: '301',
      statusDescription: 'Moved Permanently',
      headers: {
        location: [{
          key: 'Location',
          value: newUri,
        }],
      }
    });
  } else {
    newUri = oldUri;
  }

  // Match any '/' that occurs at the end of a URI. Replace it with a default index
  newUri = newUri.replace(/\/$/, '/index.html');

  // Log the URI as received by CloudFront and the new URI to be used to fetch from origin
  console.log("Old URI: " + oldUri);
  console.log("New URI: " + newUri);

  // Replace the received URI with the URI that includes the index page
  request.uri = newUri;

  // Return to CloudFront
  return callback(null, request);

};
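If you want to poke at this logic locally before deploying it, the sketch below restates it as an async handler and feeds it a hand-built event. Both choices are assumptions, not part of the gist: the async style assumes a Node.js runtime that accepts async handlers (Node.js 8.10 or later), and `fakeEvent` is a hypothetical helper that builds only the fields the handler reads; real origin-request events contain many more.

```javascript
'use strict';

// Same heuristic as the gist: a last path segment containing a dot is a file.
const pointsToFile = uri => /\/[^/]+\.[^/]+$/.test(uri);

// Sketch of the same origin-request logic as an async handler. In a deployed
// function this would be exported as `exports.handler`.
const handler = async (event) => {
  const request = event.Records[0].cf.request;
  const uri = request.uri;

  if (!pointsToFile(uri) && !uri.endsWith('/')) {
    const location = request.querystring ? `${uri}/?${request.querystring}` : `${uri}/`;
    // Returning a response object makes CloudFront answer without hitting the origin.
    return {
      status: '301',
      statusDescription: 'Moved Permanently',
      headers: { location: [{ key: 'Location', value: location }] },
    };
  }

  // Directory urls are fetched from their index.html in the origin.
  request.uri = uri.replace(/\/$/, '/index.html');
  return request;
};

// Hypothetical, heavily trimmed origin-request event.
const fakeEvent = (uri, querystring = '') => ({
  Records: [{ cf: { request: { uri, querystring, method: 'GET', headers: {} } } }],
});

handler(fakeEvent('/my/article', 'utm_source=news')).then(res => {
  console.log(res.status, res.headers.location[0].value); // 301 /my/article/?utm_source=news
});
handler(fakeEvent('/my/article/')).then(req => {
  console.log(req.uri); // /my/article/index.html
});
```

Returning either a response or the (possibly modified) request is what lets one function both redirect and rewrite.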

Dissecting the code

First we define a function that tests whether a given path points to a file. It's a bit rudimentary: it assumes that any path whose last segment has an extension is a file, and that everything else is a directory. This is a simplification we can afford in most use cases:

const pointsToFile = uri => /\/[^/]+\.[^/]+$/.test(uri);
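To get a feel for that trade-off, here is how the heuristic classifies a few made-up sample paths. Note the two misclassifications: a directory whose name contains a dot, such as a version number, is treated as a file (so it is neither redirected nor mapped to its index.html), and a dot-file with no further dot in its name is treated as a directory (so it gets redirected to a trailing-slash url).

```javascript
// The file heuristic from the handler, unchanged.
const pointsToFile = uri => /\/[^/]+\.[^/]+$/.test(uri);

pointsToFile('/some/path/file.png'); // → true: last segment has an extension
pointsToFile('/some/article.html');  // → true
pointsToFile('/some/article/');      // → false: trailing slash
pointsToFile('/some/article');       // → false: no extension, will be redirected
pointsToFile('/releases/v1.2');      // → true: a directory misread as a file
pointsToFile('/.htaccess');          // → false: a dot-file misread as a directory
```

If your site has paths like these, a stricter check (for example, a whitelist of known extensions) may be worth the extra code.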

Inside the handler itself, we detect whether the path points to a file. If it does, or if the path ends with a trailing slash, we leave it alone. Otherwise (for a path like /example, requested as /example?q=123) we issue a redirect to the canonical path ending with a slash, remembering to carry over the query parameters:

const oldUri = request.uri;
let newUri;

if (!pointsToFile(oldUri) && !oldUri.endsWith('/')) {
  // Build the canonical url with a trailing slash, preserving the query string
  newUri = request.querystring ? `${oldUri}/?${request.querystring}` : `${oldUri}/`;
  return callback(null, {
    body: '',
    status: '301',
    statusDescription: 'Moved Permanently',
    headers: {
      location: [{
        key: 'Location',
        value: newUri,
      }],
    }
  });
} else {
  newUri = oldUri;
}

Last but not least, we repeat the work shown in the original AWS tutorial, making our distribution behave as if it were a standard HTTP server that returns the index.html file as the directory index.

// Match any '/' that occurs at the end of a URI. Replace it with a default index
newUri = newUri.replace(/\/$/, '/index.html');

// Replace the received URI with the URI that includes the index page
request.uri = newUri;
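Because only a trailing slash is rewritten, requests that already name a file pass through untouched, which is what keeps both /some/article/ and /some/article/index.html working. The one-liner is pulled out below under the hypothetical name `toOriginUri` to show its effect on a few paths.

```javascript
// The index rewrite from the handler: only a trailing '/' is replaced.
const toOriginUri = uri => uri.replace(/\/$/, '/index.html');

toOriginUri('/');                              // → '/index.html'
toOriginUri('/article/some-title/');           // → '/article/some-title/index.html'
toOriginUri('/article/some-title/index.html'); // → '/article/some-title/index.html'
toOriginUri('/img/logo.png');                  // → '/img/logo.png'
```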

Other considerations

Sometimes request.querystring might not be present on your CloudFront request object. For the redirect to preserve query parameters, you need to set your distribution's Query String Forwarding and Caching option (in the Behaviors tab) to either "Forward all, cache based on all" or "Forward all, cache based on whitelist". The latter is a bit more efficient, but which one to use is a judgement call; whichever you pick, it should not contribute much to cost.
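To see why this setting matters, compare the redirect target the handler computes for the same url with and without a query string on the request. The sketch assumes that when query strings are not forwarded, `querystring` on the origin-request event is either empty or missing; `redirectLocation` is a hypothetical helper that mirrors the handler's ternary.

```javascript
// Hypothetical helper reproducing the handler's redirect target.
const redirectLocation = request =>
  request.querystring ? `${request.uri}/?${request.querystring}` : `${request.uri}/`;

// Query string forwarded to the origin: campaign parameters survive the redirect.
redirectLocation({ uri: '/promo', querystring: 'utm_campaign=spring' }); // → '/promo/?utm_campaign=spring'

// Assumed shapes when query strings are not forwarded: the parameters are silently lost.
redirectLocation({ uri: '/promo', querystring: '' }); // → '/promo/'
redirectLocation({ uri: '/promo' });                  // → '/promo/'
```

Nothing fails loudly in the second case, which is why it is worth checking the distribution settings rather than relying on the lambda alone.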

Wrapping things up

The described lambda will give you a good SEO kickstart when hosting a static site on CloudFront. Of course, it can be improved further to meet more specific needs (if you have any ideas, please give us a shout @TinyEndianLtd or just fork our gist). Just remember to set up your CloudFront query parameter caching correctly and evolve your setup from there.