Presto! Move content to S3 with no code changes

May 31st, 2008 by Ryan

Our initial version of Photosleeve stored the full-resolution images locally on our server. Clearly this was a temporary measure, and we’re happy to announce that we’ve moved things to Amazon S3 now. But we did it without changing any of our existing back-end code, which I think is kind of interesting.

We had anticipated the move to S3, so storage was appropriately abstracted in our codebase. My original intention was to swap out “FileStorage” for “S3Storage” and be done. But as I read about S3, I saw that it was important to plan for potential periods of unresponsiveness. For example, the Net::Amazon::S3 CPAN module recommends the use of the “retry” parameter, which will use exponential backoff in the event Amazon cannot be contacted.

Well, my customer just spent several minutes uploading his multi-megabyte full-resolution original image to my server. I don’t want to leave him hanging while I desperately wait for Amazon S3 to respond.

The solution was to leave the back-end code alone. It continued to stash the files someplace locally that our webserver could serve them out as static content. Instead, I wrote a perl daemon that watched the location the back-end dropped the files, and every so often pushed the files up to S3. Only when it was certain the files had been properly transmitted to S3, the daemon would delete the local copies (ok, actually it archived them to another offline location because we’re paranoid and didn’t want to mess up anybody’s photos).

So now the trick was getting our existing “original photo” URLs to serve out local content if available or redirect to S3 if it wasn’t. Well, that should be easy, I just need to find the blog of a rewrite rule wizard, and … Oh, wait. We use lighttpd.

We’re big admirers of lighttpd. With almost no tweaking it handles incredible amounts of traffic with almost no load. Maybe you can get Apache to do that, but we don’t know how and probably don’t have the time to figure it out. With this problem, though, I know Apache’s mod_rewrite would be an easy fix. Well, as easily as Apache rewrite rules ever are, I mean. With lighttpd, we clearly had support for redirects, but we couldn’t express the kind of conditional that we needed to redirect only if the file didn’t exist locally.

Enter mod_magnet. With it and LUA, we were able to write an extremely simple script that does exactly what we want. And — bonus! — I bet just about anybody can understand how it works. (I know rewrite rules are powerful arcane magic, worth learning, but I’ve never found the time and find the syntax completely impenetrable.)

-- /etc/lighttpd/s3.lua (Sample Code)
local filename = lighty.env["physical.path"]
local stat = lighty.stat( filename )
if not stat then
    local static_name = string.match( filename, "static/([^/]+)$" )
    lighty.header["Location"] = "" .. static_name
    return 302

Get the filename, ok. Stat the file, sure. If it’s not there, capture a regex match group, and set the location header. Return a 302. Wow. It’s not all on one line, but I sure understand how it works.

Now we just have to hook it up to lighttpd. This does require that LUA support is compiled in. Run `lighttpd -V` and make sure you have the line “+ LUA support”. Kevin Worthington has built lighttpd 1.5.0 r1992 RPMs that have LUA/mod_magnet support compiled in.

# /etc/lighttpd/lighttpd.conf (Sample Code)
server.modules += ( "mod_magnet" )
$HTTP["url"] =~ "^/static/" {
    server.document-root = var.photosleeve-static
    $HTTP["url"] =~ "^/static/[^/]+[.]jpg([?].*)?$" {
        magnet.attract-physical-path-to = ( "/etc/lighttpd/s3.lua" )

I specifically only run the LUA code for the precise sort of URLs that I might want to redirect. That should reduce overhead in general. As far as having the redirects in the first place, I don’t think a little less responsiveness is an issue when you’re going to download a multi-megabyte file. And coming through my server also gives me an opportunity to see the request before Amazon. Perhaps later I’ll want to be smart and cache some of the data locally based on traffic trends. Or I could add access control mechanisms (in which case the redirect would contain a signed S3 request). So many cool possibilities. And in the meantime, lighttpd handles the request without bothering my back-end perl processes.

So that’s it. Now our back-end works as it always has, dropping the content locally and generating URLs back to ourselves. But when it’s not looking a sneaky little daemon shifts things around, and the webserver takes care of hiding the mess.

Huh? What does the perl daemon look like? Fodder for another post, I think.

Tags: , , , ,