Simple maintenance mode scripts for lighttpd

June 10th, 2008 by Ryan

We recently switched to using lighttpd 1.5. Under lighttpd 1.4, we had a custom 500 page configured for our “maintenance mode”. We’d just take down the fastcgi daemon and if there were any requests while it was down, lighttpd would stop trying to talk to it for 5 seconds and instead serve out our maintenance page. Seemed ok.

Well, with lighttpd 1.5, it doesn’t try to talk to the fastcgi backend again for 60 seconds, and instead of serving back an error it would just leave the socket open, so the user’s browser would essentially hang. Not good. As a result, we wrote a script to swap out our live fastcgi process without dropping any requests (for hot swapping), and we also came up with a real “maintenance mode” for lighttpd (for real downtime like a complicated DB schema upgrade). I’ll share the fastcgi hot-swap script in a future post. Today I’ll discuss our lighttpd maintenance mode.

Our scheme doesn’t require lighttpd 1.5, but it does require that lighttpd be built with LUA support. If you do `lighttpd -V` you should see a line like ‘+ LUA support’. Kevin Worthington has built lighttpd 1.5.0 r1992 RPMs that have LUA/mod_magnet support compiled in.
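If you'd like the up/down tooling to refuse to run on a build without Lua, that check can be scripted. This is a minimal sketch. The function name is hypothetical, and it assumes `lighttpd -V` prints its features in the `+ LUA support` line format quoted above.

```shell
#!/bin/bash
# has_lua_support (hypothetical name): read `lighttpd -V` output on stdin
# and succeed only if it contains a "+ LUA support" line. A build without
# Lua should list the feature with "-" instead of "+".
has_lua_support() {
    grep -Eq '^[[:space:]]*\+ LUA support'
}
```

Usage: `lighttpd -V | has_lua_support || echo "lighttpd was built without LUA support" >&2`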

We got the idea to use mod_magnet like this from John Leach’s blog post on maintenance pages, status codes and lighttpd, but we removed all logic from the LUA script and added a workaround for a lighttpd bug.

The interface to our maintenance functionality is going to be two shell scripts. Turn on maintenance mode with /etc/lighttpd/down and go back live with /etc/lighttpd/up. This simple interface is easy to use, and it abstracts away the exact method of turning things on and off so if we decide to change things later (say, touch and rm a special file or something), we can make the changes in one place.

In lighttpd.conf, we need to load mod_magnet and make sure that our maint.lua script runs for all requests.

# /etc/lighttpd/lighttpd.conf (Sample Code)
server.modules += ( "mod_magnet" )
magnet.attract-raw-url-to = ( "/etc/lighttpd/maint.lua" )

Now I make the maint.lua.up file. It does nothing. We could instead have our script run logic to determine whether or not to serve out the maintenance page, but I don’t really want LUA code running on every request if I can help it. And since I can’t claim to properly know LUA anyway, I want to keep things as simple as possible.

-- /etc/lighttpd/maint.lua.up (Sample Code)
-- This file is deliberately empty.

Now for maint.lua.down. It just serves out /etc/lighttpd/maint.html. Simple, huh? Well, actually, we need to work around lighttpd ticket #1420, so we have that hacky div in there. Oh, and I throw in a header to make it easy to distinguish this planned 503 from other kinds of wild, unexpected 503s.

-- /etc/lighttpd/maint.lua.down (Sample Code)
lighty.header["X-Maintenance-Mode"] = "1"
lighty.content = {
    { filename = "/etc/lighttpd/maint.html" },
    "<div style=\"display:none\">",
    "<!-- work around lighttpd ticket 1420 -->"
}
return 503

As for /etc/lighttpd/maint.html, put whatever you want in there. That’s your maintenance page.
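The X-Maintenance-Mode header also makes it easy to confirm from the command line that a 503 is the planned one. Here is a sketch, assuming curl and awk are available. The function name is hypothetical.

```shell
#!/bin/bash
# is_planned_maintenance (hypothetical name): read raw HTTP response headers
# on stdin (for example from `curl -sI URL`) and succeed only when the status
# is 503 AND the X-Maintenance-Mode: 1 header set by maint.lua.down is present.
is_planned_maintenance() {
    awk '
        { sub(/\r$/, "") }                      # header lines end in CRLF
        /^HTTP\// { status = $2 }               # last status line wins
        tolower($0) ~ /^x-maintenance-mode:[ \t]*1[ \t]*$/ { marker = 1 }
        END { exit !(status == 503 && marker) }
    '
}
```

Usage (the URL is just an example): `curl -sI http://localhost/ | is_planned_maintenance && echo "planned maintenance"`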

Now we just need two super simple scripts to swap things out:

#!/bin/bash
# /etc/lighttpd/up (Sample Code)
cp /etc/lighttpd/maint.lua.up /etc/lighttpd/maint.lua
sleep 8

#!/bin/bash
# /etc/lighttpd/down (Sample Code)
cp /etc/lighttpd/maint.lua.down /etc/lighttpd/maint.lua
sleep 8
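One possible refinement: `cp` rewrites maint.lua in place, so in principle a request could arrive while the file is only partly written. Copying to a temporary file in the same directory and then using `mv` to put it in place turns the final step into a single rename. This is a sketch of that variant, not the scripts we actually run. `switch_maint`, `LIGHTTPD_DIR` and `MAINT_SETTLE_SECONDS` are hypothetical names, and their defaults match the scripts above.

```shell
#!/bin/bash
# Hypothetical shared helper for /etc/lighttpd/up and /etc/lighttpd/down.
LIGHTTPD_DIR="${LIGHTTPD_DIR:-/etc/lighttpd}"
MAINT_SETTLE_SECONDS="${MAINT_SETTLE_SECONDS:-8}"

# switch_maint up|down: install maint.lua.<mode> as maint.lua.
switch_maint() {
    local mode="${1:-}"
    case "$mode" in
        up|down) ;;
        *) echo "usage: switch_maint up|down" >&2; return 2 ;;
    esac
    # Stage the copy next to the target so the final mv is a same-filesystem
    # rename, which swaps maint.lua in one step. Plain cp (no -p) also gives
    # the new file a fresh modification time.
    local tmp="$LIGHTTPD_DIR/.maint.lua.tmp.$$"
    cp "$LIGHTTPD_DIR/maint.lua.$mode" "$tmp" || { rm -f "$tmp"; return 1; }
    mv -f "$tmp" "$LIGHTTPD_DIR/maint.lua" || { rm -f "$tmp"; return 1; }
    # Same settling delay as the original scripts.
    sleep "$MAINT_SETTLE_SECONDS"
}
```

With this helper, /etc/lighttpd/down would source the file and call `switch_maint down`. The up script would do the same with `switch_maint up`.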

Why ‘sleep 8’? Well, since we’re going to want to do things like call /etc/lighttpd/down just before bringing down the backend, we want real assurance that nobody is going to get the unresponsive-browser behavior. We did some tests, and it seemed to take about 8 seconds for all the relevant caches to flush so that we consistently got back a 503 from the server. I imagine the right number will be different for other setups.
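Instead of hardcoding a measured delay, you could poll until the server actually answers with the 503. This sketch assumes curl is installed and that one probe URL is representative. The function names are hypothetical. One successful probe doesn't prove every cache has flushed, so keeping a short sleep afterwards may still be wise.

```shell
#!/bin/bash
# http_status URL: print the status code curl gets for URL ("000" if none).
# -m 5 caps each probe so a hung connection cannot stall the loop.
http_status() {
    curl -s -o /dev/null -m 5 -w '%{http_code}' "$1"
}

# wait_for_status URL EXPECTED TIMEOUT_SECONDS (hypothetical helper)
# Returns 0 as soon as URL answers EXPECTED, or 1 after TIMEOUT_SECONDS
# one-second sleeps without a match (probe time is not counted).
wait_for_status() {
    local url="$1" expected="$2" timeout="$3" waited=0
    while true; do
        if [[ "$(http_status "$url")" == "$expected" ]]; then
            return 0
        fi
        if (( waited >= timeout )); then
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
}
```

In /etc/lighttpd/down, the `sleep 8` could then become something like `wait_for_status http://localhost/ 503 30`, where the URL and the 30-second limit are only examples.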

So that’s it. I hope it’s disappointingly (or perhaps refreshingly) simple. Two simple shell scripts, an exceedingly simple LUA script, and no LUA code to run unless you’re in maintenance mode. The only overhead here is that lighttpd will stat the LUA file occasionally, but it’s good at doing that unobtrusively.

Presto! Move content to S3 with no code changes

May 31st, 2008 by Ryan

Our initial version of Photosleeve stored the full-resolution images locally on our server. Clearly this was a temporary measure, and we’re happy to announce that we’ve moved things to Amazon S3 now. But we did it without changing any of our existing back-end code, which I think is kind of interesting.

We had anticipated the move to S3, so storage was appropriately abstracted in our codebase. My original intention was to swap out “FileStorage” for “S3Storage” and be done. But as I read about S3, I saw that it was important to plan for potential periods of unresponsiveness. For example, the Net::Amazon::S3 CPAN module recommends the use of the “retry” parameter, which will use exponential backoff in the event Amazon cannot be contacted.

Well, my customer just spent several minutes uploading his multi-megabyte full-resolution original image to my server. I don’t want to leave him hanging while I desperately wait for Amazon S3 to respond.
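Exponential backoff itself is simple: try, wait, double the wait, try again. Here is a minimal shell sketch of the idea. It is not Net::Amazon::S3's implementation. The function name and parameters are hypothetical, and delays are whole seconds because bash arithmetic is integer-only. A background job can afford this kind of patience in a way a customer's HTTP request can't.

```shell
#!/bin/bash
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling
# the wait after each failure (BASE, 2*BASE, 4*BASE, ...).
retry_with_backoff() {
    local max="$1" delay="$2" attempt=1
    shift 2
    while true; do
        "$@" && return 0
        if (( attempt >= max )); then
            return 1
        fi
        sleep "$delay"
        delay=$(( delay * 2 ))
        attempt=$(( attempt + 1 ))
    done
}
```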

The solution was to leave the back-end code alone. It continued to stash the files someplace local from which our webserver could serve them as static content. Meanwhile, I wrote a perl daemon that watched the directory where the back-end dropped the files and periodically pushed them up to S3. Only when it was certain the files had been properly transmitted to S3 would the daemon delete the local copies (ok, actually it archived them to another offline location because we’re paranoid and didn’t want to mess up anybody’s photos).
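The daemon itself is saved for another post, but its core safety rule fits in a few lines: move the local copy aside only after a confirmed upload. This sketch is written in shell rather than Perl. `push_pending` is a hypothetical name, and `upload_to_s3` is a hypothetical hook you would supply, returning 0 only when the file is known to be safely on S3. The sketch also assumes the back-end finishes writing each file before it appears in the directory (for example by writing elsewhere and renaming it in).

```shell
#!/bin/bash
# push_pending INCOMING_DIR ARCHIVE_DIR
# One pass over the drop directory: upload each .jpg, and only on success
# move it out of the web-served directory into the archive.
push_pending() {
    local incoming="$1" archive="$2" f
    for f in "$incoming"/*.jpg; do
        [[ -e "$f" ]] || continue           # glob matched nothing
        if upload_to_s3 "$f"; then          # hypothetical hook
            mv -- "$f" "$archive"/          # archived, not deleted
        else
            echo "upload failed, will retry next pass: $f" >&2
        fi
    done
}
```

A daemon would call this in a loop with a sleep between passes. Once a file leaves the local directory, the webserver's fallback to S3 takes over.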

So now the trick was getting our existing “original photo” URLs to serve out local content if available or redirect to S3 if it wasn’t. Well, that should be easy, I just need to find the blog of a rewrite rule wizard, and … Oh, wait. We use lighttpd.

We’re big admirers of lighttpd. With almost no tweaking it handles incredible amounts of traffic with almost no load. Maybe you can get Apache to do that, but we don’t know how and probably don’t have the time to figure it out. With this problem, though, I know Apache’s mod_rewrite would be an easy fix. Well, as easy as Apache rewrite rules ever are, I mean. With lighttpd, we clearly had support for redirects, but we couldn’t express the kind of conditional we needed: redirect only if the file didn’t exist locally.

Enter mod_magnet. With it and LUA, we were able to write an extremely simple script that does exactly what we want. And — bonus! — I bet just about anybody can understand how it works. (I know rewrite rules are powerful arcane magic, worth learning, but I’ve never found the time and find the syntax completely impenetrable.)

-- /etc/lighttpd/s3.lua (Sample Code)
local filename = lighty.env["physical.path"]
local stat = lighty.stat( filename )
if not stat then
    local static_name = string.match( filename, "static/([^/]+)$" )
    lighty.header["Location"] = "http://s3.photosleeve.com/original/" .. static_name
    return 302
end

Get the filename, ok. Stat the file, sure. If it’s not there, capture a regex match group, and set the location header. Return a 302. Wow. It’s not all on one line, but I sure understand how it works.
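The same decision is easy to mirror in shell, which can help when you want to check from the command line which originals have already moved to S3. This is only a sketch. The function name is hypothetical, `-e` stands in for `lighty.stat`, and the S3 host and `static/` layout are copied from the Lua above.

```shell
#!/bin/bash
# redirect_target PHYSICAL_PATH (hypothetical helper)
# Prints nothing if the file is still on local disk (lighttpd serves it);
# otherwise prints the URL that s3.lua would put in its Location header.
redirect_target() {
    local path="$1" re='static/([^/]+)$'
    [[ -e "$path" ]] && return 0
    if [[ "$path" =~ $re ]]; then
        printf 'http://s3.photosleeve.com/original/%s\n' "${BASH_REMATCH[1]}"
    fi
}
```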

Now we just have to hook it up to lighttpd. This does require that LUA support is compiled in. Run `lighttpd -V` and make sure you have the line “+ LUA support”. Kevin Worthington has built lighttpd 1.5.0 r1992 RPMs that have LUA/mod_magnet support compiled in.

# /etc/lighttpd/lighttpd.conf (Sample Code)
server.modules += ( "mod_magnet" )
$HTTP["url"] =~ "^/static/" {
    server.document-root = var.photosleeve-static
    $HTTP["url"] =~ "^/static/[^/]+[.]jpg([?].*)?$" {
        magnet.attract-physical-path-to = ( "/etc/lighttpd/s3.lua" )
    }
}

I specifically only run the LUA code for the precise sort of URLs that I might want to redirect, which should reduce overhead in general. As for the cost of the redirect itself, I don’t think a little less responsiveness is an issue when you’re about to download a multi-megabyte file. And routing requests through my server also gives me a chance to see each request before Amazon does. Perhaps later I’ll want to be smart and cache some of the data locally based on traffic trends. Or I could add access control (in which case the redirect would contain a signed S3 request). So many cool possibilities. And in the meantime, lighttpd handles the request without bothering my back-end perl processes.
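To see exactly which URL strings the inner conditional accepts, its regex can be exercised on its own. This sketch assumes bash's extended regular expressions treat this particular pattern the same way lighttpd's PCRE matching does, which seems likely since it uses no PCRE-only syntax. The function name is hypothetical.

```shell
#!/bin/bash
# uses_s3_lua URL (hypothetical helper): succeed if URL matches the inner
# $HTTP["url"] condition from lighttpd.conf, i.e. if s3.lua would run.
uses_s3_lua() {
    local re='^/static/[^/]+[.]jpg([?].*)?$'   # copied from the config
    [[ "$1" =~ $re ]]
}
```

Top-level .jpg files under /static/ match, with or without a query string. Subdirectories, other extensions and other paths never reach the Lua script.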

So that’s it. Now our back-end works as it always has, dropping the content locally and generating URLs back to ourselves. But when it’s not looking, a sneaky little daemon shifts things around, and the webserver takes care of hiding the mess.

Huh? What does the perl daemon look like? Fodder for another post, I think.