Django, lighttpd and FCGI, second take


27. July 2005

In my first take on this topic I gave an example of how to run Django projects behind lighttpd with simple FCGI scripts integrated into the server. Here I will elaborate a bit, with a way to combine lighttpd and Django that gives much more flexibility in distributing Django applications over several machines. This is especially important if you expect high loads on your servers. Of course you should use the Django caching middleware, but there are times when even that is not enough and the only solution is to throw more hardware at the problem.

Update: I now maintain my descriptions in my Trac system. See the lighty+FCGI description for Django there.

Caveat: since Django is very new software, I don't have production experience with it. So this is written more from a theoretical standpoint, incorporating knowledge I gained running production systems for several larger portals. In the end it doesn't matter much what your software is - it only matters how you can distribute it over your server farm.

To follow this documentation, you will need the following packages and files installed on your system:

[Django][2] itself - currently fetched from SVN. Follow the setup instructions or use python setup.py install.
[Flup][3] - a package of different ways to run WSGI applications. I use the threaded WSGIServer in this documentation.
[lighttpd][4] itself, of course. You need at least the fastcgi, rewrite and accesslog modules; usually they are compiled in by default.
[Eunuchs][5] - only needed if you are using Python 2.3, because Flup uses socketpair in the preforked servers, and that is only available starting with Python 2.4.
[django-fcgi.py][6] - my FCGI server script. It might some day become part of the Django distribution, but for now just fetch it here. Put this script somewhere in your $PATH, for example /usr/local/bin, and make it executable.
If the above doesn't work for any reason (maybe your system doesn't support socketpair and so can't use the preforked server), you can fetch [django-fcgi-threaded.py][7] - an alternative that uses the threading server with all its problems. I use it on Mac OS X for development, for example.
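Choosing between the two scripts boils down to whether the platform offers fork() and socketpair(). The following is a small Python 3 sketch of that decision. The helper name pick_flup_server is hypothetical. It only returns the name of the flup module, flup.server.fcgi_fork for the preforked server or flup.server.fcgi for the threaded one, and does not import flup.

```python
import os
import socket


def pick_flup_server():
    """Return the name of the flup FastCGI server module to use (hypothetical helper).

    The preforked server (flup.server.fcgi_fork) needs os.fork() and
    socket.socketpair(); socketpair() entered the standard library with
    Python 2.4 (eunuchs filled the gap on 2.3). If either is missing,
    fall back to the threaded server (flup.server.fcgi).
    """
    has_fork = hasattr(os, "fork")
    has_socketpair = hasattr(socket, "socketpair")
    if has_fork and has_socketpair:
        return "flup.server.fcgi_fork"
    return "flup.server.fcgi"


if __name__ == "__main__":
    print(pick_flup_server())
```

On Linux or BSD this picks the preforked module. On Windows, which has no os.fork(), it falls back to the threaded one.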

Before we start, let's talk a bit about server architecture, Python and heavy load. The still-preferred installation of Django is behind Apache2 with mod_python. mod_python is a quite powerful extension to Apache that integrates a full Python interpreter (or even many interpreters with separate namespaces) into the Apache process. This allows Python to control many aspects of the server. But it has a drawback: if the only job is to pass requests from users on to the application, it's overkill. Every Apache process or thread will incorporate a full Python interpreter with stack, heap and all loaded modules, so Apache processes get a bit fat that way.

Another drawback: Apache is one of the most flexible servers out there, but it's a resource hog when compared to small servers like lighttpd. And, due to the architecture of Apache modules, mod_python will run the full application in the security context of the web server. Neither is something you usually want in a production environment.

So a natural approach is to use lighter HTTP servers and put your application behind those - using the HTTP server itself only for serving media, and using FastCGI to pass requests from the user on to your application. Sometimes you put that small HTTP server behind an Apache front end that only uses mod_proxy (either directly or via mod_rewrite) to proxy requests to your application's web server - and believe it or not, this is actually a lot faster than serving the application with Apache directly!

The second pitfall is Python itself. Python has a quite nice threading library, so it would seem ideal to build your application as a threaded server, because threads use far fewer resources than processes. But this will bite you because of one special feature of Python: the GIL, the dreaded global interpreter lock. Only one thread can execute Python bytecode at any time, so a threaded server can't spread its Python work over several processors. And a C extension that doesn't release the lock while it waits blocks every other thread until its call returns. Too bad that almost all DB-API libraries use at least some database client code written as a C extension - with such a driver you start a SQL command and threading is effectively disabled until the call returns. No multiple queries running ...

So the better option is to use a forking server: every process has its own interpreter and its own GIL, so the lock never serializes work across processes. This allows a forking server to make efficient use of multiple processors in your machine - and so be much faster in the long run, despite the overhead of processes vs. threads.
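To illustrate the idea behind a forking server, here is a minimal Python 3 sketch for Unix only, since it relies on os.fork(). Each CPU-bound job runs in its own child process, so each child has its own GIL, and the result travels back through a pipe. This is not how flup's preforked server is implemented; it only shows the underlying principle. The names cpu_bound and run_forked are made up for this example.

```python
import os


def cpu_bound(n):
    """Pure-Python busy work: in threads, the GIL lets only one run at a time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_forked(jobs):
    """Run each job in its own child process (Unix only, needs os.fork).

    Every child has its own interpreter and its own GIL, so CPU-bound
    Python code in different children can run on different processors.
    Each child sends its result back to the parent through a pipe.
    """
    children = []
    for n in jobs:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:  # child process
            try:
                os.close(read_fd)
                os.write(write_fd, str(cpu_bound(n)).encode())
            finally:
                os._exit(0)  # never fall back into the parent's code
        os.close(write_fd)  # parent keeps only the read end
        children.append((pid, read_fd))
    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as pipe:
            results.append(int(pipe.read().decode()))
        os.waitpid(pid, 0)  # reap the finished child
    return results


if __name__ == "__main__":
    print(run_forked([1_000_000] * 4))
```

A preforking server applies the same principle, but it starts its children in advance and hands them requests instead of one-off jobs.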

For this documentation I take a three-layer approach to distributing the software. The front is your trusted Apache, which just proxies everything out to your project-specific lighttpd. The lighttpd has access to your project's document root and passes special requests on to your FCGI server. The FCGI server itself can run on a different machine if that's needed for load distribution. It uses a preforked server because of the threading problem in Python, so it can make use of multiprocessor machines.

I won't talk much about the first layer, because you can easily set that up yourself. Just proxy stuff out to the machine where your lighttpd is running (in my case usually the Apache runs on different machines than the applications). Look it up in the mod_proxy documentation, usually it's just ProxyPass and ProxyPassReverse.

The second layer is more interesting. lighttpd is a bit weird in the configuration of FCGI stuff - you need FCGI scripts in the filesystem and need to hook those up to your FCGI server process. The FCGI scripts actually don't need to contain any content - they just need to be in the file system.

So we start with your Django project directory. Just put a directory public_html in there. That's where you put your media files, for example the admin media directory. This directory will be the document root for your project server. Be sure to only put files in there that don't contain private data - private data like configs and modules should stay in places the web server can't access. Next, set up a lighttpd config file. You will only use the rewrite and the fastcgi modules. There's no need to keep an access log, since that one is written by your first layer, your Apache server. In my case the project is in /home/gb/work/myproject - you will need to adapt that to your own situation. Store the following content as /home/gb/work/myproject/lighttpd.conf:

   server.modules = ( "mod_rewrite", "mod_fastcgi" )
   server.document-root = "/home/gb/work/myproject/public_html"
   server.indexfiles = ( "index.html", "index.htm" )
   server.port = 8000
   server.bind = "127.0.0.1"
   server.errorlog = "/home/gb/work/myproject/error.log"

   fastcgi.server = (
     "/main.fcgi" => (
       "main" => (
         "socket" => "/home/gb/work/myproject/main.socket"
       )
     ),
     "/admin.fcgi" => (
       "admin" => (
         "socket" => "/home/gb/work/myproject/admin.socket"
       )
     )
   )

   url.rewrite = (
     "^(/admin/.*)$" => "/admin.fcgi$1",
     "^(/polls/.*)$" => "/main.fcgi$1"
   )

   mimetype.assign = (
     ".pdf"     => "application/pdf",
     ".sig"     => "application/pgp-signature",
     ".spl"     => "application/futuresplash",
     ".class"   => "application/octet-stream",
     ".ps"      => "application/postscript",
     ".torrent" => "application/x-bittorrent",
     ".dvi"     => "application/x-dvi",
     ".gz"      => "application/x-gzip",
     ".pac"     => "application/x-ns-proxy-autoconfig",
     ".swf"     => "application/x-shockwave-flash",
     ".tar.gz"  => "application/x-tgz",
     ".tgz"     => "application/x-tgz",
     ".tar"     => "application/x-tar",
     ".zip"     => "application/zip",
     ".mp3"     => "audio/mpeg",
     ".m3u"     => "audio/x-mpegurl",
     ".wma"     => "audio/x-ms-wma",
     ".wax"     => "audio/x-ms-wax",
     ".ogg"     => "application/ogg",
     ".wav"     => "audio/x-wav",
     ".gif"     => "image/gif",
     ".jpg"     => "image/jpeg",
     ".jpeg"    => "image/jpeg",
     ".png"     => "image/png",
     ".xbm"     => "image/x-xbitmap",
     ".xpm"     => "image/x-xpixmap",
     ".xwd"     => "image/x-xwindowdump",
     ".css"     => "text/css",
     ".html"    => "text/html",
     ".htm"     => "text/html",
     ".js"      => "text/javascript",
     ".asc"     => "text/plain",
     ".c"       => "text/plain",
     ".conf"    => "text/plain",
     ".text"    => "text/plain",
     ".txt"     => "text/plain",
     ".dtd"     => "text/xml",
     ".xml"     => "text/xml",
     ".mpeg"    => "video/mpeg",
     ".mpg"     => "video/mpeg",
     ".mov"     => "video/quicktime",
     ".qt"      => "video/quicktime",
     ".avi"     => "video/x-msvideo",
     ".asf"     => "video/x-ms-asf",
     ".asx"     => "video/x-ms-asf",
     ".wmv"     => "video/x-ms-wmv"
   )
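To see how the rewrite rules and the .fcgi handlers work together, here is a small Python 3 sketch that mimics the routing for a single request. The first matching url.rewrite pattern prepends the handler name. The handler prefix then acts as SCRIPT_NAME, and the rest of the path is passed on as PATH_INFO. This is an approximation of lighttpd's behaviour for illustration only, not its actual code. The function names rewrite and split_fcgi are made up.

```python
import re

# The two rules from the lighttpd.conf above, in the same order.
REWRITE_RULES = [
    (r"^(/admin/.*)$", "/admin.fcgi$1"),
    (r"^(/polls/.*)$", "/main.fcgi$1"),
]
FCGI_HANDLERS = ["/admin.fcgi", "/main.fcgi"]


def rewrite(path):
    """Apply the first matching rewrite rule; return the path unchanged if none match."""
    for pattern, target in REWRITE_RULES:
        match = re.match(pattern, path)
        if match:
            # lighttpd writes $1 for the first capture group
            return target.replace("$1", match.group(1))
    return path


def split_fcgi(path):
    """Split a rewritten path into (SCRIPT_NAME, PATH_INFO) for a known handler."""
    for handler in FCGI_HANDLERS:
        if path == handler or path.startswith(handler + "/"):
            return handler, path[len(handler):]
    return None, path  # not an FCGI request: served from the document root


if __name__ == "__main__":
    for url in ("/admin/polls/", "/polls/1/", "/media/style.css"):
        print(url, "->", split_fcgi(rewrite(url)))
```

The sketch shows why Django still sees the original URL in PATH_INFO: the handler prefix is only a routing key for lighttpd.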

I bind lighttpd only to the localhost interface because in my test setting lighttpd runs on the same host as the Apache server. In multi-server settings you will bind to the public interface of your lighttpd servers, of course. The FCGI servers are reached via Unix domain sockets here, because in this test setting I use only one machine for everything. If your machines were distributed, you would use the "host" and "port" settings instead of the "socket" setting to connect to FCGI servers on other machines. And you would add multiple backend entries next to "main" to distribute the load of the application over several machines. See the lighttpd documentation for the available options.
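For a distributed setup, writing one "host"/"port" entry per backend by hand gets tedious. The following Python 3 sketch renders a fastcgi.server block from a mapping of .fcgi handlers to (host, port) pairs, with several named backends under one handler so lighttpd can spread the load. The helper name fastcgi_server, the backend labels (main1, main2, ...) and the addresses in the usage note are all hypothetical. Each FCGI server would then have to listen on its host and port instead of a Unix socket.

```python
def fastcgi_server(handlers):
    """Render a lighttpd fastcgi.server block (hypothetical helper).

    handlers maps a .fcgi path to a list of (host, port) backends.
    Backends listed under the same .fcgi entry share its requests; the
    labels main1, main2, ... are arbitrary and only need to be unique.
    """
    entries = []
    for fcgi, backends in handlers.items():
        base = fcgi.strip("/").split(".")[0]  # "/main.fcgi" -> "main"
        servers = [
            f'    "{base}{i}" => ( "host" => "{host}", "port" => {port} )'
            for i, (host, port) in enumerate(backends, start=1)
        ]
        entries.append(f'  "{fcgi}" => (\n' + ",\n".join(servers) + "\n  )")
    return "fastcgi.server = (\n" + ",\n".join(entries) + "\n)"


if __name__ == "__main__":
    print(fastcgi_server({
        "/main.fcgi": [("10.0.0.11", 3033), ("10.0.0.12", 3033)],
        "/admin.fcgi": [("10.0.0.11", 3034)],
    }))
```

Here the main application is spread over two machines, while the rarely used admin runs on just one.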

I set up two FCGI servers for this - one for the admin settings and one for the main settings. All applications will be redirected through the main settings FCGI and all admin requests will be routed to the admin server. That's done with the two rewrite rules - you will need to add a rewrite rule for every application you are using.
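Since every application needs its own rewrite rule, you could generate the url.rewrite block from a list of application URL prefixes instead of editing it by hand. This is a minimal Python 3 sketch assuming plain prefixes without regex metacharacters. The helper name rewrite_rules and the "blog" application in the usage note are hypothetical. Admin goes to admin.fcgi, everything else to main.fcgi, as in the config above.

```python
def rewrite_rules(apps):
    """Build a url.rewrite block (hypothetical helper).

    Admin requests go to /admin.fcgi; every application prefix in apps
    goes to /main.fcgi. Prefixes are assumed to be plain words such as
    "polls", without regex metacharacters.
    """
    rules = [("admin", "/admin.fcgi")] + [(app, "/main.fcgi") for app in apps]
    body = ",\n".join(
        f'  "^(/{prefix}/.*)$" => "{target}$1"' for prefix, target in rules
    )
    return "url.rewrite = (\n" + body + "\n)"


if __name__ == "__main__":
    print(rewrite_rules(["polls", "blog"]))
```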

Since lighttpd needs the FCGI scripts to exist in order to pass the PATH_INFO along to the FastCGI server, you will need to touch the following files:

   /home/gb/work/myproject/public_html/admin.fcgi
   /home/gb/work/myproject/public_html/main.fcgi

They don't need to contain any code, they just need to be listed in the directory. Starting with lighttpd 1.3.16 (at the time of this writing only in svn) you will be able to run without the stub files for the .fcgi - you just add "check-local" => "disable" to the two FCGI settings. Then the local files are not needed. So if you want to extend this config file, you just have to keep some very basic rules in mind:

every settings file needs its own .fcgi handler
every .fcgi needs to be touched in the filesystem - this might go away in a future version of lighttpd, but for now it is needed
load distribution is done on .fcgi level - add multiple servers or sockets to distribute the load over several FCGI servers
every application needs a rewrite rule that connects the application with the .fcgi handler
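The second rule, touching every .fcgi stub, is easy to forget when you add handlers. The following Python 3 sketch creates an empty stub in the document root for each handler that appears on the right-hand side of a rewrite rule, the same as running touch on each one. The helper name touch_stubs is hypothetical. It is only needed for lighttpd versions without "check-local" => "disable".

```python
import re
from pathlib import Path


def touch_stubs(document_root, rewrite_targets):
    """Create empty .fcgi stubs for the handlers used in url.rewrite (hypothetical helper).

    rewrite_targets are the right-hand sides of the rewrite rules, such as
    "/admin.fcgi$1". Returns the sorted list of stub paths.
    """
    root = Path(document_root)
    root.mkdir(parents=True, exist_ok=True)
    stubs = set()
    for target in rewrite_targets:
        match = re.match(r"^/([^/$]+\.fcgi)", target)  # "/admin.fcgi$1" -> "admin.fcgi"
        if match:
            stub = root / match.group(1)
            stub.touch(exist_ok=True)  # empty file: lighttpd only checks that it exists
            stubs.add(stub)
    return sorted(stubs)


if __name__ == "__main__":
    for stub in touch_stubs("/home/gb/work/myproject/public_html",
                            ["/admin.fcgi$1", "/main.fcgi$1"]):
        print(stub)
```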

Now we have to start the FCGI servers. That's actually quite simple, just use the provided django-fcgi.py script as follows:

   django-fcgi.py --settings=myproject.work.main \
       --socket=/home/gb/work/myproject/main.socket \
       --minspare=5 --maxspare=10 --maxchildren=100 \
       --daemon

   django-fcgi.py --settings=myproject.work.admin \
       --socket=/home/gb/work/myproject/admin.socket \
       --maxspare=2 --daemon

Those two commands start two FCGI servers that use the given sockets to communicate. The admin server gets a small pool (at most two spare processes), because the admin server usually isn't the one with many hits - that's the main server. So the main server gets higher-than-default settings for spare processes and maximum child processes. Of course this is just an example - tune it to your needs.
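If you start the servers from a Python startup script rather than from the shell, a small Python 3 sketch can build the same two command lines as argument lists. The helper fcgi_command and the PROJECT constant are hypothetical. Only the flags shown above are used, passed through unchanged. Running a command would then be a matter of subprocess.run(cmd, check=True).

```python
import shlex

PROJECT = "/home/gb/work/myproject"  # adapt to your own project path


def fcgi_command(settings, socket_path, daemon=True, **pool):
    """Build a django-fcgi.py command line as a list (hypothetical helper).

    pool takes the process-pool options from the text (minspare, maxspare,
    maxchildren) and passes each one through as --name=value.
    """
    cmd = ["django-fcgi.py", f"--settings={settings}", f"--socket={socket_path}"]
    cmd += [f"--{name}={value}" for name, value in pool.items()]
    if daemon:
        cmd.append("--daemon")
    return cmd


MAIN = fcgi_command("myproject.work.main", f"{PROJECT}/main.socket",
                    minspare=5, maxspare=10, maxchildren=100)
ADMIN = fcgi_command("myproject.work.admin", f"{PROJECT}/admin.socket",
                     maxspare=2)

if __name__ == "__main__":
    for cmd in (MAIN, ADMIN):
        print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Argument lists avoid shell quoting issues and make it easy to keep the pool settings for each settings module in one place.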

The last step is to start your lighttpd with your configuration file:

   lighttpd -f /home/gb/work/myproject/lighttpd.conf

That's it. If you now access either the lighttpd directly at http://localhost:8000/polls/ or go through your front Apache, you should see your application's output. At least if everything went right and I didn't make too many errors.
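To check both routes quickly, for example right after a restart, a tiny Python 3 smoke test can request a URL and report the HTTP status. The function name smoke_test is hypothetical, and the default URL is the lighttpd address from above. Point it at your Apache front to test the whole chain.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def smoke_test(url="http://localhost:8000/polls/", timeout=5):
    """Return the HTTP status code for url, or None if nothing answers (hypothetical helper)."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return None  # connection refused, DNS failure, timeout, ...


if __name__ == "__main__":
    print(smoke_test())
```

A 200 means lighttpd, the rewrite rule and the FCGI server all worked together. None usually means lighttpd isn't running. A 500 or 502 points at the FCGI side.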

tags: Django, Programmierung, Python, Sysadmin, Texte

Jon July 29, 2005, 9:11 p.m.

Just a note on the Apache2/mod_python thing -- you can use mod_python as a DSO and then start up Apache2 on alternate ports using a different httpd.conf, so the Python stuff is only loaded into memory on the instances that need it.

Bob Aug. 9, 2005, 10:14 p.m.

Thank you so much for these detailed instructions. I was grumbling about trying to get Django working on OpenBSD (and Apache 1.3, grrrrr). Then I saw your work and see that I can use lighttpd to play with Django. Now I can play with 'pure BSD licensed' software. What happens if you get rid of Apache 2 altogether? As I understand it, Django caches pretty well, so why have something as large as Apache in front? Perhaps a proxy like Squid or Pound would lead to better performance. I'll let you know if I get this working.

hugo Aug. 9, 2005, 10:30 p.m.

In my case I use the Apache in front of the lighttpd+FCGI stuff to integrate several different systems into one big site name - just using mod_proxy to pass parts of the site out to different systems. And often I have mod_perl stuff in the server that does some special handwritten (and hand-optimized) caching in front of the dynamic sites, rewrites HTML code or does one of all those nice things that are very easy to do with Apache and rather hard to do with other systems.

But it's definitely possible to run the setup completely without Apache in front of the system, and I would assume that you would gain performance that way - even though you might be astonished how little performance Apache with mod_proxy really takes away from the setup. Apache is very fast if used just with mod_proxy; it gets a bit slower when accessing filesystem content (but you can configure it in a way that gives much better performance if you need to).

Lllama Oct. 9, 2005, 5:51 p.m.

Thanks for these docs; very useful. I've spotted something with your django-fcgi.py though. The first line uses /usr/bin/python2.3 and so will always try to use eunuchs.

Thanks again,

hugo Oct. 9, 2005, 8:07 p.m.

Yes, you need to patch it accordingly or change it to /usr/bin/python. When the script goes into Django trunk, it will be changed, of course. But there are some more things that need to be fixed before that can happen.

Black_China Jan. 30, 2006, 12:50 p.m.

I am having real difficulties just installing mod_python on a web server which runs Debian 3.1 and Apache 2.0.x ( I believe x=.54).

Firstly, Debian 3.1 does not have a python2.4 version of the software for Apache2.0 and when I install the 2.3 version Apache is giving me some "...... Segmentation fault (11)".

Not sure how to proceed.

Does this thing work? Is it easier to get going than mod_python?

All I want to do is some web development, not spend all my time trying to install software.

Grateful for any suggestions.

hugo Jan. 30, 2006, 7:30 p.m.

It's used by several people - this and the SCGI version, both with Lighttpd and Apache. This site for example is running Django+SCGI+Apache. Other sites are using the SCGI setup in production environments, too.

David March 4, 2006, 1:39 p.m.

I start the FCGI server using this command
django-fcgi.py --settings=myproject.work.main
--socket=/django/myproject/main.socket
--minspare=5 --maxspare=10 --maxchildren=100 --daemon

I get this error
Traceback (most recent call last):
File "C:\Python24\Scripts\django-fcgi.py", line 164, in ?
import eunuchs.socketpair
ImportError: No module named eunuchs.socketpair

I am running python 2.4.2 under Windows XP
My django project is installed in c:\django\myproject

David

hugo March 4, 2006, 2:32 p.m.

It won't work on Windows, as Windows lacks crucial functionality like full fork support, and socketpair doesn't work either. You might be able to change the django-fcgi.py script to use the threading flup server instead of the forking flup server; in that case it might work on Windows. But you are essentially on your own with any bugs, as I don't have any Windows system running and so can't do any tests.

Matt April 6, 2006, 4:25 a.m.

I kept screaming "That's Right!" at the monitor as I read. I have been facing some issues with Zope performance under heavy load and I have been looking for a scalable python-friendly web server platform. Looks like lighty + scgi fits the bill, thanks for the great article!

janders Aug. 27, 2006, 2:21 a.m.

When running django-fcgi.py, I assume that --socket is supposed to match the socket defined in lighttpd.conf, but what is --settings supposed to match? It would be helpful if you defined each of the arguments.

hugo Sept. 11, 2006, 1:26 p.m.

--settings is the django settings module. This is so you can start different settings as servers.

muyufan Oct. 18, 2006, 11:27 a.m.

On Windows, you just edit django-fcgi.py to use flup's fcgi.py module instead of the fcgi_fork.py module, i.e. use the threaded method.

raja Feb. 6, 2007, 4:43 a.m.

Hi,
I am not able to download the script 'django-fcgi.py' from this url.
https://simon.bofh.ms/cgi-bin/trac-django-projects.cgi/wiki/DjangoFcgiLighttpd

is there any other way to download that script?

regards
raja

hugo Feb. 8, 2007, 2:52 p.m.

Django has had built-in FCGI support for quite a while now. Better to use that than my script, as mine is built on a rather old Django version. I guess my SVN is down again; that's usually the reason why the Trac instance doesn't work.

allo Aug. 18, 2008, 10:38 p.m.

Instead of touching the .fcgi files you can use "check-local" => "disable" in the lighttpd config.