Content-type: matter-transport/sentient-life-form

International Components for Unicode is a library of reference implementations of all Unicode standards, specifically concerning character transformation, normalization, and sorting, but also many other localization issues such as date formatting, etc.

PyICU is an integration of the ICU C++ interface into Python. Seems quite comprehensive in terms of scope. Integration with Python string data types is also provided.

Connecting databases to Python with SQLObject is a quite nice introduction to SQLObject - one of the nicer Object-Relation-Mappers for Python.

Unicode HOWTO for Python. Python programmers should read.

Crypt::PasswdMD5 is a Perl module that hashes MD5 passwords the same way Linux and Solaris do.

md5crypt.py is the same algorithm for MD5 passwords, this time in Python.

Store passwords as hashes - safe?

Not quite new (it was new last summer, but I somehow missed it, the underlying paper is even two years old), but still interesting: Project RainbowCrack is a project aimed at creating tools for faster cracking of hashes. Hashes can normally only be resolved through brute force - supported by algorithmic weaknesses (as recently found in MD5 and SHA1). However, there is an approach to create the more complex calculations that arise during the brute-force process (i.e. essentially algorithmic sub-steps) in advance - for example, if you only intend to crack passwords with a maximum number of characters.

Of course, this does not come for free: you trade computing time for storage space. Tables for cracking up to 14-digit Windows passwords occupy a casual 64 GiB of memory. The practical relevance of the approach and the tools may become obvious from this quote:

Some ready to work lanmanager and md5 tables are demonstrated in Rainbow Table section. One interesting stuff among them is the lm #6 table, with which we can break any windows password up to 14 characters in a few minutes.

There is also a web interface to a distributed computing cluster for Project RainbowCrack, through which you can send MD5 hashes to an MD5 cracker, which then - if it is a string with a maximum of 8 characters - spits out the plain text. And this thing is constantly building more Rainbow Tables, making cracking faster and faster.

Just as a warning for those who think that a simple MD5 hash (or ultimately almost any hash) on the password would be sufficient. Unix systems typically use salted hashes - the password is extended by a plain text and then the hash is formed together with it. This extends the password in principle, even if the extension is of course not secret - for the computing time or the table size it doesn't matter, the passwords are simply longer and thus harder to crack. But it is also only a matter of space until they are not secure.

Better are passphrases instead of passwords - just simply normally long sentences. On the one hand, you can often remember these better (many people cannot remember a phone number, but can quote lines from poems) and on the other hand, they are simply longer (and especially flexibly long), so that Rainbow Tables as an attack method are out of the question. The algorithmic weaknesses of MD5 and SHA1 remain, of course.

A Treeview in JavaScript that can be used within pages (without frames) and still remembers its state.

Again something new with Django

There's always news, but this time there's a very interesting feature again: the inspectdb command delivers all the tables and fields from a PostgreSQL database in the format of a Python data model. Additionally, foreign keys are also found if they are stored in the database. Very practical if you need to build an interface for an existing database, you save a lot of typing work.

Ian Bicking on what's currently happening with SQLObject - it had become quite quiet around one of the nicest SQL object layers for Python, but now it's moving forward again. The most interesting point for me: Tool support for database upgrades. A point that, for example, is still missing in Django.

Coroutines for Python

Philip J. Eby has provided a patch for the implementation of PEP 342. This means that the chances of Python having coroutines in the future are very good.

And that, in turn, means that Python will get a - albeit primitive - form of continuations. Now all that's missing is for something like statesaver to be integrated into Python - for multishot continuations (ok, first just copyable coroutines, but that would be a start at least).

All of this, of course, just to finally be able to work with continuations in web frameworks. Ok, it's already possible with CherryFlow, but it would be nice if all of this would make it into mainstream Python.

Whoever wants to deal with larger Erlang software and try out a Jabber server, might find ejabberd interesting - a Jabber server that uses all the nice features of Erlang to offer, for example, simple clustering and good data distribution.

The Illusive setdefaultencoding

Ian Bicking mentioned a nice trick in his article about setdefaultencoding: simply reload the sys module with reload(sys) to make setdefaultencoding available.

setdefaultencoding is used to set the default encoding for bytestrings. Normally, this is ASCII, but it can be changed to iso-8859-1 or utf-8 - if you have setdefaultencoding at all. Unfortunately, it is deleted when the Python runtime environment starts - because the Python developers want to patronize the users again.

reload(sys) is of course something that does not necessarily inspire confidence - sys is after all not an unimportant module. But in my experiments it has worked so far and it definitely helps with the whole Unicode problem if you can give your programs a different encoding as standard.

It would be nicer if setdefaultencoding were not deleted in the standard distribution. Of course, this can also be achieved by patching site.py, but that is not better than reloading sys ...

Writing a Simple Filesystem Browser with Django

Dieser Artikel ist mal wieder in Englisch, da er auch für die Leute auf #django interessant sein könnte. This posting will show how to build a very simple filesystem browser with Django. This filesystem browser behaves mostly like a static webserver that allows directory traversal. The only speciality is that you can use the Django admin to define filesystems that are mounted into the namespace of the Django server. This is just to demonstrate how a Django application can make use of different data sources besides the database, it's not really meant to serve static content (although with added authentication it could come in quite handy for restricted static content!).

Even though the application makes very simple security checks on passed in filenames, you shouldn't run this on a public server - I didn't do any security tests and there might be buttloads of bad things in there that might expose your private data to the world. You have been warned. We start as usual by creating the filesystems application with the django-admin.py startapp filesystems command. Just do it like you did with your polls application in the first tutorial. Just as an orientation, this is how the myproject directory does look like on my development machine:


.
|-- apps
| |-- filesystems
| | |-- models
| | |-- urls
| | `-- views
| `-- polls
| |-- models
| |-- urls
| `-- views
|-- public_html
| `-- admin_media
| |-- css
| |-- img
| | `-- admin
| `-- js
| `-- admin
|-- settings
| `-- urls
`-- templates
 `-- filesystems

After creating the infrastructure, we start by building the model. The model for the filesystems is very simple - just a name for the filesystem and a path where the files are actually stored. So here it is, the model:


 from django.core import meta

class Filesystem(meta.Model):

fields = ( meta.CharField('name', 'Name', maxlength=64), meta.CharField('path', 'Path', maxlength=200), )

def repr(self): return self.name

def get_absolute_url(self): return '/files/%s/' % self.name

def isdir(self, path): import os p = os.path.realpath(os.path.join(self.path, path)) if not p.startswith(self.path): raise ValueError(path) return os.path.isdir(p)

def files(self, path=''): import os import mimetypes p = os.path.realpath(os.path.join(self.path, path)) if not p.startswith(self.path): raise ValueError(path) l = os.listdir(p) if path: l.insert(0, '..') return [(f, os.path.isdir(os.path.join(p, f)), mimetypes.guess_type(f)[0] or 'application/octetstream') for f in l]

def file(self, path): import os import mimetypes p = os.path.realpath(os.path.join(self.path, path)) if p.startswith(self.path): (t, e) = mimetypes.guess_type(p) return (p, t or 'application/octetstream') else: raise ValueError(path)

admin = meta.Admin( fields = ( (None, {'fields': ('name', 'path')}), ), list_display = ('name', 'path'), search_fields = ('name', 'path'), ordering = ['name'], )

As you can see, the model and the admin is rather boring. What is interesting, though, are the additional methods isdir , files and file . isdir just checks wether a given path below the filesystem is a directory or not. files returns the files of the given path below the filesystems base path and file returns the real pathname and the mimetype of a given file below the filesystems base path. All three methods check for validity of the passed in path - if the resulting path isn't below the filesystems base path, a ValueError is thrown. This is to make sure that nobody uses .. in the path name to break out of the defined filesystem area. So the model includes special methods you can use to access the filesystems content itself, without caring for how to do that in your views. It's job of the model to know about such stuff.

The next part of your little filesystem browser will be the URL configuration. It's rather simple, it consists of the line in settings/urls/main.py and the myproject.apps.filesystems.urls.filesystems module. Fist the line in the main urls module:


 from django.conf.urls.defaults import *

urlpatterns = patterns('',
 (r'^files/', include('myproject.apps.filesystems.urls.filesystems')),
 )

Next the filesystems own urls module:


 from django.conf.urls.defaults import *

urlpatterns = patterns('myproject.apps.filesystems.views.filesystems',
 (r'^$', 'index'),
 (r'^(?P<filesystem_name>.*?)/(?P<path>.*)$', 'directory'),
 )

You can now add the application to the main settings file so you don't forget to do that later on. Just look for the INSTALLED_APPS setting and add the filebrowser:


 INSTALLED_APPS = (
 'myproject.apps.polls',
 'myproject.apps.filesystems'
 )

One part is still missing: the views. This module defines the externally reachable methods we defined in the urlmapper. So we need two methods, index and directory . The second one actually doesn't work only with directories - if it get's passed a file, it just presents the contents of that file with the right mimetype. The view makes use of the methods defined in the model to access actual filesystem contents. Here is the source for the views module:


 from django.core import template_loader
 from django.core.extensions import DjangoContext as Context
 from django.core.exceptions import Http404
 from django.models.filesystems import filesystems
 from django.utils.httpwrappers import HttpResponse

def index(request):
 fslist = filesystems.getlist(orderby=['name'])
 t = templateloader.gettemplate('filesystems/index')
 c = Context(request, {
 'fslist': fslist,
 })
 return HttpResponse(t.render(c))

def directory(request, filesystem_name, path):
 import os
 try:
 fs = filesystems.getobject(name exact=filesystemname)
 if fs.isdir(path):
 files = fs.files(path)
 tpl = templateloader.gettemplate('filesystems/directory')
 c = Context(request, {
 'dlist': [f for (f, d, t) in files if d],
 'flist': [{'name':f, 'type':t} for (f, d, t) in files if not d],
 'path': path,
 'fs': fs,
 })
 return HttpResponse(tpl.render(c))
 else:
 (f, mimetype) = fs.file(path)
 return HttpResponse(open(f).read(), mimetype=mimetype)
 except ValueError: raise Http404
 except filesystems.FilesystemDoesNotExist: raise Http404
 except IOError: raise Http404

See how the elements of the directory pattern are passed in as parameters to the directory method - the filesystem name is used to find the right filesystem and the path is used to access content below that filesystems base path. Mimetypes are discovered using the mimetypes module from the python distribution, btw.

The last part of our little tutorial are the templates. We need two templates - one for the index of the defined filesystems and one for the content of some path below some filesystem. We don't need a template for the files content - file content is delivered raw. So first the main index template:


{% if fslist %}
<h1>defined filesystems</h1> <ul> {% for fs in fslist %}
<li><a href="{{ fs.get_absolute_url }}">{{ fs.name }}</a></li> {% endfor %}
</ul> {% else %}
<p>Sorry, no filesystems have been defined.</p> {% endif %}

The other template is the directory template that shows contents of a path below the filesystems base path:


 {% if dlist or flist %}
 <h1>Files in //{{ fs.name }}/{{ path }}</h1> <ul> {% for d in dlist %}
 <li> <a href="{{ fs.getabsoluteurl }}{{ path }}{{ d }}/" >{{ d }}</a> </li> {% endfor %}
 {% for f in flist %}
 <li> <a href="{{ fs.getabsoluteurl }}{{ path }}{{ f.name }}" >{{ f.name }}</a> ({{ f.type }})</li> {% endfor %}
 </ul> {% endif %}

Both templates need to be stored somewhere in your TEMPLATE PATH. I have set up a path in the TEMPLATE PATH with the name of the application: filesystems . In there I stored the files as index.html and directory.html . Of course you normally would build a base template for the site and extend that in your normal templates. And you would add a 404.html to handle 404 errors. But that's left as an exercise to the reader.After you start up your development server for your admin (don't forget to set DJANGO SETTINGS MODULE accordingly!) you can add a filesystem to your database (you did do django-admin.py install filesystems sometime in between? No? Do it now, before you start your server). Now stop the admin server, change your DJANGO SETTINGS MODULE and start the main settings server. Now you can surf to http://localhost:8000/files/(at least if you did set up your URLs and server like I do) and browse the files in your filesystem. That's it. Wasn't very complicated, right? Django is really simple to use

Who wants to work with PostgreSQL and Frontier, simply install the PostgreSQL Extension for Frontier. For Mac and Windows.

Who believed that ISO time specifications are just YYYY-MM-TT HH:MM:SS.HS, forget it: International standard date and time notation. Of course, it's an ISO standard ...

Bodies in the Basement

Every software has them - some skeletons in the closet that start to stink when you find them. Django unfortunately too. And that is the handling of Unicode. The automatically generated Admin in Django always sends XHTML and utf-8 to the browser. The browsers therefore also send utf-8 back. But there are browsers that use a slightly different format for the data to be sent in such cases - the so-called multipart format. This is used because it is the only guaranteed method in HTTP-POST where you can send a character set along.

Unfortunately, Django parses these multipart-POSTs with the email module from Python. This then diligently produces Unicode strings from the parts marked as utf-8. Which is correct in itself - only in the Django source code there are str() calls scattered everywhere in the source code. And these then naturally crash when they are given unicode with characters above chr(128) in them.

I took a look at the source code, the most realistic approach would be to make sure in Django that Unicode results are then converted back to utf-8, so that only normal Python strings are used internally. That works so far, but then there are still problems with some databases that recognize utf-8 content when saving and then produce Unicode again when reading the content - SQLite is such a database.

Well, this won't be easy to fix. I've already tried it, it's a pretty nasty topic and unfortunately not at all considered in Django - and therefore it crashes at all corners and edges. Let's see if I can't come up with something useful after all ...

What I also noticed: Django sends the Content-type only via a meta tag with http-equiv. That's a nasty hack, it would be much better if the Content-type were set correctly as a header, then nothing can go wrong if, for example, Apache wants to add a Default-Charset. And the browsers would also react in a much more reproducible way.

In any case, this is again the typical case of American programmers. They like to tell you that you just need to switch to Unicode and utf-8 when you report your character encoding problems, but I have never seen software from an American programmer that handled Unicode correctly ...

Otherwise, there are still one or two hinges in Django - especially annoying because they are not documented, but easy to solve: the default time zone in Django is America/Chicago. You just have to write a variable TIME_ZONE with 'Europe/Berlin' as value in your settings file and apply a small patch so that Django can handle the '-' as time zone separator. Oh man, when Americans write software ...

Somehow, my motivation to take a closer look at Ruby on Rails is increasing at the moment, after all, it was Danes who started it and they should at least get such things right (if only that nice automatic administration part of Django were not - that's exactly what I would be after. Why hasn't anyone built something like that for ROR, damn ...)

Update: I have attached a patch to the corresponding ticket for the Unicode problem (just scroll to the very bottom) that at least gets the problem somewhat under control - provided you don't use SQLite, because SQLite always returns Unicode strings and they then cause trouble again. But at least with PostgreSQL, umlauts now work in Django. The solution is not really perfect, but at least it can be brought in with only a little code change. A real solution would probably require larger code changes.

Another patch is attached to the ticket for the time zone problem, with the patch you can then also use TIME_ZONE = 'Europe/Berlin' to get the time specifications, for example in the change history, in the correct time zone.

In such moments you wish you had commit rights to Django, to be able to put such quite manageable patches in yourself

Another update: Adrian was in the chat yesterday and today and the problems with Unicode are largely gone. Only with SQLite there is still trouble, but I already have the patch finished. And the time zone issue is also fixed in the SVN. And he has started unit tests. Very useful, if you can then test the whole framework cleanly in the long run after a patch ...

If you, like me, find yourself in a situation where you don't like the Unicode strings in PySQLite2 and need UTF-8 byte strings: PysqliteFactories are the solution here, not converters. Because converters would have to be registered for every variation of varchar that is in use - the row factories, on the other hand, are quite agnostic and practical. And if you already use your own cursor class: simply set this as the cursor factory, which then assigns a row factory to the instance with self.row_factory.

Abridged guide to HTTP Caching is a description of the most important caching headers in HTTP and how they should be used.

JSAN is what CPAN is for Perl - a central directory and download area for JavaScript sources and packages.

typo is a blog software for Ruby on Rails with seemingly already quite extensive features. Specifically also with good caching (produces static pages) for high-traffic sites, where parts are then kept dynamically via JavaScript. Sounds like I'll take a closer look at it when my ROR book arrives ...

Django, lighttpd and FCGI, second take

In my first take at this stuff I gave a sample on how to run django projects behind lighttpd with simple FCGI scripts integrated with the server. I will elaborate a bit on this stuff, with a way to combine lighttpd and Django that gives much more flexibility in distributing Django applications over machines. This is especially important if you expect high loads on your servers. Of course you should make use of the Django caching middleware, but there are times when even that is not enough and the only solution is to throw more hardware at the problem.

Update: I maintain my descriptions now in my trac system. See the lighty+FCGI description for Django.

Caveat: since Django is very new software, I don't have production experiences with it. So this is more from a theoretical standpoint, incorporating knowledge I gained with running production systems for several larger portals. In the end it doesn't matter much what your software is - it only matters how you can distribute it over your server farm.

To follow this documentation, you will need the following packages and files installed on your system:

[Django][2] itself - currently fetched from SVN. Follow the setup instructions or use python setup.py install .
[Flup][3] - a package of different ways to run WSGI applications. I use the threaded WSGIServer in this documentation.
[lighttpd][4] itself of course. You need to compile at least the fastcgi, the rewrite and the accesslog module, usually they are compiled with the system.
[Eunuchs][5] - only needed if you are using Python 2.3, because Flup uses socketpair in the preforked servers and that is only available starting with Python 2.4
[django-fcgi.py][6] - my FCGI server script, might some day be part of the Django distribution, but for now just fetch it here. Put this script somewhere in your $PATH, for example /usr/local/bin and make it executable.
If the above doesn't work for any reason (maybe your system doesn't support socketpair and so can't use the preforked server), you can fetch [django-fcgi-threaded.py][7] - an alternative that uses the threading server with all it's problems. I use it for example on Mac OS X for development.

Before we start, let's talk a bit about server architecture, python and heavy load. The still preferred Installation of Django is behind Apache2 with mod python2. mod python2 is a quite powerfull extension to Apache that integrates a full Python interpreter (or even many interpreters with distinguished namespaces) into the Apache process. This allows Python to control many aspects of the server. But it has a drawback: if the only use is to pass on requests from users to the application, it's quite an overkill: every Apache process or thread will incorporate a full python interpreter with stack, heap and all loaded modules. Apache processes get a bit fat that way.

Another drawback: Apache is one of the most flexible servers out there, but it's a resource hog when compared to small servers like lighttpd. And - due to the architecture of Apache modules - mod_python will run the full application in the security context of the web server. Two things you don't often like with production environments.

So a natural approach is to use lighter HTTP servers and put your application behind those - using the HTTP server itself only for media serving, and using FastCGI to pass on requests from the user to your application. Sometimes you put that small HTTP server behind an Apache front that only uses mod proxy (either directly or via mod rewrite) to proxy requests to your applications webserver - and believe it or not, this is actually a lot faster than serving the application with Apache directly!

The second pitfall is Python itself. Python has a quite nice threading library. So it would be ideal to build your application as a threaded server - because threads use much less resources than processes. But this will bite you, because of one special feature of Python: the GIL. The dreaded global interpreter lock. This isn't an issue if your application is 100% Python - the GIL only kicks in when internal functions are used, or when C extensions are used. Too bad that allmost all DBAPI libraries use at least some database client code that makes use of a C extension - you start a SQL command and the threading will be disabled until the call returns. No multiple queries running ...

So the better option is to use some forking server, because that way the GIL won't kick in. This allows a forking server to make efficient use of multiple processors in your machine - and so be much faster in the long run, despite the overhead of processes vs. threads.

For this documentation I take a three-layer-approach for distributing the software: the front will be your trusted Apache, just proxying all stuff out to your project specific lighttpd. The lighttpd will have access to your projects document root and wil pass on special requests to your FCGI server. The FCGI server itself will be able to run on a different machine, if that's needed for load distribution. It will use a preforked server because of the threading problem in Python and will be able to make use of multiprocessor machines.

I won't talk much about the first layer, because you can easily set that up yourself. Just proxy stuff out to the machine where your lighttpd is running (in my case usually the Apache runs on different machines than the applications). Look it up in the mod_proxy documentation, usually it's just ProxyPass and ProxyPassReverse.

The second layer is more interesting. lighttpd is a bit weird in the configuration of FCGI stuff - you need FCGI scripts in the filesystem and need to hook those up to your FCGI server process. The FCGI scripts actually don't need to contain any content - they just need to be in the file system.

So we start with your Django project directory. Just put a directory public html in there. That's the place where you put your media files, for example the admin media directory. This directory will be the document root for your project server. Be sure only to put files in there that don't contain private data - private data like configs and modules better stay in places not accessible by the webserver. Next set up a lighttpd config file. You only will use the rewrite and the fastcgi modules. No need to keep an access log, that one will be written by your first layer, your apache server. In my case the project is in /home/gb/work/myproject - you will need to change that to your own situation. Store the following content as /home/gb/work/myproject/lighttpd.conf


 server.modules = ( "mod_rewrite", "mod_fastcgi" )
 server.document-root = "/home/gb/work/myproject/public_html"
 server.indexfiles = ( "index.html", "index.htm" )
 server.port = 8000
 server.bind = "127.0.0.1"
 server.errorlog = "/home/gb/work/myproject/error.log"

fastcgi.server = (
"/main.fcgi" => (
"main" => (
"socket" => "/home/gb/work/myproject/main.socket"
 )
 ),
"/admin.fcgi" => (
"admin" => (
"socket" => "/home/gb/work/myproject/admin.socket"
 )
 )
 )

url.rewrite = (
"^(/admin/.*)$" => "/admin.fcgi$1",
"^(/polls/.*)$" => "/main.fcgi$1"
 )

mimetype.assign = (
".pdf" => "application/pdf",
".sig" => "application/pgp-signature",
".spl" => "application/futuresplash",
".class" => "application/octet-stream",
".ps" => "application/postscript",
".torrent" => "application/x-bittorrent",
".dvi" => "application/x-dvi",
".gz" => "application/x-gzip",
".pac" => "application/x-ns-proxy-autoconfig",
".swf" => "application/x-shockwave-flash",
".tar.gz" => "application/x-tgz",
".tgz" => "application/x-tgz",
".tar" => "application/x-tar",
".zip" => "application/zip",
".mp3" => "audio/mpeg",
".m3u" => "audio/x-mpegurl",
".wma" => "audio/x-ms-wma",
".wax" => "audio/x-ms-wax",
".ogg" => "audio/x-wav",
".wav" => "audio/x-wav",
".gif" => "image/gif",
".jpg" => "image/jpeg",
".jpeg" => "image/jpeg",
".png" => "image/png",
".xbm" => "image/x-xbitmap",
".xpm" => "image/x-xpixmap",
".xwd" => "image/x-xwindowdump",
".css" => "text/css",
".html" => "text/html",
".htm" => "text/html",
".js" => "text/javascript",
".asc" => "text/plain",
".c" => "text/plain",
".conf" => "text/plain",
".text" => "text/plain",
".txt" => "text/plain",
".dtd" => "text/xml",
".xml" => "text/xml",
".mpeg" => "video/mpeg",
".mpg" => "video/mpeg",
".mov" => "video/quicktime",
".qt" => "video/quicktime",
".avi" => "video/x-msvideo",
".asf" => "video/x-ms-asf",
".asx" => "video/x-ms-asf",
".wmv" => "video/x-ms-wmv"
 )

I bind the lighttpd only to the localhost interface because in my test setting the lighttpd runs on the same host as the Apache server. In multi server settings you will bind to the public interface of your lighttpd servers, of course. The FCGI scripts communicate via sockets in this setting, because in this test setting I only use one server for everything. If your machines would be distributed, you would use the "host" and "port" settings instead of the "socket" setting to connect to FCGI servers on different machines. And you would add multiple entries for the "main" stuff, to distribute the load of the application over several machines. Look it up in the lighttpd documentation what options you will have.

I set up two FCGI servers for this - one for the admin settings and one for the main settings. All applications will be redirected through the main settings FCGI and all admin requests will be routed to the admin server. That's done with the two rewrite rules - you will need to add a rewrite rule for every application you are using.

Since lighttpd needs the FCGI scripts to exist to pass along the PATH_INFO to the FastCGI, you will need to touch the following files: /home/gb/work/myprojectg/public_html/admin.fcgi ``/home/gb/work/myprojectg/public_html/main.fcgi

They don't need to contain any code, they just need to be listed in the directory. Starting with lighttpd 1.3.16 (at the time of this writing only in svn) you will be able to run without the stub files for the .fcgi - you just add "check-local" => "disable" to the two FCGI settings. Then the local files are not needed. So if you want to extend this config file, you just have to keep some very basic rules in mind:

every settings file needs it's own .fcgi handler
every .fcgi needs to be touched in the filesystem - this might go away in a future version of lighttpd, but for now it is needed
load distribution is done on .fcgi level - add multiple servers or sockets to distribute the load over several FCGI servers
every application needs a rewrite rule that connects the application with the .fcgi handler

Now we have to start the FCGI servers. That's actually quite simple, just use the provided django-fcgi.py script as follows:


 django-fcgi.py --settings=myproject.work.main
 --socket=/home/gb/work/myproject/main.socket
 --minspare=5 --maxspare=10 --maxchildren=100
 --daemon

django-fcgi.py --settings=myproject.work.admin
 --socket=/home/gb/work/myproject/admin.socket
 --maxspare=2 --daemon

Those two commands will start two FCGI server processes that use the given sockets to communicate. The admin server will only use two processes - this is because often the admin server isn't the server with the many hits, that's the main server. So the main server get's a higher-than-default setting for spare processes and maximum child processes. Of course this is just an example - tune it to your needs.

The last step is to start your lighttpd with your configuration file: lighttpd -f /home/gb/work/myproject/lighttpd.conf

That's it. If you now access either the lighttpd directly at http://localhost:8000/polls/ or through your front apache, you should see your application output. At least if everything went right and I didn't make too much errors.

Eunuchs provides a few functions that were not yet available in Python 2.3. Specifically, socketpair and recvmsg/sendmsg are very important - for server programming with preforked servers, for example.

Higher-Order Perl is a book (currently in paper form, but it is supposed to be freely available online soon) that deals with higher-order functions and Perl - could be quite interesting, Perl offers a lot of features hidden under all those curly braces and other special characters ...

Running Django with FCGI and lighttpd

Diese Dokumentation ist für einen grösseren Kreis als nur .de gedacht, daher das ganze in Neuwestfälisch Englisch. Sorry. Update: I maintain the actually descriptions now in my trac system. See the FCGI+lighty description for Django. There are different ways to run Django on your machine. One way is only for development: use the django-admin.py runserver command as documented in the tutorial. The builtin server isn't good for production use, though. The other option is running it with mod_python. This is currently the preferred method to run Django. This posting is here to document a third way: running Django behind lighttpd with FCGI.

First you need to install the needed packages. Fetch them from their respective download address and install them or use preinstalled packages if your system provides those. You will need the following stuff:

[Django][2] itself - currently fetched from SVN. Follow the setup instructions or use python setup.py install .
[Flup][3] - a package of different ways to run WSGI applications. I use the threaded WSGIServer in this documentation.
[lighttpd][4] itself of course. You need to compile at least the fastcgi, the rewrite and the accesslog module, usually they are compiled with the system.

First after installing ligthttpd you need to create a lighttpd config file. The configfile given here is tailored after my own paths - you will need to change them to your own situation. This config file activates a server on port 8000 on localhost - just like the runserver command would do. But this server is a production quality server with multiple FCGI processes spawned and a very fast media delivery.


 # lighttpd configuration file
 #
 ############ Options you really have to take care of ####################

server.modules = ( "mod_rewrite", "mod_fastcgi", "mod_accesslog" )

server.document-root = "/home/gb/public_html/"
 server.indexfiles = ( "index.html", "index.htm", "default.htm" )

 these settings attch the server to the same ip and port as runserver would do

server.errorlog = "/home/gb/log/lighttpd-error.log"
 accesslog.filename = "/home/gb/log/lighttpd-access.log"

fastcgi.server = (
"/myproject-admin.fcgi" => (
"admin" => (
"socket" => "/tmp/myproject-admin.socket",
"bin-path" => "/home/gb/public_html/myproject-admin.fcgi",
"min-procs" => 1,
"max-procs" => 1
 )
 ),
"/myproject.fcgi" => (
"polls" => (
"socket" => "/tmp/myproject.socket",
"bin-path" => "/home/gb/public_html/myproject.fcgi"
 )
 )
 )

url.rewrite = (
"^(/admin/.*)$" => "/myproject-admin.fcgi$1",
"^(/polls/.*)$" => "/myproject.fcgi$1"
 )

This config file will start only one FCGI handler for your admin stuff and the default number of handlers (each one multithreaded!) for your own site. You can finetune these settings with the usual ligthttpd FCGI settings, even make use of external FCGI spawning and offloading of FCGI processes to a distributed FCGI cluster! Admin media files need to go into your lighttpd document root.

The config works by translating all standard URLs to be handled by the FCGI script for each settings file - to add more applications to the system you would only duplicate the rewrite rule for the /polls/ line and change that to choices or whatever your module is named. The next step would be to create the .fcgi scripts. Here are the two I am using:


 #!/bin/sh
 # this is myproject.fcgi - put it into your docroot

export DJANGOSETTINGSMODULE=myprojects.settings.main

/home/gb/bin/django-fcgi.py


 #!/bin/sh
 # this is myproject-admin.fcgi - put it into your docroot

export DJANGOSETTINGSMODULE=myprojects.settings.admin

/home/gb/bin/django-fcgi.py

These two files only make use of a django-fcgi.py script. This is not part of the Django distribution (not yet - maybe they will incorporate it) and it's source is given here:


 #!/usr/bin/python2.3

def main():
 from flup.server.fcgi import WSGIServer
 from django.core.handlers.wsgi import WSGIHandler
 WSGIServer(WSGIHandler()).run()

if name == 'main':
 main()

As you can see it's rather simple. It uses the threaded WSGIServer from the fcgi-module, but you could as easily use the forked server - but as the lighttpd already does preforking, I think there isn't much use with forking at the FCGI level. This script should be somewhere in your path or just reference it with fully qualified path as I do. Now you have all parts togehter. I put my lighttpd config into /home/gb/etc/lighttpd.conf , the .fcgi scripts into /home/gb/public_html and the django-fcgi.py into /home/gb/bin . Then I can start the whole mess with /usr/local/sbin/lighttpd -f etc/lighttpd.conf . This starts the server, preforkes all FCGI handlers and detaches from the tty to become a proper daemon. The nice thing: this will not run under some special system account but under your normal user account, so your own file restrictions apply. lighttpd+FCGI is quite powerfull and should give you a very nice and very fast option for running Django applications. Problems:

under heavy load some FCGI processes segfault. I first suspected the fcgi library, but after a bit of fiddling (core debugging) I found out it's actually the psycopg on my system that segfaults. So you might have more luck (unless you run Debian Sarge, too)
Performance behind a front apache isn't what I would have expected. A lighttpd with front apache and 5 backend FCGI processes only achieves 36 requests per second on my machine while the django-admin.py runserver achieves 45 requests per second! (still faster than mod_python via apache2: only 27 requests per second) Updates:
the separation of the two FCGI scripts didn't work right. Now I don't match only on the .fcgi extension but on the script name, that way /admin/ really uses the myproject-admin.fcgi and /polls/ really uses the myproject.fcgi.
I have [another document online][6] that goes into more details with regard to load distribution

flup: random Python WSGI stuff - a collection of WSGI server adapters for FCGI, SCGI and Apache Jakarta 1.3 protocols as well as a few WSGI middlewares for authentication, compression and error handling.

Leonardo is a CMS with blog and wiki modules in Python. Currently quite simple as CGI, but it should be migrated to WSGI and Paste and could then be quite interesting as general CMS components in a WSGI solution.

Python Paste is a meta-framework - a framework for creating new web frameworks based on WSGI. Many interesting middleware modules and a reimplementation of WebWare based on WSGI.

Seaside is a flexible and very interesting web framework in Smalltalk. I have already linked to tutorials about it, but not the framework itself - at least not at its new address. Runs on Squeak and Visual Works - and through their wide availability on almost everything that can be called a computer and has a TCP/IP connection to the outside world.

GNU Modula-2 was unknown to me until now. It's nice that Modula-2 is also found in the GNU compiler family. Even though Modula-2 only holds historical interest for me - I much prefer dynamic languages like Python. But there were times when I used to program diligently in Modula-2.

MochiKit is a JavaScript library with a whole range of extensions for JavaScript. Above all, iterators, sensible functional concepts (filter, map, partial application), but also a whole range of new ideas, such as a very nice AJAX integration. Looks quite nice, I have to play around with it.

Django Again

Django - the upcoming web framework for Python - now has SQLite 3 support. This makes setting up a development environment for Django projects extremely simple: you just need Python 2.3 or Python 2.4, SQLite3, and PySQLite2. On a Mac, everything is already there except PySQLite2, which you can get from www.pysqlite.org and install using sudo python setup.py install. And you're ready to start with Django and work through the tutorials. No Apache needed, no PostgreSQL (though it's the nicest of all SQL databases, it's sometimes overkill for a development environment on a notebook), and especially not psycopg - whose installation unfortunately requires almost a full PostgreSQL source tree. So there's no excuse for Pythonistas not to get involved with Django.

Apache modauthtkt is a framework for Single-Signon in Apache-based solutions across technology boundaries (CGI, mod_perl and whatever else exists). I should take a look at it, could be interesting for me.

Jython 2.2 in the works

The Jython website doesn't provide much information, but a few days ago, there was a post in the mailing list announcing a new alpha release for Jython 2.2 - and this time (it was already this far back at the end of 2004), it's one that actually works. Many features of the newer Python versions are included, such as generators/iterators. Therefore, it is not identical to Python 2.2, but rather a good step towards Python 2.3 in terms of features. Since the developer works with OS X and develops there, it is relatively easy to install.

For installation, as this is not mentioned anywhere explicitly:


java -jar [jython .version.elend.langer.name.jar]

Then a graphical installer appears that installs everything on the disk. Then, in the target directory, enter the following commands additionally:


chmod 755 jython
chmod 755 jythonc

Then the two (jython is the interpreter and jythonc is a compiler) are also callable and you can get started. When starting jython for the first time, a whole series of system packages are activated, so don't be surprised by the many messages from the sys-package-mgr.

For those who don't know Jython: it is a reimplementation of Python on the Java Virtual Machine. This allows all Java libraries to be used very elegantly, and the interactive shell of Jython allows you to play interactively with Java classes. Very nice to quickly try things out. But of course also very nice to have the portability of Java, but not the crazy language.

And it's just fun to do things like this:


Jython 2.2a1 on java1.4.2_07 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> import java.lang
>>> dir(java.lang.Number)
['byteValue', 'doubleValue', 'floatValue', 'intValue', 'longValue', 'shortValue']
>>> import java
>>> dir(java)
['__name__', 'applet', 'awt', 'beans', 'io', 'lang', 'math', 'net', 'nio', 'rmi', 'security', 'sql', 'text', 'util']
>>> ```

First Django Tutorials Online

The Django programmers start with the tutorials. The first tutorial primarily deals with creating the database model and the basic code for the objects to be managed, and the second tutorial deals with the automatically generated administration interface. Very nice, all of it.

The system is of course strongly focused on content creation and management - but still general enough so that it can also be used for differently structured content. The entire administration is automatically created from the object model and some hints, so it always aligns with the real data in the system. And the default look is also quite appealing.

Server integration is done simply via mod python - so via Apache. Which is also an advantage, as mod python offers very high performance right out of the box. And for more demanding cases, there's the caching in Django. I must say, what I've seen of Django so far, I like it very much.

An important note is missing in the installation instructions: Apache2 is mandatory, and therefore also ModPython in the corresponding version. However, Mac OS X only provides Apache 1.3, and many other servers also only have the 1.3 Apache available, so Django still has a real drawback here.

By the way, if you want to upgrade from Apache to Apache2 on Debian: if mod perl is in use, forget it. The mod perl2 for Apache2 in Debian Sarge is complete garbage - as if the API changes in mod perl2 compared to the old mod perl weren't annoying enough. In principle, you can no longer get Perl modules to run so easily with it.

Update: By the way, there is currently a lot of activity in the Subversion for Django to eliminate the requirement for Apache. A simple development server is already included, so in the future you will no longer need Apache for initial experiments. And you could also set up the deployment on other legs in the long run - for example, FCGI behind lighttpd.

Update 2: The third tutorial is out and deals with the view for the visitor. They have a pretty intense pace right now with Django.

Foundations of Python Network Programming is a relatively new book about network programming with Python. It covers all possible aspects of network programming you can think of - quite impressive the first impression. I know most of the things already from somewhere, but so compact in one book it is still nice to read. Together with Dive Into Python I would see the two as the ideal pair to learn Python.

HsShellScript is a Haskell library that allows you to solve typical shell script problems with Haskell. So functions for controlling processes and accessing system information etc. Looks very nice, but unfortunately cannot be compiled on OS X due to missing mntent.h.

mod_haskell has unfortunately not been developed further for years - it offers an integration of Hugs and ghc into the Apache server.

PerlPad is a service for Mac OS X that allows you to execute Perl code in any Cocoa text window and collect the output, or send selected text through a Perl script.

Regular Expressions in Haskell is an implementation of regular expressions entirely in Haskell.

Web Authoring System Haskell (WASH) is a collection of Haskell libraries (more precisely DSLs - domain specific languages - in Haskell) for programming web applications. It includes CGI-style programming, HTML generation, mail handling, and database drivers for PostgreSQL.

Django - new web framework for Python

Another web framework for Python, this time with the bold name Django. I am skeptical about yet another web framework - there are already plenty, and I must admit that I have contributed to one or another - but this one offers some interesting approaches.

On the one hand, it addresses similar solutions like Ruby on Rails - but does not mention Ruby on Rails at all. That's already positive; lately, one almost gets the impression that Python programmers are panicking because of ROR and think that everything must only be oriented towards it.

On the other hand, Django offers automatically generated backend pages. This is something I really like and what I find so nice about Zope, for example - you immediately have a way to play with the actual data, even before the actual frontend is ready. Very practical, especially in the initial development phase.

Some of the other ideas are also quite funny - for example, the mapping of URLs to handlers in the Python code via regular expressions. Reminds a bit of mod_rewrite in Apache (where, with such solutions, the question of prioritization of overlapping regular expressions always remains). And an integrated object-relational manager is not bad either, even if you can of course just as well fall back on finished solutions there. And the fact that the developers have already thought about the need for efficient cache systems and then rely on memcached is also nice - many projects die at some point from the load, simply because caching was not thought of in time.

The template language, however, looks a bit unusual and somehow I wonder why there must be almost as many of them as there are web frameworks.

Assign JavaScript Actions to CSS Selectors

Cool stuff: Behaviour is a JavaScript library that allows you to bind JavaScript actions to CSS selectors. The advantage: the actions disappear from the HTML code - making it much slimmer. And the actions can be adapted to new requirements at any time by changing the selectors.

In my first applications of Ajax, I stumbled upon exactly this problem: the JavaScript actions clutter the code that has just been painstakingly reduced to semantic HTML. Exactly what used to annoy me about all the table layouts now annoys me about the whole JavaScript thing. A clean separation of code, semantics, and style is exactly what I need. Actually, something like this should be part of the HTML standard.

I definitely need to try this out, because if it's usable in terms of performance, I should take a closer look at a few of the last Ajax actions and change them ...

Keith Devens - Weblog: I hate PHP - August 13, 2003 - he also doesn't like PHP

Kid is a rather interesting Python library that implements a template engine with a focus on well-formed XML. The result is similar to Zope Page Templates - so an attribute language for XML with Python integration. And it's also fast: an XML template on my machine achieves around 70 hits/sec.

http://n3dst4.com/articles/phpannoyances/ - he doesn't like PHP either.

Spyce is a Python web framework with damn good performance: a simple page with a template behind it delivers over 90 hits per second on my machine (Spyce integrated into Apache via mod_python, memory cache). Take that, PHP!

Spyced: Why PHP sucks - a rather good analysis of what is rather annoying about PHP.

Why PHP sucks - and yet another person who doesn't like PHP.

For those who don't feel comfortable with English as a language for introductory literature, there is an online German-language Haskell course to work through. It looks quite decent - although I find that a bit little is explained.

programmierung - 8.7.2005 - 7.8.2005