Every software has them - some skeletons in the closet that start to stink when you find them. Django unfortunately too. And that is the handling of Unicode. The automatically generated Admin in Django always sends XHTML and utf-8 to the browser. The browsers therefore also send utf-8 back. But there are browsers that use a slightly different format for the data to be sent in such cases - the so-called multipart format. This is used because it is the only guaranteed method in HTTP-POST where you can send a character set along.

Unfortunately, Django parses these multipart-POSTs with the email module from Python. This then diligently produces Unicode strings from the parts marked as utf-8. Which is correct in itself - only in the Django source code there are str() calls scattered everywhere in the source code. And these then naturally crash when they are given unicode with characters above chr(128) in them.

I took a look at the source code, the most realistic approach would be to make sure in Django that Unicode results are then converted back to utf-8, so that only normal Python strings are used internally. That works so far, but then there are still problems with some databases that recognize utf-8 content when saving and then produce Unicode again when reading the content - SQLite is such a database.

Well, this won't be easy to fix. I've already tried it, it's a pretty nasty topic and unfortunately not at all considered in Django - and therefore it crashes at all corners and edges. Let's see if I can't come up with something useful after all ...

What I also noticed: Django sends the Content-type only via a meta tag with http-equiv. That's a nasty hack, it would be much better if the Content-type were set correctly as a header, then nothing can go wrong if, for example, Apache wants to add a Default-Charset. And the browsers would also react in a much more reproducible way.

In any case, this is again the typical case of American programmers. They like to tell you that you just need to switch to Unicode and utf-8 when you report your character encoding problems, but I have never seen software from an American programmer that handled Unicode correctly ...

Otherwise, there are still one or two hinges in Django - especially annoying because they are not documented, but easy to solve: the default time zone in Django is America/Chicago. You just have to write a variable TIME_ZONE with 'Europe/Berlin' as value in your settings file and apply a small patch so that Django can handle the '-' as time zone separator. Oh man, when Americans write software ...

Somehow, my motivation to take a closer look at Ruby on Rails is increasing at the moment, after all, it was Danes who started it and they should at least get such things right (if only that nice automatic administration part of Django were not - that's exactly what I would be after. Why hasn't anyone built something like that for ROR, damn ...)

Update: I have attached a patch to the corresponding ticket for the Unicode problem (just scroll to the very bottom) that at least gets the problem somewhat under control - provided you don't use SQLite, because SQLite always returns Unicode strings and they then cause trouble again. But at least with PostgreSQL, umlauts now work in Django. The solution is not really perfect, but at least it can be brought in with only a little code change. A real solution would probably require larger code changes.

Another patch is attached to the ticket for the time zone problem, with the patch you can then also use TIME_ZONE = 'Europe/Berlin' to get the time specifications, for example in the change history, in the correct time zone.

In such moments you wish you had commit rights to Django, to be able to put such quite manageable patches in yourself

Another update: Adrian was in the chat yesterday and today and the problems with Unicode are largely gone. Only with SQLite there is still trouble, but I already have the patch finished. And the time zone issue is also fixed in the SVN. And he has started unit tests. Very useful, if you can then test the whole framework cleanly in the long run after a patch ...