
Generic Functions with Python

PEAK has been offering CLOS-style generic functions for Python for quite some time. I always wanted to play around with them, but for a long time they were just part of PyProtocols, and the installation was a bit tricky. Since September of this year, however, the package has been decoupled and is much easier to install. So I dove right in.

And I must say: wow. What Phillip J. Eby has accomplished is truly fantastic. The integration with Python (works from Python 2.3 - he even invented his own implementation of decorators for Python 2.3) is superb, even if, of course, some things take a bit of getting used to.

A small example:

import dispatch

[dispatch.generic()]
def anton(a, b):
    "handle two objects"

[anton.when('isinstance(a,int) and isinstance(b,int)')]
def anton(a, b):
    return a + b

[anton.when('isinstance(a,str) and isinstance(b,str)')]
def anton(a, b):
    return a + b

[anton.when('isinstance(a,str) and isinstance(b,int)')]
def anton(a, b):
    return a * b

[anton.when('isinstance(a,int) and isinstance(b,str)')]
def anton(a, b):
    return b * a

[anton.before('True')]
def anton(a, b):
    print type(a), type(b)

This small example simply provides a function called 'anton', which executes different code based on the parameter types. The example is of course completely nonsensical, but it shows some important properties of generic functions:

  • Generic functions are - unlike classic object/class methods - not bound to any classes or objects. Instead, the implementation is selected based on the parameters.
  • Parameter types must therefore be declared - this usually happens via a mini-language in which the selection conditions are formulated. This is also the only syntactic part I don't particularly like: the conditions are stored as strings. The integration is very good, though, and you get clean syntax errors as early as module load time.
  • A generic function can be overloaded with arbitrary conditions - it's not just the first parameter that is decisive. Conditions can also dispatch on values - any Python expression can be used there.
  • With method combination (methods are the concrete manifestations of a generic function here), you can hook in before or after a call without touching the code itself. The example uses a before method that always fires (hence the 'True') to generate debugging output. Of course, you can also attach conditions to before/after methods to hook into specific invocations of the generic function - which makes generic functions a full-fledged event system.
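The mechanics can be imitated in a few lines of plain Python - purely an illustrative sketch of the idea, not how RuleDispatch works internally (it compiles the condition strings into an efficient decision tree rather than scanning linearly):

```python
class GenericFunction:
    """Toy generic function: tries registered conditions in order."""

    def __init__(self):
        self.methods = []       # (predicate, implementation) pairs
        self.before_hooks = []  # (predicate, hook) pairs

    def when(self, predicate):
        def register(func):
            self.methods.append((predicate, func))
            return func
        return register

    def before(self, predicate):
        def register(func):
            self.before_hooks.append((predicate, func))
            return func
        return register

    def __call__(self, *args):
        for predicate, hook in self.before_hooks:
            if predicate(*args):
                hook(*args)
        for predicate, func in self.methods:
            if predicate(*args):
                return func(*args)
        raise TypeError("no applicable method")

anton = GenericFunction()

@anton.when(lambda a, b: isinstance(a, int) and isinstance(b, int))
def add_ints(a, b):
    return a + b

@anton.when(lambda a, b: isinstance(a, str) and isinstance(b, int))
def repeat_str(a, b):
    return a * b

@anton.before(lambda a, b: True)
def debug(a, b):
    pass  # always fires first, like the 'True' before method in the example
```

Calling `anton(2, 3)` then yields 5, while `anton("ab", 3)` yields "ababab" - the same behavior the RuleDispatch example above produces, just without the string mini-language.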

A pretty good article about RuleDispatch (the generic functions package) can be found at Developerworks.

The example, by the way, shows the Python 2.3 syntax for decorators. With Python 2.4, of course, the @ syntax can also be used. One disadvantage should not be kept secret: the definition of generic functions and their methods is not possible interactively - at least not with the Python 2.3 syntax. Unfortunately, you generally have to work with external definitions in files here.

RuleDispatch will definitely find a place in my toolbox - the syntax is simple enough, the possibilities, however, are gigantic. As an event system, it surpasses any other system in flexibility, and as a general way of structuring code, it comes very close to CLOS. It's a shame that Django will likely align with PyDispatch - in my opinion, RuleDispatch would fit much better (as many aspects in Django could be written as dispatch on multiple parameter types).

Blogcounter, Penis Size Comparisons, and Other Lies

Right now, people are once again wildly discussing hit counts and similar nonsense. Usually, I don't care about these (my server has an absurdly high free allowance that I can never use, and the server load is also low - so why should I care how much comes in?), but with the various announcements of hit counts, page views, and visits, I always have to smile a little.

Just as a small analysis of the whole story. First, the most important part: where do these numbers come from? Basically, there are two possibilities. One relies on pages containing a small element that gets counted (e.g., an image - sometimes invisible - or a piece of JavaScript or an iframe - all commonly referred to as a web bug). The other method goes through the web server's log files and evaluates them. There is a third, in which the individual visitor is identified via a cookie - but this is rather rarely used, except by some rather unpopular advertising systems.

Basically, there are only a few real numbers that such a system can actually provide (leaving aside individualization via cookies): hits on the one hand, and bytes transferred on the other. Of marginal use, there is also the number of distinct hosts (IP addresses) that have accessed the site.

But these numbers have a problem: they are purely technical. And thus strongly dependent on technology. Hits go up if you have many external elements. Bytes go up if you have many long pages (or large images or ...). IP addresses go down if many visitors are behind proxies. And they go up if you have many ISDN users - because of the dynamic dial-up addresses. Changes in the numbers are therefore due to both changes in visitors and changes in the pages.

All these numbers are as meaningful as the coffee grounds in the morning cup. That's why people derive other numbers from these - at least technically well-defined - figures, numbers that are supposed to actually say something. Worth mentioning here are visits (visits to the website), page impressions (accesses to real page addresses), and visitors (distinct visitors).

Let's take the simplest number, the one with at least a rudimentary connection to the real world: page impressions. There are different ways to get there. You can put the aforementioned web bugs on the pages to be counted; the number is then about as reliable as the counting system. Unfortunately, the counting systems are anything but - more on that in a moment. The alternative - going through the web server log files - is a bit better. Here, you simply count how many hits with the MIME type text/html (or whatever is used for your own pages) are delivered. You could also count .html addresses - but many sites no longer have that suffix in their URLs; the MIME type is more reliable.
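Counting page impressions from a log this way fits in a few lines - a sketch assuming, hypothetically, a log format that records the response content type as a final quoted field (the stock common log format does not, so you would have to configure that):

```python
def count_page_impressions(loglines, page_type='text/html'):
    """Count responses whose logged content type is HTML.

    Assumes (hypothetically) that each log line ends with the response
    content type as a final quoted field -- adapt to your real format.
    """
    hits = 0
    for line in loglines:
        ctype = line.rsplit('"', 2)[-2]        # the last quoted field
        if ctype.split(';')[0].strip() == page_type:
            hits += 1
    return hits

log = [
    '1.2.3.4 - - [...] "GET / HTTP/1.1" 200 5123 "text/html; charset=utf-8"',
    '1.2.3.4 - - [...] "GET /style.css HTTP/1.1" 200 412 "text/css"',
    '5.6.7.8 - - [...] "GET /about/ HTTP/1.1" 200 3300 "text/html"',
]
```

Everything discussed next - proxies, browser caches, reloads - distorts the input to exactly this kind of counting.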

Significance? Well, rather doubtful. Many users are forced through proxies by their providers - and a proxy has the property of avoiding hits. Once one visitor has retrieved a page, it may (depending on the proxy configuration) be delivered to other visitors from the cache instead of being fetched from the server. This affects, for example, all of AOL - the numbers there are clearly distorted. And the more of an A-list blogger someone really is, the more distorted the numbers often are (since cache hits are more frequent than on less-visited blogs).

In addition, browsers also do such things - cache pages. Or visitors do something else - reload pages. Proxies repeat some loading process automatically because the first one may not have gone through completely due to timeout - all of these are distortions of the numbers. Nevertheless, page impressions are still at least halfway usable. Unless you use web bugs.

Because web bugs have a general problem: they are not main pages but embedded objects. Here, browsers often behave even more stubbornly - what is in the cache is displayed from the cache. Why fetch the little picture again? Of course, you can prevent this with suitable headers - nevertheless, it often goes wrong. JavaScript-based techniques completely miss users without JavaScript (and believe me, there are significantly more of them than is commonly admitted). In the end, web bugs have the same problems as the actual pages, plus a few additional problems of their own. Why are they still used? Because they are the only way to have your statistics counted on a system other than your own. So indispensable for global length comparisons.

Well, let's leave page impressions and with them the realm of rationality. Let's come to visits, and with them the closely related visitors. Visitors are mysterious beings on the web - you only see the accesses, but who it is and whether you know them is not visible. All the more important for marketing purposes, because everything that is nonsense and cannot be verified can be wonderfully exploited for marketing.

Visitors are only recognizable to a web server via the IP address of the access, plus the headers the browser sends. That is unfortunately much more than one would like to admit - but (except for the cookie setters with individual user tracking) not enough for unique identification. Because users share IPs - every proxy is counted as one IP. Users may use something like Tor - and then the IP is often different from last time. Users share a computer in an Internet café - so it is actually computers, not users, that are being counted. There are headers set by caches that allow some disambiguation - but if the users behind the cache all use private IP addresses (the 10.x.x.x, 172.16-31.x.x, or 192.168.x.x addresses known from the relevant literature), that doesn't help either.

Visitors can still be assigned a bit if the period is short - but over days? Sorry, but in the age of dynamic IP addresses, that doesn't help at all. The visitors of today and those of tomorrow can be the same or different - no idea. Nevertheless, it is proudly announced how many visitors one had in a month. Of course, this no longer has any meaning. Even daily numbers are already strongly changed by dynamic dial-ups (not everyone uses a flat rate and has the same address for 24 hours).

But to add to the madness, not only the visitors are counted (allegedly), but also their visits. Yes, that's really exciting. Because what is a visit? Ok, recognizing a visitor again over a short period of time (with all the problems that proxies and the like bring about, of course) works quite well - and you also know exactly when a visit begins. Namely, with the first access. But when does it end? Because there is no such thing as ending a web visit (a logout). You just go away. Don't come back so quickly (if at all).

Yes, that's when it gets really creative. Do you just look at the time intervals between hits? Or - because visitors surely always read the content - do you derive the interval after which a hit counts as a new visit from the size of the last retrieved page? How do you filter out regular refreshes? How do you deal with the visitor-counting problems above?

Not at all. You simply make it up out of thin air. Then a number comes out. Usually based on a time interval between hits - long pause, new visit. That's just counted. And added to a sum. Regardless of the fact that a visit may have been interrupted by a phone call - so two counted visits were really one visit with a pause in the middle. Regardless of the fact that users share computers or IP addresses - so one counted visit was in reality 10 interwoven visits.
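For what it's worth, the whole "algorithm" fits in a few lines - a sketch of the inactivity-gap heuristic just described (the 30-minute cutoff is an arbitrary, if common, choice):

```python
def count_visits(hit_times, max_gap=30 * 60):
    """Count 'visits' in a sorted list of hit timestamps (in seconds):
    any pause longer than max_gap starts a new visit."""
    visits = 0
    last = None
    for t in hit_times:
        if last is None or t - last > max_gap:
            visits += 1
        last = t
    return visits

# one "visitor": three hits in quick succession, then again an hour later
# -- counted as two visits, whether or not it really was a single visit
# interrupted by a phone call
hits = [0, 10, 20, 3600, 3610]
```

Note how the interruption and shared-IP cases the text mentions are exactly what this code cannot distinguish.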

Oh yes, I know that some software uses the browser's referrer headers to piece together paths through the site and thus reconstruct cleaner visits. Which of course no longer works smoothly when the user goes back with the back button, re-enters an address without a referrer being produced, or uses a personal firewall that partially filters referrers.

What is really cute is that all these numbers are thrown on the market without clear statements being made. Of course, sometimes it is said which service the numbers were determined via - but what does that say? Can the numbers be faked there? Does the operator count correctly (at blogcounter.de you can certainly fake the numbers in the simplest way) and does he count sensibly at all? Oh well, just take numbers.

The argument is often brought up that although the numbers cannot be compared directly as absolute figures across counter boundaries, you can at least compare numbers from the same counter. Companies are founded on this notion, making money by renting out this coffee-grounds technology to others and thereby producing the great cross-site rankings. Until someone notices how trivially the counters can be manipulated ...

It gets really cute when the numbers are mapped onto the time axis and things like average dwell time are derived from them - and then, in combination with the page size, it is determined how many pages were actually read and how many were just clicked through (based on average reading speed, such things really are "evaluated" by some software).

So let's summarize: there is a limited framework of information that you can build on. These are hits (i.e., retrievals from the server), hosts (i.e., retrieving IP addresses), and amounts transferred (summing the bytes from the retrievals). In addition, there are auxiliary information such as e.g. referrers and possibly cookies. All numbers can be manipulated and falsified - and many are actually falsified by common Internet technologies (the most common case being caching proxies).

These rather unreliable numbers are chased through - partly non-public - algorithms and then mumbo jumbo is generated, which is used to show what a cool frood you are and where the towel hangs.

And I'm supposed to participate in such nonsense?

PS: According to the awstats evaluation, the author of this posting had 20,172 visitors, 39,213 visits, 112,034 page views in 224,402 accesses, and pushed 3.9 gigabytes over the line last month - which, as noted above, is completely irrelevant and meaningless, except that he might look for more sensible hobbies.

Living Data

Funny title, isn't it? Well, I just noticed something while dealing with web frameworks and other applications, specifically in the Ruby and Python environments. Namely, the way mini-data is stored and how configuration data is handled, for example.

In the Java environment, there is an inflation of XML mini-languages - mountains of dead data. Dead because this data only exists in XML format and can only be processed and modified using XML tools. For example, if I have constantly repeating or algorithmically describable configuration blocks (e.g., a mountain of quite similar-looking URL patterns for a web framework), I can only generate these using XML tools - e.g., generate them from simpler formats using XSLT. Or I write small tools for this.

In Ruby, the situation is similar - only that instead of XML, YAML is used here. Ultimately, however, this is not better - the configuration is still a dead file.

But both in the Python environment and in various other dynamic languages, there is a good alternative: just use a module in your programming language. Python modules live: if the structure is complex but partly repetitive, simply write a small Python function that helps build the config dynamically. If parts of the config should come from database contents, simply write a Python function that reads that data from the DB at runtime and mixes it into the config. Living configuration data, indeed.
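A hedged sketch of what such a living config module can look like (the helper name make_url_pattern and the entity names are invented for illustration, not any framework's API):

```python
# settings module sketch -- the configuration *is* a Python module,
# so it can compute itself instead of being dead data

def make_url_pattern(name):
    """Expand one entity name into a near-identical (regex, view) entry."""
    return (r'^%s/(?P<id>\d+)/$' % name, 'views.show_%s' % name)

# instead of writing out a mountain of similar-looking entries by hand:
URL_PATTERNS = [make_url_pattern(n) for n in ('article', 'comment', 'user')]
```

In XML or YAML, those three entries would have to be written out (or generated by an external tool); here the repetition collapses into one helper function.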

Of course, security issues come into play here - we don't want to repeat the PHP mistake of the eternal eval. What would be urgently needed is a clean sandbox for such modules. Unfortunately, this is exactly where Python's implementation has a massive hole. There were bytecode hacks in the past, which have been revived - but those are just hacks. The method of building a pseudo-sandbox from restricted imports and proxy objects, as Zope does, isn't the be-all and end-all either.

Perl offers a very clean method here (which, as usual for Perl's security features, is used by almost no project) to regulate in fine detail what code in such a sandbox may do - so a configuration via Perl module can be secured far better than in languages without such a concept.

Java itself, of course, has a pretty sophisticated security management system - necessarily, since it is also supposed to run in browsers with very restricted rights. This security model is usable for applications as well and could be used, for example, for servlets or Java configs - especially since Java can easily compile files at runtime and load them dynamically. Now explain to me why the Java people are so fixated on XML when they have the best foundations for secure living data ...

We will ignore PHP's safe mode here, because it is an all-or-nothing model - either every piece of code runs under safe mode, or none at all. What we would need is selective activation of different security classes for a single code block or module import (ok, PHP also doesn't have module imports, only includes - as I said, we just ignore it).

So far, you can only work with living configurations in Python if you are sure that the configurations are only edited by users without malicious intent. Django, for example, only uses living configurations - it would therefore be a pretty stupid idea to make the configuration files editable via the web for centrally hosted applications.

We urgently need a clean sandbox for Python. I even believe that this would be a more important subproject than the various syntactic extensions that are repeatedly addressed.

Software Patents - Commentary in the NY Times

The NY Times asks why Bill Gates wants 3,000 new patents and finds a massive siege of the patent office with mountains of software patents, which are often just trivial patents (like the cited patent for adding/removing spaces in documents). The commentator makes a demand in the comment (after considering whether Microsoft should not simply have all the patents it already has revoked):

Perhaps that is going too far. Certainly, we should go through the lot and reinstate the occasional invention embodied in hardware. But patent protection for software? No. Not for Microsoft, nor for anyone else.

And this from the country that has had software patents for a long time and that is repeatedly cited by software patent proponents in the EU as a reason for a necessary worldwide harmonization.

No, software patents are also not popular there and not really useful. Dan Bricklin, known to some as the father of VisiCalc, also thinks so:

Mr. Bricklin, who has started several software companies and defensively acquired a few software patents along the way, says he, too, would cheer the abolition of software patents, which he sees as the bane of small software companies. "The number of patents you can run into with a small product is immense," he said. As for Microsoft's aggressive accumulation in recent years, he asked, "Isn't Microsoft the poster child of success without software patents?"

And why is Microsoft doing this now? The responsible manager gives a reason so stupid that only a business administrator could have come up with it:

"We realized we were underpatenting," Mr. Smith explained. The company had seen studies showing that other information technology companies filed about two patents for every $1 million spent on research and development. If Microsoft was spending $6 billion to $7.5 billion annually on its R&D, it would need to file at least 3,000 applications to keep up with the Joneses.

Ok, the idea of orienting patent applications purely on industry averages is absurd enough, but how stupid do you have to be to draw a linear connection between the number of patent filings and R&D spending?

The NY Times also draws a parallel to the pharmaceutical industry, which - at least according to its own statements - is happy to get a patent for a drug when it invests 20 million in research (which is already critical enough, as can be seen in the fight against AIDS in Africa).

And the fallout is also well summarized in the NY Times:

Last year at a public briefing, Kevin R. Johnson, Microsoft's group vice president for worldwide sales, spoke pointedly of "intellectual property risk" that corporate customers should take into account when comparing software vendors. On the one side, Microsoft has an overflowing war chest and bulging patent portfolio, ready to fight - or cross-license with - any plaintiff who accuses it of patent infringement. On the other are the open-source developers, without war chest, without patents of their own to use as bargaining chips and without the financial means to indemnify their customers.

The question of what Jefferson (the founder of the US patent system) would say about what is now being patented is quite justified. In his sense - which was actually more about protecting real inventive genius from exploitation by corporations - this is definitely not the case.

Writing a Simple Filesystem Browser with Django

This article is in English for a change, since it might also be interesting for the people on #django. This posting will show how to build a very simple filesystem browser with Django. This filesystem browser behaves mostly like a static webserver that allows directory traversal. The only speciality is that you can use the Django admin to define filesystems that are mounted into the namespace of the Django server. This is just to demonstrate how a Django application can make use of different data sources besides the database; it's not really meant to serve static content (although with added authentication it could come in quite handy for restricted static content!).

Even though the application makes very simple security checks on passed-in filenames, you shouldn't run this on a public server - I didn't do any security tests and there might be buttloads of bad things in there that might expose your private data to the world. You have been warned.

We start as usual by creating the filesystems application with the django-admin.py startapp filesystems command. Just do it like you did with your polls application in the first tutorial. Just as an orientation, this is what the myproject directory looks like on my development machine:


.
|-- apps
|   |-- filesystems
|   |   |-- models
|   |   |-- urls
|   |   `-- views
|   `-- polls
|       |-- models
|       |-- urls
|       `-- views
|-- public_html
|   `-- admin_media
|       |-- css
|       |-- img
|       |   `-- admin
|       `-- js
|           `-- admin
|-- settings
|   `-- urls
`-- templates
    `-- filesystems

After creating the infrastructure, we start by building the model. The model for the filesystems is very simple - just a name for the filesystem and a path where the files are actually stored. So here it is, the model:


from django.core import meta

class Filesystem(meta.Model):
    fields = (
        meta.CharField('name', 'Name', maxlength=64),
        meta.CharField('path', 'Path', maxlength=200),
    )

    def __repr__(self):
        return self.name

    def get_absolute_url(self):
        return '/files/%s/' % self.name

    def isdir(self, path):
        import os
        p = os.path.realpath(os.path.join(self.path, path))
        if not p.startswith(self.path):
            raise ValueError(path)
        return os.path.isdir(p)

    def files(self, path=''):
        import os
        import mimetypes
        p = os.path.realpath(os.path.join(self.path, path))
        if not p.startswith(self.path):
            raise ValueError(path)
        l = os.listdir(p)
        if path:
            l.insert(0, '..')
        return [(f, os.path.isdir(os.path.join(p, f)),
                 mimetypes.guess_type(f)[0] or 'application/octet-stream')
                for f in l]

    def file(self, path):
        import os
        import mimetypes
        p = os.path.realpath(os.path.join(self.path, path))
        if p.startswith(self.path):
            (t, e) = mimetypes.guess_type(p)
            return (p, t or 'application/octet-stream')
        else:
            raise ValueError(path)

    admin = meta.Admin(
        fields = (
            (None, {'fields': ('name', 'path')}),
        ),
        list_display = ('name', 'path'),
        search_fields = ('name', 'path'),
        ordering = ['name'],
    )


As you can see, the model and the admin are rather boring. What is interesting, though, are the additional methods isdir, files and file. isdir just checks whether a given path below the filesystem is a directory or not. files returns the files at the given path below the filesystem's base path, and file returns the real pathname and the mimetype of a given file below the filesystem's base path. All three methods check the validity of the passed-in path - if the resulting path isn't below the filesystem's base path, a ValueError is raised. This makes sure that nobody uses .. in the path name to break out of the defined filesystem area. So the model includes special methods you can use to access the filesystem's content itself, without your views having to care how that is done. It's the job of the model to know about such stuff.
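The realpath/startswith idiom those methods share can be shown standalone - a minimal sketch, independent of Django. It also guards a known edge case of the plain startswith check: '/data' must not accept '/database', so the comparison includes a path separator:

```python
import os

def safe_join(base, path):
    """Join path onto base, raising ValueError if the result would
    escape base (e.g. via '..' components or symlinks)."""
    base = os.path.realpath(base)
    p = os.path.realpath(os.path.join(base, path))
    # compare against base plus separator so '/data' doesn't match '/database'
    if p != base and not p.startswith(base + os.sep):
        raise ValueError(path)
    return p
```

With this, `safe_join('/srv/files', 'docs/readme.txt')` resolves normally, while `safe_join('/srv/files', '../etc/passwd')` raises ValueError - the same behavior the model's methods implement inline.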

The next part of our little filesystem browser will be the URL configuration. It's rather simple: it consists of one line in settings/urls/main.py and of the myproject.apps.filesystems.urls.filesystems module itself. First, the line in the main urls module:


 from django.conf.urls.defaults import *

urlpatterns = patterns('',
 (r'^files/', include('myproject.apps.filesystems.urls.filesystems')),
 )

Next the filesystems own urls module:


 from django.conf.urls.defaults import *

urlpatterns = patterns('myproject.apps.filesystems.views.filesystems',
 (r'^$', 'index'),
 (r'^(?P<filesystem_name>.*?)/(?P<path>.*)$', 'directory'),
 )

You can now add the application to the main settings file so you don't forget to do that later on. Just look for the INSTALLED_APPS setting and add the filesystems app:


 INSTALLED_APPS = (
 'myproject.apps.polls',
 'myproject.apps.filesystems'
 )

One part is still missing: the views. This module defines the externally reachable methods we referenced in the urlmapper. So we need two methods, index and directory. The second one doesn't actually handle only directories - if it gets passed a file, it just delivers the contents of that file with the right mimetype. The view makes use of the methods defined in the model to access actual filesystem contents. Here is the source for the views module:


from django.core import template_loader
from django.core.extensions import DjangoContext as Context
from django.core.exceptions import Http404
from django.models.filesystems import filesystems
from django.utils.httpwrappers import HttpResponse

def index(request):
    fslist = filesystems.get_list(order_by=['name'])
    t = template_loader.get_template('filesystems/index')
    c = Context(request, {
        'fslist': fslist,
    })
    return HttpResponse(t.render(c))

def directory(request, filesystem_name, path):
    import os
    try:
        fs = filesystems.get_object(name__exact=filesystem_name)
        if fs.isdir(path):
            files = fs.files(path)
            tpl = template_loader.get_template('filesystems/directory')
            c = Context(request, {
                'dlist': [f for (f, d, t) in files if d],
                'flist': [{'name': f, 'type': t} for (f, d, t) in files if not d],
                'path': path,
                'fs': fs,
            })
            return HttpResponse(tpl.render(c))
        else:
            (f, mimetype) = fs.file(path)
            return HttpResponse(open(f).read(), mimetype=mimetype)
    except ValueError:
        raise Http404
    except filesystems.FilesystemDoesNotExist:
        raise Http404
    except IOError:
        raise Http404

See how the elements of the directory pattern are passed in as parameters to the directory method - the filesystem name is used to find the right filesystem and the path is used to access content below that filesystem's base path. Mimetypes are discovered using the mimetypes module from the Python standard library, by the way.
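For reference, the stdlib mimetypes calls used above behave like this:

```python
import mimetypes

# guess_type returns a (type, encoding) tuple; the type is None
# for unknown extensions
html = mimetypes.guess_type('report.html')[0]     # 'text/html'
tarball = mimetypes.guess_type('archive.tar.gz')  # ('application/x-tar', 'gzip')

# with a fallback for unknown extensions, as the model does:
ctype = mimetypes.guess_type('unknown.qqq')[0] or 'application/octet-stream'
```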

The last part of our little tutorial are the templates. We need two templates - one for the index of the defined filesystems and one for the content of some path below some filesystem. We don't need a template for the files content - file content is delivered raw. So first the main index template:


{% if fslist %}
<h1>defined filesystems</h1>
<ul>
{% for fs in fslist %}
  <li><a href="{{ fs.get_absolute_url }}">{{ fs.name }}</a></li>
{% endfor %}
</ul>
{% else %}
<p>Sorry, no filesystems have been defined.</p>
{% endif %}

The other template is the directory template, which shows the contents of a path below the filesystem's base path:


{% if dlist or flist %}
<h1>Files in //{{ fs.name }}/{{ path }}</h1>
<ul>
{% for d in dlist %}
  <li><a href="{{ fs.get_absolute_url }}{{ path }}{{ d }}/">{{ d }}</a></li>
{% endfor %}
{% for f in flist %}
  <li><a href="{{ fs.get_absolute_url }}{{ path }}{{ f.name }}">{{ f.name }}</a> ({{ f.type }})</li>
{% endfor %}
</ul>
{% endif %}

Both templates need to be stored somewhere below your template directories. I have set up a subdirectory there with the name of the application: filesystems. In it I stored the files as index.html and directory.html. Of course, you would normally build a base template for the site and extend that in your normal templates. And you would add a 404.html to handle 404 errors. But that's left as an exercise to the reader.

After you start up your development server for your admin (don't forget to set DJANGO_SETTINGS_MODULE accordingly!), you can add a filesystem to your database (you did run django-admin.py install filesystems sometime in between? No? Do it now, before you start your server). Now stop the admin server, change your DJANGO_SETTINGS_MODULE and start the main settings server. Now you can surf to http://localhost:8000/files/ (at least if you set up your URLs and server like I do) and browse the files in your filesystem.

That's it. Wasn't very complicated, right? Django is really simple to use.

Django, lighttpd and FCGI, second take

In my first take on this stuff I gave an example of how to run Django projects behind lighttpd with simple FCGI scripts integrated with the server. I will elaborate a bit on that here, with a way to combine lighttpd and Django that gives much more flexibility in distributing Django applications over machines. This is especially important if you expect high loads on your servers. Of course you should make use of the Django caching middleware, but there are times when even that is not enough and the only solution is to throw more hardware at the problem.

Update: I maintain my descriptions now in my trac system. See the lighty+FCGI description for Django.

Caveat: since Django is very new software, I don't have production experiences with it. So this is more from a theoretical standpoint, incorporating knowledge I gained with running production systems for several larger portals. In the end it doesn't matter much what your software is - it only matters how you can distribute it over your server farm.

To follow this documentation, you will need the following packages and files installed on your system:

  • [Django][2] itself - currently fetched from SVN. Follow the setup instructions or use python setup.py install .
  • [Flup][3] - a package of different ways to run WSGI applications. I use the threaded WSGIServer in this documentation.
  • [lighttpd][4] itself of course. You need to compile at least the fastcgi, the rewrite and the accesslog module, usually they are compiled with the system.
  • [Eunuchs][5] - only needed if you are using Python 2.3, because Flup uses socketpair in the preforked servers and that is only available starting with Python 2.4
  • [django-fcgi.py][6] - my FCGI server script, might some day be part of the Django distribution, but for now just fetch it here. Put this script somewhere in your $PATH, for example /usr/local/bin and make it executable.
  • If the above doesn't work for any reason (maybe your system doesn't support socketpair and so can't use the preforked server), you can fetch [django-fcgi-threaded.py][7] - an alternative that uses the threading server with all its problems. I use it, for example, on Mac OS X for development.

Before we start, let's talk a bit about server architecture, Python, and heavy load. The still-preferred installation of Django is behind Apache 2 with mod_python. mod_python is a quite powerful extension to Apache that integrates a full Python interpreter (or even many interpreters with separate namespaces) into the Apache process. This allows Python to control many aspects of the server. But it has a drawback: if it is only used to pass requests from users on to the application, it's quite an overkill - every Apache process or thread will carry a full Python interpreter with stack, heap, and all loaded modules. Apache processes get a bit fat that way.

Another drawback: Apache is one of the most flexible servers out there, but it's a resource hog compared to small servers like lighttpd. And - due to the architecture of Apache modules - mod_python will run the full application in the security context of the web server. Two things you usually don't want in production environments.

So a natural approach is to use a lighter HTTP server and put your application behind it - using the HTTP server itself only for media serving, and FastCGI to pass requests from the user on to your application. Sometimes you put that small HTTP server behind an Apache front that only uses mod_proxy (either directly or via mod_rewrite) to proxy requests to your application's web server - and believe it or not, this is actually a lot faster than serving the application with Apache directly!

The second pitfall is Python itself. Python has a quite nice threading library, so it would seem ideal to build your application as a threaded server - threads use much fewer resources than processes. But this will bite you, because of one special feature of Python: the GIL, the dreaded global interpreter lock. This isn't much of an issue if your application is 100% Python - it kicks in when internal functions or C extensions are used. Too bad that almost all DBAPI libraries use at least some database client code in the form of a C extension - you start a SQL command and all other threads are blocked until the call returns. No multiple queries running in parallel ...

So the better option is to use some forking server, because that way the GIL won't kick in. This allows a forking server to make efficient use of multiple processors in your machine - and so be much faster in the long run, despite the overhead of processes vs. threads.
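To make the prefork idea concrete, here is a minimal toy sketch (my own illustration, not Flup's actual implementation) of handing work to a forked child over a pipe. Each child is a full process with its own interpreter and its own GIL, so a blocking C extension call in one worker never stalls the others:

```python
import os

def work(n):
    # stand-in for a request handler doing real (CPU-bound) work
    return sum(range(n))

def run_in_child(n):
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # child: compute, write the result back over the pipe, exit hard
        os.close(r)
        os.write(w, str(work(n)).encode())
        os._exit(0)
    # parent: read the child's answer, then reap the child
    os.close(w)
    result = int(os.read(r, 64))
    os.waitpid(pid, 0)
    return result

print(run_in_child(10))  # prints 45
```

A real preforked server keeps a pool of such children alive and dispatches incoming FCGI requests to whichever one is idle, instead of forking per call.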

For this documentation I take a three-layer approach to distributing the software: the front will be your trusted Apache, just proxying everything out to your project-specific lighttpd. The lighttpd will have access to your project's document root and will pass on special requests to your FCGI server. The FCGI server itself can run on a different machine if that's needed for load distribution. It will use a preforked server because of the threading problem in Python and will be able to make use of multiprocessor machines.

I won't talk much about the first layer, because you can easily set that up yourself. Just proxy stuff out to the machine where your lighttpd is running (in my case usually the Apache runs on different machines than the applications). Look it up in the mod_proxy documentation, usually it's just ProxyPass and ProxyPassReverse.
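Just as a sketch of that first layer, the mod_proxy part of the Apache config could look roughly like this (the backend host name is a placeholder for wherever your lighttpd runs):

```
 ProxyPass        / http://lighttpd-backend:8000/
 ProxyPassReverse / http://lighttpd-backend:8000/
```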

The second layer is more interesting. lighttpd is a bit weird about configuring FCGI - you need FCGI scripts in the filesystem and have to hook those up to your FCGI server process. The FCGI scripts actually don't need to contain any content - they just need to exist in the file system.

So we start with your Django project directory. Just put a directory public_html in there. That's the place where you put your media files, for example the admin media directory. This directory will be the document root for your project server. Be sure to put only files in there that don't contain private data - private data like configs and modules better stay in places not accessible by the webserver.

Next, set up a lighttpd config file. You will only use the rewrite and the fastcgi modules. There is no need to keep an access log, as that one will be written by your first layer, the Apache server. In my case the project is in /home/gb/work/myproject - you will need to change that to your own situation. Store the following content as /home/gb/work/myproject/lighttpd.conf


 server.modules = ( "mod_rewrite", "mod_fastcgi" )
 server.document-root = "/home/gb/work/myproject/public_html"
 server.indexfiles = ( "index.html", "index.htm" )
 server.port = 8000
 server.bind = "127.0.0.1"
 server.errorlog = "/home/gb/work/myproject/error.log"

 fastcgi.server = (
   "/main.fcgi" => (
     "main" => (
       "socket" => "/home/gb/work/myproject/main.socket"
     )
   ),
   "/admin.fcgi" => (
     "admin" => (
       "socket" => "/home/gb/work/myproject/admin.socket"
     )
   )
 )

 url.rewrite = (
   "^(/admin/.*)$" => "/admin.fcgi$1",
   "^(/polls/.*)$" => "/main.fcgi$1"
 )

mimetype.assign = (
".pdf" => "application/pdf",
".sig" => "application/pgp-signature",
".spl" => "application/futuresplash",
".class" => "application/octet-stream",
".ps" => "application/postscript",
".torrent" => "application/x-bittorrent",
".dvi" => "application/x-dvi",
".gz" => "application/x-gzip",
".pac" => "application/x-ns-proxy-autoconfig",
".swf" => "application/x-shockwave-flash",
".tar.gz" => "application/x-tgz",
".tgz" => "application/x-tgz",
".tar" => "application/x-tar",
".zip" => "application/zip",
".mp3" => "audio/mpeg",
".m3u" => "audio/x-mpegurl",
".wma" => "audio/x-ms-wma",
".wax" => "audio/x-ms-wax",
".ogg" => "application/ogg",
".wav" => "audio/x-wav",
".gif" => "image/gif",
".jpg" => "image/jpeg",
".jpeg" => "image/jpeg",
".png" => "image/png",
".xbm" => "image/x-xbitmap",
".xpm" => "image/x-xpixmap",
".xwd" => "image/x-xwindowdump",
".css" => "text/css",
".html" => "text/html",
".htm" => "text/html",
".js" => "text/javascript",
".asc" => "text/plain",
".c" => "text/plain",
".conf" => "text/plain",
".text" => "text/plain",
".txt" => "text/plain",
".dtd" => "text/xml",
".xml" => "text/xml",
".mpeg" => "video/mpeg",
".mpg" => "video/mpeg",
".mov" => "video/quicktime",
".qt" => "video/quicktime",
".avi" => "video/x-msvideo",
".asf" => "video/x-ms-asf",
".asx" => "video/x-ms-asf",
".wmv" => "video/x-ms-wmv"
 )

I bind lighttpd only to the localhost interface because in my test setting it runs on the same host as the Apache server. In multi-server settings you would of course bind to the public interface of your lighttpd servers. The FCGI scripts communicate via sockets in this setting, because this test setting uses only one server for everything. If your machines were distributed, you would use the "host" and "port" settings instead of the "socket" setting to connect to FCGI servers on different machines. And you would add multiple entries for the "main" stuff, to distribute the load of the application over several machines. Look up the available options in the lighttpd documentation.

I set up two FCGI servers for this - one for the admin settings and one for the main settings. All application requests will be routed through the main settings FCGI and all admin requests to the admin server. That's done with the two rewrite rules - you will need to add a rewrite rule for every application you are using.
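For example, adding a hypothetical application reachable under /forum/ (the name is just an illustration) would mean one more line in the rewrite map, pointing at the main handler:

```
 url.rewrite = (
   "^(/admin/.*)$" => "/admin.fcgi$1",
   "^(/polls/.*)$" => "/main.fcgi$1",
   "^(/forum/.*)$" => "/main.fcgi$1"
 )
```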

Since lighttpd needs the FCGI scripts to exist to pass along the PATH_INFO to the FastCGI server, you will need to touch the following files: /home/gb/work/myproject/public_html/admin.fcgi and /home/gb/work/myproject/public_html/main.fcgi

They don't need to contain any code, they just need to exist in the directory. Starting with lighttpd 1.3.16 (at the time of this writing only in svn) you will be able to run without the stub files for the .fcgi handlers - you just add "check-local" => "disable" to the two FCGI settings; then the local files are no longer needed. So if you want to extend this config file, you just have to keep some very basic rules in mind:

  • every settings file needs its own .fcgi handler
  • every .fcgi needs to be touched in the filesystem - this might go away in a future version of lighttpd, but for now it is needed
  • load distribution is done on .fcgi level - add multiple servers or sockets to distribute the load over several FCGI servers
  • every application needs a rewrite rule that connects the application with the .fcgi handler
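To illustrate the check-local option mentioned above, a handler entry without a stub file would look roughly like this (same socket path as in the config above, and only valid with a lighttpd that is new enough to support the option):

```
 fastcgi.server = (
   "/main.fcgi" => (
     "main" => (
       "socket" => "/home/gb/work/myproject/main.socket",
       "check-local" => "disable"
     )
   )
 )
```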

Now we have to start the FCGI servers. That's actually quite simple, just use the provided django-fcgi.py script as follows:


 django-fcgi.py --settings=myproject.work.main \
  --socket=/home/gb/work/myproject/main.socket \
  --minspare=5 --maxspare=10 --maxchildren=100 \
  --daemon

 django-fcgi.py --settings=myproject.work.admin \
  --socket=/home/gb/work/myproject/admin.socket \
  --maxspare=2 --daemon

Those two commands will start two FCGI server processes that use the given sockets to communicate. The admin server will only use two processes - this is because the admin server usually isn't the one getting the many hits; that's the main server. So the main server gets higher-than-default settings for spare processes and maximum child processes. Of course this is just an example - tune it to your needs.

The last step is to start your lighttpd with your configuration file: lighttpd -f /home/gb/work/myproject/lighttpd.conf

That's it. If you now access either the lighttpd directly at http://localhost:8000/polls/ or through your front apache, you should see your application output. At least if everything went right and I didn't make too many errors.

Running Django with FCGI and lighttpd

This documentation is intended for a wider circle than just the German-speaking .de crowd, hence the whole thing in New-Westphalian English. Sorry. Update: I maintain the current descriptions in my trac system now. See the FCGI+lighty description for Django. There are different ways to run Django on your machine. One way is only for development: use the django-admin.py runserver command as documented in the tutorial. The builtin server isn't good for production use, though. The other option is running it with mod_python. This is currently the preferred method to run Django. This posting is here to document a third way: running Django behind lighttpd with FCGI.

First you need to install the needed packages. Fetch them from their respective download address and install them or use preinstalled packages if your system provides those. You will need the following stuff:

  • [Django][2] itself - currently fetched from SVN. Follow the setup instructions or use python setup.py install .
  • [Flup][3] - a package of different ways to run WSGI applications. I use the threaded WSGIServer in this documentation.
  • [lighttpd][4] itself of course. You need to compile at least the fastcgi, the rewrite and the accesslog module, usually they are compiled with the system.

After installing lighttpd, you first need to create a lighttpd config file. The config file given here is tailored to my own paths - you will need to adapt them to your own situation. This config file activates a server on port 8000 on localhost - just like the runserver command would. But this server is a production quality server with multiple FCGI processes spawned and very fast media delivery.


 # lighttpd configuration file
 #
 ############ Options you really have to take care of ####################

server.modules = ( "mod_rewrite", "mod_fastcgi", "mod_accesslog" )

server.document-root = "/home/gb/public_html/"
 server.indexfiles = ( "index.html", "index.htm", "default.htm" )

 # these settings attach the server to the same IP and port
 # as the runserver command would use
 server.port = 8000
 server.bind = "127.0.0.1"

server.errorlog = "/home/gb/log/lighttpd-error.log"
 accesslog.filename = "/home/gb/log/lighttpd-access.log"

 fastcgi.server = (
   "/myproject-admin.fcgi" => (
     "admin" => (
       "socket" => "/tmp/myproject-admin.socket",
       "bin-path" => "/home/gb/public_html/myproject-admin.fcgi",
       "min-procs" => 1,
       "max-procs" => 1
     )
   ),
   "/myproject.fcgi" => (
     "polls" => (
       "socket" => "/tmp/myproject.socket",
       "bin-path" => "/home/gb/public_html/myproject.fcgi"
     )
   )
 )

 url.rewrite = (
   "^(/admin/.*)$" => "/myproject-admin.fcgi$1",
   "^(/polls/.*)$" => "/myproject.fcgi$1"
 )

This config file will start only one FCGI handler for your admin stuff and the default number of handlers (each one multithreaded!) for your own site. You can fine-tune these settings with the usual lighttpd FCGI settings, even make use of external FCGI spawning and offloading of FCGI processes to a distributed FCGI cluster! Admin media files need to go into your lighttpd document root.

The config works by translating all standard URLs to be handled by the FCGI script for each settings file - to add more applications to the system you would only duplicate the rewrite rule for the /polls/ line and change that to choices or whatever your module is named. The next step would be to create the .fcgi scripts. Here are the two I am using:


 #!/bin/sh
 # this is myproject.fcgi - put it into your docroot

export DJANGO_SETTINGS_MODULE=myprojects.settings.main

/home/gb/bin/django-fcgi.py

 #!/bin/sh
 # this is myproject-admin.fcgi - put it into your docroot

export DJANGO_SETTINGS_MODULE=myprojects.settings.admin

/home/gb/bin/django-fcgi.py

These two files only make use of a django-fcgi.py script. This is not part of the Django distribution (not yet - maybe they will incorporate it) and its source is given here:


 #!/usr/bin/python2.3

def main():
 from flup.server.fcgi import WSGIServer
 from django.core.handlers.wsgi import WSGIHandler
 WSGIServer(WSGIHandler()).run()

if __name__ == '__main__':
 main()

As you can see it's rather simple. It uses the threaded WSGIServer from the fcgi module, but you could just as easily use the forked server - though since lighttpd already does the preforking, I don't think there is much use in forking at the FCGI level as well. This script should be somewhere in your path, or just reference it with a fully qualified path as I do.

Now you have all parts together. I put my lighttpd config into /home/gb/etc/lighttpd.conf, the .fcgi scripts into /home/gb/public_html and the django-fcgi.py into /home/gb/bin. Then I can start the whole mess with /usr/local/sbin/lighttpd -f etc/lighttpd.conf. This starts the server, preforks all FCGI handlers and detaches from the tty to become a proper daemon. The nice thing: this will not run under some special system account but under your normal user account, so your own file restrictions apply.

lighttpd+FCGI is quite powerful and should give you a very nice and very fast option for running Django applications. Problems:

  • under heavy load some FCGI processes segfault. I first suspected the fcgi library, but after a bit of fiddling (core debugging) I found out it's actually the psycopg on my system that segfaults. So you might have more luck (unless you run Debian Sarge, too)

  • Performance behind a front apache isn't what I would have expected. A lighttpd with front apache and 5 backend FCGI processes only achieves 36 requests per second on my machine, while the django-admin.py runserver achieves 45 requests per second! (Still faster than mod_python via Apache 2: only 27 requests per second.)

Updates:

  • the separation of the two FCGI scripts didn't work right. Now I don't match only on the .fcgi extension but on the script name, that way /admin/ really uses the myproject-admin.fcgi and /polls/ really uses the myproject.fcgi.

  • I have [another document online][6] that goes into more details with regard to load distribution

Pass-Chips and their possible misuse

Owl Content

A bit older, but still interesting: Biometrics/BSI Lecture Program at CeBIT 2005. Particularly interesting are the statements about the authorization of the passport chip readers:

The ICAO standard suggests an optional passive authentication mechanism against unauthorized reading (Basic Access Control). Kügler estimated its effectiveness as only minor. However, Basic Access Control would be suitable for the facial image, as this involves only weakly sensitive data.

This is the part currently being discussed regarding the passport - the authentication of the reader by the passport via the data of the machine-readable zone. This method is not protected against copying the key - once it is determined, it can be used to identify a passport. Even from a greater distance.

The contactless chip in the passport according to ISO 14443 will (naturally) be machine-readable and digitally signed as well as contain the biometric data. As the reading distance, Kügler mentioned a few centimeters, but pointed out that with current technology, reading from several meters away is possible. To ensure copy protection, the RFID chip should actively authenticate itself using an individual key pair, which is also signed.

Important here: the copy protection is handled by an active two-way authentication. A passport could therefore only be read with a stored key if it is actively involved. The keys then transmitted are so to speak bound to the respective communication - because both the passport and the reader would have their own key pair. This makes attacks via sniffing of the authentication significantly more complicated, as two key pairs must be cracked to do something with the data. Unfortunately, however, only the basic procedure is currently planned, i.e., only the keys per reader. And it gets worse:

Kügler rated the fingerprint as a highly sensitive feature. Therefore, access protection must be ensured by an active authentication mechanism (Extended Access Control). This was not defined in the ICAO standard and is therefore only usable for national purposes or on a bilateral basis.

Otto Orwell dreams of storing fingerprints - the procedure for how these must be secured is not yet defined and standardized. Such storage would therefore not be usable across the board. It is also important to ensure that only authorized devices are allowed to read. To this end, all readers would receive a key pair, which must be signed by a central authority. Anyone who has ever dealt with a certification authority knows that there must inevitably be a revocation list - a way to withdraw certificates. This is especially important for passport readers if, for example, they are stolen (don't laugh, devices also disappear at border facilities - hey, entire X-ray gates have been stolen from airports). Unfortunately, the experts see it differently:

In the subsequent short discussion, the question was asked whether a mechanism is provided to revoke the keys of the readers. Kügler indicated that this is not the case so far. However, it is currently under discussion to limit the validity of the keys temporally, but this has not yet been decided.

Hello? So there is no way to revoke a device's key. And there is - currently - no expiration of a key. If someone gains access to a reader, they have the key of the device and its technology at their disposal to read every passport in the vicinity. Without the possibility of getting rid of a device used improperly. This is like a computer system where there is no way to change the password and no way to delete a user - even in case of proven misconduct.

And once again, the extended check (and this key technology plus certificate in the reader is probably only intended for this) is only a proposal (which may not even be implemented due to the lack of interest of the Americans in the whole thing):

Kügler then described the BSI's proposal regarding Extended Access Control. According to this, an asymmetric key pair with a corresponding, verifiable certificate is generated for each reader (authorization only per reader). Therefore, the chip must be able to provide computing power for Extended Access Control. [...] Within the EU, access protection by Extended Access Control is currently only to be seen as a proposal, said Kügler. Another (unnamed) BSI colleague agreed with him and added that the Americans do not demand a fingerprint as a biometric feature on the chip at all, but rather the digital facial image would suffice for them. Only within America is a digital recording of the fingerprint planned. For this reason, the technical implementation of Extended Access Control is not urgent.

Only in this proposal is it provided that the devices receive unique key pairs and certificates based on them. Why is all this so critical now? Well, the discussion constantly focuses only on the data and the reading of the data - but those are not even that critical. Even the stored fingerprints are not complete fingerprints suitable for reconstruction, but only the characteristics relevant for re-identification (although the discussion is still ongoing as to whether these stored characteristics are really unique - especially in the global context we are talking about - or whether more data needs to be stored than in a purely national approach).

But what is always possible when we talk about such passports: the authentication and identification of a person. Two-way authentication alone can already tell me who is near me. If, for example, I have stored the key of a passport for the simplified procedure, I can then determine at any time, without contact, whether this passport is nearby - only within the limits of the security of the cryptographic algorithms, of course, but that would already be a fairly reliable confirmation: it would be a pretty serious failure of the whole scheme if two passports with the same key could both pass authentication, and the developers have hopefully excluded that.

I can therefore obtain the keys of persons - for the simplified procedure, the machine-readable line of the passport is sufficient - for example through simple means such as burglary, pickpocketing or social engineering - and store them. I can then feed a reader with these keys so that, for example, it checks several passports that interest me as people pass through a gate - a revolving door with a predefined speed is very practical for this. Only the passport with the matching data in the machine-readable zone will release its data, or provide confirmation of the authentication.

I could therefore, for example, determine when a person enters and leaves a building - without the knowledge of that person and fully automatically. With an authentication time of 5 seconds, you can already check several keys while someone walks through the revolving door.
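To make this tracking scenario concrete, here is a toy model in Python (my own construction for illustration, not the actual ICAO protocol) of how a rogue reader with a database of harvested keys can recognize exactly which passport just answered a challenge:

```python
import hashlib
import hmac

def respond(passport_key, challenge):
    # the passport proves possession of its key by MACing the challenge
    return hmac.new(passport_key, challenge, hashlib.sha256).digest()

def identify(harvested_keys, challenge, response):
    # the rogue reader simply tries every key it has collected;
    # a match reveals whose passport is at the gate right now
    for owner, key in harvested_keys.items():
        if hmac.compare_digest(respond(key, challenge), response):
            return owner
    return None

keys = {'alice': b'key-alice', 'bob': b'key-bob'}
challenge = b'revolving-door-0001'
print(identify(keys, challenge, respond(b'key-bob', challenge)))  # prints bob
```

Note that the passport holder never notices any of this - from the passport's point of view, this is a perfectly legitimate authentication.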

Of course, this is still not the identification of the person - only of the passport. But especially when the person being monitored does not know about the monitoring, the passport will be carried by that person. There is no reason not to have the passport with you. And abroad it is often a bad idea not to have your passport with you - so in these cases it is necessarily near the person.

Well, but according to Otto Orwell, all this is just scaremongering and anyway not true and completely wrong. Unfortunately, it is based on statements by employees of the BSI - who are basically his people.

Zum Abschuss freigegeben

In Die Zeit: Zum Abschuss freigegeben, a dossier about the victims of the craving for attention à la Raab and Bild ...

The big problem I see here isn't just the Bildzeitung and Raab and similar media trash - the big problem is the acceptance with which this crap is consumed. After a few months you no longer remember where you read or heard something - and so you yourself become a vector contributing to the spread of the nonsense.

When I then imagine that the Springer publishing house wants to grab the Pro7/Sat.1 group for itself, so that soon the Bildzeitung and Raab will probably be pulling on the same rope, I feel sick ...

A democratic society lives, among other things, on a diversity of opinion that must also be reflected in a diversity of media. But if the media landscape is dominated across all media by a corporation with a clear political agenda (anyone who doubts that should take a look at the Bildzeitung's coverage during the last state election in Hamburg - keep a sick bag handy or your keyboard will suffer), an important factor of democracy is lost.

And so an ugly closing of ranks forms between business associations and a media culture that hardly deserves the word culture anymore - escalating into agitation against the sick, the unemployed, foreigners and left-wing politicians that is strongly reminiscent of times we thought were long past ...

Off to the police state

Owl Content

German cabinet approves bill to expand DNA analysis:

... DNA analyses of individuals may in future also be stored if they have committed only minor offenses such as property damage or trespassing, or if it is expected that they will commit such offenses in the future. Furthermore, investigators will be granted the right to order DNA analyses in an expedited procedure without a judge having to approve them.

You participate in a demo that someone doesn't like? No problem, your data will be recorded and filed. Trespassing at a demo can happen quickly, property damage can quickly be attributed to you, and if you don't need to ask a judge, you can also move much faster. And so a nice little DNA database will quickly accumulate of all those unpleasant subjects a state really doesn't need - namely people who engage publicly and speak up.

What, civil rights fall by the wayside in the process? Forget it - neither Otto Orwell nor the combined incompetence in the Ministry of Justice cares.

Oh, and for anyone who believes I am just being paranoid, here is the case example cited by the Ministry of Justice:

A has been convicted because he repeatedly scratched the paint of motor vehicles with a screwdriver. The prognosis is that corresponding criminal offenses are also to be expected from him in the future.

Yes, you are a wheelchair user, you got upset about idiotically parked cars and scratched the paint of one? Hey, you are still in a wheelchair, and we simply assume that you will continue to get upset about the idiotic drivers - so off to the DNA file with the murderers, terrorists and sex offenders. After all, you are at least as threatening to society as they are.

What kind of shit is this red/green puppet theater in Berlin getting us into. It is absolutely unbelievable.


And if you think it would be better with the Union:

... on the other hand, for the CDU the proposed amendment to the DNA analysis law is by no means sufficient. "The bill is a step in the right direction. But it falls short," said the deputy chairman of the Union faction, Wolfgang Bosbach. The Union will further tighten the existing legal situation in the event of an election victory, explained the interior and legal affairs politician. There is no right for offenders to remain anonymous.

Anyone who spontaneously thinks of recording every striking worker is probably on the right track as far as their ideas are concerned ...

And all this from people who, under the guise of neo-liberalism, have written a reduction of the state to its core functions on their banner - and see surveillance, exploitation, and harassment of citizens as core functions.

We are moving straight towards something that can no longer be associated with a democratic society and a rule of law.

How FileVault works

As a follow-up to the previous entry about the problems with backing up FileVaults from an active FileVault account, I took a closer look at what Apple actually does for FileVault. I'm not particularly enthusiastic about the approach.

First of all, a FileVault is nothing more than a so-called sparse image - a disk image in which only the actually used blocks are stored. So if it is empty, it doesn't matter how large it was dimensioned - it only takes up a little disk space. With the stored data this image grows, and you can have it cleaned up - in the process, data blocks that have become free (e.g. through deletions) are released again in the sparse image, so the image shrinks. Additionally, encryption is enabled for the FileVault images. The shrinking happens semi-automatically when logging out: the system asks the user if it may clean up, and if the user agrees, it does.

But this is only the mechanism of how the files are stored - namely as an HFS+ volume in a special file. How is it automatically opened at login, and how is it ensured that programs find the data in the places where they look for it? For this, the FileVault image must be mounted. In principle the process is the same as when double-clicking an image file - the file is mounted as a drive and is available in the list of drives in the Finder and on the desktop. For FileVault images, however, the desktop icon is suppressed, and instead of mounting to /Volumes/ as usual, the mounting is somewhat modified.

A FileVault image is usually located in the user's home directory as a single file. So for a logged-out user hugo, there is a hugo.sparseimage in /Users/hugo/. As soon as the user hugo logs in, a number of things happen. First, the sparse image is moved from /Users/hugo/ to /Users/.hugo/ and is no longer called hugo.sparseimage but .hugo.sparseimage. Then it is mounted directly onto /Users/hugo/ (which is now empty) - this is why it must be moved out of the user directory first, as it would otherwise no longer be accessible once another file system is mounted over it.

Now the volume is accessible as the user's home directory, and all programs see the data in the usual place, since it is mounted directly onto /Users/hugo - thus, for example, /Users/hugo/Library/Preferences/ is a valid path inside the image. When logging out, the whole thing is reversed: the image is unmounted, moved back, and the /Users/.hugo/ directory is removed. Additionally - optionally - the image is compacted.
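The login-time path shuffle can be sketched like this - a toy Python model of the renames just described, not Apple's actual code (the mounting itself is omitted):

```python
import os
import tempfile

def login_shuffle(users_dir, user):
    # at login: move <user>.sparseimage out of the home directory
    # into a dot directory and rename it, so the decrypted volume
    # can then be mounted over /Users/<user> itself
    src = os.path.join(users_dir, user, user + '.sparseimage')
    dot_dir = os.path.join(users_dir, '.' + user)
    os.makedirs(dot_dir)
    dst = os.path.join(dot_dir, '.' + user + '.sparseimage')
    os.rename(src, dst)
    return dst

def logout_shuffle(users_dir, user):
    # at logout: the reverse - unmount (omitted), move back, clean up
    dot_dir = os.path.join(users_dir, '.' + user)
    src = os.path.join(dot_dir, '.' + user + '.sparseimage')
    dst = os.path.join(users_dir, user, user + '.sparseimage')
    os.rename(src, dst)
    os.rmdir(dot_dir)
    return dst

# demonstrate with a throwaway directory standing in for /Users
users = tempfile.mkdtemp()
os.makedirs(os.path.join(users, 'hugo'))
open(os.path.join(users, 'hugo', 'hugo.sparseimage'), 'w').close()
print(os.path.basename(login_shuffle(users, 'hugo')))  # prints .hugo.sparseimage
```

If a crash interrupts the sequence between the two functions, the image is stranded in the dot directory - which is exactly the broken state described below.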

Now it also becomes clear what problem backup programs have: when the backup runs, the home directory is empty and the image has been moved to the dot directory. Booting into a backup created this way would not find the user's home directory and would present the user with an empty home - it would appear as if all files had been lost.

This is also one of the major problems of FileVault: if the computer crashes while you are logged in, the directories and files are left moved and renamed. So if you use FileVault and can't access your files after a crash: maybe it helps to log in as another, FileVault-free user (which you should have anyway, for backups!) and repair the home directory. I don't know if Apple's disk repair program would do that - so far, none of my FileVault installations have crashed. But for an emergency, you might want to remember this.

Overall, the whole thing gives me a rather hacked-together impression - I would prefer it if the whole system could do without renaming and moving. For example, the FileVault could simply lie peacefully next to /Users/hugo as /Users/.hugo.sparseimage and only be mounted - then backups would have no problems, as the structure would be identical between logged in and logged out. I don't know why Apple chose this rather complicated form - probably because of the permissions on the sparse image and the resulting storage location in the user's home directory.

Experts Advocate for VAT Increase

Experts advocate for VAT increase - if you look at these alleged experts, you find IW director Hüther and the chief economist of Deutsche Bank. Completely neutral experts, of course. Why do these allegedly professional journalists write such nonsense? Every idiot from some employers' association, employer-affiliated institute or major bank is called an expert - but if something comes from the employees' camp, they are critics from the unions. This is how the neoliberal crap is beautifully upheld and the citizen is told where to look for his experts - regardless of whether these experts are anything but experts (I still think with horror of the mathematically completely untalented and otherwise quite incompetent financial expert Merz) or pursue their own political agenda.

That something must be rotten with the experts in this specific case should be noticeable even to the dumbest journalist: the VAT is to be increased, but of course only with accompanying measures. Look at these measures. One screams for a reduction in wage-related costs as an accompanying measure and the abolition of the solidarity surcharge - but only the latter is relevant for the consumer. And now look at what someone on social assistance or unemployment benefit II pays in solidarity surcharge - nothing. But this person still fully bears the VAT increase.

The other talks about how the risk of reduced consumption must be accepted, as the advantages of reducing labor costs outweigh it - because he also wants to reduce various payments. At least for both sides - he did not explicitly speak only from the employers' perspective, though presumably he simply forgot that there is also an employees' side. And here too: social assistance and unemployment benefit II recipients are not relieved and get the full VAT increase.

None of the so-called experts has spoken about the fact that a VAT increase must be accompanied by an increase in social assistance and unemployment benefit II. Both accept that people who are already impoverished will be even worse off and that more people will fall below the poverty line. They act as if they were experts - but in the end they are only the henchmen of the exploiters and swindlers and want only the same thing that the employers' side has been demanding all along: to squeeze the employees even more.

VAT is the most unsocial tax we have. On the one hand, it is only relevant for consumers, and indeed for domestic consumers. On the other hand, it is based on consumption - and this can of course not fall below a certain level, because everyone has to live and has to pay for it - and thus this tax hits the hardest those who have the least. Because their consumption can hardly be reduced any further.

Contributions will not decrease again

Survey: Health insurance funds' financial situation worsening again - we're all being fooled. By politicians who promise to lower contribution rates and naturally can't. By funds that are supposed to represent our interests but naturally don't. By doctors who promise cooperation in cost reduction but naturally don't want to give up their income (*). By pharmacists who are supposed to serve as a trusted source for patients but have long since lost that trust.

Of course, the contribution reduction for employers - there's always money for that. Only the patients, they have to pay for all of this again. Funds, doctors, and pharmacists, on the other hand, sit on their vested interests and refuse to contribute even minimally to a reduction that would also affect their income.

Funds then do great things like the family doctor model and the in-house pharmacy model - but it doesn't help if the doctors simply refuse to participate (which happens quite often here in Münster). Correct billing of the practice fee is also rarely seen: if a prescription is simply picked up, without the doctor providing even a bit of service (apart from his signature) because the medication has been taken for years - no matter, the practice fee is collected again anyway.

Quality control of doctors? Not a chance - they refuse; that would be too much influence for the patient. So they continue to hide behind the allegedly free choice of doctor - which has long since become a joke, if only because of the exodus of specialists from the associations of statutory health insurance physicians. In some specialties, as a statutory health insurance patient, your only chance of meeting a really qualified doctor is in the hospital - outside you only find quacks ...

At the same time, more and more politicians and functionaries of the various associations are talking about patients taking more responsibility and having to bear more of the costs. Of course, we are supposed to trust the doctors in consultation. We are supposed to trust the pharmacists in choosing the drug manufacturer. We are supposed to trust the funds in billing. How are we supposed to take on more responsibility in such a situation that is based on trust without control? What does taking responsibility mean in this context at all - it's not about responsibility, it's solely about cost shifting. And risk shifting: What, your complaints have worsened because you stopped the treatment too early because of the costs? Your own fault, why do you do such a thing. If patients are asked to take more responsibility, they must also be given the means to do so in the form of possibilities of influence and controls. Otherwise, these are just empty phrases.

Doctors receive preferential treatment from the pharmaceutical industry and then obediently prescribe its products - it's so conveniently practical and comfortable, and you benefit from it. The funds sit there and deal more with their own bureaucracy and their own security than with keeping an eye on the doctors and ensuring that this very connection to the pharmaceutical industry does not get out of hand. The pharmacists fight for the preservation of their privileges, oppose any alternative form of drug supply, and argue with their consulting services - which, however, de facto often no longer exist: if only one or two trained pharmacists work in a pharmacy, the rest are at best glorified drugstore clerks ... (and the main turnover in pharmacies is made with care products, gummy bears and all kinds of obscure nonsense - hey, why should one trust people who sell and "advise on" homeopathic nonsense?)

And the pharmaceutical industry? They are the laughing fifth in the background. Decent profit margins - and of course job cuts, because the margins have to keep growing. Effective monopolies through absurd patent policy (I recall the nitrogen patent from Linde - which fortunately was overturned) and an increasingly opaque approval bureaucracy. Of course medicines must be tested before approval - but what the current tests really achieve has been seen in various cases recently (Lipobay, Vioxx and other COX-2 inhibitors, to name just two).

What is needed is a much more radical restructuring of the health system, a restructuring designed to enable the patient to actually take responsibility, because he is given the information he needs for this and because he is given advisory facilities that support him in this.

Separation of the billing system and the control function in the funds - the control function is not sufficiently exercised by them anyway, it belongs to independent institutions financed by mandatory contributions from those involved in the health system (doctors, pharmacists, pharmaceutical industry and proportionally health insurance contributions).

The billing procedures should be handled by independent accounting offices for patients and doctors, which should only finance themselves through their billing services - this is already common practice in the economy, where billing services are outsourced to separate companies that are then financed by shares in the cost savings of the parties involved.

More transparency in the pharmaceutical industry - research results must be released if a company wants to obtain approval for medicines. Many research institutions are partly state-financed anyway or are close to universities through their state affiliation. A transparent testing guideline for medicines must be introduced - one that scientists and physicians can understand and in which these people are involved, so that problems can be detected earlier - and cannot be concealed by the company (as was the case with Vioxx).

At the same time, effective cost control for medicines must be introduced - justifications based on research costs are not sufficient here; the whole thing must be traceable. If you add up the alleged research costs the pharmaceutical industry claims for its various medicines, you eventually reach the point where the entire gross domestic product would be generated in its research institutions alone. There must be much greater transparency here in order to effectively prevent price gouging on medicines.

And the pharmacists? Sorry, but they simply have to think about what role they still have. This would include that they take their consulting services seriously again and concentrate on what their task would be: the application advice for medicines and the advice on the use of non-prescription medicines. However, a specialist saleswoman with a drugstore education cannot provide this. Justifying one's own existence with a sales monopoly for medicines is certainly not enough. And reading the package leaflet is not enough either.

(*) Here, of course, doctors in hospitals are excluded - their job is then pretty much the last in the health industry and decent working hours cannot be spoken of for them.

Genetic Engineering - It's Not Just About the Sausage

Bundesrat rejects GMO law - the Union wants us to eat GenFood, and it couldn't care less about the consequences, or about whether, for example, organic farming near GM fields becomes impossible (because farmers cannot meet the strict requirements, since genetically modified plants do spread after all). The fact that most farmers don't think much of the Genshit either is just as irrelevant to them. The fact that in the end only the big corporations - the ones actually interested in the whole genetic technology, because they can strangle farmers and squeeze them even more - win, is probably not irrelevant at all. The donation millions have to come from somewhere, after all ...

Genetically modified foods serve to tie together (forcibly tie together!) seeds and fertilizers or crop protection products, and to protect the use of the seeds by patent. This directly attacks the classic, traditional way farmers work - for example, using part of the harvest for the next sowing is usually not possible (because the seeds are infertile) or prohibited (by contract). There is no biological reason for it in Germany - we have neither extreme climatic conditions nor particularly catastrophic pest infestations to endure. It is solely about profit maximization for the companies that produce the genetically modified seeds.

If you then look at who is behind it, something else becomes apparent: another point is the elimination of the classic production sites for seeds - many of the genetic engineering companies are more associated with the pharmaceutical or chemical industry than with classical agriculture (although there are also black sheep among the seed producers - but these also belong more to the industry). Here, industry is simply moving into an area it could not serve before and wants to break into - ultimately with coercive means.

With genetically modified seeds, not only are foods produced whose consumption is rejected by the majority of consumers - an entire economic sector is also being strangled or possibly even destroyed. At least severely damaged.

Agriculture, through its structures with cooperatives, associations, interest groups and political lobbying, has a fairly large power and influence on its fate - so far. But now the bad guys want to play along, whose goal is exactly the takeover of this - previously self-managed - power.

Of course, the Union - which has repeatedly revealed itself to be industry-dependent - hitches itself to the cart. And of course, our industry chancellor performs this balancing act and Minister Künast has to present a law that is already watered down to the extreme - and even that is rejected in the council (which has a Union majority).

PostgreSQL 8.0.2 released with patent fix

Just found: PostgreSQL 8.0.2 released with patent fix. PostgreSQL has received a new minor version in which a patented caching algorithm (ARC) was replaced with a non-patented one (2Q). The interesting part: this is one of the patents that IBM has released for open source. So why switch at all? Because IBM released these patents for open source use, but not for commercial use - PostgreSQL, however, is under the BSD license, which explicitly allows completely free commercial use.
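PostgreSQL's replacement buffer manager is of course implemented in C; purely to illustrate the 2Q idea (in its simplified variant without the A1out ghost queue - this is not the project's actual code), a toy sketch might look like this:

```python
from collections import OrderedDict

class TwoQCache:
    """Toy sketch of the simplified 2Q policy: a FIFO queue (A1) for
    pages seen only once, and an LRU queue (Am) for re-accessed pages."""

    def __init__(self, a1_size=2, am_size=4):
        self.a1 = OrderedDict()   # first-time accesses, FIFO eviction
        self.am = OrderedDict()   # re-accessed pages, LRU eviction
        self.a1_size, self.am_size = a1_size, am_size

    def access(self, key, value=None):
        if key in self.am:                    # hot page: refresh LRU position
            self.am.move_to_end(key)
            return self.am[key]
        if key in self.a1:                    # second access: promote to Am
            value = self.a1.pop(key)
            self.am[key] = value
            if len(self.am) > self.am_size:
                self.am.popitem(last=False)   # evict least recently used
            return value
        self.a1[key] = value                  # cold page: goes into A1 only
        if len(self.a1) > self.a1_size:
            self.a1.popitem(last=False)       # FIFO eviction
        return value
```

The point of the split is that a single sequential scan only churns through A1 and never pollutes the "hot" Am queue - which is roughly the scan-resistance property that made ARC attractive in the first place.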

For PostgreSQL itself, this would not have been a problem: as long as it remains BSD, the use of the IBM patent would not have caused any problems. Only a later license change - such as when someone chooses BSD software as the basis for a commercial product - would have been excluded.

A nice example of how even liberally handled software patents cause problems. Because medium-sized companies that build commercial products on open source would have lost a previously available basis - solely due to the patented caching algorithm (efficient storage of and efficient access to data - so patentable according to Clements' idea).

In the case of PostgreSQL, it went smoothly: the patented algorithm is not faster or better than its non-patented counterpart, and for the software itself nothing world-shattering has changed. But it does not have to (and will not) always go so smoothly. In audio and video processing, the patent minefields are much more extensive and therefore much more critical for free projects.

Okay, one might still argue that this would not have happened with a GPL license. But with a GPL license, certain forms of use as they already exist in PostgreSQL today (e.g., companies building special databases on PostgreSQL without making these special databases open source) are not possible. You can take a stand on this as you like - ideology aside - the PostgreSQL project has chosen the BSD license as its basis.

Even well-intentioned patent handling in the context of open source software would therefore be problematic. Exactly this is the reason why I am generally against software patents.

Police Fear Anonymity and Cryptography on the Internet

The police fear anonymity and cryptography on the internet - and therefore rail, for example, against state-funded anonymization services. This, however, is simply the usual conflict with any technology: it can be applied in two ways. No one talks about the legitimate reasons why anonymization services and encryption systems are used; only criminal use is the topic. Should we ban hammers and sickles? After all, you can kill people with both.

What is worrying about this development is that the use of cryptography will probably be restricted - or, as it is called in modern German, "regulated" - sooner or later. And at some point we will reach the situation where encrypted emails are in themselves considered suspicious. No further suspicion is then needed to spy on someone. And what is more obvious than to assume illegality on the part of someone who encrypts their emails?

Every society must deal with abuse of the system and abuse of society - and with those who completely fall out of societal norms. This is annoying and in many cases even tragic - but cannot be changed. However, the problem is not solved by putting the entire society under general suspicion. Ultimately, what remains is a society that is no longer worth living in and preserving because everything is based on surveillance and denunciation. Restricting the rights of ordinary citizens does not result in a single fewer criminal - rather more, because more and more citizens will resist the regulations (and according to the definition of people like Otto Orwell, are then simply criminals).

What is completely ignored here, in my opinion, is the point that crime never consists only of the perhaps technically hard-to-access encrypted channel - there must always be effects in the outside world too. Child pornography is not only traded on the internet - at some point it is also produced. Organized crime does not only organize the exchange of PGP keys on the internet - it organizes human smuggling, illegal gambling, drug trafficking and who knows what else. Every crime therefore has facets that take place quite openly and recognizably in society. To this day, investigations are primarily carried out in that area - eavesdropping has not yet reproducibly brought better results than those already achieved through normal investigative work. On the contrary: eavesdropping, dragnet searches and similar approaches have all failed, especially when you consider the immense personnel deployment (and thus cost) of these actions. And no, the genetic sample was not the decisive factor even in the Moshammer murder case.

Regulating network technologies will not prevent their use for criminal purposes - it will only make legal use more difficult or stigmatize it. Someone who smuggles people certainly has far fewer scruples about violating cryptography laws than someone who only uses cryptography because they don't like the idea of the state reading everything.

Install grsecurity

I used to play around with grsecurity before, but the installation was a bit tricky - above all, you didn't know what to configure to start with or how to begin with a reasonable rule-based security setup. The whole thing was more trial-and-error hopping than a comprehensible installation. For a security solution for an operating system, however, it is rather bad if you never get the feeling of understanding what is happening.

With the current versions of grsecurity, this has largely changed. On the one hand, the patches apply completely cleanly to the kernel; on the other hand, there are two essential features that ease the start: the Quick Guide and RBAC Full System Learning.

The Quick Guide provides a short and concise installation guide for grsecurity with a starting configuration of all the options that already offers a fairly good baseline and excludes problematic options (which could shut out some system services). This way you get a grsecurity installation that offers a lot of protection but usually does not conflict with common system services. This is especially important for people with root servers - a wrong basic configuration could lock them out of the system and thus turn it into an unusable service case.

But the Full System Learning is really nice: here the RBAC engine is transformed into a logging system and it is logged which users execute what and what rights are needed for this. The whole thing is still controlled by corresponding basic configs that classify different system areas differently (e.g. ensure that the user can access everything in his home, but not necessarily everything in various system directories). You just let the system run for a few days (to also catch cron jobs) and then generate a starting configuration for RBAC from it. You can of course still fine-tune this (you should also do this later - but as a start it is already quite usable).
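The learning workflow is driven by the gradm userspace tool. From memory it looks roughly like the following - flags may differ between versions, so treat this as a sketch and check the gradm man page and the Quick Guide for your installation:

```shell
# Set the RBAC admin password first (stored hashed under /etc/grsec):
gradm -P

# Put the RBAC system into full-system learning mode; accesses are
# logged instead of enforced:
gradm -F -L /etc/grsec/learning.log

# ... let the system run for a few days, including cron jobs ...

# Turn the collected learning logs into a starting policy:
gradm -F -L /etc/grsec/learning.log -O /etc/grsec/policy

# Enforce the generated policy; gradm -D (after authenticating to the
# admin role) disables enforcement again:
gradm -E
```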

RBAC is basically a second security/rights layer above the classic user/group mechanisms of Linux. The root user does not automatically have all rights and access to all areas. Instead, a user must log in to the RBAC subsystem in parallel to his normal login (which happens implicitly through the system start for system services!). Rules are stored there that describe how different roles in the system have different access permissions.

The advantage: even automatically started system services are only allowed to access what is provided for in the RBAC configuration - even if they run under root rights. They only have limited capabilities in the system until they log in to the RBAC subsystem - but for this, a manual password entry is usually required for the higher roles. Attackers from the outside can indeed gain the user rights restricted by RBAC, but usually cannot get to the higher roles and therefore cannot interfere with the system as much as would be possible without RBAC.

The disadvantage (should not be concealed): RBAC is complex. And complicated. If you do something wrong, the system is locked - quite annoying for root servers that are somewhere out there in the network. You should always have fallback strategies so that you can still reach a blocked system. For example, after changes to the RBACs, comment out the automatic activation at system startup so that a reboot puts the system in a more open state in case of problems. Or have an emergency access through which you can still administer a blocked system to some extent. In general, as with all complex systems: Keep your hands off if you don't know what you're doing.

In addition to the very powerful RBAC, grsecurity offers a whole range of other mechanisms. The second major block is PaX (important: a current version must be used here; all older ones contain a nasty security hole) - a subsystem that thwarts buffer overflow attacks by removing executability and/or writability from memory blocks. This is especially important for the stack, as most buffer overflows start there. PaX ensures that writable areas are not executable at the same time.

A third larger block is the better protection of chroot jails. The classic possibilities for processes to break out of a chroot jail are no longer given, since many functions necessary for this are simply deactivated in a chroot jail. Especially for admins who run their services in chroot jails, grsecurity offers important tools, as these chroot jails were only very cumbersome to make really escape-proof.

The rest of grsecurity deals with a whole collection of smaller patches and changes in the system, many of which deal with better randomization of ports/sockets/pids and other system IDs. This makes attacks more difficult because the behavior of the system is less predictable - especially important for various local exploits, where, for example, the knowledge of the PID of a process is used to gain access to areas that are identified via the PID (memory areas, temporary files, etc.). The visibility of system processes is also restricted - normal users simply do not get access to the entire process list and are also restricted in the /proc file system - and can therefore not so easily attack running system processes.

A complete list of grsecurity features is online.

All in all, grsecurity offers a very sensible collection of security patches that should be recommended to every server operator - the possibility of remote exploits is drastically restricted and local system security is significantly enhanced by RBAC. There is no reason not to use the patch, for example, on root servers as a standard, given the rather simple implementation of the grsecurity patch in an existing system (simply patch the kernel and reinstall, boot, learn, activate - done). Actually, a security patch should be part of the system setup just like a backup strategy.

Now it would of course be even nicer if the actual documentation of the system was a bit larger than the man pages and a few whitepapers - and above all was up to date. This is still a real drawback, because the right feeling of understanding the system does not really set in without qualified documentation ...

mod_fastcgi and mod_rewrite

Well, I actually tried running PHP as FastCGI - among other things because it would let me use a newer PHP version. And what happened? Nothing. There was a massive problem with mod_rewrite rules. In the WordPress .htaccess, everything is rewritten to index.php; the path that was actually requested is appended to index.php as PATH_INFO. PHP then takes this information back out and does the right thing.
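For reference, the catch-all .htaccess that WordPress generates for this kind of permalink looks roughly like this (from memory - the generated rules differ between WordPress versions and permalink settings):

```apache
RewriteEngine On
RewriteBase /

# Only rewrite requests that do not match a real file or directory ...
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# ... and hand the requested path to index.php as PATH_INFO.
RewriteRule ^(.*)$ /index.php/$1 [L]
```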

But with FastCGI activated, that didn't work - PHP kept claiming that no input file was passed, as if I had called PHP without any parameters. The WordPress administration - which works with normal PHP files - worked wonderfully. The permission handling also worked well; everything ran under my own user.

Only the rewrite rules didn't work - and thus neither did the whole site. Pretty annoying. Especially since I can't properly test it without taking down my main site. It's also annoying that suexec apparently looks for the actual FCGI starters in the document root of the primary virtual server - not in those of the actual virtual hosts. This makes the whole situation a bit confusing, as the programs (the starters are small shell scripts) are not located where the site's files are. Unless you create your virtual hosts below the primary virtual server - but I personally consider that highly nonsensical, as you could then bypass Perl modules loaded in a virtual host via direct path specifications through the default server.

Ergo: a failure. Unfortunately. Annoying. Now I have to somehow put together a test box with which I can analyze this problem ...

Update: a bit of searching and digging on the net and a short test, and I'm wiser: PATH_INFO with the FCGI version of PHP under Apache is broken. Apparently PHP gets the wrong PATH_INFO entry and the wrong SCRIPT_NAME. As a result, the interpreter simply does not find its script when PATH_INFO is set, and nothing works anymore. Now I have to search further to see if there is a solution. cgi.fix_pathinfo = 1 (which is generally suggested as a remedy) does not work, in any case. But if I see it correctly, there is no usable solution for this - at least none that is obvious to me. Damn.

Update 2: I found a solution. It consists of simply not using Apache, but lighttpd - and putting Apache in front as a transparent proxy. This works quite well; especially if I strip Apache down considerably and throw PHP out of it, it also becomes much slimmer. And lighttpd can run under different user accounts, so I save myself the wild hacking with suexec. However, one lighttpd process then runs per user (lighttpd only needs one process per server, as it works with asynchronous communication), and the PHPs run as FastCGI processes, not as Apache-integrated modules. Apache itself is then only responsible for purely static sites or sites with Perl modules - I still have quite a few of those.

At the moment I only have a game site running there, but maybe this site will be switched over in the next few days. The method by which cruft-free URIs are produced is quite funny: in WordPress you can simply register index.php as an error document - ErrorDocument 404 /index.php?error=404 would be the entry in the .htaccess, and lighttpd has an equivalent setting. This automatically redirects non-existent files (and the cruft-free URIs do not exist as physical files) to WordPress. There it is checked whether there really is no data for the URI, and if there is something (because it is a WordPress URI), the status is simply reset. For the latter, I had to install a small patch in WordPress. This saves you all the RewriteRules and works with almost any server. And because it's now 1:41, I'm off to bed ...
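Side by side, the two error-document entries look roughly like this - the Apache line is the one quoted above; for lighttpd I am assuming the server.error-handler-404 directive of a reasonably current version:

```apache
# Apache (.htaccess):
ErrorDocument 404 /index.php?error=404

# lighttpd (lighttpd.conf):
# server.error-handler-404 = "/index.php?error=404"
```

Inside index.php, WordPress then has to reset the status to 200 for URIs it actually knows - that is what the small patch mentioned above takes care of.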

Back to Camino from Firefox ...

... and back. Odyssey of the web browsers.

After working with Firefox for a few days, I switched back to Camino. Why? Well, under OS X, Firefox is suboptimal. For one, I have the impression that fonts are generally displayed smaller than in Camino or other real Mac programs. It might be an illusion. However, it is not an illusion that Firefox under OS X does not support Services. And that is annoying - what's the point if a bunch of programs hook into the Services menu and provide useful services that build on highlighted text in other programs, if the main application in which I spend my time on the computer does not support it at all?

Just as annoying was the fact that Tab-X is not supported under OS X. This extension attaches a close icon to every tab. I don't know what the UI designer of Firefox was thinking, but I consider neither the mandatory activation of a tab and then clicking on a tiny X at the right edge of the toolbar to be ergonomic, nor closing a tab via the context menu. Okay, you can get used to that if necessary.

Furthermore, I was constantly bothered by the fact that Firefox has its own password manager and does not use the KeyChain. I find it simply practical that all kinds of programs can register at a central location and that I can delete my passwords there if I need to. In addition, this helps to avoid constantly having to re-enter passwords just because you visit a page with a different browser.

Unfortunately, I lose all the nice things that are available via Firefox extensions - for example, the Web Developer Toolbar. Only that it doesn't work on my Mac anyway, who knows why - so I've only ever had it under Linux, and there I continue to use Firefox. I will miss the plugin for the Google PageRank status and the plugin for mozcc, however - both were quite practical. It's somehow stupid that I can't have both - a Firefox with proper integration into OS X, that would be it ...

Due to the pretty broken 0.8.2 of Camino, I downloaded and installed the 0.8.1 again. At least it has functioning tabs and doesn't crash all the time. I have no idea what they did with the 0.8.2, but it was definitely not to the benefit of Camino.

And of course, right after I wrote this, Camino started acting up. I can't believe it. The 0.8.1 had worked flawlessly before. Nevertheless, there were the same problems as with the 0.8.2 - probably triggered by some sites with which I work more frequently now than before? I have no idea - I haven't installed any special tools under OS X, on the contrary, I have uninstalled one.

So, trying other browsers again. Safari 1.0 under OS X 10.2.8 is clearly behind in features - it would still remain as an alternative, but it crashes on some pages. OmniWeb is basically a souped-up Safari, but it crashes even more frequently. And Opera doesn't get along with the CSS of the WordPress admin at all - it gets wildly jumbled. In addition, it asks multiple times for passwords and Keychain access whenever I access some protected pages. And it has had this quirk for months - not very confidence-inspiring.

The IE for Mac is not even a desperation option. Netscape? No, sorry, but that's not necessary. Mozilla also not - then rather Firefox, because Mozilla not only does not integrate well into the system, it also looks completely different from OS X applications ...

The only really usable alternative browser under OS X 10.2 is - despite its problems - OmniWeb. Safari remains as a last resort, but OmniWeb's rendering is more advanced on some pages. However, it still does not support things like clicking on the label of a checkbox to toggle it - the WordPress admin uses that, and it avoids silly target practice; it just doesn't work in OmniWeb or Safari. Okay, the missing QuickTag bar in OmniWeb and Safari is intentional on WordPress's part - the JavaScript is not quite compatible.

So, back to the whole thing and use Firefox again and complain about the missing services (which, by the way, can also work in Carbon applications - if the programmer has considered this in his program)? Or just play with OmniWeb and see if you can get around the problems?

And what do we learn from this? All browsers suck. Even the good ones.

New Game, New Luck: b2evolution

Today I took a look at b2evolution (as usual, just a short, superficial test flight). It's related to WordPress, and that alone makes it interesting - let's see what others have made of the same code base. So I grabbed the stuff, grabbed the Kubrick skin (hey, I've come to like Kubrick by now) and got started.

What strikes me immediately: b2evolution places much more emphasis on multi-everything. Multi-blog (four blogs come pre-installed, one of which is an "all blogs" blog and one a linkblog), multi-user (with per-blog permissions etc. - so it's suitable as a blogging platform for smaller user groups) and multi-language (nice: you can set the language on each posting and define languages per blog). I like that. The backend is reasonably usable and you find most things fairly quickly.

But then the documentation. Okay, yes, the most important things are documented and findable. But as soon as you go deeper, almost nothing is self-explanatory or documented. Okay, I admit, I shouldn't have set out right away to get the URIs into their most complicated form - namely via so-called stub files. These are alternative PHP files through which everything is pulled in order to preset special settings. Supposedly you can also get a WordPress-like URI structure this way - the b2evolution default is that the URI always contains index.php, with the additional elements appended at the end. That's ugly. I don't want that. Changing it apparently only works with Apache means by hand (no, not the nice and friendly support of the automatically generated .htaccess file as in WordPress) and then corresponding settings in b2evolution. Okay, it can be done - I know Apache well enough. But why so cumbersome when it could be easier?

Nunja, aber der echte Pferdefuss für mich kommt noch: b2evolution kann nur Blogs. Jedenfalls in der Standardausstattung. Genau - nur Listen von Postings die zeitlich geordnet sind. Langweilig. Nicht mal einfache statische Pages - sorry, aber wo pack ich das Impressum hin? Von Hand erstellte Files, die man daneben packt? Möglich, klar. Aber nicht gerade anwenderfreundlich.

Antispam-Mittel gibts auch einige, zum Beispiel eine zentral gepflegte Sperrwortliste (naja, Sperrwortlisten halte ich persönlich nicht für so geeignet) und Benutzerregistrierung. Nicht viel, aber erstmal ausreichend. Mehr kann man sicherlich über Plugins machen. Beim Stichwort Plugins ist eine sehr nette Eigenschaft zu nennen: man kann am Posting unterschiedliche Filter aktiviert haben. Je nach Posting immer wieder neu. Sehr nett - WordPress hat da ein echtes Defizit, die aktivierten Filter gelten für alles über alles - eine Änderung und alte Postings werden plötzlich falsch formatiert (wenn es ein Output-Filter ist).

Ebenfalls nett: die hierarchischen Kategorien verhalten sich wirklich hierarchisch - bei WordPress sind die ja nur hierarchisch gruppiert, aber z.B. wird mit der Hierarchie nicht viel gemacht. Bei b2evolution wandern Postings einer Kategorie automatisch an die übergeordnete, wenn eine Kategorie gelöscht wird. Ausserdem kann man durch die Multiblog-Eigenschaft an einem Posting Kategorien verschiedener Blogs aktivieren und damit sozusagen crossposten - wenn es denn in den Settings erlaubt ist.

Layoutanpassungen gehen über Templates und Skins. Templates sind vergleichbar zum WordPress 1.2 Modus und Skins eher zum WordPress 1.5 Modus. Also bei Templates wird alles durch ein PHP-File gezogen und bei Skins werden mehrere Vorlagen zusammengefasst und dann daraus das Blog gebaut. Spezialanpassungen kann man dann über eigene Stubdateien machen (die gleichen die auch für die hübscheren URIs genommen werden sollen) und darüber z.B. feste Layouts aufbauen mit denen man dann zum Beispiel statische Seiten simulieren könnte.

Alles in allem das Ergebnis des Kurzfluges: nettes System (trotz der etwas barocken Ecken in der URI-Erstellung und der recht spartanischen Dokumentation) für Hacker und Leute die sich in den Code reinwühlen mögen. So zum direkt losstarten finde ich es weniger geeignet - da ist WordPress wesentlich einfacher zu verstehen und zu starten. Und um mit Drupal zu konkurrieren ist b2evolution in den Features zu mager - einfach zu stark auf Blogs ausgerichtet. Man kann es zwar in die passende Richtung verbiegen - aber warum sollte man das machen wollen, wenn man auch was fertiges nehmen kann, das all das schon kann?

Hmm. Klingt relativ ähnlich zu dem was ich vor fast einem Jahr über b2evolution geschrieben habe. Viel Entwicklung hat es dort irgendwie nicht gegeben in der Zwischenzeit.

Logfiles Once Again

Since I now had an interesting object of study, I wanted to see whether a bit of cluster analysis of my logfiles would bring anything interesting to light. So I built a matrix of referrers and accessing IP addresses and used it to get an overview of typical user scenarios - what do normal users look like in the log, what do referrer spammers look like, and what does our friend look like.
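A referrer-by-IP matrix like the one described above can be built in a few lines. A minimal sketch, assuming the standard Apache "combined" log format (the exact log format and the sample lines here are assumptions, not taken from my real logs):

```python
import re
from collections import Counter, defaultdict

# Apache "combined" log format: IP, identd, user, [date], "request",
# status, size, "referrer", "user agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'\d+ \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def referrer_matrix(lines):
    """One Counter of referrers per client IP - the matrix rows."""
    matrix = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            matrix[m.group("ip")][m.group("referrer")] += 1
    return matrix

# Two made-up sample lines for illustration.
log = [
    '1.2.3.4 - - [04/Feb/2005:16:18:09 +0100] "GET / HTTP/1.1" 200 1234 "-" "Mozilla"',
    '1.2.3.4 - - [04/Feb/2005:16:19:10 +0100] "GET /a HTTP/1.1" 200 99 '
    '"http://www.heise.de/newsticker/meldung/55992" "Mozilla"',
]
for ip, referrers in referrer_matrix(log).items():
    for count, ref in sorted(((c, r) for r, c in referrers.items()), reverse=True):
        print("%s %04d*%s" % (ip, count, ref))
```

Sorting each row by count descending gives exactly the kind of per-IP summaries shown below.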

All three variants can be distinguished well, even though at the moment I would still shy away from capturing that algorithmically - all of it can be simulated rather easily. Still, a few conspicuous patterns are visible. First, a perfectly normal user:


aa.bb.cc.dd: 7 accesses, 2005-02-05 03:01:45.00 - 2005-02-04 16:18:09.00
 0065*-
 0001*http://www.tagesschau.de/aktuell/meldungen/0,1185,OID4031994 ...
 0001*http://www.tagesschau.de/aktuell/meldungen/0,1185,OID4031612 ...
 0001*http://mudbomb.com/archives/2005/02/02/wysiwyg-plugin-for-wo ...
 0001*http://www.heise.de/newsticker/meldung/55992
 0001*http://log.netbib.de/archives/2005/02/04/nzz-online-archiv-n ...
 0001*http://www.heise.de/newsticker/meldung/56000
 0001*http://a.wholelottanothing.org/2005/02/no_one_can_have.html

You can see nicely how this user clicked away from my weblog and came back - the referrers are by no means all links to me, but false referrers that browsers send when the user switches from one site to another. Actually referrers are only supposed to be sent when a link is really clicked - but hardly any browser gets that right. The visit happened on one particular day and the user entered directly by typing the domain name (the "-" referrers are at the top, and the topmost entry is the earliest referrer that occurs).

Or here an access of my own:


aa.bb.cc.dd: 6 accesses, 2005-02-04 01:11:56.00 - 2005-02-03 08:27:09.00
 0045*-
 0001*http://www.aylwardfamily.com/content/tbping.asp
 0001*http://temboz.rfc1437.de/view
 0001*http://web.morons.org/article.jsp?sectionid=1&id=5947
 0001*http://www.tagesschau.de/aktuell/meldungen/0,1185,OID4029220 ...
 0001*http://sport.ard.de/sp/fussball/news200502/03/bvb_verpfaende ...
 0001*http://www.cadenhead.org/workbench/entry/2005/02/03.html

I recognize myself by the referrers containing temboz.rfc1437.de - that is my online aggregator. Looks similar - a bunch of incorrectly sent referrers. Yet another user:


aa.bb.cc.dd: 19 accesses, 2005-02-12 14:45:35.00 - 2005-01-31 14:17:07.00
 0015*http://www.muensterland.org/system/weblogUpdates.py
 0002*-
 0001*http://www.google.com/search?q=cocoa+openmcl&ie=UTF-8&oe=UTF ...
 0001*http://blog.schockwellenreiter.de/8136
 0001*http://www.google.com/search?q=%22Rainer+Joswig%22&ie=UTF-8& ...
 0001*http://www.google.com/search?q=IDEKit&hl=de&lr=&c2coff=1&sta ...

This one came repeatedly (over several days) via my update page on muensterland.org, and in addition he searched for Lisp topics. And he also came over from the Schockwellenreiter once. Absolutely typical behavior.

Now, for comparison, a typical referrer spammer:


aa.bb.cc.dd 6 accesses, 2005-02-12 17:27:27.00 - 2005-02-02 09:25:22.00
 0002*http://tramadol.freakycheats.com/
 0001*http://diet-pills.ronnieazza.com/
 0001*http://phentermine.psxtreme.com/
 0001*http://free-online-poker.yelucie.com/
 0001*http://poker-games.psxtreme.com/

All referrers are plain domain referrers. No "-" referrers - i.e. no accesses without a referrer. No other accesses either - if I analyzed it more precisely by page type, it would show that no images etc. are accessed. Easy to recognize - it simply looks meager. It is also typical that each URL appears only once or twice.

Now our new friend:


aa.bb.cc.dd: 100 accesses, 2005-02-13 15:06:16.00 - 2005-02-11 07:07:55.00
 0039*-
 0030*http://irish.typepad.com
 0015*http://www208.pair.com
 0015*http://blogs.salon.com
 0015*http://hfilesreviewer.f2o.org
 0015*http://betas.intercom.net
 0005*http://vowe.net
 0005*http://spleenville.com

What stands out are the referrers without a trailing / - atypical for referrer spam. And they are perfectly normal sites. What also stands out: pages are accessed without a referrer - those are the RSS feeds. So this one, too, is easy to distinguish from users. Especially since there is a certain rhythm in it - apparently always 15 accesses with one referrer, then the referrer is switched. Either the referrer list is rather small, or I was lucky that he tried the same one twice on me - one of them appears 30 times.

Normal bots don't need much comparison - very few of them send referrers, which makes them completely uninteresting here. I had one that caught my attention:


aa.bb.cc.dd: 5 accesses, 2005-02-13 15:21:26.00 - 2005-01-31 01:01:07.00
 2612*-
 0003*http://www.everyfeed.com/admin/new_site_validation.php?site= ...
 0002*http://www.everyfeed.com/admin/new_site_validation.php?site= ...

A new search engine for feeds that I didn't know yet. Apparently the admin had just entered my address somewhere and then the bot started collecting the pages. Afterwards he activated my newly found feeds in the administration interface. Seems to be a small system - the bot runs from the same IP as the administration interface. Most other bots come from whole bot farms; web spidering is an expensive business after all ...

In summary: the current generation of referrer spam bots and other mal-bots is still built rather primitively. They don't use botnets to operate from many different addresses and thereby hide themselves, they use plain server URLs instead of page URLs, and they show quite a few other typical characteristics such as certain rhythms. Moreover, they almost always come back repeatedly.

Unfortunately these are not good traits to capture algorithmically - unless you let your referrers run into a SQL database and check every referrer against the typical criteria with corresponding selects. That way you could well catch the usual suspects and block them right on the server. Because normal user accesses look distinctly different.
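The typical criteria described above can at least be sketched as a heuristic. This is an illustration only - the function name and thresholds are my own invention, and as noted, such rules are easy for spammers to evade:

```python
from urllib.parse import urlparse

def looks_like_referrer_spam(referrers):
    """Heuristic sketch of the criteria described above, applied to the
    list of referrers one IP sent. Thresholds are guesses, not tuned values."""
    real = [r for r in referrers if r != "-"]
    if not real:
        return False
    # Normal users mostly send "-" (direct hits, images, feeds);
    # spammers send a referrer with every single hit.
    if referrers.count("-") > 0:
        return False
    # Spam referrers are bare server URLs: no path beyond "/".
    bare = sum(1 for r in real if urlparse(r).path in ("", "/"))
    return bare == len(real)

spammer = ["http://tramadol.freakycheats.com/", "http://diet-pills.ronnieazza.com/"]
user = ["-", "-", "http://www.heise.de/newsticker/meldung/55992"]
print(looks_like_referrer_spam(spammer))  # True
print(looks_like_referrer_spam(user))     # False
```

The same checks translate directly into the SQL selects mentioned above.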

However, new generations are already in the making - as my little friend with the missing / shows. And thanks to the stupid browsers with their incorrectly generated referrers (which say much more about the browser's history than about actual link-following), you can't simply cross-check the referenced pages, since many referrers are pure blind referrers.

Away with Trackback

On the occasion of the spam day, Isotopp ponders trackback spam and presents several approaches. One of them works by cross-checking the trackback URL against the IP of the submitting machine - if the machine has a different IP than the server advertised in the trackback, it is probably spam. I have written up my own comments on this - and explained why I would rather be rid of trackback today than tomorrow. Completely. And yes, that is a complete 180-degree turn on my part on the subject of trackback.

The IP-test approach once again comes from the perspective of purely server-generated blogs. Unfortunately, there is a large pile of trackback-capable software installations that don't have to run (and often don't run) on the server hosting the blog pages - all tools that produce static output, for example. Large installations are Radio Userland blogs. Smaller ones, PyDS blogs. Or Blosxom variants in offline mode (if there are trackback-capable versions by now - but since these are typical hacker tools, there certainly are).

Then there are the various tools that are not trackback-capable, where users employ an external trackback agent to send the trackbacks.

And last but not least there are the various Blogger/MetaWeblogAPI clients that send the trackback themselves, because e.g. only MovableType allows triggering trackbacks via the MetaWeblogAPI, while other APIs don't.

So the IP approach can either be seen only as a filter that waves through part of the trackbacks, or as a way of preventing trackbacks from the users mentioned above. And the latter would be decidedly ugly.

Actually the problem is quite simple: trackback is a sick protocol, knitted with a hot needle, without the developer having given the whole topic even a hint of thought. And therefore it belongs, IMO, on the scrap heap of API history. That I support it here is simply because WordPress implements it by default. As soon as the manual moderation effort gets too high, trackback gets thrown out here entirely.

Sorry, but on the subject of trackback the MovableType makers really showed Microsoft-like behavior: they pushed a completely inadequate pseudo-standard through by market dominance - without ever thinking about the security implications. Why do you think a section on security considerations is mandatory in every RFC? Unfortunately all the blog developers dutifully followed along (yes, me too - with Python Desktop Server) and now we are stuck with this silly protocol. And its - entirely foreseeable - problems.

Better to develop and push a better alternative now - e.g. PingBack. PingBack defines that the page that wants to send a pingback to another page must actually contain exactly that link - the API always transfers two URLs, your own and the foreign one. Your own URL must link to the foreign URL in its source; only then will the foreign server accept the pingback.

For spammers this is fairly absurd to handle - before every spam run they would have to rework the page, or use appropriate server-side mechanisms to make sure the spammed weblogs are shown a faked page containing the link when they check. Of course that is doable - but the effort is considerably higher, and because server-side machinery is required, it is no longer feasible with foreign open proxies and/or dial-up access.
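The core of that PingBack check - fetch the claimed source page and verify it really links to the target - is small. A minimal sketch (function names are mine; a real server would also limit the download size, handle timeouts and errors, and parse HTML properly instead of using a regex):

```python
import re
import urllib.request

def page_links_to(html, target_url):
    """Does the HTML contain an actual href pointing at target_url?"""
    pattern = r'href=["\']%s["\']' % re.escape(target_url)
    return re.search(pattern, html) is not None

def accept_pingback(source_url, target_url):
    """Fetch the claimed source page and only accept the ping if it
    really contains a link to the target - the check described above."""
    with urllib.request.urlopen(source_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return page_links_to(html, target_url)

print(page_links_to('<a href="http://example.org/post">nice post</a>',
                    "http://example.org/post"))  # True
```

This is exactly the hurdle a spammer would have to jump for every single target.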

So the right way is simply to switch the linking protocol. Away with trackback. The trackback hole cannot be plugged. PS: anyone who looks at my trackback in Isotopp's posting immediately sees the second problem of trackback: apart from the huge security problem, the character-set handling of trackbacks is simply a total debacle. Here too, the original author of the pseudo-standard didn't spend a minute thinking about possible problems. And then some people wonder why TypeKey from the MovableType folks isn't really accepted - sorry, but people who produce such shoddy standards are exactly whom I'd entrust my login management to ...

Zope Hosting and Performance - English Version

Somebody asked for an English translation of my article on Zope Hosting and Performance. Here it is - OK, it's not so much a direct translation as a rewrite of the story in English. Enjoy.

Recently the Schockwellenreiter had problems with his blog server. He is using Zope with Plone and CoreBlog. Since I have been doing professional Zope hosting for some years now, running systems in the 2000-3000 hits per minute range, I thought I'd put together some of the stuff I learned (sometimes the hard way) about Zope and performance.

  • The most important step I would take: slim down your application. Throw out everything in the Zope database that doesn't need to stay there. If it doesn't need content management, store it in folders served by Apache. Use mod_rewrite to seamlessly integrate it into your site so that people on the outside won't notice a difference. This works best for layout images, stylesheets etc. - Apache is much faster at delivering those.
  • Use Zope caching whenever possible. The main parameter you need to check: do you have enough RAM? Zope will grow when using caching (especially the RAMCacheManager). The automatic cleanup won't rescue you - Zope will still grow. Set up process monitoring that automatically kills and restarts Zope processes that grow above an upper bound, to prevent paging due to excessive memory consumption. This is a good idea even if you don't use caching at all.
  • There are two notable cache managers: one uses RAM and the other an HTTP accelerator. The RAMCacheManager caches the results of objects in memory and can therefore be used to cache small objects that take much time or many resources to construct. The HTTPCacheManager is for use with an HTTP accelerator - most likely people will use Squid, but an appropriately configured Apache works, too. The cache manager provides the right Expires and Cache-Control headers so that most traffic can be delivered out of the HTTP accelerator instead of Zope.
  • Large Zope objects kill Zope's performance. With caching enabled they destroy cache efficiency by polluting the cache with large blobs that aren't requested often, and Zope itself takes a performance hit from them, too. The reason is that Zope output is constructed in memory, and constructing large objects in memory costs a lot due to the security and architectural layers in Zope. Better to create them with cron jobs or other means outside the Zope server and deliver them directly with Apache - Apache is much faster. A typical case is users creating PDF documents in Zope instead of creating them outside. Bad idea.
  • Use ZEO. ZEO rocks. Really. In essence it's just the ZODB with a small communication layer on top. This layer is used in Zope instances instead of using the ZODB directly. That way you can run several process groups on your machine, all connecting to the same database. This helps with the process restarting mentioned above: when one is down, the others do the work. Use mod_backhand in Apache to distribute the load between the process groups, or use other load-balancing tools. ZEO makes regular database packs easier, too: they run on the ZEO server and not in the Zope instances - the instances actually don't notice much of a running pack.
  • If you have an SMP machine, use it. Or buy one. Really - it helps. You need to run ZEO and multiple Zope instances, though - otherwise the global interpreter lock of Python will hit you over the head and Zope will just use one of the processors. That's one reason you want multiple process groups in the first place - distributing load on the machine itself and making use of multiple processors.
  • You can gain performance by reducing the architectural layers your code goes through. Python Scripts are faster than DTML. Zope Products are faster than Python Scripts. Remove complex code from your server pages and move it into Products or other outside places. This requires rewriting application code, so it isn't always an option - but if you do it, it will pay off.
  • Don't let your ZODB file grow too large. The ZODB only appends on write access - so the file grows. It grows quite large if you don't pack regularly. If you don't pack and you have multi-GB ZODB files, don't complain about slow server starts ...
  • If you have complex code in your Zope application, it might be worthwhile to put it into some outside server and connect to Zope via some RPC means to trigger execution. I use my |TooFPy| for stuff like this - just pull out code, build a tool and hook it into the Zope application via XMLRPC. Yes, XMLRPC can be quite fast - for example pyXMLRPC is a C-written version that is very fast. Moving code outside Zope helps because that code can't block one of the statically allocated listeners while it calculates. Just upping the number of listener threads doesn't pay off as you would expect: due to the global interpreter lock, only one thread will run at a time anyway, and if your code uses C extensions, it might even block all other threads while running.
  • If you use PostgreSQL, use PsycoPG as the database driver. PsycoPG uses session pooling and is very fast when your system gets lots of hits. Other drivers often block Zope due to limitations like one query at a time and other such nonsense. Many admins had to learn the hard way that 16 listener threads aren't really 16 available slots once SQL drivers come into play ...
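The kill-and-restart process monitoring recommended above can be sketched in a few lines. This is an illustrative sketch, not a production watchdog: it reads VmRSS from /proc (so Linux only), the limit is a made-up number, and an external supervisor is assumed to restart the killed Zope process:

```python
import os
import signal

def parse_vmrss(status_text):
    """Extract the resident set size in KB from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0

def rss_kb(pid):
    """Current resident set size of a process in KB (Linux /proc only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vmrss(f.read())

def kill_if_too_fat(pid, limit_kb=500_000):
    """Shoot down a process that grew past the bound; the supervisor
    (daemontools, a restart script, ...) is assumed to bring it back up."""
    if rss_kb(pid) > limit_kb:
        os.kill(pid, signal.SIGTERM)
        return True
    return False
```

Run something like this from cron against each Zope instance's PID file.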

There are more ways to help performance, but the above are doable with relatively little work and mostly depend on whether you have enough memory and maybe an SMP machine. Memory is important - the more the better. If you can put memory into your machine, do so. There is no such thing as too much memory (as long as your OS supports the amount, of course).

What to do if even the tips above don't work? Yes, I have been in that situation. If you get there, there is only one - rather brutish - solution: active caching. By that I mean pulling content from the Zope server with cron jobs or other means, storing it in Apache folders and using mod_rewrite to deliver only static content to users. mod_rewrite is your friend. In essence you take those pages that are currently killing you and make them pseudo-static - they are only updated once in a while, but the hits won't reach Zope at all.
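The cron-job side of that active caching can be a trivial script. A minimal sketch (function name, URLs and paths are made up for illustration; the write-then-rename dance avoids serving half-written files):

```python
import os
import urllib.request

def refresh_static_pages(base_url, static_dir, pages,
                         fetch=lambda url: urllib.request.urlopen(url).read()):
    """Pull the listed pages from the Zope server and store them as plain
    files, so mod_rewrite can serve the static copies and the hits never
    reach Zope. Run from cron every few minutes."""
    for page in pages:
        body = fetch("%s/%s" % (base_url, page))
        target = os.path.join(static_dir, page)
        with open(target + ".tmp", "wb") as f:
            f.write(body)
        os.rename(target + ".tmp", target)  # atomic swap: no half-written pages

# Hypothetical usage:
# refresh_static_pages("http://localhost:8080/site", "/var/www/static",
#                      ["index.html", "news.html"])
```

The matching mod_rewrite rules then point the hot URLs at static_dir instead of the Zope proxy pass.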

Another step, of course, is more hardware. If you use ZEO it's no problem to put a farm of Zope servers in front of your ZEO machine (we currently have 5 dual-processor machines running the Zope instances, two rather big, fat, ugly servers in the background for the databases, and a frontend with two Apache servers that look almost like dwarves compared to the backend stuff).

Zope is fantastic software - don't get me wrong. I like it. Especially the fact that it is an integrated development environment for web applications and content management is very nice. And the easy integration of external data sources is nice, too. But Zope is a resource hog - that's beyond discussion. There's no such thing as a free lunch.

Zope Hosting and Performance

The Schockwellenreiter is having problems with his Zope server. Since I have been doing professional Zope hosting (at the company) for a few years now and run some pretty heavy portals that way (between 2000 and 3000 hits per minute are not uncommon - though spread across many systems), here are a few tips from me on scaling Zope.

  • The most important step I would recommend to everyone is slimming down. Throw out of Zope everything that doesn't need to be in there - whatever can be generated statically, whatever rarely changes, whatever doesn't need content management: out with it. Put it into normal Apache directories. Use Apache's mod_rewrite to make sure the old addresses keep working but are served by Apache. This applies above all to all the small stuff like layout graphics and such - they don't need to come out of Zope, they are better delivered by Apache.
  • Use Zope caching whenever at all possible. "Whenever possible" means: enough memory on the server, so that processes gorging themselves on RAM still have some headroom. In general, Zope's own caching makes processes ever fatter - its internal cache cleanup is fairly useless. So deploy process monitoring that shoots down and restarts a Zope process when it uses too much memory. Yes, that really is sensible and necessary.
  • There are two good caching options in Zope: the RAMCacheManager and the HTTPCacheManager. The former stores results of Zope objects in main memory and can therefore cache individual page components - put in there whatever is complex to compute. The second (HTTP cache) works together with a Squid. Put a Squid in front of your Zope as an HTTP accelerator and configure the HTTP cache manager accordingly, so that Zope generates the matching Expires headers. Then a large part of your traffic is served from the Squid, which is faster than your Zope. As an alternative to Squid you can also configure an Apache as an HTTP accelerator with a local cache - ideal for those who can't or don't want to install a Squid but do have the means to configure an Apache further.
  • Large Zope objects (and I really mean that in terms of KB) kill Zope. In caching they shred the nicest cache strategy, and Zope itself becomes dog slow when objects get too big. The reason lies in the Zope architecture: all objects are laboriously assembled through several software layers. In memory - and they therefore occupy correspondingly much memory. Out with the complex objects with huge KB counts. Make them smaller. Generate them statically via cron job. Deliver them from Apache - there is nothing dumber than storing all your big PDFs in the ZODB inside Zope, or even generating them there dynamically.
  • Install ZEO. The thing rocks. In principle it is just the ZODB with a primitive server protocol. The important part: your Zope can be split into several process groups. You want that if you intend to shoot down a Zope process running amok via process monitoring while the portal should look as undamaged as possible from the outside - in that case just put mod_backhand onto your Apache, or some other balancing technique between Apache and Zope. In addition, ZEO makes packing the ZODB (which should run daily) easier, because the pack runs in the background on the ZEO server and the Zope servers themselves are not affected much.
  • If you have an SMP server, use it. Or buy one. Really - it brings a lot. The prerequisite, however, is the multiple-process-group technique mentioned above - Python has a global interpreter lock, which means that even on a multiprocessor machine never more than one Python thread runs at a time. Hence you want multiple process groups.
  • You also gain performance by eliminating layers. Unfortunately that is often only achievable with software changes, so it is more interesting for do-it-yourselfers. Take complex operations out of the Zope server and put them into Zope Products. Zope Products run natively in the Python interpreter without restrictions. Zope Python Scripts and DTML documents, in contrast, are dragged through many layers that make sure you honor Zope's access permissions, do nothing evil and otherwise behave nicely. And that make sure you get slower. Products pay off - but they cost work and, unlike the other more technical tips, are not always feasible.
  • In addition, it has proven wise not to put too much data into the ZODB, especially nothing that keeps growing it - the ZODB only ever gets bigger; it only shrinks when packed. After a while you easily have a ZODB in the GB range and shouldn't be surprised by slow server starts ...
  • If more complex operations occur in the system, it can make sense to move them out entirely. In my case |TooFPy| then comes into play. Simply turn all the more complex stuff into a tool and hook it in there - the code runs at full speed. From Zope, simply access the tool server with a SOAP or XMLRPC client and execute the functions there. Yes, the repeated XML conversion really is less critical than running complex code inside Zope - especially when that code takes considerable runtime. Zope then blocks one of its listeners - and their number is static. Simply raising that number doesn't help - thanks to the global interpreter lock, more threads would merely wait on each other for the lock to be released (e.g. with every C extension that comes into play). For XMLRPC communication there is a good, fast C implementation that can be integrated into Python, making the XML overhead problem irrelevant.
  • If you use PostgreSQL as your database: take PsycoPG as the database driver. Its session pooling really gets Zope going. In general you should check whether your database driver supports some form of session pooling - if need be via an external SQL proxy. Otherwise, with SQL queries, the whole Zope system may hang because one fat query is waiting for its result. Quite a few people have fallen into that trap and had to learn that 16 Zope threads do not necessarily mean 16 concurrently processed Zope requests once SQL databases are involved.

Of course there are a lot more things one can do, but the above can mostly be handled from a standing start and mainly depend on having enough memory in the server (and possibly a multiprocessor machine - it works without one, too). Memory matters - the more the better. If possible, just put in more memory. You cannot have too much memory ...

What to do if all of that still isn't enough? (Yes, I have been there - sometimes only the really blunt axe helps.) Well, in that case there are variations of the above techniques. My favorite technique in this area is active caching. By that I mean that Zope records in one place which documents are to be actively cached. On the machine, a script fetches those pages from Zope and puts them into a directory. Apache rewrite rules then make sure the static content is delivered to the outside. In principle, you ensure that for the most-visited pages suitable for this technique (i.e. containing no personalization data, for example), a static page simply goes out no matter what - the normal caching techniques just aren't brutal enough there; too much traffic still gets through to the server.

A further step, of course, is deploying more machines - simply put more boxes next to it and connect them via the ZEO technique.

Zope is fantastic software - in particular the tight integration of development environment, CMS and server is often immensely practical, and the easy integration of external data sources is very nice as well. But Zope is a resource hog, that simply has to be said.

Caching for PHP Systems

There are basically two ways to do caching in a PHP-based system. OK, there are many more, but two main approaches stand out clearly. I have written up what is interesting in this context - especially since a couple of colleagues are currently suffering under high server load. It is kept general, but for understandable reasons it also considers the specific consequences for WordPress.

  • Caching of precompiled PHP pages
  • Caching of page output

There is a whole range of variants of both main approaches. On web servers, PHP pages exist as source - unprocessed and not optimized in any way for the loading process. If you run complex PHP systems, every PHP file therefore incurs parsing and compilation into the internal code. For systems with many includes and many class libraries, this can be considerable. This is where the first main direction of caching comes in: the generated intermediate code is simply stored away - either in shared memory (memory blocks jointly available to the many processes of a system) or on disk. There are a number of solutions here - I use turck-mmcache myself. The reason is mainly that it doesn't cache in shared memory but on disk (which, as far as I know, the other similar solutions also do), that there is a Debian package for turck-mmcache, and that I have had relatively few negative experiences with it so far (at least on Debian stable - on Debian testing things look different; there, the PHP application blows up in your face). Since WordPress is built on a rather large set of library modules with substantial source content, such a cache does a lot to reduce the base load of WordPress. And since these caches are usually completely transparent - no visible effects apart from the speedup - you can enable such a cache globally.

The second main direction for caching is storing page output. Here the special twist is that pages are often generated dynamically depending on parameters - so a page does not always produce the same output. Just think of trivial things like displaying the user name when a user is logged in (and a cookie is stored for it). Page contents can also differ due to HTTP basic authentication (the login technique where the little popup window for user name and password appears). And POST requests (forms that don't send their contents via the URL) again produce output that depends on that data.

Fundamentally, then, an output cache has to take all of these input parameters into account. A good strategy is often to not cache POST results at all - they may contain error messages and the like that depend on external sources (databases) and could therefore produce different output even for identical input values. So really only GET requests (URLs with parameters directly in the URL) can be cached sensibly. But even then you have to take both the cookies sent and the parameters in the URL into account. And if your system works with basic authentication, that has to flow into the cache key as well.
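The cache-key idea from the paragraph above can be sketched in a few lines. This is a minimal illustration, not code from WordPress or any real plugin - the name `make_cache_key` and its signature are made up here - but it shows all three inputs flowing into one stable key:

```python
import hashlib
from urllib.parse import urlencode

def make_cache_key(path, get_params, cookies, auth_user=None):
    """Derive a stable cache key from everything that can change the output.

    POST requests are deliberately never cached, as discussed above.
    """
    parts = [
        path,
        urlencode(sorted(get_params.items())),  # same params in any order -> same key
        urlencode(sorted(cookies.items())),     # cookies influence the output too
        auth_user or "",                        # basic-auth user, if the site uses it
    ]
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()

key = make_cache_key("/index.php", {"p": "42"}, {"wp_user": "anton"})
```

Sorting the parameters before hashing means `?a=1&b=2` and `?b=2&a=1` hit the same cache entry, which is usually what you want.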

A second problem is that pages are rarely purely static - even static pages contain elements that you'd really rather have dynamic. So you have to make a fundamental decision: is purely static output enough, or do you need a mix? You also have to decide how page updates should take effect - how does the cache notice that something has changed?

One approach you can pursue is a so-called reverse proxy. You simply put an ordinary web proxy in front of the web server in such a way that every access to the server is technically routed through it. The proxy sits directly in front of the web server and is therefore mandatory for all users. Since web proxies should already handle the problems of user authentication, parameters, and the POST/GET distinction well (in the normal use case for proxies the problems are the same), this is a very pragmatic solution. Such proxies usually also handle updates quite well - and in an emergency the user can persuade the proxy to re-fetch the content with a forced reload. Unfortunately this solution only works if you control the server yourself - and the proxy consumes additional resources, so there may not be room for it on the machine. How well an application copes with proxies also varies a lot - although problems between proxy and application would surface for ordinary users behind web proxies too, and therefore have to be solved anyway.

The second approach is the software itself - in the end the software can know exactly when content is regenerated and what has to be taken into account for caching. Here there are again two implementation directions. MovableType, PyDS, Radio UserLand, Frontier - these all generate static HTML pages and therefore don't have the server-load problem on page accesses at all. The downside is obvious: data changes force a rebuild of the pages, which can get annoying on large sites (and is what made me switch from PyDS to WordPress).

The second direction is caching inside the dynamic application itself: on first access, the output is stored under a cache key. On the next access with that cache key, the application simply checks whether the output is already there; if so, it is served directly. The cache key is composed of the GET parameters and the cookies. When database content changes, the corresponding cache entries are deleted, so the pages are regenerated on the next access.
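The scheme just described - store on first access, serve on later accesses, drop entries when data changes - can be sketched as a toy in-memory cache. This is purely illustrative (real plugins write files to disk instead, and the class name here is invented):

```python
class OutputCache:
    """Toy page-output cache: serve stored output, regenerate only on a miss."""

    def __init__(self):
        self._store = {}

    def fetch(self, key, render):
        # First access: actually render the page and remember the output.
        if key not in self._store:
            self._store[key] = render()
        # Every later access with the same key is served from the cache.
        return self._store[key]

    def invalidate_all(self):
        # Crudest possible invalidation: on any content change, wipe everything.
        self._store.clear()

calls = []
def render_page():
    calls.append(1)  # count how often the expensive render actually runs
    return "<html>hello</html>"

cache = OutputCache()
cache.fetch("front-page", render_page)
cache.fetch("front-page", render_page)  # second call is served from the cache
```

The `invalidate_all` method mirrors the wipe-the-whole-cache strategy described for Staticize below: primitive, but it never serves stale content.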

WordPress itself has a very handy plugin for exactly this purpose in Staticize. In the current beta it is already included in the standard distribution. As described above, this plugin creates a cache entry per page, taking parameters and cookies into account - basic authentication isn't used by WordPress anyway. The clever bit, though, is that Staticize stores the pages as PHP, so the cached pages are themselves dynamic again. That dynamism can then be used to mark parts of the page with special comments - for those parts, dynamic function calls are substituted back in. The advantage is obvious: while the heavy lifting of page generation - loading the various library modules and reading the database - is done once and for all, individual parts of the site can stay dynamic. Of course, those functions then have to be built so that they don't need the whole WordPress library stack - but things like dynamic counters, displays of currently active users, and similar gadgets can be kept dynamic even on cached pages. Matt Mullenweg, for example, uses it to show a random image from his library even on pages served from the cache. On creation or modification of a post, Staticize simply wipes the entire cache - very primitive, and with many files in the cache that can take a while, but it's effective and pragmatic.

So which caches should you sensibly use, and how? With more complex systems I would always check whether I can deploy a PHP code cache - turck-mmcache, Zend Optimizer, phpAccelerator, or whatever else is out there.

The application cache itself I would personally only enable when the load actually makes it necessary - with WordPress you can keep the plugin around and activate it only when needed. Caches with static page generation simply have their problems - layout changes only take effect after the cache has been cleared, and so on.

If you can use a reverse proxy and the machine has the resources for it, it is certainly always worth recommending. If only because it makes you notice the proxy-related problems your own application may have - problems that would also annoy every user sitting behind a web proxy. If you use Zope, for example, there are very good ways to improve the communication with the reverse proxy - a cache manager is available in Zope for this. Other systems offer good foundations too - but in the end, any system that produces clean ETag and Last-Modified headers and correctly handles conditional GET (conditional requests that tell the server which version is already present locally and only want updated content) should be suitable.
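The conditional-GET handshake mentioned at the end can be sketched from the server's side. A minimal illustration of the idea, not any particular framework's API - the function name and return shape are invented here:

```python
import hashlib

def respond(body, if_none_match=None):
    """Answer a GET request, honoring an If-None-Match header.

    Returns (status, headers, body); a 304 response carries no body.
    """
    # Derive a validator from the content; any stable fingerprint works.
    etag = '"%s"' % hashlib.md5(body.encode("utf-8")).hexdigest()
    if if_none_match == etag:
        # The client's cached copy is still current: send 304, save the bytes.
        return 304, {"ETag": etag}, ""
    return 200, {"ETag": etag}, body

status, headers, _ = respond("<html>page</html>")
# A later request echoing the ETag back gets 304 Not Modified:
status2, _, body2 = respond("<html>page</html>", if_none_match=headers["ETag"])
```

This is exactly the exchange a reverse proxy relies on: as long as the origin answers 304, the proxy keeps serving its stored copy without pulling the full page again.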

Deferred Execution with Python

The original text has moved to the PyDS weblog. The reason is that I can't manage the text properly with the new software, because the necessary tools aren't available here (in particular, source code formatting doesn't work here, and besides, the text is huge - at least when stored as XHTML).

Visual Studio Magazine - Guest Opinion - Save the Hobbyist Programmer

An older but interesting article that points out an important problem: hobbyist programmers are increasingly shut out of building small hacks and simple solutions by ever more complex system interfaces and the constant churn of APIs and programming tools in the Windows world. And it is not just the Windows world that is affected - Linux and OS X partly suffer from the same thing.

Sure, there are still small helpers with simple programming facilities. Or scripting languages that are easy to learn and use - Python, for instance. But that isn't really a solution for these tinkerers. What the omnipresent BASIC - or, say, the admittedly sick language in dBase - used to be for hobbyists is missing today. There is hardly a programming environment left that doesn't come with an object-oriented approach, hardly any solutions that don't try to be a general-purpose development environment for complete programs right away.

There are still nice exceptions - FileMaker on the Mac still tries to appeal to the hobby hacker. But it's true nonetheless: the easy entry points are getting fewer.

Even AppleScript on the Mac has by now become so complex and overstuffed that a newcomer can hardly just jump in and get going with it. Some corners of AppleScript are obscure and complicated even for old programming hands like me. On top of that, while the various scripting languages offer plenty of great ways to integrate applications, precisely those parts are exceptionally badly documented.

To stay with the AppleScript example: there are the application dictionaries in which a program's AppleScript capabilities are documented, but nearly every description I have read there assumes that the user already has complete and extensive knowledge of AppleScript and its structures (what objects are in AppleScript, how to work with containers, and so on). So although these dictionaries could serve exactly the hobby programmer as a starting point, their authors (the professional programmers in the software houses) write them in such a way that often only they themselves can make any sense of them.

It's similar in the Linux world. Tcl was once the standard scripting language for an easy start, with a simple structure, a downright primitive extension interface, and the ability to reach solutions quickly even for non-programmers. Today the standard Tcl distribution (which is then nicely called Batteries Included - only the comprehensible manual is missing) consists of mountains of packages, a good number of which deal with metalanguage aspects (e.g. incrTcl and the widget libraries built on it and on Tk - good grief, this short description of the contents alone contains more words a beginner won't understand than filler words), which a beginner will never grasp.

And I don't even need to go into the desolate situation on Windows with the Scripting Host and the OLE Automation interfaces (or whatever those things are called these days) - anyone who has been through an application version change there and got to rewrite their complete solution from scratch, thanks to a total change of, say, Access's scripting model, knows what I'm talking about.

In the end, we (we == professional programmers) take a piece of freedom away from end users - the freedom to play around, and yes, also the freedom to shoot themselves in the foot. And I think that particularly in the world of free software, programmers should start spending a few thoughts on this again. It's nice that nearly every larger program embeds some scripting language. Less nice is that nearly none of these embeddings has decent documentation of its capabilities, and that only the most primitive examples - plus complete solutions of very complex applications - exist as starting points for learning. Hobby programmers, after all, learn most easily by reading existing tools. And yes, I'm not exactly a good example myself: the Python Desktop Server does have a number of extension points aimed precisely at end users - but I too have written far too little documentation for them. A shame, really, because this is how many projects turn into incestuous affairs, with the actual end users left out. No, I don't have a real solution - with free projects, writing documentation is simply an often annoying and unpopular part of the work and is therefore treated shabbily. Besides, most programmers aren't capable of writing generally comprehensible documentation anyway. But maybe that is also an opportunity for projects trying to bring groups that have so far been underrepresented into the activity of the big open source projects. debian-women comes spontaneously to mind (since Jutta is involved with it at the moment). Greater participation by women would surely also be helped by documentation and information that doesn't presuppose a fully trained master hacker.
After all, not everyone feels like spending their whole life learning ever new APIs and tools ... The original article is here.

Cactus Mite

112-640-640.jpeg

For scale: the distance between the centers of the two darker bars is one millimeter! The picture isn't perfectly sharp, since I had to shoot it in a fairly primitive setup - for example, at the time I had no macro focusing rail and had to focus manually. Still, it's impressive what kind of image you can get with relatively little effort.

Python Community Server

Muensterland.org runs on the Python Community Server, so here's a bit from me about it. The Python Community Server is an open source implementation of xmlStorageSystem, a protocol based on XML-RPC for storing static content. In principle, the Python Community Server is therefore nothing more than a web server with a somewhat idiosyncratic upload protocol and a few ready-made CGIs - there are comments on articles, there is a mail form, and there are a few simple ways to subscribe to a website as an RDF channel.

So what is it all about? Why the hype around this tool? It only gets really interesting with Radio UserLand as a client, because xmlStorageSystem is the protocol Radio UserLand uses as its backend. That is how the Radio communities are run.

Radio UserLand is a combination of a news aggregator, a website designer, and a weblog tool. The news aggregator collects news from the Internet and offers it locally; the user can then post individual articles over into his own weblog, and quite powerful functions simplify the website design. The special thing about Radio UserLand is that it is, in principle, a local website on the user's machine - and that website can be replicated to other servers. This can happen via standard protocols like the Blogger API (only the weblog content is transferred; the layout stays with the server operator), via FTP (static HTML exports are created; Radio is then basically just an overblown mirror script, so interactive features are heavily restricted), and via xmlStorageSystem. And here the circle closes back to the Python Community Server, which is nothing more than an implementation of the latter.

Incidentally, there is also a tool for Linux, but it is oriented more toward the classic weblog tools and offers none of the more advanced layout tools that Radio UserLand has. And of course there is now also the Python Desktop Server, which works essentially like Radio. It is available for almost every POSIX platform Python runs on.

Otherwise: just register a weblog here and put it to use. And try the whole thing out. Muensterland.org is free for the time being; anyone can set up a weblog there. As the domain makes obvious, it is of course primarily meant for the Münsterland region, but others can take part too. There are expat Münsteraner, after all.

Comparing the Rollei 6008 and the Hasselblad System

Since I just saw that someone arrived here searching for "comparison Rollei and Hasselblad", I got to thinking about why I actually own a Rollei 6008 and not a Hasselblad. The Hasselblad would fit much better next to the M6 - both mechanical. The Rollei, by contrast, is a high-tech monster. OK, one reason was that the Rollei was sitting there in the shop window and the price was good, sure. But I could have left it there and waited for a Hasselblad. So why the Rollei?

For me, the Rollei is in many respects the crowning achievement of manual-focus camera development. I can't think of much more you could still put into it. The Rollei has a whole series of special features compared to many other medium-format cameras. First and foremost is exposure metering even with the waist-level finder. But it's not just that; it's also the way exposure is metered and controlled. This is exactly how I always imagined it: a free choice of metering mode, freely combinable with shutter priority, aperture priority, or manual match-needle metering. OK, it also has a program mode for hectic situations. You simply set whichever settings should be automatic to automatic - if both aperture and shutter speed are on automatic, you have program mode. No silly mode dial.

Then, of course, there are the other Rollei features that won me over: the built-in motor (it isn't fast, but it's built in and therefore compact). The roller blind on the magazines is another great touch - no more lost dark slides. The long film path in the magazines helps against the annoying film-flatness problem of the classic Hasselblad and Zenza magazines. The electronic transmission of the film speed from magazine to camera makes switching magazines with different speeds practical and quick.

And then the Rollei has its "showpiece" features: the 1/1000 second with the PQS lenses, for example. The purely electronic transmission of the signals, which meant even the new AF lenses required no change to the bayonet. The absolutely superb Zeiss designs, which yield really fine lenses - even if I only own a single one (the 2.8/80 PQS). And a rugged body on top of it all.

My conclusion: of course, one of the big Hasselblad models with integrated exposure metering and an add-on winder would have many of the Rollei's features, but definitely not all of them. And not in this very pleasant-to-use form. And certainly not at the used price I paid.

Hmm. I urgently need to head out with the Rollei again.