Microsoft restricts Windows XP activation via the Internet - People, get decent operating systems whose manufacturers do not have such absurd notions of ownership. Or take the opportunity to buy a real computer.

Virtualized Servers under Linux

rHype is an IBM project that was recently published under an Open Source license (GPL). This project is essentially a virtualization machine for Linux. Comparable to IBM's LPARs for mainframes, but naturally designed for much smaller machines.

It could be the ideal complement to Xen - another GPL project for virtualization based on Linux. Taken together, both could become an interesting open source alternative to VMWare.

Virtualized servers are very interesting for many purposes, as usually only a virtual machine is lost in case of problems and the migration of services on virtual machines is easier than moving around real hardware. Better to have a few large boxes with virtualized servers on them than many smaller boxes with dedicated systems.

Virtualized servers can already be run in production today with User Mode Linux. Here a Linux kernel runs as an ordinary user-mode process on top of the host kernel and talks to it through special APIs instead of driving the hardware directly. Each virtualized machine has its own user-mode kernel, its own memory, and its own virtual disk areas.

Caution with free SSL certificates

Beware of free SSL certificates - the criticism of the unchecked certificates is indeed correct. But the experts are sitting on a misconception here: why should I trust the CAs randomly delivered with my browser more than any other CA?

Of course, if I try to get a certificate from them (e.g., at the Trustcenter), I have to jump through all sorts of hoops to get the certificate. That seems very secure. But who guarantees that all certificates from this CA were issued according to the same pattern? That someone didn't feel like checking and simply confirmed a certificate without verification? Or that something was rigged?

Exactly. There is only the guarantee of the issuer. The company that issues me the certificate essentially checks itself. Of course, in Germany there are regulations for certificate authorities and, as far as I know, these include audits - but who guarantees that everything runs smoothly there? Given the level of corruption going on ...

I don't want to accuse the Trustcenter of anything here - on the contrary, we use their services in the company. But central certification authorities have a serious problem: the security and trustworthiness depend solely on the trustworthiness of the central authority. And browsers come with various certification authorities deemed trustworthy by the browser manufacturer - I don't decide that, someone else does.

This is the classic conflict between centralized certification and decentralized certification via a Web of Trust as it exists with OpenPGP or GPG. Of course, I can't trust everyone there either - but if I trust someone, I set that locally for myself. And this trust is not dependent on whether it is a large company with great boilerplate documents.

Without a Web of Trust structure, certification is still more facade than substance. Alongside the pearls, there are also pigs - and that's exactly what c't has found out. Great insight - we've been saying this in the PGP camp for years.

Dialer Madness - the next phase

In Dialerwahn - the next phase, Isotopp reports on an IP payment system that bills page requests based on logged IP addresses and the mapping of those IPs to a user. So far it is only in use in Austria - but it is extremely stupid. They have probably never heard of IP spoofing, nor of anonymous proxies or Tor ...

IP-based paid services must be based on some form of authorization: either the classic password technique or, better, client certificates. Anything else is utter nonsense and doomed to fail. Anyone who bills end customers on the basis of the logged IP address simply does not understand TCP/IP and the Internet.

Free multidimensional OLAP server for Linux announced - could be interesting if it changes from the status announced to the status implemented.

Apache2, php5-fcgi, php4-fcgi, mod_fastcgi HowTo

Apache2, php5-fcgi, php4-fcgi, mod_fastcgi HowTo provides everything you need to know to run PHP as an FCGI process. And even in German. The little bit of Apache2 in there can be mentally converted to Apache 1.3, the Apache is actually hardly affected.

FCGI offers, in combination with suexec, the possibility to run PHP per virtual host under a dedicated user and thus the possibility in shared hosting environments to set up files in a virtual host so that another user with his PHP cannot read them. You could even run the FCGI-PHPs in a chroot jail to isolate them even more.

In addition, FCGI is often significantly more resource-efficient for PHP, as fewer PHP processes can run than Apache processes and the Apache processes do not become so bloated. If you have many virtual hosts, this can lead to the FCGI processes catching up in number - but then you should consider whether the FCGI processes should not run better on a dedicated machine.

This would be exactly the right thing for simon, especially since I could then also allow PHP for the other users.

Ape can transparently map Python objects in Zope to filesystem objects or PostgreSQL databases. Could be very interesting for work. Can also be used standalone (without Zope).

mod_fastcgi and mod_rewrite

Well, I actually tried using PHP as FastCGI - among other things because I could then also use a newer PHP version. And what happened? Nothing. There was a massive problem with the mod_rewrite rules. In the WordPress .htaccess, everything is rewritten to index.php. The path that was actually requested is appended to index.php as PATH_INFO. PHP then reads this information back out and does the right thing.

But when I had activated FastCGI, that didn't work - the PHP always claimed that no input file was passed. So as if I had called the PHP without parameters. The WordPress administration - which works with normal PHP files - worked wonderfully. And the permission stuff also worked well, everything ran under my own user.

Only the Rewrite-Rules didn't work - and thus the whole site didn't. Pretty annoying. Especially since I can't properly test it without taking down my main site. It's also annoying that suexec apparently looks for the actual FCGI starters in the document root of the primary virtual server - not in those of the actual virtual servers. This makes the whole situation a bit unclear, as the programs (the starters are small shell scripts) are not where the files are. Unless you have created your virtual servers below the primary virtual server - but I personally consider that highly nonsensical, as you can then bypass Perl modules loaded in the virtual server by direct path specifications via the default server.

Ergo: a failure. Unfortunately. Annoying. Now I have to somehow put together a test box with which I can analyze this problem ...

Update: a bit of searching and digging on the net and a short test later, I'm wiser: PATH_INFO with the FCGI version of PHP under Apache is broken. Apparently, PHP gets the wrong PATH_INFO entry and the wrong SCRIPT_NAME. As a result, the interpreter simply does not find its script when PATH_INFO is set and nothing works anymore. Now I have to search further to see if there is a solution. cgi.fix_pathinfo = 1 (which is generally offered as a remedy for this) does not work in any case. If I see it correctly, there is no usable solution - at least none that is obvious to me. Damn.

Update 2: I found a solution. It consists of simply not using Apache for PHP, but lighttpd - with Apache in front as a transparent proxy. This works quite well, and if I strip Apache down hard and throw PHP out of it, it also becomes much slimmer. lighttpd can run under different user accounts, so I also save myself the wild hacking with suexec. However, one lighttpd process then runs per user (lighttpd only needs one process per server, as it works with asynchronous communication) and the PHP interpreters run as separate FastCGI processes, not as modules integrated into Apache. Apache itself is then only responsible for purely static sites or sites with Perl modules - I still have quite a few of those. At the moment I only have a game site running there, but maybe the rest will be switched over in the next few days.

The method by which cruft-free URIs are produced is quite funny: in WordPress you can simply enter index.php as an error document. ErrorDocument 404 /index.php?error=404 would be the entry in the .htaccess; lighttpd has an equivalent entry. This automatically redirects requests for non-existent files (and the cruft-free URIs do not exist as physical files) to WordPress. WordPress then checks whether there really is no data for the URI, and if there is something there (because it is a WordPress URI), the status is simply reset. For the latter, I had to install a small patch in WordPress. This saves you all the RewriteRules and works with almost any server. And because it's now 1:41, I'm going to bed ...
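The logic of the error-document trick can be sketched in a few lines. This is only an illustration in Python, not the WordPress patch itself; the function name and the dictionary of posts are made up:

```python
def handle_not_found(uri, posts):
    """Called by the web server for every request that hit no physical file.

    posts is a hypothetical mapping of clean URIs to rendered content,
    standing in for the weblog's database lookup.
    """
    if uri in posts:
        # The URI belongs to the weblog after all: reset the 404 status.
        return 200, posts[uri]
    # Nothing known under this URI, so the 404 stays.
    return 404, "Not found"


if __name__ == "__main__":
    posts = {"/2005/02/13/some-post/": "<h1>Some post</h1>"}
    print(handle_not_found("/2005/02/13/some-post/", posts))
    print(handle_not_found("/no/such/page", posts))
```

The important part is the status reset: without it, every cruft-free URI would be delivered with a 404 status even though the content is there.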

All Clear: Mozilla is not disabling Umlaut Domains - although the solution is equivalent to disabling them: the browser simply displays the Punycode notation. So you can enter an umlaut domain and land on the correct server, but that's it.
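What the Punycode notation of such a name looks like can be tried out with Python's built-in idna codec. A minimal sketch; the domain name is a made-up example:

```python
def to_punycode(domain: str) -> str:
    """Return the ASCII (Punycode) form of a domain name."""
    return domain.encode("idna").decode("ascii")


def from_punycode(domain: str) -> str:
    """Return the Unicode form of a Punycode domain name."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    ace = to_punycode("müller.example")  # made-up name, for illustration only
    print(ace)
    print(from_punycode(ace))
```

Labels that contain non-ASCII characters come out with an xn-- prefix, pure ASCII names pass through unchanged.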

rdiff-backup and duplicity

rdiff-backup and duplicity are very practical backup tools that use the rsync algorithm to efficiently copy over the network and, unlike normal rsync, also store historical versions. rdiff-backup uses a mirror+reverse-delta format and duplicity uses a base-version+forward-delta format. The latest version of librsync, on which both projects are based, should also transport Mac OS X metadata, so it could also be useful for OS X. I have already successfully used rdiff-backup for Linux backups.

mod_dosevasive is an Apache module that attempts to detect DoS attacks and can then hand them off to other mechanisms to block the attacker. It can, for example, generate firewall rules that block the attacker. However, you should not run it on an SVN host, as an SVN update may under certain circumstances look like a scripted attack ...

In case of side effects, contact your software manufacturer ...

Microsoft vs. Wine: Deja Vu on the FUD Front describes how Microsoft's WGA check - which verifies that the system software is legal by Microsoft's definition, even for updates to ordinary applications - makes it impossible to update applications that run under Wine or CrossOver Office (Windows emulators under Linux).

Let me spell that out for you: You can have a legal copy of Microsoft Office, and because you choose to run it on a Linux box using Wine, you won't be able to update it.

If you want to know how secure, or rather insecure, the T-Mobile pages in the USA are (they were recently hacked), here is a small analysis: Ethical Hacking and Computer Forensics: Secret Service hacker, how did he do it? The result is that the hacker apparently used ordinary SQL injection or something similar and that the system structure of their server makes it quite easy to insert false information.

heise Security - Know-how - Consequences of the successful attacks on SHA-1 explains quite well what hash algorithms mean in security technology and how the current situation regarding SHA-1 is to be assessed. Worth reading.

Alternative Rewrite Rules result in a significantly simpler .htaccess, especially one that doesn't constantly need to be updated by WordPress. This is particularly practical if you also use the .htaccess for other purposes. Additionally, Apache is not necessarily faster with the complex Rewrite-Rules from WordPress. I have activated them myself, let's see how WordPress 1.5 performs with these entries. If there are no problems, they will stay that way, because I like them much better than the other variant. And they don't have the problems that the others have - old mod_rewrite can only do greedy matching, which makes creating complex lists of rewrites quite hairy ...

heise online - When Computer Oldies No Longer Want to Work [Update]. Great, the C64 was a duck and in reality it's something much worse ...

Cryptographic method SHA-1 cracked - ouch. If Bruce Schneier's assessment is correct, then that's it for SHA-1. A switch to SHA-256 or SHA-512 seems to be in order (though recent developments had been pointing that way anyway).
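In code that doesn't hard-wire the algorithm, such a switch is a small change. A minimal sketch with Python's hashlib; the helper name is made up:

```python
import hashlib


def digest(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of data; the algorithm is a parameter so it can be swapped."""
    return hashlib.new(algorithm, data).hexdigest()


if __name__ == "__main__":
    message = b"hello"
    for name in ("sha1", "sha256", "sha512"):
        print(name, digest(message, name))
```

Whatever stores or compares the digests has to cope with the longer values: 64 hex characters for SHA-256 and 128 for SHA-512 instead of 40 for SHA-1.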

sohu-search is a weird bot

The Sohu.com Search Bot Is Acting Strange

The search bot from sohu.com is currently crawling my pages. So far, so good. It uses robots.txt, which is already a good sign. But there are two things that really puzzle me:

First, it accesses every page twice. Once with a HEAD request and once with a GET request. That's pretty stupid for several reasons. For one, you can handle this with a single request using Conditional GET, and for another, it provokes double page generation for dynamically generated pages - because even though the HEAD request only fetches the header lines, the page still has to be generated anyway, for example to calculate the Content-Length (of course, this depends on how the generating system is written).
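How a Conditional GET replaces the HEAD plus GET pair can be sketched like this. It is a simplified model in Python with made-up function names; it only looks at the ETag and ignores date comparison for If-Modified-Since:

```python
from typing import Dict, Optional, Tuple


def conditional_headers(etag: Optional[str],
                        last_modified: Optional[str]) -> Dict[str, str]:
    """Request headers a crawler sends if it remembers the validators
    from its previous fetch of the page."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def respond(request_headers: Dict[str, str], current_etag: str,
            body: bytes) -> Tuple[int, bytes]:
    """Simplified server side: answer 304 without a body if the
    client's copy is still current, otherwise send the page."""
    if request_headers.get("If-None-Match") == current_etag:
        return 304, b""
    return 200, body


if __name__ == "__main__":
    first = respond({}, '"v1"', b"<html>page</html>")
    again = respond(conditional_headers('"v1"', None), '"v1"', b"<html>page</html>")
    print(first[0], again[0])
```

One request per page is enough this way: an unchanged page costs a short 304 response, a changed page is delivered right away.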

Second, every few pages it accesses a page called abcdefghijklmn.htm. And I really don't understand what that nonsense is supposed to be. Some kind of keep-alive check? No idea. Very strange.

Workaround for IDN Spoofing Issue - Simply block all URIs that contain name components outside of 7bit-ASCII using the AdBlock extension.
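The check behind such a filter is simple. A minimal sketch in Python, not the AdBlock rule itself; the function name is made up:

```python
from urllib.parse import urlsplit


def has_non_ascii_host(uri: str) -> bool:
    """True if the host name of the URI contains characters outside 7-bit ASCII."""
    host = urlsplit(uri).hostname or ""
    return any(ord(ch) > 127 for ch in host)


if __name__ == "__main__":
    # The second host contains a Cyrillic "a" (U+0430) instead of a Latin one.
    for uri in ("http://paypal.example/", "http://p\u0430ypal.example/"):
        print(repr(uri), has_non_ascii_host(uri))
```

Only the host part is examined, so non-ASCII characters in the path of a URI don't trigger the filter.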

Internet Explorer 7 beta due out this summer - and apparently only for Windows XP SP2. Great. This means all those heaps of broken Windows systems out there will continue running around with the messed-up IE versions. On the other hand - if you look at how IE has developed, do you even want a new version to spread?

Mozilla removes support for umlaut domains

Mozilla removes support for umlaut domains - in my opinion, the only right reaction. The IDN stuff is just nonsense without any real benefit anyway. Sorry, but umlaut domains that only work on the web but not in email are just a disaster waiting to happen. And the technical implementation - the fact that only a small subset of Unicode can even be mapped - is also ridiculous. All just to boost domain marketing and stroke the egos of some idiots...

Neohapsis Archives - Full Disclosure List - #0258 - [Full-Disclosure] Advisory: Awstats official workaround flaw - I've now put that part behind password protection and that's the end of exploits. Without proper security measures, you can pretty much forget about awstats.pl - it seems to be a classic Swiss cheese...

News.Individual.DE no longer free from April 1

The news server news.individual.de will soon be a paid service because no sponsors could be found. I learned about this through Rabenhorst. It's really a shame that it can't continue to be operated for free. Well, the server's performance is so good that 10 euros is definitely worth it to me.

phpOpenTracker is a live access analyzer for websites. It can be integrated directly into PHP applications or data can be collected from static websites via web bugs (small invisible graphics). You can use it to learn quite a lot about user behavior on websites. And Asymptomatic is currently working on a WordPress plugin for it, which will allow you to see the corresponding evaluations in the WP backend...

Cooperative Linux is a Linux kernel that runs as a normal process within Windows. Weird.

Etomite Content Management System

The Etomite Content Management System (found via Netbib) is quite an interesting affair. What I don't like so much about the CMS: the default theme. Sorry, but it's colorful and looks to me like Windows. Besides, it uses a table layout, which I also don't like so much. But otherwise I have to say, this thing has something to it. The backend in particular is very interesting - it uses JavaScript and DHTML extensively, which of course isn't so great if you don't like JavaScript. But it offers a whole lot of interactive features that are quite nice - for example, feedback on ongoing actions, automatic updating of various interface elements, and overall quite smooth operation.

I also like the idea of snippets - something like nuggets in PyDS. Small code snippets that you simply store in the database and then retrieve in templates via tags. Very practical, as you can often build simple smaller extensions this way without having to reinvent the wheel.

The automatic caching is also quite interesting - nothing really new, but in this case a nice idea: you can specify for the elements themselves whether they should be cached or not. And for each element individually. Significantly better than the usual all-or-nothing approaches of other CMS.

Overall, Etomite is much more full-CMS-oriented than blog-oriented. Functionally, that puts it more in a group with Drupal than, say, WordPress. There are already a number of snippets for easy extension, as well as themes. Various language files already exist too. Documentation exists as well, and even at first glance it's quite usable for getting started.

The license is GPL, which is good. However, a special notice appears on first login that cannot be removed - actually, something like that conflicts with the GPL, because the GPL specifically says that I can do pretty much anything with the package, as long as I make the modified source available. Ok, I can't claim it's from me and I must preserve original internal copyright notices, but otherwise I can change everything. And that normally includes notice texts. Forced links and forced notices are simply incompatible with the GPL. Either you have to explicitly extend the GPL to include this notice - which then makes it a GPL+addendum that becomes incompatible with the standard GPL - or you refrain from forced notices. This is a well-known stumbling block for people using the GPL, and something like this can definitely be troublesome in commercial use.

Has anyone ported Kubrick to Etomite? I'd need a somewhat nicer theme than the one supplied for my experiments.

How to determine the geographic location from a dynamic IP address. Ouch. Sure, the ISPs have names for their dynamic dial-in nodes and routers, etc., so the information must be retrievable from that somehow. So much for the idea of being anonymous through dynamic dial-in ...

And log files again

Since I had an interesting study object, I wanted to see how much I could uncover in my logfiles with a bit of cluster analysis. So I created a matrix from referrers and accessing IP addresses and got an overview of typical user scenarios - how do normal users look in the log, how do referrer spammers look, and how does our friend look.

All three variants can be distinguished well, even though I'd currently rather shy away from capturing it algorithmically - all of it can be simulated quite well. Still, a few peculiarities are noticeable. First, a completely normal user:


aa.bb.cc.dd: 7 accesses, 2005-02-05 03:01:45.00 - 2005-02-04 16:18:09.00
 0065*-
 0001*http://www.tagesschau.de/aktuell/meldungen/0,1185,OID4031994 ...
 0001*http://www.tagesschau.de/aktuell/meldungen/0,1185,OID4031612 ...
 0001*http://mudbomb.com/archives/2005/02/02/wysiwyg-plugin-for-wo ...
 0001*http://www.heise.de/newsticker/meldung/55992
 0001*http://log.netbib.de/archives/2005/02/04/nzz-online-archiv-n ...
 0001*http://www.heise.de/newsticker/meldung/56000
 0001*http://a.wholelottanothing.org/2005/02/no_one_can_have.html

You can nicely see how this user clicked away from my weblog and came back - the referrers are by no means all links to me, but incorrect referrers that browsers send when switching from one site to another. Referrers are actually supposed to be sent only when a link is really clicked - hardly any browser does that correctly. The visit was on a defined day and they got in directly by entering the domain name (the "-" referrers are at the top and the earliest referrer that appears is at the top).

Or here's an access from me:


aa.bb.cc.dd: 6 accesses, 2005-02-04 01:11:56.00 - 2005-02-03 08:27:09.00
 0045*-
 0001*http://www.aylwardfamily.com/content/tbping.asp
 0001*http://temboz.rfc1437.de/view
 0001*http://web.morons.org/article.jsp?sectionid=1&id=5947
 0001*http://www.tagesschau.de/aktuell/meldungen/0,1185,OID4029220 ...
 0001*http://sport.ard.de/sp/fussball/news200502/03/bvb_verpfaende ...
 0001*http://www.cadenhead.org/workbench/entry/2005/02/03.html

I recognize myself by the referrer with temboz.rfc1437.de - that's my online aggregator. Looks similar - a lot of incorrectly sent referrers. Another user:


aa.bb.cc.dd: 19 accesses, 2005-02-12 14:45:35.00 - 2005-01-31 14:17:07.00
 0015*http://www.muensterland.org/system/weblogUpdates.py
 0002*-
 0001*http://www.google.com/search?q=cocoa+openmcl&ie=UTF-8&oe=UTF ...
 0001*http://blog.schockwellenreiter.de/8136
 0001*http://www.google.com/search?q=%22Rainer+Joswig%22&ie=UTF-8& ...
 0001*http://www.google.com/search?q=IDEKit&hl=de&lr=&c2coff=1&sta ...

This one came more often (across multiple days) via my update page on muensterland.org and also searched for Lisp topics. And they came via the Schockwellenreiter once. Absolutely typical behavior.

Now in comparison, a typical referrer spammer:


aa.bb.cc.dd: 6 accesses, 2005-02-12 17:27:27.00 - 2005-02-02 09:25:22.00
 0002*http://tramadol.freakycheats.com/
 0001*http://diet-pills.ronnieazza.com/
 0001*http://phentermine.psxtreme.com/
 0001*http://free-online-poker.yelucie.com/
 0001*http://poker-games.psxtreme.com/

All referrers are direct domain referrers. No "-" referrers - so no accesses without a referrer. No other accesses - if I analyzed it more precisely by page type, it would be noticeable that no images, etc. are accessed. Easy to recognize - just looks sparse. Typical is also that each URL is listed only once or twice.

Now our new friend:


aa.bb.cc.dd: 100 accesses, 2005-02-13 15:06:16.00 - 2005-02-11 07:07:55.00
 0039*-
 0030*http://irish.typepad.com
 0015*http://www208.pair.com
 0015*http://blogs.salon.com
 0015*http://hfilesreviewer.f2o.org
 0015*http://betas.intercom.net
 0005*http://vowe.net
 0005*http://spleenville.com

What stands out are the referrers without a trailing slash - atypical for referrer spam. Also, just normal sites. Also noticeable is that pages are accessed without a referrer - hidden behind these are the RSS feeds. This one is also easily distinguishable from users. Especially since there's a certain rhythm to it - apparently always 15 accesses with one referrer, then switch the referrer. Either the referrer list is quite small, or I was lucky that it tried the same one with me twice - one of them is there 30 times.

Normal bots don't need much comparison - few of them send referrers and are therefore completely uninteresting. I had one that caught my attention:


aa.bb.cc.dd: 5 accesses, 2005-02-13 15:21:26.00 - 2005-01-31 01:01:07.00
 2612*-
 0003*http://www.everyfeed.com/admin/new_site_validation.php?site= ...
 0002*http://www.everyfeed.com/admin/new_site_validation.php?site= ...

A new search engine for feeds that I didn't know yet. Apparently the admin had just entered my address somewhere beforehand and then the bot started collecting pages. After that, he activated my newly found feeds in the admin interface. Seems to be a small system - the bot runs from the same IP as the admin interface. Most other bots come from entire bot farms, web spidering is an expensive affair after all ...

In summary, it can be concluded that the current generation of referrer spammer bots and other bad bots are still quite primitive in structure. They don't use botnets to use many different addresses and hide that way, they use pure server URLs instead of page URLs and have other quite typical characteristics such as certain rhythms. They also almost always come multiple times.

Unfortunately, these are not good features to capture algorithmically - unless you run your referrers into a SQL database and check each referrer with appropriate queries against the typical criteria. This way you could definitely catch the usual suspects and block them right on the server. Because normal user accesses look quite different.
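Such a check could look roughly like this. A minimal sketch in Python rather than SQL, assuming the log has already been reduced to pairs of IP address and referrer; the function names and the documentation IP addresses are made up:

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def referrers_by_ip(entries):
    """Build the IP/referrer matrix: {ip: Counter({referrer: accesses})}."""
    table = defaultdict(Counter)
    for ip, referrer in entries:
        table[ip][referrer] += 1
    return table


def is_bare_server_url(referrer):
    """True for pure server URLs such as http://host/ or http://host."""
    parts = urlsplit(referrer)
    return bool(parts.netloc) and parts.path in ("", "/") and not parts.query


def looks_like_referrer_spam(counts):
    """Typical spammer profile: no accesses without a referrer ("-")
    and nothing but pure server URLs as referrers."""
    if counts.get("-", 0) > 0:
        return False
    return bool(counts) and all(is_bare_server_url(r) for r in counts)


if __name__ == "__main__":
    log = [
        ("192.0.2.1", "http://tramadol.freakycheats.com/"),
        ("192.0.2.1", "http://tramadol.freakycheats.com/"),
        ("192.0.2.1", "http://poker-games.psxtreme.com/"),
        ("192.0.2.2", "-"),
        ("192.0.2.2", "http://www.heise.de/newsticker/meldung/55992"),
    ]
    for ip, counts in referrers_by_ip(log).items():
        print(ip, looks_like_referrer_spam(counts))
```

This only covers the primitive generation. A bot that also fetches pages without a referrer passes this test and needs additional criteria, such as the rhythm of its accesses.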

However, new generations are already in the works - as my little friend shows, the one with the missing slash. And thanks to the stupid browsers with their incorrectly generated referrers (which say much more about the browser's history than about actual link following), you can't simply counter-check the referenced pages, since many referrers are pure blind referrers.

Apparently disguised bot in the logs

I just found some referrers in my logs where I absolutely couldn't find anything on the referring pages that would point back to me. Nothing unusual so far - referrer spam would be the first suspicion. But the sites mentioned in the referrers are perfectly normal weblogs and other sites - no one who would have reason to spam for their site (for example, a blog with about 1 post per month, or an Irish site and a few other strange referrers). The numbers are also different than with normal referrer spam: that usually comes either only 1-2 times per address, or with many addresses and each one then about 100 times or so. This one comes about 15 times.

So I dug around in the logs a bit to see if I could find something. And sure enough, the referrers have unusual characteristics: they don't end with a /. Normally an address that doesn't end with / is automatically redirected to the /-variant. Referrers are thus normally /-terminated or direct HTML pages or something comparable. Pure site specifications without a / at the end are rather rare.
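This characteristic is easy to test for. A minimal sketch in Python; the function name is made up:

```python
from urllib.parse import urlsplit


def is_slashless_site_referrer(referrer: str) -> bool:
    """True for pure site specifications without a trailing slash,
    e.g. http://vowe.net but not http://vowe.net/."""
    parts = urlsplit(referrer)
    return bool(parts.netloc) and parts.path == "" and not parts.query


if __name__ == "__main__":
    for ref in ("http://irish.typepad.com",
                "http://tramadol.freakycheats.com/",
                "http://www.heise.de/newsticker/meldung/56000",
                "-"):
        print(ref, is_slashless_site_referrer(ref))
```

Since such referrers are rare but not impossible, a hit is only an indication and should be combined with other criteria before anything is blocked.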

Something else also stands out: the pages were actually accessed - or at least downloaded. And the pages belonging to one referrer are quite randomly mixed - with normal users you'd actually expect some form of consistency in what comes through as a referrer. Above all, it's rare for 15 links to come to one page all at once...

And the essential criterion: the IP of the accessing computer is always the same across the different referrers. An analysis then produced the following picture:


 15 betas.intercom.net
 15 blogs.salon.com
 15 hfilesreviewer.f2o.org
 30 irish.typepad.com
 5 spleenville.com
 5 vowe.net
 15 www208.pair.com

All clearly fake referrers. Additionally, 34 accesses to my RSS feeds without a referrer. Accesses were only to direct posts and RSS feeds - not to overview pages or archive pages. It looks very much like the bot is proceeding as follows: search for RSS feeds, grab them, then search for permalinks to articles in them and download them to access comment forms, for example. The whole thing nicely disguised as supposed visitors, including forged referrers that seem unsuspicious. Also not too many accesses from one referrer, rather switch it up more often.

Actually nothing new - with email spam, forged but real sender addresses are quite common, because they make filtering harder. But with scraper bots, I'm seeing this kind of mimicry live for the first time - I've only been observing these symptoms for about 1-2 weeks now.

For admins, this whole thing is quite annoying, since you can use referrer logs even less than you could before. Previous referrer spam was certainly a nuisance, but due to the pretty dumb names of the referrers it was easy to recognize. This form of log phenomenon also falsifies the referrers - but is much less noticeable. Could be interesting for weblogs that display their referrers directly in the post.

And of course the problem remains that I still don't know what the bot wants to do with the collected information. I strongly suspect spam, but that's just a guess - it could also be a bot searching for typical security holes. In any case it's a bot and in any case it has no good intentions - because otherwise it wouldn't need to hide.

Fitting my previous, longer post: Weblog Tools Collection suffers from Referer Spam DoS. Such birds - that is, referrer spammers going into the thousands in terms of accesses - have not (yet?) shown up in my log analysis.

Don't be surprised about the content of my blog...

... there's just a rogue admin with a stupid script that messed everything up and destroyed all the content. Somehow everything is being reconstructed and repaired and ironed out and folded back together. Somehow. And afterwards I'm going to stand in the corner and flog myself ...

Update: now everything has been largely restored. What happened: I switched from Exhibit to my own plugin for images. And in doing so, I rewrote all posts with image entries via script. But in the generated UPDATE, I stupidly forgot the WHERE clause ...
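A guard against exactly this mistake is cheap. A minimal sketch in Python with sqlite3, assuming a made-up posts table; it is not the script that was actually used:

```python
import sqlite3


def guarded_update(conn, sql, params, expected_rows):
    """Run an UPDATE and roll it back unless it touched exactly the
    number of rows that was expected beforehand."""
    cursor = conn.execute(sql, params)
    if cursor.rowcount != expected_rows:
        conn.rollback()
        raise RuntimeError(
            "expected %d rows, statement touched %d - rolled back"
            % (expected_rows, cursor.rowcount))
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout, for illustration only.
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT)")
    conn.executemany("INSERT INTO posts (content) VALUES (?)",
                     [("text",), ("text [image]",), ("more text",)])
    conn.commit()
    try:
        # The WHERE clause is missing, so all three rows would be overwritten.
        guarded_update(conn, "UPDATE posts SET content = ?", ("new",), 1)
    except RuntimeError as error:
        print(error)
    print(conn.execute("SELECT content FROM posts").fetchall())
```

The expected count comes from a SELECT COUNT(*) with the same WHERE clause, run before the update. That doesn't replace a current backup, but it catches the forgotten WHERE clause before anything is committed.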

Losses: my post about the DGB and the wording in today's posts. Everything else was reconstructed from a backup. And a few nerves. And I've written it behind my ears for the x-th time that I should make a current backup before tinkering in the future. Which of course won't do any good, because I can't read behind my ears without two mirrors ...

Update 2: and of course I was so great during the weblog reconstruction that I also overwrote the changed image posts, so now all posts in the picture blog are without photos. I can't believe it. It's either a full moon or something today ...

Which means I have to get creative again to pull the images back into the posts, because of course I deleted all the mapping tables, since I don't need them anymore. But I still have them all in the backup, so it won't be as bad as before.

Update 3: now everything should be largely back the way it was. And the last repair actually went without major catastrophes.

What to expect when updating MySQL 4.0 to 4.1. Okay, database version upgrades are never easy and can always cause problems.

Buffer Overflow in numerous Symantec products - Ouch!

Don't reset existing password on request, prevent DoS password reset abuse | drupal.org

Don't reset existing password on request, prevent DoS password reset abuse - well, I noticed exactly this problem too and couldn't believe that someone actually built something like that into a CMS. In Drupal, you can change the password for a user - any user at all. The new password is then sent to that user by email. So you can't gain illegal access through this, unless you can intercept the user's emails (which shouldn't normally be the case). But you can lock out an admin: simply set up a job that resets the admin's password every minute. And then use this forced absence of the admin to completely spam the Drupal site, for example.

That's really an embarrassing oversight. Unfortunately, it's made far too often. So if you operate Drupal, the patch is recommended (be careful, the author submitted two patches, the first one was still buggy). It installed without any problems and at least fixes the admin lockout. Of course, you still get annoying emails in the process.
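The usual way around the lockout is to leave the existing password untouched until the owner of the mailbox confirms the reset. A minimal sketch of the idea in Python; this is not the Drupal patch, and the class and method names are made up:

```python
import secrets


class Accounts:
    """Toy account store. A real system would store password hashes,
    not plain text, and would let tokens expire."""

    def __init__(self):
        self.passwords = {}
        self.pending = {}  # one-time token -> user name

    def request_reset(self, user):
        """Anyone may call this, but the existing password stays valid.
        The returned token would be mailed to the user."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = user
        return token

    def confirm_reset(self, token, new_password):
        """Only whoever received the mail can actually change the password."""
        user = self.pending.pop(token, None)
        if user is None:
            return False
        self.passwords[user] = new_password
        return True


if __name__ == "__main__":
    accounts = Accounts()
    accounts.passwords["admin"] = "old secret"
    accounts.request_reset("admin")       # an attacker's request
    print(accounts.passwords["admin"])    # still the old password
```

An attacker can still trigger mails this way, but can no longer lock anyone out.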

Yep, Drupal is going to drive me crazy

Clearly. I don't know what it has against me, but it hates me today. Really massively.

I simply copied the kubrick-theme under my own name so I could customize it without changing the Kubrick-theme itself. Funnily enough, it's now not using the phptemplate-engine anymore. Or more precisely: the entry in the system table (type='theme' and then for the page.tpl.php) doesn't point to phptemplate.engine, but to phptemplate - the .engine is missing. When I add it via update, it works exactly once. After that, this entry in the system table gets overwritten and .engine is gone and the template is broken. Of course, Kubrick doesn't do that. And of course, you can't find any information anywhere about where the heck the theme says which template engine should be used - and how this entry in the system table is created. No, simply grepping for phptemplate.engine doesn't help.

Ok, now it's clear to me that the engine creates the entries - at least after I took a closer look at the engine source. It searches for page.tpl.php and when it finds it, it connects it with the phptemplate.engine. But why would the engine enter its own name incorrectly? Especially since it does it correctly with Kubrick. I just unpacked that into the themes directory.

Alright, so let's keep investigating. A grep -r for INSERT in combination with system then finds the function system_obtain_theme_info in the system.module, where these statements are written. But how and where exactly something is done with it there - sorry, but you can't figure that out without longer study. Somehow the description attribute gets filled with a value that ends with .engine for the Kubrick-theme and doesn't for all others. Kubrick references the theme engine exactly and correctly, but an arbitrarily named copy of Kubrick with identical content references a theme engine without .engine in the name and doesn't work. Great. But renaming Kubrick works. Huh?

Ok, next approach. Rename my template to something else and rename Kubrick to my actually desired name. Complete confusion: my template doesn't work, but the now-kubrick-named one that didn't work before doesn't work either. Uh... So I renamed the Kubrick to something else. And tried my temporarily stored one. That works now. Under a name that isn't Kubrick. Huh? Shell game? Should I just rename the themes around until I eventually have a working one under the name I want and then call it done?

So I tried to resolve the shell game. Computers are deterministic machines after all, that should be possible. Ok, both templates (original Kubrick and my Hugo) renamed. To aa and bb. And which one works? The one called bb. Did the whole thing again, just this time swapped the roles. aa becomes bbb and bb becomes aaa. Which one works now? The one called "bbb". When two phptemplate.engine-based themes are installed in the system, only the last one found in the system at the time themes are being searched works. The others break.

So now I first have to figure out what's wrong with the old themes and why they can't be made to work. First approach: make a database dump and grep to see where all my friends show up. While doing that, I found out what's up with the mysterious phptemplate without .engine: the corresponding entries contain a chr(0) instead of the period. ASCII NUL. MySQL stores it, but PHP cuts the string off there on access. And all the old templates have these broken entries. The engine also keeps track of which themes it has already seen, in the phptemplate extra_templates entry in the variable table.
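
A NUL byte like this is invisible in most output, which is why the entries merely looked truncated. A small Python sketch of how such entries can be spotted once the values are available as raw bytes; the two rows and their paths are made up for illustration and are not taken from the real table:

```python
def find_nul_entries(rows):
    """Return the (name, description) rows whose description contains a NUL byte."""
    return [(name, desc) for name, desc in rows if b"\x00" in desc]


def as_c_string(value):
    """What a consumer sees if it stops reading at the first NUL byte."""
    return value.split(b"\x00", 1)[0]


# Hypothetical rows, shaped like (name, description) pairs from the system table.
rows = [
    ("kubrick", b"themes/engine/phptemplate/phptemplate.engine"),
    ("hugo", b"themes/engine/phptemplate/phptemplate\x00engine"),
]

for name, desc in find_nul_entries(rows):
    # repr() makes the otherwise invisible byte show up as \x00
    print(name, repr(desc), "->", as_c_string(desc))
```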

Another clean room test: throw out the entries in the system table with type='theme' and description like 'themes/engine/phptemplate%'. Then it knows nothing more about the themes and their names. Then install only my desired template and activate it. And behold, it works right away. Then unpacked Kubrick. And it works. But after that, my own theme doesn't work anymore. As expected - Kubrick comes after Hugo alphabetically. Delete Kubrick again and my own theme works again - after an appropriate refresh.

So investigate where the heck this is happening and why. At first it looks like it only happens with the phptemplate.engine themes, while the xtemplate.engine themes work without problems. It turns out, though, that those work despite the bug - it affects them too. In system_theme_data in system.module (how I figured that out I'll spare the readers - it was just successive insertion of echo statements to see when and where things break), the description element of the file entries gets destroyed in the last step, the call to system_obtain_theme_info. And that element is what gets saved in the system table to reference the theme engine. Only the last theme of an engine keeps the correct entry, all others are broken.

Hmm. The basename call on line 336 is the only suspect - it basically just delivers the theme engine name without the .engine suffix. But it shouldn't change the actual field, so I hadn't paid much attention to it before - the PHP documentation says nothing about side effects of this function. Yet when I comment out that line, my theme works and Kubrick too - simultaneously.

So I wrote a small test script that does nothing but make a basename call. Ugh. Yes, that's it - basename changes the original string and puts a chr(0) in place of the period. And behold, there's a bug report from 2002 about it - yes, I'm running an old PHP version, 4.1.2, because that's what Debian Stable ships. The bug report has a workable solution for my problem - just put the variable in "" and rely on string interpolation. And behold, problem solved. Note to self: in 4.1.2, basename breaks the source variable.
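
The workaround amounts to handing the faulty function a throwaway copy instead of the original variable. The same pattern as a Python analogy (not PHP): `buggy_basename` is a made-up stand-in that imitates the observed side effect on a mutable buffer, and `safe_basename` is the defensive-copy wrapper.

```python
def buggy_basename(path, suffix):
    """Stand-in for the faulty call: strips `suffix` but also clobbers `path`.

    `path` is a mutable bytearray so the in-place damage can be modelled.
    """
    name = bytes(path).rsplit(b"/", 1)[-1]
    if suffix and name.endswith(suffix):
        cut = len(path) - len(suffix)
        path[cut] = 0  # the side effect: a NUL byte written over the period
        name = name[: -len(suffix)]
    return name


def safe_basename(path, suffix):
    """Workaround: let the faulty function damage a copy, not the original."""
    return buggy_basename(bytearray(path), suffix)


original = bytearray(b"themes/engine/phptemplate/phptemplate.engine")
print(safe_basename(original, b".engine"), bytes(original))   # original intact
print(buggy_basename(original, b".engine"), bytes(original))  # original damaged
```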

And a programmer spends debugging time on crap like this (I mean the bug, not Drupal)

I could have learned a decent job. Whisky barrel keeper at Jack Daniels, for example...

New phishing attack possible in many web browsers

Read on golem: New Phishing Attack Possible in Many Web Browsers. Great. Once again, a sloppily implemented solution and a sloppy standard. The whole umlaut domain stuff is nonsense anyway, and you have to wonder why it was implemented in the first place - the mere fact that this garbage only works for websites and IDNs can't be meaningfully used for anything else should have made anyone realize what a ridiculous idea it is. And now it's also a phishing hole.
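
Assuming the hole in question is the lookalike-character trick (a domain that swaps a Latin letter for an identical-looking one from another script), the effect can be reproduced with Python's built-in IDNA codec. The domain below is only a placeholder.

```python
def to_wire_form(domain):
    """Encode a domain the way it is looked up in DNS (Python's IDNA codec)."""
    return domain.encode("idna")


genuine = "example.com"
# Identical to the eye in many fonts, but the first letter is
# CYRILLIC SMALL LETTER IE (U+0435) instead of a Latin "e".
lookalike = "\u0435xample.com"

print(genuine == lookalike)      # False: two different strings
print(to_wire_form(genuine))     # pure ASCII passes through unchanged
print(to_wire_form(lookalike))   # becomes an xn-- (Punycode) name
```

A browser that shows the decoded form in the address bar gives the user no way to tell the two apart, while one that shows the xn-- form makes the difference obvious.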

Doing the GNUstep two-step

Doing the GNUstep two-step is an older report about the GNUstep Live CD. I'm linking it only because it describes a problem that has also annoyed me: the CD doesn't boot. Which is pretty stupid for a live CD. And no, the argument that it doesn't boot on some old computers doesn't hold up - the computer having the problem here is just one year old.

The GNUstep Live CD developers really should tackle and solve this problem - because if all kinds of Knoppix variants boot on a computer, and even the Gnoppix CD boots - then there's no reason why the GNUstep CD shouldn't boot. And no, floppies are not an alternative - the computer has no floppy drive. It's just too new for that...

If I ever want to take a look at web-based project management, dotproject - Open Source Project and Task Management Software looks quite usable.

Oralux: Linux for the Blind

Since we're on the topic of live CDs: Oralux is a live CD with a Debian-based distribution that is specifically designed for people with visual impairments. It asks very early for a speech interface and is overall designed so that you can control it by voice.

Bill Gates commits to interoperability and makes the insightful observation that open source leads to too many similar solutions, which is why interoperability needs to be tested more and that's a problem. Translation: Bill Moppelkotze thinks open source is annoying because it nibbles away at his monopoly

Cosmic near miss in 24 years - so if it does hit us in 2037, we could manage to be wiped out before the Unix epoch overflow gets us
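
For reference, the overflow date follows directly from the largest value a signed 32-bit seconds counter can hold:

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit time_t


def last_32bit_moment():
    """Last second representable in a signed 32-bit Unix timestamp (UTC)."""
    return datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)


print(last_32bit_moment().isoformat())  # 2038-01-19T03:14:07+00:00
```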

fallback-reboot is a small daemon that locks itself into memory (so it cannot be swapped out) and then waits for a password on a port. When the password arrives, the machine is rebooted without any security precautions or disk sync. Interesting as a last resort when the machine still responds to pings and similar, but you can't get a shell prompt anymore.
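
The basic shape of such a daemon (listen on a port, compare what arrives with a password, trigger an action) can be sketched in a few lines of Python. This is not fallback-reboot's code: the password, port and function names are made up, the memory locking is left out entirely, and the reboot itself is replaced by a harmless placeholder.

```python
import hmac
import socketserver

SECRET = b"change-me"  # hypothetical shared password


def is_authorized(received, secret=SECRET):
    """Constant-time comparison of the received line with the password."""
    return hmac.compare_digest(received.strip(), secret)


def emergency_action():
    # Placeholder. A real last-resort daemon would trigger an immediate
    # reboot here, skipping the normal shutdown and the disk sync.
    print("reboot requested")


class RebootHandler(socketserver.StreamRequestHandler):
    def handle(self):
        line = self.rfile.readline(256)  # read at most one short line
        if is_authorized(line):
            emergency_action()


def serve(host="127.0.0.1", port=9999):
    """Listen forever; host and port are illustrative defaults."""
    with socketserver.TCPServer((host, port), RebootHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the listener; nothing in the sketch runs on import.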

Zope.org - FileStorageBackup is a collection of useful tips on how to handle ZODB database files. Specifically replication, backup, repair - basically everything that will bite you in the ass sooner or later when running larger Zope systems.

Microsoft: Error in Buffer-Overflow Protection is Not a Vulnerability - sure. Bill Gates wants to make the net more secure. So a bug is simply not declared as a security vulnerability. Never mind that you can catch a Trojan or virus on Windows faster than you can say boo - and that this one can use exactly this hole to knock down the whole nice security system. What nonsense again...

fast small web servers

lighttpd is a small, fast web server with a quite impressive feature set and the clear goal of being faster and more resource-efficient than Apache. CGI, FastCGI, and PHP (via FastCGI) are also supported, making it suitable for dynamic pages as well. Maybe I should take a closer look at it.

leahhttpd is another small web server with a focus on low resource usage and high performance. Here too, there's quite an impressive feature spectrum.

boa is the grandfather of web servers with a performance and resource focus. However, it only offers CGI as an option for dynamic content. So it's better suited for serving purely static content.

Of all three, lighttpd looks the most interesting, among other things because of its good support for interfaces for dynamic content. Especially since the server already has a built-in FastCGI load balancer, making it designed for larger loads right out of the box. And the focus on FastCGI instead of built-in modules offers additional possibilities for security - the FastCGI process can run under a different (restricted) user.

Strange Business Ideas at Providers

As much as I like Hetzner as a provider, sometimes they come up with weird ideas. Now you can also get additional IPs for your entry server (their starter package). However, these cost a monthly fee per IP - which is actually pretty strange, since IP addresses aren't supposed to be sold according to RIPE - but okay, whatever. I'd be willing to pay a moderate amount for an additional IP.

But the idea that my 250 GB free volume only applies to the main IP and every started GB on the additional IPs has to be paid for, even if there's still plenty of free volume on the main IP - sorry, but that's just plain stupid. That way you end up paying double and triple for the additional IPs. No way. A second IP address for test installations or an isolated chroot jail with isolated software setup isn't that important to me.
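
To make the difference concrete, here is the arithmetic as a small Python sketch. Only the 250 GB free volume and the "every started GB" rule come from the description above. The traffic figures are invented, and the sketch assumes that overage on the main IP is also counted per started GB.

```python
import math

FREE_GB = 250  # free monthly volume mentioned above


def billable_gb_per_ip(main_gb, extra_gb):
    """Scheme described above: the free volume covers only the main IP."""
    main_over = max(0, math.ceil(main_gb) - FREE_GB)  # assumption, see lead-in
    return main_over + math.ceil(extra_gb)  # every started GB on extra IPs counts


def billable_gb_pooled(main_gb, extra_gb):
    """Alternative: one free volume shared across all IPs."""
    return max(0, math.ceil(main_gb + extra_gb) - FREE_GB)


# Invented example month: 40 GB on the main IP, 10.2 GB on an additional IP.
print(billable_gb_per_ip(40, 10.2))  # 11 billable GB despite 210 GB unused
print(billable_gb_pooled(40, 10.2))  # 0 billable GB
```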

In fairness, it should be mentioned that the next larger root server solution from Hetzner does have IPs as needed without a monthly fee. But how the free volume is distributed there, I don't know - it's not clear from their websites anyway.

Well, up until now Hetzner has always surprised me by eventually just dropping strange and absurd ideas and replacing them with sensible solutions (like the long overdue emergency boot system that's now available, or the option for hardware upgrades on the entry server). I'm not giving up hope on the additional IPs either.

Heise.de down due to DDOS

Der Schockwellenreiter has the press release from Heise about it. Something like that is really awful and I'm keeping my fingers crossed for the Heise technicians that they get it under control as soon as possible. As a sysadmin, you always suffer along with something like this.

IBM drags Intel into SCO case

GROKLAW has reported that IBM is pulling Intel into the proceedings against SCO with a subpoena and wants to force them to testify. Interesting - because as far as I know, Intel hasn't been part of the discussion so far as to whether they could have anything to do with it. The fact that IBM is bringing them in by means of a subpoena certainly suggests that IBM believes Intel knows something that Intel is unwilling to disclose voluntarily.

Nuclear Elephant: DSPAM

Nuclear Elephant: DSPAM is a Bayesian spam filter. However, it's one that doesn't just run for a single user, but typically for an entire group of users. I have it running on simon.bofh.ms to scan all the mailboxes there - it integrates well and has a whole range of interesting features. There's the web interface for managing the spam filter, and there's the quite pragmatic method for reporting false detections back to the filter. Also nice is the quite broad support for databases (MySQL, PostgreSQL, SQLite, and several db* types). Overall, it makes a really well-rounded impression - the only downside is that the interface hasn't been translated.

Whether it actually filters well, I of course can't say yet due to lack of volume - the emails first need to accumulate and be trained. User reports are, however - typical for Bayesian spam filters - quite positive.
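
Why the training matters is easy to see in a toy version of a Bayesian filter. The following Python sketch is a plain naive Bayes classifier with add-one smoothing. It is not DSPAM's algorithm, and the class and method names are made up. Reporting a false detection corresponds to calling `train` again with the correct label.

```python
import math
from collections import Counter


class TinyBayes:
    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Feed one message with its correct label ('spam' or 'ham')."""
        self.tokens[label].update(text.lower().split())
        self.messages[label] += 1

    def _log_score(self, words, label):
        counts = self.tokens[label]
        total = sum(counts.values())
        vocab = len(set(self.tokens["spam"]) | set(self.tokens["ham"]))
        # Prior from the message counts, add-one smoothing for unseen words.
        score = math.log(
            (self.messages[label] + 1) / (sum(self.messages.values()) + 2)
        )
        for word in words:
            score += math.log((counts[word] + 1) / (total + vocab + 1))
        return score

    def is_spam(self, text):
        words = text.lower().split()
        return self._log_score(words, "spam") > self._log_score(words, "ham")


spam_filter = TinyBayes()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("meeting notes attached", "ham")
spam_filter.train("lunch tomorrow at noon", "ham")
print(spam_filter.is_spam("buy cheap pills"))         # True
print(spam_filter.is_spam("lunch meeting tomorrow"))  # False
```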

Found at Schneier on Security: the weakest link. So much for the topic of security.

sysadmin - 1.2.2005 - 25.2.2005

The Sohu.com Search Bot Is Acting Strange

Clearly. I don't know what it has against me, but it hates me today. Really massively.