Archive 7.2.2005 - 14.2.2005

Wacom Cintiq 21UX Touch Screen Flat-Panel - awesome. 21-inch display with touch screen and graphics pad functionality. Finally, paint directly on the display with styluses. Does anyone have $2500 to spare for me?

How to determine the geographic location from a dynamic IP address. Ouch. Sure, the ISPs have names for their dynamic dial-in nodes and routers, etc., so the information must be retrievable from that somehow. So much for the idea of being anonymous through dynamic dial-in ...

Prompted by recent events ...

... I'd like to point out that I simply delete trackbacks from blogs whose sole purpose is to promote some obscure Amazon shops. Sorry, but just because advertising junk is stored in weblog software doesn't mean I let every inappropriate trackback through. And no, just because a keyword from the post also appears in one of my posts doesn't make it an interesting trackback - it's just spam.

CSS and IE and Safari 1.0

I always post source snippets and log file excerpts and stuff like that. For this I use the PRE tag so the stuff is displayed preformatted and in a monospaced font. It works well with all browsers. But a couple of browsers are giving me quite a bit of trouble. First of all Safari 1.0 - ok, that's inevitably dying out and is only a problem in that the horizontal scrollbar obscures the bottom line. You can work around that if necessary with a blank line.

But IE for Windows is also acting up - users tell me that PRE blocks are always rendered at their full width, without a scrollbar. I don't have Windows here, so I can't test it, but that would be annoying of course - I can't use PRE on the front page then, because it messes up the layout.

Really extreme is IE 5.5 Mac: it hides PRE blocks completely, and I don't understand why. They simply aren't displayed. The page validates, of course. Well, IE Mac 5.5 will hopefully soon be extinct too and the poor folks still using it have my sympathy, but no source code.

But for Windows IE I'd be grateful for a tip on the CSS problem. If it can be fixed with ordinary CSS and without overly heavy-handed hacks, I could build that in. Here's an example article with PRE blocks.

Gravatars in the Comments

So, I've added Gravatars to the comments. Anyone who has one will now be displayed with a picture. At the moment though, the distribution of Gravatars is still a bit sparse - I find them kind of fun, as they make commenters somewhat more personally recognizable. Not just anonymous names in the background.

Gravatars are looked up by the email address you enter, so to be clear: I will definitely not publish that address. Gravatars use an MD5 hash of the email address, so the address cannot simply be read off the link. And besides, WordPress doesn't publish the email anywhere else anyway.
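For the curious, here is a minimal sketch of how such a link can be derived from an address. The function name and the URL layout are assumptions for illustration, and the exact URL form Gravatar expects may differ; the point is only that the link carries the hash, not the address itself.

```python
import hashlib


def gravatar_url(email: str, size: int = 40) -> str:
    """Build a Gravatar-style image URL from an email address.

    The URL layout below is an assumption for illustration.
    """
    # Normalize first so that " User@Example.COM " and "user@example.com"
    # end up with the same hash.
    normalized = email.strip().lower()
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return f"https://www.gravatar.com/avatar/{digest}?s={size}"
```

Calling gravatar_url("someone@example.com") returns a URL that contains only a 32-character hex digest in place of the address.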

But if you still don't want to enter your regular address: I have 50 Google Mail invites left over. If you send me a message via my feedback form, you can get one and use that instead. Google Mail has a pretty decent spam filter and with 1 GB of storage space it takes a very long time to fill up if you don't empty it. Perfect as a throwaway account...

And if you don't want that either, you'll just get my default Gravatar and then you'll just look a bit pale.

Jens Voigt dominates at the Mediterranean tour - and demonstrates that we can probably count on great performances from him at the Tour again this year. A fantastic start to the season.

mozdev.org - conkeror

That's what I call dedication - in the documentation for a purely keyboard-controlled Mozilla:

You should never have to reach for your mouse. To make sure Conkeror remains pure, I do not own a mouse.

So if you're a mouse-phobic, you might find some relief with this browser.

And because I'm an experimentally inclined fellow, I naturally had to try it out right away. Ok, Emacs key bindings are terrible (hey, I'm a VI guy) but still the whole thing is quite usable - you could get used to it if only the other applications on your system had similar controls. And here's a tip for Mac users: yes, the whole thing works for you too. However, you do need to start the browser with a parameter, but that's not supported by Firefox.App. Instead, just enter the following command in the terminal (warning, one line!): /Applications/Firefox.App/Contents/MacOS/firefox -chrome chrome://conkeror/content

You may need to adjust the path to Firefox.App. After that, a small window opens with a rather spartan help file. Read it thoroughly, because if you don't at least remember how to open the help page, you'll be stuck. The big B goes back in the history, so if you get lost, you can always get back to the help with it. Oh yes, and quitting doesn't work with Apple-Q - after all, it's Emacs. So press Ctrl-X and C one after the other.

If search engine promoters find nothing...

And log files again

Since I had an interesting study object, I wanted to see how much I could uncover in my logfiles with a bit of cluster analysis. So I created a matrix from referrers and accessing IP addresses and got an overview of typical user scenarios - how do normal users look in the log, how do referrer spammers look, and how does our friend look.

All three variants can be distinguished well, even though I'd currently rather shy away from capturing it algorithmically - all of it can be simulated quite well. Still, a few peculiarities are noticeable. First, a completely normal user:


aa.bb.cc.dd: 7 accesses, 2005-02-05 03:01:45.00 - 2005-02-04 16:18:09.00
 0065*-
 0001*http://www.tagesschau.de/aktuell/meldungen/0,1185,OID4031994 ...
 0001*http://www.tagesschau.de/aktuell/meldungen/0,1185,OID4031612 ...
 0001*http://mudbomb.com/archives/2005/02/02/wysiwyg-plugin-for-wo ...
 0001*http://www.heise.de/newsticker/meldung/55992
 0001*http://log.netbib.de/archives/2005/02/04/nzz-online-archiv-n ...
 0001*http://www.heise.de/newsticker/meldung/56000
 0001*http://a.wholelottanothing.org/2005/02/no_one_can_have.html

You can nicely see how this user clicked away from my weblog and came back - the referrers are by no means all links to me, but incorrect referrers that browsers send when switching from one site to another. Referrers are actually supposed to be sent only when a link is really clicked - hardly any browser does that correctly. The visit fell within a clearly bounded period, and the user got in directly by entering the domain name (the "-" entries come first in the list, and the remaining referrers are sorted with the earliest at the top).

Or here's an access from me:


aa.bb.cc.dd: 6 accesses, 2005-02-04 01:11:56.00 - 2005-02-03 08:27:09.00
 0045*-
 0001*http://www.aylwardfamily.com/content/tbping.asp
 0001*http://temboz.rfc1437.de/view
 0001*http://web.morons.org/article.jsp?sectionid=1&id=5947
 0001*http://www.tagesschau.de/aktuell/meldungen/0,1185,OID4029220 ...
 0001*http://sport.ard.de/sp/fussball/news200502/03/bvb_verpfaende ...
 0001*http://www.cadenhead.org/workbench/entry/2005/02/03.html

I recognize myself by the referrer with temboz.rfc1437.de - that's my online aggregator. Looks similar - a lot of incorrectly sent referrers. Another user:


aa.bb.cc.dd: 19 accesses, 2005-02-12 14:45:35.00 - 2005-01-31 14:17:07.00
 0015*http://www.muensterland.org/system/weblogUpdates.py
 0002*-
 0001*http://www.google.com/search?q=cocoa+openmcl&ie=UTF-8&oe=UTF ...
 0001*http://blog.schockwellenreiter.de/8136
 0001*http://www.google.com/search?q=%22Rainer+Joswig%22&ie=UTF-8& ...
 0001*http://www.google.com/search?q=IDEKit&hl=de&lr=&c2coff=1&sta ...

This one came more often (across multiple days) via my update page on muensterland.org and also searched for Lisp topics. And they came from the Schockwellenreiter once. Absolutely typical behavior.

Now in comparison, a typical referrer spammer:


aa.bb.cc.dd: 6 accesses, 2005-02-12 17:27:27.00 - 2005-02-02 09:25:22.00
 0002*http://tramadol.freakycheats.com/
 0001*http://diet-pills.ronnieazza.com/
 0001*http://phentermine.psxtreme.com/
 0001*http://free-online-poker.yelucie.com/
 0001*http://poker-games.psxtreme.com/

All referrers are direct domain referrers. No "-" referrers - so no accesses without a referrer. No other accesses - if I analyzed it more precisely by page type, it would be noticeable that no images, etc. are accessed. Easy to recognize - just looks sparse. Typical is also that each URL is listed only once or twice.

Now our new friend:


aa.bb.cc.dd: 100 accesses, 2005-02-13 15:06:16.00 - 2005-02-11 07:07:55.00
 0039*-
 0030*http://irish.typepad.com
 0015*http://www208.pair.com
 0015*http://blogs.salon.com
 0015*http://hfilesreviewer.f2o.org
 0015*http://betas.intercom.net
 0005*http://vowe.net
 0005*http://spleenville.com

What stands out are the referrers without a trailing slash - atypical for referrer spam. Also, just normal sites. Also noticeable is that pages are accessed without a referrer - hidden behind these are the RSS feeds. This one is also easily distinguishable from users. Especially since there's a certain rhythm to it - apparently always 15 accesses with one referrer, then switch the referrer. Either the referrer list is quite small, or I was lucky that it tried the same one with me twice - one of them is there 30 times.

Normal bots don't need much comparison - few of them send referrers and are therefore completely uninteresting. I had one that caught my attention:


aa.bb.cc.dd: 5 accesses, 2005-02-13 15:21:26.00 - 2005-01-31 01:01:07.00
 2612*-
 0003*http://www.everyfeed.com/admin/new_site_validation.php?site= ...
 0002*http://www.everyfeed.com/admin/new_site_validation.php?site= ...

A new search engine for feeds that I didn't know yet. Apparently the admin had just entered my address somewhere beforehand and then the bot started collecting pages. After that, he activated my newly found feeds in the admin interface. Seems to be a small system - the bot runs from the same IP as the admin interface. Most other bots come from entire bot farms, web spidering is an expensive affair after all ...

In summary: the current generation of referrer spammer bots and other bad bots is still quite primitive in structure. They don't use botnets to spread their accesses over many different addresses and hide that way, they use pure server URLs instead of page URLs, and they have other quite typical characteristics such as certain rhythms. They also almost always come multiple times.

Unfortunately, these are not good features to capture algorithmically - unless you run your referrers into a SQL database and check each referrer with appropriate queries against the typical criteria. This way you could definitely catch the usual suspects and block them right on the server. Because normal user accesses look quite different.
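As a rough illustration of such a check, here is a sketch in plain Python instead of SQL. The function names are made up for this example, and it covers only one criterion from above (every referrer a client sends is a bare server URL rather than a page URL), so treat it as a starting point under those assumptions, not a finished filter.

```python
from typing import Mapping
from urllib.parse import urlsplit


def is_bare_site(referrer: str) -> bool:
    """True if the referrer names only a server, with no page path or query."""
    parts = urlsplit(referrer)
    return bool(parts.netloc) and parts.path in ("", "/") and not parts.query


def looks_like_referrer_bot(referrer_counts: Mapping[str, int]) -> bool:
    """Heuristic: every real referrer sent by this client is a bare server URL.

    referrer_counts maps referrer string -> number of accesses for one IP;
    "-" stands for accesses without a referrer and is ignored here.
    """
    real = [ref for ref in referrer_counts if ref != "-"]
    if not real:
        # Clients that never send a referrer (ordinary crawlers, feed
        # readers) are not what this check is meant to catch.
        return False
    return all(is_bare_site(ref) for ref in real)
```

Fed with the per-IP listings from this post, the spammer and the new friend would both be flagged, while the normal users with their page-level referrers would not. A real setup would want further criteria, such as the 15-access rhythm or the absence of image requests.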

However, new generations are already in the works - as my little friend shows, the one with the missing slash. And thanks to the stupid browsers with their incorrectly generated referrers (which say much more about the browser's history than about actual link following), you can't simply counter-check the referenced pages, since many referrers are pure blind referrers.

Apparently disguised bot in the logs

I just found some referrers in my logs whose pages contain absolutely nothing that points back to me. Nothing unusual so far - referrer spam would be the first suspicion. But the sites named in the referrers are perfectly normal weblogs and other sites - nobody who would have reason to spam for their site (for example, a blog with about 1 post per month, or an Irish site and a few other strange referrers). The numbers are also different from normal referrer spam: that usually shows up either only 1-2 times, or with many addresses that each appear about 100 times or so. This one comes about 15 times.

So I dug around in the logs a bit to see if I could find something. And sure enough, the referrers have unusual characteristics: they don't end with a /. Normally an address that doesn't end with / is automatically redirected to the /-variant. Referrers are thus normally /-terminated or direct HTML pages or something comparable. Pure site specifications without a / at the end are rather rare.

Something else also stands out: the pages were actually accessed - or at least downloaded. And the pages belonging to one referrer are quite randomly mixed - with normal users you'd actually expect some form of consistency in what comes through as a referrer. Above all, it's rare for 15 links to come to one page all at once...

And the essential criterion: the IP of the accessing computer is always the same across the different referrers. An analysis then produced the following picture:


 15 betas.intercom.net
 15 blogs.salon.com
 15 hfilesreviewer.f2o.org
 30 irish.typepad.com
 5 spleenville.com
 5 vowe.net
 15 www208.pair.com

All clearly fake referrers. Additionally, 34 accesses to my RSS feeds without a referrer. Accesses were only to direct posts and RSS feeds - not to overview pages or archive pages. It looks very much like the bot is proceeding as follows: search for RSS feeds, grab them, then search for permalinks to articles in them and download them to access comment forms, for example. The whole thing nicely disguised as supposed visitors, including forged referrers that seem unsuspicious. Also not too many accesses from one referrer, rather switch it up more often.
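A tally like the one above can be produced with a few lines of Python. This is only a sketch: it assumes the log has already been parsed into (ip, referrer) pairs, and the function name is made up for illustration.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def referrer_matrix(hits):
    """Map each client IP to a Counter of referrer hosts.

    hits is an iterable of (ip, referrer) pairs; "-" marks an access
    without a referrer and is counted under the key "-".
    """
    matrix = defaultdict(Counter)
    for ip, referrer in hits:
        host = "-" if referrer == "-" else urlsplit(referrer).netloc
        # Fall back to the raw string if the referrer has no parsable host.
        matrix[ip][host or referrer] += 1
    return matrix
```

Sorting the result for one address, for example with sorted(matrix[ip].items()), gives a per-host listing in the style shown above.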

Actually nothing new - with email spam, forged but real sender addresses are quite common and are used to make filtering harder. But with scraper bots, I'm seeing this kind of mimicry live for the first time - I've only been observing these symptoms for about 1-2 weeks now.

For admins, this whole thing is quite annoying, since you can use referrer logs even less than you could before. Previous referrer spam was certainly a nuisance, but due to the pretty dumb names of the referrers it was easy to recognize. This form of log phenomenon also falsifies the referrers - but is much less noticeable. Could be interesting for weblogs that display their referrers directly in the post.

And of course the problem remains that I still don't know what the bot wants to do with the collected information. I strongly suspect spam, but that's just a guess - it could also be a bot searching for typical security holes. In any case it's a bot, and in any case it has no good intentions - because otherwise it wouldn't need to hide.

What are you looking at?

A reposting of an old image from 2002 - near Husum. I'm currently playing around with my new photo plugin for WordPress and needed test material.

Fitting my previous, longer text: Weblog Tools Collection suffers from Referer Spam DoS. Characters like that - that is, referrer spammers whose accesses run into the thousands - have not (yet?) shown up in my log analysis.

How to create CSS drop shadows. I could imagine using that for my photos. But it acts up in IE 5.5 for Mac OS X. Besides, drop shadows are only for wimps and softies anyway.

DGB Chief Accepts Restructuring of Welfare State

DGB chief accepts welfare state restructuring and in doing so makes unions obsolete. I had written a longer text here, but after my recent content deletion I somehow no longer have it available. If anyone still has it in their RSS reader, please let me know, otherwise it's just gone.

New Polaroid 600 SE

I got myself a new Polaroid 600 SE. My old one was pretty beat up — from a bargain bin, broken shutter release, dents, dings, etc. The new one is in pristine condition. And it even came with the 127 lens, which I had been missing so far. Nice optics, especially with a much more sensible minimum focus distance than the 150. And the results look great — I just love Polaroids. However, my scanner was pretty dusty after not being used for a long time, and of course the dust didn't just disappear after a few wipes. As a result, there were lots of dust streaks visible in the image. Well, Photoshop and the Polaroid Dust & Scratch Removal Plugin saved the image pretty well. But anyway, I don't usually scan my Polaroids — I make them for the photo album. Really old-fashioned, with cardboard pages, tissue overlays and such …

Don't be surprised about the content of my blog...

... there's just a rogue admin with a stupid script that messed everything up and destroyed all the content. Somehow everything is being reconstructed and repaired and ironed out and folded back together. Somehow. And afterwards I'm going to stand in the corner and flog myself ...

Update: now everything has been largely restored. What happened: I switched from Exhibit to my own plugin for images. And in doing so, I rewrote all posts with image entries via script. But in the generated UPDATE, I stupidly forgot the WHERE clause ...
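One way to make this kind of slip less costly is to check how many rows an UPDATE touched before committing it. The following is only a sketch under assumptions: it uses Python's sqlite3 module as a stand-in for the real database, and the helper name, table and column names are hypothetical.

```python
import sqlite3


def guarded_update(conn, sql, params=(), max_rows=1):
    """Run an UPDATE, but roll back if it touches more rows than expected."""
    cur = conn.execute(sql, params)
    if cur.rowcount > max_rows:
        # Too many rows hit, e.g. because the WHERE clause is missing.
        conn.rollback()
        raise RuntimeError(
            f"UPDATE touched {cur.rowcount} rows, expected at most {max_rows}"
        )
    conn.commit()
    return cur.rowcount


# Hypothetical usage:
#   guarded_update(conn, "UPDATE posts SET body = ? WHERE id = ?", (text, 42))
```

This relies on the update running inside a transaction that can still be rolled back, so it is no substitute for a current backup.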

Losses: my post about the DGB and the wording in today's posts. Everything else was reconstructed from a backup. And a few nerves. And I've written it behind my ears for the x-th time that I should make a current backup before tinkering in the future. Which of course won't do any good, because I can't read behind my ears without two mirrors ...

Update 2: and of course I was so great during the weblog reconstruction that I also overwrote the changed image posts, so now all posts in the picture blog are without photos. I can't believe it. It's either a full moon or something today ...

Which means I have to get creative again to pull the images back into the posts, because of course I deleted all the mapping tables, since I don't need them anymore. But I still have them all in the backup, so it won't be as bad as before.

Update 3: now everything should be largely back the way it was. And the last repair actually went without major catastrophes.

What to expect when updating MySQL 4.0 to 4.1. Okay, database version upgrades are never easy and can always cause problems.

WordPress Localization describes how to create your own translations for WordPress.

Answering machines accept collect calls. Quick check of my answering machine greeting...

Microsoft Interoperability

Ian Bicking describes what Microsoft Interoperability really means. A quote from a Microsoft support employee:

Microsoft isn't in the business of integrating with non-Microsoft software.

Schily's New Initiative for Refugee Camps in Africa

Schily's new initiative for refugee camps in Africa - I'll refrain from commenting on this, as most of my comments would probably lead to defamation suits.

Tough times for Kofi Annan - through smear campaigns and denials from conservative NGOs in the United States. But others are also throwing mud around industriously.

Finding Deep Links in Log Files

I asked Pepino about it recently, so I put my Deep Link Finder Script online. It's a simple Python script. Should run on Python 2.2 and up, possibly even Python 2.1 (but that hasn't been tested). The script is configured in the source code (I've added comments for it) and then simply called with multiple logfiles as parameters. It extracts from Apache Combined Logs which sites deep link to specified file types (configurable, some image types are set by default) and how often. It outputs an HTML fragment that you can add headers and footers to in order to put it online - for example, that's how my Zeitgeist page for deep links is created. The other pages have similarly structured scripts, except they collect search terms and general referrers instead.
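To give an idea of the core of such a script, here is a minimal sketch in current Python. It is not the script mentioned above: the regular expression, the list of image types and the OWN_HOSTS placeholder are assumptions for illustration, and it only counts referring hosts instead of writing an HTML fragment.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Apache Combined Log Format: host, ident, user, [time], "request",
# status, size, "referrer", "user agent".
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "[^"]*"'
)

IMAGE_TYPES = (".jpg", ".jpeg", ".png", ".gif")   # assumed defaults
OWN_HOSTS = {"example.org", "www.example.org"}    # placeholder for your own site


def count_deep_links(lines):
    """Count, per foreign referrer host, the requests for image files."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        path = match.group("path").split("?", 1)[0].lower()
        if not path.endswith(IMAGE_TYPES):
            continue
        host = urlsplit(match.group("referrer")).netloc.lower()
        if host and host not in OWN_HOSTS:
            counts[host] += 1
    return counts
```

Passing an open logfile to count_deep_links and calling most_common() on the result lists the heaviest deep linkers first.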

I take a look at the deep linker list now and then, and if someone shows up there who deep links quite a lot and isn't an aggregator or news service, they get shown a corresponding replacement image. But really only those sites. It bothers me too when my feed reader suggests I'm an image thief or traffic robber.

Make Me the Ackermann

Hotel Falckenstein: Make me the Ackermann - and with what? With reason! Making these top rip-off artists honorary professors of economics at Frankfurt's Goethe University on top of everything is simply outrageous.

Podcasts? No.

More on Drupal

What I also noticed while tinkering with Drupal: unlike WordPress, the database model is quite complex. WordPress is pretty simple - just a few tables with data in them, most of it quite straightforward. If you want to change something, you can always fiddle with it at the SQL level and write import scripts, repair scripts, etc. Everything is always calculated on the fly - counters, lists, etc.

Drupal, on the other hand, uses quite elaborate caching mechanisms in the database. Things from the filesystem are also cached. This means that with small scripts you have more work to do, because you have to account for much more - at least remove the cache so it gets reconstructed. Also, the data model is much more denormalized. That's certainly good from a design perspective - but for small scripts it's more cumbersome, since you have to handle more places.

This isn't a judgment, just an observation - both have advantages and disadvantages. The advantages of the Drupal approach seem to show up in performance, which seems to be somewhat better than WordPress, and not only because of the somewhat cleaner PHP structure (though I don't have hard numbers on that - first I need a workable import script for my posts so that both work on the same basis).

What I also noticed: PostgreSQL support in Drupal (yes, I finally got it running!) is definitely behind that of MySQL. In some cases there are error messages with PostgreSQL that don't occur with MySQL. For example, there were problems with password changes because a non-existent field was being accessed. Or in the overview of sources subscribed to in the newsreader there was a message because a non-aggregated field of a complex SELECT was not listed as a grouping field. Or on first access, where no value was specified for the uid field in the sessions table, even though it was declared as NOT NULL. PostgreSQL is definitely more picky than the card file. With PostgreSQL you'll definitely have to modify the PHP code. I'll see - once I'm done, maybe I'll prepare a patch that fixes these issues. So far they're just minor things, but they can certainly be a hurdle for non-programmers. Some of them are certainly based on the somewhat older versions from Debian Stable (for example, the PgSQL API in PHP is named quite differently in newer versions than in 4.1.2).

WordPress Files and Load Order

Wordpress file loading describes the order in which WordPress loads which PHP files to produce a blog page. Quite interesting if you're planning to change files - it provides an initial guide to where you might find things. However, what also stands out: given that WordPress actually produces quite lean pages, WordPress itself already has quite a bit of bloat.

Wordpress to Drupal Migration Script

Wordpress to Drupal Migration Script - but it currently seems to only be able to migrate from flat file to flat file - you may need to first set up a proper database as the target.

Update: well, the script really only transfers the posts. No post slugs (so no URL preservation), no categories, nothing. You might be able to use it if you only had a very simple WP blog, but otherwise it's pretty bare-bones. And with large blogs it crashes after a while with an error - the available memory is exhausted. This is because PHP uses the CGI settings when run from the command line - and the memory limit is restricted there too. On top of that, it has no duplicate detection, so it cheerfully imports everything again on the second run.

Sounds like I'd probably need to write my own solution if I wanted to try this seriously.

wp-style-switcher is a simple CSS switcher for WordPress that works without JavaScript or similar.

Employers want new study financing

Employers want new study financing - actually, I generally assume that those who want to change something either directly manage that something, or at least are prepared to finance it soon. In other words: if employers want to change study financing, they should first pay for study financing, child allowance, etc. themselves. Because as it stands, this is just hollow rhetoric - employers have nothing to do with the things mentioned and are just shouting populist demands into the room again.

It's really fascinating how they want to regulate payments they themselves don't make and then want to use a credit institution they don't finance. If Hundt had said that employers want to establish and fund a social fund to support study financing, then that would have been an innovative proposal. As it is, it's just blah-blah.

Our employer associations really have gone to the dogs.

Buffer Overflow in numerous Symantec products - Ouch!

China: Executions for social peace? - will certainly not greatly dampen our industrial chancellor's enthusiasm for the large Chinese market ...

Google Search: gemölter

I just wanted to point out that I'm ranked first on Google.de when searching for gemölter

HP chief unexpectedly resigns

A mini Loch Ness Monster washed up in Parton. And now they're puzzling over what kind of creature it actually is.

liquid design on em/ex basis

From the CSS Zen Garden: a liquid design that is based on em and ex units and therefore grows and shrinks in layout along with a changed font size. That might be a usable basis for my blog, because it's precisely the fact that a fixed design doesn't respond particularly well to font size changes that bothers me about Kubrick.

Now I just need to figure out how to implement it properly. Above all, I'll probably have to incorporate the header graphic quite differently — images just don't scale sensibly with this approach. Let's see if I feel like tinkering with it at some point.

Who is to blame for the brown man?

Who is Responsible for the Brown Man?

If not now, when does the Union want to win back these little sheep that have strayed beyond the right edge of reason and humanity into Nazi filth? Put more objectively: what makes the bourgeois opposition currently so repellent to those disappointed by the government that they would rather follow runaway criminals and those stuck in the past? That is the great political question of our time, far more important than the question of whether parties like the NPD or DVU should be banned or not.

Chief Economist Walter Reads Germans the Riot Act

Chief economist lectures Germans - oh yes, when the henchmen of the money bags complain it's analysis and supposedly constructive criticism. What comes out in the end - which is what matters, as we've all known since Kohl - is just garbage. But what else would you expect from the chief economist of one of the biggest rip-off companies (let's remember: they just planned to lay off a few thousand employees despite record profits - which will surely be great for the economy) anyway.

The poor film industry and the triviality threshold

Film industry mobilizes against "piracy clause" - when I look at all this whining, I lose the desire to watch films anyway. I hardly go to the cinema anymore since there are only megaplexes left, where you feel about as comfortable as in a train station hall. And DVDs - sorry, but what am I supposed to do with films that torture me in my home cinema with 15 minutes of advertising for other film garbage I don't want - and if I wanted it, I would have gotten it long ago anyway.

Instead of actually thinking about how to respond sensibly to modern technology, the film industry prefers to think about how to further cement an outdated business model. And it cries out for help from the state. What a load of nonsense.

Don't reset existing password on request, prevent DoS password reset abuse | drupal.org

Don't reset existing password on request, prevent DoS password reset abuse - well, I noticed exactly this problem too and couldn't believe that someone actually built something like that into a CMS. In Drupal, you can change the password for a user - any user at all. The new password is then sent to that user by email. So you can't gain illegal access through this, unless you can intercept the user's emails (which shouldn't normally be the case). But you can lock out an admin: simply set up a job that resets the admin's password every minute. And then use this forced absence of the admin to completely spam the Drupal site, for example.

That's really an embarrassing oversight. Unfortunately, it's one that gets made far too often. So if you operate Drupal, the patch is recommended (be careful, the author submitted two patches, the first one was still buggy). It installed without any problems and at least fixes the admin lockout. Of course, you still get annoying emails in the process.

Firefox - IDN - 0 Info - 0 Transparency

Kai is ranting about Firefox - IDN - 0 Info - 0 Transparency - and he's right with his rant. You're used to this security secrecy from commercial providers, but with open-source projects it annoys me every single time as well. When will people finally understand that only early disclosure gives users a chance to protect themselves? Keeping bugs secret is based on the absurd assumption that you're the first to notice this bug. Which is simply silly: a blackhat who notices this bug will certainly not broadcast it but instead exploit this bug for as long as possible. And so only those benefit from keeping it secret for too long - the ones we shouldn't be helping anyway.

User security needs to be the focus of security considerations - and specifically the informed user who is capable of turning information into meaningful action. The uninformed user doesn't care anyway, they click on everything. But a sysadmin who knows about a problem can at least contribute through educating their own users so that they maybe act more cautiously for a certain period of time. An uninformed sysadmin doesn't even have a trace of a chance to do that.

Gene plant research in Münster

The university has built a greenhouse for research on genetically modified plants. I can't really say I'm particularly thrilled about it being in the neighborhood. Not necessarily because of the greenhouse itself — but where there's a greenhouse, eventually someone wants to conduct field trials.

confused face

Yep, Drupal is going to drive me crazy

Clearly. I don't know what it has against me, but it hates me today. Really massively.

I simply copied the kubrick-theme under my own name so I could customize it without changing the Kubrick-theme itself. Funnily enough, it's now not using the phptemplate-engine anymore. Or more precisely: the entry in the system table (type='theme' and then for the page.tpl.php) doesn't point to phptemplate.engine, but to phptemplate - the .engine is missing. When I add it via update, it works exactly once. After that, this entry in the system table gets overwritten and .engine is gone and the template is broken. Of course, Kubrick doesn't do that. And of course, you can't find any information anywhere about where the heck the theme says which template engine should be used - and how this entry in the system table is created. No, simply grepping for phptemplate.engine doesn't help.

Ok, now it's clear to me that the engine creates the entries - at least after I took a closer look at the engine source. It searches for page.tpl.php and when it finds it, it connects it with the phptemplate.engine. But why would the engine enter its own name incorrectly? Especially since it does it correctly with Kubrick. I just unpacked that into the themes directory.

Alright, so let's keep investigating. A grep -r for INSERT in combination with system then finds the function system_obtain_theme_info in the system.module, where these statements are written. But how and where exactly something is done with it there - sorry, but you can't figure that out without longer study. Somehow the description attribute gets filled with a value that ends with .engine for the Kubrick-theme and doesn't for all others. Kubrick references the theme engine exactly and correctly, but an arbitrarily named copy of Kubrick with identical content references a theme engine without .engine in the name and doesn't work. Great. But renaming Kubrick works. Huh?

Ok, next approach. Rename my template to something else and rename Kubrick to my actually desired name. Complete confusion: my template doesn't work, but the now-kubrick-named one that didn't work before doesn't work either. Uh... So I renamed the Kubrick to something else. And tried my temporarily stored one. That works now. Under a name that isn't Kubrick. Huh? Shell game? Should I just rename the themes around until I eventually have a working one under the name I want and then call it done?

So I tried to resolve the shell game. Computers are deterministic machines after all, that should be possible. Ok, both templates (original Kubrick and my Hugo) renamed. To aa and bb. And which one works? The one called bb. Did the whole thing again, just this time swapped the roles. aa becomes bbb and bb becomes aaa. Which one works now? The one called "bbb". When two phptemplate.engine-based themes are installed in the system, only the last one found in the system at the time themes are being searched works. The others break.

So now I first have to figure out what's wrong with the old themes and why they can't be made to work. First approach: make a database dump and grep to see where all my friends show up. While doing that, I found out what's up with the mysterious phptemplate without .engine: the corresponding entries contain a chr(0) instead of the period. An ASCII NUL. MySQL stores it, but PHP cuts the string off there on access. And all the old templates have these broken entries. Also, the engine keeps track of which themes it has already seen in the phptemplate extra_templates entry in the variable table.
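To make the chr(0) effect concrete, here is a small sketch in Python (not PHP, and not the Drupal code). The stored value is made up for illustration. It only shows why a reader that stops at the first NUL byte sees "phptemplate" although the full value is still in the database.

```python
# Hypothetical value modelled on the broken entries described above:
# the period before "engine" has been replaced by a NUL byte (chr(0)).
stored = b"phptemplate\x00engine"


def as_c_string(raw: bytes) -> bytes:
    """Return what a reader that stops at the first NUL byte would see."""
    return raw.split(b"\x00", 1)[0]


def nul_safe_repr(raw: bytes) -> str:
    """Make NUL bytes visible, e.g. when looking through a dump."""
    return raw.decode("ascii").replace("\x00", "\\0")


print(len(stored))            # the full length, NUL byte included
print(as_c_string(stored))    # everything after the NUL is cut off
print(nul_safe_repr(stored))  # the NUL shown as a visible \0
```

This is why a plain grep for phptemplate.engine finds nothing: the byte between the two words is no longer a period.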

Another clean room test: throw out the entries in the system table with type='theme' and description like 'themes/engine/phptemplate%'. Then it knows nothing more about the themes and their names. Then install only my desired template and activate it. And behold, it works right away. Then I unpacked Kubrick. And it works. But after that, my own theme doesn't work anymore. As expected - Kubrick comes after hugo alphabetically. Delete Kubrick again and my own theme works again - after the appropriate refresh.
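The cleanup step can be sketched like this, using Python's sqlite3 in place of MySQL. The three-column table is a simplified, hypothetical stand-in for Drupal's system table, and the rows are invented. Only the WHERE condition is taken from the description above.

```python
import sqlite3

# In-memory database with a simplified, hypothetical "system" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE system (filename TEXT, type TEXT, description TEXT)")

# Invented sample rows: two phptemplate-based themes and one module.
rows = [
    ("themes/hugo/page.tpl.php", "theme",
     "themes/engine/phptemplate/phptemplate.engine"),
    ("themes/kubrick/page.tpl.php", "theme",
     "themes/engine/phptemplate/phptemplate.engine"),
    ("modules/locale.module", "module", ""),
]
conn.executemany("INSERT INTO system VALUES (?, ?, ?)", rows)

# The clean room step: forget every theme that references the engine.
cursor = conn.execute(
    "DELETE FROM system "
    "WHERE type = 'theme' AND description LIKE 'themes/engine/phptemplate%'"
)
deleted = cursor.rowcount
remaining = conn.execute("SELECT filename FROM system").fetchall()

print(deleted)
print(remaining)
```

The prefix match also catches the broken entries, since the NUL byte only appears after the matched prefix.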

So investigate where the heck this is happening and why. It only happens with the phptemplate.engine themes. The xtemplate.engine themes work without problems. Although it turns out they only work despite the bug - it affects them too. In system_theme_data in system.module (how I figured that out I'll spare the readers - it was just successive inserting of echo statements to see when and where things break), the description element of the file entries gets destroyed in the last step, the call to system_obtain_theme_info. And that's exactly what gets saved in the system table to reference the theme engine. Only the last theme of an engine keeps the correct entry, all others are broken.

Hmm. The basename call on line 336 is the only suspect - basically all it does is deliver the name of the theme engine without the .engine suffix. But it shouldn't change the actual field, so I hadn't paid much attention to it before - the PHP manual says nothing about this function having side effects or changing the original string. But when I comment out that line, my theme works and Kubrick too - simultaneously.

So I wrote a small test script that just makes a basename call. Ugh. Yes, that's it - basename changes the original string and puts a chr(0) in place of the period. And behold, there's a bug report from 2002 about it - yes, I'm running an old PHP 4.1.2, because that's what Debian Stable ships. The bug report has a workable solution for my problem - just put the variable in "" and let string interpolation create a fresh string, so basename no longer gets its hands on the original. And behold, problem solved. And a note to self: in 4.1.2, basename breaks the source variable.
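The mechanism can be simulated in a few lines of Python. This is a sketch of the described behaviour, not PHP's actual implementation. The function buggy_basename and the full path are made up for illustration. Passing a copy of the buffer plays the role of the "" interpolation workaround.

```python
def buggy_basename(buf: bytearray, suffix: bytes) -> bytes:
    """Simulated bug: strip the suffix by writing a NUL into the caller's buffer."""
    start = buf.rfind(b"/") + 1          # first byte after the last slash
    cut = len(buf) - len(suffix)         # position where the suffix begins
    if buf.endswith(suffix) and cut > start:
        buf[cut] = 0                     # in-place write: the "." becomes chr(0)
        return bytes(buf[start:cut])
    return bytes(buf[start:])


# Hypothetical path; only the prefix is taken from the text above.
original = b"themes/engine/phptemplate/phptemplate.engine"

# Direct call: the caller's own buffer gets damaged.
path = bytearray(original)
name = buggy_basename(path, b".engine")

# Workaround: hand the function a fresh copy instead of the original.
safe_path = bytearray(original)
safe_name = buggy_basename(bytearray(safe_path), b".engine")

print(name, bytes(path))
print(safe_name, bytes(safe_path))
```

In both calls the return value is the same. The only difference is whether the original variable survives.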

And a programmer spends debugging time on crap like this (I mean the bug, not Drupal)

I could have learned a decent job. Whisky barrel keeper at Jack Daniels, for example...

Some projects want to drive me insane

or at least that's what one could think. Today's program: Drupal 4.5.2. Nice package; I especially like it because there's now also Kubrick as a theme for Drupal and because it's quite powerful while still being reasonably manageable. But every time I deal with it again after a longer break, I fall into the same pitfalls.

For example, enabling translations. It's great that translations exist. But when there's not even the slightest hint on the website about what you need to do, you end up feeling pretty stupid. Ok, yes, you just have to activate the locale.module. But where on earth is that documented? In the umpteenth level of the administration menu.

Equally annoying: a database connection for PostgreSQL is included. Unfortunately, it's only usable from PHP 4.3 onwards - older versions aren't supported, even though Drupal itself runs from 4.1. After I had edited everything to use the old function names, it still didn't work: apparently a default value was missing for the uid column in the sessions table. After I set that, PHP hung when accessing the site. Ok, fine, use MySQL instead (but I don't like MySQL...).

Alright, now I'm in, and I also have Kubrick as the layout and German translations. Ok, part of the system is in German - but there are tons of missing strings. So I know what I'll be doing again soon. Great. Just as great as the default value for the file directory, which is simply "files". That doesn't work if you want to allow users to upload images, because then "files" and "pictures" get concatenated without a /. And no, the / can't go before "pictures", it has to go after "files". And that with Kubrick the menu in the right column obviously has to be selected as "left" when activating blocks - I probably don't need to mention that separately.

And the fact that the manual is anything but up to date - sorry, but that's just ridiculous. In places it still talks about directory structures that don't even exist anymore. No, the settings aren't in sites/default/settings.php - they're in includes/conf.php.
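The "files" plus "pictures" problem is plain string concatenation without a separator. Here is a short Python sketch of the difference. The two helper functions are made up for illustration and have nothing to do with Drupal's own code.

```python
import posixpath


def naive_path(base: str, sub: str) -> str:
    """Plain concatenation: correct only if base already ends with a slash."""
    return base + sub


def joined_path(base: str, sub: str) -> str:
    """Let the library insert the separator when it is missing."""
    return posixpath.join(base, sub)


print(naive_path("files", "pictures"))    # the two names run together
print(naive_path("files/", "pictures"))   # works thanks to the trailing slash
print(joined_path("files", "pictures"))   # separator added automatically
print(joined_path("files/", "pictures"))  # no doubled slash either
```

With code that concatenates naively, the only fix available from the outside is the one described above: put the / at the end of the configured directory.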

Ugh. This is such a nice project. And the whole system is really powerful and stable. But the documentation is really a joke. Sometimes I get the feeling that people aren't documenting Drupal at all, but something else entirely.

Still, it's nice, so I won't complain too loudly. Others don't really do much better either. Still - it could be so nice if the reference to the online manual would actually help instead of confuse...

EU Council ignores Parliament's demand - well, that was almost to be expected. Why bother with democracy, it only slows things down anyway ...

Spreeblick: Sweety Records

Spreeblick explains the music industry to us: Sweety Records

Devil's grin

In "On the GPL" Isotopp writes about the GPL and what is actually in it and how one can understand it. A pretty good explanation, I think. Should be recommended reading for anyone who believes the nonsense that Microsoft, SCO and some others spread about the GPL.

Hartz IV Urban Legend

Urban Legends Reference Pages: Media Matters (Hot Jobs) describes how two things got mixed together: a hypothetical consideration from the TAZ, and a report about a brothel operator who cannot find prostitutes for his brothel (and is not allowed to search for them through the employment office, because it refuses to advertise such jobs). In the English-language press this then turns into factual reporting, which claims that women who refuse jobs as prostitutes would lose their unemployment benefits.

So much for professional journalism

Devil's grin

Although I would certainly hope that our supposedly great legislators don't subsequently turn this newspaper hoax into truth...

Microsoft receives patent on coordinates in URLs - what utter nonsense. Yet more proof that patents on algorithms are simply rubbish and at best serve to squeeze money out of people, but certainly don't drive innovation the way their defenders like to claim over and over again.

New phishing attack possible in many web browsers

Read on Golem: New Phishing Attack Possible in Many Web Browsers. Great. Once again, a sloppily implemented solution and a sloppy standard. The whole umlaut domain stuff is nonsense anyway, and you have to wonder why it was implemented in the first place - the mere fact that this garbage only works for websites, and that IDNs can't be meaningfully used for anything else, should have made anyone realize what a ridiculous idea it is. And now it's also a phishing hole.
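Why lookalike characters in IDNs are a problem can be shown with Python's built-in idna codec. This is a sketch with a made-up example domain. It is not necessarily the one used in the attack, and it says nothing about how any particular browser handles the name.

```python
import unicodedata

# Two domains that can look alike on screen. The second one is a made-up
# example that uses U+0430 (Cyrillic) in place of the first Latin "a".
latin = "paypal.com"
spoof = "p\u0430ypal.com"


def to_ascii(domain: str) -> str:
    """Convert a domain to the ASCII form that is actually looked up in DNS."""
    return domain.encode("idna").decode("ascii")


def non_ascii_chars(domain: str) -> list:
    """List every non-ASCII character in the domain with its Unicode name."""
    return [(ch, unicodedata.name(ch)) for ch in domain if ord(ch) > 127]


print(latin == spoof)          # visually similar, but different strings
print(to_ascii(latin))         # plain ASCII stays as it is
print(to_ascii(spoof))         # the lookalike becomes an xn-- name
print(non_ascii_chars(spoof))  # shows which character is the impostor
```

A browser that displays the Unicode form rather than the xn-- form gives the user no chance to notice the difference.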