Artikel - 12.1.2005 - 20.1.2005

heise online - EU Council to make another attempt at software patents

heise online - EU Council to make another attempt at software patents and continues to trample on the opinion of the population and parliaments. And our government in Berlin sits on its fat ass, greased by industry, and does nothing. Never mind that such nonsense will cause problems for mid-sized software companies, never mind that it only benefits the big software giants, never mind that it's just brown-nosing America. Nobody really cares about the issue, after all it's just a bunch of software nerds making a fuss, and who cares about them anyway.

And eventually even the dumbest minister will realize that software patents don't create jobs.

angry face

Despicable Idea of the Day

The Bangkok Hilton is installing cameras on death row to publicly broadcast the lives of prisoners waiting for execution - it doesn't help that they don't plan to show the execution itself. That convicted criminals also have human rights is unfortunately ignored again and again. And in doing so, the state ultimately puts itself on the same level as the criminals. It is therefore not surprising that similar practices in the USA - at least in part - are already commonplace. That capital punishment itself is one of the most inhumane ideas a society can have (unlike all other common punishments, it is not reversible or at least compensable in case of error) is beyond question anyway. I cannot regard states with capital punishment (and thus ultimately a legal system based on ideas of revenge rather than protection of society) as particularly civilized...

SCO vs. Linux: SCO Gets More Material

The seemingly strange decision by the judge in the SCO vs. IBM case is — as usual — explained by Groklaw. The judge's role is not to clarify who is right — that's a different judge's job. Her job is only to ensure that all parties put all relevant material on the table. So it's solely about the investigation documents. Still, this is of course the annoying delay tactic by SCO at work. But it's not the big interim victory for SCO as one might possibly see it.

Strange Stance from Planetopia

At Spreeblick, the Planetopia journalist requested removal of the recording of his questions - what fascinates me about it: he had no qualms about broadcasting his distorted conclusions to an audience of millions, but he objects to the publication of the questions he asked. Can't he take his own medicine?

Insurance companies want access to genetic test results

Insurers Want Access to Gene Test Results. It was predictable that something like this would come. After all, it's the best way for these rip-off companies to wriggle out of the few remaining situations where they might actually have to pay. And that's exactly what insurance is about: selling people something they aren't actually willing to provide in an emergency. It's easy, too - if necessary, politics forces citizens to buy it.

But there are no risks whatsoever in genetic engineering and building gene sample collections, and we're all just way too paranoid not to believe these liars and fraudsters. Yeah. Right. And pigs can fly.

Post 4000

Wow. This here is - in the current database - the 4000th article.

astonished face

Got New Spam Tactic Figured

Asymptomatic » Got New Spam Tactic Figured reports on a new tactic used by blog spammers. Relatively harmless comments appear on blogs that don't contain a single link. When spammers find these comments again via Google, they know they can likely post further comments there—bypassing the filters that automatically approve comments from visitors who have previously had a comment approved under their email address. So it could be that after a "Hey, I think your site is great" comment, a flood of blog spam suddenly appears...

RSS 1.1: RDF Site Summary (DRAFT)

RSS 1.1: RDF Site Summary (DRAFT) - how sensible. Someone designed an update to RSS 1.0 (yes, that unloved RDF-based format that hardly anyone really knows). Because there just aren't enough feed formats yet, so obviously one more had to be added.

Along those lines, I also stumbled upon the HTML Syndication Format - another thing like that, where someone thought it would be a good idea to deliver the feed as specially tagged XHTML source.

And unlike RSS 3.0 (which is based on YAML instead of XML), the other two seem to be more or less serious (with the HTML Syndication Format I'm still hoping it was just an elaborate April Fools' joke ...)

Schwarzenegger rejects clemency petition

The Terminator does his film characters proud - and sends another person to their death.

Subtraction: New, Improved Original Flavor!

Khoi Vinh presents his new design. It's cool how he creates - at least for me - quite an interesting impression through omission. I really like it.

I wish I could come up with my own design and didn't always have to steal my designs from somewhere ... (one of the real advantages of WordPress - the Kubrick design may be quite widespread by now, but at least it can be adapted even by design dummies like me without everything falling apart)

WordPress NoFollow Plugin

The WordPress NoFollow Plugin adds rel="nofollow" to links in comments so they no longer count toward Google rankings. Personally, I find it a shame that links in comments are then generally not followed, removing a useful opportunity for smaller blogs to promote themselves through active discussion on other blogs. Okay, in the end it's not that bad, but somehow a small piece of the "one link washes the other" mentality of blogs is lost... A small snag is that the author has linked the plugin directly, and unfortunately his server executes the PHP instead of serving it. At the moment you can't download it - you only get an empty HTML page.
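
The effect of such a plugin is easy to sketch. Here is a hypothetical Python version of the idea - the real plugin is PHP inside WordPress, and this regex approach is my own simplification that doesn't handle every HTML corner case:

```python
"""Hypothetical sketch of what a nofollow comment filter does: tag every
anchor in a comment's HTML with rel="nofollow" so search engines ignore
the link for ranking purposes."""
import re

def add_nofollow(html):
    def patch(match):
        tag = match.group(0)
        if "rel=" in tag:
            return tag  # leave tags that already carry a rel attribute
        return tag[:-1] + ' rel="nofollow">'
    # rewrite every opening <a ...> tag in the comment
    return re.sub(r"<a\b[^>]*>", patch, html)

print(add_nofollow('<a href="http://example.com/">hi</a>'))
# <a href="http://example.com/" rel="nofollow">hi</a>
```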

Zope Hosting and Performance - English Version

Somebody asked for an English translation of my article on Zope hosting and performance. Here it is - okay, it's not so much a direct translation as a rewrite of the story in English. Enjoy.

Recently the Schockwellenreiter had problems with his blog server. He is using Zope with Plone and CoreBlog. Since I have been doing professional Zope hosting for some years now, running systems in the 2000-3000 hits per minute range, I thought I'd put together some of the things I have learned (sometimes the hard way) about Zope and performance.

  • The most important step I would take: slim down your application. Throw out everything in the Zope database that doesn't need to stay there. If it doesn't need content management, store it in folders served by Apache. Use mod_rewrite to seamlessly integrate it into your site so that people on the outside won't notice a difference. This works best for layout images, stylesheets etc. - Apache is much faster at delivering those.
  • Use Zope caching whenever possible. The main thing you need to check: whether you have enough RAM. Zope will grow when caching is used (especially with the RAMCacheManager). The automatic cleanup won't rescue you - Zope will still grow. Set up process monitoring that automatically kills and restarts Zope processes that grow above an upper bound, to prevent paging due to excessive memory consumption. This is a good idea even if you don't use caching at all.
  • There are two notable cache managers: one uses RAM and the other an HTTP accelerator. The RAMCacheManager caches the results of objects in memory and so can be used for small objects that take a lot of time or resources to construct. The HTTPCacheManager is for use with an HTTP accelerator - most people will use Squid, but an appropriately configured Apache works, too. The cache manager provides the right Expires and Cache-Control headers so that most traffic can be delivered out of the HTTP accelerator instead of Zope.
  • Large Zope objects kill Zope's performance. When caching is used, they destroy cache efficiency by polluting the cache with large blobs that aren't often required, and they drag down Zope itself as well. The reason is that Zope output is constructed in memory, and constructing large objects in memory takes a lot of resources due to the security and architectural layers in Zope. Better to create them with cronjobs or other means outside the Zope server and deliver them directly with Apache - Apache is much faster. A typical case is users creating PDF documents inside Zope instead of generating them outside. Bad idea.
  • Use ZEO. ZEO rocks. Really. In essence it's just the ZODB with a small communication layer on top. This layer is used by Zope instances instead of accessing the ZODB directly. That way you can run several process groups on your machine, all connecting to the same database. This helps with the above-mentioned process restarting: when one is down, the others do the work. Use mod_backhand in Apache to distribute the load between the process groups, or use other load-balancing tools. ZEO makes regular database packs easier, too: they run on the ZEO server and not in the Zope instances - the instances barely notice a running pack.
  • If you have an SMP machine, use it. Or buy one. Really - it helps. You need to run ZEO and multiple Zope instances, though - otherwise the global interpreter lock of Python will hit you over the head and Zope will use just one of the processors. That's one reason why you want multiple process groups in the first place - distributing load on the machine itself and making use of multiple processors.
  • You can gain performance by reducing the architectural layers your code goes through. Python scripts are faster than DTML. Zope products are faster than Python scripts. Remove complex code from scripts and move it into products or other outside places. This requires rewriting application code, so it isn't always an option - but if you do it, it will pay off.
  • Don't let your ZODB file grow too large. The ZODB only appends on write access - so the file grows. It grows quite large if you don't pack regularly. If you don't pack and you have multi-GB ZODB files, don't complain about slow server starts ...
  • If you have complex code in your Zope application, it might be worthwhile to move it to an outside server and trigger execution from Zope via some RPC mechanism. I use my TooFPy for stuff like this - just pull out the code, build a tool and hook it into the Zope application via XMLRPC. Yes, XMLRPC can be quite fast - for example, pyXMLRPC is a C-based implementation that is very fast. Moving code outside Zope helps because that code can't block one of the statically allocated listener threads while it calculates. Just upping the number of listener threads doesn't pay off as you might expect: due to the global interpreter lock, still only one thread runs at a time, and if your code uses C extensions, it might even block all other threads while running.
  • If you use PostgreSQL, use PsycoPG as the database driver. PsycoPG uses session pooling and is very fast when your system gets lots of hits. Other drivers often block Zope due to limitations like one query at a time and other such nonsense. Many admins had to learn the hard way that 16 listener threads aren't really 16 available slots once SQL drivers come into play ...
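
The process-monitoring point above can be sketched roughly like this. The 500 MB limit, the /proc-based measurement (Linux only) and the restart command are my own assumptions for illustration, not standard Zope tooling:

```python
# Sketch of a memory watchdog: kill and restart a Zope instance once
# its resident memory passes an upper bound, before paging sets in.
import os
import signal
import subprocess

MEMORY_LIMIT_KB = 500 * 1024  # tune to what your machine can spare

def rss_kb(pid):
    """Resident set size of a process in KB, read from /proc (Linux)."""
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def over_limit(rss, limit=MEMORY_LIMIT_KB):
    return rss > limit

def check_and_restart(pid):
    """Meant to run from cron every few minutes."""
    if over_limit(rss_kb(pid)):
        os.kill(pid, signal.SIGTERM)
        subprocess.call(["/etc/init.d/zope", "restart"])  # assumed path
```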

There are more ways to improve performance, but the above are doable with relatively little work and mostly depend on whether you have enough memory and maybe an SMP machine. Memory is important - the more the better. If you can put more memory into your machine, do so. There is no such thing as too much memory (as long as your OS supports the amount, of course).

What to do if even the tips above don't work? Yes, I have been in that situation. If you get there, there is only one - rather brutish - solution: active caching. By that I mean pulling pages from the Zope server with cronjobs or other means, storing them in Apache folders, and using mod_rewrite to deliver only static content to users. mod_rewrite is your friend. In essence you take the pages that are currently killing you and make them pseudo-static - they are only updated once in a while, but the hits won't reach Zope at all.
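
A minimal sketch of such an active-caching cronjob - the page list, Zope port and Apache docroot here are invented for illustration:

```python
"""Pull a fixed list of expensive pages out of Zope and drop them into
a directory that Apache serves, with mod_rewrite mapping the public
URLs onto these files."""
import os
import urllib.request

PAGES = ["/index.html", "/news/archive.html"]  # pages made pseudo-static
ZOPE = "http://localhost:8080"                 # Zope instance (assumed)
DOCROOT = "/var/www/static-cache"              # Apache-served directory

def target_path(docroot, page):
    """Map a public URL path onto the static file Apache will serve."""
    return os.path.join(docroot, page.lstrip("/"))

def refresh(pages=PAGES, zope=ZOPE, docroot=DOCROOT):
    for page in pages:
        data = urllib.request.urlopen(zope + page).read()
        path = target_path(docroot, page)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
```

Run it from cron every few minutes; the hits for those pages then never reach Zope.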

Another step, of course, is more hardware. If you use ZEO it's no problem to put a farm of Zope servers in front of your ZEO machine (we currently have 5 dual-processor machines running the Zope instances, two rather big, fat, ugly servers in the background for the databases, and a frontend of two Apache servers that look almost like dwarves compared to the backend stuff).

Zope is fantastic software - don't get me wrong. I like it. Especially the fact that it is an integrated development environment for web applications and content management is very nice. And the easy integration of external data sources is nice, too. But Zope is a resource hog - that's beyond discussion. There's no such thing as a free lunch.

DNA Analysis: Bavaria Launches Federal Council Initiative

Owl Content

DNA Analyses: Bavaria launches federal council initiative - who else if not Bavaria? Current incidents are seen as a welcome opportunity to quickly push through some changes. Never mind that these changes enable far more than fingerprints allow - and that there are many more possibilities for abuse (e.g. genetic analysis for assessing suitability).

The ruling of the Federal Constitutional Court that explicitly restricted DNA analyses to particularly serious crimes is being ignored as well. Who cares about the Federal Constitutional Court when populism works so well for stirring up sentiment...

Grafedia

Grafedia is something like hyperlinks for the physical world. You use a word as an email address under a domain, and in response you receive a file. The concept is simple - it becomes interesting through the use of mobile phones to send the mail and get the result back on your phone. And through the fact that you can put the words out there on walls and similar places. Somehow crazy, somehow beautiful. (Found at Spreeblick)

Organizer Overload

Spent 5 minutes pondering which device had just sounded an alarm and what on earth it was supposed to remind me of. After checking several devices and programs, I realized it was an alarm that had actually been deleted long ago on my (not synced, therefore outdated) PDA. Insufficient alarm signal recognition capability due to alarm tone diversity overload...

QuickSilver: Act Without Doing

Brian Mastbrook describes very nicely how Quicksilver combines the best of keyboard-driven interfaces and graphical interfaces. Unfortunately, QuickSilver only runs on 10.3 and later, which is why I'm still stuck with LaunchPad - which, however, in the latest versions (aside from the really extremely slow startup) can keep up quite well.

In general, I find this slowly developing idea of combining graphical and keyboard-driven interfaces very pleasant. Graphical interfaces are good at presenting complex structures (a directory structure becomes clear to me graphically faster than from the shell), but they are often quite cumbersome to use. Tools like QuickSilver and LaunchPad help tremendously. Apple's Universal Key Access would probably help me too - if I had 10.3...

Save Think Secret's Nicholas Ciarelli Petition

Save Think Secret's Nicholas Ciarelli Petition is worth considering signing if you're an Apple user. In any case, this lawsuit by Apple is neither positive nor sensible - after all, the Apple world also lives partly on its rumors. Found at Spreeblick.

The Temboz RSS aggregator

The Temboz RSS aggregator is a very nicely made aggregator in Python. It uses the Ultraliberal Feedparser for parsing and can import OPML. I find the interface nicely designed and the administration quite straightforward. And it has some nice features like the two-column layout and the fairly simple integrated filtering capability as well as quite useful feed list sorting options. I'm playing around with it a bit right now - even if that will probably reduce my motivation to write my own aggregator.

Kill Schnappi

Tötet Schnappi

devilish grin

Working with Automator

Working with Automator describes how the new automation tool in Mac OS X 10.4 works. Makes you curious ... (Found at Schockwellenreiter)

Zope Hosting and Performance

The Schockwellenreiter is having problems with his Zope server. Since I've been doing professional Zope hosting in my company for several years now and run quite a few large portals (between 2000 and 3000 hits per minute are not uncommon - though distributed across many systems), here are some tips from me on scaling Zope.

  • The most important step I would recommend to everyone is to streamline. Remove from Zope everything that doesn't need to be there - what can be created statically, what rarely changes, where no content management is needed: get rid of it. Put it in regular Apache directories. Use Apache's mod_rewrite to ensure the old URLs still work, but are served from Apache. This especially applies to all those little nuisances like layout graphics - they don't need to come from Zope, they're better served from Apache.
  • Use Zope caching whenever possible. Whenever possible means: enough memory on the server so that even memory-hungry processes have some breathing room. Generally, Zope's built-in caching causes processes to get fatter and fatter - the cleanup in its own cache is quite useless. So implement process monitoring that shoots down and restarts a Zope process when it uses too much memory. Yes, that really is sensible and necessary.
  • There are two good caching options in Zope: the RAMCacheManager and the HTTPCacheManager. The former stores results of Zope objects in main memory and can therefore cache individual page components - put the complex stuff in there. The second works together with Squid. Put a Squid in front of your Zope as an HTTP accelerator and configure the HTTPCacheManager accordingly, so that Zope generates the appropriate Expires headers. Then a large part of your traffic will be handled by Squid. It's faster than your Zope. Alternatively, you can configure an Apache as an HTTP accelerator with a local cache - ideal for those who can't or don't want to install Squid but do have options for further Apache configuration.
  • Large Zope objects (and I mean really large in terms of KB) kill Zope. With caching they destroy your best cache strategy, and Zope itself becomes incredibly slow when objects get too large. The reason lies in Zope's architecture: all objects are first laboriously pieced together through multiple layers by various software layers. In memory - and therefore take up corresponding space in memory. Get rid of complex objects with huge KB numbers. Make them smaller. Create them statically via cron job. Serve them from Apache - there's nothing dumber than storing all your large PDFs in Zope in the ZODB, or even generating them dynamically there.
  • Install ZEO. That thing rocks. Basically it's just the ZODB with a primitive server protocol. What's important: your Zope can be split into multiple process groups. You want this when you're using process monitoring to kill a rogue Zope process, but want the portal to appear as undamaged as possible from the outside - in that case just add mod_backhand to Apache, or another balancing technique between Apache and Zope. Additionally, ZEO also makes packing the ZODB (which should run daily) easier, since the pack runs in the background on the ZEO and the Zope servers themselves aren't greatly affected.
  • If you have it, use an SMP server. Or buy one. Really - it brings a lot. The prerequisite is the aforementioned technique with multiple process groups - Python has a global interpreter lock, which means that even on a multiprocessor machine, never more than one Python thread runs at a time. Therefore you want multiple process groups.
  • Performance is also gained by cutting out layers. Unfortunately this can often only be done with software changes, so it's more interesting for those who build things themselves. Move complex code out of Python scripts and DTML documents and into Zope Products. Zope Products run natively in the Python interpreter without restrictions. Zope Python scripts and DTML documents, on the other hand, are dragged through many layers that ensure you respect Zope's access rights, don't do anything bad, and are generally well-behaved. And they slow you down. Products are worthwhile - but they cost work and, unlike the other tips, aren't always feasible.
  • Additionally, it has proven useful not to put too much data in the ZODB, especially nothing that expands it - the ZODB only gets bigger, it only gets smaller when packing. After some time you easily have a ZODB in the GB range and shouldn't be surprised by slow server starts...
  • If more complex processes occur in the system, it can make sense to outsource them completely. I always use TooFPy for that. Simply convert all the more complex stuff into a tool and stick it in there - the code runs at full speed. Then simply access the tool server from Zope with a SOAP client or XMLRPC client and execute the functions there. Yes, the multiple XML conversion is actually less critical than running complex code in Zope - especially if that code demands considerable runtime. Zope then blocks one of its listeners - the number is static. And simply pushing it up doesn't help - thanks to the global interpreter lock, only more processes would wait for this lock to be released (e.g., for every C extension that's used). There's a good and fast C implementation for XMLRPC communication that can be integrated into Python, making the XML overhead problem irrelevant.
  • If you use PostgreSQL as a database: use PsycoPG as the database driver. Session pooling really gets Zope going. Generally you should check whether the corresponding database driver supports some form of session pooling - if necessary via an external SQL proxy. Otherwise, Zope might hang the entire system during SQL queries because a heavy query waits for its result. Many have already fallen into this trap and learned that 16 Zope threads doesn't necessarily mean 16 parallel processed Zope accesses when SQL databases are involved.
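
The offloading idea from the list above can be sketched with Python's standard XML-RPC modules - the method name and the computation are invented placeholders, and TooFPy itself is not shown:

```python
"""Sketch of moving a heavy computation out of Zope into an external
RPC server, so it doesn't tie up one of Zope's statically allocated
listener threads."""
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def render_report(n):
    # stand-in for expensive work that shouldn't run inside Zope
    return sum(i * i for i in range(n))

def serve():
    # tool-server side, running outside Zope:
    srv = SimpleXMLRPCServer(("localhost", 9000))
    srv.register_function(render_report)
    srv.serve_forever()

# Zope side, inside the application code:
#   proxy = ServerProxy("http://localhost:9000/")
#   result = proxy.render_report(100000)
```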

Of course there's a lot more you can do, but the above are largely manageable on the fly and mainly depend on you having enough memory in the server (and possibly a multiprocessor machine - but it works without one too). Memory is important - the more the better. If you can, just put more memory in. You can't have too much memory...

What to do if even all that's not enough? (Yes, I've been there - sometimes only the really heavy-handed approach helps.) Well, in that case there are variations of the above techniques. My favorite in this area is active caching. By this I mean configuring, in one place, which documents should be actively cached. That requires a script on the machine that fetches the pages from Zope and puts them in a directory; Apache rewrite rules then ensure that the static content is served to the outside. Basically you're ensuring that the most frequently visited pages that are suitable for this technique (i.e., containing no personalization data, for example) simply go out as static pages, no matter what else happens - the normal caching techniques just aren't brutal enough, too much traffic still gets through to the server.
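
The rewrite part of this can be sketched as a small Apache config fragment - the cache directory and the Zope port here are made up for illustration, not taken from the post:

```apache
# If an actively cached static copy exists, serve it; otherwise
# proxy the request through to the Zope instance.
RewriteEngine On
RewriteCond /var/www/static-cache%{REQUEST_URI} -f
RewriteRule ^(.*)$ /var/www/static-cache%{REQUEST_URI} [L]
RewriteRule ^/(.*)$ http://localhost:8080/$1 [P,L]
```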

Another step is of course the use of additional machines - simply put more machines alongside and connect them using the ZEO technique.

Zope is fantastic software - especially the high integration of development environment, CMS, and server is often incredibly practical, and the easy integration of external data sources is also very nice. But Zope is a resource hog, you have to put it that simply.

Cyclic Dependencies

Debian has a wonderful package system. And it has a whole range of very useful tools to make backports easier - for example, by using debootstrap to set up a chroot environment where you can safely gather the packages you need for the build and then create a corresponding package. I've used the whole thing several times, it's really great.

However, it can sometimes drive you crazy. I wanted to install the latest SQLite from Debian Testing. To do that, I first need the necessary tools to build the package. Since I had just set up a new chroot environment, not everything was there yet - for example, I was missing cdbs, a very powerful (and by now widely used) tool for easy creation of Debian packages. I had ported it once before, but I thought the opportunity was good to build a current version.

Or so I thought. It started off quite harmlessly - for the documentation it needs springgraph - a tool for formatting graphs. The tool itself actually has no build dependencies (except for the mandatory debhelpers). Fine. It also builds very quickly. When installing it, it complains about missing Perl modules for the GD2 integration. Okay, porting Perl modules is often tedious, but this one actually looked quite simple. A series of build dependencies, sure, but otherwise harmless. Except for the fact that it needs cdbs to build.

Aaaaarghl!!!!

Okay, I know what you have to do. Still. Sometimes I get the feeling that the Debian maintainers secretly get together to drive me crazy.

D Programming Language

The reference manual for the D Programming Language (a successor to C and C++ with various high-level ideas) contains a very fascinating clause on the first page of the language description:

Note: all D users agree that by downloading and using D, or reading the D specs, they will explicitly identify any claims to intellectual property rights with a copyright or patent notice in any posted or emailed feedback sent to Digital Mars.

I have the impression that such a clause — which automatically becomes binding for a user merely by reading the documentation and demands something that the user may not even be able to deliver — is a bit absurd. I would be interested in the opinion of the blawgers on this.

Emails must not be filtered

Ruling: Emails must not be filtered - an important decision with potentially far-reaching consequences. The strengthening of email confidentiality, as well as the filtering question, will certainly lead to conflicts in companies. Most importantly, it establishes the applicability of postal and telecommunications secrecy to email - a point that has often been problematic until now.

Of course, it remains to be seen what counts as a plausible reason for filtering in court - probably somewhat more than just virus protection. Still, a step in the right direction.

The Non-Word 2004

No matter what is announced tomorrow - my non-word is reform. The way every brain fart of a minister was labeled a reform is simply unspeakably stupid.

Planetopia - Knowledge-Bending Magazine

No TV Logo

The science bending magazine has put out a piece about weblogs - and a couple of bloggers provided input for it. The result was of course a total disaster - what else would you expect from a SAT.1 magazine? The reaction from Spreeblick to the nonsense is very amusing. And I can well understand the outrage from Schockwellenreiter.

Why Jörg didn't just do the whole thing for a fee in the first place, at least to get some positive result out of the nonsense, isn't entirely clear to me. Did he really expect that SAT.1 could produce anything meaningful on the topic of grassroots journalism? Factual falsification and manipulative editing are standard practice at such magazines - it's all about sensationalist garbage, it has nothing to do with real reporting or journalism.

And Planetopia itself? Well, unlike the tabloid press (whose escapades sometimes overflow with unintentional humor), the whole thing is so poorly done and the segments held so superficially that it doesn't even work as trashy entertainment...

Clark Darlton deceased?

Clark Darlton reportedly died in Salzburg - I confess that in my youth I also read large quantities of Perry Rhodan pulps ... Update: it is true. Walter Ernsting is dead. It's strange how I only associate the writer with his pseudonym and how unfamiliar his real name sounds.

Kallisys | Newton | Einstein Project

The Einstein Project is an emulator for the Apple Newton hardware. While this doesn't make your own Mac any smaller (and thus isn't a replacement for the PDA), you can at least play around with one of the most interesting PDA operating systems. Found in Rainer Joswig's Lispnews.

Longhand

Longhand is a nice little formula evaluator. You could also call it a calculator, but it's a program for the desktop computer. Something like a graphical variant of bc - it also supports arbitrarily large numbers. Nice and simple for quick calculations in between.

MonkeyTyping - The PEAK Developers' Center

MonkeyTyping is Phillip J. Eby's approach to optional static typing in Python. The idea looks very interesting. What always fascinates me about Phillip is his ability to look beyond the horizons of the language - just think of his work on generic functions in Python. Python urgently needs more of these kinds of breakthroughs - some discussions around Python show the first signs of language inbreeding (for example, these almost hateful reactions to mentions of Lisp and Lisp features by some Python advocates).

TextWrangler now free as in freeware

When I look at the feature comparison between TextWrangler, which is now freely available, and BBEdit, there's really only one feature I would miss: Shell Sheets. These Shell Sheets are absolutely brilliant - at least when you've worked on old Macs with MPW like I have and gotten used to the workflow. Basically like a shell window in Emacs, except the editor around it is usable.

Otherwise, the only other limitation worth mentioning is not being able to build TextFactories with TextWrangler (though you can run them), everything else I personally consider absolutely dispensable - especially all those HTML tools I've never really used.

It's great that BareBones is finally providing a noteworthy free version of their editor.

The Axis of the Pious: Is old Darwin still alive?

No Church!

Creationism is nonsense. Gross nonsense. Nonsense carried out on the backs of students, who are dragged back by such rubbish into times one thought had been left behind with the Catholic Church's apology to Galileo Galilei (okay, he had already been dead for a few centuries by then, and the apology wasn't entirely genuine anyway).

And it is alarming how close this wave of stupidity is coming to us—with Italy and Serbia practically on our doorstep. And I'm not even that sure that we are really as safe from this plague spreading as we believe.

argh!

I can only agree with argh! on that - the comment about Annett Louisan.

Blogs - the new money machine?

A plugin I certainly won't install:

BlogMine enables content targeted ads in both feeds and web pages, simplifies and increases revenue generation for bloggers. The service provides a universal way to monetize all blog related content, regardless of whether it is published to the web or as an RSS feed.

DNS Stuff: DNS tools, WHOIS, tracert, ping, and other network tools.

A whole bag full of tools around nameservers, reachability etc. Very practical when you want to quickly check whether the reverse resolution of the server address also works reliably from outside. Or when you want to test a whole set of RBLs against an IP (I found rbls.org for that recently). Email tests. Routing information. And more ...

ESA - Cassini-Huygens - First image from Titan

ESA - Cassini-Huygens - First image from Titan - there it is, the first image from Titan, captured from 16 kilometers altitude. I want more of it!

And Heise has more of it. Great. The landing site - wow.

Court stops advertising campaign by cancer doctor Rath

It's embarrassing that such a charlatan could only be stopped because of formal errors. You'd think such nonsense could be shut down much earlier.

Google receives patent on search term highlighting

Google Gets Patent on Search Term Highlighting - and this means my website violates exactly this patent. Thanks to the Search Highlight plugin for WordPress (which comes as standard), search terms are highlighted in color when visitors come to my pages from a search engine. Well, sue me then, Google ...

Patents are problematic enough as it is, but such trivial patents are just infuriating.

The search term zeitgeist is back

I've recreated a search terms zeitgeist for this blog. Previously this was generated by the Community Server, but now I've written a Python script that generates the zeitgeist automatically. I then integrated the whole thing into WordPress's template system so the layout is correct too.
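The script itself isn't published here, but the core of such a zeitgeist generator can be sketched roughly like this (the parameter names and the engine list are illustrative assumptions on my part, not the actual script):

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Query-parameter names used by common search engines (assumption: the
# real script evaluates "a whole bunch of search engines"; this set is
# just an illustrative sample).
QUERY_PARAMS = {"q", "query", "p", "wd"}

def extract_terms(referers):
    """Count search terms found in a list of referer URLs."""
    counts = Counter()
    for ref in referers:
        qs = parse_qs(urlparse(ref).query)
        for name in QUERY_PARAMS:
            for value in qs.get(name, []):
                term = value.strip().lower()
                if term:
                    counts[term] += 1
    return counts
```

Feed it the referer column of the access log, sort the resulting counter, and render the top entries into the template.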

Of course, I've excluded this page from the search engines in my robots.txt, so that these search terms and the links back to the search engines don't feed back into their PageRank calculations - otherwise searchers would eventually only land on my zeitgeist page. And that would be Google spamming.
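Such an exclusion in robots.txt looks roughly like this (the path /zeitgeist/ is a hypothetical example; the actual URL of the page isn't stated here):

```text
User-agent: *
Disallow: /zeitgeist/
```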

I always find this type of search term analysis quite interesting (by the way, I evaluate a whole bunch of search engines for this), as it makes it much easier to see what the web is interested in than with the normal evaluations that log analysis tools offer.

VW payments: Federal parliamentarian resigns

Why are politicians fundamentally stupid enough to believe their lies won't come out? And why, every time it's about side income, some Miles-and-More deal, or whatever other petty corruption is brewing, do they try to get out of it with such banal lies?

Caching for PHP Systems

Caching Strategies for PHP-Based Systems

There are basically two ways to implement caching in a PHP-based system. Okay, there are many more, but two main approaches are clearly identifiable. I've compiled what's interesting in this context - especially since some colleagues are currently suffering under high server load. The whole thing is kept general, but for understandable reasons also considers the specific implications for WordPress.

  • Caching of pre-compiled PHP pages
  • Caching of page output

There are numerous variations of both main approaches. PHP pages exist on the web server as source code - unprocessed and not optimized in any way for the loading process. In a running PHP system, every PHP file is parsed and compiled into internal code on every request; with systems that have many includes and large class libraries, this can be quite substantial. The first main direction of caching starts at this point: the generated intermediate code is simply stored away, either in shared memory (memory blocks available collectively to many processes of a system) or on the hard disk.

There are a number of solutions here - I personally use Turck MMCache. The main reason is that it caches not only in shared memory but also on disk (which, as far as I know, the other similar solutions don't do), that there is a Debian package for it, and that I've had relatively few negative experiences with it so far (at least on Debian stable - on Debian testing things look different, and PHP applications crash on you there). Since WordPress is built on a fairly large set of library modules with substantial source content, such a cache does quite a bit to reduce WordPress's baseline load. And since these caches are usually completely transparent - with no visible effect except the speed improvement - you can generally leave such a cache enabled.

The second main direction for caching is the intermediate storage of page contents. Here's a special feature: pages are often dynamically generated depending on parameters - and therefore a page doesn't always produce the same output. Just think of mundane things like displaying the username when a user is logged in (and has stored a cookie for it). Page contents can also be different due to HTTP Basic Authentication (the login technique where the popup window for username and password appears). And POST requests (forms that don't send their contents via the URL) also produce output that depends on this data.

Basically, an output cache must consider all these input parameters. A good strategy is often not to cache POST results at all - because error messages etc. would also appear there, which depending on external sources (databases) could produce different outputs even with identical input values. So really only GET requests (URLs with parameters directly in the URL) can be meaningfully cached. However, you must consider both the sent cookies and the sent parameters in the URL. If your own system works with basic authentication, that must also factor into the caching concept.
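As a sketch, such a key derivation might look like this in Python - a minimal illustration of the rule "only GET is cacheable, but GET parameters plus cookies go into the key" (function and parameter names are my own, not from any particular system):

```python
import hashlib

def cache_key(method, path, get_params, cookies):
    """Derive a cache key from everything that can change the output.
    Non-GET requests are not cacheable, so no key is produced for them."""
    if method != "GET":
        return None
    # Sort parameters and cookies so that ordering differences in the
    # request do not create separate cache entries for identical pages.
    parts = [path]
    parts += [f"{k}={v}" for k, v in sorted(get_params.items())]
    parts += [f"{k}={v}" for k, v in sorted(cookies.items())]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()
```

A system using HTTP Basic Authentication would additionally have to mix the authenticated username into the key.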

A second problem is that pages are rarely purely static - even largely static pages usually contain elements that you'd prefer to keep dynamic. Here you need to make a significant decision: is purely static output enough, or do you need a mix? Furthermore, you still have to decide how page updates should take effect - how does the cache notice that something has changed?

One approach you can pursue is a so-called reverse proxy. You simply put a normal web proxy in front of the web server so that all access to the web server itself is technically routed through this web proxy. The proxy sits directly in front of the web server and is thus mandatory for all users. Since web proxies should already handle the problem of user authentication, parameters, and POST/GET distinction quite well (in the normal application situation for proxies, the problems are the same), this is a very pragmatic solution. Updates are also usually handled quite well by such proxies - and in an emergency, users can persuade the proxy to fetch the contents anew through a forced reload. Unfortunately, this solution only works if you have the server under your own control - and the proxy also consumes additional resources, which means there might not be room for it on the server. It also heavily depends on the application how well it works with proxies - although problems between proxy and application would also occur with normal users and therefore need to be solved anyway.

The second approach is the software itself - ultimately, the software can know exactly when contents are recreated and what needs to be considered for caching. Here there are again two directions of implementation. MovableType, PyDS, Radio Userland, Frontier - these all generate static HTML pages and therefore don't have the problem with server load during page access. The disadvantage is obvious: data changes force the pages to be recreated, which can be annoying on large sites (and led me to switch from PyDS to WordPress).

The second direction is caching from the dynamic application itself: on first access, the output is stored under a cache key. On the next access to the cache key, you simply check whether the output is already available, and if so, it's delivered. The cache key is composed of the GET parameters and the cookies. When database contents change, the corresponding entries in the cache are deleted and thus the pages are recreated on the next access.
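In code, this get-or-render pattern boils down to very little (a Python sketch of the idea, not any particular plugin's implementation):

```python
# In-process page cache; real systems would use files or shared memory.
_page_cache = {}

def cached_page(key, render):
    """Deliver the cached output for key; on a miss, render and store it."""
    if key not in _page_cache:
        _page_cache[key] = render()
    return _page_cache[key]

def invalidate(keys=None):
    """Drop specific entries after a database change, or simply everything."""
    if keys is None:
        _page_cache.clear()
    else:
        for key in keys:
            _page_cache.pop(key, None)
```

The hard part in practice is not this loop but deciding which keys a given database change invalidates - which is why blanket clearing is so popular.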

WordPress itself has Staticize, a very practical plugin for this purpose. In the current beta, it's already included in the standard scope. This plugin creates a cache entry for pages as described above. And takes parameters and cookies into account - basic authentication isn't used in WordPress anyway. The trick, though, is that Staticize saves the pages as PHP. The cache pages are thus themselves dynamic again. This dynamism can now be used to mark parts of the page with special comments - which allows dynamic function calls to be used again for these parts of the page. The advantage is obvious: while the big efforts for page creation like loading the various library modules and reading from the database are completely done, individual areas of the site can remain dynamic. Of course, the functions for this must be structured so they don't need WordPress's entire library infrastructure - but for example, dynamic counters or displays of currently active users or similar features can thus remain dynamic in the cached pages. Matt Mullenweg uses it, for example, to display a random image from his library even on cached pages. Staticize simply deletes the entire cache when a post is created or changed - very primitive and with many files in the cache it can take a while, but it's very effective and pragmatic.
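The marker trick can be illustrated in a few lines (a Python sketch of the principle only - Staticize does this with PHP embedded in the cached files, and the marker syntax and handlers here are invented for illustration):

```python
import re

# Hypothetical registry of lightweight dynamic handlers; these must work
# without loading the full library infrastructure of the blog system.
HANDLERS = {
    "online_users": lambda: "3 users online",
}

_MARKER = re.compile(r"<!--dynamic:(\w+)-->")

def serve_cached(page):
    """Re-evaluate only the marked regions of an otherwise static cached page."""
    return _MARKER.sub(lambda m: HANDLERS[m.group(1)](), page)
```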

Which caches should you sensibly deploy, and how? With more complex systems, I would always check whether I can deploy a PHP code cache - Turck MMCache, Zend Optimizer, PHP Accelerator, or whatever else is out there.

I would personally only activate the application cache itself when it's really necessary due to load - with WordPress you can keep a plugin on hand and only activate it when needed. After all, caches with static page generation have their problems - layout changes only become active after cache deletion, etc.

If you can deploy a reverse proxy and the resources on the machine are sufficient for it, it's certainly always recommended. If only because you then experience the problems yourself that might exist in your own application regarding proxies - and which would also cause trouble to every user behind a web proxy. Especially if you use Zope, for example, there are very good opportunities in Zope to improve the communication with the reverse proxy - a cache manager is available in Zope for this. Other systems also offer good fundamentals for this - but ultimately, any system that produces clean ETag and Last-Modified headers and correctly handles conditional GET (conditional accesses that send which version you already have locally and then only want to see updated contents) should be suitable.
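What "clean ETag and Last-Modified headers plus conditional GET" means in practice can be sketched like this (a minimal Python illustration, not tied to any particular framework):

```python
from email.utils import formatdate, parsedate_to_datetime

def handle_conditional_get(request_headers, etag, last_modified_ts):
    """Return (status, headers): 304 if the client's cached copy is current,
    otherwise 200 with validators so the next request can be conditional."""
    if request_headers.get("If-None-Match") == etag:
        return 304, {}
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            if parsedate_to_datetime(ims).timestamp() >= last_modified_ts:
                return 304, {}
        except (TypeError, ValueError):
            pass  # malformed date header: fall through to a full response
    return 200, {"ETag": etag,
                 "Last-Modified": formatdate(last_modified_ts, usegmt=True)}
```

A reverse proxy in front of such an application only has to revalidate cheaply instead of refetching whole pages.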

Vesper Against Expansion of FMO


Anyone who has ever taken off from or landed at FMO knows that State Building Minister Vesper is right in his assessment that extending the runway is nonsense - because I don't know any airport as dead as Münster/Osnabrück. It's simply a wasteland. And a longer runway won't help with that any more than the fancy terminal expansion did: now the terminal where nothing happens is simply bigger than before. And with the longer runway, larger aircraft will be able to land and take off. At an airport where nothing happens.

Apple - iLife

The new iLife '05 sounds good too - especially, of course, the new iPhoto with RAW support. What's ultimately rather annoying for me: now I have to ask myself even more why I bought iView Media Pro in the first place - especially since iPhoto 5 now has the calendar view, and keyword search has been available for a long time anyway.

Evil secret paternity tests

I'm taking Pepilog's reaction as a starting point here. Simply because he's someone I read in my aggregator. But what I'm addressing applies to all men.

What absolutely fascinates me about this discussion is the rather twisted argumentation on the topic. The argument of the cuckoo child and the cheating mother keeps coming up—and how a man could defend himself against it, if not through secret paternity tests.

Now let's think about it from the other direction for a moment. What if secret paternity tests were legal? The consequence: fathers could hand over material from the—potentially shared—child for genetic analysis without the consent or knowledge of the mother. Mind you, the child probably wouldn't be asked either, which wouldn't be possible at infant age anyway. But this would apply to all fathers—even those who only believe they aren't the father because they're pathologically jealous, want to shirk responsibility, or have somehow gotten it into their heads that the child isn't theirs. I'm exaggerating deliberately here—after all, the other side keeps acting as if the cheating mother were the most normal situation in the world. But child support evasion by fathers is demonstrably a common situation...

The argument goes that one couldn't give the right to authorize the analysis to the woman alone, because then the man would be disadvantaged—he couldn't protect himself against cuckoo children. On the other hand, the secret test completely excludes the woman—how does that fit into this supposedly fair discussion? With a secret test, both the woman and the child are excluded to the point that they don't even know about it—unless the father unilaterally decides what he's doing.

The secret paternity test disrespects the woman's right to a say (it's not about the woman's sole right to decide—it's about having a say!) by completely negating that participation. It also disrespects the child's right—though one could debate whether at the point in time when this situation typically occurs (child in infancy) the child's right to self-determination should be weighted higher than the parents' right to a say.

Additionally, what really bothers me is the matter-of-factness with which the cheating mother is assumed—here something is being demanded that probably doesn't apply in 90 percent of cases—most couples probably still know quite well that their children are theirs together.

Actually, the whole thing is the archetypal self-reassurance obsession of men who, in every restriction of their absolute freedom, immediately suspect they're being discriminated against—something like an emancipation envy spreading since women have been fighting for their rights.

Sorry, but I have to tell us men something: a restriction of freedoms to protect the rights of others is not necessarily discrimination against yourself!

And it is precisely the rights of the woman—namely her right to a say in matters concerning the child—that secret paternity tests definitively violate. Because no matter whether the father is the father or not: the woman is demonstrably the mother. Her rights are in no way in doubt. But the secret test violates them and denies them to her—on the grounds that the man has no other way to defend himself. Poor male society, if such fears move us...

Secret VW guideline for payments to politicians?

It's certainly reassuring to see how parties take care of their politicians. No, that's definitely not corruption, you can't even imagine such a thing, that politicians are corrupt and that corporations expect benefits from blowing money up politicians' asses.

Glossary for WordPress

I've written a small WordPress plugin that implements a glossary similar to the one in Radio Userland or PyDS. The glossary simply replaces text delimited by | (pipe) symbols with replacement text (which can also contain XHTML markup). Saves typing ...
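The replacement logic is essentially a single regex substitution, roughly like this in Python (the glossary entries here are invented examples; in the plugin itself the table lives in the WordPress database):

```python
import re

# Hypothetical glossary entries; replacement text may contain XHTML markup.
GLOSSARY = {
    "wp": '<a href="http://wordpress.org/">WordPress</a>',
    "pyds": "PyDS",
}

_PATTERN = re.compile(r"\|([^|]+)\|")

def expand_glossary(text):
    """Replace |term| with its replacement text; unknown terms keep the pipes."""
    def sub(match):
        return GLOSSARY.get(match.group(1).lower(), match.group(0))
    return _PATTERN.sub(sub, text)
```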

The plugin installs a small management page in the WordPress backend, so the whole thing only works with WordPress 1.5 (or possibly 1.3). The required database table is created automatically when you first access the management page after activating the plugin.

heise Security - News - Uncovered and Charged

How companies try to compensate for their incompetence with power. I hope the French court rejects these absurd demands. Full disclosure is often the only way for customers to defend themselves against the stubbornness and unwillingness of manufacturers - history shows this nicely, with companies (even industry giants like Sun) refusing for years to acknowledge bugs and ultimately only being forced to act by mailing lists like Bugtraq.

Companies must finally understand that security doesn't work in a quiet back room, but is only real security when it can withstand public scrutiny and analysis. Security by obscurity is no security at all ...

Comment Spam

Since comment spam has been increasingly occurring on WordPress blogs lately and I don't want to have to react only after it lands in the spam folder, I've proactively installed Spam-Karma. It's a pretty powerful tool where, fortunately, you can disable a lot of options. I hope this will prevent what will certainly be an onslaught on my comment function at some point.

Of course, such a tool always has potential negative side effects. So if you can't get a comment through, there's still the regular feedback form which sends a completely normal - and unfiltered - email to me. As long as it makes it through my mail spam filter, I'll know what's going on (with 300-400 spams a day just at home, I can't guarantee that I'll notice an email that was mistakenly flagged as spam - though apparently not much gets lost that way, statistical spam filters do their job).

It's kind of strange how we have to artificially neuter our communication tools just because people tend to exploit anything that can be exploited eventually...

Update: after it ate a trackback from Schockwellenreiter, I've disabled it for now. The main problem: the trackback was eaten with an error message that was supposedly fixed in exactly the version I'm using.

Round and Healthy: Jan Ullrich in Mallorca

Business as usual at Team Telekom T-Mobile: wild rumors, silly ideas (Zabel not going to the Tour? What nonsense), and a too-fat Jan Ullrich. And the competition is laughing up their sleeves again ... Well, if Armstrong really doesn't ride the Tour, at least one of the other riders (Klöden or Basso?) has a chance.

And Mobile Phone Cramp Again

My mobile phone contract was up for renewal again, and T-Mobile was eager to get me to extend it. So they threw phones at me. I ended up with a Motorola E398 - hey, my most modern phone was a Nokia 6110, and Jutta grabbed that one, so I was left with just the S3 Com if I didn't want to use my work phone ...

Well, the E398 is nice - it has everything you can imagine. And a bit more. If you want to know the technical specs, Motorola will happily tell you. I'm really only interested in one thing about all this fuss: Apple can exchange data with the phone and use it as a modem, but iSync only synchronizes with it via cable. Why must everything in the mobile phone environment always be completely illogical, complicated and confusing?

Oh yes, and I hardly need to mention that the Motorola manual contains a lot of text but in many places explains nothing. Take the documentation of all the options you can set: the options are listed in the manual. And named. And that's it. No explanation whatsoever of what exactly you're supposed to enter or where you'd get that information. And of course everyone immediately knows what to make of APN, IMPS, etc. - just like you naturally know right away which IM technology the IM client uses when it doesn't say so anywhere. Only with Google's help could I figure some of it out.

Mobile phones are stupid.