sysadmin - 22.11.2004 - 27.1.2005

Stupid Spambot at Work

Right now a pretty stupidly constructed spambot is hammering away at my comment function and clogging up my moderation queue - nothing from it gets through, because it's so stupid that it posts everything in plain text, with loads of links and the typical spam words. So it gets caught by the most basic filters. Nonetheless, something like this can of course have fallout - comments from others that end up in moderation (e.g. because they contain too many links) could get overlooked in the mess of hundreds of spam comments and accidentally deleted along with them. If that happens, it's nothing personal. I just don't feel like scrutinizing several hundred spam comments carefully enough to make sure I'm really only deleting spam...

Update: After taking a closer look at it, I've put it in /dev/null for now - the moderation queue is no longer burdened by it and legitimate moderated comments won't accidentally get deleted. What struck me during the closer examination: a large number of very widely scattered IP addresses are being used. Sounds very much like a botnet, especially since the IP addresses, based on spot checks, appear to all be dynamic dialup addresses. So our friends with remotely controlled Windows machines are once again the horse that spam rides on here. Great. Thanks, Microsoft...

No more direct access to newsgroups at AOL - now we can dream that September finally comes to an end ...

SCO vs. Linux: SCO Finds IBM's Code Demands Unreasonable

SCO vs. Linux: SCO finds IBM's code demands unreasonable. Amusing - they cry for code themselves, but can't hand over their own. And if releasing their own code would really overwhelm them so much, how do they intend to sift through the far larger amounts of code from IBM? It's remarkable that the SCO people aren't embarrassed by this whole mess...

WP-Questionnaire Plugin

Ok, I've finished the plugin for WordPress 1.5. A simple thing - a plugin and a small management page where you can set up various questions. To install it, you download the plugin, copy the files to the locations specified in the readme.txt, and activate the plugin. Then you add a few questions in the management section under Questionnaire and you're done. When commenting, a more or less silly question is asked, which should be answerable as briefly as possible (we don't want to annoy commenters too much). If the answer is correct, the comment - provided no other anti-spam method kicks in first - is published immediately. If the answer is wrong, the comment goes into moderation and must be approved by the admin.

You can of course also build a secret IQ test for your commenters with this and instead of simple questions put small riddles in there - only those who solve them are allowed to comment immediately.

I've activated the plugin on my site; let's see if it has any effect on people's commenting behavior here. Feel free to share your opinion of this kind of anti-spam approach right here.

A fairly interesting possible attack on any captcha solution can incidentally be found in the comments to Eric Meyer's WP-Gatekeeper: you simply collect and save the comment forms. Additionally, you need a site where you can put them to use - for example, a site with free porn videos. There you present the captchas to that site's visitors and collect their answers. You then submit the answer with the saved form and the comment is done. Of course you can take countermeasures against this as well - probably best would be an encoded timecode in the form and rejection of any timecode that's too old, since the answers from the porn viewers probably won't arrive immediately. Interesting approach, the whole thing.

Update: the plugin still has two bugs. For one, it also catches trackbacks (which of course never carry the necessary variables), and it can currently still be circumvented pretty easily if you know what to look for in the form - you only need to solve one captcha and can then spam other comments by changing the comment ID. The latter is actually a bug in many captcha solutions - it's easy to fall for it and forget to bind the captcha to some kind of serial number or similar, so that a form can only be used once...
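
The countermeasures from the two paragraphs above - an expiring timecode plus binding the captcha to a single form so it can only be used once - are language-agnostic; the plugin itself is PHP, but here is a minimal sketch of the idea in Python. All names (issue_token, check_token, SECRET) are made up for illustration and are not part of the WP-Questionnaire plugin:

    import hashlib, hmac, os, time

    SECRET = b"server-side secret, never sent to the client"   # assumption: kept only on the server
    MAX_AGE = 1800          # reject tokens older than 30 minutes
    used_nonces = set()     # in practice a small database table, not process memory

    def issue_token(post_id):
        """Embed this as a hidden field when the comment form is rendered."""
        nonce = os.urandom(8).hex()
        payload = "%s:%d:%s" % (post_id, int(time.time()), nonce)
        sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
        return payload + ":" + sig

    def check_token(token, post_id):
        """Verify the hidden field when the comment is submitted."""
        try:
            tok_post, tok_time, nonce, sig = token.rsplit(":", 3)
            tok_time = int(tok_time)
        except ValueError:
            return False
        payload = "%s:%d:%s" % (tok_post, tok_time, nonce)
        expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            return False            # forged or tampered form
        if tok_post != post_id:
            return False            # answer replayed against a different post
        if time.time() - tok_time > MAX_AGE:
            return False            # stale form - the bought answer came too late
        if nonce in used_nonces:
            return False            # form already used once
        used_nonces.add(nonce)
        return True

A token like this doesn't stop a live human from solving the question, of course - it only makes sure a solved form can't be replayed arbitrarily often or arbitrarily late.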

So I'll be making an update to the plugin in the near future.

Update 2: the problem with trackbacks and pingbacks should now be solved. The replay problem is still in there; I need to think about that a bit more. None of my solution ideas so far really appeal to me.

Update 3: I've now switched it off here again. I haven't gotten any comment spam so far and without a compelling reason, even a simple question to answer is pretty annoying...

Internet Explorer Still Vulnerable After Patch

Internet Explorer still vulnerable after patch - which is embarrassing enough in itself. But the Heise editorial recommendation:

In principle, ActiveX is always a gateway for malware and should be disabled if necessary. However, some websites will then no longer function correctly.

is somehow peculiar: I've never really noticed ActiveX as a barrier to visiting any websites. Well, I'm a Mac and Linux user - if websites only worked with ActiveX, I would have noticed it, since it's conceptually impossible for me to run it (not even in IE, because of the wrong processor architecture).

Sure, there are a few Microsoft products that rely on ActiveX - but you really can't claim that it has become widespread out on the web. So I'd say: disable ActiveX at least for the Internet zone; it has no value there. And in the trusted zones - a term I already consider quite the euphemism where IE is concerned - only enable it if it's really necessary (for example, because an intranet solution unfortunately uses ActiveX). Or install a proper browser for surfing the web. That's the better solution anyway ...

JSch for J2ME - no idea if I'd want to use an SSH client on my phone (text input on a phone is more than annoying), but it would be possible with this...

ModSecurity - Web Intrusion Detection And Prevention / mod_security is an Apache module that examines requests and decides, based on filters, whether a request should be allowed through or whether an action (logging, blocking, running a script, etc.) should be triggered. Quite interesting, even though I'm generally skeptical about rule-based filtering against attacks - it only finds known or expected attacks. The real danger lies in the unexpected ones...

MT-Blacklist -> Hijacked comments.cgi

MT-Blacklist -> Hijacked comments.cgi - anyone using Movable Type should disable the comment script. The email validation that is supposed to ensure the sender address field doesn't contain junk is broken - which lets you sneak in additional recipient addresses by separating them from the actual sender address with a line feed. And with that, MT can happily be used to spam other people.

A real beginner's mistake - the email validation is done with a regex that isn't anchored to the actual end of the string, so it effectively only checks up to a possible line feed and ignores everything after it. Really stupid.
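
For illustration only (this is not Movable Type's actual code): the class of bug described above looks roughly like this in Python - a validation regex anchored with $ in multiline mode stops at the first line feed, so everything after an injected newline slips through, while anchoring to the real end of the string rejects it.

    import re

    naive  = re.compile(r'^[\w.+-]+@[\w.-]+$', re.MULTILINE)   # "validates" only up to a line feed
    strict = re.compile(r'\A[\w.+-]+@[\w.-]+\Z')               # anchored to the real end of the string

    payload = "victim@example.org\nBcc: spamtarget@example.net"

    print(bool(naive.search(payload)))   # True  - the injected header survives "validation"
    print(bool(strict.search(payload)))  # False - the injection is rejected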

confused face

IT&W Reconstructs Mac Video

IT&W reconstructs Mac video - I would link directly, but their server got hammered...

Microsoft lays off Windows testers and switches to automated tests instead. Tool worship has struck again. A singularly stupid idea, because automated tests only find what has been automated. They lack the intuition that people (at least good testers) have. But then, Microsoft software has never given me the impression of being particularly well tested anyway...

Virtualization for desktop processors - particularly interesting for server farms. Of course, this can be done today with various VMWare versions, with User-Mode Linux and a few other projects, but support in the CPU naturally makes such solutions more efficient.

SCO vs. Linux: SCO Gets More Material

The seemingly strange decision by the judge in the SCO vs. IBM case is - as usual - explained by Groklaw. This judge's role is not to decide who is right; that's a different judge's job. Her job is only to ensure that all parties put all relevant material on the table, so it's solely about the discovery documents. Still, this is of course SCO's annoying delay tactic at work. But it's not the big interim victory for SCO that one might read into it.

Secure and anonymous on the Internet with proxies - a guide to using Privoxy and tor. Somehow I hadn't linked it here yet. It's so good you should read it - you won't get any dumber from it.

Zope Hosting and Performance - English Version

Somebody asked for an English translation of my article on Zope hosting and performance. Here it is - ok, it's not so much a direct translation as a rewrite of the story in English. Enjoy.

Recently the Schockwellenreiter had problems with his blog server. He is using Zope with Plone and CoreBlog. Since I have been doing professional Zope hosting for some years now, running systems in the 2000-3000 hits per minute range, I thought I'd put together some of the stuff I learned (sometimes the hard way) about Zope and performance.

  • The most important step I would take: slim down your application. Throw everything out of the Zope database that doesn't need to be there. If it doesn't need content management, store it in folders served by Apache. Use mod_rewrite to seamlessly integrate it into your site so that visitors won't notice a difference. This works best for layout images, stylesheets etc. - Apache is much faster at delivering those.
  • Use Zope caching wherever possible. The main thing you need to check: do you have enough RAM? Zope will grow when caching is used (especially with the RAMCacheManager). The automatic cleanup won't rescue you - Zope will still grow. Set up process monitoring that automatically kills and restarts Zope processes that grow above an upper bound, to prevent paging due to excessive memory consumption (a minimal watchdog sketch follows this list). This is a good idea even if you don't use caching at all.
  • There are two notable cache managers: one uses RAM and the other uses an HTTP accelerator. The RAMCacheManager caches the results of objects in memory and can therefore be used to cache small objects that take a lot of time or resources to construct. The HTTPCacheManager is for use with an HTTP accelerator - most people will use Squid, but an appropriately configured Apache works too. The cache manager provides the right Expires and Cache-Control headers so that most traffic can be delivered out of the HTTP accelerator instead of Zope.
  • Large Zope objects kill Zope's performance. When caching is used, they destroy cache efficiency by polluting the cache with large blobs that aren't requested often, and they drain Zope's own performance as well. The reason is that Zope output is constructed in memory, and constructing large objects in memory takes a lot of resources due to Zope's security and architectural layers. Better to create them with cronjobs or other means outside the Zope server and deliver them directly with Apache - Apache is much faster. A typical example is users creating PDF documents inside Zope instead of generating them outside. Bad idea.
  • Use ZEO. ZEO rocks. Really. In essence it's just the ZODB with a small communication layer on top. This layer is used by the Zope instances instead of accessing the ZODB directly, so you can run several process groups on your machine, all connecting to the same database. This helps with the process restarting mentioned above: when one instance is down, the others do the work. Use mod_backhand in Apache to distribute the load between the process groups, or use other load-balancing tools. ZEO also makes regular database packs easier: they run on the ZEO server and not in the Zope instances, which barely notice the running pack.
  • If you have an SMP machine, use it. Or buy one. Really - it helps. You need to run ZEO and multiple Zope instances, though - otherwise Python's global interpreter lock will hit you over the head and Zope will only use one of the processors. That's one more reason to want multiple process groups in the first place: distributing the load on the machine itself and making use of multiple processors.
  • You can gain performance by reducing the number of architectural layers your code goes through. Python scripts are faster than DTML, and Zope products are faster than Python scripts. Remove complex code from your server and move it into products or other outside places. This requires rewriting application code, so it isn't always an option - but if you do it, it pays off.
  • Don't let your ZODB file grow too large. The ZODB only appends on write access, so the file grows - and it grows quite large if you don't pack regularly. If you don't pack and you have multi-GB ZODB files, don't complain about slow server starts ...
  • If you have complex code in your Zope application, it might be worthwhile to move it to an outside server and connect to it from Zope via some RPC mechanism to trigger execution. I use my TooFPy for stuff like this - just pull out code, build a tool and hook it into the Zope application via XMLRPC. Yes, XMLRPC can be quite fast - pyXMLRPC, for example, is a C implementation that is very fast. Moving code outside Zope helps because that code can't block one of the statically allocated listeners while it calculates. Just upping the number of listener threads doesn't pay off the way you'd expect: due to the global interpreter lock, still only one thread runs at a time, and if your code uses C extensions, it might even block all other threads while doing so.
  • If you use PostgreSQL, use PsycoPG as the database driver. PsycoPG uses session pooling and is very fast when your system gets lots of hits. Other drivers often block Zope due to limitations like one query at a time and other such nonsense. Many admins had to learn the hard way that 16 listener threads aren't really 16 available slots once SQL drivers come into play ...
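
One of the points above recommends process monitoring that restarts Zope instances once they grow past a memory bound. A minimal sketch of such a watchdog, assuming a Linux /proc filesystem; the pid files and the restart command are placeholders for whatever your setup uses, not part of any standard Zope installation:

    #!/usr/bin/env python
    # Minimal watchdog sketch: restart Zope instances whose resident set size
    # grows beyond a limit. Pid files and restart commands are placeholders.
    import subprocess, time

    LIMIT_KB = 512 * 1024                      # upper bound per instance (512 MB), adjust to taste
    INSTANCES = {                              # hypothetical instance name -> pid file
        "zope1": "/var/run/zope1.pid",
        "zope2": "/var/run/zope2.pid",
    }

    def rss_kb(pid):
        """Read the resident set size of a process from /proc (Linux only)."""
        with open("/proc/%d/status" % pid) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
        return 0

    while True:
        for name, pidfile in INSTANCES.items():
            try:
                pid = int(open(pidfile).read().strip())
                if rss_kb(pid) > LIMIT_KB:
                    # hypothetical init script; with ZEO the other instances keep serving
                    subprocess.call(["/etc/init.d/" + name, "restart"])
            except (OSError, ValueError):
                pass                           # instance not running or pid file missing
        time.sleep(60)

With ZEO and multiple process groups behind mod_backhand, such a restart is invisible to visitors - the remaining instances keep answering requests.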

There are more ways to help performance, but the above are doable with relatively little work and mostly depend on whether you have enough memory and maybe an SMP machine. Memory is important - the more the better. If you can put more memory into your machine, do so. There is no such thing as too much memory (as long as your OS supports that amount, of course).

What to do if even the tips above don't help? Yes, I've been in that situation. If you get there, only one - rather brutish - solution remains: active caching. By that I mean pulling pages from the Zope server with cronjobs or other means, storing them in Apache folders, and using mod_rewrite to deliver only static content to users. mod_rewrite is your friend. In essence you take the pages that are currently killing you and make them pseudo-static - they are only updated once in a while, but the hits won't reach Zope at all.
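
A minimal sketch of such an active-caching cronjob: pull the expensive pages from Zope and write them into a directory that Apache then serves via mod_rewrite. The URL, the target directory and the page list are placeholders for your own setup:

    #!/usr/bin/env python
    # Active caching sketch: fetch heavy pages from Zope and write them out as
    # static files for Apache. URL, paths and page list are placeholders.
    import os
    import urllib.request

    ZOPE = "http://localhost:8080/site"            # hypothetical Zope instance
    STATIC_ROOT = "/var/www/static-cache"          # directory Apache serves via mod_rewrite
    PAGES = ["index.html", "archive.html"]         # the pages that currently hurt most

    for page in PAGES:
        data = urllib.request.urlopen(ZOPE + "/" + page).read()
        target = os.path.join(STATIC_ROOT, page)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        tmp = target + ".tmp"
        with open(tmp, "wb") as f:                 # write atomically so Apache never
            f.write(data)                          # sees a half-written file
        os.replace(tmp, target)

Run it from cron every few minutes and the heaviest pages never hit Zope at all, while still being reasonably fresh.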

Another step, of course, is more hardware. If you use ZEO it's no problem to put a farm of Zope servers in front of your ZEO machine (we currently have 5 dual-processor machines running the Zope instances, two rather big, fat, ugly servers in the background for the databases, and a frontend of two Apache servers that look almost like dwarves compared to the backend stuff).

Zope is fantastic software - don't get me wrong, I like it. Especially the fact that it is an integrated development environment for web applications and content management is very nice, and the easy integration of external data sources is nice, too. But Zope is a resource hog - that's beyond discussion. There's no such thing as a free lunch.

LynuxWorks Introduces First User-Mode Linux Software for Apple PowerPC G5 Based on the Linux 2.6 Kernel - this now makes it possible to build logically separated virtual environments under Linux on PPC machines as well.

Zope Hosting and Performance

The Schockwellenreiter is having problems with his Zope server. Since I've been doing professional Zope hosting in my company for several years now and run quite a few massive portals (between 2000 and 3000 hits per minute are not uncommon - though distributed across many systems), here are some tips from me on scaling Zope.

  • The most important step I would recommend to everyone is to streamline. Remove from Zope everything that doesn't need to be there - what can be created statically, what rarely changes, where no content management is needed: get rid of it. Put it in regular Apache directories. Use Apache's mod_rewrite to ensure the old URLs still work, but are served from Apache. This especially applies to all those little nuisances like layout graphics - they don't need to come from Zope, they're better served from Apache.
  • Use Zope caching whenever possible. Whenever possible means: enough memory on the server so that even memory-hungry processes have some breathing room. Generally, Zope's built-in caching causes processes to get fatter and fatter - the cleanup in its own cache is quite useless. So implement process monitoring that shoots down and restarts a Zope process when it uses too much memory. Yes, that really is sensible and necessary.
  • There are two good caching options in Zope: the RAMCacheManager and the HTTPCacheManager. The former stores results of Zope objects in main memory and can therefore cache individual page components - put the complex stuff in there. The second (HTTPCache) works together with Squid. Put a Squid in front of your Zope as an HTTP accelerator and configure the HTTP Cache Manager accordingly so that Zope generates the appropriate Expire headers. Then a large part of your traffic will be handled by Squid. It's faster than your Zope. Alternatively, you can configure an Apache as an HTTP accelerator with local cache - ideal for those who can't or don't want to install Squid, but do have options for further Apache configuration.
  • Large Zope objects (and I mean really large in terms of KB) kill Zope. With caching they destroy your best cache strategy, and Zope itself becomes incredibly slow when objects get too large. The reason lies in Zope's architecture: all objects are first laboriously assembled in memory by multiple software layers - and therefore take up corresponding space there. Get rid of complex objects with huge KB counts. Make them smaller. Create them statically via cron job. Serve them from Apache - there's nothing dumber than storing all your large PDFs in the ZODB, or even generating them dynamically in Zope.
  • Install ZEO. That thing rocks. Basically it's just the ZODB with a primitive server protocol. What's important: your Zope can be split into multiple process groups. You want this when you're using process monitoring to kill a rogue Zope process, but want the portal to appear as undamaged as possible from the outside - in that case just add mod_backhand to Apache, or another balancing technique between Apache and Zope. Additionally, ZEO also makes packing the ZODB (which should run daily) easier, since the pack runs in the background on the ZEO and the Zope servers themselves aren't greatly affected.
  • If you have it, use an SMP server. Or buy one. Really - it brings a lot. The prerequisite is the aforementioned technique with multiple process groups - Python has a global interpreter lock, which means that even on a multiprocessor machine, never more than one Python thread runs at a time. Therefore you want multiple process groups.
  • You also gain performance by cutting out layers. Unfortunately this can often only be done with software changes, so it's more interesting for those who build things themselves. Move complex processing out of the Zope server and into Zope Products. Zope Products run natively in the Python interpreter without restrictions. Zope Python scripts and DTML documents, on the other hand, are dragged through many layers that make sure you respect Zope's access rights, don't do anything bad, and are generally well-behaved. And they make you slower. Products are worth it - but they cost work and, unlike the other tips, aren't always feasible.
  • Additionally, it has proven useful not to put too much data in the ZODB, especially nothing that expands it - the ZODB only gets bigger, it only gets smaller when packing. After some time you easily have a ZODB in the GB range and shouldn't be surprised by slow server starts...
  • If more complex processing happens in the system, it can make sense to move it out completely. I always use TooFPy for that: simply turn all the more complex stuff into a tool and stick it in there - the code runs at full speed. Then access the tool server from Zope with a SOAP or XMLRPC client and execute the functions there (a minimal sketch of this follows the list). Yes, the extra XML conversion is actually less of a problem than running complex code in Zope - especially if that code demands considerable runtime. Zope then blocks one of its listeners, and their number is static. Simply raising it doesn't help either - thanks to the global interpreter lock, more threads would just wait for the lock to be released (e.g. around every C extension that gets used). There's a good, fast C implementation of XMLRPC that can be integrated into Python, which makes the XML overhead problem irrelevant.
  • If you use PostgreSQL as a database: use PsycoPG as the database driver. Session pooling really gets Zope going. Generally you should check whether the corresponding database driver supports some form of session pooling - if necessary via an external SQL proxy. Otherwise, Zope might hang the entire system during SQL queries because a heavy query waits for its result. Many have already fallen into this trap and learned that 16 Zope threads doesn't necessarily mean 16 parallel processed Zope accesses when SQL databases are involved.
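
A minimal sketch of the offloading idea from the list above, using Python's standard xmlrpc modules rather than TooFPy itself; the function name and port are invented for illustration:

    # Runs as its own process, outside Zope, so it never ties up a Zope listener.
    from xmlrpc.server import SimpleXMLRPCServer

    def render_report(month):
        # placeholder for some expensive computation you don't want inside Zope
        return "<h1>Report for %s</h1>" % month

    server = SimpleXMLRPCServer(("127.0.0.1", 9000), allow_none=True)
    server.register_function(render_report)
    server.serve_forever()

From Zope, an external method or product then only makes one short call, e.g. xmlrpc.client.ServerProxy("http://127.0.0.1:9000/").render_report("2005-01"), and the heavy lifting happens outside the Zope process.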

Of course there's a lot more you can do, but the above are largely manageable on the fly and mainly depend on you having enough memory in the server (and possibly a multiprocessor machine - but it works without one too). Memory is important - the more the better. If you can, just put more memory in. You can't have too much memory...

What to do if even all that's not enough? (Yes, I've been there - sometimes only the really heavy-handed approach helps.) Well, in that case there are variations of the above techniques. My favorite in this area is active caching. By that I mean defining in one place which documents should be actively cached. This requires a script on the machine that fetches those pages from Zope and puts them in a directory; Apache rewrite rules then make sure the static content is what gets served to the outside. Basically you ensure that the most frequently visited pages that are suitable for this technique (i.e. containing no personalization, for example) simply go out as static pages, no matter what else happens - the normal caching techniques just aren't brutal enough, too much traffic still gets through to the server.

Another step is of course the use of additional machines - simply put more machines alongside and connect them using the ZEO technique.

Zope is fantastic software - especially the tight integration of development environment, CMS, and server is often incredibly practical, and the easy integration of external data sources is also very nice. But Zope is a resource hog, there's no other way to put it.

Cyclic Dependencies

Debian has a wonderful package system. And it has a whole range of very useful tools to make backports easier - for example, by using debootstrap to set up a chroot environment where you can safely gather the packages you need for the build and then create a corresponding package. I've used the whole thing several times, it's really great.

However, it can sometimes drive you crazy. I wanted to install the latest SQLite from Debian Testing. To do that, I first need the necessary tools to build the package. Since I had just set up a new chroot environment, not everything was there yet - for example, I was missing cdbs, a very powerful (and by now widely used) tool for easy creation of Debian packages. I had ported it once before, but I thought the opportunity was good to build a current version.

Or so I thought. It started off quite harmlessly - for the documentation it needs springgraph - a tool for formatting graphs. The tool itself actually has no build dependencies (except for the mandatory debhelpers). Fine. It also builds very quickly. When installing it, it complains about missing Perl modules for the GD2 integration. Okay, porting Perl modules is often tedious, but this one actually looked quite simple. A series of build dependencies, sure, but otherwise harmless. Except for the fact that it needs cdbs to build.

Aaaaarghl!!!!

Okay, I know what you have to do. Still. Sometimes I get the feeling that the Debian maintainers secretly get together to drive me crazy.

DNS Stuff: DNS tools, WHOIS, tracert, ping, and other network tools.

A whole bag full of tools around nameservers, reachability etc. Very practical when you want to quickly check whether the reverse resolution of the server address also works reliably from outside. Or when you want to test a whole set of RBLs against an IP (I found rbls.org for that recently). Email tests. Routing information. And more ...
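
The lookup these tools automate is simple enough to do by hand: reverse the octets of the IP address, query the result as a hostname under the blocklist's zone, and a name that resolves means the IP is listed. A quick sketch; the zones are just examples:

    import socket

    def listed_on(ip, rbl_zone):
        """Check one IPv4 address against one DNS blocklist: reverse the octets
        and look the result up under the RBL zone. A resolving name means 'listed'."""
        reversed_ip = ".".join(reversed(ip.split(".")))
        try:
            socket.gethostbyname(reversed_ip + "." + rbl_zone)
            return True
        except socket.gaierror:
            return False

    # example zones only - check the lists you actually care about
    for zone in ("sbl.spamhaus.org", "bl.spamcop.net"):
        print(zone, listed_on("127.0.0.2", zone))   # 127.0.0.2 is the conventional test entry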

Google receives patent on search term highlighting

Google Gets Patent on Search Term Highlighting - and this means my website violates exactly this patent. Thanks to the Search Highlight plugin for WordPress (which comes as standard), search terms are highlighted in color when visitors come to my pages from a search engine. Well, sue me then, Google ...

Patents are problematic enough as it is, but such trivial patents are just infuriating.

PECL :: Package :: APC - PHP caching system, Open Source (no weird stunts like phpAccelerator and not as dead as turck mmCache)

SURBL -- Spam URI Realtime Blocklists - Real-time blocking list that can check hostnames from URLs.

Caching for PHP Systems

Caching Strategies for PHP-Based Systems

There are basically two ways to implement caching in a PHP-based system. Okay, there are many more, but two main approaches are clearly identifiable. I've compiled what's interesting in this context - especially since some colleagues are currently suffering under high server load. The whole thing is kept general, but for understandable reasons also considers the specific implications for WordPress.

  • Caching of pre-compiled PHP pages
  • Caching of page output

There are numerous variations of both main approaches. PHP pages live on the web server as source code - unprocessed and not optimized for loading in any way. With complex PHP systems, every PHP file gets parsed and compiled into internal code on every request. For systems with many includes and large class libraries, this adds up to quite a bit. The first main direction of caching starts exactly here: the generated intermediate code is simply stored away, either in shared memory (memory blocks available to all processes of a system) or on the hard disk. There are a number of solutions in this space - I personally use turck-mmcache, mainly because it caches on disk rather than only in shared memory (which, as far as I know, is what the other comparable solutions do), because there is a Debian package for it, and because I've had relatively few negative experiences with it so far (at least on Debian stable - on Debian testing it's another story, with PHP applications crashing on you). Since WordPress is built on a fairly large set of library modules with substantial amounts of source, such a cache does quite a bit to reduce WordPress's baseline load. And since these caches are usually completely transparent - no visible effects except the speed improvement - you can generally just switch one on.

The second main direction for caching is the intermediate storage of page output. There's a complication here: pages are often generated dynamically depending on parameters - so a page doesn't always produce the same output. Just think of mundane things like displaying the username when a user is logged in (and has a cookie stored for it). Page contents can also differ because of HTTP Basic Authentication (the login technique where the popup window for username and password appears). And POST requests (forms that don't send their contents via the URL) produce output that depends on that data as well.

Basically, an output cache must consider all these input parameters. A good strategy is often not to cache POST results at all - because error messages etc. would also appear there, which depending on external sources (databases) could produce different outputs even with identical input values. So really only GET requests (URLs with parameters directly in the URL) can be meaningfully cached. However, you must consider both the sent cookies and the sent parameters in the URL. If your own system works with basic authentication, that must also factor into the caching concept.

A second problem is that pages are rarely purely static - even mostly static pages usually contain elements you'd rather have dynamic. Here you need to make a significant decision: is purely static output enough, or do you need a mix? On top of that, you still have to decide how page updates should be handled - how does the cache notice that something has changed?

One approach you can pursue is a so-called reverse proxy. You simply put a normal web proxy in front of the web server so that all access to the web server itself is technically routed through this web proxy. The proxy sits directly in front of the web server and is thus mandatory for all users. Since web proxies should already handle the problem of user authentication, parameters, and POST/GET distinction quite well (in the normal application situation for proxies, the problems are the same), this is a very pragmatic solution. Updates are also usually handled quite well by such proxies - and in an emergency, users can persuade the proxy to fetch the contents anew through a forced reload. Unfortunately, this solution only works if you have the server under your own control - and the proxy also consumes additional resources, which means there might not be room for it on the server. It also heavily depends on the application how well it works with proxies - although problems between proxy and application would also occur with normal users and therefore need to be solved anyway.

The second approach is the software itself - ultimately, the software can know exactly when contents are recreated and what needs to be considered for caching. Here there are again two directions of implementation. MovableType, PyDS, Radio Userland, Frontier - these all generate static HTML pages and therefore don't have the problem with server load during page access. The disadvantage is obvious: data changes force the pages to be recreated, which can be annoying on large sites (and led me to switch from PyDS to WordPress).

The second direction is caching from the dynamic application itself: on first access, the output is stored under a cache key. On the next access to the cache key, you simply check whether the output is already available, and if so, it's delivered. The cache key is composed of the GET parameters and the cookies. When database contents change, the corresponding entries in the cache are deleted and thus the pages are recreated on the next access.
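
The principle sketched in the paragraph above is the same in any language (illustrated here in Python rather than PHP): build the cache key from everything that influences the output, serve from the cache on a hit, and throw away the affected entries when content changes. All names are invented for illustration:

    import hashlib

    cache = {}   # in practice: files on disk or shared memory, not a per-request dict

    def cache_key(path, get_params, cookies):
        """The key must cover everything that changes the output:
        the URL, the GET parameters and the cookies that influence rendering."""
        raw = repr((path, sorted(get_params.items()), sorted(cookies.items())))
        return hashlib.sha1(raw.encode()).hexdigest()

    def render_cached(path, get_params, cookies, render):
        """Only GET output is cached; POST results should be rendered fresh every time."""
        key = cache_key(path, get_params, cookies)
        if key not in cache:
            cache[key] = render(path, get_params, cookies)
        return cache[key]

    def invalidate_all():
        """Primitive but effective: throw the whole cache away when content changes."""
        cache.clear()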

WordPress itself has Staticize, a very practical plugin for this purpose; in the current beta it's already included in the standard distribution. This plugin creates a cache entry for pages as described above, and takes parameters and cookies into account - basic authentication isn't used in WordPress anyway. The trick, though, is that Staticize saves the pages as PHP, so the cached pages are themselves dynamic again. This dynamism can be used to mark parts of the page with special comments - which allows dynamic function calls to be used for exactly these parts of the page. The advantage is obvious: the big costs of page creation, like loading the various library modules and reading from the database, are avoided entirely, while individual areas of the site can stay dynamic. Of course, the functions used for this must be written so they don't need WordPress's entire library infrastructure - but dynamic counters, displays of currently active users, or similar features can thus remain dynamic even in cached pages. Matt Mullenweg uses it, for example, to show a random image from his library even on cached pages. Staticize simply deletes the entire cache when a post is created or changed - very primitive, and with many files in the cache it can take a while, but it's very effective and pragmatic.

Which caches should you sensibly deploy, and how? With more complex systems, I would always check whether a PHP code cache can be used - Turck MMCache or Zend Optimizer or phpAccelerator or whatever else is out there.

I would personally only activate the application cache itself when it's really necessary due to load - with WordPress you can keep a plugin on hand and only activate it when needed. After all, caches with static page generation have their problems - layout changes only become active after cache deletion, etc.

If you can deploy a reverse proxy and the machine has the resources for it, it's certainly always recommended - if only because you then experience first-hand whatever proxy-related problems your own application might have, problems that would also bite every user sitting behind a web proxy. Zope in particular offers very good hooks to improve communication with a reverse proxy - there's a cache manager in Zope for exactly this. Other systems offer good foundations as well - but ultimately, any system that produces clean ETag and Last-Modified headers and correctly handles conditional GET (conditional requests that tell the server which version the client already has, so it only needs to see updated content) should be suitable.
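
The conditional-GET handling mentioned at the end boils down to comparing the validators the client sends against the current ones and answering 304 without a body if nothing has changed. A minimal sketch, not tied to any particular framework; request headers come in as a plain dict and last_modified is a Unix timestamp:

    from email.utils import formatdate, parsedate_to_datetime

    def conditional_get(request_headers, etag, last_modified):
        """Return (status, headers): 304 if the client's copy is still current,
        otherwise 200 plus the validators the client should send next time."""
        headers = {"ETag": etag,
                   "Last-Modified": formatdate(last_modified, usegmt=True)}

        if request_headers.get("If-None-Match") == etag:
            return 304, headers
        ims = request_headers.get("If-Modified-Since")
        if ims:
            try:
                if parsedate_to_datetime(ims).timestamp() >= last_modified:
                    return 304, headers
            except (TypeError, ValueError):
                pass                     # unparsable date - fall through to a full response
        return 200, headers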

heise Security - News - Uncovered and Charged

How companies try to compensate for their incompetence through power. I hope the French court rejects these absurd demands. Full disclosure is often the only way for customers to defend themselves against stubbornness and unwillingness on the part of manufacturers - history shows this nicely: companies (even industry giants like Sun) refused for years to acknowledge bugs and were ultimately only forced to act by mailing lists like Bugtraq.

Companies must finally understand that security doesn't work in a quiet back room, but is only real security when it can withstand public scrutiny and analysis. Security by obscurity is no security at all ...

Comment Spam

Since comment spam has been increasingly occurring on WordPress blogs lately and I don't want to have to react only after it lands in the spam folder, I've proactively installed Spam-Karma. It's a pretty powerful tool where, fortunately, you can disable a lot of options. I hope this will prevent what will certainly be an onslaught on my comment function at some point.

Of course, such a tool always has potential negative side effects. So if you can't get a comment through, there's still the regular feedback form which sends a completely normal - and unfiltered - email to me. As long as it makes it through my mail spam filter, I'll know what's going on (with 300-400 spams a day just at home, I can't guarantee that I'll notice an email that was mistakenly flagged as spam - though apparently not much gets lost that way, statistical spam filters do their job).

It's kind of strange how we have to artificially neuter our communication tools just because people tend to exploit anything that can be exploited eventually...

Update: after it ate a trackback from Schockwellenreiter, I've disabled it for now. The main problem was that the trackback was eaten with an error message that supposedly was fixed in exactly the version I'm using.

symbolics.com is the oldest still registered domain. Very cool. By the way, I have a Symbolics sitting in my room.

Apple - Mac mini

Come on, now tell us which blockhead at Apple accidentally stepped on a Cube

kasia in a nutshell: Spam breeds more spam

Kasia is conducting a fascinating experiment: she simply leaves two comment spam entries standing and waits for Google to index them. Less than 24 hours later, this entry was bombarded with spam - several hundred pieces.

One can conclude from this that the spambots work at least partially in two stages and that it really is about Google ranking. The first entry is, so to speak, a probe. If it stays up long enough to be found again via Google, the blog is a good place to spam: it's unattended and gets indexed quickly by Google. Ideal fodder for spammers.

Google is thus both an integral tool and a target for the spammers. One can certainly take some wind out of the spammers' sails by technically separating one's own comments (as my old blog did, where the comments were not only on a separate page behind a popup link, but also on a completely different web server) and by forbidding indexing of those comment URLs. You would still be hit by the probes, but the huge avalanche afterward should stay away.

This might also explain the Schockwellenreiter's problems: given its prominent position, Google presumably visits it very frequently, and if a spam comment once stays up long enough to be indexed (which can also happen through sheer luck, if the spammer hits just before Google's visit), the spammer adds the server to his spam lists. In principle he only needs to have found the Schockwellenreiter once via Google through one of his probe spams.

Now I just need to come up with a good way to implement all this for WordPress. Popup comments already exist, but I would also have to put them on a different virtual host and exclude search engines there via robots.txt.

Linux: Tuning The Kernel With A Genetic Algorithm

Cool - using genetic algorithms for kernel optimization, now that's something.

It's cool, man!

However, at some point we'll run into the problem that the kernel is smarter than its user ...

Validation of WordPress Postings and Comments - I should take a look at that. If you already have a validating blog, it should stay that way...

Problems with Firefox and Thunderbird on OS X 10.2

I recently wrote (P2984) about the problems I've been having with Firefox on OS X. In the meantime it has turned out what's causing them: the Codetek Virtual Desktop Manager. As soon as it's active (I constantly have lots of windows open and otherwise can't find anything in the mess - and no, Expose wouldn't really help either), both Firefox and Thunderbird show various misbehaviors:

  • after startup the menu is empty. You first have to click in the background and then back on the application window for the menu to work properly
  • the keyboard focus isn't always correct. Then you have to do the same as with the missing menu.
  • after switching desktops (or also when normally hiding and showing the application again) the window is completely empty - only resizing it brings the content back.

As I said, this only happens with the desktop manager. Unfortunately I can't use Expose because I don't have 10.3. Besides, it wouldn't solve my problem: I need many parallel workspaces in which I have all the windows open for the respective task. Expose would only handle that very inadequately.

Bummer.

confused face

QEMU CPU Emulator

Hey, I didn't know about this yet: an emulator for various CPUs with just-in-time compilation and support for a whole mix of target and host CPUs. For example, emulating an Intel chip on PPC. Or conversely a PPC on Intel. Or ARM on PPC. And Sparc as a target is already in the works.

Particularly interesting for Linux users: it can do user emulation or system emulation. The latter does what Virtual PC does - present a complete virtual machine. The former simply lets you run binaries built for a foreign CPU directly on your own machine - for example, running Intel binaries on a Linux PPC box - without full system emulation.

Due to the just-in-time compilation, the whole thing should also be significantly faster than Bochs. For OS X there's a graphical launcher that also handles the installation of qemu right away. Unfortunately only from OS X 10.3 onwards. Here's the original article.

RBL Test Pages for Multiple RBLs at Once

For those like me who don't have time to chase after thousands of RBLs (lists of possible or alleged spam relays) to check whether someone has mistakenly listed their own server there again, these two links offer good services: they check a large set of RBLs all at once. The first link is the faster one:

LXer: RANT_MODE=1: Current generation shells -- Will Microsoft Ever Fill The Needs of the Enter ...

Paul Ferris tears apart Microsoft's announcement of a great shell environment for Longhorn in 2007. And as I see it, he hits a nerve: a shell today (preferably one with a few decades of conceptual experience under its belt) is worth more than empty promises for 2007 ...

Here's the original article.

Easy-to-Remember PINs

And speaking of stupid: the British are currently introducing smart cards - credit cards with chips. And the credit card companies seem to be publicly recommending that people change the PIN for the chip card - that is, change the randomly generated PIN to a different, easier-to-remember one. Nice things like birthdays or lucky numbers.

I really only have one question about this: how high must the beef consumption have been among the people who came up with this stupid campaign?

At Schneier on Security you can find the original article.

Hartz IV: Disaster in Unemployment Benefit II Payment [Update]

Botch. Total botch. With such monster projects, you always do a test run with real data in advance - to avoid exactly these kinds of catastrophes. But these federal bunglers have already shown with other major projects that they might know a thing or two, but they have no clue about IT.

confused face

The problem at hand is a banal interface issue that shouldn't have come up at this stage of the project - unless the people implementing it are completely incompetent and stupid.

The original article can be found at heise online news.

eBay could not prevent password theft

We take this problem very seriously. - sure, and pigs can fly.

At heise online news there is the original article.

Everything new for OS/2

OS/2 - yes, there was such a sad system back then, long long ago

At heise online news there's the original article.

German WordPress Community

For WordPress there is a German community website with documentation, tips and tricks. Perhaps interesting for some of you - PHP still gives me hives, but if it has to be PHP and this glorified index-file handler called MySQL, then please something like WordPress. Here's the original article.

EFF & TOR

Good news: EFF will support TOR (The Onion Router). That's a good opportunity to point out the excellent guide on using TOR and Privoxy. With it, you can not only reliably cover your tracks (you can't erase anything, as becomes clear again and again - but you don't have to make it unnecessarily easy for people) but also defend yourself against overly curious websites. All in all, a very sensible thing.

Update: I've installed a tor server on simon.bofh.ms. If it doesn't completely eat up my bandwidth (I have 250 GB of transfer free on that server, which should be sufficient) and the server's performance doesn't suffer either, it will become a permanent installation. Projects like tor live off as many people as possible participating and contributing resources.

And tor is practically end-user friendly - although network speed over tor is of course not comparable to a raw network connection; concepts like onion routing always have performance costs. Still, it's quite usable - unlike Freenet, for example, where accessing sites becomes an absolute ordeal.

At raben.horst I found the original article.

EU Court President Confirms Sanctions Against Microsoft [Update]

Good then.

At heise online news there is the original article.

IRC, identd and Privacy

IRC and Privacy

IRC is fundamentally problematic from a privacy standpoint: on one hand, an IRC user reveals quite a bit of data through their client and connection - not necessarily more than with a web browser, but still enough to identify them. On the other hand, IRC is precisely the kind of place where people voluntarily say a lot about themselves - or at least claim to. So it makes sense that people want to appear anonymously on IRC - perhaps not in technical support channels, but there are other channels too.

So it seems natural to simply access the IRC network of your choice via Tor and thus achieve technical anonymization.

However, this runs into some specific problems with IRCNet in Germany: on one hand, connections are not accepted from arbitrary external machines, and on the other hand, identd user resolution is required. Both, of course, clash with anonymizing networks: when using them I cannot ensure that I always come out of a German node - the whole point of anonymization is precisely to spread access across the entire world.

Additionally, an identd query creates a problem of its own: it would have to be answered on the Tor server from which the connection exits. This can certainly be done - there are identd servers that simply return default values for every query. But it's still a strange situation: in order to access IRC, I have to allow access to my machine. Incidentally, this already causes trouble with firewalls that don't handle identd requests properly.
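
Such an identd that just returns default values is a tiny thing; per RFC 1413 the client sends a port pair and expects a single USERID line back. A minimal sketch (the returned username is obviously made up):

    # Minimal fake identd sketch (RFC 1413): answer every query with a default user.
    import socketserver

    class FakeIdentd(socketserver.StreamRequestHandler):
        def handle(self):
            query = self.rfile.readline().strip().decode("ascii", "replace")
            # a query looks like "6667 , 51234"; echo it back with a canned answer
            self.wfile.write(("%s : USERID : UNIX : nobody\r\n" % query).encode("ascii"))

    if __name__ == "__main__":
        # identd listens on TCP port 113 (needs root privileges or a port redirect)
        socketserver.TCPServer(("0.0.0.0", 113), FakeIdentd).serve_forever()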

The reason is of course clear: the network administrators want to ensure they have at least minimal control over what connects to their servers. An understandable requirement. On the other hand, this makes it difficult to operate, for example, help forums on the German IRCNet — I know from my own experience with a channel that it's absolutely not trivial for many users to configure their client accordingly. And anonymizing networks are completely left out.

I have no idea what the solution is here — except to move a help forum to a network that doesn't have these problems.

By the way, OS X users have another problem: IRC clients with SOCKS support (necessary for Tor) are few and far between. socat can help here — with it you can create a connection to a service via a SOCKS proxy without the client software having to support it. However, installing and using socat is not necessarily beginner-friendly. It's a shame that Apple hasn't implemented an appropriate mechanism in the operating system itself that would automatically use a SOCKS proxy — regardless of whether the client software supports it or not.

Who says safe computing must remain a pipe dream?

Bruce Schneier with a few tips about computer security. I want to share the most important one here, because I absolutely agree with it: If possible, don't use Microsoft Windows. Buy a Macintosh or use Linux.

devilish grin

Here's the original article.

Federal Court of Justice issues ruling on domain grabbing

Hmph. On the one hand good - because not everyone can simply have their domain taken away. On the other hand also bad - the domain grabbers will be happy now. And anyone who has dealt with such people knows how much fun it will be to pry a domain back out of the hands of someone who registered it purely as a speculative asset. Whether we'll soon have to expect conditions like those in the USA remains to be seen. With .com, .net and .org addresses, at least, you increasingly find nothing but generic placeholder pages from grabbers setting up shop there.

At heise online news there's the original article.

SCO vs. Linux: A Journalistic Revelation

The Latest Absurdities from Absurdistan

devilish grin

At heise online news you can find the original article.

Oops

devilish grin

At Die wunderbare Welt von Isotopp I found the original article.

Pro-Linux News: Daffodil Replicator becomes free software - Replicator that is database-agnostic and supports PostgreSQL among others

Criticism of ITU Proposals for Network Management

And again new brain cramps from the ITU - this time with obvious evidence of this bureaucratic monster's technical incompetence. Sorry, but geographically or politically oriented allocation of IP addresses is quite simply gross nonsense - merely submitting such a proposal disqualifies you from any further discussion of the Internet's technical matters ...

Yes, RIPE and other IP registries are certainly also geographically organized - but their organization is based on the rough geographic structure of network topology. Breaking that down to such silly concepts as nation states would be utter lunacy.

At heise online news you can find the original article.

ATM as Gaming Console!

Ouch. That almost hurts.

I found the original article at Uhu's Weblog Droppings.

Gap in Sun's Java Plug-ins Grants Access to the System

Holla the Forest Fairy! Now it's getting exciting - I wonder when the first cross-platform worm or virus will be based on this.

From a purely technical standpoint, something like that would certainly be pretty cool nonetheless

At heise online news you can find the original article.

Digital Lumber, Inc. - A complete nameserver in Python