WordPress : Tackling Comment Spam is a fairly comprehensive compilation of various approaches to combat comment spam and trackback spam in WordPress.

WordPress NoFollow Plugin

The WordPress NoFollow Plugin adds rel="nofollow" to links in comments so that they no longer count toward Google ranking. I personally find it a shame that links in comments are generally no longer followed, because this removes a useful opportunity for smaller blogs to promote themselves through active discussion on other blogs. Okay, in the end it's not that bad, but somehow a small piece of the "one link washes the other" mentality of blogs is lost... One small snag is that the author has linked directly to the plugin file, and unfortunately his server executes the PHP instead of delivering it. At the moment you can't download it; you only get an empty HTML page.

Candidate for the award of the most absurd WordPress plugin: a Code39 barcode generator. Well, maybe then people who print out websites can actually do something with it.

ChapterZero » IllustRender - you can also take the previous approach to extremes with LaTeX: this one even embeds graphics via Ghostscript ...

Using LaTeX in WordPress » LatexRender as a plugin - yes, it is what you would expect: an integration of LaTeX into WordPress. Weird.

Morganically Grown » MiniPosts Plugin for WordPress - a plugin for blogmarks - these small titleless postings. I still make those with a patched template and my own category.

no status quo » RunPHP WordPress Plugin executes PHP code directly in posts. This gives you something like <?php echo "Macros"; ?> in WordPress.

PHP Markdown - newer version than in WordPress CVS. But I've sworn off Markdown - its performance cost was sometimes absurdly high.

Blogs - the new money machine?

A plugin I certainly won't install:

BlogMine enables content targeted ads in both feeds and web pages, simplifies and increases revenue generation for bloggers. The service provides a universal way to monetize all blog related content, regardless of whether it is published to the web or as an RSS feed.

Caching for PHP Systems

Caching Strategies for PHP-Based Systems

There are basically two ways to implement caching in a PHP-based system. Okay, there are many more, but two main approaches are clearly identifiable. I've compiled what's interesting in this context - especially since some colleagues are currently suffering under high server load. The whole thing is kept general, but for understandable reasons also considers the specific implications for WordPress.

Caching of pre-compiled PHP pages
Caching of page output

There are numerous variations of both main approaches. PHP pages exist on web servers as source code - unprocessed and not optimized in any way for loading. When a complex PHP system runs, every PHP file is parsed and compiled into internal code on each request. With systems that have many includes and many class libraries, this can be quite substantial.

The first main direction of caching starts at this point: the generated intermediate code is simply stored away, either in shared memory (memory blocks that are available to many processes of a system collectively) or on the hard disk. There are a number of solutions here - I personally use turck-mmcache. The main reasons are that it caches on disk rather than in shared memory (which, as far as I know, the other similar solutions can also do), that there is a Debian package for it, and that I've had relatively few negative experiences with it so far (at least on Debian stable - on Debian testing things are different, and PHP applications crash on you).

Since WordPress is based on a larger set of library modules with quite a lot of source code, such a cache does quite a bit to reduce WordPress's baseline load. And since these caches are usually completely transparent - with no visible effects except for the speed improvement - you can generally just enable one.

The second main direction for caching is the intermediate storage of page contents. There's a complication here: pages are often generated dynamically depending on parameters - and therefore a page doesn't always produce the same output. Just think of mundane things like displaying the username when a user is logged in (and has stored a cookie for it). Page contents can also differ because of HTTP Basic Authentication (the login technique where the popup window for username and password appears). And POST requests (forms that don't send their contents via the URL) also produce output that depends on the submitted data.

Basically, an output cache must consider all these input parameters. A good strategy is often not to cache POST results at all - because error messages etc. would also appear there, which depending on external sources (databases) could produce different outputs even with identical input values. So really only GET requests (URLs with parameters directly in the URL) can be meaningfully cached. However, you must consider both the sent cookies and the sent parameters in the URL. If your own system works with basic authentication, that must also factor into the caching concept.
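To make the rule above concrete, here is a minimal sketch in Python (chosen only for brevity; the idea carries over to PHP unchanged). The function name cache_key and its parameters are my own assumptions for illustration, not part of any existing plugin: only GET requests get a key, and the key covers the path, the URL parameters, the cookies and, if used, the basic-auth user.

```python
import hashlib
from typing import Mapping, Optional


def cache_key(method: str, path: str, query: Mapping[str, str],
              cookies: Mapping[str, str],
              auth_user: Optional[str] = None) -> Optional[str]:
    """Return a cache key for a request, or None if it must not be cached.

    Hypothetical helper: the name and signature are assumptions.
    """
    if method.upper() != "GET":
        return None  # POST results (error messages etc.) are never cached
    parts = [path]
    # Sort the items so that parameter order does not change the key.
    parts += [f"q:{k}={v}" for k, v in sorted(query.items())]
    parts += [f"c:{k}={v}" for k, v in sorted(cookies.items())]
    # The basic-auth user (if any) is part of the key as well.
    parts.append(f"u:{auth_user or ''}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

In practice you would probably feed in only the cookies that actually influence the output, otherwise every visitor with a tracking cookie gets a private cache entry.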

A second problem is that pages are rarely purely static - even static pages usually contain elements that you'd prefer to keep dynamic. Here you need to make a significant decision: is purely static output enough, or do you need a mix? Furthermore, you still need to decide how page updates should be handled - how does the cache notice that something has changed?

One approach you can pursue is a so-called reverse proxy. You simply put a normal web proxy in front of the web server so that all access to the web server itself is technically routed through this web proxy. The proxy sits directly in front of the web server and is thus mandatory for all users. Since web proxies should already handle the problem of user authentication, parameters, and POST/GET distinction quite well (in the normal application situation for proxies, the problems are the same), this is a very pragmatic solution. Updates are also usually handled quite well by such proxies - and in an emergency, users can persuade the proxy to fetch the contents anew through a forced reload. Unfortunately, this solution only works if you have the server under your own control - and the proxy also consumes additional resources, which means there might not be room for it on the server. It also heavily depends on the application how well it works with proxies - although problems between proxy and application would also occur with normal users and therefore need to be solved anyway.

The second approach is the software itself - ultimately, the software can know exactly when contents are recreated and what needs to be considered for caching. Here there are again two directions of implementation. MovableType, PyDS, Radio Userland, Frontier - these all generate static HTML pages and therefore don't have the problem with server load during page access. The disadvantage is obvious: data changes force the pages to be recreated, which can be annoying on large sites (and led me to switch from PyDS to WordPress).

The second direction is caching from the dynamic application itself: on first access, the output is stored under a cache key. On the next access to the cache key, you simply check whether the output is already available, and if so, it's delivered. The cache key is composed of the GET parameters and the cookies. When database contents change, the corresponding entries in the cache are deleted and thus the pages are recreated on the next access.
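The store-on-first-access and delete-on-change cycle described above could look roughly like this. It is a sketch in Python under my own assumptions: the class OutputCache and its method names are invented for illustration, the cache lives in memory only, and the cache key is treated as an opaque string.

```python
from typing import Callable, Dict, Iterable


class OutputCache:
    """In-memory page cache; a real one would persist to disk or a database."""

    def __init__(self) -> None:
        self._pages: Dict[str, str] = {}

    def fetch(self, key: str, render: Callable[[], str]) -> str:
        # First access: render the page and store it under the key.
        # Later accesses: deliver the stored copy without rendering.
        if key not in self._pages:
            self._pages[key] = render()
        return self._pages[key]

    def invalidate(self, keys: Iterable[str]) -> None:
        # Called when database contents change; the affected pages
        # are rebuilt on their next access.
        for key in keys:
            self._pages.pop(key, None)
```

The hard part is not shown here: working out which keys belong to a changed database row. That is why simply dropping everything is such a common shortcut.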

WordPress itself has Staticize, a very practical plugin for this purpose. In the current beta, it's already included in the standard distribution. This plugin creates a cache entry for pages as described above, and takes parameters and cookies into account - basic authentication isn't used in WordPress anyway. The trick, though, is that Staticize saves the pages as PHP. The cached pages are thus themselves dynamic again. This can be used to mark parts of the page with special comments, which allows dynamic function calls to be used again for these parts of the page. The advantage is obvious: while the big costs of page creation, like loading the various library modules and reading from the database, are avoided entirely, individual areas of the site can remain dynamic.

Of course, the functions for this must be structured so they don't need WordPress's entire library infrastructure - but dynamic counters, displays of currently active users or similar features can thus remain dynamic in the cached pages. Matt Mullenweg uses it, for example, to display a random image from his library even on cached pages. Staticize simply deletes the entire cache when a post is created or changed - very primitive, and with many files in the cache it can take a while, but it's very effective and pragmatic.
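The idea of keeping marked regions dynamic inside an otherwise cached page can be sketched as follows. This is Python rather than PHP, and the marker syntax <!--dynamic:name--> is a made-up stand-in, not the comment format Staticize actually uses; serve_cached and the function table are assumptions as well.

```python
import re
from typing import Callable, Dict

# Hypothetical marker syntax, invented for this sketch.
MARKER = re.compile(r"<!--dynamic:(\w+)-->")


def serve_cached(page: str, functions: Dict[str, Callable[[], str]]) -> str:
    """Fill the dynamic markers of a cached page at delivery time."""
    def fill(match):
        func = functions.get(match.group(1))
        # Unknown markers are replaced by nothing.
        return func() if func else ""
    return MARKER.sub(fill, page)
```

The functions in the table have to be cheap and self-contained, otherwise the savings from the cache are gone again.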

Which caches should you sensibly deploy and how? With more complex systems, I would always check whether I can deploy a PHP code cache - so turck-mmcache or Zend Optimizer or phpAccelerator or whatever else there is.

I would personally only activate the application cache itself when it's really necessary due to load - with WordPress you can keep a plugin on hand and only activate it when needed. After all, caches with static page generation have their problems - layout changes only become active after cache deletion, etc.

If you can deploy a reverse proxy and the resources on the machine are sufficient for it, it's certainly always recommended. If only because you then experience for yourself the problems your own application might have with proxies - and which would also cause trouble for every user behind a web proxy. If you use Zope, for example, there are very good opportunities to improve the communication with the reverse proxy - a cache manager is available in Zope for this. Other systems also offer a good basis for this - but ultimately, any system that produces clean ETag and Last-Modified headers and correctly handles conditional GET (requests in which the client says which version it already has locally and only wants the content if it has changed) should be suitable.
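What "correctly handles conditional GET" means can be shown in a few lines. The following Python sketch is deliberately simplified and rests on my own assumptions: the function name respond is invented, the ETag is just a hash of the body, weak validators and "*" are ignored, and last_modified must be a timezone-aware UTC datetime.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Mapping, Tuple


def respond(body: bytes, last_modified: datetime,
            request_headers: Mapping[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body); status is 304 if the client copy is current."""
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {"ETag": etag,
               "Last-Modified": format_datetime(last_modified, usegmt=True)}
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        if etag in [tag.strip() for tag in if_none_match.split(",")]:
            return 304, headers, b""
        return 200, headers, body
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            # HTTP dates have second resolution, so drop the microseconds.
            if last_modified.replace(microsecond=0) <= parsedate_to_datetime(since):
                return 304, headers, b""
        except (TypeError, ValueError):
            pass  # unparsable date: fall through and send the full page
    return 200, headers, body
```

A proxy in front of such an application can then revalidate its copy with a cheap 304 instead of fetching the full page every time.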

Glossary for WordPress

I've written a small WordPress plugin that implements a glossary similar to Radio Userland or PyDS. The glossary simply replaces text that is delimited by | (pipe) symbols with replacement text (which can also contain XHTML markup). Saves typing ...
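The replacement step itself is small. Here is a sketch of the idea in Python, not the plugin's actual PHP code; the function name expand_glossary and the choice to leave unknown terms untouched are my assumptions.

```python
import re
from typing import Mapping


def expand_glossary(text: str, glossary: Mapping[str, str]) -> str:
    """Replace |term| with its glossary entry; unknown terms stay as written."""
    def replace(match):
        term = match.group(1)
        # match.group(0) is the original text including both pipes.
        return glossary.get(term, match.group(0))
    return re.sub(r"\|([^|\n]+)\|", replace, text)
```

For example, with a glossary entry wp pointing at a WordPress link, the text "I use |wp| daily." comes out with the link in place of |wp|.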

The plugin installs a small management page in the WordPress backend, so the whole thing only works with WordPress 1.5 (or possibly 1.3). The required database table is created automatically after plugin activation, when you first access the management page.

Comment Spam

Since comment spam has been increasingly occurring on WordPress blogs lately and I don't want to have to react only after it lands in the spam folder, I've proactively installed Spam-Karma. It's a pretty powerful tool where, fortunately, you can disable a lot of options. I hope this will prevent what will certainly be an onslaught on my comment function at some point.

Of course, such a tool always has potential negative side effects. So if you can't get a comment through, there's still the regular feedback form which sends a completely normal - and unfiltered - email to me. As long as it makes it through my mail spam filter, I'll know what's going on (with 300-400 spams a day just at home, I can't guarantee that I'll notice an email that was mistakenly flagged as spam - though apparently not much gets lost that way, statistical spam filters do their job).

It's kind of strange how we have to artificially neuter our communication tools just because people tend to exploit anything that can be exploited eventually...

Update: after it ate a trackback from Schockwellenreiter, I've disabled it for now. The main problem was that the trackback was eaten with an error message that supposedly was fixed in exactly the version I'm using.

Canned !! -- my Atropine » iG:Syntax Hiliter - and here's another WordPress plugin that uses Geshi right away.

kasia in a nutshell: Spam breeds more spam

Kasia is conducting a fascinating experiment: she simply leaves two comment spam entries standing and waits for Google to index them. Less than 24 hours later, this entry was bombarded with spam - several hundred pieces.

One can therefore conclude that the spambots work at least partially in two stages and that it really is about Google ranking. The first entry is, so to speak, a test entry. If it remains standing so that it can be found again via Google, it is an entry where one can spam well - it is unattended and is indexed quickly by Google. Ideal fodder for spammers.

Google is thus both an integral tool and a target for the spammers. You can certainly take some wind out of the spammers' sails by technically separating your comments (as my old blog did, where the comments were not only on a separate page behind a popup link, but also on a completely different web server) and by prohibiting indexing of these comment addresses. You would still be hit by the test samples, but the gigantic wave afterward should fail to appear.

This could possibly also explain the Schockwellenreiter's problems: because of its prominent position, Google probably visits it very frequently, and if a spam comment ever stays up long enough to be indexed (which could also happen purely by luck, if the spammer posts just before Google's visit), the spammer adds the server to his spam lists. In principle, he only needs to have found the Schockwellenreiter once via Google when searching for his test spams.

Now I just need to come up with a good idea how to implement the whole thing for WordPress. Popup comments already exist, but I would also have to place it on a different virtual address and exclude search engines there via robots.txt.
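The robots.txt part, at least, is simple. A sketch of what the separate comment host would serve, plus a quick check with Python's standard robot parser that the rule really blocks everything; the host name comments.example.org and the helper is_crawlable are placeholders of mine.

```python
from urllib.robotparser import RobotFileParser

# robots.txt for the separate comment host: keep all crawlers out.
COMMENTS_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check whether a well-behaved crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Of course this only keeps out crawlers that respect robots.txt; the spambots themselves won't care, but they are not the ones doing the indexing.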

sYp » Syntax Highlighting with Enscript in WordPress is another WordPress plugin that uses enscript for formatting here.

Validation of WordPress Postings and Comments - I should take a look at that. If you already have a validating blog, it should stay that way...

Plugins/Staticize « WordPress Codex - Mini guide to Staticize. Bookmarked only so I can find it again with mfunc.

What happened here then?

Well, some of you may have noticed: something has changed here. Namely, I've switched my weblog from my own software - PyDS - to WordPress. Why? Well, there are many reasons. Not the least of them is: because I can. But the more technical ones are a bit more complicated:

It starts with the database. PyDS uses a rather peculiar database, namely Metakit. Metakit is nice when you have small and compact data, but not so nice when it grows. Eventually it starts to behave strangely. Under certain circumstances it shreds the data. With my nearly 4000 articles, I'm still far from that point, but you don't have to push your luck and wait until the last minute, right?

Then there's the concept of PyDS rendering all articles as static content. That's also quite great, because the files are of course delivered much faster than if they came from a database. Unfortunately, with nearly 4000 posts, you have to wait quite a while for everything to be generated if you make a layout change. For that reason I already have a cronjob that regenerates everything every night. But somehow that's still pretty weird, so away with it.

Besides, I'll eventually want to switch back to my own software - but to do that I'd have to migrate the data anyway. Now I no longer have it in that somewhat wobbly Metakit, but in MySQL. Yes, I know, MySQL sucks dead hamsters through clogged straws. I say that myself all the time. Anyway, it's still significantly better than Metakit. And my new software is currently still just a pipe dream. I don't even know if I'll really want to write it...

Additionally, WordPress has a number of nice features and PyDS has become a bit baroque in its internal structure over time - for example, PyDS can't handle hierarchical categories and the categories of Blogmarks and Weblog overlap. Now everything is in one common pot and that's that.

Otherwise, I've had WordPress in use for quite a while already and was very satisfied with version 1.2 - although it was still very sparse in terms of features. Version 1.5 is now quite impressive. Ok, it's still a pure beta, but it's already good enough for normal use. I've noticed a few small bugs so far, nothing serious or critical.

Let's see what happens. I should have redirected everything possible from the old stuff to the new stuff. So RSS feeds should continue to work and links to old posts should also be properly redirected. If anyone notices anything that doesn't work but should, or has any other comments: you know where to find the comment function here.

Aside from that, PyDS has been in existence for almost 2 years now. So it's time something changed - PyDS itself will of course continue to be developed. And is of course still available, nothing changes there. I still have it in use on various other sites. It's just this weblog monster here that has simply outgrown the system.

German WordPress Community

For WordPress there is a German community website with documentation, tips and tricks. Perhaps interesting for some of you - I still get pimples from PHP, but if it has to be PHP and this glorified index-file-handler called MySQL, then please something like WordPress. Here's the original article.

Photo Matt - Bizarre Windows Behavior

Matt Mullenweg is really having fun with Windows: an automatic security update just rebooted his computer and ate a few hours of his work. Somehow I understand why I much prefer Apple's method, which just tells me that something is available instead of automatically pushing it to disk. Above all, it's absolutely stupid that an automatic update bypasses all application dialogs for saving open files. But all the Windows advocates will now surely provide a thousand reasons why this was all the user's fault. By the way, Matt is no novice or anything like that - he's the programmer of WordPress, and normally you can assume he has a reasonable level of computer competence. If even he has his system eat his data just like that, then this feature probably isn't that easy or obvious to disable, or isn't well documented. Here's the original article.

::jamesoff:: » Check RBL for WordPress 0.1 - Check comment accesses against RBLs - possibly interesting to filter spam access from the start?
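For reference, an RBL check works like this: the octets of the client's IPv4 address are reversed, the blacklist's zone is appended, and the resulting name is looked up in DNS; if it resolves, the address is listed. A sketch in Python, not the plugin's code; the zone dnsbl.example.org is a placeholder and the function names are my own.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up, e.g. 99.2.0.192.<zone> for 192.0.2.99."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also validates the address
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the blacklist has an entry for the address (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # Not listed, or the lookup itself failed; this sketch cannot tell them apart.
        return False
```

Since the lookup happens on every comment submission, a slow or dead blacklist server can hold up the comment form, so a timeout or a local caching resolver would be advisable.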

International Domains and WordPress

Because Textpattern and the browsers caused problems when I wanted to try TXP on an international domain (that thing with umlauts), I used WordPress instead. I already have quite a bit of experience with it. However, so far none with the UTF-8 character set, and none with international domains either.

Result: the same error as in the TXP Admin - the Apache header is not being set. Pretty annoying, since browsers nowadays - correctly - prefer the Apache header over the meta tag. And when you want to change the URL in the options from the automatically filled technical address (this xn-- stuff) to the correct international address (the one with umlauts), there are problems. The server does a redirect that doesn't work. If you correct that, the whole thing still doesn't work - it simply doesn't get saved.
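For anyone who hasn't met the two forms of such a domain: the "xn--" address is just the ASCII encoding of the name with umlauts, and the conversion is reversible. A small Python sketch using the standard idna codec; the helper names and the example domain are my own.

```python
def to_ascii(domain: str) -> str:
    """Convert an international domain name to its technical xn-- form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert the xn-- form back to the readable form with umlauts."""
    return domain.encode("ascii").decode("idna")
```

An application that stores its own URL would have to accept either form and treat them as the same address, which is apparently where things go wrong here.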

By the way, WordPress works with Opera - the only browser that handles international domains correctly - only to a very limited extent. The layout causes issues, and the problem described above occurs in Opera as well.

Somehow I have the feeling that you shouldn't run a CMS on international domains at all, but rather use these international domains only as a redirector for the actual main domain. Because not much else works reliably with these annoying things...

First Impressions of Textpattern

Apart from the fact that I first had to fix UTF-8 handling in Textpattern and international URLs don't work properly, I'm not really impressed by Textpattern. Sorry. But somehow it seems quite unfinished to me. Sure, it's a CMS and only incidentally designed for blogging - but where is a calendar? Where is time-based navigation? And the available plugins for that don't particularly excite me either.

You can upload images - that's the bare minimum. But file extensions are checked case-sensitively. And as a result, you can't upload images directly from the camera - on OS X they're usually copied with capital letters in the extension. Besides, images are also missing even the most rudimentary handling - creating thumbnails according to specifications, folder management, etc. The fact that there are translations is nice - but why are they only 90% complete? Help is available too - but not for every element. Sure, writing help texts is work. But if you have input fields like "closet" and "cupboard" in the advanced options of a post, you shouldn't be surprised by user questions. There's almost no documentation - at least none that I could find. I mean simple things like explaining what exactly Sections and Categories are supposed to achieve.

Up-to-the-minute hit logs and referrer logs are nice too - but why the heck are they just presented in raw form? I already have that in my web server logs. If I'm storing the hits anyway, I'd expect them to be intelligently filtered - for example, resolving article connections and generating summaries and overviews. Otherwise it's useless.

I couldn't find the bookmarklet that's supposed to be there for one-click adding of links. I find it more practical if something like that is available as a link for drag-and-drop. If I have to search for it somewhere first, it's just inconvenient. Especially since you can't search on the Textpattern homepage. And the documentation doesn't exist anyway, which of course makes searching in it difficult...

And with browser-based plugin installation, I'd expect at least that I can specify not just a file, but also a URL. Because why should I first download a plugin to my hard drive that I'm supposed to install on another website from the web?

The built-in search engine is nice enough for visitors, but it apparently doesn't search in the subject line. Why not? The subject line is predestined for searching.

All in all, Textpattern makes a very strange, unfinished impression on me. Many interesting approaches, but unlike, for example, WordPress, all of them somehow not fully thought through. Only sketched out. A shame, really - because visually Textpattern looks very impressive. WordPress, by comparison, seems downright prudish.

New image gallery plugin - needs testers - WordPress plugin for images in posts

WordPress 1.2

The final is out. However, trackbacking still doesn't work quite right - at least not when the target is a topic at TopicExchange. At WordPress WordBlog there's the original article.

WordPress Tinkering

Since I'm currently playing around with blog utilities and CMSes, my current WordPress installation has already gotten some content and layout improvements. Of all the alternatives for small sites, I still like it the best. For Drupal (my current favorite for larger sites), I might also find a use case.

Do I have too many domains and sites? Oh well.

Update: since I'm now running this blog with WordPress and the other one had become outdated, I simply shut it down. One less site to maintain...

WordPress Support › Static "like" pages - Discussion about creating pseudo-static pages (e.g. imprint) in WordPress

WordPress Wiki - Comment Moderation Plugin - Comment confirmation via email - could be interesting for TIMMY

WordPress Wiki - WP Plugins - WordPress Plugins for the new 1.2 Plugin Interface

wordpress - 17.5.2004 - 21.1.2005
