It's a delight to just switch off and configure out broken NT machines when they go bozo. OK, usually we try to repair stuff, but since the machine that broke down twice today is the last NT-based machine in our production environment, I wasn't that keen on getting it back to work. So I just switched the last three shops running on that box to dummy pages, unconfigured everything that monitors this POS, and am done with it. On Monday the shops will be transferred to a newer box running Linux.
But there is still a question: when a machine has automatic memory error detection and automatic bank disabling, why can't this POS just do what it is expected to do, switch off the broken memory bank, and carry on? It worked the last time, so why doesn't it work this time? Bah.