sysadmin - 15.12.2010 - 18.5.2011

MichaelMacInnis/oh. A shell in Go. The shell has some interesting features, for example prototype-based object orientation, first-class functions, and explicit channels (which other shells have implicitly behind pipes). Looks quite interesting at first glance.

Javascript PC Emulator - Technical Notes. Yes, a PC emulator. Based on QEMU, so quite mature code. Boots Linux in the browser. Because it can.

ZenphotoPress is a WordPress plugin that lets you access images and galleries in ZenPhoto from WordPress. Since you can upload entire folders to ZenPhoto via FTP or other methods (e.g., by simply linking the albums directory to Dropbox) and thus easily get images into galleries, those images are then quickly and easily available in WordPress as well. Could be a nice tinkering project, as I'm still looking for simple ways to feed my photo blog from Lightroom.

Dropbox Lied to Users About Data Security, Complaint to FTC Alleges. Just a reminder: anyone using something like Dropbox (or any of the other services with similar functionality) should encrypt critical or personal content client-side (on Macs, sparse bundles are suitable). Even if a service promises that everything is encrypted and that no one can read the data, the service can simply lie. Or have a faulty implementation. The deduplication, the folder sharing and the fact that for some versions a public URL is generated for each file - so that in both of the latter cases people to whom you have never revealed your password get access to files - should make it clear that Dropbox must be able to decrypt server-side. Which of course does not make the misleading presentation on their advertising pages any better - yes, it was only an omission, but with security statements you had better say a bit more to make clear what you actually guarantee. If you leave out essential information, you should not be surprised when you are (rightfully!) called a liar. And especially in the USA, something like this can put a company in quite a predicament.

Metaowl is life! Wow, I just realized that June 16th is the 6th birthday of the Meta Owl! By now, almost 8700 posts have been collected. And in the meantime, the automatic caching of the posts has paid off: a few blogs have disappeared (my old muensterland.org address, for example, has been gone for a while), but the content (at least the texts) is still accessible. The whole thing has even survived several server moves unscathed.

Microsoft Near Deal to Buy Skype for Nearly $8 Billion - WSJ.com. Ugh. Skype is already quite a mess (unfortunately a necessary mess for me), but if Microsoft now "improves" it, it's going to be quite funny ...

obensonne / hg-autosync. An extension for Mercurial that implements automatic syncs between working directories via a central repository. It can be run manually as a command or in daemon mode (in which case it simply runs at fixed intervals). This way you get something like a controlled Dropbox - only the included files are synchronized. I would prefer a combination of inotify and XMPP notifications to the interval solution - that way the daemon would not have to wake up constantly. But perhaps something like this could even be built on top of it. Update: such a thing already exists.

philikon / weaveclient-chromium. Hmm, a Mozilla Sync client as an extension for Chrome. Unfortunately, the installation is not properly documented anywhere, and some comments on the net suggest that it probably does not run stably with newer versions. But maybe I'll still take a look if I find some spare time. With this I could then, for example, connect Chrome on Mac or Linux with Firefox Mobile on Android. Since on Android the stock browser can't even sync with Google's own desktop browser (which is really embarrassing), this might be something.

PayPal Money Module « Snoopy Pfeffer’s Blog. The original article in which Snoopy describes a bit more about the extensions to Adam Frisby's DTL PayPal Money Module. Not up to date regarding installation, but the features still fit.

SnoopyPfeffer/Mod-PayPal - GitHub. I should take a closer look at this, as it allows you to use PayPal as a money module in OpenSim. It might be interesting if I ever want to revive and share my OpenSim projects again. It is based on OpenSim 0.7.1, so I can only try it out when the new Diva D2 is released (which is already in the works).

inotify - get your file system supervised. Bookmarked for later - a daemon that automatically triggers scripts on file events. This could be used to implement automatic image imports via upload from Dropbox, for example.

PDP-11 emulator. In JavaScript. Runs Unix Version 6. Yes, just like that, with disk access and all the well-known programs from back then. Because there aren't enough strange things already.

iPhone Location Data Again

Once more on Apple's response to the movement profile allegations: why Apple is right, and why there is still a problem (though a significantly smaller one than the problem dramatized in the press).

Apple builds a database from the position data of iPhones with activated GPS - collected anonymously; so far there is no indication that it is not anonymous - in which the positions of networks are stored. Networks in this context are the GSM and 3G cell towers and the WLANs that the iPhone sees at that moment. However, this is not what is stored in the database that everyone is talking about. It is only the raw material from which the contents of that database are derived.

The data sent to Apple is averaged internally, and a "center" is determined for each network reported by various iPhones (the exact position of WLAN routers or cell towers is not simply available - it first has to be determined somehow). These results are stored in a large database at Apple. The position data therefore refers to the centers of radio identifiers. The original position data is only the raw material for the derived positions.

Using the visible radio identifiers, their position information and an average weighted by signal strength, the iPhone can now determine an approximate position - but this requires internet access, specifically access to the database at Apple. Therefore, the iPhone downloads the information about radio identifiers and caches it locally. Not the entire database, of course - that would be too much - but a relevant excerpt selected by algorithms. This is the database on the iPhone.

Apparently, the iPhone downloads not only the networks it currently sees but also neighboring networks - which makes sense, since the user moves around and the data for neighboring networks will potentially be needed (the iPhone does not know in advance where I am going). Presumably, the iPhone says "I see networks A, B, C" and the database then answers "here are the networks A-M from the metropolitan area where you are located". The iPhone then takes X% of A, Y% of B and Z% of C as a basis, calculates a rough position and says "here I am". If it then moves into range of network D, that network's position is already known and the iPhone can perform the position calculation directly without downloading anything.
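The "X% of A, Y% of B and Z% of C" step can be sketched as a weighted average over the cached network centers. This is only an illustration under assumptions: the network names, coordinates and weights are invented, and how Apple actually weights the data is not known from the statement.

```python
from typing import Dict, Tuple

# Hypothetical local cache excerpt: network id -> (latitude, longitude) of
# its estimated "center". All values are made up for illustration.
CACHED_NETWORKS: Dict[str, Tuple[float, float]] = {
    "A": (52.00, 7.00),
    "B": (52.02, 7.04),
    "C": (51.98, 7.02),
    "D": (52.05, 7.10),  # neighbor: already downloaded, not visible yet
}


def estimate_position(visible: Dict[str, float]) -> Tuple[float, float]:
    """Return the weighted average of the cached centers of visible networks.

    `visible` maps network id -> relative signal strength (any positive
    number). Networks that are not in the local cache are ignored.
    """
    known = {n: w for n, w in visible.items() if n in CACHED_NETWORKS and w > 0}
    total = sum(known.values())
    if total == 0:
        # Nothing usable in the cache: a real device would have to download.
        raise LookupError("no visible network is in the local cache")
    lat = sum(CACHED_NETWORKS[n][0] * w for n, w in known.items()) / total
    lon = sum(CACHED_NETWORKS[n][1] * w for n, w in known.items()) / total
    return lat, lon
```

Calling `estimate_position({"A": 0.5, "B": 0.3, "C": 0.2})` yields a point between the three centers. Moving into range of "D" needs no download because its center is already in the cache.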

In addition, the iPhone seems to keep a temporal history of these downloads - presumably the developer assumed that if the user has been somewhere before, chances are high that he will go there again. To this end, the iPhone keeps the data for one year. Apple's claim that the storage duration is a bug is probably something of a euphemism - presumably a developer simply picked a duration without considering how much would really be sensible. After all, in his understanding this was not special data, only technical caches for downloads that happen anyway when the user asks for his position.

What does this mean for the user? The data does not record the coordinates where he was - it only records where the radio identifiers are in whose vicinity he roughly was. And since it also contains neighboring networks, this is really very approximate. Of course, a rough spatial profile of the user can be derived from it - in my data, for example, I can indeed see that I have been in Amsterdam, in Frankfurt and in Berlin.

Conversely, it also means that a region only shows up if you had network reception with the ability to download there. I was in Copenhagen - there I had network access via the hotel, so traces of this are present. In Malmö and, at the turn of the year, in Russia I had GSM but no internet access - so the iPhone could not reach the location database and could not download radio identifiers with positions. This data is therefore completely missing from my iPhone, and there are no traces of Malmö, Yekaterinburg or Nizhny Tagil (the same should apply if you have activated airplane mode or simply turn off WLAN and mobile data).

Furthermore, the covered areas should become larger in more rural regions - there are few WLANs, so mainly GSM cells, and these have a larger range and are more widely scattered. If a cell is stored together with its neighbors, that already covers a fairly large area. In large cities, on the other hand, the covered area should be significantly smaller, simply because WLANs have much shorter ranges and there are more of them. Radio cells there are usually smaller too (a cell can only serve a finite number of users, and the user density in cities is higher).

This is particularly interesting for programmers: when you program, do you think about what can be derived from cached data? As a thought experiment, assume that someone has access to your DNS cache - which every system has internally, simply to reduce DNS queries. What picture of you could this technically harmless information produce? These are the small pitfalls that programmers like to stumble over. It starts out harmless enough - auxiliary data that you fetch from the network. Throw it away after use? Well, if it is needed again, it makes sense to keep the most frequently used entries at hand, right? And that is exactly when you run into problems like the one Apple currently has.

The discussion about why your browser cache contains porn pictures, should your wife find them there, could get quite interesting (say you read your mail with Outlook, opened a spam mail and had image display activated - not an outlandish situation!). The data no longer shows why it ended up where it ended up.

As stated in the title: I am referring here to Apple's answer and have only checked it against my own data. My own data matches the information in Apple's statement, and the statement itself is consistent - the contents and the stated purpose fit together quite well. I therefore see no reason to distrust the statement.

Apple's answer that the iPhone does not record the user's movement profile is therefore correct - it simply stores information for position determination as an alternative to GPS. At the same time, however, the data is at least a profile of the user's presence in large areas. Criticism is therefore quite appropriate. But in my opinion, the criticism should be smarter than "Apple stores the user's positions of the last year", because that is simply wrong.

But as Apple says in the introduction to its answer: these are technical relationships that are more complicated than simply "does Apple store a movement profile, yes/no". And our press has massive problems with questions whose answers take more than two sentences. "Apple stores data from which presence in large areas can be derived" just does not sound as great and catchy as a headline.

Unfortunately, this very imprecise reporting can itself cause problems: if I know that the data only covers regions I have been in, not the precise points where I stayed, then explaining why my Frankfurt data also includes the red light district (it is simply near the train station) is much easier than if I have to assume that these are all places I have actually been.

Apple must (and will, according to its own explanation) improve this - caching data for a year is nonsense. Backing up the data is also nonsense; it can simply be downloaded again if it is missing. Similarly, the data does not need to be stored if all location services are globally deactivated. It might also be generally interesting to have a switch "Pseudo-GPS yes/no" or something like that, with which this type of position determination can be deactivated - then the user simply has to wait until the GPS has a satellite fix. Likewise, in my opinion, the anonymous data collection for WLANs and cell towers should be switchable.

In my opinion, no cache should exist without a control function for it (just as you can empty the browser cache). Because one thing must be clear: since a cache has to link access times to the loaded data (only in this way can a cache with temporary storage work), every type of cache yields a kind of user profile. And this should be at least rudimentarily controllable by the user (in the sense of deleting). Building caches with a clear function and a UI for it as a matter of principle should become just as much a best practice as the encrypted storage of passwords on servers (hello Sony!).
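What such a controllable cache could look like, as a minimal sketch: entries carry their storage time (which is exactly what makes them a profile), expire after a deliberately chosen maximum age, and can be listed and emptied on request. The class and method names are hypothetical and not taken from any real system.

```python
import time
from typing import Any, Dict, List, Optional, Tuple


class ControllableCache:
    """Cache with expiry plus inspect and clear functions for the user."""

    def __init__(self, max_age_seconds: float) -> None:
        # The retention period is an explicit decision, not an accident.
        self.max_age_seconds = max_age_seconds
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any, now: Optional[float] = None) -> None:
        stored_at = time.time() if now is None else now
        self._entries[key] = (stored_at, value)

    def get(self, key: str, now: Optional[float] = None) -> Optional[Any]:
        now = time.time() if now is None else now
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if now - stored_at > self.max_age_seconds:
            # Too old: drop it instead of keeping a long-term history.
            del self._entries[key]
            return None
        return value

    def contents(self) -> List[Tuple[str, float]]:
        """Show the user which keys are stored and since when."""
        return sorted((key, stored_at) for key, (stored_at, _v) in self._entries.items())

    def clear(self) -> int:
        """Empty the cache on user request; returns the number of entries removed."""
        removed = len(self._entries)
        self._entries.clear()
        return removed
```

The `now` parameter exists only so that the behavior can be tried out without waiting; a UI would simply call `contents()` and `clear()`.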

Serious PSN hack: Personal customer data copied. Now it's out why PSN was offline for so long (not that it affected me particularly - I don't have a PlayStation - but the silence around the downtime was quite strange).

AWS Developer Forums: Life of our patients is at stake - I am ... - I hope this is a fake, but I fear it's actually true, that a company has been running life-critical monitoring systems for heart patients on EC2 without using multiple Availability Zones or having a failover plan ....

Alex Levinson has some interesting comments on the "new" discovery of the collected geodata on the iPhone. Apart from the fact that it is not Apple that collects the data, but only the user's own device and computer, it is quite interesting that this "new" discovery was so well known that Alex has spoken about it at conferences and it was already described in his book on iPhone Forensics at the end of December 2010. A printed book. One of those made of paper. Something that researchers should actually read when they investigate things. So they don't make themselves look ridiculous when they write hyped articles about topics that have been known for a long time, without referring to previous research on the subject ...

Gondor — effortless production Django hosting. Hmm, that sounds quite interesting - a tool for easier deployment including database migrations (via South). As far as I understand, it is tied to their infrastructure - so rather an alternative to Google AppEngine, directly based on Python.

Broadway update 3 « Alexander Larsson. No idea how I will use this or for what, but I want to! Run GTK+ applications as a client-server app with the interface in the browser - and Gimp already does it. Crazy.

Toshiba releases self-erasing drives. What could possibly go wrong.

Code rant: Message Queue Shootout!. Not a real shootout and only an incomplete selection of message queues. But still something interesting as a result: if you have nodes that already have their own persistence and transaction solutions, between which you just want to send messages as quickly as possible - there is nothing better than ZeroMQ. It is - due to its architecture - simply the fastest solution. And we are talking about really drastic differences.

NOSQL Databases. Excellent overview of all available NoSQL databases. Good starting point if you want to inform yourself about the available systems and their orientation and implementation.

BBC News - Net giants challenge French data law. Great idea from France: mandatory storage of plaintext passwords, so they can be handed over to any random authority. Federal Interior Minister Friedrich will probably like that, as he is so keen on all data ... (Discussions on reddit claim that the BBC misrepresented the situation: it is only about keeping account data for one year even after the account has been closed - if a service does not do this, it will be held liable for the activities of users who can no longer be identified - so not quite what you could read in the BBC article.)

HBase vs Cassandra: why we moved « Dominic Williams. A not entirely uninteresting blog post that dares to compare Hadoop/HBase with Cassandra and tries to highlight their different focuses. His conclusion: HBase is more for warehousing, Cassandra more for transaction processing. That alone would make something like Brix even more interesting if it could really combine these two aspects.

The Secrets of Building Realtime Big Data Systems. This is how I came across Cascalog and ElephantDB: a talk by the programmer of both projects about Big Data. He is also currently writing a book "Big Data". Could be very interesting.

Microsoft Shuts off HTTPS in Hotmail for Over a Dozen Countries | Electronic Frontier Foundation - shame on anyone who thinks ill of it. Surely pure coincidence that the list of affected countries reads like the "elite" of democratic states. The hypocrisy of large corporations is really only surpassed by FDP economics ministers.

WordPress › Really Static « WordPress Plugins. Blogged for reference, because it lets you generate static pages directly from WordPress (this could also be done with WP Super Cache and its directly cached pages, but those are not automatically updated), and perhaps this could be an interesting approach in the long run. Okay, I would probably have to do without some elements to make the whole thing work without "artefacts" - but many of them are actually dispensable. For example, a tag cloud that is part of the page would be frozen at the state of the last rendering. The same goes for information such as "latest comments" or "latest posts", and for calendars, which have more marked days on newer pages than on older ones. This is also the main reason why I have repeatedly abandoned baked sites - on the other hand, are these problem cases really important for a blog?

Vundle 0.7 is out. I usually use Pathogen, but Vundle has some features that make it quite interesting - maybe I should play around with it. On the other hand, I haven't made any updates and changes to my Vim installation for a long time. But since all vim.org scripts are now on GitHub, Vundle's GitHub integration is certainly very interesting.

Privacy advocates: Piwik instead of Google Analytics - that's a good start: concrete suggestions for what site operators should do if they want statistics. We should probably take a closer look at it at work, so we can recommend it to customers who ask for statistics.

By adding extra code to a digital music file, they were able to turn a song burned to CD into a Trojan horse. When played on the car's stereo, this song could alter the firmware of the car's stereo system, giving attackers an entry point to change other components on the car.

via With hacking, music can take control of your car | ITworld.

Check it out: pqc - PostgreSQL Query Cache. A PostgreSQL proxy that caches queries via a Memcache database to improve performance for recurring queries. Since it works as a proxy, it can also speed up applications that don't already implement caching on their own.
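The basic idea of such a caching proxy, as a minimal sketch: identical read queries are answered from a key-value store instead of the database. This is not pqc's actual code. A plain dict stands in for Memcache, the key derivation via SHA-256 is an assumption, and the "drop everything on any write" invalidation is deliberately naive.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Row = Tuple  # a result row, kept abstract for this sketch


class QueryCache:
    """Sits between an application and a backend that executes SQL."""

    def __init__(self, backend: Callable[[str], List[Row]]) -> None:
        self._backend = backend               # hypothetical "run this SQL" function
        self._store: Dict[str, List[Row]] = {}  # stand-in for Memcache
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(sql: str) -> str:
        # The same query text always maps to the same cache key.
        return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()

    def execute(self, sql: str) -> List[Row]:
        if not sql.lstrip().lower().startswith("select"):
            # Writes go straight through; cached results may now be stale,
            # so this naive sketch simply forgets all of them.
            self._store.clear()
            return self._backend(sql)
        key = self._key(sql)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        rows = self._backend(sql)
        self._store[key] = rows
        return rows
```

Because the cache sits in front of the backend, the application itself does not need to know about it, which is the point made above about applications without their own caching.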

Apple just can't do encryption

I fell for it again and thought, I'll just enable the encryption of iPad backups. Pretty stupid. I should have been warned by the debacles with the encrypted home directory. But of course, I did it again. Everything worked fine until today when the backup mess happened - it got stuck in the first step and just wouldn't proceed. Possibly corrupt backup files on the Mac. Ok, the standard procedure is to simply delete the backup in the settings under devices and create a new one. But that doesn't work if you have encryption enabled - it complains, naturally only after all the steps have been completed, that it can't make backups because no session with the iPad can be started. Huh?

And of course, I can't reset the password - it always claims it's wrong (even before I deleted the backup). My suspicion: the password is checked against the backup and if there isn't one, or it's defective, you can't perform a successful check. Resetting the password doesn't work, creating new backups doesn't work, and making iTunes forget the iPad also doesn't work. Before someone thinks they need to tell me I don't know the password: iTunes saves the password in the keychain if requested and yes, the password is the one I enter. And yes, that is definitely the correct one - the device identifier is saved as the account name with the password. And no, this exact password is of course not accepted...

Solution according to Apple? Completely reset the iPad and set it up again. Great, fantastic idea. Sure, much of my data is on my Mac, but over time data has accumulated on the iPad that is not on the Mac. And I would like to transfer that somehow.

By the way, normal backups and restores work - and with unencrypted backups, you can also create a new one if the backups are corrupted. But not if you have encryption enabled.

Frankly, this renewed experience with Apple's inability to build reasonably stable encryption solutions makes me rather skeptical about their full-disk encryption in the upcoming 10.7...

Update: after a few experiments (tested on another computer, iPad backup reconstructed from the TimeMachine backup and tried with it) I suspect the password is also noted on the device - and this note seems to be corruptible. Because even on another device, the definitely correct password is rejected as wrong, and another device also insists on making an encrypted backup (which makes sense, otherwise you could trivially get the data via a backup on another device). The problem is not that it protects itself against manipulation - the problem is that this crap can break and without any external signs - the backups have always worked fine so far, they are just suddenly worthless now (just like the data on the device).

Naked Password - jQuery Plugin to Encourage Stronger Passwords. Yes, that's what it says. The internet is very, very strange.

IP Addresses and Privacy

IP Address: Data Protectionists Target AdSense, Amazon Links, and IVW. I don't know, but I think this is slowly starting to overshoot the mark. Yes, data collection should be avoided when avoidable. And certainly, one should always keep in mind what a central player like Google can do with the data. But if this leads to, for example, the Google API Loader for jQuery no longer being usable because those requests also go to Google servers, or if, as here, complaints are made about Amazon affiliate links - which only access Amazon when clicked, not in general - then things are getting a bit hairy.

Then we are only a short step away from generally prohibiting links to pages from larger providers. Or absurdities, such as the idea expressed here of the illegality of using Google Mail in Germany. Yes, IP addresses are conditionally personally identifiable. And with IPv6, this will certainly become even more apparent (since there the reuse of IP addresses is not as mandatory as with IPv4). But the IP address is at the same time the central pivot of the Internet, and if one focuses too much on it, one eventually reaches the point where the highest data protection officer prohibits access to the international Internet because one thereby reveals one's IP address to computers outside Germany...

Data protection is to a large extent also the education of users and the self-responsibility of users - the latter can of course only be achieved with an appropriate level of knowledge. I would feel much better if the data protection authorities also produced useful output in the form of citizen information. But there it's somehow bleak.

So, discuss with the large providers and, if necessary, take them to court to force them to comply with data protection guidelines: yes. Public discussion about the problems and dangers: yes. Wildly attacking random forum operators: no.

Why is the data protection officer going after something as irrelevant as the forum mentioned in the article, and not after one of the big players in the forum business, such as Heise, Spiegel, Focus, or Golem? Too much respect for the reaction to be expected there?

Something smells fishy about the whole thing. Possibly we don't have all the information - but I can't think of what information might be missing that would make the whole thing an appropriate reaction.

lsyncd is somewhat like Dropbox but very simple. Essentially, it's a daemon that listens for directory changes via inotify and automatically triggers an rsync if needed to synchronize directory trees. Since you can sync any directories with it and can also intervene in the sync process via Lua integration in lsyncd, it could be useful for some loosely coupled sync situations (e.g., autonomous nodes in a very loosely coupled cluster or home servers that automatically sync to a server on the internet). Additionally, it offers functions similar to Hazel - you can assign various actions (not just sync) to different file change events.
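The principle (notice a change, then trigger rsync) can be sketched in a few lines. Note the swapped-in technique: instead of inotify, this sketch compares two directory snapshots, i.e. it polls, which is precisely what lsyncd avoids. The function names are hypothetical, and the rsync command is only built, not executed.

```python
import os
from typing import Dict, List


def snapshot(root: str) -> Dict[str, int]:
    """Map every file below root (relative path) to its modification time."""
    state: Dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            state[os.path.relpath(full, root)] = os.stat(full).st_mtime_ns
    return state


def changed_paths(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Files that were added, removed or modified between two snapshots."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))


def rsync_command(source: str, target: str) -> List[str]:
    """Argument list for mirroring source into target (archive mode, with deletes)."""
    return ["rsync", "-a", "--delete", source.rstrip("/") + "/", target]
```

A loop would take a snapshot, sleep, take another one and hand `rsync_command(...)` to `subprocess.run` whenever `changed_paths` is non-empty. With inotify the kernel reports the change instead, so nothing has to wake up at intervals.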

And for those who want an overview of various S3 file systems: Comparison of S3QL and other S3 file systems. The list also includes commercial packages and a simple feature comparison is made.

s3fs - I should check this out too: it would let me mount Amazon S3 on the server, so that accessing the server from outside via WebDAV would indirectly also reach S3. For my pictures, it might be sensible in the long run to have two offsite backups using different techniques. For this purpose, I should also take a look at the s3fslight fork, as it is said to work better with rsync, which would be interesting for automatic backups. Both store files directly in S3, so they can be trivially downloaded. On the other hand, both have problems with S3's eventual consistency: you have to run the synchronization multiple times, especially when you make many changes. Therefore, I should also take a look at s3ql, which provides a complete file system that only uses S3 as storage. This makes it more difficult to access file contents outside of s3ql - but in return you get things like deduplication and encryption (with cloud storage, storing things encrypted seems more appealing than storing them unencrypted, even if you can trust some providers more than others).

nginx HttpDavModule. I want to check it out to get easier access to my server from OSX - might be quite interesting for some things (e.g. backups of my pictures to my server to have an offsite storage location).

Advanced sign-in security for your Google account - Official Gmail Blog. Generally a good idea, as it makes the login - when used correctly - really more secure. But whether one overcomes one's inner laziness and actually uses it ... (I'm not even sure if I want to do this for my email)

Dispute: Telekom wants a uniform De-Mail domain name mandated by law - a state-subsidized scam free of any technical expertise. The entire De-Mail debacle can hardly be surpassed in absurdity.

Gravatars: why publishing your email's hash is not a good idea. This also explains why commenter avatars have disappeared from my blog again - not that I suffer from paranoia, but why open up the possibility of determining an email address for the sake of a gimmick?
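Why a published hash is a problem can be shown in a few lines. Gravatar identifies an avatar by the MD5 hash of the trimmed, lowercased email address, so anyone with a list of candidate addresses can simply hash them and compare. The addresses and the function `recover_email` are invented for illustration.

```python
import hashlib
from typing import Iterable, Optional


def gravatar_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lowercased address, as used in avatar URLs."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def recover_email(published_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Dictionary attack: return the candidate whose hash matches, if any."""
    for candidate in candidates:
        if gravatar_hash(candidate) == published_hash:
            return candidate.strip().lower()
    return None
```

The candidate list does not have to be clever: commenter names combined with a handful of common mail domains already cover a lot of ground, and no secret enters the hash that would prevent the comparison.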

using negotiate authentication (GSSAPI Kerberos) with Firefox. We have looked at this from time to time and wondered how to link Firefox to Kerberos logins.

Google: Bing Is Cheating, Copying Our Search Results. Interesting article, if this is true, Microsoft has made a pretty big (and embarrassing) mistake.

Java Hangs When Converting 2.2250738585072012e-308. PHP too. The solution to the puzzle in both cases: the number lies right at the lower edge of the normalized double range (just below 2.2250738585072014e-308, the smallest normalized double), and the conversion in Java and PHP works by successive approximation but starts from unfavorable values - the result is an infinite loop because the target value is never reached. And yes, this is critical, because you can send servers into a loop by entering such values into input fields that are converted to double. I also tried it with Python (CPython and PyPy), but they don't run into a loop; they simply deliver a slightly different value.
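The Python behavior can be checked with a small script. It assumes a CPython with correctly rounded string-to-float conversion (the default on common platforms); the names `TRICKY` and `describe` are made up for this sketch.

```python
import sys

# The literal from the bug report.
TRICKY = "2.2250738585072012e-308"


def describe(text: str) -> dict:
    """Convert the text to a float and relate it to the smallest normalized double."""
    value = float(text)  # returns immediately, no endless approximation loop
    return {
        "value": value,
        "equals_smallest_normal": value == sys.float_info.min,
        "hex": value.hex(),
    }


if __name__ == "__main__":
    for key, item in describe(TRICKY).items():
        print(key, item)
```

The "slightly different value" is simply the nearest representable double: the literal falls between two neighboring doubles and is rounded to one of them, which then prints with different final digits.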

CoRD: Remote Desktop for Mac OS X. Open Source and more Mac-like than the Microsoft version.

IPython as a system shell explains how the sh profile for IPython works. I need to play around with it, because Python as a system shell can be very helpful.

Bug 1044 – CVE-2010-4345 exim privilege escalation. The second part of the Exim march. This is the privilege escalation via Exim and an alternative config file. Because Exim is a monolithic server running under suid rights (i.e., starts with root rights even if executed as another user), there is a small time window in which the service always runs as root - and this is exploited through the alternative config file. The patch restricts the locations where these config files may reside and, combined with the configuration of write permissions on this location, can prevent non-root users from injecting their own configs.

Bug 787 – memory corruption in string_format code. Important if you're running a Debian version older than Lenny, as there are no more security updates available and you have to patch it yourself. This one closes the door. By the way, it's quite interesting to look at the date - it has been fixed since 2008, but due to the early discontinuation of security updates for outdated Debian releases, it is still present in many Debian systems based on Etch (and older). Debian is only recommended for use if you can actually keep up with every release change in a timely manner. Otherwise, solutions like Ubuntu LTS are by far the better choice. Apart from that, it's quite embarrassing that Lenny still had such an outdated Exim ...

rhodecode is something like bitbucket or github. Like bitbucket, it uses mercurial and offers various tools in the interface. The special thing? The code is free and thus something like Bitbucket for self-hosting. Maybe an alternative to Trac.

HP Storage Hardware Harbors Secret Back Door | threatpost - hopefully puts an end to the regular "we need to switch to HP because it's much better than NetApp" discussions. And yes, that was sarcasm.

Nicholas Piël » Benchmark of Python Web Servers. Very interesting benchmarking, I definitely have to take a look at gevent, the performance in the tests is already impressive. Update: after I looked at gevent - I am impressed. For web services you have to be careful: gevent.wsgi only supports GET and POST, only gevent.pywsgi also supports PUT and DELETE.