Brisk – Apache Hadoop™ powered by Cassandra | DataStax. I should also keep an eye on this, as someone is marrying Hadoop with Cassandra as a backend. This makes the higher-level layers of the Hadoop project usable with Cassandra's high performance and, above all, with its more flexible data updates and its eventually consistent model.
Programming - 28.1.2011 - 28.3.2011
HIVE: Data Warehousing & Analytics on Hadoop. A point that particularly interests me at the moment: analysis, especially BI, of large amounts of data. Hadoop provides Hive as a solution for this. Hive can be accessed not only via Thrift, but also via JDBC and ODBC.
Apache Thrift. Also worth a closer look: in principle something similar to Google Protocol Buffers, but more prevalent in the Apache environment. In areas like Hadoop it is therefore often the method of choice for accessing services from various languages. A small comparison of the two protocols. I like that Thrift allows not only a binary representation but also a JSON-based one - this makes integrating Thrift APIs into web solutions easier, as JSON is native to JavaScript.
The Secrets of Building Realtime Big Data Systems. This is how I came across Cascalog and ElephantDB: a talk by the programmer of both projects about Big Data. He is also currently writing a book "Big Data". Could be very interesting.
nathanmarz/elephantdb. Same author as for Cascalog, here he built a distributed Key/Value-Store on Hadoop with Clojure. Also not uncool.
nathanmarz/cascalog - take a closer look, a marriage of Clojure and Hadoop for easier evaluation of large data sets. The interesting thing about Cascalog: it draws ideas from Datalog and forms a query language for Hadoop data sets in Clojure.
JavaScript Quotations - interesting link about a metaprogramming feature for JavaScript. In this case for a very interesting JavaScript implementation: written in F# and for the CLR world, runnable under .NET and Mono.
Enterprise Java Development Tools | SpringSource. I should take a closer look, since J2EE and EJB alternatives came up recently, and this is one of the better-known ones.
Trinity - Microsoft Research. I should take a closer look at this, it sounds somewhat like distributed Redis (in-memory structures that are persisted) combined with a query semantics that is more based on graph relationships (comparable to RDF Triple Stores).
Programming, Motherfucker. Do you speak it?
Why Cloud9 Deserves your Attention - browser-based IDE in JavaScript on server and client. The source of the current version is available on GitHub.
Django-nonrel - NoSQL support for Django. Provides a first approach to integrating various NoSQL databases into Django at the level of the Django ORM. Backends for MongoDB (no thanks), AppEngine and Cassandra are in the works. Cassandra is particularly interesting to me at the moment.
Vundle 0.7 is out. I usually use Pathogen, but Vundle has some features that make it quite interesting - maybe I should play around with it. On the other hand, I haven't made any updates and changes to my Vim installation for a long time. But since all vim.org scripts are now on GitHub, Vundle's GitHub integration is certainly very interesting.
Programming Languages - Progopedia - Encyclopedia of Programming Languages. That was the programming language wiki I was looking for recently when the deletion frenzy struck Wikipedia again. I think I already mentioned it in the old blog.
Instagram now has official APIs. It completely passed me by. Maybe I can eventually get around Tumblr to get my Instagram pictures into the sidebar. On the other hand, Tumblr has been doing quite well lately, and why change something that works (the curse of any further development - good enough).
pdict.py at master from segfaulthunter/sandbox - GitHub. A PersistentHashMap for Python - so a functional data structure that does not allow changes, but provides a new structure with minimal change compared to an existing structure with substructure sharing to the original structure. A rather interesting implementation. There are also further explanations of the ideas behind it. And an alternative implementation of the same idea.
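To make the idea of substructure sharing concrete, here is a minimal sketch. It is not the hash-trie approach of pdict.py, just a plain unbalanced binary search tree with path copying, and the names Node, assoc and lookup are made up for this illustration. Only the nodes on the path to the changed key are copied; everything else is shared between the old and the new version.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    """One immutable tree node (hypothetical helper for this sketch)."""
    key: Any
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def assoc(node, key, value):
    """Return a new tree with key bound to value; the input tree is untouched."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        # Copy this node, rebuild only the left branch, share the right one.
        return node._replace(left=assoc(node.left, key, value))
    if key > node.key:
        # Copy this node, rebuild only the right branch, share the left one.
        return node._replace(right=assoc(node.right, key, value))
    return node._replace(value=value)


def lookup(node, key, default=None):
    """Find key in the tree, or return default if it is missing."""
    while node is not None:
        if key < node.key:
            node = node.left
        elif key > node.key:
            node = node.right
        else:
            return node.value
    return default


v1 = None
for k, v in [("m", 1), ("c", 2), ("t", 3)]:
    v1 = assoc(v1, k, v)

v2 = assoc(v1, "a", 4)  # new version; v1 remains valid and unchanged
```

After this, v1 still has no entry for "a", v2 has one, and the untouched right subtree is literally the same object in both versions (v2.right is v1.right).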
Threads are great, but not every problem is a nail
If you want to have a good laugh: Node JS and Server side Java Script. Here, someone from the Java camp complains that Node.JS really isn't to be taken seriously and then produces the best example why something like Node.JS (and many other alternatives for server programming) exists - because the Java code gets longer and longer with each step. And even after several iterations for an example that is quite simple to implement in Node.JS (or e.g. with gevent in Python), a few errors and gaps in the Java code are already mentioned in the first comments.
Don't get me wrong - Java has a lot of good solutions for programming with multiple threads in the standard library. Probably the largest selection of possibilities for programming with multiple threads of all currently available languages. But as so often in life: threads are not the answer to all questions of parallelization. Especially when it comes to high request load, the assessment in the comments that 20K threads are already very high is ridiculous - tell that to the programmers of Eve Online, where every ship in their virtual universe is modeled as a microthread.
Java is an interesting platform, precisely because it comes with many low-level libraries with which you can do very interesting things - and which are helpful to build reasonable high-level constructs on top of them. For example, in combination with languages like Clojure or Scala, the thread monster loses some of its terror. But sometimes the answer is not the thread, but asynchronous IO (both for disk access and network access) and the intensive use of coroutines or continuations.
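As a rough illustration of the coroutine approach, here is a sketch using Python's asyncio from the standard library (not Node.JS or gevent). handle_request is a made-up stand-in whose sleep plays the role of waiting on a socket or disk. The point is only that 20,000 concurrent "requests" run inside a single thread.

```python
import asyncio


async def handle_request(request_id: int) -> int:
    # Stand-in for real I/O: while one coroutine waits, the others run.
    await asyncio.sleep(0.01)
    return request_id


async def serve_all(n: int) -> list:
    # Start n coroutines at once; gather returns results in submission order.
    return await asyncio.gather(*(handle_request(i) for i in range(n)))


# 20,000 concurrent handlers, one OS thread.
results = asyncio.run(serve_all(20_000))
```

Each waiting handler costs a small heap object rather than a thread stack, which is why numbers like 20K are unremarkable in this model.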
Also, the incomprehension of Java programmers about the approach of solving the multi-core problem simply with several parallel processes and message-passing between them is quite strange in 2011 - after all, 2009 and 2010 were the revival years for Erlang (don't forget, the language has existed for much longer) and the central idea of Erlang is precisely to set network- and CPU-spanning message-passing as the standard in order to achieve very simple parallelizability and scalability.
Java programmers always remind me of the COBOL programmers of my early days, who in every language and every programming approach deliberately picked out the things that were solved differently in COBOL (and sometimes even perhaps a bit simpler) - but then fell flat on their faces when they had to solve real problems outside the COBOL comfort zone with them.
The best thing about Java is the JVM, a platform that makes multi-paradigm and multi-language approaches possible, so that you can use the tools appropriate to each problem. And even then, sometimes the answer is still Node.JS or another small, lean, asynchronous server. Because even with a large collection of various hammers, you will still reach for a screwdriver when you meet a screw.
Re: Factor: Google Charts - I really should use Factor more often. Every time I see how practical a visual REPL is (in Factor, graphical representations of objects can be embedded in the normal output, similar to old Lisp machines), it tempts me.
Python Tools for Visual Studio. If you are on Windows and a number cruncher - SciPy and NumPy are now directly available on the .NET platform with these tools. And I wonder why Apple doesn't include something like this with Xcode, as it would certainly be popular in the university environment (just think of Sage).
ABCL - Release notes v0.25. New version out, and ABCL is increasingly developing into a really usable Common Lisp implementation. Since it runs on the JVM, you also have easy access to many Java libraries (if you want to), and since 0.24 Quicklisp also runs smoothly with ABCL, giving you easy access to many Common Lisp libraries. However, there are some issues with the CL libraries, as many library authors do not take ABCL into account (and there are still deficiencies in the CLOS area).
fantasm - Project Hosting on Google Code. Definitely worth checking out, a workflow engine in Python. Something like this could be quite interesting for projects at work.
harukizaemon/hamster. Immutable Threadsafe Datastructures - for Ruby. You can't change them, but you get new, modified versions back. Ideal for using them across thread boundaries. Clojure has this built-in, Scala since 2.8 as well. I would like something like this for Python ...
Pyjamas - Python Javascript Compiler, Desktop Widget Set and RIA Web Framework. I already mentioned this in the old blog, but a) a lot has happened and b) it came up again today as a topic, so I'm blogging about it again.
Check it out: pqc - PostgreSQL Query Cache. A PostgreSQL proxy that caches queries via a Memcache database to improve performance for recurring queries. Since it works as a proxy, it can also speed up applications that don't already implement caching on their own.
jsFiddle is a very nicely made online editor for JavaScript, HTML, and CSS. Various JavaScript frameworks are supported, and there is the possibility to save snippets and discuss them with others. Processing.js is also available, as well as a number of tools to unleash on the code. Quite cool for experiments.
balupton/history.js provides an API for accessing HTML5 History manipulation, but it also supports older browsers and uses that ugly # notation - but only when HTML5 is not available. Could be quite interesting for a project of mine.
WordPress JSON API. I don't know if I really need this, but it might come in handy someday - the XMLRPC or Atom APIs are quite cumbersome if you just want to quickly access data from the blog via JavaScript.
Feeding the Bit Bucket» Blog Archive » Common Lisp, Clojure and Evolution. No, Clojure is not described as an evolution of Common Lisp - it's simply the example program "Evolution" from the book "Land of Lisp", translated into Clojure by someone who is learning Clojure by reimplementing all the examples, using the Common Lisp code as a basis. And therefore a good opportunity to compare Clojure and Common Lisp. Maybe interesting for 2 or 3 readers of my blog. Otherwise, for me as a bookmark to look back at later.
Ada 95: The Craft of Object-Oriented Programming. Free online book (formerly published by Prentice Hall in 1997) about Ada 95. Quite nice to see the beautifully byzantine-looking source code of Ada again.
Because I wrote about Prograph: Andescotia Software seems to have a new commercial Prograph version available. The whole thing works under OSX 10.4 and there is a demo version to try out. And it's not expensive at all with 68 dollars. I think I know what I'll be playing around with tonight! And as a free download there is the book "Visual Programming With Prograph CPX". Update: the playing has been canceled, the demo does not start under Snow Leopard ... (and the traffic on their mailing list does not look like there is a big reaction to be expected). Too bad. I wrote an email, maybe something will happen yet, but it sounds very much like a dead project again. Once again.
hotzen/ScalaFlow provides a very interesting extension to Scala: dataflow programming with automatic resolution via continuations - you define variables, can access variable values before values are assigned to them and the system itself sorts all accesses and assignments into the correct order. Particularly interesting as a basis for parallelization, when partial areas only emerge later but corresponding processing should already be defined earlier. Dataflow languages have been of interest to me since Prograph. The integration into a normal language as a basis could be quite interesting.
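The core idea of a dataflow variable can be sketched in a few lines of Python. Note the assumptions: this version blocks a thread until the value arrives, whereas ScalaFlow resolves the ordering via continuations, and DataflowVar with its bind and get methods is a name invented for this sketch, not ScalaFlow's API.

```python
import threading


class DataflowVar:
    """Single-assignment variable: readers wait until a value is bound."""

    def __init__(self):
        self._lock = threading.Lock()
        self._bound = threading.Event()
        self._value = None

    def bind(self, value):
        # A dataflow variable may be assigned exactly once.
        with self._lock:
            if self._bound.is_set():
                raise ValueError("dataflow variable is already bound")
            self._value = value
            self._bound.set()

    def get(self, timeout=5.0):
        # Block until some other thread has bound the value.
        if not self._bound.wait(timeout):
            raise TimeoutError("dataflow variable was never bound")
        return self._value


x, y, z = DataflowVar(), DataflowVar(), DataflowVar()

# The computation of z is defined and started before x and y have values.
consumer = threading.Thread(target=lambda: z.bind(x.get() + y.get()))
consumer.start()

# The inputs arrive later, in arbitrary order.
y.bind(2)
x.bind(40)

consumer.join()
total = z.get()
```

The order in which x and y are bound does not matter; the consumer simply proceeds once both are available.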
remogatto/gospeccy - a ZX Spectrum emulator written in Go. Bookmarked because I'm an old Spectrum owner, and it was the first box I bought with money I had earned myself (back then with an article in the c't! My first and only foray into writing!).
Get inPulse and Hack Your Watch. Looks pretty cool, just a small computer and a display, programmable, connected via Bluetooth. Could do some nice tricks with it. Maybe even display the time.
PyPy Status Blog: PyPy Winter Sprint Report. Most important point: fast-forward is in Trunk, so the next version of PyPy will definitely have 2.7 compatibility.
SourceTree | Mercurial and Git GUI for Mac OS X. Hmm - it's not exactly cheap at 45 Euros in the App Store. But sometimes I would like to have a GUI for working with Mercurial, especially when I work with foreign repositories and possibly have local changes. Maybe I'll play around with the trial sometime.
ongoing by Tim Bray · Broken Links. Why these overused #! fragments in URLs are a big mess and why you shouldn't use them. And yes, it's annoying to abuse the web like this - especially since there's absolutely no reason to do so, as dynamic servers can easily map various URL structures. And yes, I know about the problem that via JavaScript you can only change the fragment part of the URL in the browser without forcing a reload - but that's no reason to convert all URLs to such a stupid fragment format.
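A small sketch of what goes wrong, using only urllib.parse from the Python standard library. The example.com URLs and the helper server_visible_part are made up for illustration. Everything after the # is a fragment, which browsers keep to themselves, so the server cannot tell which page was meant.

```python
from urllib.parse import urlsplit


def server_visible_part(url: str) -> str:
    """Return path and query, the part a browser puts in the request line."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


hashbang = "http://example.com/#!/user/42"
plain = "http://example.com/user/42"

hashbang_request = server_visible_part(hashbang)  # the user id is lost
plain_request = server_visible_part(plain)        # the server sees the full path
hashbang_fragment = urlsplit(hashbang).fragment   # only client-side code sees this
```

With the hashbang form every page is a request for "/", and the real routing has to happen in JavaScript after the page has loaded.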
scgi-wsgi 1.1 released - Allan Saddi's projects blog - so far we have been using his flup-based server that comes with Django, but the option to switch to mod_proxy_scgi would be interesting because we could effectively save one server in between and no longer have to work with ajp. Although ajp is not that bad either - so maybe just do a few tests. For simple web services, however, I will continue to use the wsgi server based on gevent that I have been using in deezeit, because it is simply incredibly fast and uses almost no resources.
RUR-PLE is something like Logo, only with Python instead of Logo as the language. So actually just the graphical environment of typical Logo implementations. In any case, a nice toy.
How to write vim plugins with python. Because I like Python, because I like Vim and because you always want to build smaller things that make life easier. And because Vim's own scripting language is rather awful.
WorkingWithSubversion - Mercurial. Since I keep encountering outdated SVN repositories and clearly prefer Mercurial, I should take a closer look at hgsubversion.
Because I'm not looking for something like this for the first time and it looks quite practical: Sorting elements with jQuery – James Padolsey.
Java Hangs When Converting 2.2250738585072012e-308. PHP too. The solution to the puzzle in both cases: the number sits right at the lower boundary of the normalized double range, and the conversion in Java and PHP refines an approximation step by step but starts from unfavorable values, so an infinite loop results because the target value is never reached. And yes, this is critical, because you can send servers into a loop by entering such values in input fields that are converted to double. I also tried it with Python (CPython and PyPy), but they don't run into a loop, they simply deliver a slightly different value.
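For anyone who wants to repeat the Python experiment, a short sketch. The first literal is the one from the entry above; the second, 2.2250738585072011e-308, is the value I remember being reported for the PHP hang, so treat it as an assumption. The variable names are made up.

```python
import sys

JAVA_TRIGGER = "2.2250738585072012e-308"  # literal from the Java report
PHP_TRIGGER = "2.2250738585072011e-308"   # assumed: literal from the PHP report

# Both conversions return immediately in CPython.
java_value = float(JAVA_TRIGGER)
php_value = float(PHP_TRIGGER)

# sys.float_info.min is the smallest positive *normalized* double (2**-1022).
smallest_normal = sys.float_info.min

print(repr(java_value), java_value == smallest_normal)
print(repr(php_value), php_value < smallest_normal)
```

The Java literal rounds up to the smallest normalized double, which explains the "slightly different value" in the output, while the PHP literal lands just below it in the subnormal range.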
mobl is more my thing, a programming language that compiles to HTML5+JavaScript and comes with IDE support in Eclipse. Since HTML5 also includes client-side databases, and the entire application can be cached on the device via manifest files, you can also build offline-capable applications. And incidentally, it's also useful for Android.
Three20 - check it out if I want to give iPhone programming another try. It has some interesting concepts, especially regarding persistence and internal structure (uses internal URLs and URL routing to bring models and views together).
Introduction to Pharen. A Lisp that compiles to PHP. Weird. Okay, could be practical if the host only offers PHP as a server language. But still. Weird.
cfbolz / Pyrolog. Interesting project because it implements Prolog in Python, but uses the PyPy toolchain for JIT - this gives a nice insight into what is possible with PyPy besides Python.
Sho - Microsoft Research. A bit like SciPy and Sage (the part of Sage that deals with data analysis and visualization), but based on IronPython and .NET.
eMIPS - Microsoft Research. Yes, Microsoft does other things besides Windows. And some of it is quite interesting - such as extensible MIPS, essentially a processor architecture with loadable microcode. We had something like this before with the Xerox machines (the Alto of course and later also the D systems).