In dirtSimple.org: Optimization Surprises, Phillip J. Eby writes about optimizations he made to his implementation of generic functions in Python. I find it fascinating whenever he writes about this project, because I know generic functions well from Common Lisp. Equally fascinating, though, is how he shaves off performance half a microsecond at a time.
In his case that actually makes a lot of sense, since this is the central machinery that runs every time a generic function is called. Even minimal performance improvements add up to a huge difference in tight loops.
Also very interesting is what he discovers about Python's internal mechanisms along the way, for example how the mere existence of closures in a function affects the way that function executes.
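To make the closure point a bit more concrete, here is a minimal sketch of one such effect in CPython. It is not taken from Eby's post, and whether it is the exact effect he measured is an assumption on my part; the function names `plain` and `with_closure` are made up for illustration. As soon as an inner function refers to a local variable, the compiler stores that variable in a "cell" instead of treating it as an ordinary fast local, even if the inner function is never called.

```python
def plain(x):
    # x is an ordinary local variable of this function.
    return x + 1


def with_closure(x):
    # inner() refers to x, so the compiler turns x into a cell variable
    # for the whole of with_closure, not just for inner().
    def inner():
        return x

    # inner is never called; its mere existence changes how x is stored.
    return x + 1


# Hypothetical demo names; co_cellvars is a standard code object attribute.
print(plain.__code__.co_cellvars)         # ()
print(with_closure.__code__.co_cellvars)  # ('x',)
```

Both functions return the same result; only the way `x` is stored and accessed differs. Running `dis.dis` on the two functions shows the different bytecode, though the exact opcode names vary between Python versions.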
Exciting. Absolutely exciting.