I ran across Pierre Terdiman's article on Radix Sort for floating point numbers, and I became interested in seeing how far I could push the performance. I figured out what I think are a few unusual optimizations, and while I'm not really sure that any of them are new, the combination makes my code run pretty fast.

Multiple Histogramming

First, I use histogramming to make the radix work fast -- thi