> wc --lines unigram_raw.txt 290768333 unigram_raw.txtそもそも,たかだか3億要素,1.7Gのデータのソートに,最近のマシンで sort | uniq -c が858分もかかるのは変ですよね. > export LC_ALL=C > time sort -S 2G unigram_raw.txt | uniq -c > tmp.sort.uniq sort -S 2G unigram_raw.txt 389.93s user 16.32s system 99% cpu 6:49.61 total uniq -c > tmp.sort.uniq 15.40s user 1.56s system 4% cpu 6:49.62 totalIntel Xeon E5462 (3.2Ghz) が Dual Core AMD Opteron 1210