Hello folks,

We are in the early phase of a project and are wondering whether Heritrix or Nutch is the better choice of crawler for us.

Our project: basically, we're going to set up Hadoop and crawl the web for images. We will then run our own indexing software, built on Hadoop's Map/Reduce facility, on the images stored in HDFS. We will not use any indexing other than our own.

Some