Apache Nutch is a highly extensible and scalable open source web crawler software project. Nutch robot mascot Nutch is coded entirely in the Java programming language, but data is written in language-independent formats. It has a highly modular architecture, allowing developers to create plug-ins for media-type parsing, data retrieval, querying and clustering. The fetcher ("robot" or "web crawler"