Welcome to the Apache Nutch Wiki

Please contribute your knowledge about Nutch here! Or browse the open issues, open a new Jira ticket, or check the Nutch source code on git.

What is Apache Nutch?

Apache Nutch is a highly extensible and scalable open source web crawler software project. Stemming from Apache Lucene, the project comprises two codebases, namely:

Nutch 1.x (ACTIVE): A