Apache Nutch, a subproject of Apache Lucene, is open-source web-search software. It builds on Lucene Java, adding web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats.

Apache Nutch 1.0 contains almost 200 resolved issues and improvements, including Solr integration, a new indexing framework, and a new scoring framework, to mention just a few.

Nutch 1.0 is available from here.