Search Results for: document/CDRG_ch07_7.1

Getting Started with Lucene Setup

…different applications ranging from large-scale Internet search with hundreds of millions of documents, to eCommerce storefronts serving large volumes of users, to embedded devices. A Brief History of…

Content Extraction with Tika

…content type detection and content extraction framework. Tika provides a general application programming interface that can be used to detect the content type of a document and also parse textual…

Solr Cloud Document Routing

Overview: Solr Cloud document routing was released in Solr 4.1. This feature expanded upon the simple hash-based routing that was available in Solr 4.0 by introducing a new…

Scaling Lucene and Solr

…of factors, a single machine can easily host a Lucene/Solr index of 5 – 80+ million documents, while a distributed solution can provide subsecond search response times across billions of…

When the mapping gets tough, the tough use JavaScript

Fusion pipelines are composed of stages. Fusion has more than two dozen pipeline stages which provide ready-made components for parsing, transforming, modifying, and otherwise enriching documents and queries, as…

Interview with Ian Holsman of Relegence (AOL)

documents and just tries to figure out what the document is about. So it takes the names of the people out. It tries to figure out the category of the topic…

Exploring Lucene's Indexing Code: Part 2

…Using some basic IR knowledge, we know that addDocument is going to use our Analyzer to break up each field in the given document, and use the resulting terms to…

Options to tune document’s relevance in Solr

While I was working at Lucid Imagination, a customer once asked me how they could modify the score of documents in Solr in order to get the most relevant results higher…

Poor man's "entity" extraction with Solr

…bonus we’ll also extract URLs found in text too. Let’s start with an example input and the corresponding output that all of the described techniques provide. Example document textual content: The CHO…

Debugging Search Application Relevance Issues

…are helpful: Precision is the percentage of documents in the returned results that are relevant. Recall is the percentage of relevant results returned out of all relevant results in the…
