Search Results for: document/25129a5fa7a87229/file_handle_usage_of_lucene_indexreader

Getting Started with Lucene Setup

than plain text, the application will need to extract text from the original file in order to make it searchable. While this is outside the scope of the core Lucene


Scaling Lucene and Solr

of RAM. You can omit norms with Lucene when adding a field to a document and with Solr by using the correct field definition in your Schema.xml file. Lazy Field…


Exploring Lucene's Indexing Code: Part 2

index. Some of the information is simplified for illustration purposes. Per Index Files Segments File (segments_N, segments.gen) The Segments File references the active segments in the index. Lucene uses a…

Optimizing Findability in Lucene and Solr

…obstacles. Even if I’m using Lucene directly, I’ll often send the documents into Solr with a very simple schema that does basic analysis of the text and throws everything into…


Content Extraction with Tika

above example I first create a FileInputStream containing the document to parse. Then I use a Tika content handler called BodyContentHandler that internally constructs a content handler decorator of type…


[Update] Accessing words around a positional match in Lucene 4

Way back when, I posted a blurb on how to access words around a positional match in Lucene and a friend of mine asked me how to do similar…

The Apache Lucene Ecosystem: My view of 2009

of Lucene related talks at ApacheCon US plus two days of training and meetups. (In the past, organization was always handled by the ASF Conference Committee). In looking ahead for…


Interview with Ian Holsman of Relegence (AOL)

too easy. We combine that. Now each of these documents are tagged location-wise. We can then use Local Lucene to do boundary searches if we wanted to on top of…


Accessing words around a positional match in Lucene

From time to time, users on the Lucene mailing list ask a variant of the following question: Given a term match in a document, what’s the best way to…


Getting Started with Payloads

(see also [1]). Introduction: Like Spans, payloads involve the position of terms, but go one step further. Namely, a Payload in Apache Lucene is an arbitrary byte array stored at…
