Search Results for: document/3d8310376b6cdf6b/centroid_calculations_with_sparse_vectors

Getting Started with Lucene Setup

…different applications ranging from large-scale Internet search with hundreds of millions of documents, to e-commerce storefronts serving large volumes of users, to embedded devices. A Brief History of…


Content Extraction with Tika

…bytes of the document. The MIME type detection data structures are configurable, so you can easily add new capabilities along with new parser adapters. The most important capability…
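
As a rough illustration of detection driven by the leading bytes of a document, here is a minimal Python sketch of an extensible signature registry. It is not Tika's API or configuration format. The names `MAGIC_SIGNATURES`, `register_signature`, and `detect_mime_type` are hypothetical, and only a handful of well-known "magic number" prefixes are included.

```python
# Hypothetical registry: leading byte signatures ("magic numbers") -> MIME type.
# Real detectors such as Tika's use much richer, configurable rule sets.
MAGIC_SIGNATURES: dict[bytes, str] = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"PK\x03\x04": "application/zip",
}


def register_signature(prefix: bytes, mime_type: str) -> None:
    """Add a new signature, extending detection without touching the lookup code."""
    MAGIC_SIGNATURES[prefix] = mime_type


def detect_mime_type(data: bytes, default: str = "application/octet-stream") -> str:
    """Return the MIME type whose signature matches the start of `data`.

    Longer signatures are checked first so that a more specific match wins.
    """
    for prefix in sorted(MAGIC_SIGNATURES, key=len, reverse=True):
        if data.startswith(prefix):
            return MAGIC_SIGNATURES[prefix]
    return default
```

For example, `detect_mime_type(b"%PDF-1.7")` returns `"application/pdf"`, and after `register_signature(b"\x7fELF", "application/x-executable")` ELF binaries are recognized too. Keeping the signatures in data rather than code is what makes this kind of detector easy to extend.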


Solr Cloud Document Routing

router is used by default when a collection is created with the “numShards” parameter. If the numShards parameter is not supplied at collection creation time, then the “implicit” document router…
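
The two routing styles can be contrasted with a small Python sketch. The name of the default router is cut off in the snippet, so it is not named here. The sketch assumes that with numShards the 32-bit hash space is split into equal ranges, one per shard, and that each document goes to the shard whose range contains the hash of its ID. MD5 is used purely for illustration and is not necessarily the hash Solr uses. For implicit-style routing it assumes the document itself names its shard in a hypothetical `shard` field.

```python
import hashlib


def hash32(doc_id: str) -> int:
    """Stable 32-bit hash of a document ID (MD5-based: an illustrative assumption)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def shard_ranges(num_shards: int) -> list[tuple[int, int]]:
    """Split the 32-bit hash space [0, 2**32) into num_shards contiguous, inclusive ranges."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    size = 2**32 // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * size
        # The last shard absorbs any remainder so the ranges cover the whole space.
        end = 2**32 - 1 if i == num_shards - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def route_by_hash(doc_id: str, num_shards: int) -> int:
    """Hash-based routing: return the index of the shard whose range holds the ID's hash."""
    h = hash32(doc_id)
    for index, (start, end) in enumerate(shard_ranges(num_shards)):
        if start <= h <= end:
            return index
    raise RuntimeError("unreachable: ranges cover the full 32-bit space")


def route_implicit(doc: dict[str, str], shard_field: str = "shard") -> str:
    """Implicit-style routing: the document names its target shard in a field.

    The field name "shard" is hypothetical.
    """
    return doc[shard_field]
```

The design trade-off is visible here. Hash routing spreads documents evenly and needs no per-document shard information, but the shard count is fixed up front. Implicit routing leaves placement entirely to the indexing client.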

Scaling Lucene and Solr

documents. Over that range, query throughput can be adjusted with index replication at each individual server. The standard procedure for scaling Lucene/Solr is as follows: first, maximize performance on a…


Interview with Ian Holsman of Relegence (AOL)

and the location. So what you end up with is a document with a lot of metadata. So, for example, it might say, “This document is talking about New York….

Exploring Lucene's Indexing Code: Part 2

…of documents which contain each term, along with the frequency of the term in that document (unless you use omitTf on that field). Skip list data is also stored in…
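
A toy Python sketch of the postings described above, assuming whitespace tokenization and in-memory lists. Each term maps to the documents that contain it, and the term frequency is kept unless a hypothetical `omit_tf` flag (standing in for `omitTf`) drops it. Skip lists and Lucene's actual on-disk encoding are not modeled.

```python
from collections import Counter, defaultdict


def build_postings(docs: list[str], omit_tf: bool = False) -> dict[str, list]:
    """Build a toy inverted index: term -> postings in increasing document ID order.

    With omit_tf=False each posting is (doc_id, term_frequency); with omit_tf=True
    only the doc_id is kept, mirroring the idea of dropping frequencies for a field.
    """
    postings: dict[str, list] = defaultdict(list)
    # Documents are visited in ID order, so every postings list stays sorted.
    for doc_id, text in enumerate(docs):
        counts = Counter(text.lower().split())
        for term in sorted(counts):
            postings[term].append(doc_id if omit_tf else (doc_id, counts[term]))
    return dict(postings)
```

For example, `build_postings(["the cat sat", "the cat saw the dog"])["the"]` returns `[(0, 1), (1, 2)]`, while the same call with `omit_tf=True` returns `[0, 1]`. Dropping frequencies saves space but loses the information that term-frequency scoring needs.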


When the mapping gets tough, the tough use JavaScript

JavaScript. The preview tool expects to take in a list of PipelineDocument objects in JSON format. The list is pre-populated with two skeleton document objects, each of which contains two…

Integrating Apache Mahout with Apache Lucene and Solr – Part 1/3

Mahout uses an internal sparse vector representation for text documents (dense vector representations are also available), so this file contains the “key” for making sense of the vectors later. As…
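
Since this result matches a search for centroid calculations with sparse vectors, here is a minimal Python sketch of the idea. It does not use Mahout's vector classes. It assumes raw term counts as weights (not TF-IDF), represents a sparse vector as a dict from index to weight, and uses a dictionary from term to index as the "key" for decoding. All function names are hypothetical.

```python
from collections import defaultdict

SparseVector = dict[int, float]  # index -> weight; absent indices are implicitly 0.0


def vectorize(text: str, dictionary: dict[str, int]) -> SparseVector:
    """Map a document to a sparse term-count vector, growing the dictionary as needed."""
    vec: SparseVector = defaultdict(float)
    for term in text.lower().split():
        index = dictionary.setdefault(term, len(dictionary))
        vec[index] += 1.0
    return dict(vec)


def centroid(vectors: list[SparseVector]) -> SparseVector:
    """Average of sparse vectors: sum only the non-zero entries, then divide by the count."""
    if not vectors:
        raise ValueError("centroid of an empty set is undefined")
    total: SparseVector = defaultdict(float)
    for vec in vectors:
        for index, weight in vec.items():
            total[index] += weight
    n = len(vectors)
    return {index: weight / n for index, weight in total.items()}


def decode(vec: SparseVector, dictionary: dict[str, int]) -> dict[str, float]:
    """Use the dictionary (the "key") to turn indices back into terms."""
    terms = {index: term for term, index in dictionary.items()}
    return {terms[index]: weight for index, weight in vec.items()}
```

With `d = {}`, vectorizing `"apache lucene lucene"` and `"apache mahout"` and taking their centroid gives `{0: 1.0, 1: 1.0, 2: 0.5}`, which `decode` turns into `{"apache": 1.0, "lucene": 1.0, "mahout": 0.5}`. Iterating only over stored entries keeps the cost proportional to the number of non-zero weights rather than to the vocabulary size.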

Poor man's "entity" extraction with Solr

…bonus, we’ll also extract URLs found in the text. Let’s start with an example input and the corresponding output that all of the described techniques provide. Example document textual content: The CHO…
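
For a sense of what URL extraction from free text involves, here is a standalone Python regex approximation. This is not the Solr-based technique the post describes. The pattern is a deliberate simplification (an http or https scheme followed by non-whitespace) and will miss or over-match some valid URLs.

```python
import re

# A deliberately simple pattern: http(s) scheme followed by non-whitespace characters.
# Real URL grammars (RFC 3986) are far more permissive; this is an approximation.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")
TRAILING_PUNCTUATION = ".,;:!?)]}"


def extract_urls(text: str) -> list[str]:
    """Return URLs found in free text, trimming punctuation that usually ends a sentence."""
    return [m.group(0).rstrip(TRAILING_PUNCTUATION) for m in URL_PATTERN.finditer(text)]
```

Trimming trailing punctuation handles the common case of a URL ending a sentence, at the cost of occasionally clipping a URL that genuinely ends in one of those characters.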

Options to tune document’s relevance in Solr

and can be specified with every new request to Solr. Also, what gets boosted is not a document or a field, but a subquery on the search. The simplest way…
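
To make "boosting a subquery per request" concrete, here is a small Python sketch that builds request parameters using the dismax/edismax `bq` (boost query) parameter. Documents matching the boost query get an extra score contribution, while `q` still determines which documents match. The field names `title`, `text`, and `category` are hypothetical, and the function name is illustrative.

```python
from urllib.parse import urlencode


def boosted_query_params(user_query: str, boost_query: str) -> str:
    """Build a query string for a request that boosts a subquery at query time.

    `bq` adds an optional clause whose score is added to the main query's score,
    so it reorders results without changing which documents match `q`.
    """
    params = {
        "q": user_query,
        "defType": "edismax",
        "qf": "title text",  # hypothetical field names
        "bq": boost_query,
    }
    return urlencode(params)
```

For example, `boosted_query_params("lucene", "category:books^2.0")` yields a string that can be appended to a select handler URL. Because the boost lives in the request rather than in the index, it can be changed on every query without reindexing.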