Apache Solr for Multi-language Content Discovery Through Entity Driven Search

As we countdown to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Alessandro Benedetti’s session on using entity driven search for multi-language content discovery and search.

This talk is about the description of the implementation of a Semantic Search Engine based on Solr. Meaningfully structuring content is critical, Natural Language Processing and Semantic Enrichment is becoming increasingly important to improve the quality of Solr search results. Our solution is based on three advanced features:

  1. Entity-oriented search – Searching not by keyword, but by entities (concepts in a certain domain)
  2. Knowledge graphs – Leveraging relationships amongst entities: Linked Data datasets (Freebase, DbPedia, Custom …)
  3. Search assistance – Autocomplete and Spellchecking are now common features, but using semantic data makes it possible to offer smarter features, driving the users to build queries in a natural way.

The approach includes unstructured data processing mechanisms integrated with Solr to automatically index semantic and multi-language information. Smart Autocomplete will complete users’ query with entity names and properties from the domain knowledge graph. As the user types, the system will propose a set of named entities and/or a set of entity types across different languages. As the user accepts a suggestion, the system will dynamically adapt following suggestions and return relevant documents. Semantic More Like This will find similar documents to a seed one, based on the underlying knowledge in the documents, instead of tokens.

Alessandro Benedetti is a search expert and semantic technology passionate, working in the R&D division of Zaizi. His favorite work is in R&D on information retrieval, NLP and machine learning with a big emphasis on data structures, algorithms and probability theory. Alessandro earned his Masters in Computer Science with full grade in 2009, then spent 6 month with Universita’ degli Studi di Roma working on his masters thesis around a new approach to improve semantic web search. Alessandro spent 3 years with Sourcesense as a Search and Open Source consultant and developer.

http://www.slideshare.net/lucidworks/multilanguage-content-discovery-through-entity-driven-search-alessandro-benedetti

lucenerevolution-avatarJoin us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

You Might Also Like

Protected: From Search to Solutions: How AI Agents Can Power Digital Commerce in 2025

There is no excerpt because this is a protected post.

Read More

Build custom AI agents without writing a single line of code? Yep, we did that.

Finally, a low-code AI platform (really, no code) that lets the people...

Read More

How a B2B distribution giant uses smart search to navigate inflation, tariffs, and 10,000+ daily queries

Meet Ryan Finley: A 17-year search veteran who's turning enterprise search into...

Read More

Quick Links