Apache Solr for Multi-language Content Discovery Through Entity Driven Search

Using entity driven search for multi-language content discovery and search.

As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today’s pick is Alessandro Benedetti’s session on using entity driven search for multi-language content discovery and search.

This talk describes the implementation of a semantic search engine based on Solr. Meaningfully structuring content is critical, and Natural Language Processing and Semantic Enrichment are becoming increasingly important for improving the quality of Solr search results. Our solution is based on three advanced features:

  1. Entity-oriented search – Searching not by keyword, but by entities (concepts in a certain domain)
  2. Knowledge graphs – Leveraging relationships among entities: Linked Data datasets (Freebase, DBpedia, Custom …)
  3. Search assistance – Autocomplete and Spellchecking are now common features, but using semantic data makes it possible to offer smarter features, driving the users to build queries in a natural way.
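As a rough illustration of the first two features, the sketch below resolves a query to an entity rather than matching it as a keyword, so a label in one language finds documents written in another. It is plain Python rather than Solr, and it is not the implementation presented in the talk: the `Entity` class, the entity ids, the toy knowledge graph and the pre-annotated documents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A concept in the domain, identified independently of any language."""
    entity_id: str
    entity_type: str
    labels: tuple  # (language, label) pairs


# Hypothetical toy knowledge graph; a real one would come from Linked Data.
ENTITIES = [
    Entity("e:rome", "City", (("en", "Rome"), ("it", "Roma"))),
    Entity("e:london", "City", (("en", "London"), ("it", "Londra"))),
]

# Hypothetical documents, assumed to be already annotated with entity ids.
DOCUMENTS = {
    "doc1": {"lang": "en", "entities": {"e:rome"}},
    "doc2": {"lang": "it", "entities": {"e:rome", "e:london"}},
    "doc3": {"lang": "en", "entities": {"e:london"}},
}


def build_label_index(entities):
    """Map every lowercased label, in any language, to its entity ids."""
    index = {}
    for entity in entities:
        for _lang, label in entity.labels:
            index.setdefault(label.lower(), set()).add(entity.entity_id)
    return index


def search_by_entity(query, label_index, documents):
    """Resolve the query to entities, then return documents mentioning them."""
    entity_ids = label_index.get(query.lower(), set())
    return sorted(
        doc_id
        for doc_id, doc in documents.items()
        if doc["entities"] & entity_ids
    )


label_index = build_label_index(ENTITIES)
print(search_by_entity("Roma", label_index, DOCUMENTS))
```

Here the Italian label "Roma" resolves to the same entity as "Rome", so the English `doc1` and the Italian `doc2` are both returned. In a Solr-based system the entity ids would more plausibly live in an indexed field filled in during enrichment, but that design is an assumption, not something the talk description specifies.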

The approach includes unstructured data processing mechanisms integrated with Solr to automatically index semantic and multi-language information. Smart Autocomplete completes the user’s query with entity names and properties from the domain knowledge graph. As the user types, the system proposes a set of named entities and/or a set of entity types across different languages. When the user accepts a suggestion, the system dynamically adapts subsequent suggestions and returns relevant documents. Semantic More Like This finds documents similar to a seed document based on the underlying knowledge in the documents rather than on tokens.
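The Smart Autocomplete behaviour can be sketched in a few lines. This is only an in-memory illustration under stated assumptions, not the Solr-backed component from the talk: the entity list, the type labels and the `suggest` function are hypothetical, and narrowing by an accepted entity type is just one simple way suggestions might adapt.

```python
# Hypothetical knowledge-graph extract: (entity id, entity type, {lang: label}).
ENTITIES = [
    ("e:rome", "City", {"en": "Rome", "it": "Roma"}),
    ("e:romania", "Country", {"en": "Romania", "it": "Romania"}),
    ("e:colosseum", "Monument", {"en": "Colosseum", "it": "Colosseo"}),
]

# Hypothetical multi-language labels for the entity types themselves.
TYPE_LABELS = {
    "City": {"en": "City", "it": "Città"},
    "Country": {"en": "Country", "it": "Paese"},
    "Monument": {"en": "Monument", "it": "Monumento"},
}


def matches(prefix, labels):
    """True if any label, in any language, starts with the lowercased prefix."""
    return any(label.lower().startswith(prefix) for label in labels.values())


def suggest(prefix, accepted_type=None):
    """Propose entities and entity types whose label starts with the prefix.

    If the user has already accepted an entity type, only entities of that
    type are proposed and no further types are offered.
    """
    prefix = prefix.lower()
    suggestions = []
    for entity_id, entity_type, labels in ENTITIES:
        if accepted_type is not None and entity_type != accepted_type:
            continue
        if matches(prefix, labels):
            suggestions.append(("entity", entity_id))
    if accepted_type is None:
        for type_name, labels in TYPE_LABELS.items():
            if matches(prefix, labels):
                suggestions.append(("type", type_name))
    return suggestions


print(suggest("ro"))
print(suggest("c"))
print(suggest("ro", accepted_type="City"))
```

Typing "ro" proposes both the Rome and Romania entities. Typing "c" proposes the Colosseum entity plus the City and Country types. Once the user has accepted the City type, "ro" narrows to Rome alone.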

Alessandro Benedetti is a search expert and semantic technology enthusiast working in the R&D division of Zaizi. His favorite work is R&D on information retrieval, NLP and machine learning, with a strong emphasis on data structures, algorithms and probability theory. Alessandro earned his Master’s in Computer Science with full marks in 2009, then spent 6 months with Università degli Studi di Roma working on his master’s thesis on a new approach to improving semantic web search. Alessandro spent 3 years with Sourcesense as a Search and Open Source consultant and developer.

http://www.slideshare.net/lucidworks/multilanguage-content-discovery-through-entity-driven-search-alessandro-benedetti

Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…
