Advanced Linguistics

AI-powered search is challenging enough when you’re only working with datasets in the English language. With multilingual content, the search engine must also identify the correct language, normalize non-English characters, and improve content recall without sacrificing precision. Other languages have their own semantics, grammar, and slang, not to mention the distinct characters and scripts found in Asian, European, and Arabic base languages.

Lucidworks offers an Advanced Linguistics Package for Fusion, powered by partner BasisTech and their Rosette technology. Fusion with Rosette delivers personalized, relevant results, regardless of the language used to search or browse. Rosette Entity Extractor (REX) delivers structure, clarity, and insight by revealing the key information (names, places, organizations, products, and key phrases) in 19 languages.

Language Identification

Fast, dependable language identification in 55 languages
Customizable dictionaries, script conversion, and orthographic normalization
Advanced morphological features include tokenization, lemmatization, and decompounding

Entity Extraction

Find entities which cannot be exhaustively listed in rules
Field training kits create personalized entity extraction models for your use case
Foundation for apps in eDiscovery, social media analysis, and financial compliance

European Languages

Lemmatization for words with inflection (beau, beaux, belle)
Distinguish words with common stems (animal, animate)
Noun decompounding for German, Dutch, and Scandinavian languages

Asian Languages

Tokenization of Asian characters improves search precision
Normalization of meaningless character variations
Convert older style Japanese Kanji characters to modern characters

Fusion + Rosette

The Advanced Linguistics Package for Fusion personalizes search in Asian languages (Chinese, Japanese, and Korean), European base languages, and Arabic base languages. Global organizations that support multiple languages for their commerce, customer service, and workplace solutions can make the content they manage more accessible, more relevant and more personalized to a global audience.

Asian Base

Rosette’s Chinese base linguistics converts Chinese script to a single form, whether traditional or simplified, so that text can be searched and processed correctly. Japanese base linguistics tools normalize Katakana spelling variations and convert older kanji to modern kanji. The system distinguishes Chinese from Japanese text written in Han script and accurately returns pronunciation information.
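
As a rough illustration of what character normalization involves, the sketch below uses Python's standard unicodedata module to fold half-width Katakana and full-width Latin letters into their standard forms. This is not Rosette's implementation, and it does not cover traditional-to-simplified Chinese or old-to-modern kanji conversion, which need dedicated mapping tables. The function name normalize_width is hypothetical.

```python
import unicodedata


def normalize_width(text: str) -> str:
    """Fold compatibility variants (half-width Katakana, full-width Latin)
    into their standard forms using Unicode NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


# Half-width and standard Katakana spellings of the same word
print(normalize_width("ｶﾀｶﾅ") == "カタカナ")
# Full-width Latin letters fold to plain ASCII
print(normalize_width("ＡＢＣ") == "ABC")
```

Applying the same normalization at index time and at query time means both spellings of a term land on the same indexed form.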

European Base

The Advanced Linguistics Package includes language-specific tools for lemmatization and decompounding. Words in French, Spanish, and Italian can be highly inflected (for example, the French word for "beautiful" can be written beau, beaux, belle, or belles). Lemmatization links words based on their meaning, not on how they look, which is useful for entity recognition and search relevancy. Decompounding is useful for German, Dutch, and Scandinavian languages, where compound nouns are written as a single word.
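
To make the two ideas concrete, here is a minimal sketch of dictionary-based lemmatization and vocabulary-driven decompounding. It is a toy under stated assumptions, not how Rosette works: the LEMMAS and VOCAB tables and the function names lemmatize and decompound are hypothetical, and a production system relies on full lexicons and morphological analysis.

```python
# Toy lemma table: every inflected form maps to one dictionary form.
LEMMAS = {"beau": "beau", "beaux": "beau", "belle": "beau", "belles": "beau"}

# Toy vocabulary of known compound parts.
VOCAB = {"kinder", "garten", "haus", "tier"}


def lemmatize(token: str) -> str:
    """Return the dictionary form of a token, or the lowercased token if unknown."""
    token = token.lower()
    return LEMMAS.get(token, token)


def _split(word, vocab):
    """Return a list of vocabulary parts covering the word, or None if none exists."""
    if not word:
        return []
    # Try the longest possible head first, then back off.
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in vocab:
            rest = _split(word[end:], vocab)
            if rest is not None:
                return [head] + rest
    return None


def decompound(word: str, vocab=VOCAB):
    """Split a compound into known parts; fall back to the whole word."""
    word = word.lower()
    parts = _split(word, vocab)
    return parts if parts else [word]


print(lemmatize("belles"))
print(decompound("Kindergarten"))
print(decompound("Haustier"))
```

Indexing the lemma and the compound parts alongside the surface form lets a query for one form match documents that use another.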

Arabic Base

Arabic words frequently incorporate grammatical elements indicating attributes such as verb aspect, object, conjugation, person, number, gender, and others. Designed to plug into mainstream search engines and data mining applications, Arabic Base Linguistics performs orthographic and lexical normalization of Arabic text for use in Fusion queries. Fusion’s Advanced Linguistics Package also supports base linguistics for Persian (Farsi and Dari), Pashto, and Urdu.
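
Orthographic normalization of Arabic commonly includes steps such as removing optional vowel marks, dropping the elongation character, and unifying variant forms of alef. The sketch below shows those three steps using only the Python standard library. It is an illustration under those assumptions, not the rules Arabic Base Linguistics actually applies, and the function name normalize_arabic is hypothetical.

```python
import re

# Optional marks: fathatan through sukun (U+064B-U+0652) and superscript alef (U+0670).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
# Tatweel (kashida), a purely typographic elongation character.
_TATWEEL = "\u0640"
# Alef with madda, hamza above, and hamza below all fold to bare alef (U+0627).
_ALEF_VARIANTS = str.maketrans({
    "\u0622": "\u0627",
    "\u0623": "\u0627",
    "\u0625": "\u0627",
})


def normalize_arabic(text: str) -> str:
    """Strip diacritics and tatweel, then unify alef variants."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    return text.translate(_ALEF_VARIANTS)


# A fully vowelled name with alef-hamza reduces to its bare consonant skeleton.
vowelled = "\u0623\u064E\u062D\u0652\u0645\u064E\u062F"
print(normalize_arabic(vowelled) == "\u0627\u062D\u0645\u062F")
```

Running the same function over indexed text and incoming queries lets vowelled and unvowelled spellings match.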
