Lemmatization is the process of grouping together the different inflected forms of a word so they can be treated as a single item, identified by the word's base (dictionary) form, or lemma. In search, lemmatization lets end users query any inflectional form of a word and still get relevant results. For example, if the user queries the plural form of a word (routers), a search engine that applies lemmatization can also return relevant content that uses the singular form of the same word (router).
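The router/routers behavior can be illustrated with a minimal sketch. It assumes a tiny hand-written lemma table and a naive whitespace tokenizer; the names LEMMAS, lemmatize, tokenize, search and DOCS are hypothetical and chosen only for illustration. A production search engine would rely on a full morphological dictionary or a trained model rather than a lookup table like this one.

```python
# Toy lemma table mapping inflected forms to a base form.
# The entries are illustrative assumptions, not a real dictionary.
LEMMAS = {
    "routers": "router",
    "router": "router",
    "switches": "switch",
    "switch": "switch",
    "configured": "configure",
    "configuring": "configure",
    "configures": "configure",
    "configure": "configure",
}

PUNCTUATION = ".,!?;:"


def lemmatize(token):
    """Return the lemma for a token, or the lowercased token if unknown."""
    token = token.lower()
    return LEMMAS.get(token, token)


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation."""
    stripped = (word.strip(PUNCTUATION) for word in text.split())
    return [word.lower() for word in stripped if word]


def search(query, documents):
    """Return documents that contain every query term, compared by lemma."""
    query_lemmas = {lemmatize(t) for t in tokenize(query)}
    results = []
    for doc in documents:
        doc_lemmas = {lemmatize(t) for t in tokenize(doc)}
        if query_lemmas <= doc_lemmas:  # all query lemmas appear in the doc
            results.append(doc)
    return results


DOCS = [
    "How to configure a router.",
    "Comparing switches from three vendors.",
    "Configuring routers and switches at scale.",
]

# The plural and singular queries match the same documents.
print(search("routers", DOCS))
print(search("router", DOCS))
```

Because both the query and the documents are reduced to lemmas before comparison, the user does not need to guess which inflected form the content happens to use.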

Lemmatization is an important part of natural language processing (NLP) and natural language understanding (NLU), and it plays a role in big data analytics and artificial intelligence (AI). Lemmatization algorithms apply the rules of linguistic morphology, together with a particular language's vocabulary, to group the inflected forms of words used in speech and writing under their base forms. Deep learning models can then analyze the grouping as a whole, so that when any inflectional form of a word is mentioned, all the forms that share its base term are taken into account.

In linguistics, lemmatization is closely related to stemming, the practice of stripping off prefixes and suffixes that have been added to a word's base form. Lemmatization is more complex than stemming, however, because it requires words to be categorized by part of speech as well as by inflected form. English inflection is comparatively limited, consisting mainly of singular/plural noun forms, verb tenses, and comparative/superlative forms of adjectives and adverbs. In more heavily inflected languages, lemmatization can become quite complicated.
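The difference between the two approaches can be sketched as follows. This is a toy comparison, not a real stemming or lemmatization algorithm: naive_stem strips a few hard-coded suffixes, and lemmatize_with_pos looks up a (word, part-of-speech) pair in a small hand-written table. Both function names, the LEMMA_TABLE entries and the part-of-speech labels are assumptions made for this example.

```python
def naive_stem(word):
    """Strip the first matching suffix; uses no dictionary and no part of speech."""
    word = word.lower()
    for suffix in ("ing", "es", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Toy lemma table keyed by (word, part of speech). Illustrative entries only.
LEMMA_TABLE = {
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
    ("better", "ADJ"): "good",
    ("routers", "NOUN"): "router",
    ("ran", "VERB"): "run",
}


def lemmatize_with_pos(word, pos):
    """Look up the lemma for a word given its part of speech."""
    word = word.lower()
    return LEMMA_TABLE.get((word, pos), word)


# The stemmer treats "meeting" the same way regardless of how it is used.
print(naive_stem("meeting"))
# The lemmatizer distinguishes the noun from the verb.
print(lemmatize_with_pos("meeting", "NOUN"))
print(lemmatize_with_pos("meeting", "VERB"))
# Irregular forms need vocabulary knowledge that suffix stripping lacks.
print(naive_stem("better"), lemmatize_with_pos("better", "ADJ"))
```

The stemmer is cheaper because it needs neither a vocabulary nor a part-of-speech tag, but it cannot tell the noun "meeting" from the verb form, and it has no way to map an irregular form such as "better" to "good".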

This was last updated in February 2018
