TechTarget.com/whatis

https://www.techtarget.com/whatis/definition/named-entity-recognition-NER

What is named entity recognition (NER)?

By Cameron Hashemi-Pour

Named entity recognition (NER) is a natural language processing (NLP) method that extracts information from text. NER involves detecting and categorizing important information in text known as named entities. Named entities refer to the key subjects of a piece of text, such as names, locations, companies, events and products, as well as themes, topics, times, monetary values and percentages.

NER is also referred to as entity extraction, entity chunking and entity identification. It's used in many fields of artificial intelligence (AI), including machine learning (ML), deep learning and neural networks. NER is a key component of NLP systems, such as chatbots, sentiment analysis tools and search engines. It's used in healthcare, finance, human resources (HR), customer support, higher education and social media analysis.

What is the purpose of NER?

NER identifies, categorizes and extracts the most important pieces of information from unstructured text without requiring time-consuming human analysis. It's particularly useful for quickly extracting key information from large amounts of data because it automates the extraction process.

NER delivers critical insights to organizations about their customers, products, competition and market trends. For example, companies use it to detect when they're mentioned in publications. Healthcare providers use it to extract key medical information from patient records.

As NER models become better at identifying important information, they also improve AI systems more broadly. Better entity recognition strengthens AI language comprehension in tasks such as summarization, translation and text analysis.

How does NER work?

NER uses algorithms that are based on grammar, statistical NLP models and predictive models. These algorithms are trained on data sets labeled with predefined named entity categories, such as people, locations, organizations, expressions, percentages and monetary values. Categories are identified with abbreviations; for example, LOC is used for location, PER for persons and ORG for organizations.

Once trained on textual data and entity types, an NER learning model automatically analyzes new unstructured text, categorizing named entities and semantic meaning based on its training. When the information category of a piece of text is recognized, an information extraction utility extracts the named entity's related information and constructs a machine-readable document that other tools can process to extract meaning.
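
The extraction step described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular product: it assumes the model emits one tag per token in the common BIO convention (B- marks the beginning of an entity, I- its continuation, O a token outside any entity), which the article itself does not specify. The function name bio_to_entities and the sample sentence are hypothetical.

```python
import json


def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into entity records (assumed tag scheme)."""
    entities = []
    current = None  # the entity currently being built, or None
    for index, (token, tag) in enumerate(zip(tokens, tags)):
        prefix, _, label = tag.partition("-")
        starts_new = prefix == "B" or (
            prefix == "I" and (current is None or current["label"] != label)
        )
        if starts_new:
            # Close any open entity, then start a new one at this token.
            if current is not None:
                entities.append(current)
            current = {"label": label, "tokens": [token], "start": index}
        elif prefix == "I":
            current["tokens"].append(token)
        else:
            # "O": the token is outside any entity, so close the open one.
            if current is not None:
                entities.append(current)
            current = None
    if current is not None:
        entities.append(current)
    return [
        {
            "text": " ".join(e["tokens"]),
            "label": e["label"],
            "start": e["start"],
            "end": e["start"] + len(e["tokens"]),  # exclusive token index
        }
        for e in entities
    ]


# Hypothetical model output for one sentence.
tokens = ["Ada", "Lovelace", "visited", "Acme", "Corp", "in", "London", "."]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]

entities = bio_to_entities(tokens, tags)
# A machine-readable document that downstream tools could consume.
document = json.dumps({"text": " ".join(tokens), "entities": entities}, indent=2)
print(document)
```

The JSON output pairs each entity's text with its category (PER, ORG, LOC) and token offsets, which is one plausible shape for the machine-readable document that other tools then process.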

What are the four types of NER?

The four most used types of NER systems are the following:

  1. Supervised ML-based systems use ML models trained on texts that humans have prelabeled with named entity categories. These approaches use algorithms such as conditional random fields and maximum entropy models, two statistical modeling techniques. This method is effective for parsing semantic meanings and other complexities, though it requires large volumes of training data.
  2. Rule-based systems use rules to extract information. Rules can key on features such as capitalization or titles, such as "Dr." This method requires a lot of human intervention to input, monitor and tweak the rules, and it might miss textual variations that its rules don't cover. Rule-based systems are generally considered less capable of handling complexity than machine learning models.
  3. Dictionary-based systems use a dictionary with an extensive vocabulary and synonym collection to cross-check and identify named entities. This method might have trouble classifying named entities with variations in spellings.
  4. Deep learning systems are the most accurate of the four. They use neural networks, such as recurrent neural networks and transformer architectures, to examine the syntax and semantics of sentence structures. This approach is considered an upgrade from traditional machine learning because it handles large text data sets better and automatically learns features and attributes of the input data.
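
The rule-based and dictionary-based approaches in the list above are simple enough to sketch with Python's standard library. The following is a toy illustration under stated assumptions, not a production design: the title rule, the three-entry dictionary (GAZETTEER) and the function names are all hypothetical, chosen only to show each method's typical strength and weakness.

```python
import re

# Hypothetical dictionary (gazetteer) mapping known names to categories.
GAZETTEER = {
    "london": "LOC",
    "paris": "LOC",
    "acme corp": "ORG",
}

# Hypothetical rule: a title such as "Dr." followed by capitalized words
# is treated as a person name.
TITLE_RULE = re.compile(
    r"\b(?:Dr|Mr|Ms|Prof)\.\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
)


def rule_based(text):
    """Return (entity, label) pairs found by the title rule."""
    return [(match.group(1), "PER") for match in TITLE_RULE.finditer(text)]


def dictionary_based(text):
    """Return (entity, label) pairs found by exact dictionary lookup."""
    found = []
    for entry, label in GAZETTEER.items():
        pattern = r"\b" + re.escape(entry) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.start(), match.group(0), label))
    # Sort by position so results follow the order of the text.
    return [(entity, label) for _, entity, label in sorted(found)]


text = "Dr. Grace Hopper met Acme Corp staff in London, not Londn."

print(rule_based(text))
print(dictionary_based(text))
```

In this sample the misspelled "Londn" goes undetected by the dictionary lookup, which reflects the spelling-variation weakness noted in item 3, and the title rule would miss any person mentioned without a title, which reflects the coverage problem noted in item 2.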

NER methods

There are several methods available for implementing NER. Each is a type of tool trained to perform specific NER tasks. They are best described as follows:

Who uses NER?

Various industries and applications use NER in different ways. Each use case simplifies searching for and extracting important information from large data volumes so people can spend time on more valuable tasks. Examples include the following:

NER benefits and challenges

There are several benefits and challenges relevant to NER.

NER benefits

Named entity recognition provides a range of advantages when used appropriately:

NER challenges

NER also comes with its own set of issues:

NER best practices

Enterprises should follow a set of best practices when training, using and maintaining their NER systems. These practices include:

Natural Language Toolkit vs. spaCy

NLTK and spaCy are two Python libraries that support NER, and they differ in several ways. NLTK, the Natural Language Toolkit, is a Python library for NLP that provides several algorithms for many tasks. It is often used for teaching NLP to beginners and by researchers building applications from the ground up. It uses strings as inputs and outputs in preprocessing. It provides tokenization, stemming, part-of-speech tagging and parsing, and it can be trained on custom data.

spaCy, on the other hand, is also open source but generally provides a single implementation suited to each concrete task rather than a choice of algorithms, and it offers lemmatization rather than stemming. It's often used for building professional NLP applications and is object-oriented in preprocessing, returning document and token objects rather than strings. spaCy can also handle large data volumes, support extracting relationships between entities and work with word vectors. It's generally considered faster than NLTK.
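
The difference between string-oriented and object-oriented preprocessing can be shown without either library. The sketch below is purely illustrative: none of these functions or classes belong to NLTK or spaCy, and the names tokenize_to_strings, Token and Doc are hypothetical stand-ins for the two styles.

```python
from dataclasses import dataclass


def tokenize_to_strings(text):
    """String-oriented style: a string goes in, a list of strings comes out."""
    return text.split()


@dataclass(frozen=True)
class Token:
    """Object-oriented style: each token carries its own attributes."""
    text: str
    index: int

    @property
    def is_title(self):
        # True when the token starts with an uppercase letter followed by
        # lowercase letters.
        return self.text.istitle()


class Doc:
    """A container that keeps the original text together with its tokens."""

    def __init__(self, text):
        self.text = text
        self.tokens = [Token(word, i) for i, word in enumerate(text.split())]

    def capitalized(self):
        return [token for token in self.tokens if token.is_title]


sentence = "Ada Lovelace visited London"

words = tokenize_to_strings(sentence)            # plain strings
doc = Doc(sentence)                              # objects with attributes
names = [token.text for token in doc.capitalized()]

print(words)
print(names)
```

With plain strings, any extra information about a token has to be tracked separately by the caller. With objects, annotations such as position or capitalization travel with the token, which is the trade-off the comparison above points to.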

Named entity recognition is a critical part of natural language processing.

28 Oct 2024
