information extraction (IE)

Information extraction (IE) is the automated retrieval of specific information related to a selected topic from a body or bodies of text.

Information extraction tools make it possible to pull information from text documents, databases, websites or multiple sources. IE may extract information from unstructured, semi-structured or structured, machine-readable text. Usually, however, IE is used in natural language processing (NLP) to extract structured data from unstructured text.
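As a rough illustration of turning unstructured text into structured data, the following minimal sketch uses a regular expression to pull amounts and dates out of a sentence. The sample text, the pattern and the record layout are all assumptions made up for this example; real IE tools are considerably more robust.

```python
import re

# Hypothetical sample text, invented for this sketch.
TEXT = (
    "Acme Corp. reported revenue of $12.5 million on 2017-11-03 "
    "and $3 million on 2018-01-15."
)

# Illustrative pattern: a dollar amount in millions followed by an ISO date.
PATTERN = re.compile(
    r"\$(?P<amount>\d+(?:\.\d+)?) million on (?P<date>\d{4}-\d{2}-\d{2})"
)

def extract_records(text):
    """Return one structured record per amount/date pair found in the text."""
    return [
        {"amount_musd": float(m.group("amount")), "date": m.group("date")}
        for m in PATTERN.finditer(text)
    ]

records = extract_records(TEXT)
for record in records:
    print(record)
```

Each match becomes a dictionary with fixed field names, which is the kind of machine-readable record that downstream programs can filter, sort or store.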

Information extraction depends on named entity recognition (NER), a subtask used to find the targeted information to extract. NER first classifies each entity into one of several categories, such as location (LOC), person (PER) or organization (ORG). Once the category is recognized, an information extraction utility extracts the named entity’s related information and constructs a machine-readable document from it, which algorithms can further process to extract meaning. IE finds meaning by way of other subtasks, including co-reference resolution, relationship extraction, language and vocabulary analysis and sometimes audio extraction.
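The flow described above, recognizing entities by category and then building a machine-readable document, can be sketched with a toy dictionary lookup. This is only an assumption-laden illustration: the gazetteer entries, the function names and the output layout are hypothetical, and practical NER systems typically rely on trained statistical models rather than fixed lists.

```python
import json
import re

# Toy gazetteer mapping known names to categories (hypothetical entries).
GAZETTEER = {
    "Ada Lovelace": "PER",
    "London": "LOC",
    "Analytical Engines Ltd": "ORG",
}

def recognize_entities(text):
    """Find gazetteer names in the text and label each with its category."""
    entities = []
    for name, label in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            entities.append(
                {"text": name, "label": label, "start": m.start(), "end": m.end()}
            )
    # Order entities by where they appear in the text.
    return sorted(entities, key=lambda e: e["start"])

def build_document(text):
    """Bundle the source text and its entities into one structured record."""
    return {"text": text, "entities": recognize_entities(text)}

doc = build_document("Ada Lovelace joined Analytical Engines Ltd in London.")
print(json.dumps(doc, indent=2))
```

The resulting JSON record is the sort of intermediate document that later subtasks, such as relationship extraction, could take as input.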

IE dates back to the early days of natural language processing in the 1970s. JASPER, an IE system built for Reuters by Carnegie Group, is an early example. Current efforts in multimedia document processing, such as automatic annotation and content recognition and extraction from images and video, can be seen as IE as well.

Because of the complexity of language, high-quality IE is a challenging task for artificial intelligence (AI) systems.

This was last updated in January 2018
