text tagging

Text tagging is the process of manually or automatically adding tags or annotations to components of unstructured data, as one step in preparing that data for analysis. Tagging operates at a finer granularity than categorization: categorization typically assigns a label to a whole document, whereas tagging marks individual words or phrases within it, which can yield additional insight. One common form of text tagging is named entity extraction (also called named entity recognition), in which a batch of unstructured data is scanned to identify names of people, products, organizations and locations, as well as dates. The tagged entities can then be used to examine the relationships and patterns of interaction between them.
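As a rough illustration of how tagged entities might be used to look at relationships, the sketch below counts how often pairs of entities appear in the same document. The sample data, the tag names and the function name `cooccurrence_counts` are all hypothetical, and co-occurrence counting is only one simple way to approximate "interaction" between entities.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of a tagging step: one list of (entity, tag) pairs per document.
tagged_docs = [
    [("Alice Smith", "PERSON"), ("Acme Corp", "ORG"), ("Boston", "LOCATION")],
    [("Alice Smith", "PERSON"), ("Acme Corp", "ORG")],
    [("Bob Jones", "PERSON"), ("Boston", "LOCATION")],
]


def cooccurrence_counts(docs):
    """Count how many documents mention each pair of entities together."""
    counts = Counter()
    for doc in docs:
        # De-duplicate and sort so each pair has one canonical ordering.
        entities = sorted({name for name, _tag in doc})
        for pair in combinations(entities, 2):
            counts[pair] += 1
    return counts


pair_counts = cooccurrence_counts(tagged_docs)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

Pairs with high counts are candidates for a closer look; a real analysis would usually also weigh how close together the entities appear and in what context.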

Tagging can be done manually, but software can also perform auto-tagging. Simple programs use rules and word lists to tag content, which works well when most of the relevant terms and patterns are known in advance. More complex systems use natural language processing and machine learning models trained on previously tagged examples, which may provide greater accuracy and efficiency on large data sets. With both rules-based and machine learning approaches, the taxonomy tends to expand over time so that more data can be structured with tags. As analysis is performed on the structured data, the resulting intelligence may be used to refine or expand the text tagging system.
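A minimal sketch of the rules-and-word-lists approach is shown below. It assumes a tiny hand-made taxonomy (`WORD_LISTS`) and a single date rule (`DATE_PATTERN`); these names, the tag labels and the sample sentence are illustrative only, and a production system would typically load a curated and much larger taxonomy.

```python
import re

# Hypothetical word lists keyed by tag; a real taxonomy would be far larger.
WORD_LISTS = {
    "ORGANIZATION": ["Acme Corp", "Globex"],
    "LOCATION": ["Boston", "London"],
}

# Hypothetical rule: dates written like "12 March 2015".
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
DATE_PATTERN = re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")


def tag_text(text):
    """Return (matched text, tag) pairs in order of appearance."""
    found = []
    for label, terms in WORD_LISTS.items():
        for term in terms:
            # Word boundaries avoid matching inside longer words.
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text):
                found.append((match.start(), match.group(), label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    # Sort by position in the text, then drop the position.
    return [(span, label) for _start, span, label in sorted(found)]


sample = "Acme Corp opened an office in Boston on 12 March 2015."
tags = tag_text(sample)
print(tags)
```

Expanding the taxonomy here means adding terms to `WORD_LISTS` or adding new patterns, which mirrors how a rules-based tagger grows as analysis surfaces new terms worth tagging.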

This was last updated in September 2015
