unstructured text

Unstructured text is written content that lacks metadata and cannot readily be indexed or mapped onto standard database fields. It is often user-generated information such as email, instant messages, documents or social media postings. Unstructured text is an important source of information for businesses, research institutes and surveillance agencies. Enterprises often mine unstructured text for data that can strengthen their business intelligence strategy and give them a competitive advantage in the marketplace. Text collected from social media activity plays a key role in enterprise predictive analytics because it is a prime source for sentiment analysis, which aims to determine the general attitude of consumers toward a brand or idea.

Mining unstructured text can deliver new insights by uncovering previously unknown information, detecting patterns and trends, and identifying connections between seemingly unrelated pieces of data. Natural language processing software and other automated tools are typically used to prepare unstructured text for indexing. Because language is frequently ambiguous, disambiguating the text by examining its context is often an important initial step in the mining process. The content is also reviewed for word frequency and other patterns. Tagging is then performed to label pieces of text-derived data so they can be categorized and grouped in the ways most likely to deliver useful information. Once the text has been turned into data, it can be analyzed and evaluated for relevance and importance.
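
The word-frequency review and tagging steps described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not the method of any particular text-mining product. The stop-word list, the tag lexicon, the sample messages and the function names are all assumptions made up for the example, and real systems would typically rely on natural language processing software rather than simple keyword matching.

```python
import re
from collections import Counter

# Hypothetical stop-word list and tag lexicon, chosen only for illustration.
STOP_WORDS = {"the", "a", "is", "and", "but", "it", "to", "of", "i", "my", "was"}
TAG_LEXICON = {
    "positive": {"love", "great", "fast"},
    "negative": {"slow", "broken", "refund"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(texts):
    """Count how often each non-stop-word appears across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in STOP_WORDS)
    return counts


def tag_text(text):
    """Return the sorted tags whose keywords appear in the text."""
    tokens = set(tokenize(text))
    return sorted(tag for tag, words in TAG_LEXICON.items() if tokens & words)


# Made-up sample messages standing in for user-generated content.
messages = [
    "I love the new phone, the camera is great",
    "Delivery was slow and the screen arrived broken",
    "Great battery but slow charging",
]

freqs = word_frequencies(messages)

# Each message becomes a small structured record that can be stored,
# grouped and queried like ordinary data.
records = [{"text": m, "tags": tag_text(m)} for m in messages]

if __name__ == "__main__":
    print(freqs.most_common(3))
    for record in records:
        print(record["tags"], "-", record["text"])
```

Keyword lookup of this kind does not disambiguate meaning from context, so a phrase such as "not great" would be tagged incorrectly. It is meant only to show how free text can be turned into countable, labeled data.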

This was last updated in September 2015
