There are lots of questions about unstructured data and its impact on the data enterprise. Can we start with a definition?
What we're really doing is classifying our data as either structured or unstructured. Let's start with structured data, which is data organized in a defined structure so that each element is identifiable. The most common form of structured data is a relational database, such as a SQL database or Access. For example, SQL (Structured Query Language) lets you select specific pieces of information based on the columns and rows of a table. You might look for all the rows containing a particular date or ZIP code or name -- this is structured data, and it is organized and searchable by data type within the actual content.
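To make the ZIP code example concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` table, its column names and the sample rows are all hypothetical, chosen only to illustrate the idea; any relational database would behave similarly.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, zip_code TEXT, signup_date TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Alice", "02139", "2008-01-15"),
        ("Bob", "94105", "2008-02-01"),
        ("Carol", "02139", "2008-03-20"),
    ],
)

# Every row has a zip_code column, so the query can target it directly.
rows = conn.execute(
    "SELECT name, signup_date FROM customers WHERE zip_code = ? ORDER BY name",
    ("02139",),
).fetchall()
print(rows)
```

The query works because the structure tells the database exactly where the ZIP code lives in every record; no text searching is involved.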
By comparison, unstructured data has no identifiable structure. Unstructured data typically includes bitmap images/objects, text and other data types that are not part of a database. Most enterprise data today can actually be considered unstructured. An email is considered unstructured data. Even though email messages themselves are organized in a database, such as Microsoft Exchange or Lotus Notes, the body of each message is freeform text without any structure at all -- the data is considered raw. Documents are another example of unstructured data. Although a Word document has some formatting attached to it, the content of the document is completely freeform.
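The email case can be sketched with Python's standard `email` package. The message below is invented for illustration; the point is that the headers can be read by name, while the body comes back as one undifferentiated string.

```python
from email import message_from_string

# A made-up message, for illustration only.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Q3 numbers\n"
    "\n"
    "Hi Bob, the figures look fine to me, but call me before Friday.\n"
)

msg = message_from_string(raw)

# Headers are addressable by field name, much like columns in a table.
sender = msg["From"]
subject = msg["Subject"]

# The body is just freeform text; finding anything in it means searching the text.
body = msg.get_payload()
mentions_friday = "Friday" in body

print(sender, subject, mentions_friday)
```

Nothing in the body marks "Friday" as a deadline or "Bob" as a recipient; that meaning has to be inferred from the raw text.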
The nature of some data types, such as spreadsheets, is still a matter of debate. A spreadsheet has some structure in its rows and columns, but a spreadsheet application like Excel does not regulate the data you put into each cell.
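A rough sketch of the spreadsheet problem, using a small CSV export as a stand-in for a real spreadsheet file (an assumption made to keep the example self-contained). The column names and values are hypothetical.

```python
import csv
import io

# Hypothetical spreadsheet contents: the grid is regular, the cell values are not.
sheet = io.StringIO(
    "date,amount\n"
    "2008-03-01,120.50\n"
    "March 2nd,about 80\n"
    "n/a,45\n"
)

def is_number(text):
    """Return True if the cell text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

rows = list(csv.DictReader(sheet))

# Nothing stopped a user from typing prose into the "amount" column.
numeric = [r["amount"] for r in rows if is_number(r["amount"])]
freeform = [r["amount"] for r in rows if not is_number(r["amount"])]

print(numeric, freeform)
```

The rows and columns give the file a shape, but any check on what a cell contains has to be added by whoever reads the data.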