Browse Definitions :
Definition

semi-structured data

Semi-structured data is data that has not been organized into a specialized repository, such as a database, but that nevertheless has associated information, such as metadata, that makes it more amenable to processing than raw data.

The difference between structured data, unstructured data and semi-structured data:
Unstructured data has not been organized into a format that makes it easier to access and process. In reality, very little data is completely unstructured. Even things that are often considered unstructured data, such as documents and images, are structured to some extent. Structured data is basically the opposite of unstructured: It has been reformatted and its elements organized into a data structure so that elements can be addressed, organized and accessed in various combinations to make better use of the information. Semi-structured data lies somewhere between the two. It is not organized in a complex manner that makes sophisticated access and analysis possible; however, it may have information associated with it, such as metadata tagging, that allows elements contained to be addressed.

Here's an example: A Word document is generally considered to be unstructured data. However, you can add metadata tags in the form of keywords and other metadata that represent the document content and make it easier for that document to be found when people search for those terms -- the data is now semi-structured. Nevertheless, the document still lacks the complex organization of the database, so falls short of being fully structured data.

In reality, there is considerable overlap between the boundaries of the three categories, which are sometimes described collectively as the data continuum.

This was last updated in November 2014

Continue Reading About semi-structured data

SearchNetworking
  • throughput

    Throughput is a measure of how many units of information a system can process in a given amount of time.

  • traffic shaping

    Traffic shaping, also known as packet shaping, is a congestion management method that regulates network data transfer by delaying...

  • open networking

    Open networking describes a network that uses open standards and commodity hardware.

SearchSecurity
  • buffer underflow

    A buffer underflow, also known as a buffer underrun or a buffer underwrite, is when the buffer -- the temporary holding space ...

  • pen testing (penetration testing)

    A penetration test, also called a pen test or ethical hacking, is a cybersecurity technique that organizations use to identify, ...

  • single sign-on (SSO)

    Single sign-on (SSO) is a session and user authentication service that permits a user to use one set of login credentials -- for ...

SearchCIO
  • benchmark

    A benchmark is a standard or point of reference people can use to measure something else.

  • spatial computing

    Spatial computing broadly characterizes the processes and tools used to capture, process and interact with 3D data.

  • organizational goals

    Organizational goals are strategic objectives that a company's management establishes to outline expected outcomes and guide ...

SearchHRSoftware
  • talent acquisition

    Talent acquisition is the strategic process employers use to analyze their long-term talent needs in the context of business ...

  • employee retention

    Employee retention is the organizational goal of keeping productive and talented workers and reducing turnover by fostering a ...

  • hybrid work model

    A hybrid work model is a workforce structure that includes employees who work remotely and those who work on site, in a company's...

SearchCustomerExperience
Close