data dictionary

What is a data dictionary?

A data dictionary is a collection of descriptions of the data objects or items in a data model, kept so that programmers and others can refer to them. Often, a data dictionary is a centralized metadata repository.

Data dictionaries sometimes play a role in data modeling, which produces a diagram of object relationships that lists each object's name, assigned data values and defined relationships. For each object, the model describes the type of data, such as text, image or binary value; lists any predefined default values; and provides a brief textual description. A data dictionary can serve as the reference point for this collection of information.

For example, a bank or group of banks could model the data objects involved in consumer banking and then provide a data dictionary for its programmers. The data dictionary would describe each set of data in the consumer banking data model, such as "account holder" and "available credit."
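To make the attributes described above concrete, here is a minimal Python sketch of two dictionary entries for the consumer banking example. The `DataElement` class, the field names and the sample definitions are hypothetical illustrations, not taken from any real bank's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """One data dictionary entry (hypothetical structure)."""
    name: str               # the data object's name
    data_type: str          # e.g. text, decimal, binary
    default: Optional[str]  # predefined default value, or None if there is none
    description: str        # brief textual description


# Hypothetical entries for a consumer banking data model.
CONSUMER_BANKING = {
    "account_holder": DataElement(
        name="account_holder",
        data_type="text",
        default=None,
        description="Full name of the person who owns the account.",
    ),
    "available_credit": DataElement(
        name="available_credit",
        data_type="decimal(12,2)",
        default="0.00",
        description="Credit limit minus current balance.",
    ),
}


def describe(name: str) -> str:
    """Return a one-line summary of a data element, as a programmer might look it up."""
    element = CONSUMER_BANKING[name]
    default = element.default if element.default is not None else "none"
    return f"{element.name} ({element.data_type}, default {default}): {element.description}"


print(describe("available_credit"))
# available_credit (decimal(12,2), default 0.00): Credit limit minus current balance.
```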

Types of data dictionaries

There are two types of data dictionaries: active and passive.

Active data dictionaries are created within the databases they describe and automatically reflect any updates or changes in their host databases. This avoids any discrepancies between the data dictionaries and their database structures.
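SQLite offers a small, runnable illustration of this behavior. Its built-in catalog, queried here with `PRAGMA table_info`, is maintained by the database engine itself, so a schema change is visible immediately without any separate documentation step. This is a sketch assuming Python's standard `sqlite3` module; the `account` table is hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, holder TEXT NOT NULL)")


def columns(table: str) -> list[str]:
    """Read column names from SQLite's own catalog rather than separate documentation."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # Table names cannot be bound as parameters, so `table` must be trusted input.
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


print(columns("account"))  # ['id', 'holder']

# Alter the schema; the catalog reflects the change with no extra step.
conn.execute("ALTER TABLE account ADD COLUMN available_credit REAL DEFAULT 0")
print(columns("account"))  # ['id', 'holder', 'available_credit']
```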

Passive data dictionaries are created separately from the databases they describe and act as a repository for information about the data. Because they do not update automatically, passive data dictionaries require additional work to stay in sync with the databases they describe. As such, database managers must maintain passive dictionaries with care to ensure there are no discrepancies.
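Keeping a passive dictionary in sync usually means checking it against the live schema from time to time. The sketch below assumes a SQLite database and a hypothetical hand-maintained `PASSIVE_DICTIONARY` mapping; it reports columns that exist but are undocumented, are documented but missing, or are documented with the wrong type.

```python
import sqlite3

# Hypothetical passive dictionary, maintained by hand: table -> {column: documented type}.
PASSIVE_DICTIONARY = {
    "account": {"id": "INTEGER", "holder": "TEXT"},
}

# The live database has since gained a column that nobody documented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, holder TEXT, available_credit REAL)"
)


def find_discrepancies(conn: sqlite3.Connection, documented: dict) -> list[str]:
    """Compare a hand-maintained dictionary with the schema SQLite actually holds."""
    problems = []
    for table, doc_cols in documented.items():
        # Map each live column name to its declared type (row[1] = name, row[2] = type).
        live = {row[1]: row[2] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        for col in live.keys() - doc_cols.keys():
            problems.append(f"{table}.{col} exists but is undocumented")
        for col in doc_cols.keys() - live.keys():
            problems.append(f"{table}.{col} is documented but missing")
        for col in live.keys() & doc_cols.keys():
            if live[col].upper() != doc_cols[col].upper():
                problems.append(
                    f"{table}.{col} type mismatch: documented {doc_cols[col]}, actual {live[col]}"
                )
    return sorted(problems)


print(find_discrepancies(conn, PASSIVE_DICTIONARY))
# ['account.available_credit exists but is undocumented']
```

Running a check like this on a schedule, or as part of a deployment, is one way to catch drift before the passive dictionary misleads its readers.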

Data dictionary components

The specific components of a data dictionary can vary, but they typically take the form of various types of metadata. Examples of these components include the following:

  • data object listings, such as names and definitions;
  • data element properties, such as data types, unique identifiers, sizes and indexes;
  • entity relationship diagrams;
  • system-level diagrams;
  • reference data;
  • missing data and quality indicator codes; and
  • business rules for validation of data quality and schema objects.
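Several of these components, namely reference data, missing-data codes and validation rules, can be applied directly to incoming records. The following is a minimal sketch under assumed conventions: the `ElementSpec` class, the `-999.0` missing-data code and the sample rules are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ElementSpec:
    """Hypothetical dictionary entry carrying validation-related components."""
    name: str
    definition: str
    data_type: type
    max_length: Optional[int] = None                 # size limit, applied to strings
    reference_values: Optional[set] = None           # allowed codes (reference data)
    missing_codes: set = field(default_factory=set)  # values meaning "not collected"
    rules: list = field(default_factory=list)        # (description, predicate) pairs


def validate(record: dict, specs: list) -> list[str]:
    """Check one record against the dictionary; return human-readable errors."""
    errors = []
    for spec in specs:
        value = record.get(spec.name)
        if value in spec.missing_codes:
            continue  # a documented missing-data code is not an error
        if not isinstance(value, spec.data_type):
            errors.append(f"{spec.name}: expected {spec.data_type.__name__}")
            continue
        if spec.max_length is not None and isinstance(value, str) and len(value) > spec.max_length:
            errors.append(f"{spec.name}: longer than {spec.max_length} characters")
        if spec.reference_values is not None and value not in spec.reference_values:
            errors.append(f"{spec.name}: {value!r} not in reference data")
        for description, check in spec.rules:
            if not check(value):
                errors.append(f"{spec.name}: violates rule '{description}'")
    return errors


# Hypothetical specifications for two consumer banking elements.
SPECS = [
    ElementSpec("account_type", "Kind of consumer account", str,
                max_length=8, reference_values={"CHECKING", "SAVINGS"}),
    ElementSpec("available_credit", "Credit limit minus current balance", float,
                missing_codes={-999.0},
                rules=[("must be non-negative", lambda v: v >= 0)]),
]

print(validate({"account_type": "SAVINGS", "available_credit": -999.0}, SPECS))  # []
print(validate({"account_type": "LOAN", "available_credit": -5.0}, SPECS))
```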

How to create a data dictionary

When planning a data dictionary build, it is important to consider all available data management resources, including associated databases and spreadsheets.

Most database management systems, as well as information systems created by computer-aided software engineering tools, contain integrated active data dictionaries. Tools can also produce dictionary documentation from this metadata. For example, the Database Documenter in Microsoft Access generates reports describing the objects in a database, such as tables and their fields, and can create a data dictionary for data either stored in or linked to an Access database.
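The Access tool described above is interactive, but the same idea, generating a dictionary from metadata the DBMS already keeps, can also be scripted. As an analogous sketch, assuming SQLite rather than Access, the function below builds one dictionary row per column from SQLite's catalog. The schema is hypothetical.

```python
import sqlite3


def build_dictionary(conn: sqlite3.Connection) -> list[dict]:
    """Generate one data dictionary row per column from SQLite's catalog."""
    entries = []
    # List user tables, skipping SQLite's internal ones.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        # Each row: (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, col_type, notnull, default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            entries.append({
                "table": table,
                "column": name,
                "type": col_type,
                "not_null": bool(notnull),  # only reflects an explicit NOT NULL
                "default": default,         # default expression as text, or None
                "primary_key": bool(pk),
            })
    return entries


# Hypothetical schema for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        available_credit REAL DEFAULT 0
    );
""")

for entry in build_dictionary(conn):
    print(entry)
```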

If a machine-readable data dictionary cannot be generated automatically, the usual recommendation is to provide it as a spreadsheet file maintained from a single source. A spreadsheet-based data dictionary can be built in Microsoft Excel, and online templates can help when creating this type of data dictionary.
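A spreadsheet-based dictionary is often just a table with one row per data element. A minimal sketch, assuming Python's standard `csv` module and hypothetical column headings, writes such a table to a CSV file (`data_dictionary.csv`, a hypothetical name) that Excel can open.

```python
import csv
import io

# Hypothetical column headings and entries for a spreadsheet-style dictionary.
FIELDS = ["element", "data_type", "default", "description"]
ENTRIES = [
    {"element": "account_holder", "data_type": "text", "default": "",
     "description": "Full name of the person who owns the account"},
    {"element": "available_credit", "data_type": "decimal", "default": "0.00",
     "description": "Credit limit minus current balance"},
]


def to_csv(entries: list[dict], fields: list[str] = FIELDS) -> str:
    """Render dictionary entries as CSV text, one row per data element."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


text = to_csv(ENTRIES)
# "utf-8-sig" adds a byte-order mark, which helps Excel detect UTF-8;
# newline="" keeps the csv module's own line endings unchanged.
with open("data_dictionary.csv", "w", newline="", encoding="utf-8-sig") as f:
    f.write(text)
print(text.splitlines()[0])  # element,data_type,default,description
```

Keeping the headings fixed in one place (`FIELDS` here) is a simple way to honor the single-source recommendation: every export uses the same layout.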

Pros and cons of data dictionaries

Data dictionaries can be valuable tools for the organization and management of large data listings. These are some of the biggest benefits of using a data dictionary:

  • provides organized, comprehensive lists of data;
  • easily searchable;
  • provides reporting and documentation for data across multiple programs;
  • simplifies the structure for system data requirements;
  • reduces data redundancy;
  • maintains data integrity across multiple databases; and
  • provides relationship information between different database tables.
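Two of these benefits, searchability and relationship information, are easy to see when the dictionary is read from a database catalog. This sketch assumes SQLite and a hypothetical two-table schema; the helper names `user_tables`, `search` and `relationships` are illustrative, and the code searches column names and lists the foreign-key links the catalog records.

```python
import sqlite3

# Hypothetical two-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        available_credit REAL
    );
""")


def user_tables(conn: sqlite3.Connection) -> list[str]:
    """Names of user tables, skipping SQLite's internal ones."""
    return [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]


def search(conn: sqlite3.Connection, term: str) -> list[str]:
    """Return table.column names containing `term`, ignoring case."""
    hits = []
    for table in user_tables(conn):
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            if term.lower() in row[1].lower():
                hits.append(f"{table}.{row[1]}")
    return hits


def relationships(conn: sqlite3.Connection) -> list[str]:
    """List foreign-key links recorded in the catalog."""
    links = []
    for table in user_tables(conn):
        # Each row: (id, seq, table, from, to, on_update, on_delete, match).
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            links.append(f"{table}.{row[3]} -> {row[2]}.{row[4]}")
    return links


print(search(conn, "credit"))  # ['account.available_credit']
print(relationships(conn))     # ['account.customer_id -> customer.id']
```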

However, data dictionaries can also prove difficult for some to manage. Here are some of the downsides:

  • lack of functional details regarding data;
  • diagrams that are not always visually appealing; and
  • can be difficult for nontechnical users to understand.
This was last updated in December 2022
