A data dictionary is a collection of descriptions of the data objects or items in a data model for the benefit of programmers and others who need to refer to them. Often a data dictionary is a centralized metadata repository.
A first step in analyzing a system of interactive objects is to identify each one and its relationship to other objects. This process is called data modeling and results in a picture of object relationships. After each data object or item is given a descriptive name, its relationship to other objects is described, or it becomes part of a structure that implicitly describes that relationship. The type of data, such as text, image or binary value, is described, any predefined default values are listed and a brief textual description is provided. This collection of descriptions can be organized for reference into a book called a data dictionary.
When developing programs that use the data model, a data dictionary can be consulted to understand where a data item fits in the structure, what values it may contain and what the data item means in real-world terms. For example, a bank or group of banks could model the data objects involved in consumer banking. They could then provide a data dictionary for a bank's programmers. The data dictionary would describe each of the data items in its data model for consumer banking, such as "Account holder" and "Available credit."
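As a rough illustration of how a programmer might consult such a dictionary, the sketch below stores two entries for the consumer banking example and looks one up. The `DataItem` fields, the parent structure names, the data types, the default value and the descriptions are all assumptions made for this example, not part of any real banking model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataItem:
    """One data dictionary entry (field layout is hypothetical)."""
    name: str               # descriptive name of the data item
    parent: str             # structure the item belongs to
    data_type: str          # e.g. text, decimal, image
    default: Optional[str]  # predefined default value, if any
    description: str        # meaning in real-world terms


# Hypothetical entries for a consumer banking data model.
CONSUMER_BANKING = {
    item.name: item
    for item in (
        DataItem("Account holder", "Account", "text", None,
                 "Person or organization that owns the account."),
        DataItem("Available credit", "Credit account", "decimal", "0.00",
                 "Credit limit minus the current balance."),
    )
}


def describe(name: str) -> str:
    """Return a one-line summary of where an item fits and what it means."""
    item = CONSUMER_BANKING[name]
    return f"{item.name} ({item.data_type}, part of {item.parent}): {item.description}"


print(describe("Available credit"))
```

Keying the entries by name keeps lookups simple; a real dictionary would likely need stable identifiers as well, since descriptive names can change.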
Types of data dictionaries
There are two types of data dictionaries: active and passive. They differ in how automatically they stay synchronized with the databases they describe.
- Active data dictionaries. These are data dictionaries created within the databases they describe. They automatically reflect any updates or changes in their host databases, which avoids discrepancies between the data dictionaries and their database structures.
- Passive data dictionaries. These are data dictionaries created as new databases -- separate from the databases they describe -- for the purpose of storing data dictionary information. Passive data dictionaries require an additional step to stay in sync with the databases they describe and must be handled with care to ensure there are no discrepancies.
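The difference between the two types can be sketched with SQLite, whose built-in catalog behaves like an active data dictionary. In the example below, the `account` table, its columns and the `catalog_columns` helper are hypothetical; the passive dictionary is simulated by a plain copy of the catalog taken at one point in time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, holder TEXT NOT NULL)")


def catalog_columns(conn, table):
    """Read column names and types from SQLite's own catalog (the active dictionary)."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2] for row in conn.execute(f'PRAGMA table_info("{table}")')}


# Passive dictionary: a separate copy taken before the schema changes.
passive_copy = dict(catalog_columns(conn, "account"))

# The schema changes in the host database.
conn.execute("ALTER TABLE account ADD COLUMN available_credit REAL DEFAULT 0")

# The active dictionary reflects the change without any extra step.
active_view = catalog_columns(conn, "account")

# The passive copy has drifted and needs an explicit synchronization step.
drift = sorted(set(active_view) - set(passive_copy))
print(drift)
```

A check like the `drift` comparison is one way to perform the additional step a passive dictionary requires, for example as part of a scheduled job.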
Data dictionary components
The specific contents of a data dictionary can vary. In general, its components are types of metadata, that is, information about data. Common components include the following:
- Data object listings (names and definitions)
- Data element properties (such as data type, unique identifiers, size, nullability, indexes and optionality)
- Entity-relationship diagrams (ERD)
- System-level diagrams
- Reference data
- Missing data and quality-indicator codes
- Business rules (such as for validation of data quality and schema objects)
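Several of these components can be expressed in a machine-readable form. The sketch below models a few data element properties (data type, size and nullability), a missing-data code and a simple validation rule. The `ElementSpec` class, the `violations` function, the element names and the `-999.0` missing-data code are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class ElementSpec:
    """Properties of one data element (a hypothetical layout)."""
    name: str
    data_type: type
    max_size: Optional[int] = None      # maximum length of the value as text
    nullable: bool = False
    missing_code: Optional[Any] = None  # code meaning "value not recorded"


def violations(spec: ElementSpec, value: Any) -> List[str]:
    """Return the rule violations for a value, or an empty list if it is valid."""
    if value is None:
        return [] if spec.nullable else [f"{spec.name}: null not allowed"]
    if spec.missing_code is not None and value == spec.missing_code:
        return []  # an explicit missing-data code is accepted as-is
    if not isinstance(value, spec.data_type):
        return [f"{spec.name}: expected {spec.data_type.__name__}"]
    if spec.max_size is not None and len(str(value)) > spec.max_size:
        return [f"{spec.name}: longer than {spec.max_size}"]
    return []


holder = ElementSpec("account_holder", str, max_size=40)
credit = ElementSpec("available_credit", float, nullable=True, missing_code=-999.0)

print(violations(holder, None))
print(violations(credit, "12"))
```

Keeping the rules next to the element definitions means the same dictionary entry can drive both documentation and validation.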
How to create a data dictionary
When planning to create a data dictionary, it is important to consider all available data management resources, including databases and spreadsheets.
Most database management systems (DBMSes), as well as information systems created by computer-aided software engineering (CASE) tools, contain integrated active data dictionaries. For example, the Analyzer tool for Microsoft Access -- which analyzes and documents databases -- can be used to create a data dictionary from Access-based or Access-connected data.
If a machine-readable data dictionary cannot be generated automatically, the suggested alternative is to provide the data dictionary as a spreadsheet compiled from a single source.
In Excel, a data dictionary can be kept as an .XLS or .XLSX spreadsheet. Online templates are a useful starting point for this type of data dictionary.
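Where the database does expose its own catalog, a spreadsheet-style dictionary can also be generated rather than typed by hand. The following sketch, which assumes a SQLite database and a hypothetical `account` table, writes one row per column in CSV format, which Excel and other spreadsheet tools can open. The `export_dictionary` function and the column headings are illustrative choices.

```python
import csv
import io
import sqlite3


def export_dictionary(conn, out):
    """Write one CSV row per column of every user table in a SQLite database."""
    writer = csv.writer(out)
    writer.writerow(["table", "column", "type", "nullable", "default", "primary_key"])
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, col_type, notnull, default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            writer.writerow([
                table,
                name,
                col_type,
                "no" if notnull else "yes",
                "" if default is None else default,
                "yes" if pk else "no",
            ])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, holder TEXT NOT NULL)")

buffer = io.StringIO()
export_dictionary(conn, buffer)
print(buffer.getvalue())
```

To produce a file instead of an in-memory string, the same function can be given a file opened with `open("data_dictionary.csv", "w", newline="")`. Definitions and business rules would still have to be added by hand, since the catalog only knows structural properties.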
Pros and cons of data dictionaries
Data dictionaries can be a valuable tool for the organization and management of large data listings. Other pros include:
- Provide an organized, comprehensive list of data
- Are easily searchable
- Can provide reporting and documentation for data across multiple programs
- Simplify the structure for system data requirements
- Reduce data redundancy
- Help maintain data integrity across multiple databases
- Provide relationship information between different database tables
- Are useful in the software design process and in writing test cases
Though they provide thorough listings of data attributes, data dictionaries may be difficult to use for some users. Other cons include:
- Do not provide functional details
- Are not visually appealing
- Are difficult for non-technical users to understand