red150770 - Fotolia


An enterprise architect's guide to the data modeling process

As volumes of both business and application data grow, organizations need a strict, three-phase data modeling process that keeps that data contained and manageable.

Databases are all about the data, but to application architects and developers, the way in which the database is structured is equally important. They must create a data model to structure data within a database.

A properly structured database optimizes data access for applications, enforces compliance mandates and identifies important data relationships. Data models and structuring aren't formed in an automatic process -- it's one that takes careful planning and navigation.

Let's discuss how to identify business data for modeling, and explore the three phases of the process: the conceptual model, the logical model and the physical model.

Identify the data

The first step in data modeling is to identify the actual data content spread across the organization. The early stages of this data modeling process should involve both business and software groups. For instance, IT shops need to consider business process management factors, the structure of the enterprise architecture and the development team's domain-driven design process.

Instead of the specific structure of the database, focus on the data itself and on the relationships of data to business processes. Focusing too much on the database structure now can disengage business-line experts. Get their knowledge on the data's business context. Maintaining that context -- and using it to map data to software missions -- is critical in the phases of modeling.

Phases of data modeling

Phase 1: The conceptual model

Conceptual modeling in the process is normally based on the relationship between application components. The model assigns a set of properties for each component, which will then define the data relationships. These components can include things like organizations, people, facilities, products and application services.

The definitions of these components should identify business relationships. For example, a product ships from a warehouse, and then to a retail store. An effective conceptual data model diligently traces the flow of these goods, orders and payments between the various software systems the company uses.

Conceptual models are sometimes translated directly into physical database models. However, when data structures are complex, it's worth creating a logical model that sits in between. It populates the conceptual model with the specific parametric data that will, eventually, become the physical model.

Is relational database management right for you?

The majority of databases are based on a relational database design. Many organizations are tempted to avoid a logical database model in favor of a relational database management system (RDBMS). However, that move can sometimes be a mistake. While an RDBMS can certainly support applications built on a logical model, RDBMSes can still exhibit real-time performance issues. For real-time applications with limited analytics and reporting capabilities, you should assess a non-relational approach to see which one fits better.

Phase 2: The logical model

In the logical modeling step, create unique identifiers that define each component's property and the scope of the data fields. Some people like to use the original data source and the property name in a qualified-name structure, such as SalesOrder.CustomerName, for this step of the data modeling process. Some people identify the data source in parentheses, while some do not provide source information at all. Data architects generally agree that organizations should explicitly identify the data source, in some way.

Source information is a critical way to eliminate redundancy in storage. Establish a single entity as the data source. Then, link that single data source to all the entities that share the data. When done during design, this practice eliminates the need for database deduplication later on.

A logical model can translate directly into a relational database, an entity/relationship model, an object-modeling language or a business-oriented language. In some cases, the relationships themselves dictate the proper model.

Phase 3: The physical model

The final step in the data modeling process is to map the logical model to a database management system for the database administrators. Once enough of the data is explicitly identified and mapped, administrators can create the actual structure of the database. The physical model helps generate a tangible schema for the components of the database's structure, such as columns, identification keys, constraints, indexes and triggers.

Most organizations already have preferred database software, tools and skill sets. Some of these popular tool providers include MongoDB, MySQL and SAP HANA. Generally speaking, stick with the tools already in place that people are comfortable using. Adopting a new vendor could increase costs and impact operations significantly. If the data modeling process raised specific issues that the current setup cannot support, check with your vendors for any readily available strategies to address them.

Dig Deeper on Enterprise architecture management

Software Quality
Cloud Computing