In data warehousing, snowflaking is a form of dimensional modeling in which dimensions are stored in multiple related dimension tables. A snowflake schema is a variation of the star schema.
Snowflaking is used to improve the performance of certain queries. The schema is diagramed with each fact surrounded by its associated dimensions (as in a star schema), and those dimensions are further related to other dimensions, branching out into a snowflake pattern.
Data warehouses and data marts may use snowflaking to support specific query needs. Snowflaking can improve query performance against low cardinality attributes that are queried independently. Business intelligence applications that use a relational OLAP (ROLAP) architecture may perform better when the data warehouse schema is snowflaked.
A star schema stores all attributes for a dimension into one denormalized (“flattened”) table. This requires more disk space than a more normalized snowflake schema. Snowflaking normalizes the dimension by moving attributes with low cardinality (few distinct values) into separate dimension tables that relate to the core dimension table by using foreign keys. Snowflaking for the sole purpose of minimizing disk space is not recommended, however, because it can adversely impact query performance.
Cases for snowflaking include:
- Sparsely populated attributes, where most dimension member records have a NULL value for the attribute, are moved to a sub-dimension.
- Low cardinality attributes that are queried independently. For example, a product dimension may contain thousands of products, but only a handful of product types. Moving the product type attribute to its own dimension table can improve performance when the product types are queried independently.
- Attributes that are part of a hierarchy and are queried independently. Examples include the year, quarter, and month attributes of a date hierarchy; and the country and state attributes of a geographic hierarchy.