dimensionality reduction

What is dimensionality reduction?

Dimensionality reduction is a process and technique to reduce the number of dimensions -- or features -- in a data set. The goal of dimensionality reduction is to decrease the data set's complexity by reducing the number of features while keeping the most important properties of the original data.

Dimensionality reduction is advantageous to AI developers or data professionals working with massive data sets, performing data visualization and analyzing complex data. It aids in the process of data compression, allowing the data to take up less storage space as well as reducing computation times. The technique is commonly used in machine learning (ML).

Dimensionality reduction is carried out with two main families of techniques: feature selection and feature extraction. Each family includes several methods that simplify the modeling of complex problems, eliminate redundancy and reduce the risk of overfitting.

Why is dimensionality reduction important for machine learning?

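Machine learning models generally need large data sets to train well, and those data sets often contain many features. As the number of features grows relative to the number of samples, models become more prone to overfitting. Dimensionality reduction helps counter this and is a useful preprocessing step for classification and regression problems.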

This process is also useful to preserve the most relevant information while reducing the number of features in a data set. Dimensionality reduction removes irrelevant features from the data, as irrelevant data can decrease the accuracy of machine learning algorithms.

What are different techniques for dimensionality reduction?

There are two common dimensionality reduction techniques: feature selection and feature extraction.

  • In feature selection, a small subset of the most relevant features is chosen from the original, higher-dimensional feature set using filter, wrapper or embedded methods. The goal here is to reduce the data set's dimensionality while keeping its most important features.
  • Feature extraction combines and transforms the data set's original features to create new features. The goal is to create a lower-dimensional data set that still retains the key properties of the original data.

Feature selection uses different methods, such as the following:

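  • The filter method. Scores each feature with a statistical measure, such as its correlation with the target, independently of any ML model and keeps only the highest-scoring features.
  • The wrapper method. Trains an ML model on candidate feature subsets and uses the model's performance to decide whether a feature should be added or removed.
  • The embedded method. Performs feature selection as part of model training itself, for example through regularization that shrinks the weights of less useful features.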
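
The filter and wrapper methods above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names `filter_select` and `wrapper_forward_select`, the correlation score, the least-squares model and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def filter_select(X, y, k):
    """Filter method: keep the k features most correlated with y (absolute value)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation of every column with the target, computed without a model.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    scores = np.abs(Xc.T @ yc) / denom
    return np.sort(np.argsort(scores)[::-1][:k])

def wrapper_forward_select(X, y, k):
    """Wrapper method: greedily add the feature that most lowers model error."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in remaining:
            cols = selected + [j]
            # Fit a least-squares model (with intercept) on the candidate subset.
            A = np.column_stack([X[:, cols], np.ones(len(y))])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            err = np.mean((A @ coef - y) ** 2)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
        remaining.remove(best_j)
    return sorted(selected)

# Synthetic data (assumption): only features 0 and 3 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

filter_idx = filter_select(X, y, k=2)
wrapper_idx = wrapper_forward_select(X, y, k=2)
print(filter_idx, wrapper_idx)
```

The filter function never trains a model, which makes it cheap. The wrapper function refits a model for every candidate feature, which costs more but judges features by their effect on the model that will actually be used.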

Feature extraction uses methods such as the following:

  • Principal component analysis (PCA). A statistical procedure that transforms the original features into a smaller set of uncorrelated variables, called principal components, ordered by how much of the data's variance each one captures.
  • Linear discriminant analysis (LDA). A supervised method that finds linear combinations of features that best separate different classes of data.
  • T-distributed stochastic neighbor embedding (t-SNE). An unsupervised, nonlinear dimensionality reduction method that creates a probability distribution over pairs of objects in the original space, creates a similar distribution over the points in a low-dimensional map, and then positions the points so the two distributions match as closely as possible.
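
As a rough illustration of feature extraction, PCA can be written with NumPy's singular value decomposition. The helper name `pca` and the synthetic data are assumptions for this sketch. Production code would more likely use a library implementation.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components using SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = S ** 2 / (X.shape[0] - 1)
    # Share of total variance captured by each kept component.
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ components.T, components, ratio

# Synthetic data (assumption): 5 observed features driven by 2 hidden factors.
rng = np.random.default_rng(42)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(300, 5))

X_reduced, components, ratio = pca(X, n_components=2)
print(X_reduced.shape)  # 300 rows, 2 extracted features
print(ratio.sum())      # share of variance kept by the 2 components
```

Because the five observed features are built from two hidden factors plus a little noise, two principal components retain almost all of the variance. Real data sets rarely compress this cleanly.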

Other methods used in dimensionality reduction include the following:

  • Factor analysis.
  • High correlation filter.
  • Uniform manifold approximation and projection (UMAP).
  • Random forest.

Figure: Examples of techniques and methods used in dimensionality reduction. Dimensionality reduction can be carried out using a variety of techniques and methods.
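
Of the other methods listed, the high correlation filter is the simplest to show in code. The sketch below assumes a hypothetical helper, `high_correlation_filter`, with an arbitrary threshold of 0.9: when two features are strongly correlated, only the first one is kept.

```python
import numpy as np

def high_correlation_filter(X, threshold=0.9):
    """Return indices of columns to keep, dropping one of each highly correlated pair."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        # Keep column j only if it is not strongly correlated with a kept column.
        if all(corr[j, i] < threshold for i in keep):
            keep.append(j)
    return keep

# Synthetic data (assumption): columns 1 and 3 nearly duplicate columns 0 and 2.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)
X = np.column_stack([
    a,
    2.0 * a + 0.01 * rng.normal(size=500),  # scaled copy of column 0
    b,
    -b + 0.01 * rng.normal(size=500),       # negated copy of column 2
])

kept = high_correlation_filter(X, threshold=0.9)
print(kept)
```

The absolute value of the correlation is used so that strongly negative relationships count as redundant too.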

Benefits and challenges of dimensionality reduction

Dimensionality reduction has benefits, such as the following:

  • Improved performance. Dimensionality reduction lowers the complexity of the data and removes irrelevant features, which can shorten training time and improve model performance.
  • Easier visualization. High-dimensional data is more difficult to visualize than data that has been reduced to two or three dimensions.
  • Less overfitting. Higher-dimensional data can lead to overfitting in ML models, which dimensionality reduction helps prevent.
  • Reduced storage space. The process reduces the required storage space because it eliminates irrelevant data.

The process does come with downsides, however, such as the following:

  • Data loss. Ideally, dimensionality reduction discards only redundant or irrelevant information. In practice, the process can still lose some information that cannot be recovered from the reduced data, which can affect how well training algorithms perform.
  • Interpretability. It might be difficult to understand the relationships between original features and the reduced dimensions.
  • Computational complexity. Some reduction methods might be more computationally intensive than others.
  • Outliers. If not detected, outliers in the data can distort the results of the dimensionality reduction process.
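
The data loss downside can be made concrete by measuring how well the original data is recovered from its reduced form. The sketch below uses PCA as the reduction method. The helper name `reconstruction_error` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def reconstruction_error(X, n_components):
    """Mean squared error after reducing X with PCA and mapping it back."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    # Project down to n_components dimensions, then back to the original space.
    X_restored = Xc @ V @ V.T + mean
    return float(np.mean((X - X_restored) ** 2))

# Synthetic data (assumption): 4 features with very different spreads.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

errors = [reconstruction_error(X, k) for k in range(1, 5)]
print(errors)
```

The error shrinks as more components are kept and is effectively zero only when all four are retained, which is the point at which no reduction has taken place.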

Dimensionality reduction can also be used as a data preparation step to improve the performance of an ML model.

This was last updated in September 2023
