dimensionality reduction
Dimensionality reduction is a machine learning (ML) or statistical technique for reducing the number of random variables in a problem by obtaining a set of principal variables. It can be carried out with a variety of methods that simplify the modeling of complex problems, eliminate redundancy and reduce the risk of the model overfitting, that is, fitting noise in the training data rather than the underlying pattern.
Dimensionality reduction is divided into two approaches: feature selection and feature extraction. Feature selection chooses a smaller subset of the original features from high-dimensional data, typically through filter, wrapper or embedded methods. Feature extraction instead transforms the original features into a smaller set of new features, such as the components produced by component analysis, that preserve most of the information in the data.
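As a rough illustration, the contrast can be sketched with scikit-learn on a toy dataset; the choice of the Iris data, the ANOVA F-test and two output features here is illustrative only:

```python
# A minimal sketch contrasting feature selection and feature extraction,
# assuming scikit-learn is available; dataset and parameters are examples.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Feature selection: keep the 2 original features that score highest on a
# univariate ANOVA F-test (a filter method).
X_selected = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: derive 2 new features as linear combinations of all
# 4 original features (principal component analysis).
X_extracted = PCA(n_components=2).fit_transform(X)

print(X_selected.shape, X_extracted.shape)  # (150, 2) (150, 2)
```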
Methods of dimensionality reduction include the following; two of the simpler filter techniques are sketched in code after the list:
- Factor Analysis
- Low Variance Filter
- High Correlation Filter
- Backward Feature Elimination
- Forward Feature Selection
- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
- Methods Based on Projections
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Uniform Manifold Approximation and Projection (UMAP)
- Independent Component Analysis (ICA)
- Missing Value Ratio
- Random Forest
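As promised above, here is a minimal sketch of the low variance filter and the high correlation filter using pandas and NumPy; the synthetic data and the 0.01 and 0.9 thresholds are illustrative assumptions, not recommended defaults:

```python
# A hedged sketch of the low variance and high correlation filters.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.normal(size=100),
    "b": rng.normal(size=100),
    "near_constant": 5.0 + rng.normal(scale=0.001, size=100),
})
df["a_scaled"] = df["a"] * 2 + rng.normal(scale=0.01, size=100)  # near duplicate of "a"

# Low variance filter: drop features whose variance falls below a threshold.
low_var = df.columns[df.var() < 0.01]
df = df.drop(columns=low_var)

# High correlation filter: from each highly correlated pair, drop one feature.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
df = df.drop(columns=to_drop)

print(df.columns.tolist())  # ['a', 'b']
```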
Dimensionality reduction benefits AI developers and data professionals who work with massive datasets, visualize data or analyze complex data. It also acts as a form of data compression: the reduced representation takes up less storage space and cuts computation time.
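To make the compression point concrete, here is a hedged sketch using scikit-learn's PCA on synthetic correlated data; the 90%-variance target and the data shapes are arbitrary example choices:

```python
# A minimal sketch of dimensionality reduction as lossy compression.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
latent = rng.normal(size=(10_000, 20))           # 20 underlying factors
mixing = rng.normal(size=(20, 300))
X = latent @ mixing + rng.normal(scale=0.1, size=(10_000, 300))

pca = PCA(n_components=0.90)  # keep enough components to explain 90% of variance
X_small = pca.fit_transform(X)

print(X.shape, "->", X_small.shape)              # roughly (10000, 300) -> (10000, 20)
print(f"about {X.nbytes / X_small.nbytes:.0f}x less memory")
X_approx = pca.inverse_transform(X_small)        # lossy reconstruction if needed
```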