Dimensionality reduction

Dimensionality reduction is one of the main applications of unsupervised learning. It can be understood as the process of reducing the number of random variables under consideration by obtaining a set of principal variables.^[1] High dimensionality has many costs: redundant and irrelevant features degrade the performance of some algorithms, the data become harder to interpret and visualize, and computation can become infeasible.^[2]

Components

Dimensionality reduction can be divided into two components or subcategories^[3]:

Feature selection: Consists of finding a subset of the original variables (features) that is suited to modeling the problem. There are usually three approaches^[4]:

- Wrappers
- Filters
- Embedded
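As an illustration of the filter approach, the following minimal sketch scores each feature independently of any model and keeps the highest-scoring ones. The scoring rule (absolute Pearson correlation with the target), the function name `filter_select`, and the toy data are assumptions made for this example, not something the cited sources prescribe.

```python
import numpy as np

def filter_select(X, y, k):
    """Return indices of the k features most correlated (in absolute value) with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation of each column of X with the target
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have a zero denominator; give them a score of 0
    scores = np.abs(Xc.T @ yc) / np.where(denom == 0, np.inf, denom)
    # Highest-scoring features first
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Toy target (hypothetical): depends only on features 1 and 3, plus small noise
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

selected = filter_select(X, y, k=2)
print(sorted(selected.tolist()))
```

A wrapper would instead retrain a model on candidate subsets and compare its performance, and an embedded method would read feature importance from the model's own training procedure; both are costlier than this filter.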

Feature extraction: Reduces data in a high-dimensional space to a lower-dimensional space^[4]. A common technique is:

- Principal component analysis
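Principal component analysis can be sketched in a few lines with a singular value decomposition of the centered data. The function name `pca` and the toy data below are assumptions for illustration; library implementations typically add further options.

```python
import numpy as np

def pca(X, n_components):
    """Project X (samples x features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance of the data along each principal direction
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance[:n_components]

rng = np.random.default_rng(0)
# Toy data (hypothetical): the third feature is almost a copy of the first
a = rng.normal(size=300)
b = rng.normal(size=300)
X = np.column_stack([a, b, a + 0.01 * rng.normal(size=300)])

Z, components, variances = pca(X, n_components=2)
print(Z.shape)
```

Because the third feature is nearly redundant, two components retain almost all of the variance of the three original features.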

Algorithms

Some of the most common dimensionality reduction algorithms in machine learning are^[1]:

- Principal component analysis (PCA)
- Kernel principal component analysis (Kernel PCA)
- Locally linear embedding (LLE)
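Kernel PCA replaces the covariance matrix of ordinary PCA with a centered kernel matrix, which lets it capture nonlinear structure. The sketch below assumes an RBF kernel; the function name `rbf_kernel_pca`, the `gamma` value, and the two-circle toy data are illustrative choices rather than part of the cited sources.

```python
import numpy as np

def rbf_kernel_pca(X, n_components, gamma):
    """Embed X with kernel PCA using the RBF kernel exp(-gamma * ||x - y||^2)."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))
    # Center the kernel matrix, which centers the data in feature space
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order, so reverse them
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    eigvals = eigvals[::-1][:n_components]
    eigvecs = eigvecs[:, ::-1][:, :n_components]
    # Projection of each training point: eigenvector scaled by sqrt(eigenvalue)
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

rng = np.random.default_rng(0)
# Toy data (hypothetical): two concentric circles of radius 1 and 3
theta = rng.uniform(0.0, 2.0 * np.pi, size=100)
radius = np.repeat([1.0, 3.0], 50)
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

Z = rbf_kernel_pca(X, n_components=2, gamma=0.5)
print(Z.shape)
```

This version only embeds the points it was fitted on; projecting new points would additionally require evaluating the kernel between new and training points and centering it consistently.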

Methods

Some common methods to perform dimensionality reduction are^[4]:

- Missing values
- Low variance
- Decision trees
- Random forest
- High correlation
- Backward feature elimination
- Factor analysis
- Principal component analysis (PCA)
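Three of the simpler methods listed above (missing values, low variance, and high correlation) can be sketched as column filters. The thresholds, the function name `simple_filters`, and the toy data are assumptions chosen for illustration; suitable cut-offs depend on the dataset.

```python
import numpy as np

def simple_filters(X, max_missing=0.5, min_variance=1e-3, max_corr=0.95):
    """Return indices of the columns that survive three simple filters."""
    keep = np.arange(X.shape[1])
    # 1. Missing values: drop columns with too large a share of NaNs
    missing_ratio = np.isnan(X).mean(axis=0)
    keep = keep[missing_ratio[keep] <= max_missing]
    # 2. Low variance: drop near-constant columns (NaNs ignored)
    variances = np.nanvar(X[:, keep], axis=0)
    keep = keep[variances >= min_variance]
    # 3. High correlation: of each highly correlated pair, drop the later column
    filled = X[:, keep]
    filled = np.where(np.isnan(filled), np.nanmean(filled, axis=0), filled)
    corr = np.abs(np.corrcoef(filled, rowvar=False))
    dropped = set()
    for i in range(len(keep)):
        if i in dropped:
            continue
        for j in range(i + 1, len(keep)):
            if corr[i, j] > max_corr:
                dropped.add(j)
    return [int(c) for pos, c in enumerate(keep) if pos not in dropped]

rng = np.random.default_rng(0)
n = 100
base = rng.normal(size=n)
# Toy data (hypothetical), one column per case
X = np.column_stack([
    base,                                       # 0: informative
    np.full(n, 7.0),                            # 1: constant (low variance)
    base * 2.0 + 0.01 * rng.normal(size=n),     # 2: nearly a multiple of column 0
    rng.normal(size=n),                         # 3: informative
    np.where(np.arange(n) < 80, np.nan, 1.0),   # 4: 80% missing
])

print(simple_filters(X))
```

On this toy data only columns 0 and 3 remain: column 4 fails the missing-value filter, column 1 fails the variance filter, and column 2 is dropped as a near-duplicate of column 0.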

References

^[1] Real World Applications of Unsupervised Learning, pythonistaplanet.com
^[2] Dimensionality Reduction, courses.washington.edu
^[3] cognitive_class
^[4] What is Dimensionality Reduction – Techniques, Methods, Components
