Dimensionality reduction

Revision as of 18:35, 24 March 2020

Dimensionality reduction is one of the main applications of unsupervised learning. It can be understood as the process of reducing the number of random variables under consideration by obtaining a set of principal variables.^[1] High dimensionality has many costs, including redundant and irrelevant features that degrade the performance of some algorithms, difficulty in interpretation and visualization, and computation that can become infeasible.^[2]

Components

Dimensionality reduction can be divided into two components, or subcategories^[3]:

Feature selection: Consists of finding a subset of the original variables to use in modeling the problem. It is usually performed in one of three ways^[4]:

- Wrappers
- Filters
- Embedded
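As a rough illustration of the filter approach, the sketch below scores each feature by its absolute Pearson correlation with the target, without training any model, and keeps the k highest-scoring features. It assumes NumPy is available; the function name `filter_select` and the choice of correlation as the score are illustrative assumptions, not a method taken from the cited sources. Wrapper and embedded methods would instead evaluate feature subsets with, or inside, a learning model.

```python
import numpy as np


def filter_select(X, y, k):
    """Hypothetical filter-style selector: rank columns of X by |Pearson r|
    with the target y and return the indices of the k best, best first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each feature
    yc = y - y.mean()        # centre the target
    # Pearson r per column; a zero denominator (constant column) scores 0.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    scores = np.abs(Xc.T @ yc) / np.where(denom == 0, np.inf, denom)
    return np.argsort(scores)[::-1][:k]


# Usage: only columns 2 and 4 actually influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 2] - 2 * X[:, 4] + 0.1 * rng.normal(size=200)
selected = filter_select(X, y, k=2)
print(sorted(selected.tolist()))  # expected: [2, 4]
```

Because the score ignores any model, a filter like this is cheap but can miss features that are only useful in combination with others.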

Feature extraction: Transforms the data from a high-dimensional space into a lower-dimensional space by deriving new features from the original ones^[4].
- Principal component analysis
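Principal component analysis can be sketched in a few lines: centre the data, take its singular value decomposition, and project onto the leading right singular vectors, which are the directions of greatest variance. The sketch below assumes NumPy; the function name `pca` and its return values are illustrative choices rather than any particular library's API.

```python
import numpy as np


def pca(X, n_components):
    """Minimal PCA sketch: returns (scores, components, explained_variance)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # PCA operates on mean-centred data
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Variance along each direction (sample variance, n - 1 denominator).
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    scores = Xc @ components.T  # coordinates in the reduced space
    return scores, components, explained_variance


# Usage: 3-D points that lie almost on a line are reduced to 1-D.
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = t @ np.array([[1.0, 2.0, 2.0]]) + 0.01 * rng.normal(size=(100, 3))
scores, components, var = pca(X, n_components=1)
print(scores.shape)  # (100, 1)
```

Here the first component recovers the direction (1, 2, 2)/3 up to sign, and almost all of the variance is retained in a single coordinate.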

Algorithms

Some of the most common dimensionality reduction algorithms in machine learning are listed as follows^[1]:

- Principal component analysis (PCA)
- Kernel principal component analysis (kernel PCA)
- Locally linear embedding (LLE)
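Kernel PCA applies the same idea in an implicit feature space defined by a kernel function, which lets it capture non-linear structure. A minimal sketch, assuming NumPy and a Gaussian (RBF) kernel with a hand-picked `gamma`, builds the kernel matrix, centres it in feature space, and takes its leading eigenvectors; the function name and parameter values are hypothetical.

```python
import numpy as np


def rbf_kernel_pca(X, n_components, gamma):
    """Kernel PCA sketch with k(x, z) = exp(-gamma * ||x - z||^2).
    Returns the projections of the training points."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    # Pairwise squared distances; clip tiny negatives from round-off.
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T, 0)
    K = np.exp(-gamma * d2)
    # Centre the kernel matrix, i.e. centre the data in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; take the largest ones.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Projection of training point i on component k is sqrt(lambda_k) * v_k[i].
    return eigvecs * np.sqrt(np.maximum(eigvals, 0))


# Usage: two concentric rings (radius 1 and 3), 100 points each.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=200)
radii = np.repeat([1.0, 3.0], 100)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
Z = rbf_kernel_pca(X, n_components=2, gamma=0.5)
print(Z.shape)  # (200, 2)
```

How well the resulting components separate structure such as the two rings depends strongly on `gamma`, which in practice would be tuned rather than fixed as here.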

Methods

Some common methods to perform dimensionality reduction are listed as follows^[4]:

- Missing values
- Low variance
- Decision trees
- Random forest
- High correlation
- Backward feature elimination
- Factor analysis
- Principal component analysis (PCA)
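Three of the simpler methods in this list (missing values, low variance and high correlation) act as column filters and can be sketched directly. The sketch below assumes NumPy, a numeric matrix with `NaN` marking missing entries, and hypothetical function names; the thresholds (50% missing, variance 1e-3, |r| 0.9) are arbitrary illustrative values, not recommendations from the cited sources. Since `np.corrcoef` does not skip `NaN` values, the correlation filter assumes the remaining columns are complete (for example after imputation).

```python
import numpy as np


def drop_missing(X, max_missing_ratio=0.5):
    """Indices of columns whose fraction of NaN entries is <= the threshold."""
    X = np.asarray(X, dtype=float)
    return np.flatnonzero(np.isnan(X).mean(axis=0) <= max_missing_ratio)


def drop_low_variance(X, min_variance=1e-3):
    """Indices of columns whose variance (ignoring NaN) exceeds the threshold."""
    X = np.asarray(X, dtype=float)
    return np.flatnonzero(np.nanvar(X, axis=0) > min_variance)


def drop_high_correlation(X, max_abs_corr=0.9):
    """Greedy pass: keep a column only if |Pearson r| with every column kept
    so far is <= the threshold. Assumes no NaNs and no constant columns."""
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= max_abs_corr for k in kept):
            kept.append(j)
    return np.array(kept, dtype=int)


# Usage on a toy matrix: column 1 is constant, column 2 = 2 * column 0,
# column 3 is mostly missing.
nan = np.nan
X = np.array([
    [1.0, 7.0,  2.0, nan, 3.0],
    [2.0, 7.0,  4.0, nan, 1.0],
    [3.0, 7.0,  6.0, nan, 4.0],
    [4.0, 7.0,  8.0, nan, 1.0],
    [5.0, 7.0, 10.0, 1.0, 5.0],
    [6.0, 7.0, 12.0, 2.0, 9.0],
])
keep = drop_missing(X)                          # [0 1 2 4]
keep = keep[drop_low_variance(X[:, keep])]      # [0 2 4]
keep = keep[drop_high_correlation(X[:, keep])]  # [0 4]
print(keep.tolist())  # [0, 4]
```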

References

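[1] Real World Applications of Unsupervised Learning. pythonistaplanet.com. https://pythonistaplanet.com/applications-of-unsupervised-learning/
[2] Dimensionality Reduction. courses.washington.edu. http://courses.washington.edu/css581/lecture_slides/17_dimensionality_reduction.pdf
[3] Machine Learning - Dimensionality Reduction - Feature Extraction & Selection. youtube.com. https://www.youtube.com/watch?v=AU_hBML2H1c. Accessed 24 March 2020.
[4] What is Dimensionality Reduction – Techniques, Methods, Components.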

