Dimensionality reduction
Dimensionality reduction is one of the main applications of unsupervised learning. It can be understood as the process of reducing the number of random variables under consideration by obtaining a set of principal variables.[1] High dimensionality has many costs: redundant and irrelevant features degrade the performance of some algorithms, data become difficult to interpret and visualize, and computation can become infeasible.[2]
Components
Dimensionality reduction can be divided into two components, or subcategories[3]:
- Feature selection: finding a subset of the original variables that is best suited to modeling the problem. It usually involves one of three approaches:[4]
  - Wrappers
  - Filters
  - Embedded
- Feature extraction: transforming the data from a high-dimensional space into a lower-dimensional space, for example with principal component analysis.[4] A minimal sketch contrasting the two components follows this list.
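As an illustration (not taken from the cited sources), the sketch below contrasts the two components using scikit-learn; the iris dataset and the 0.2 variance threshold are arbitrary choices for this example:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold

X = load_iris().data  # 150 samples, 4 numeric features

# Feature selection: keep a subset of the ORIGINAL variables.
# Here a simple filter that drops low-variance features
# (the 0.2 threshold is an arbitrary choice for this demo).
selector = VarianceThreshold(threshold=0.2)
X_selected = selector.fit_transform(X)
print("after selection:", X_selected.shape)  # original columns, just fewer of them

# Feature extraction: derive NEW variables that summarize the data.
# Here the first two principal components.
extractor = PCA(n_components=2)
X_extracted = extractor.fit_transform(X)
print("after extraction:", X_extracted.shape)  # (150, 2) derived columns
```

The selected columns keep their original meaning, while the extracted components are linear combinations of all inputs; this is the practical trade-off between the two subcategories.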
Algorithms
Some of the most common dimensionality reduction algorithms in machine learning are listed as follows[1] (a usage sketch appears after the list):
- Principal Component Analysis
- Kernel principal component analysis (Kernel PCA)
- Locally-Linear Embedding
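All three algorithms ship with scikit-learn and share the same fit/transform interface. In the sketch below, the swiss-roll toy data and the parameter values (gamma, n_neighbors) are arbitrary illustrative choices, not recommendations from the cited source:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import LocallyLinearEmbedding

# A classic nonlinear toy dataset: 3-D points lying on a rolled-up 2-D sheet.
X, _ = make_swiss_roll(n_samples=500, random_state=0)

# PCA: linear projection onto the top-2 directions of maximal variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA: PCA in an implicit feature space induced by a kernel,
# which lets it capture nonlinear structure (gamma is a tuning choice).
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.05).fit_transform(X)

# Locally-linear embedding: preserves local neighborhood geometry.
X_lle = LocallyLinearEmbedding(n_components=2, n_neighbors=12).fit_transform(X)

for name, Z in [("PCA", X_pca), ("Kernel PCA", X_kpca), ("LLE", X_lle)]:
    print(name, Z.shape)  # each result is (500, 2)
```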
Methods
Some common methods to perform dimensionality reduction are listed as follows[4] (a sketch of the filter-style methods, i.e. missing values, low variance, and high correlation, appears after the list):
- Missing values: drop variables whose fraction of missing values exceeds a threshold, since they carry little usable information.
- Low variance: drop variables whose variance falls below a threshold; a nearly constant column contributes little.
- Decision trees: train a decision tree and keep only the features it actually uses for splits.
- Random forest: rank features by the importance scores of a tree ensemble and keep the highest-ranked ones.
- High correlation: when two variables are highly correlated, drop one of them as redundant.
- Backward feature elimination: start from the full feature set and repeatedly remove the feature whose removal degrades model performance the least.
- Factor analysis: group highly correlated variables and represent each group by a smaller number of latent factors.
- Principal component analysis (PCA): project the data onto orthogonal components ordered by the amount of variance they explain.
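A minimal sketch of the filter-style methods above, assuming a pandas DataFrame of numeric columns; the function name filter_columns and all three threshold values are invented for this example:

```python
import pandas as pd

def filter_columns(df: pd.DataFrame,
                   max_missing: float = 0.4,
                   min_variance: float = 1e-3,
                   max_corr: float = 0.95) -> pd.DataFrame:
    # Missing values: drop columns whose fraction of NaNs is too high.
    df = df.loc[:, df.isna().mean() <= max_missing]

    # Low variance: drop near-constant columns.
    df = df.loc[:, df.var() >= min_variance]

    # High correlation: for each highly correlated pair, keep only one member.
    corr = df.corr().abs()
    drop = set()
    cols = list(df.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a not in drop and b not in drop and corr.loc[a, b] > max_corr:
                drop.add(b)
    return df.drop(columns=sorted(drop))

# Tiny demo: one informative column, one redundant copy, one constant.
df = pd.DataFrame({"x":      [1.0, 2.0, 3.0, 4.0],
                   "x_copy": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with x
                   "const":  [5.0, 5.0, 5.0, 5.0]}) # zero variance
print(filter_columns(df).columns.tolist())  # -> ['x']
```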
Disadvantages
One of the main disadvantages of dimensionality reduction is that some information is inevitably lost. PCA, in particular, only captures linear correlations between variables, which is undesirable when the relevant structure in the data is nonlinear.[4] The sketch below shows one way to measure the loss.
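As an illustration (not from the cited source), PCA's information loss can be quantified through the explained-variance ratio and the reconstruction error; the iris data and the choice of two components are arbitrary:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples, 4 features

# Project the 4-D data onto its first 2 principal components.
pca = PCA(n_components=2).fit(X)

# Fraction of the total variance retained by the 2 components.
print("variance retained:", pca.explained_variance_ratio_.sum())

# Reconstruction error: map down to 2-D and back, then compare to the original.
X_reconstructed = pca.inverse_transform(pca.transform(X))
mse = np.mean((X - X_reconstructed) ** 2)
print("mean squared reconstruction error:", mse)  # > 0: information was lost
```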
External links
- https://www.youtube.com/watch?v=AU_hBML2H1c
- http://courses.washington.edu/css581/lecture_slides/17_dimensionality_reduction.pdf
- https://www.coursera.org/lecture/big-data-machine-learning/dimensionality-reduction-PDGeA

References
1. Real World Applications of Unsupervised Learning. pythonistaplanet.com. https://pythonistaplanet.com/applications-of-unsupervised-learning/
2. Dimensionality Reduction. courses.washington.edu. http://courses.washington.edu/css581/lecture_slides/17_dimensionality_reduction.pdf
3. Machine Learning - Dimensionality Reduction - Feature Extraction & Selection. youtube.com. Retrieved 24 March 2020. https://www.youtube.com/watch?v=AU_hBML2H1c
4. What is Dimensionality Reduction – Techniques, Methods, Components. data-flair.training. https://data-flair.training/blogs/dimensionality-reduction-tutorial/