Dimensionality reduction
Dimensionality reduction is one of the main applications of unsupervised learning. It can be understood as the process of reducing the number of random variables under consideration by obtaining a set of principal variables.[1] High dimensionality has many costs: redundant and irrelevant features degrade the performance of some algorithms, data become difficult to interpret and visualize, and computation can become infeasible.[2]
Components
Dimensionality reduction can be divided into two components, or subcategories[3]:
- Feature selection: finding a subset of the original variables that is best suited to modeling the problem. It usually involves one of three approaches:[4]
  - Wrappers
  - Filters
  - Embedded
- Feature extraction: transforming the data from a high-dimensional space into a lower-dimensional space, for example with principal component analysis.[4] A minimal sketch contrasting the two components follows this list.
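As an illustration (not taken from the cited sources), the sketch below contrasts the two components using scikit-learn; the iris dataset and the 0.2 variance threshold are arbitrary choices for this example:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold

X = load_iris().data  # 150 samples, 4 numeric features

# Feature selection: keep a subset of the ORIGINAL variables.
# Here a simple filter that drops low-variance features
# (the 0.2 threshold is an arbitrary choice for this demo).
selector = VarianceThreshold(threshold=0.2)
X_selected = selector.fit_transform(X)
print("after selection:", X_selected.shape)  # original columns, just fewer of them

# Feature extraction: derive NEW variables that summarize the data.
# Here the first two principal components.
extractor = PCA(n_components=2)
X_extracted = extractor.fit_transform(X)
print("after extraction:", X_extracted.shape)  # (150, 2) derived columns
```

The selected columns keep their original meaning, while the extracted components are linear combinations of all inputs; this is the practical trade-off between the two subcategories.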
Algorithms
Some of the most common dimensionality reduction algorithms in machine learning are listed as follows[1] (a usage sketch appears after the list):
- Principal Component Analysis
- Kernel principal component analysis (Kernel PCA)
- Locally-Linear Embedding
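All three algorithms ship with scikit-learn and share the same fit/transform interface. In the sketch below, the swiss-roll toy data and the parameter values (gamma, n_neighbors) are arbitrary illustrative choices, not recommendations from the cited source:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import LocallyLinearEmbedding

# A classic nonlinear toy dataset: 3-D points lying on a rolled-up 2-D sheet.
X, _ = make_swiss_roll(n_samples=500, random_state=0)

# PCA: linear projection onto the top-2 directions of maximal variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA: PCA in an implicit feature space induced by a kernel,
# which lets it capture nonlinear structure (gamma is a tuning choice).
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.05).fit_transform(X)

# Locally-linear embedding: preserves local neighborhood geometry.
X_lle = LocallyLinearEmbedding(n_components=2, n_neighbors=12).fit_transform(X)

for name, Z in [("PCA", X_pca), ("Kernel PCA", X_kpca), ("LLE", X_lle)]:
    print(name, Z.shape)  # each result is (500, 2)
```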
Methods
Some common methods to perform dimensionality reduction are listed as follows[4] (a sketch of the filter-style methods, i.e. missing values, low variance, and high correlation, appears after the list):
- Missing values: drop variables whose fraction of missing values exceeds a threshold, since they carry little usable information.
- Low variance: drop variables whose variance falls below a threshold; a nearly constant column contributes little.
- Decision trees: train a decision tree and keep only the features it actually uses for splits.
- Random forest: rank features by the importance scores of a tree ensemble and keep the highest-ranked ones.
- High correlation: when two variables are highly correlated, drop one of them as redundant.
- Backward feature elimination: start from the full feature set and repeatedly remove the feature whose removal degrades model performance the least.
- Factor analysis: group highly correlated variables and represent each group by a smaller number of latent factors.
- Principal component analysis (PCA): project the data onto orthogonal components ordered by the amount of variance they explain.
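A minimal sketch of the filter-style methods above, assuming a pandas DataFrame of numeric columns; the function name filter_columns and all three threshold values are invented for this example:

```python
import pandas as pd

def filter_columns(df: pd.DataFrame,
                   max_missing: float = 0.4,
                   min_variance: float = 1e-3,
                   max_corr: float = 0.95) -> pd.DataFrame:
    # Missing values: drop columns whose fraction of NaNs is too high.
    df = df.loc[:, df.isna().mean() <= max_missing]

    # Low variance: drop near-constant columns.
    df = df.loc[:, df.var() >= min_variance]

    # High correlation: for each highly correlated pair, keep only one member.
    corr = df.corr().abs()
    drop = set()
    cols = list(df.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a not in drop and b not in drop and corr.loc[a, b] > max_corr:
                drop.add(b)
    return df.drop(columns=sorted(drop))

# Tiny demo: one informative column, one redundant copy, one constant.
df = pd.DataFrame({"x":      [1.0, 2.0, 3.0, 4.0],
                   "x_copy": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with x
                   "const":  [5.0, 5.0, 5.0, 5.0]}) # zero variance
print(filter_columns(df).columns.tolist())  # -> ['x']
```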
Disadvantages
One of the main disadvantages of dimensionality reduction is that some information is inevitably lost. PCA, in particular, only captures linear correlations between variables, which is undesirable when the relevant structure in the data is nonlinear.[4] The sketch below shows one way to measure the loss.
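As an illustration (not from the cited source), PCA's information loss can be quantified through the explained-variance ratio and the reconstruction error; the iris data and the choice of two components are arbitrary:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples, 4 features

# Project the 4-D data onto its first 2 principal components.
pca = PCA(n_components=2).fit(X)

# Fraction of the total variance retained by the 2 components.
print("variance retained:", pca.explained_variance_ratio_.sum())

# Reconstruction error: map down to 2-D and back, then compare to the original.
X_reconstructed = pca.inverse_transform(pca.transform(X))
mse = np.mean((X - X_reconstructed) ** 2)
print("mean squared reconstruction error:", mse)  # > 0: information was lost
```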
External links
- https://www.youtube.com/watch?v=AU_hBML2H1c
- http://courses.washington.edu/css581/lecture_slides/17_dimensionality_reduction.pdf
- https://www.coursera.org/lecture/big-data-machine-learning/dimensionality-reduction-PDGeA

References
1. Real World Applications of Unsupervised Learning. pythonistaplanet.com. https://pythonistaplanet.com/applications-of-unsupervised-learning/
2. Dimensionality Reduction. courses.washington.edu. http://courses.washington.edu/css581/lecture_slides/17_dimensionality_reduction.pdf
3. Machine Learning - Dimensionality Reduction - Feature Extraction & Selection. youtube.com. Retrieved 24 March 2020. https://www.youtube.com/watch?v=AU_hBML2H1c
4. What is Dimensionality Reduction – Techniques, Methods, Components. data-flair.training. https://data-flair.training/blogs/dimensionality-reduction-tutorial/