Dimensionality reduction

'''Dimensionality reduction''' is one of the main applications of [[unsupervised learning]]. It can be understood as the process of reducing the number of random variables under consideration by obtaining a set of principal variables.<ref name="pythonistaplanet.com">[https://pythonistaplanet.com/applications-of-unsupervised-learning/ Real World Applications of Unsupervised Learning]</ref> High dimensionality has many costs: redundant and irrelevant features degrade the performance of some algorithms, and it makes interpretation and visualization difficult and computation infeasible.<ref name="courses.washington.edu">[http://courses.washington.edu/css581/lecture_slides/17_dimensionality_reduction.pdf Dimensionality Reduction] courses.washington.edu</ref>
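
As a minimal illustration (assuming the scikit-learn library and its bundled handwritten-digits dataset, neither of which comes from the cited sources), 64 original variables can be reduced to 2 principal variables:

<syntaxhighlight lang="python">
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # 1797 samples, 64 pixel features
reduced = PCA(n_components=2).fit_transform(X)
print(X.shape, "->", reduced.shape)  # (1797, 64) -> (1797, 2)
</syntaxhighlight>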


== Components ==


Dimensionality reduction can be divided into two components or subcategories<ref name="cognitive class">{{cite web |title=Machine Learning - Dimensionality Reduction - Feature Extraction & Selection |url=https://www.youtube.com/watch?v=AU_hBML2H1c |website=youtube.com |accessdate=24 March 2020}}</ref>:


* Feature selection: Consists of finding a subset of the original set of variables that is sufficient to model the problem. It usually involves one of three approaches<ref name="flair"/>:
** Wrappers
** Filters
** Embedded
* Feature extraction: Used to transform the data in a high-dimensional space into a lower-dimensional space<ref name="flair"/>; see the sketch after this list.
** [[Principal component analysis]]
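
A minimal sketch of the contrast between the two components, assuming scikit-learn (the variance threshold and PCA are illustrative picks for the "filter" and extraction families, not the only options):

<syntaxhighlight lang="python">
import numpy as np
from sklearn.decomposition import PCA                    # feature extraction
from sklearn.feature_selection import VarianceThreshold  # a simple "filter"

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 2] = 0.0  # an uninformative, constant feature

# Feature selection keeps a subset of the original columns ...
selected = VarianceThreshold(threshold=0.1).fit_transform(X)  # drops column 2

# ... while feature extraction builds new variables from all columns.
extracted = PCA(n_components=2).fit_transform(X)

print(selected.shape, extracted.shape)  # (100, 4) (100, 2)
</syntaxhighlight>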


== Algorithms ==

Some of the most common dimensionality reduction algorithms in machine learning are listed below.<ref name="pythonistaplanet.com"/>

== Methods ==


Some common methods to perform dimensionality reduction are listed as follows<ref name="flair">[https://data-flair.training/blogs/dimensionality-reduction-tutorial/ What is Dimensionality Reduction – Techniques, Methods, Components]</ref> (the missing-values and high-correlation filters are sketched in code after the list):


* Missing values: drop variables whose fraction of missing entries exceeds a chosen threshold.
* Low variance: drop variables with (near-)constant values, since they carry little information.
* Decision trees: keep only the variables a trained tree actually uses to split the data.
* Random forest: rank variables by the ensemble's feature importances and keep the top ones.
* High correlation: when two variables are highly correlated, drop one of them.
* Backward feature elimination: repeatedly remove the least useful variable until performance degrades.
* Factor analysis: group correlated variables and model them through a smaller set of latent factors.
* Principal component analysis (PCA): project the data onto the orthogonal directions of maximal variance.
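
As a rough illustration of two of these methods, the following sketch (assuming pandas and NumPy; the column names and the thresholds of 40% missing values and |r| > 0.9 are arbitrary choices for the example) drops a mostly-missing column and one column of a highly correlated pair:

<syntaxhighlight lang="python">
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({"a": rng.normal(size=200),
                   "b": rng.normal(size=200)})
df["c"] = 2 * df["a"] + rng.normal(scale=0.01, size=200)  # nearly a copy of "a"
df["d"] = np.where(rng.random(200) < 0.6, np.nan, 1.0)    # mostly missing

# Missing values: drop columns whose missing-value ratio exceeds 40%.
df = df.loc[:, df.isna().mean() <= 0.4]

# High correlation: drop one column from each pair with |r| > 0.9.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
df = df.drop(columns=[c for c in upper.columns if (upper[c] > 0.9).any()])

print(df.columns.tolist())  # ['a', 'b'] -- "d" and then "c" were removed
</syntaxhighlight>
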
== Disadvantages ==
One of the main disadvantages of dimensionality reduction is that some amount of information is inevitably lost. In the case of PCA, the method only captures linear correlations between variables, which is sometimes undesirable (see the sketch below).<ref name="flair"/>
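
A sketch of this linearity caveat, assuming scikit-learn (the concentric-circles dataset and the RBF kernel parameters follow a common demonstration and are not from the cited source). Plain PCA's class means nearly coincide, while the kernel variant's first component tracks the radius and separates the classes:

<syntaxhighlight lang="python">
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two classes on concentric circles: no linear direction separates them.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

linear = PCA(n_components=1).fit_transform(X)       # classes stay mixed
nonlinear = KernelPCA(n_components=1, kernel="rbf",
                      gamma=10).fit_transform(X)    # radius-like component

for name, Z in (("PCA", linear), ("Kernel PCA", nonlinear)):
    gap = abs(Z[y == 0].mean() - Z[y == 1].mean())
    print(name, "gap between class means:", round(float(gap), 3))
</syntaxhighlight>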




== External links ==

* [https://www.youtube.com/watch?v=AU_hBML2H1c Machine Learning - Dimensionality Reduction - Feature Extraction & Selection (YouTube)]
* [http://courses.washington.edu/css581/lecture_slides/17_dimensionality_reduction.pdf Dimensionality Reduction (University of Washington lecture slides)]
* [https://www.coursera.org/lecture/big-data-machine-learning/dimensionality-reduction-PDGeA Dimensionality Reduction (Coursera)]


== References ==
<references />
