Dimensionality reduction

Dimensionality reduction is one of the main applications of unsupervised learning. It can be understood as the process of reducing the number of random variables under consideration by obtaining a set of principal variables.[1] High dimensionality has many costs: redundant and irrelevant features degrade the performance of some algorithms, interpretation and visualization become difficult, and computation can become infeasible.[2]

Components

Dimensionality reduction can be divided into two components or subcategories[3]:

  • Feature selection: consists in finding a subset of the original variables that is sufficient to model the problem. Three strategies are commonly used[4] (see the sketch after this list):
    • Wrappers
    • Filters
    • Embedded
  • Feature projection (feature extraction): transforms the data from the original high-dimensional space into a space of fewer dimensions, as PCA and factor analysis do in the Methods section below.
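
As a concrete illustration of the three strategies, here is a minimal scikit-learn sketch; the synthetic dataset, the choice of estimators, and the target of five features are all illustrative assumptions, not part of the original article:

```python
# Sketch: the three common feature-selection strategies with scikit-learn.
# The synthetic dataset and all parameter values are illustrative choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# Filter: score each feature independently of any model (here, ANOVA F-test).
X_filter = SelectKBest(f_classif, k=5).fit_transform(X, y)

# Wrapper: repeatedly fit a model and drop the weakest features (RFE).
X_wrapper = RFE(LogisticRegression(max_iter=1000),
                n_features_to_select=5).fit_transform(X, y)

# Embedded: selection falls out of the model itself, via feature importances.
X_embedded = SelectFromModel(RandomForestClassifier(random_state=0),
                             max_features=5,
                             threshold=-np.inf).fit_transform(X, y)

print(X_filter.shape, X_wrapper.shape, X_embedded.shape)  # each (200, 5)
```

Filters are model-agnostic and cheap; wrappers search over feature subsets using the model itself and cost more; embedded methods obtain the selection as a by-product of fitting.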

Algorithms

Some of the most common dimensionality reduction algorithms in machine learning are listed as follows[1]:

Methods

Some common methods to perform dimensionality reduction are listed as follows[4]:

  • Missing values: drop variables whose share of missing values exceeds a threshold, since little usable information remains in them (see the first sketch after this list).
  • Low variance: drop variables whose variance is close to zero, since a near-constant variable contributes little to a model.
  • Decision trees: fit a decision tree and keep only the variables the tree actually uses in its splits.
  • Random forest: fit a random forest and rank variables by the resulting feature importances, keeping the top-ranked ones.
  • High correlation: when two variables are highly correlated, drop one of them, since they carry largely the same information.
  • Backward feature elimination: start with all variables and repeatedly remove the one whose removal hurts model performance the least.
  • Factor analysis: replace groups of correlated variables with a smaller number of latent factors that explain their shared variance.
  • Principal component analysis (PCA): project the data onto the orthogonal directions of largest variance and keep only the leading components (see the second sketch after this list).
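
The first three filter-style steps (missing values, low variance, high correlation) can be written in a few lines of pandas. The following is a minimal sketch, not a definitive recipe; the synthetic data and the thresholds (50% missing, variance below 0.01, correlation above 0.95) are illustrative assumptions:

```python
# Sketch: missing-value, low-variance, and high-correlation filters with
# pandas. All thresholds and the synthetic data are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 5)), columns=list("abcde"))
df["e"] = df["a"] * 0.99 + rng.normal(scale=0.01, size=100)  # near-duplicate of "a"
df["c"] = 1.0 + rng.normal(scale=0.001, size=100)            # near-constant
df.loc[:60, "b"] = np.nan                                    # mostly missing

# Missing values: drop columns with more than 50% missing entries.
df = df.loc[:, df.isna().mean() <= 0.5]

# Low variance: drop columns whose variance is below a small threshold.
df = df.loc[:, df.var() > 0.01]

# High correlation: for each highly correlated pair, drop one column.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
df = df.drop(columns=to_drop)

print(df.columns.tolist())  # "b", "c", and "e" have been removed
```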
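Factor analysis and PCA transform the variables rather than select among them. A minimal sketch with scikit-learn's FactorAnalysis and PCA classes follows; the random data and the choice of three components are illustrative assumptions:

```python
# Sketch: factor analysis and PCA with scikit-learn.
# The data and the n_components values are illustrative assumptions.
import numpy as np
from sklearn.decomposition import FactorAnalysis, PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Factor analysis: model the observed variables as linear combinations
# of a smaller number of latent factors plus noise.
X_fa = FactorAnalysis(n_components=3, random_state=0).fit_transform(X)

# PCA: project onto the orthogonal directions of maximal variance.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)

print(X_fa.shape, X_pca.shape)              # (200, 3) (200, 3)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```

PCA additionally exposes explained_variance_ratio_, which is a common way to decide how many components to keep.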


References