Feature selection
Definition
Feature selection is the process of selecting the set of features of a given input datum that we will use to predict the corresponding output value. For instance, a set of features that we may use to predict the price of a house may be the number of floors, floor area, zip code, size of front porch, and number of windows.
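Concretely, feature selection often amounts to choosing which attributes of each record to feed into the model. Below is a minimal Python sketch using hypothetical feature names from the house-price example (the records, values, and chosen subset are all illustrative assumptions):

```python
# Hypothetical house records: each maps feature names to values.
houses = [
    {"floors": 2, "floor_area": 1500, "zip_code": "90210",
     "front_porch_size": 40, "windows": 12, "price": 450000},
    {"floors": 1, "floor_area": 900, "zip_code": "10001",
     "front_porch_size": 0, "windows": 6, "price": 300000},
]

# The selected feature set: a subset of the available attributes.
selected = ["floors", "floor_area", "windows"]

def extract_features(record, feature_names):
    """Project a record onto the chosen features, in a fixed order."""
    return [record[name] for name in feature_names]

X = [extract_features(h, selected) for h in houses]
y = [h["price"] for h in houses]
```

Here `X` holds only the selected feature values and `y` the output values; the discarded attributes never reach the learning algorithm.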
Distinction between elementary feature selection and derived feature selection
Elementary features are features that cannot be deduced from other, simpler features already available. Derived features are features that can be deduced from other features that have already been included. The choice of derived features can be thought of as a model selection problem rather than a feature selection problem, because derived features can be incorporated into the functional form rather than treated as features in their own right. Therefore, this page concentrates on the selection of elementary features.
The predictive power may be constrained by the choice of features, regardless of the power of models or learning algorithms
Once the set of features is chosen, that puts an upper bound on just how predictive the model can be. Suppose, for instance, that the output that we are trying to predict is the sum x_1 + x_2 + x_3 of independent features, where x_3 is normally distributed with mean μ_3 and standard deviation σ_3. If we choose only x_1 and x_2 as our features, then the most we can say about the output is that it is normally distributed with mean x_1 + x_2 + μ_3 and standard deviation σ_3. We simply cannot get more precise.
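This bound can be checked numerically. The following is a hedged simulation sketch, assuming the output is the sum x1 + x2 + x3 of three independent features with x3 drawn from a normal distribution with mean mu3 and standard deviation sigma3 (the parameter values are arbitrary choices): the best predictor that sees only x1 and x2 is left with residuals whose spread is about sigma3, no matter what model it uses.

```python
import random
import statistics

random.seed(0)
mu3, sigma3 = 5.0, 2.0  # assumed distribution of the omitted feature

residuals = []
for _ in range(100_000):
    x1 = random.gauss(0, 1)
    x2 = random.gauss(0, 1)
    x3 = random.gauss(mu3, sigma3)  # omitted from the feature set
    output = x1 + x2 + x3
    # Best possible prediction given only x1 and x2:
    # their sum plus the mean of the unobserved x3.
    prediction = x1 + x2 + mu3
    residuals.append(output - prediction)

# The residual spread approaches sigma3; no learning algorithm
# restricted to x1 and x2 can shrink it further.
print(round(statistics.stdev(residuals), 2))
```

The printed residual standard deviation stays near sigma3 = 2.0 regardless of sample size, which is exactly the irreducible error imposed by the feature choice.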
Selecting too many features could be problematic
Selecting too many features creates the following problems:
- Collecting information on the values of the features for all the training data, as well as for the new inputs on which we are trying to make predictions, becomes harder.
- The learning algorithm used to optimize the parameter values becomes more computationally intensive.
- A functional form that uses too many features may suffer from overfitting.
See also
- Feature scaling: This is a process of linear scaling typically executed after feature selection, and in some cases along with model selection. The idea is to make the ranges of values typically taken by the features roughly comparable. This is to avoid some features taking very large values and some features taking very small values, something that poses a problem for some learning algorithms and also for regularization.
- Model selection
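The feature scaling mentioned above can be sketched as follows. This is one common variant (min-max scaling to [0, 1]); the feature names and values are illustrative assumptions:

```python
def min_max_scale(column):
    """Linearly rescale a list of values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical features on very different scales: floor area in
# square feet versus number of floors.
floor_area = [900.0, 1500.0, 2100.0]
floors = [1.0, 2.0, 3.0]

scaled_area = min_max_scale(floor_area)
scaled_floors = min_max_scale(floors)
```

After scaling, both features occupy the same [0, 1] range, so neither dominates the other purely because of its units.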