Feature selection
Definition
Feature selection is the process of selecting the set of features of a given input datum that we will use to predict the corresponding output value. For instance, a set of features that we may use to predict the price of a house may be the number of floors, floor area, zip code, size of front porch, and number of windows.
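Concretely, feature selection often amounts to choosing which attributes of each record to feed into the model. Below is a minimal Python sketch using hypothetical feature names from the house-price example (the records, values, and chosen subset are all illustrative assumptions):

```python
# Hypothetical house records: each maps feature names to values.
houses = [
    {"floors": 2, "floor_area": 1500, "zip_code": "90210",
     "front_porch_size": 40, "windows": 12, "price": 450000},
    {"floors": 1, "floor_area": 900, "zip_code": "10001",
     "front_porch_size": 0, "windows": 6, "price": 300000},
]

# The selected feature set: a subset of the available attributes.
selected = ["floors", "floor_area", "windows"]

def extract_features(record, feature_names):
    """Project a record onto the chosen features, in a fixed order."""
    return [record[name] for name in feature_names]

X = [extract_features(h, selected) for h in houses]
y = [h["price"] for h in houses]
```

Here `X` holds only the selected feature values and `y` the output values; the discarded attributes never reach the learning algorithm.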
Distinction between elementary feature selection and derived feature selection
Elementary features are features that cannot be deduced from other, simpler features already available. Derived features are features that can be deduced from other features that have already been included. The choice of derived features can be thought of as a model selection problem rather than a feature selection problem, because derived features can be incorporated into the functional form rather than treated as features in their own right. Therefore, this page concentrates on the selection of elementary features.
The predictive power may be constrained by the choice of features, regardless of the power of models or learning algorithms
Once the set of features is chosen, that puts an upper bound on just how predictive the model can be. Suppose, for instance, that the output that we are trying to predict is the sum x_1 + x_2 + x_3 of independent features, where x_3 is normally distributed with mean μ_3 and standard deviation σ_3. If we choose only x_1 and x_2 as our features, then the most we can say about the output is that it is normally distributed with mean x_1 + x_2 + μ_3 and standard deviation σ_3. We simply cannot get more precise.
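This bound can be checked numerically. The following is a hedged simulation sketch, assuming the output is the sum x1 + x2 + x3 of three independent features with x3 drawn from a normal distribution with mean mu3 and standard deviation sigma3 (the parameter values are arbitrary choices): the best predictor that sees only x1 and x2 is left with residuals whose spread is about sigma3, no matter what model it uses.

```python
import random
import statistics

random.seed(0)
mu3, sigma3 = 5.0, 2.0  # assumed distribution of the omitted feature

residuals = []
for _ in range(100_000):
    x1 = random.gauss(0, 1)
    x2 = random.gauss(0, 1)
    x3 = random.gauss(mu3, sigma3)  # omitted from the feature set
    output = x1 + x2 + x3
    # Best possible prediction given only x1 and x2:
    # their sum plus the mean of the unobserved x3.
    prediction = x1 + x2 + mu3
    residuals.append(output - prediction)

# The residual spread approaches sigma3; no learning algorithm
# restricted to x1 and x2 can shrink it further.
print(round(statistics.stdev(residuals), 2))
```

The printed residual standard deviation stays near sigma3 = 2.0 regardless of sample size, which is exactly the irreducible error imposed by the feature choice.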
Selecting too many features could be problematic
Selecting too many features creates the following problems:
- Collecting information on the values of the features for all the training data, as well as for the new inputs on which we are trying to make predictions, becomes harder.
- The learning algorithm used to optimize the parameter values becomes more computationally intensive.
- A functional form that uses too many features may suffer from overfitting.
See also
- Feature scaling: This is a process of linear scaling typically executed after feature selection, and in some cases along with model selection. The idea is to make the ranges of values typically taken by the features roughly comparable. This is to avoid some features taking very large values and some features taking very small values, something that poses a problem for some learning algorithms and also for regularization.
- Model selection
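The feature scaling mentioned above can be sketched as follows. This is one common variant (min-max scaling to [0, 1]); the feature names and values are illustrative assumptions:

```python
def min_max_scale(column):
    """Linearly rescale a list of values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical features on very different scales: floor area in
# square feet versus number of floors.
floor_area = [900.0, 1500.0, 2100.0]
floors = [1.0, 2.0, 3.0]

scaled_area = min_max_scale(floor_area)
scaled_floors = min_max_scale(floors)
```

After scaling, both features occupy the same [0, 1] range, so neither dominates the other purely because of its units.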