A cross-validation set is a set of labeled examples (input-output pairs) used in supervised learning to optimize the hyperparameters of the learning algorithm. It is distinguished from the training set, on which we run the learning algorithm to determine the parameters. The cross-validation set may be used to tune the values of model hyperparameters (such as the degree of the polynomial to use), regularization hyperparameters (such as the coefficient to use for $L^1$- or $L^2$-regularization), or learning algorithm hyperparameters (such as the learning rate or the number of iterations).
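As a minimal sketch of this division of labor, the following Python example fits a parameter on the training set and picks a regularization coefficient using the cross-validation set. Everything in it is assumed for illustration: the synthetic data, the one-parameter model y = w * x with a squared penalty on w, the candidate coefficients, and the helper names `fit_ridge_slope` and `mean_squared_error`.

```python
import random


def fit_ridge_slope(xs, ys, lam):
    """Fit y = w * x by minimizing sum((w*x - y)**2) + lam * w**2.

    Setting the derivative to zero gives the closed form
    w = sum(x*y) / (sum(x*x) + lam).
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mean_squared_error(w, xs, ys):
    """Average squared prediction error of the slope w on the given examples."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic labeled examples (an assumption for illustration): y = 2x + noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(60)]
ys = [2 * x + rng.gauss(0, 0.3) for x in xs]

# The first 40 examples form the training set, the last 20 the cross-validation set.
train_x, train_y = xs[:40], ys[:40]
cv_x, cv_y = xs[40:], ys[40:]

# Hypothetical candidate values for the regularization coefficient.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]

cv_errors = {}
for lam in candidates:
    w = fit_ridge_slope(train_x, train_y, lam)           # parameter: from the training set
    cv_errors[lam] = mean_squared_error(w, cv_x, cv_y)   # hyperparameter: scored on the CV set

best_lam = min(cv_errors, key=cv_errors.get)
print("chosen coefficient:", best_lam)
```

The parameter w never sees the cross-validation examples, and the coefficient is never chosen by looking at training error, which would always favor the least regularization.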
The cross-validation set also differs from the test set. The test set is a subset of the labeled examples that is withheld throughout the entire process of training and hyperparameter tuning, and is used only at the very end to judge the quality of the final result of the algorithm.
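One way to picture the three sets is as a single partition of the labeled examples. The sketch below is an illustration only: the function name `three_way_split`, the 60/20/20 proportions, and the toy examples are assumptions, not a prescribed procedure.

```python
import random


def three_way_split(examples, cv_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the examples and partition them into training, CV, and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_test = round(n * test_fraction)
    n_cv = round(n * cv_fraction)
    test = shuffled[:n_test]                  # set aside until the very end
    cv = shuffled[n_test:n_test + n_cv]       # used to tune hyperparameters
    train = shuffled[n_test + n_cv:]          # used to fit parameters
    return train, cv, test


# Toy labeled examples (input-output pairs), assumed for illustration.
examples = [(x, 2 * x) for x in range(100)]
train_set, cv_set, test_set = three_way_split(examples)
print(len(train_set), len(cv_set), len(test_set))  # 60 20 20
```

Because the test set is carved off first and not consulted again until the final evaluation, no choice of parameter or hyperparameter can be influenced by it.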
The concept of cross-validation
The term cross-validation is used for any approach that involves dividing the learning data into a training set and a cross-validation set, possibly doing this multiple times in different ways and averaging the results across all the splits.
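One common instance of "multiple times in different ways" is k-fold cross-validation, where each example serves in the cross-validation set exactly once. The following is a minimal sketch under stated assumptions: the helper names `k_fold_indices` and `k_fold_score`, the slope-only model, and the noiseless toy data are all hypothetical choices made for illustration.

```python
def k_fold_indices(n, k):
    """Partition range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def k_fold_score(xs, ys, k, fit, score):
    """Average the held-out score over k different train/CV divisions."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        cv_x = [xs[i] for i in fold]
        cv_y = [ys[i] for i in fold]
        model = fit(train_x, train_y)            # fit on everything except this fold
        scores.append(score(model, cv_x, cv_y))  # evaluate on the held-out fold
    return sum(scores) / len(scores)


def fit_slope(xs, ys):
    """Least-squares slope for y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mse(w, xs, ys):
    """Mean squared error of the slope w on the given examples."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Noiseless toy data, assumed for illustration. Real data would normally be
# shuffled before being cut into contiguous folds.
xs = [float(x) for x in range(10)]
ys = [2.0 * x for x in xs]
average_error = k_fold_score(xs, ys, k=5, fit=fit_slope, score=mse)
print(average_error)
```

To compare hyperparameter values with this scheme, one would compute the averaged score once per candidate value and keep the candidate with the best average.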