Learning algorithm


Definition

The term learning algorithm refers to the part of a machine learning problem that specifically involves optimizing the (possibly regularized) cost function using the training data. Learning algorithms may be iterative, starting with an initial guess for the parameter vector and then refining that guess, or non-iterative, solving directly for the parameter vector.
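As a minimal sketch of these two styles, assuming a ridge-regularized linear regression cost (the data, regularization strength, and variable names below are illustrative, not taken from this article):

```python
import numpy as np

# Illustrative training data for ridge-regularized linear regression.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # training inputs
true_theta = np.array([1.5, -2.0, 0.5])
y = X @ true_theta + rng.normal(scale=0.1, size=100)   # training targets
lam = 0.1                                          # regularization strength

def cost(theta):
    """Regularized cost function on the training data."""
    residual = X @ theta - y
    return 0.5 * residual @ residual / len(y) + 0.5 * lam * theta @ theta

# Iterative: start from an initial guess and refine it with gradient steps.
theta = np.zeros(3)
lr = 0.1
for _ in range(1000):
    grad = X.T @ (X @ theta - y) / len(y) + lam * theta
    theta -= lr * grad

# Non-iterative: solve directly for the parameter vector (normal equations).
theta_direct = np.linalg.solve(X.T @ X / len(y) + lam * np.eye(3),
                               X.T @ y / len(y))

print(cost(theta), cost(theta_direct))   # the two costs should be close
```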

Evaluation of learning algorithms

There are two ways of judging the success of a learning algorithm, corresponding to different views of the relationship between the learning algorithm and regularization.

Measurement of the cost function on the training data

This is a narrow view of the learning algorithm: the smaller the value of the (appropriately regularized) cost function on the training data, the better the learning algorithm. How well the learned parameters perform on the test or cross-validation data is not considered the domain of the learning algorithm, but rather of the choices made in the regularization process. Thus, the learning algorithm is not itself concerned with overfitting: that is for the regularization process to worry about.

In this view, then, the goal of the learning algorithm is simply to solve a calculus optimization problem: we are given a function of several variables (the cost function on the training data, with the parameters as the variables) and we need to minimize it. Standard optimization algorithms apply. The success of an algorithm is determined by how low the cost function value is at the parameter vector it outputs.
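A sketch of this narrow view, again assuming a regularized linear-regression cost and using a generic off-the-shelf optimizer (scipy.optimize.minimize here, purely as an illustration of "standard optimization algorithms apply"):

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative training data and regularized cost; names are assumptions.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + rng.normal(scale=0.2, size=50)
lam = 0.05

def training_cost(theta):
    residual = X @ theta - y
    return 0.5 * residual @ residual / len(y) + 0.5 * lam * theta @ theta

# Any standard optimization routine will do; in the narrow view, success is
# judged solely by how low training_cost is at the returned parameter vector.
result = minimize(training_cost, x0=np.zeros(X.shape[1]))
print("final training cost:", result.fun)
```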

Measurement of the cost function on data used for cross-validation or testing

This is a broader view of the learning algorithm: it judges the algorithm by how well it does on data that was withheld from it.

In this view, the learning algorithm and regularization choices are not cleanly separable: it is the responsibility of the learning algorithm and the regularization choices together to produce a parameter vector that performs well on the cross-validation or test data.

This view complicates the goal of the learning algorithm: we can no longer use the cost function on the training data as a decisive measure of how well the learning algorithm has performed. The problem is therefore no longer purely a calculus optimization problem for a clearly defined function; rather, it involves some consideration of the statistical process by which the data is generated.
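A sketch of the broader view, under the assumption that part of the data is withheld and the fitted parameter vector is then scored on it (all data, split sizes, and names are illustrative):

```python
import numpy as np

# Illustrative data, split into training and withheld (validation) parts.
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 3))
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=0.3, size=120)
X_train, X_val = X[:80], X[80:]
y_train, y_val = y[:80], y[80:]
lam = 0.1

# Fit on the training data only (closed-form ridge solution, as above).
n = len(y_train)
theta = np.linalg.solve(X_train.T @ X_train / n + lam * np.eye(3),
                        X_train.T @ y_train / n)

def mean_squared_cost(X_, y_, theta_):
    residual = X_ @ theta_ - y_
    return 0.5 * residual @ residual / len(y_)

# The training cost alone no longer decides how well the learning algorithm
# (together with the regularization choice) has done.
print("training cost:  ", mean_squared_cost(X_train, y_train, theta))
print("validation cost:", mean_squared_cost(X_val, y_val, theta))
```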

This broader view is sometimes necessary, because some techniques for avoiding overfitting, such as early stopping, are intrinsic to the learning algorithm and cannot be encoded as a regularization term added to the cost function.
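A sketch of early stopping as something the learning algorithm itself does, assuming gradient descent whose stopping rule watches the cost on withheld data (all data, hyperparameters, and names are illustrative):

```python
import numpy as np

# Illustrative data with more parameters than the training set comfortably
# supports, so that overfitting is plausible.
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 15))
y = X @ rng.normal(size=15) + rng.normal(scale=1.0, size=60)
X_train, X_val = X[:30], X[30:]
y_train, y_val = y[:30], y[30:]

def cost(X_, y_, theta):
    residual = X_ @ theta - y_
    return 0.5 * residual @ residual / len(y_)

theta = np.zeros(X.shape[1])
best_theta = theta.copy()
lr = 0.02
best_val, patience, bad_steps = np.inf, 20, 0

for step in range(5000):
    # Gradient step on the (unregularized) training cost.
    grad = X_train.T @ (X_train @ theta - y_train) / len(y_train)
    theta -= lr * grad
    val = cost(X_val, y_val, theta)
    if val < best_val:
        best_val, best_theta, bad_steps = val, theta.copy(), 0
    else:
        bad_steps += 1
        if bad_steps >= patience:
            # Stop before the training cost is fully minimized.
            break

print("stopped after", step + 1, "steps; best validation cost:", best_val)
```

Note that the stopping rule here is part of the iteration itself, not a term added to the cost function, which is why it cannot be expressed as a regularization choice alone.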