The term learning algorithm refers to the part of solving a machine learning problem that specifically involves optimizing the (possibly regularized) cost function using the training data. Learning algorithms may be iterative, starting with an initial guess for the parameter vector and then refining that guess, or non-iterative, solving for the parameter vector directly. In practice, most learning algorithms are iterative and are anytime algorithms: they can be stopped at any intermediate stage and still yield a working solution (while an iteration is in progress, the parameter vector from the most recent completed iteration is stored as the result).
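To make the distinction concrete, here is a minimal Python sketch on made-up least-squares data: a non-iterative algorithm (solving the normal equations in one shot) next to an iterative one (gradient descent, whose intermediate parameter vector is usable at any point, illustrating the anytime property). The function names and data are illustrative assumptions, not from any particular library.

```python
import numpy as np

# Toy least-squares problem: find theta minimizing mean((X @ theta - y)**2).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def solve_direct(X, y):
    """Non-iterative: solve the normal equations in one shot."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def solve_gradient_descent(X, y, lr=0.01, n_iters=500):
    """Iterative: refine an initial guess; usable if stopped early (anytime)."""
    theta = np.zeros(X.shape[1])  # initial guess
    for _ in range(n_iters):
        grad = 2 * X.T @ (X @ theta - y) / len(y)
        theta = theta - lr * grad  # result of the most recent completed iteration
    return theta

print(solve_direct(X, y))            # exact minimizer
print(solve_gradient_descent(X, y))  # approaches the same minimizer
```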
Types of learning algorithms
- Iterative learning algorithms start with an initial parameter vector and, in each iteration, produce a new parameter vector based on the previous one. They may be limited-memory (remembering only a fixed number of previous iterates) or full-memory. For an iterative learning algorithm, the decision of when to stop is part of the implementation and can be viewed as a hyperparameter of the learning algorithm (see the sketch after this list).
- Non-iterative learning algorithms solve for the parameter vector directly, without producing a sequence of intermediate guesses.
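Here is a minimal sketch of how the stopping decision becomes a hyperparameter: `tol` and `max_iters` below are knobs of the learning algorithm itself, separate from hyperparameters of the cost function such as a regularization strength. The function names and toy problem are assumptions for illustration.

```python
import numpy as np

def gradient_descent(grad_fn, theta0, lr=0.1, tol=1e-6, max_iters=10_000):
    """Iterative learning with the stopping rule exposed as hyperparameters."""
    theta = np.asarray(theta0, dtype=float)
    for i in range(max_iters):
        g = grad_fn(theta)
        if np.linalg.norm(g) < tol:   # stopping criterion: gradient is tiny
            break
        theta = theta - lr * g        # limited-memory: only the latest iterate is kept
    return theta, i

# Minimize f(theta) = ||theta - c||^2 for a made-up target c.
c = np.array([3.0, -1.0])
theta, n_steps = gradient_descent(lambda t: 2 * (t - c), theta0=np.zeros(2))
```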
Evaluation of learning algorithms
There are two ways of judging the success of a learning algorithm, and these approaches correspond to different views of the relationship between the learning algorithm and regularization.
Measurement of the cost function on the training data
This is a narrow view of the learning algorithm: the lower the value of the (appropriately regularized) cost function on the training data, the better the learning algorithm has done. How good the cost function value is on test or cross-validation data is not considered the domain of the learning algorithm, but rather of the choices made in the regularization process. Thus, the learning algorithm is not itself concerned with overfitting: that is something for the regularization process to worry about.
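In symbols (the notation here is an assumption, since the text above does not fix one), a common form of the regularized training cost is

```latex
J_{\text{train}}(\theta) = \frac{1}{m} \sum_{i=1}^{m} \ell\big(f_\theta(x^{(i)}),\, y^{(i)}\big) + \lambda\, R(\theta)
```

where m is the number of training examples, ℓ is a per-example loss, f_θ is the model with parameter vector θ, and λR(θ) is the regularization term chosen by the regularization process. Under the narrow view, the learning algorithm's only job is to drive J_train(θ) down.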
In this view, then, the goal of the learning algorithm is simply to solve a calculus optimization problem: we are given a function of several variables (the cost function on the training data, with the parameters as the variables) and we need to minimize it. Standard optimization algorithms apply. The success of an algorithm is determined by how low the cost function value is at the parameter vector it outputs.
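Under this view, any off-the-shelf optimizer will do. A minimal sketch using scipy.optimize.minimize on a ridge-regularized least-squares cost (the data and regularization strength below are made up):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical setup: ridge-regularized least squares on made-up data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
lam = 0.1  # regularization strength, chosen by the regularization process

def regularized_cost(theta):
    residuals = X @ theta - y
    return np.mean(residuals**2) + lam * np.sum(theta**2)

# Any standard optimizer applies; in the narrow view, success is judged
# solely by result.fun, the cost value at the returned parameter vector.
result = minimize(regularized_cost, x0=np.zeros(4), method="L-BFGS-B")
print(result.fun, result.x)
```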
Measurement of the cost function on data used for cross-validation or testing
This is a broader view of the learning algorithm: success is measured by how well the algorithm does on data that was withheld from it during training.
In this view, the learning algorithm and regularization choices are not cleanly separable: it is their joint responsibility to produce a parameter vector that has low generalization error (as measured by how well it performs on the cross-validation or test data).
This view complicates the goal of the learning algorithm: we can no longer use the cost function on the training data as a decisive measure of how well the learning algorithm has performed. The problem is therefore no longer a pure calculus optimization problem for a clearly defined function; it also involves some consideration of the statistical process by which the data is generated.
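A minimal sketch of this broader measurement, with made-up data: the same fitted parameter vector is scored on the training portion and on a withheld validation portion, and only the latter speaks to generalization.

```python
import numpy as np

# Illustrative data and split; nothing here is from a specific dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.5 * rng.normal(size=200)

split = 150
X_train, y_train = X[:split], y[:split]
X_val, y_val = X[split:], y[split:]

# Fit by ordinary least squares on the training portion only.
theta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def mse(X, y, theta):
    return np.mean((X @ theta - y) ** 2)

# Training cost measures the optimizer; validation cost measures
# generalization, which depends on the data-generating process too.
print("train cost:", mse(X_train, y_train, theta))
print("val cost:  ", mse(X_val, y_val, theta))
```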
This broader approach is sometimes necessary, because some techniques for avoiding overfitting, such as early stopping, are intrinsic to the learning algorithm and cannot be encoded as a regularization term added to the cost function.
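As an illustration, here is a sketch of early stopping (the function names and toy problem are hypothetical): the stopping rule watches a validation cost inside the training loop, rather than adding any term to the cost function being minimized.

```python
import numpy as np

def train_with_early_stopping(grad_fn, val_cost_fn, theta0, lr=0.01,
                              max_iters=1000, patience=10):
    """Gradient descent that stops when validation cost stops improving.

    Early stopping lives inside the training loop; no extra term is
    added to the cost function being minimized.
    """
    theta = np.asarray(theta0, dtype=float)
    best_theta, best_val = theta.copy(), val_cost_fn(theta)
    bad_steps = 0
    for _ in range(max_iters):
        theta = theta - lr * grad_fn(theta)
        val = val_cost_fn(theta)
        if val < best_val:
            best_theta, best_val = theta.copy(), val
            bad_steps = 0
        else:
            bad_steps += 1
            if bad_steps >= patience:  # validation cost has stalled: stop
                break
    return best_theta

# Example usage with a made-up quadratic problem:
c = np.array([1.0, 2.0])
theta = train_with_early_stopping(
    grad_fn=lambda t: 2 * (t - c),
    val_cost_fn=lambda t: float(np.sum((t - c) ** 2)),
    theta0=np.zeros(2),
)
```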