Learning algorithm - Revision history

Vipul: /* Measurement of the cost function on data used for cross-validation or testing */

2014-08-15T23:25:51Z

Measurement of the cost function on data used for cross-validation or testing

← Older revision		Revision as of 23:25, 15 August 2014
Line 23:		Line 23:
	In this view, the learning algorithm and regularization choices are not cleanly separable: it is the responsibility of the learning algorithm and the regularization choices to together produce a parameter vector that has low [[generalization error]] (as measured by its performance on the cross-validation or test data).		In this view, the learning algorithm and regularization choices are not cleanly separable: it is the responsibility of the learning algorithm and the regularization choices to together produce a parameter vector that has low [[generalization error]] (as measured by its performance on the cross-validation or test data).

	This view complicates the goal of the learning algorithm: we can no longer use the cost function on the training data as a decisive measure of how ~~good~~ the learning algorithm has performed. The problem therefore no longer remains a calculus optimization problem for a clearly defined function, but rather, involves some consideration of the statistical process by which the data is generated.		This view complicates the goal of the learning algorithm: we can no longer use the cost function on the training data as a decisive measure of how well the learning algorithm has performed. The problem therefore no longer remains a calculus optimization problem for a clearly defined function, but rather, involves some consideration of the statistical process by which the data is generated.

	This approach is sometimes necessary, because some techniques to avoid overfitting, such as [[early stopping]], are techniques intrinsic to the learning algorithm and cannot be encoded into the choice of a regularization term to add to the cost function.		This approach is sometimes necessary, because some techniques to avoid overfitting, such as [[early stopping]], are techniques intrinsic to the learning algorithm and cannot be encoded into the choice of a regularization term to add to the cost function.

Vipul: /* Measurement of the cost function on data used for cross-validation or testing */

2014-08-15T23:25:34Z

Measurement of the cost function on data used for cross-validation or testing

← Older revision		Revision as of 23:25, 15 August 2014
Line 21:		Line 21:
	This is a broader view of the learning algorithm, namely, it measures how well the algorithm does on data that was withheld from it.		This is a broader view of the learning algorithm, namely, it measures how well the algorithm does on data that was withheld from it.

	In this view, the learning algorithm and regularization choices are not cleanly separable: it is the responsibility of the learning algorithm and the regularization choices to together produce a parameter vector that ~~performs~~ well on the cross-validation or test data.		In this view, the learning algorithm and regularization choices are not cleanly separable: it is the responsibility of the learning algorithm and the regularization choices to together produce a parameter vector that has low [[generalization error]] (as measured by its performance on the cross-validation or test data).

	This view complicates the goal of the learning algorithm: we can no longer use the cost function on the training data as a decisive measure of how good the learning algorithm has performed. The problem therefore no longer remains a calculus optimization problem for a clearly defined function, but rather, involves some consideration of the statistical process by which the data is generated.		This view complicates the goal of the learning algorithm: we can no longer use the cost function on the training data as a decisive measure of how good the learning algorithm has performed. The problem therefore no longer remains a calculus optimization problem for a clearly defined function, but rather, involves some consideration of the statistical process by which the data is generated.

	This approach is sometimes necessary, because some techniques to avoid overfitting, such as [[early stopping]], are techniques intrinsic to the learning algorithm and cannot be encoded into the choice of a regularization term to add to the cost function.		This approach is sometimes necessary, because some techniques to avoid overfitting, such as [[early stopping]], are techniques intrinsic to the learning algorithm and cannot be encoded into the choice of a regularization term to add to the cost function.

Vipul: /* Types of learning algorithms */

2014-08-15T23:24:38Z

Types of learning algorithms

← Older revision		Revision as of 23:24, 15 August 2014
Line 5:		Line 5:
	==Types of learning algorithms==		==Types of learning algorithms==

	* [[Iterative learning algorithm]]s are algorithms that start with an initial parameter vector, and then, in each iteration, produce a new parameter vector based on the previous parameter vector. Iterative learning algorithms may be limited-memory (these only remember a fixed number of previous iterations) or full-memory. For an iterative learning algorithm, the decision of when to stop is part of the implementation and can be viewed as a [[learning algorithm ~~parameter~~]].		* [[Iterative learning algorithm]]s are algorithms that start with an initial parameter vector, and then, in each iteration, produce a new parameter vector based on the previous parameter vector. Iterative learning algorithms may be limited-memory (these only remember a fixed number of previous iterations) or full-memory. For an iterative learning algorithm, the decision of when to stop is part of the implementation and can be viewed as a [[learning algorithm hyperparameter]].

	==Evaluation of learning algorithms==		==Evaluation of learning algorithms==

Vipul: /* Definition */

2014-08-15T23:24:27Z

Definition

← Older revision		Revision as of 23:24, 15 August 2014
Line 1:		Line 1:
	==Definition==		==Definition==

	The term '''learning algorithm''' is used to refer to the part of a machine learning problem that specifically involves optimization of the (possibly regularized) cost function using the training data. Learning algorithms may proceed as iterative algorithms, that start with an initial guess for the parameter vector and then ~~refune~~ that guess, or as non-iterative algorithms, that directly proceed to solve for the parameter vector.		The term '''learning algorithm''' is used to refer to the part of a machine learning problem that specifically involves optimization of the (possibly regularized) cost function using the training data. Learning algorithms may proceed as iterative algorithms, that start with an initial guess for the parameter vector and then refine that guess, or as non-iterative algorithms, that directly proceed to solve for the parameter vector. In practice, most learning algorithms are iterative, and have the property of being [[wikipedia:anytime algorithm\|anytime algorithms]]: they can be stopped at any intermediate stage to give a solution that works (while an iteration is proceeding, the parameter vector is stored as the result of the most recent completed iteration).
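
The distinction between iterative and non-iterative learning algorithms, and the anytime property, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the page: the one-parameter model y = theta * x, the mean squared error cost, the function names, the step size, and the made-up data. The non-iterative routine solves for the parameter in closed form, while the iterative routine yields the parameter after each completed iteration, so it can be stopped at any point and still return a usable value.

```python
from itertools import islice


def cost(theta, xs, ys):
    """Mean squared error of the one-parameter model y = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def solve_directly(xs, ys):
    """Non-iterative: closed-form minimizer of the cost above."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def gradient_descent(xs, ys, theta=0.0, step=0.05):
    """Iterative: yield the parameter after each completed iteration."""
    n = len(xs)
    while True:
        grad = 2 * sum((theta * x - y) * x for x, y in zip(xs, ys)) / n
        theta = theta - step * grad
        yield theta


# Made-up illustrative data.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.3]

theta_direct = solve_directly(xs, ys)
# Stopping after any number of iterations gives a valid parameter value.
iterates = list(islice(gradient_descent(xs, ys), 30))
stopped_early = iterates[4]
stopped_late = iterates[29]
```

With this data and step size, the later iterate has a lower cost than the earlier one and lies close to the closed-form solution. A different step size could make the iteration diverge, so the behavior depends on these assumed settings.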

	==Types of learning algorithms==		==Types of learning algorithms==

Vipul at 22:35, 18 June 2014

2014-06-18T22:35:10Z

← Older revision		Revision as of 22:35, 18 June 2014
Line 2:		Line 2:

	The term '''learning algorithm''' is used to refer to the part of a machine learning problem that specifically involves optimization of the (possibly regularized) cost function using the training data. Learning algorithms may proceed as iterative algorithms, that start with an initial guess for the parameter vector and then refune that guess, or as non-iterative algorithms, that directly proceed to solve for the parameter vector.		The term '''learning algorithm''' is used to refer to the part of a machine learning problem that specifically involves optimization of the (possibly regularized) cost function using the training data. Learning algorithms may proceed as iterative algorithms, that start with an initial guess for the parameter vector and then refune that guess, or as non-iterative algorithms, that directly proceed to solve for the parameter vector.

			==Types of learning algorithms==

			* [[Iterative learning algorithm]]s are algorithms that start with an initial parameter vector, and then, in each iteration, produce a new parameter vector based on the previous parameter vector. Iterative learning algorithms may be limited-memory (these only remember a fixed number of previous iterations) or full-memory. For an iterative learning algorithm, the decision of when to stop is part of the implementation and can be viewed as a [[learning algorithm parameter]].

	==Evaluation of learning algorithms==		==Evaluation of learning algorithms==

Vipul: /* Evaluation of learning algorithms */

2014-06-18T22:31:41Z

Evaluation of learning algorithms

← Older revision		Revision as of 22:31, 18 June 2014
Line 5:		Line 5:
	==Evaluation of learning algorithms==		==Evaluation of learning algorithms==

	There are two ways of judging the success of a learning algorithm:		There are two ways of judging the success of a learning algorithm, and these approaches correspond to different views of the relationship between the learning algorithm and regularization.

	* Measurement of the cost function on the training data~~: This is a narrow view of the learning algorithm: the smaller the cost function on the training data, the better the learning algorithm.~~		===Measurement of the cost function on the training data===
	* Measurement of the cost function on separate data (cross-validation or test data): This is a broader view of the learning algorithm, namely, it measures how well the algorithm does on data that was withheld from it.

	~~These two different views~~ of ~~how~~ the learning algorithm ~~should be measured relate with different views~~ of the ~~relationship between~~ the learning algorithm ~~and~~ the regularization ~~choice made to avoid~~ [[overfitting]]~~. One view~~ is ~~that the problems are cleanly separable: regularization (through~~ the ~~addition of a~~ regularization ~~term~~ to ~~the cost function) serves~~ the goal of ~~addressing overfitting, and~~ the learning algorithm~~'s goal~~ is simply to ~~minimize~~ the ~~regularized~~ cost function on the training ~~set~~. The ~~alternate view~~ is ~~that~~ the ~~regularization method and~~ the learning algorithm ~~are intricately intertwined~~, ~~and~~ the learning algorithm is ~~responsible not just for doing well with~~ the ~~regularized problem on~~ the ~~training data but also for doing~~ well ~~with the original problem~~ on the cross-validation or test data.		This is a narrow view of the learning algorithm: the smaller the (appropriately regularized) cost function on the training data, the better the learning algorithm. The question of how good the cost function is on the test or cross-validation data is not considered the domain of the learning algorithm, but rather, of the choices made in the regularization process. Thus, the learning algorithm is not itself concerned with [[overfitting]]: that is something for the regularization process to worry about.

			In this view, then, the goal of the learning algorithm is simply to solve a calculus optimization problem: we are given a function of several variables (the function being the cost function on the training data and the variables being the parameters) and we need to minimize it. Standard optimization algorithms apply. The success of an algorithm is determined by how low the cost function value is on the parameter vector it outputs.

			===Measurement of the cost function on data used for cross-validation or testing===

			This is a broader view of the learning algorithm, namely, it measures how well the algorithm does on data that was withheld from it.

			In this view, the learning algorithm and regularization choices are not cleanly separable: it is the responsibility of the learning algorithm and the regularization choices to together produce a parameter vector that performs well on the cross-validation or test data.

			This view complicates the goal of the learning algorithm: we can no longer use the cost function on the training data as a decisive measure of how good the learning algorithm has performed. The problem therefore no longer remains a calculus optimization problem for a clearly defined function, but rather, involves some consideration of the statistical process by which the data is generated.

			This approach is sometimes necessary, because some techniques to avoid overfitting, such as [[early stopping]], are techniques intrinsic to the learning algorithm and cannot be encoded into the choice of a regularization term to add to the cost function.
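
As a rough illustration of why early stopping lives inside the learning algorithm rather than in a regularization term, the following sketch runs gradient descent on the unregularized training cost and uses held-out data only to decide when to stop. The model (y = theta * x), the patience rule, the function names, the hyperparameter values, and the data are all assumptions made for this example, not part of the page.

```python
def mse(theta, data):
    """Mean squared error of y = theta * x on a list of (x, y) pairs."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)


def train_with_early_stopping(train, val, step=0.01, max_iters=1000, patience=5):
    """Gradient descent on the training cost, stopped by the validation cost.

    Returns the parameter with the lowest validation cost seen so far once
    the validation cost has failed to improve for `patience` iterations.
    """
    theta = 0.0
    best_theta, best_val = theta, mse(theta, val)
    since_best = 0
    for _ in range(max_iters):
        grad = 2 * sum((theta * x - y) * x for x, y in train) / len(train)
        theta -= step * grad
        val_cost = mse(theta, val)
        if val_cost < best_val:
            best_theta, best_val = theta, val_cost
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_theta, best_val


# Made-up illustrative data: the training set favors a larger slope
# than the held-out set does.
train = [(1.0, 2.4), (2.0, 4.6), (3.0, 7.5)]
val = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]

best_theta, best_val = train_with_early_stopping(train, val)
# Parameter that minimizes the training cost exactly, for comparison.
theta_train_opt = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
```

Here the cost function being minimized is never modified. The stopping rule alone keeps the returned parameter away from the training-cost minimizer, which is the sense in which the technique cannot be expressed as an added regularization term.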

Vipul: Created page with "==Definition== The term '''learning algorithm''' is used to refer to the part of a machine learning problem that specifically involves optimization of the (possibly regulariz..."

2014-06-18T22:22:22Z

Created page with "==Definition== The term '''learning algorithm''' is used to refer to the part of a machine learning problem that specifically involves optimization of the (possibly regulariz..."

New page

==Definition==

The term '''learning algorithm''' is used to refer to the part of a machine learning problem that specifically involves optimization of the (possibly regularized) cost function using the training data. Learning algorithms may proceed as iterative algorithms, that start with an initial guess for the parameter vector and then refune that guess, or as non-iterative algorithms, that directly proceed to solve for the parameter vector.

==Evaluation of learning algorithms==

There are two ways of judging the success of a learning algorithm:

* Measurement of the cost function on the training data: This is a narrow view of the learning algorithm: the smaller the cost function on the training data, the better the learning algorithm.
* Measurement of the cost function on separate data (cross-validation or test data): This is a broader view of the learning algorithm, namely, it measures how well the algorithm does on data that was withheld from it.

These two different views of how the learning algorithm should be measured relate with different views of the relationship between the learning algorithm and the regularization choice made to avoid [[overfitting]]. One view is that the problems are cleanly separable: regularization (through the addition of a regularization term to the cost function) serves the goal of addressing overfitting, and the learning algorithm's goal is simply to minimize the regularized cost function on the training set. The alternate view is that the regularization method and the learning algorithm are intricately intertwined, and the learning algorithm is responsible not just for doing well with the regularized problem on the training data but also for doing well with the original problem on the cross-validation or test data.
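
The two measurements can disagree, as the following minimal sketch suggests. It assumes a one-parameter model y = theta * x, a squared-error cost with an L2 regularization term lam * theta**2, and made-up data. None of these choices, nor the function names, come from the page itself.

```python
def mse(theta, data):
    """Unregularized mean squared error on a list of (x, y) pairs."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)


def fit_regularized(data, lam):
    """Exact minimizer of mse(theta, data) + lam * theta**2."""
    n = len(data)
    return sum(x * y for x, y in data) / (sum(x * x for x, _ in data) + n * lam)


# Made-up illustrative data.
train = [(1.0, 2.4), (2.0, 4.6), (3.0, 7.5)]
held_out = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]

theta_plain = fit_regularized(train, lam=0.0)  # no regularization
theta_reg = fit_regularized(train, lam=1.0)    # with regularization

# First way of judging: cost on the training data.
train_costs = (mse(theta_plain, train), mse(theta_reg, train))
# Second way of judging: cost on data withheld from the algorithm.
held_out_costs = (mse(theta_plain, held_out), mse(theta_reg, held_out))
```

On this particular data the unregularized fit has the lower training cost, while the regularized fit has the lower held-out cost, so the two ways of judging rank the parameter vectors differently. Other data or other values of lam need not show the same pattern.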