Regression
Definition
Regression refers to a process in statistical estimation where we are given a dependent variable, one or more independent variables, and a class of functions, and we need to find the function in that class that best describes the dependent variable in terms of the independent variables. In short, regression is the process of predicting a continuous value.
Parametric regression
The most typical form of regression is parametric regression. Here, the class of candidate functions is described by a set of parameters, i.e., to identify an individual function within that class, we need to specify numerical values for each parameter. As we vary the values of the parameters, we obtain different functions in the class of candidate functions.
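As a minimal sketch of this idea, consider the class of functions f(x) = a + b*x, where the pair (a, b) plays the role of the parameters. The helper name make_linear and the specific parameter values are assumptions chosen for illustration only.

```python
def make_linear(a, b):
    """Return the member f(x) = a + b*x of the linear class picked out by (a, b)."""
    def f(x):
        return a + b * x
    return f

# Two different parameter values identify two different functions in the same class.
f1 = make_linear(1.0, 2.0)   # f(x) = 1 + 2x
f2 = make_linear(0.0, -1.0)  # f(x) = -x

xs = [0.0, 1.0, 2.0]
values_f1 = [f1(x) for x in xs]
values_f2 = [f2(x) for x in xs]
print(values_f1)
print(values_f2)
```

Here the class of candidate functions is two-dimensional: specifying numerical values for a and b is enough to single out one function.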
Nonparametric regression
In nonparametric regression, the class of candidate functions is not described by a fixed, finite set of parameters. Usually, the class of functions is infinite-dimensional.
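One commonly used nonparametric method is kernel regression (the Nadaraya-Watson estimator), in which the prediction at a point is a weighted average of the training values, with weights that decay with distance. The sketch below is only one possible illustration: the Gaussian kernel, the bandwidth value, and the sample data are all assumptions, not something prescribed by the definition above.

```python
import math

def kernel_regression(x_train, y_train, bandwidth):
    """Return a Nadaraya-Watson estimator using a Gaussian kernel."""
    def predict(x):
        # Each training point gets a weight that decays with its distance from x.
        weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in x_train]
        total = sum(weights)
        # Note: total can underflow to 0 if x is very far from every training point.
        return sum(w * yi for w, yi in zip(weights, y_train)) / total
    return predict

# Illustrative data only.
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [0.0, 1.0, 4.0, 9.0]

predict = kernel_regression(x_train, y_train, bandwidth=0.5)
y_hat = predict(1.5)
print(y_hat)
```

The fitted function is not identified by a fixed list of parameters: it depends on the entire training set, and its complexity can grow as more data is added.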
Meaning of best fit
The goal of the regression problem is to choose a function in the class of functions that best describes the dependent variable in terms of the independent variables. This is usually done by providing a training set of observations, each pairing values of the independent variables with the corresponding value of the dependent variable. The idea is to choose a function in the class of functions that best fits the training set. (Note that in machine learning problems, where generalization to new data is a key issue, we also need to avoid overfitting; however, we ignore that issue here.)
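To make "best fits the training set" concrete, the sketch below takes one possible meaning of best fit, namely the smallest total squared error, and applies it to the class of straight lines f(x) = a + b*x. The function names, the training data, and the choice of squared error as the criterion are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def squared_error(params, xs, ys):
    """Total squared error of the line given by params = (intercept, slope)."""
    a, b = params
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Illustrative training set: (independent, dependent) value pairs.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

best = fit_line(xs, ys)
print(best, squared_error(best, xs, ys))
# Any other line in the class, e.g. (1.0, 2.0), has a larger squared error on this set.
print(squared_error((1.0, 2.0), xs, ys))
```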
Some forms of regression explicitly specify a cost function that measures how well a given choice of function fits the data. However, this cost function does not arise from thin air; rather, it can itself be derived from knowledge of the error distribution. What we are looking for is the function that has maximum likelihood for the specified class of functions and error distribution, and in some cases we can encode this using a cost function. For instance, if the error distribution for the dependent variable is a normal distribution, then minimizing the squared error cost function gives the function with the maximum likelihood.
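The link between the normal error distribution and the squared error cost can be sketched as follows. The derivation assumes, beyond what is stated above, that the errors are independent, have mean zero, and share a common fixed variance \sigma^2; the symbols f, x_i, y_i, n, and \sigma are notation introduced here for illustration.

```latex
% Model: each observation is the function value plus normal noise.
y_i = f(x_i) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \quad i = 1, \dots, n

% Likelihood of the training set for a candidate function f.
L(f) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
       \exp\!\left( -\frac{\bigl(y_i - f(x_i)\bigr)^2}{2\sigma^2} \right)

% Log-likelihood: the first term does not depend on f.
\log L(f) = -\frac{n}{2} \log\bigl(2\pi\sigma^2\bigr)
            - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2

% Hence maximizing the likelihood is the same as minimizing the squared error.
\arg\max_{f} \log L(f) = \arg\min_{f} \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2
```

Under a different error distribution, the same argument generally leads to a different cost function.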
Types of regression
- Linear least squares regression
- Generalized linear regression
- Nonlinear least squares regression
- Logistic regression (though this may be better viewed as a classification problem)