Artificial neuron

Definition

An artificial neuron is a function of the form:

$$\vec{x} \mapsto f(\vec{w} \cdot \vec{x}) = f(w_0 x_0 + w_1 x_1 + \dots + w_n x_n)$$

where $w_0, w_1, \dots, w_n$ are the weights on the neuron and $f$ is the activation function. Artificial neurons form the components of artificial neural networks: an artificial neural network is obtained by composing and combining artificial neurons (i.e., using the outputs of some neurons as inputs for other neurons).
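To make the definition concrete, here is a minimal Python sketch of a single artificial neuron. The function names (artificial_neuron, logistic) and the particular weights and inputs are illustrative choices, not part of the definition; any activation function $f$ could be substituted.

    import math

    def logistic(z):
        # One possible activation function; any function f could be used here.
        return 1.0 / (1.0 + math.exp(-z))

    def artificial_neuron(weights, inputs, activation):
        # Linear combination w_0*x_0 + w_1*x_1 + ... + w_n*x_n of the inputs ...
        z = sum(w * x for w, x in zip(weights, inputs))
        # ... rescaled by the activation function f.
        return activation(z)

    # Example call, with x_0 = 1 playing the role of the bias term (see below).
    output = artificial_neuron([0.5, -1.2, 0.8], [1.0, 0.3, 0.7], logistic)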

Generally, in machine learning problems, the topology of the artificial neural network, as well as the choice of activation function for each neuron, are fixed in advance. The values of the weights are discovered using the training set by minimizing an appropriately chosen cost function.

Bias term

Conventionally, the variable $x_0$ is taken to be $1$, and is called the bias term. The weight $w_0$ is the bias weight.
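Concretely, with $x_0 = 1$ fixed, the neuron computes

$$\vec{x} \mapsto f(w_0 + w_1 x_1 + \dots + w_n x_n),$$

so the bias weight $w_0$ shifts the argument of the activation function independently of the inputs.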

Purpose of the weights

The purpose of the weights is to combine the inputs into a single number, the weighted linear combination $\vec{w} \cdot \vec{x}$, in a way that draws on information from all of them.

Purpose of the activation function

The purpose of the activation function is to rescale the linear combination $\vec{w} \cdot \vec{x}$ in a manner that extracts the relevant information from it. In general, the activation function tends to squash its unbounded domain down to a smaller range. The idea is that the goal of the neuron is closer to a classification problem than to a problem of finding an exact magnitude, so very large values should get squashed down to roughly the same output as intermediate values.

For instance, suppose a self-driving car is trying to determine whether a particular segment of the picture frame represents paved road or a sidewalk. The degree of certainty that the picture is of paved road can be described by a probability ranging from 0 to 1. We may compute this probability using logistic regression: we combine many different pieces of information about the picture frame into a real number describing the log-odds of its being paved road, then apply the logistic function to convert the log-odds into a probability. Here, the logistic function plays the role of the activation function.
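As a numerical sketch of this example (the log-odds values below are made up purely for illustration), the logistic function converts log-odds into probabilities and squashes large values toward 1:

    import math

    def logistic(z):
        # Converts log-odds z into a probability strictly between 0 and 1.
        return 1.0 / (1.0 + math.exp(-z))

    # Hypothetical log-odds that an image patch shows paved road.
    print(logistic(2.0))    # roughly 0.88: fairly confident it is paved road
    print(logistic(10.0))   # roughly 0.99995: very large log-odds are squashed close to 1
    print(logistic(-3.0))   # roughly 0.047: probably not paved road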

For an artificial neural network to have some power beyond a single artificial neuron, we must have a nonlinear activation function, because composing linear functions just gives us a linear function.
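To see why, note that if each neuron used a linear (identity) activation, a two-layer network with weight matrices $W_1$ and $W_2$ (notation introduced here only for this remark) would compute

$$\vec{x} \mapsto W_2 (W_1 \vec{x}) = (W_2 W_1)\, \vec{x},$$

which is again a single linear map, so stacking such layers adds no expressive power.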

Common choices of activation function

The following are common choices of activation function, together with the mathematical form of the resulting neuron and additional remarks.

Linear threshold unit (McCulloch-Pitts neuron)
    Activation function: Heaviside step function (zero below a threshold, one above it).
    Mathematical form: for threshold $T$, the output is $0$ if $\vec{w} \cdot \vec{x} < T$, $1$ if $\vec{w} \cdot \vec{x} > T$, and $1/2$ if $\vec{w} \cdot \vec{x} = T$.
    More information: The output is not continuous at the threshold $T$; geometrically, the region of discontinuity is a hyperplane. Linear threshold units are good for implementing boolean functions.

Logistic neuron
    Activation function: logistic function.
    Mathematical form: $\vec{x} \mapsto \sigma(\vec{w} \cdot \vec{x})$, where $\sigma(z) = 1/(1 + e^{-z})$ is the logistic function.
    More information: An artificial neural network with just one logistic neuron is equivalent to logistic regression. The continuity, and in fact infinite differentiability, of the logistic function makes it amenable to gradient descent / backpropagation methods. Artificial neural networks in which all neurons are logistic neurons are commonly used in practice.
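As an illustration of the remark that linear threshold units can implement boolean functions, here is a small Python sketch; the particular weights and threshold are one of many illustrative choices that realize boolean AND, not a canonical parameterization.

    def linear_threshold_unit(weights, inputs, threshold):
        # Heaviside step activation applied to the weighted sum of the inputs.
        z = sum(w * x for w, x in zip(weights, inputs))
        if z > threshold:
            return 1
        if z < threshold:
            return 0
        return 0.5  # value exactly at the threshold

    # Weights [1, 1] with threshold 1.5 realize boolean AND:
    # the weighted sum exceeds 1.5 only when both inputs are 1.
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, "->", linear_threshold_unit([1, 1], [x1, x2], 1.5))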