In this article we will discuss the process of learning in neural networks with and without a teacher.

Learning Neural Network with a Teacher:

We now turn our attention to learning paradigms. We begin by considering learning with a teacher, which is also referred to as supervised learning. Fig. 11.23 shows a block diagram which illustrates this form of learning. In conceptual terms, we may think of the teacher as having knowledge of the environment, represented by a set of input-output examples.

The environment is, however, unknown to the neural network of interest. Suppose that the teacher and the neural network are both exposed to a training vector (i.e., an example drawn from the environment). By virtue of built-in (prior) knowledge, the teacher is able to provide the neural network with a desired response for that training vector. Indeed, the desired response represents the optimum action to be performed by the neural network. The network parameters are adjusted under the combined influence of the training vector and the error signal.

The error signal is defined as the difference between the desired response and the actual response of the network. This adjustment is carried out iteratively in a step-by-step fashion with the aim of eventually making the neural network emulate the teacher; the emulation is presumed to be optimum in some statistical sense. In this way, knowledge of the environment available to the teacher is transferred to the neural network through training as fully as possible. When this condition is reached, we may then dispense with the teacher and let the neural network deal with the environment completely by itself.

The form of supervised learning we have just described is error-correction learning. It is a closed-loop feedback system, but the unknown environment is not in the loop. As a performance measure for the system we may think in terms of the mean-square error or the sum of squared errors over the training sample, defined as a function of the free parameters of the system. This function may be visualised as a multidimensional error-performance surface, or simply an error surface, with the free parameters as coordinates.
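
In symbols (a standard formulation; the notation is assumed here rather than taken from the figure), writing d(n) for the desired response and y(n) for the actual response on the n-th training example, the error signal and the sum-of-squared-errors cost are

    e(n) = d(n) - y(n), \qquad E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} e^{2}(n),

where \mathbf{w} collects the free parameters of the system and N is the size of the training sample.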

The true error surface is averaged over all possible input-output examples. Any given operation of the system under the teacher’s supervision is represented as a point on the error surface. For the system to improve performance over time and therefore learn from the teacher, the operating point has to move down successively toward a minimum point of the error surface; the minimum point may be a local minimum or a global minimum.

A supervised learning system is able to do this with the useful information it has about the gradient of the error surface corresponding to the current behavior of the system. The gradient of an error surface at any point is a vector which points in the direction of steepest descent. In fact, in the case of supervised learning from examples, the system may use an instantaneous estimate of the gradient vector, with the example indices presumed to be those of time.

The use of such an estimate results in motion of the operating point on the error surface which is typically in the form of a ‘random walk’. Nevertheless, given an algorithm designed to minimise the cost function, an adequate set of input-output examples, and enough time permitted for training, a supervised learning system is usually able to perform such tasks as pattern classification and function approximation.
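
As a concrete illustration, the least-mean-square (delta) rule for a single linear neuron is about the simplest instance of error-correction learning with an instantaneous gradient estimate. The following Python sketch is illustrative only; the function name, learning rate, and toy data are assumptions, not taken from the text.

    import numpy as np

    def lms_train(samples, eta=0.1, epochs=20):
        """Error-correction learning for a single linear neuron (LMS rule).

        samples: list of (x, d) pairs; x is a 1-D input vector, d the desired response.
        Each update uses the instantaneous (single-example) gradient estimate,
        so the operating point performs a noisy descent on the error surface.
        """
        w = np.zeros(len(samples[0][0]))     # free parameters: coordinates on the error surface
        for _ in range(epochs):
            for x, d in samples:
                y = w @ x                    # actual response of the network
                e = d - y                    # error signal: desired minus actual response
                w += eta * e * x             # step along the negative instantaneous gradient
        return w

    # Toy usage: learn d = 2*x1 - x2 from random examples.
    rng = np.random.default_rng(0)
    xs = rng.normal(size=(50, 2))
    print(lms_train([(x, 2 * x[0] - x[1]) for x in xs]))   # approaches [2, -1]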

Learning Neural Network without a Teacher:

In supervised learning, the learning process takes place under the tutelage of a teacher. However, in the paradigm known as learning without a teacher, as the name implies, there is no teacher to oversee the learning process. That is to say, there are no labeled examples of the function to be learned by the network.

Under this second paradigm, two subdivisions are identified:

A. Reinforcement Learning/Neurodynamic Programming:

In reinforcement learning, the learning of an input-output mapping is performed through continued interaction with the environment so as to minimise a scalar index of performance (through punishment and reward).

Barto (1985) describes a network which learns as follows:

i. The network is presented with a sample input from the training set.

ii. The network computes what it thinks should be the sample output.

iii. The network is supplied with a real-valued judgement by the teacher.

iv. The network adjusts its weights and the process repeats.

A positive value in step (iii) indicates good performance, while a negative value indicates bad performance. The network seeks a set of weights which will prevent negative reinforcement in the future.
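
One way to realise steps (i)-(iv) in code is the reward-modulated sketch below. It uses a simple stochastic binary unit and is an illustrative stand-in rather than Barto's actual algorithm; every name, rate, and the toy critic are assumptions.

    import numpy as np

    rng = np.random.default_rng(1)

    def train_with_critic(inputs, critic, eta=0.05, steps=2000):
        """Reward-modulated learning for a single stochastic binary unit.

        critic(x, y) must return a real-valued judgement r (step iii):
        positive for good performance, negative for bad performance.
        """
        w = np.zeros(inputs.shape[1])
        for _ in range(steps):
            x = inputs[rng.integers(len(inputs))]       # step i: present a sample input
            p = 1.0 / (1.0 + np.exp(-(w @ x)))          # firing probability of the unit
            y = float(rng.random() < p)                 # step ii: the network's guess
            r = critic(x, y)                            # step iii: scalar judgement
            w += eta * r * (y - p) * x                  # step iv: adjust weights, then repeat
        return w

    # Toy usage with a hypothetical critic that rewards reporting whether x[0] > 0.
    X = rng.normal(size=(100, 3))
    w = train_with_critic(X, lambda x, y: 1.0 if y == (x[0] > 0) else -1.0)

The update pushes the probability of the action just taken up when r is positive and down when r is negative, so the unit comes to avoid negative reinforcement.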

Fig. 11.24 shows the block diagram of one form of a reinforcement learning system built around a critic which converts a primary reinforcement signal received from the environment into a higher-quality reinforcement signal called the heuristic reinforcement signal, both of which are scalar inputs.

The system is designed to learn under delayed reinforcement, which means that the system observes a temporal sequence of stimuli (i.e., state vectors) also received from the environment, which eventually result in the generation of the heuristic reinforcement signal. The goal of learning is to minimise a cost-to-go function, defined as the expectation of the cumulative cost of actions taken over a sequence of steps instead of simply the immediate cost.
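
Written out (a common discounted formulation; the discount factor \gamma is an assumption of this sketch, not fixed by the text), the cost-to-go from step n is

    J(n) = E\left[ \sum_{k=0}^{\infty} \gamma^{k} g(n+k) \right], \qquad 0 < \gamma \le 1,

where g(n+k) is the cost incurred at step n + k. Minimising J(n), rather than the immediate cost g(n) alone, is precisely what makes the problem one of delayed reinforcement.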

It may turn out that certain actions taken earlier in that sequence of time steps are in fact the best determinants of overall system behaviour. The function of the learning machine (learning system) which constitutes the second component of the system, is to discover these actions and to feed them back to the environment.

Delayed-reinforcement learning is difficult to perform for two basic reasons:

I. There is no teacher to provide a desired response at each step of the learning process.

II. The delay incurred in the generation of the primary reinforcement signal implies that the learning machine must solve a temporal credit assignment problem. By this we mean that the learning machine must be able to assign credit and blame individually to each action in the sequence of time steps which led to the final outcome, while the primary reinforcement may only evaluate the outcome.

Notwithstanding these difficulties, delayed-reinforcement learning is very appealing. It provides the basis for the system to interact with its environment, thereby developing the ability to learn to perform a prescribed task solely on the basis of the outcomes of its experience which result from the interaction.

Reinforcement learning is closely related to dynamic programming, which was developed by Bellman (1957) in the context of optimal control theory. Dynamic programming provides the mathematical formalism for sequential decision-making. By casting reinforcement learning within the framework of dynamic programming, the subject matter becomes all the richer.
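
The connection can be made concrete through Bellman's optimality equation, stated here in a generic discounted-cost form with assumed notation: the optimal cost-to-go J*(x) from state x satisfies

    J^{*}(x) = \min_{a} \left[ g(x, a) + \gamma \, E\{ J^{*}(x') \mid x, a \} \right],

where g(x, a) is the immediate cost of taking action a in state x and x' is the resulting next state. Reinforcement-learning methods may be viewed as ways of solving this equation approximately through interaction with the environment, without an explicit model of it.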

B. Unsupervised Learning:

In unsupervised or self-organised learning there is no external teacher or critic to oversee the learning process, as indicated in Fig. 11.25. Rather, provision is made for a task-independent measure of the quality of the representation which the network is required to learn, and the free parameters of the network are optimised with respect to that measure.

Once the network has become tuned to the statistical regularities of the input data, it develops the ability to form internal representations for encoding features of the input and thereby to create new classes automatically.

To perform unsupervised learning we may use a competitive learning rule. For example, we may use a neural network which consists of two layers, an input layer and a competitive layer. The input layer receives the available data.

The competitive layer consists of neurons which compete with each other (in accordance with a learning rule) for the ‘opportunity’ to respond to features contained in the input data. In its simplest form, the network operates in accordance with a “winner-takes-all” strategy. In such a strategy the neuron with the greatest total input ‘wins’ the competition and turns on; all the other neurons then switch off.

A simple competitive learning algorithm is the following:

i. Present an input vector.

ii. Calculate the initial activation for each output unit.

iii. Let the output units fight until one is active.

iv. Increase the weights on connections between the active output unit and active input units. This makes it more likely that the output unit will be active next time the pattern is repeated.

A problem with this algorithm is that one output unit may learn to be active all the time; it may claim the entire space of inputs for itself. For example, if all the weights on a unit’s inputs are large, it will tend to bully the other output units into submission, and learning will only further increase its weights. A solution is to ration the weights: the sum of the weights on a unit’s input lines is limited to 1, so increasing the weight of one connection requires that the weight of some other connection be decreased.
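
The listed steps, together with the weight-rationing fix, might look like the following minimal Python sketch; the unit counts, the learning rate, and the use of non-negative inputs are assumptions of this illustration.

    import numpy as np

    rng = np.random.default_rng(2)

    def competitive_step(W, x, eta=0.1):
        """One step of winner-takes-all competitive learning.

        W: (n_outputs, n_inputs) non-negative weights, each row summing to 1
           (the rationed weights: a unit's input lines share a fixed budget).
        x: non-negative input vector.
        """
        winner = np.argmax(W @ x)            # steps ii-iii: largest total input wins
        W[winner] += eta * x                 # step iv: strengthen active connections
        W[winner] /= W[winner].sum()         # ration: renormalise the row to sum to 1
        return winner

    # Toy usage: patterns from two clusters, three competing output units.
    W = rng.random((3, 4))
    W /= W.sum(axis=1, keepdims=True)
    patterns = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]])
    for _ in range(200):
        competitive_step(W, patterns[rng.integers(len(patterns))])
    print(np.round(W, 2))                    # rows specialise toward the clusters

Renormalising after the increase means that strengthening the connections from active inputs automatically weakens the rest, which is exactly the rationing described above.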

The competitive learning algorithm works well in many cases, but it has some problems. Sometimes, one output unit will always win, despite the existence of more than one cluster of input vectors; if two clusters are close together, such a unit may oscillate between them. Normally, another output unit will win occasionally and move in to claim one of the two clusters.

However, if the other output units are completely unexcitable by the input vectors, they may never win the competition. One solution, called ‘leaky learning’, is to change the weights belonging to the relatively inactive output units as well as those of the most active one. An alternative solution is to adjust the sensitivity of an output unit through the use of a bias or an adjustable threshold.
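
Leaky learning can be grafted onto the same sketch by letting every unit move a little while the winner moves much more; the two rates below are arbitrary illustrative values, and W is the same rationed weight matrix as in the previous sketch.

    import numpy as np

    def leaky_step(W, x, eta_win=0.1, eta_lose=0.01):
        """Competitive step in which losing units also learn, but slowly."""
        winner = np.argmax(W @ x)
        rates = np.full(len(W), eta_lose)    # relatively inactive units leak toward x
        rates[winner] = eta_win              # the most active unit moves fastest
        W += rates[:, None] * x              # broadcast the update over all units
        W /= W.sum(axis=1, keepdims=True)    # keep every unit's weights rationed to 1
        return winner

Because even a unit that never wins drifts slowly toward the inputs, it eventually becomes excitable enough to win some of the competitions.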

Difference between Supervised and Unsupervised Learning:

In supervised learning, a teacher is available to indicate whether a system is performing correctly, or to indicate a desired response, or to validate the acceptability of a system’s responses, or to indicate the amount of error in system performance. This is in contrast with unsupervised learning, where no teacher is available and learning must rely on guidance obtained heuristically by the system examining different sample data or the environment. A concrete example of supervised learning is provided by ‘classification’ problems, whereas ‘clustering’ provides an example of unsupervised learning.