
Term Paper on Machine Learning


Term Paper # 1. Model of Machine Learning Agent:

The field of machine learning has enjoyed a period of continuous growth and progress over the past two decades. We shall study the principles involved in learning with the help of a learning agent.

It has the following four components (Fig. 9.1).

1. Performance Element:

It is responsible for selecting external actions: it takes in percepts and decides on actions. Perception involves interpreting sights, sounds, smell and touch. Action includes the ability to navigate through the world and manipulate objects.

2. Learning Element:

It improves the efficiency of the performance element and is responsible for making improvements in the working of the agent as a whole. It takes some knowledge about the learning agent and some feedback on how the agent is doing, and uses these to determine how the performance element should be modified so that the agent does better in the future.


Its design depends very much on the design of the performance element. While designing an agent which should have a learning capability, the first question is not "How am I going to get it to learn this?" but "What kind of performance element will my agent need to do this, once it has learned how?"

3. Critic:

The critic is designed to tell the learning element how well the agent is doing. The critic employs a fixed standard of performance. This is necessary because the percepts themselves provide no indication of the agent's success.

For example, a chess program may receive a percept indicating that it has checkmated its opponent, but it needs a performance standard to know that this is a good thing; the percept itself does not say so. It is important that the performance standard is a fixed measure which is conceptually outside the agent; otherwise the agent could adjust its performance standard to fit its behaviour.


4. Problem Generator:

It is responsible for suggesting actions which will lead to new and informative experiences. The point is that if the performance element had its way, it would keep doing actions which are best given what it knows.

But if the agent is willing to explore a little and do some perhaps suboptimal actions in the short run, it might discover much better actions for the long run. The problem generator's job is to suggest these exploratory actions. This is what scientists do when they carry out experiments.

Thus, a learning agent can be viewed as something which is perceiving its environment through sensors and acting upon the environment through effectors. (For example, a human agent has eyes, ears and other organs for sensors and hands, legs, mouth and other body parts for effectors. A robotic agent has cameras and infrared range finders as sensors and various motors for effectors).


All four components of the agent are critically important, though the most important is the design of the learning element.
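To make the architecture concrete, a minimal sketch in Python is given below. All of the class names, the dictionary-based knowledge store and the reward-style critic are illustrative assumptions introduced here, not part of any standard system; the sketch only shows how the four components interact in one perceive-act-learn cycle.

import random

# Illustrative sketch of a four-component learning agent (all names hypothetical).

class PerformanceElement:
    def select_action(self, percept, knowledge):
        # Select an external action from the current percept and learned knowledge.
        return knowledge.get(percept, "default_action")

class Critic:
    def __init__(self, performance_standard):
        # The standard of performance is fixed and conceptually outside the agent.
        self.standard = performance_standard

    def feedback(self, percept):
        # Tell the learning element how well the agent is doing on this percept.
        return self.standard(percept)

class LearningElement:
    def improve(self, knowledge, percept, action, feedback):
        # Modify the performance element's knowledge so the agent does better later.
        if feedback > 0:
            knowledge[percept] = action
        return knowledge

class ProblemGenerator:
    def suggest(self):
        # Occasionally propose an exploratory (possibly suboptimal) action.
        return "exploratory_action" if random.random() < 0.1 else None

class LearningAgent:
    def __init__(self, performance_standard):
        self.knowledge = {}
        self.performance = PerformanceElement()
        self.critic = Critic(performance_standard)
        self.learner = LearningElement()
        self.generator = ProblemGenerator()

    def step(self, percept):
        # One cycle: perceive, act (or explore), receive criticism, learn.
        action = (self.generator.suggest()
                  or self.performance.select_action(percept, self.knowledge))
        feedback = self.critic.feedback(percept)
        self.knowledge = self.learner.improve(self.knowledge, percept, action, feedback)
        return action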

The design of the learning element is affected by four major parameters:

(a) Which components of the performance element are to be improved?

(b) What representation is used for those components?


(c) What feedback is available?

(d) What prior information is available?

These parameters are explained below:

Choice of the Components:

(a) In order to fulfil the function of the performance element, its components should include the following:

1. A direct mapping from conditions on the current state to actions.

2. A means to infer relevant properties of the world from the percept sequence (anything the agent has perceived so far).

3. Information about the way the world evolves.

4. Information about the results of possible actions the agent can take.

5. Utility (the quality of being useful) information indicating the desirability of world states, which allows the system to search more judiciously.

6. Action-value information indicating the desirability of particular actions in particular states.

7. Goals which identify classes of states whose achievement maximises the agent’s utility.

Each of these components helps in learning, given the appropriate feedback. For example, if the agent does an action and then perceives the resulting state of the environment, this information can be used to learn a description of the results of actions (4). Similarly, if the critic can use the performance standard to deduce utility values from the percepts, then the agent can learn a useful representation of its utility function (5). In a sense, the performance standard can be seen as defining a set of distinguished percepts which will be interpreted as providing direct feedback on the quality of the agent's behaviour.

(b) Representation of the Components:

Any of the above components can be represented using standard methods of representation. For example, deterministic descriptions such as linear weighted polynomials for utility functions in game-playing programs, propositional or first-order logical representations for all the components in a logical agent, or probabilistic descriptions such as belief networks for the inferential components of a decision-theoretic agent. Effective learning algorithms have been devised for all of these; the algorithms differ, but the main idea remains the same.
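As a concrete illustration of one such representation, the sketch below evaluates a game position with a linear weighted polynomial over board features; the feature names and weight values are invented purely for illustration.

# A utility function represented as a linear weighted polynomial over features.
# Feature names and numbers are illustrative assumptions.

def evaluate(position, weights):
    # position: feature name -> numeric feature value for the current state
    # weights:  feature name -> learned weight
    return sum(weights[f] * position[f] for f in weights)

weights  = {"material": 1.0, "mobility": 0.3, "king_safety": 0.5}
position = {"material": 2.0, "mobility": 5.0, "king_safety": -1.0}
print(evaluate(position, weights))   # 1.0*2.0 + 0.3*5.0 + 0.5*(-1.0) = 3.0

Learning then amounts to adjusting the weights from experience rather than rewriting the function itself.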

(c) Available Feedback:

For some components, such as those for predicting the outcome of an action, the available feedback generally tells the agent what the correct outcome is. That is, the agent predicts that a certain action (for example, applying the brakes of a car) will have a certain outcome (the car stopping within 10 metres), and the environment immediately provides a percept which describes the actual correct outcome (stopping in 10 ± 2 metres). Any situation in which both the inputs and outputs of a component can be perceived is called supervised learning.
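The braking example can be cast as a tiny supervised-learning problem: the agent observes (speed, stopping-distance) pairs, where the environment supplies the correct outcome, and fits a predictor. The data points and the straight-line model below are illustrative assumptions only.

# Supervised learning sketch: both inputs (speed) and correct outputs (distance)
# are observed, so the agent can fit a predictive model by least squares.

def fit_line(xs, ys):
    # Ordinary least squares for y = a*x + b.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

speeds    = [20, 40, 60, 80]    # perceived inputs (km/h); invented data
distances = [5, 13, 25, 41]     # correct outcomes from the environment (metres)
a, b = fit_line(speeds, distances)
print(a * 50 + b)               # predicted stopping distance at 50 km/h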

A special type of learning arises when, in the condition-action component, the agent receives some evaluation of its action but is not told the correct action; this is called reinforcement learning.

In this type of learning, we study how agents can learn what to do, particularly when there is no teacher telling the agent what action to take in each circumstance. For example, an agent can learn to play chess by supervised learning, by being fed examples of game situations along with the best moves for those situations.

But if there is no friendly teacher providing examples, what can the agent do? By trying random moves the agent can eventually build a predictive model of its environment, but without some feedback about what is good and what is bad it has no grounds for preferring one move over another; the agent needs to know that something good has happened when it wins and that something bad has happened when it loses.

This kind of feedback is called a reward, or reinforcement. In games like chess, the reinforcement is received only at the end of the game. In other environments the rewards come more frequently: in ping-pong, each point scored can be considered a reward, and in learning to crawl, any forward motion is an achievement.

The task of reinforcement learning is to use observed rewards to learn an optimal (or nearly optimal) policy for the environment. Imagine playing a new game whose rules we don't know; after a hundred or so moves, our opponent announces "you lose". This is reinforcement learning in a nutshell.

In many complex domains, reinforcement learning is the only feasible way to train a program to perform at high levels. For example, in game playing it is very hard for a human to provide accurate and consistent evaluations of the large number of positions which would be needed to train an evaluation function directly from examples. Instead, the program can be told when it has won or lost, and it can use this information to learn an evaluation function which gives reasonably accurate estimates of the probability of winning from any given position.
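One standard way to use such win/loss information is to maintain a table of action values and update it from observed rewards. The minimal Q-learning style update below is a textbook rule; the states, actions and parameter values are chosen arbitrarily for illustration.

from collections import defaultdict

# Action-value table: (state, action) -> estimated value, initially zero.
Q = defaultdict(float)
alpha, gamma = 0.1, 0.9      # learning rate and discount factor (illustrative)

def q_update(state, action, reward, next_state, actions):
    # Temporal-difference update toward reward + discounted best next value.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# A win (+1 reward) observed after playing 'move_a' from 'position_1'.
q_update("position_1", "move_a", 1.0, "terminal", actions=["move_a", "move_b"])
print(Q[("position_1", "move_a")])    # value nudged upward by the reward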

Reinforcement learning can be considered to encompass all of AI: an agent is placed in an environment and must learn to behave successfully therein.

Learning when there is no hint at all about the correct outputs is called unsupervised learning. An unsupervised learner can always learn relationships among its percepts using supervised learning methods; that is, it can learn to predict its future percepts given its previous percepts. It cannot learn what to do, however, unless it already has a utility function.

(d) Prior Knowledge:

The majority of learning research in AI, computer science and psychology has studied the case in which the agent begins with no knowledge at all about what it is trying to learn; it has access only to the examples presented by its experience. Although this is an important special case, it is by no means the general case. Most human learning takes place in the context of a good deal of background knowledge.

Some psychologists and linguists claim that even newborn babies exhibit knowledge of the world. Whatever the truth of this claim, there is no doubt that prior knowledge can help enormously in learning. A physicist examining a stack of bubble-chamber photographs may be able to induce a theory predicting the existence of a new particle of a certain mass and charge, but an art critic examining the same stack might learn nothing more than that the artist must be some sort of abstract expressionist.

Each of the seven components of the performance element can be described mathematically as a function; we can choose which component of the performance element to improve and how it is to be represented. The available feedback may be more or less useful, and we may or may not have any prior knowledge.

Thus, we arrive at the conclusion that learning is itself a problem-solving process. In fact, it is very difficult to formulate a precise definition of learning which distinguishes it from other problem-solving methods, but it is clear that machine learning amounts to knowledge acquisition together with the deployment of a suitable search technique to use that knowledge, so that the machine performs better in the future. In short, all learning can be thought of as learning the representation of a function.

Term Paper # 2. Machine Learning Decision List:

A decision list is a logical expression of a restricted form. It consists of a series of tests, each of which is a conjunction of literals. If a test succeeds when applied to an example description, the decision list specifies the value to be returned. If the test fails, processing continues with the next test in the list. Decision lists resemble decision trees, but they are simpler in structure.

Fig. 9.7 shows a decision list, which represents the following hypothesis (the goal predicate remains the same).

∀x WillWait(x) ↔ Patrons(x, Some) ∨ (Patrons(x, Full) ˄ Sun/Hol(x))

If we allow tests of arbitrary size, decision lists can represent any Boolean function; but when the size of each test is restricted to at most k literals (hence a logical expression of restricted form), it becomes possible for the learning algorithm to generalise successfully from a small number of examples.

This language is called K-DL; the example of Fig. 9.7 is in 2-DL. It can be shown that K-DL includes as a subset the language K-DT, the set of all decision trees of depth at most K. The particular language referred to by K-DL depends on the attributes used to describe the examples; K-DL(n) denotes a K-DL language using n Boolean attributes.
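To see how such a decision list is executed, the sketch below encodes the 2-DL of Fig. 9.7 directly as a sequence of tests tried in order; the dictionary-based encoding of an example is an assumption made here for illustration.

# The 2-DL of Fig. 9.7, evaluated test by test until one succeeds.
# The attribute encoding of an example is an illustrative assumption.

def will_wait(example):
    # Test 1: Patrons(x, Some) -> Yes
    if example["patrons"] == "some":
        return True
    # Test 2: Patrons(x, Full) AND Sun/Hol(x) -> Yes
    if example["patrons"] == "full" and example["sun_hol"]:
        return True
    # No test matched: the list returns No.
    return False

print(will_wait({"patrons": "some", "sun_hol": False}))   # True
print(will_wait({"patrons": "full", "sun_hol": False}))   # False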

How many examples does the decision list need?

First, we want to ensure that K-DL is learnable, that is, that any function in K-DL can be approximated accurately after training on a reasonable number of examples. For this, we need to calculate the number of hypotheses in the language.

Let the language of tests (conjunctions of at most k literals using n attributes) be Conj(n, k). Because a decision list is made of tests, and because each test can be attached to either a Yes or a No outcome or can be absent from the decision list, there are at most 3^|Conj(n, k)| distinct sets of component tests.

Omitting the calculations, we arrive at an expression for the number of examples (m) needed in the training set.
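The bound usually quoted in computational learning theory for this case is:

m ≥ (1/ε) [ln(1/δ) + O(n^k log₂(n^k))]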

where ε is the maximum error allowed, achieved with probability at least (1 − δ).

A successful strategy is to find the smallest test which matches any uniformly classified subset, regardless of the size of the subset.
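A sketch of this greedy strategy is shown below; the representation of examples and tests as attribute-value pairs, and the naive search over candidate tests built from the examples themselves, are illustrative assumptions rather than a specific published implementation.

from itertools import combinations

# Greedy decision-list learning sketch: repeatedly find a small test that matches
# a non-empty, uniformly classified subset, add it, and discard those examples.

def matches(test, example):
    return all(example[attr] == value for attr, value in test)

def find_uniform_test(remaining, attributes, max_literals):
    # Try the smallest candidate tests first, built from values seen in the examples.
    for size in range(1, max_literals + 1):
        for attrs in combinations(attributes, size):
            for example, _ in remaining:
                test = tuple((a, example[a]) for a in attrs)
                outcomes = [c for e, c in remaining if matches(test, e)]
                if outcomes and len(set(outcomes)) == 1:
                    return test, outcomes[0]
    return None

def learn_decision_list(examples, attributes, max_literals=2):
    # examples: list of (attribute_dict, boolean_classification) pairs
    decision_list, remaining = [], list(examples)
    while remaining:
        found = find_uniform_test(remaining, attributes, max_literals)
        if found is None:
            return None            # no consistent K-DL exists in this language
        test, outcome = found
        decision_list.append((test, outcome))
        remaining = [(e, c) for e, c in remaining if not matches(test, e)]
    return decision_list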

Computational learning theory has generated a new way of looking at the problem of learning. In the early 1960s, the theory of learning focussed on the problem of identification in the limit. An identification algorithm must return a hypothesis which exactly matches the true function.

The modern approach combines current-best-hypothesis and version-space methods. As examples arrive, the learner abandons simpler hypotheses as they become inconsistent. Once it reaches the true function, it will never abandon it. Unfortunately, in many hypothesis spaces, the number of examples and the computation time required to reach the true function are enormous.

Computational learning theory does not insist that the learning agent find the one true law governing its environment, but instead that it find hypotheses with a certain degree of predictive accuracy. It also brings in the need for a trade-off between the expressiveness of the hypothesis language and the complexity of learning.

The results of research in machine learning show that pure inductive learning, without prior knowledge about the target function, is quite hard. The use of prior knowledge to guide inductive learning makes it possible to learn quite large sets of sentences from a reasonable number of examples, even in a language as expressive as first-order logic.

When the concept space is very large, decision tree learning algorithms run more quickly than their version space cousins.

Term Paper # 3. Role of Knowledge in Machine Learning:

To understand the role of prior knowledge we need to know the logical relationships among hypotheses, example descriptions and classifications. Let Descriptions denote the conjunction of all the example descriptions in the training set, and let Classifications denote the conjunction of all the example classifications. Then a hypothesis which explains the observations must satisfy the property

Hypothesis ˄ Descriptions ╞ Classifications … (9.1)

This relationship is called the entailment constraint. Pure inductive learning means solving this constraint, where the hypothesis is unknown and is drawn from some predefined hypothesis space.

The simple knowledge-free picture of inductive learning persisted until the early 1980s. The modern approach is to design agents which already know something and are trying to learn some more. The general idea is depicted in Fig. 9.11. A cumulative learning process uses, and adds to, its stock of background knowledge over time.

If the learning agent is autonomous and uses background knowledge, it must have some method for obtaining that background knowledge in the first place, so that it can be used in learning new areas of knowledge. The learning agent's life history will therefore be characterised by cumulative or incremental development. Presumably, the agent could start out with nothing, performing inductions in vacuo like a good little pure induction program. But once it has acquired some knowledge, it can use that background to learn more and more effectively. How does the agent do that?

One type of learning that makes use of such prior knowledge is called explanation-based learning (EBL): the general rule learned follows logically from the background knowledge possessed by the agent (explained shortly).

Hence the entailment constraints satisfied by explanation-based learning (EBL) are:

Hypothesis ˄ Descriptions ╞ Classifications

Background ╞ Hypothesis … (9.2)

Since explanation-based learning satisfies equation 9.1, EBL was initially thought to be a better way to learn from examples. But it requires sufficient background knowledge to explain the hypothesis, which in turn explains the observations. Thus, the agent does not learn anything factually new from the example.

The agent could have derived the example from what it already knew, but that might have required an unreasonable amount of computation. EBL is now viewed as a method for converting first-order knowledge into useful, special-purpose knowledge. We now describe the working of explanation-based learning.

For example, consider a chess player who, as BLACK, has reached the position shown in Fig. 9.12. The position is called a "fork" because the white knight attacks both the black king and the black queen. BLACK must move the king, thereby leaving the queen open to capture.

From this single experience, BLACK is able to learn quite a bit about the fork trap: the idea is that if any piece x attacks both the opponent's king and another piece y, then the piece y will be lost. We do not need to see dozens of positive and negative examples of fork positions in order to draw this conclusion. From just one experience we can learn to avoid this trap in future and perhaps to use it to our own advantage.

What makes such single-example learning possible? The answer is knowledge. The chess player has plenty of domain-specific knowledge which can be brought to bear, including the rules of chess and any previously acquired strategies. That knowledge can be used to identify the critical aspects of the training example. In the case of the fork, we know that the double, simultaneous attack is important, while the precise position and type of the attacking piece are not.

Much of the recent work in machine learning has moved away from the empirical, data-intensive approach towards this more analytical, knowledge-intensive approach. This kind of generalisation process in learning was called Explanation-Based Learning (EBL).

We may note that the general rule follows logically from the knowledge possessed by any chess player facing the fork problem. An EBL system attempts to learn from a single example x by explaining why x is an example of the target concept (predicate), instead of learning from a large number of examples. The explanation is then generalised, and the system's performance is improved through the availability of this background knowledge. Hence, in EBL, the entailment constraint of inductive learning (equation 9.1) is supplemented by the additional constraint given in equation 9.2.

The process is related to memoisation, a phenomenon common in computer science. The basic idea of memoisation (or a memo-function) is to accumulate a database of input-output pairs; when the function is called, it first checks the database to see whether it can avoid solving the problem ab initio.
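A few lines of code make the memo-function idea concrete: a table of input-output pairs is consulted before any fresh computation is attempted. The particular function being memoised below is an arbitrary stand-in for an expensive computation.

# Memo-function sketch: consult a database of input-output pairs first,
# and solve the problem ab initio only when the answer is not already stored.

memo = {}

def expensive_square(n):
    # Stand-in for a costly computation.
    return sum(n for _ in range(n))

def memoised_square(n):
    if n not in memo:
        memo[n] = expensive_square(n)   # computed once, then remembered
    return memo[n]

print(memoised_square(1000))   # computed from scratch
print(memoised_square(1000))   # answered directly from the memo table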

EBL takes this a good deal further, by creating general rules which cover an entire class of cases, depending on the problem to be solved. For example, in the case of differentiation, memoisation would simply remember that the derivative of a particular expression such as x² with respect to x is 2x, whereas EBL generalises this: for any arithmetic unknown u, the derivative of u² with respect to u is 2u.

Logically: ArithmeticUnknown(u) → Derivative(u², u) = 2u.

If the knowledge base contains such a rule, then any new case which is an instance (example) of the rule can be solved immediately. This is, of course, a simple example of a very general phenomenon: once something is understood, it is generalised and reused in other circumstances. This essential step can then be used as a building block in solving similar but more complex problems.
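The difference between the memo table and the EBL-derived general rule can also be shown in a few lines: the memo table answers only for expressions it has seen before, whereas the generalised rule Derivative(u², u) = 2u applies to any arithmetic unknown. Representing expressions as plain strings here is a simplifying assumption.

# A remembered specific case versus the EBL-generalised rule d/du (u^2) = 2u.
# String-based expressions are an illustrative simplification.

memo_table = {("x^2", "x"): "2x"}      # one previously solved case

def derivative(expr, var):
    # First consult the memo table (specific cases)...
    if (expr, var) in memo_table:
        return memo_table[(expr, var)]
    # ...then apply the generalised rule: for any unknown u, d/du (u^2) = 2u.
    if expr == var + "^2":
        return "2" + var
    raise NotImplementedError("no rule applies to " + expr)

print(derivative("x^2", "x"))   # from the memo table: 2x
print(derivative("z^2", "z"))   # covered by the generalised rule: 2z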

The EBL works through the following steps:

1. Given an example (say of the fork, or of differentiation), construct a proof that the goal predicate applies to the example, using the available background knowledge.

2. In parallel, construct a generalised proof tree for the variablised goal using the same inference steps as in the original proof.

3. Construct a new rule whose left-hand side consists of the leaves of the proof tree and whose right-hand side is the variablised goal.

4. Drop any condition(s) which are true regardless of the values of the variables in the goal.

Improving the Efficiency:

EBL works basically in two steps: explain and generalise. Sometimes a rule can be generalised in a number of ways, and the choice affects the efficiency of the resulting system.

Three factors contribute to efficiency:

1. Adding large numbers of rules can slow down the reasoning process, because the inference mechanism must still check those rules even in cases where they do not yield a solution. In other words, it increases the branching factor in the search space.

2. To compensate for the slow-down in reasoning the derived rules must offer significant increase in speed for the cases which they cover. This increase arises mainly because the derived rules avoid dead ends which would otherwise be taken.

3. Derived rules should be as general as possible, so that they apply to the largest possible set of cases.

A common approach to ensuring that derived rules are efficient is to insist on the operationality of each sub-goal in the rule. A sub-goal is operational if it is easy to solve. Unfortunately, there is usually a trade-off between operationality and generality.

More specific sub-goals are easier to solve but cover fewer cases. Also, operationality is a matter of degree: a sub-goal solvable in one or two steps is definitely operational, but one requiring 10 or 100 steps may not be. Finally, the cost of solving a given sub-goal depends on what other rules are available in the knowledge base.

It can go up or down as more rules are added. Thus, EBL systems face a really complex optimisation problem in trying to maximise the efficiency of the given initial knowledge base. It is sometimes possible to derive a mathematical model of the effect on overall efficiency of adding a given rule, and to use this model to select the best rule to add.

When recursive rules are involved, the analysis can become very complicated. One promising approach is to address the problem of efficiency empirically, simply by adding several rules and then identifying which ones are useful and actually speed up the process.

Thus, by generalising from past examples, EBL makes the knowledge base more efficient for the kind of problems to which it can be applied. If the EBL system is carefully designed, it is possible to obtain significant speed-ups.

For example, a reasonably large Prolog-based natural language system designed for speech-to-speech translation between Swedish and English was able to achieve real-time performance only by applying EBL to the parsing process.