In this article we will discuss reasoning systems with uncertain knowledge.

Introduction to Bayesian Network:

The full joint probability distribution can answer any question about the domain, but can become intractably large as the number of variables grows. Moreover, specifying probabilities for atomic events is rather unnatural and can be very difficult unless a large amount of data is available from which the statistical estimates can be formulated.

Furthermore, independence and conditional independence relationships among variables can greatly reduce the number of probabilities that need to be specified in order to define the full joint distribution. A better method for representing dependencies among variables, and for specifying any full joint probability distribution more concisely, is a data structure called a Bayesian network.

This offers another way of reducing the complexity associated with certainty factors. A Bayesian network is a directed graph in which each node is annotated with quantitative probability information.


The full specification is as follows:

1. A set of random variables makes up the nodes of the network. Variables may be discrete or continuous.

2. A set of directed links or arrows connects pairs of nodes. If there is an arrow from node X to node Y, X is said to be a parent of Y.

3. Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) which quantifies the effect of the parents on the node.


4. The graph has no directed cycles (and hence is a directed acyclic graph, or DAG).
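The four parts of this specification map directly onto a small data structure. The following is a minimal sketch in Python; the class name, method names and the tiny two-node network in the usage example are illustrative, not taken from the text:

```python
# Minimal Bayesian-network skeleton: nodes, directed links (stored as
# parent lists), one conditional distribution per node, and a check
# that the directed graph has no cycles (requirement 4).

class BayesNet:
    def __init__(self):
        self.parents = {}  # node -> list of parent nodes (the directed links)
        self.cpt = {}      # node -> {tuple of parent values: P(node=True | parents)}

    def add_node(self, name, parents, cpt):
        self.parents[name] = list(parents)
        self.cpt[name] = dict(cpt)

    def is_dag(self):
        """Return True if the graph has no directed cycles."""
        WHITE, GREY, BLACK = 0, 1, 2   # unvisited / in progress / done
        colour = {n: WHITE for n in self.parents}

        def visit(n):
            if colour[n] == GREY:      # reached a node still in progress: cycle
                return False
            if colour[n] == BLACK:     # already fully explored
                return True
            colour[n] = GREY
            if not all(visit(p) for p in self.parents[n]):
                return False
            colour[n] = BLACK
            return True

        return all(visit(n) for n in colour)
```

A two-node network X → Y would be built as `net.add_node("X", [], {(): 0.2})` followed by `net.add_node("Y", ["X"], {(True,): 0.9, (False,): 0.1})`; `net.is_dag()` then confirms the topology is legal.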

The topology of the network (the set of nodes and links) specifies the conditional independence relationships that hold in the domain. The intuitive meaning of an arrow from X to Y in a properly constructed network is usually that X has a direct influence on Y.

It is usually much easier for a domain expert to decide which direct influences exist in the domain than to specify the probabilities themselves. Once the topology of the Bayesian network is laid down, we need only specify a conditional probability distribution for each variable, given its parents. The combination of the topology and the conditional distributions then suffices to specify (implicitly) the full joint distribution over all the variables.

Let's recall the world consisting of the simple variables Toothache, Cavity, Catch and Weather. Assuming that weather has no influence on the other variables, Weather is independent of them; Toothache and Catch are conditionally independent, given Cavity.


These relationships are represented by the Bayesian network data structure in Fig. 7.5. The conditional independence of Toothache and Catch is indicated by the absence of a link between them. Intuitively, the network represents the fact that Cavity is a direct cause of Toothache and Catch, whereas no direct causal relationship exists between Toothache and Catch.
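The topology of Fig. 7.5 can be written down directly as a parent list per node; conditional independence then shows up as the absence of a direct link. A sketch (the helper function `linked` is illustrative, not from the text):

```python
# Topology of the Weather/Cavity/Toothache/Catch network (Fig. 7.5):
# each node maps to its list of parents.
parents = {
    "Weather":   [],          # independent of all the other variables
    "Cavity":    [],          # a root cause with no parents
    "Toothache": ["Cavity"],  # directly influenced by Cavity only
    "Catch":     ["Cavity"],  # directly influenced by Cavity only
}

def linked(a, b):
    """True if there is a direct link, in either direction, between a and b."""
    return a in parents[b] or b in parents[a]

# No direct link between Toothache and Catch: they are conditionally
# independent given their common parent, Cavity.
print(linked("Toothache", "Catch"))   # False
print(linked("Cavity", "Toothache"))  # True
```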

Now consider a more complex example:


You have just installed a burglar alarm at home, because you are not at that house very often, as you run a business in the USA. The alarm is quite reliable at detecting a burglary. It also responds to earthquakes, which are not uncommon in your area. Your two neighbours, Arisha and Bobby, have promised to ring you up when they hear the alarm.

Arisha sometimes confuses the telephone ring with the alarm and calls you even then. Bobby likes music and keeps it at high volume, and thus may mistake the burglar alarm for the music. We want to estimate the probability of a burglary given the evidence of who has or has not called. The Bayesian network is shown in Fig. 7.6.

For the time being, ignore the conditional distributions in Fig. 7.6 and concentrate on the topology of the network. In the case of the burglary network, the topology shows that burglary and earthquake directly affect the probability of the alarm ringing, but whether Arisha and Bobby call depends only on the alarm.


The network thus represents our assumptions that the neighbours do not perceive any burglaries directly, do not notice minor earthquakes, and do not consult each other before calling you. Also, there are no nodes for Bobby's listening to loud music or for Arisha's confusing the ringing of the phone with the alarm.

These factors are summarised in the uncertainty associated with the links from Alarm to Arisha calls and from Alarm to Bobby calls. This shows both laziness and ignorance. The probabilities actually summarise a potentially infinite set of circumstances in which the alarm might fail to ring (high humidity, power failure, dead battery, cut wires, a dead mouse stuck inside the bell, etc.) or Arisha or Bobby might fail to call (being out to lunch, on vacation, temporarily deaf, a passing helicopter, etc.). In brief, the system (agent) deals with large real-world problems with uncertainties in the knowledge base. The degree of approximation can be improved by adding more relevant information.

Now let us turn to the conditional distributions shown in Fig. 7.6. In the figure, each distribution is shown as a Conditional Probability Table (CPT). This type of table is useful for discrete variables. Each row in a CPT contains the conditional probability of each node value for a conditioning case. A conditioning case is just a possible combination of values for the parent nodes. Each row must sum to 1, because the entries represent an exhaustive set of cases for the variable.

For Boolean variables, once we know that the probability of a true value is p, the probability of false must be 1 - p, so we often omit the second number. In general, a table for a Boolean variable with k Boolean parents contains 2^k independently specifiable probabilities. A node with no parents has only one row, representing the prior probabilities of each possible value of the variable.
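A Boolean node with k Boolean parents therefore needs exactly 2^k stored numbers, one P(true) per conditioning case, with each P(false) implied. A sketch for an Alarm node with two parents; the probability values here are illustrative stand-ins, since Fig. 7.6's actual numbers are not reproduced in the text:

```python
from itertools import product

# CPT for a Boolean Alarm node with k = 2 Boolean parents
# (Burglary, Earthquake). Each conditioning case stores only
# P(Alarm=True | parents); P(Alarm=False | parents) is 1 minus that.
# The numbers are illustrative, not Fig. 7.6's.
alarm_cpt = {
    (True,  True):  0.95,
    (True,  False): 0.94,
    (False, True):  0.29,
    (False, False): 0.001,
}

k = 2
assert len(alarm_cpt) == 2 ** k                              # 2^k conditioning cases
assert set(alarm_cpt) == set(product([True, False], repeat=k))

def p_alarm(value, burglary, earthquake):
    """P(Alarm = value | Burglary = burglary, Earthquake = earthquake)."""
    p_true = alarm_cpt[(burglary, earthquake)]
    return p_true if value else 1.0 - p_true

# Each implicit row (P(true), P(false)) sums to 1, as every CPT row must.
for case in alarm_cpt:
    assert abs(p_alarm(True, *case) + p_alarm(False, *case) - 1.0) < 1e-12
```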

Semantics of Bayesian Networks:

There are two ways in which we can understand the semantics of Bayesian networks:

1. See the network as a representation of the joint probability distribution. This view is useful in understanding how to construct networks.

2. See the network as an encoding of a collection of conditional independence statements. This view is useful in designing inference procedures. The two views are, however, equivalent.

Representing the Full Joint Distribution:

We explain it by calculating the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both Arisha and Bobby telephone you:

P(ArishaCalls ∧ BobbyCalls ∧ Alarm ∧ ¬Burglary ∧ ¬Earthquake) = P(ArishaCalls | Alarm) P(BobbyCalls | Alarm) P(Alarm | ¬Burglary ∧ ¬Earthquake) P(¬Burglary) P(¬Earthquake)

We can generalise the example by writing:

P(x1, …, xn) = ∏ (i = 1 to n) P(xi | parents(Xi))

where parents(Xi) denotes the specific values of the variables in Parents(Xi).
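This product can be evaluated directly for the alarm-and-two-calls event above: each factor is a node's conditional probability given its parents' values in the event. A sketch using illustrative CPT numbers, since Fig. 7.6's values are not reproduced in the text:

```python
# P(ArishaCalls, BobbyCalls, Alarm, not Burglary, not Earthquake)
#   = P(A-calls | Alarm) * P(B-calls | Alarm)
#     * P(Alarm | not Burglary, not Earthquake)
#     * P(not Burglary) * P(not Earthquake)
# All numbers below are illustrative stand-ins for Fig. 7.6.
p_b = 0.001                  # P(Burglary)
p_e = 0.002                  # P(Earthquake)
p_alarm_given = {            # P(Alarm=True | Burglary, Earthquake)
    (True,  True):  0.95,
    (True,  False): 0.94,
    (False, True):  0.29,
    (False, False): 0.001,
}
p_arisha_given_alarm = {True: 0.90, False: 0.05}  # P(ArishaCalls=True | Alarm)
p_bobby_given_alarm  = {True: 0.70, False: 0.01}  # P(BobbyCalls=True  | Alarm)

p = (p_arisha_given_alarm[True]      # P(ArishaCalls | Alarm)
     * p_bobby_given_alarm[True]     # P(BobbyCalls  | Alarm)
     * p_alarm_given[(False, False)] # P(Alarm | no burglary, no earthquake)
     * (1 - p_b)                     # P(not Burglary)
     * (1 - p_e))                    # P(not Earthquake)
print(round(p, 8))
```

With these stand-in numbers the event comes out at roughly six in ten thousand; the point is that the whole joint entry falls out of five locally specified numbers.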

If the Bayesian network is a representation of the joint distribution, then it too can be used to answer any query, as in the earlier case of inference from the full joint distribution, and it does so more efficiently. An extension of the Bayesian network is called a decision network, or influence diagram.