Bayesian reasoning assumes that information about the statistical probabilities of the relevant events is available. This requirement makes it difficult to apply in many domains. Certainty factors are a compromise on pure Bayesian reasoning.

The approach has been used successfully, most notably in the MYCIN expert system. MYCIN is a medical diagnostic system which diagnoses bacterial infections of the blood and prescribes drugs for their treatment. Here we present it as an example of probabilistic reasoning. Its knowledge is represented in rule form, and each rule has an associated certainty factor.

For example, a MYCIN rule looks something like this:

If:

(a) The gram stain of the organism is gram negative,


(b) The morphology of the organism is rod, and

(c) The aerobicity of the organism is anaerobic, then there is suggestive evidence (0.5) that the identity of the organism is Bacteroides.

Or

If:


(a) The stain of the organism is gram-positive,

(b) The morphology of the organism is coccus,

(c) The growth conformation of the organism is clumps, then there is suggestive evidence (0.7) that the identity of the organism is staphylococcus.

This knowledge in the form of rules is represented internally in an easy-to-manipulate LISP list structure:


Premise:

($AND (SAME CNTXT GRAM GRAMPOS)
      (SAME CNTXT MORPH COCCUS)
      (SAME CNTXT CONFORM CLUMPS))

Action:

(CONCLUDE CNTXT IDENT STAPHYLOCOCCUS TALLY 0.7)

The interpretation of this structure can be postponed until the reader becomes familiar with LISP.
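For readers unfamiliar with LISP, the same rule can also be sketched in Python. This is a hypothetical rendering for illustration only; the dictionary layout and the `applies` helper are ours, not MYCIN's actual machinery:

```python
# Illustrative Python rendering of the LISP rule above: the premise is a
# conjunction ($AND) of (attribute, value) tests on the current context,
# and the action concludes an identity with the rule's certainty (TALLY).
rule = {
    "premise": [                       # $AND of three SAME clauses
        ("GRAM", "GRAMPOS"),
        ("MORPH", "COCCUS"),
        ("CONFORM", "CLUMPS"),
    ],
    "action": ("IDENT", "STAPHYLOCOCCUS"),
    "cf": 0.7,                         # TALLY 0.7
}

def applies(rule, context):
    """True when every premise clause matches the context (the $AND)."""
    return all(context.get(attr) == value for attr, value in rule["premise"])

patient = {"GRAM": "GRAMPOS", "MORPH": "COCCUS", "CONFORM": "CLUMPS"}
if applies(rule, patient):
    attr, value = rule["action"]
    print(f"Suggestive evidence ({rule['cf']}) that {attr} is {value}")
```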

MYCIN uses these rules to reason backward to the clinical data available from its goal of finding significant disease-causing organisms. Once it finds the identities of such organisms, it then attempts to select a therapy by which the disease(s) may be treated.


In order to understand how MYCIN exploits uncertain information, we need answers to two questions:

“What do certainty factors mean?” and “How does MYCIN combine the estimates of uncertainty in each of its rules to produce a final estimate of the certainty of its conclusions?” A further question we need to answer, given our observations about the intractability of pure Bayesian reasoning, is, “What compromises does the MYCIN technique make, and what risks are associated with those compromises?” We answer these questions now.

A certainty factor (CF [h, e]) is defined in terms of two components:

1. MB [h, e] − a measure (between 0 and 1) of belief in hypothesis h given the evidence e. MB measures the extent to which the evidence supports the hypothesis. It is zero if the evidence fails to support the hypothesis.

2. MD [h, e] − a measure (between 0 and 1) of disbelief in hypothesis h given the evidence e. MD measures the extent to which the evidence supports the negation of the hypothesis. It is zero if the evidence supports the hypothesis.

From these two measures, we can define the certainty factor as:

CF [h, e] = MB[h, e] – MD[h, e]

Since any particular piece of evidence either supports or denies a hypothesis (but not both), and since each MYCIN rule corresponds to one piece of evidence (although it may be a compound piece of evidence), a single number suffices for each rule to define both the MB and MD and thus the CF.

The CFs of MYCIN’s rules are provided by the experts who write the rules. They reflect the experts’ assessments of the strength of the evidence in support of the hypothesis. As MYCIN reasons, however, these CFs need to be combined to reflect the operation of multiple pieces of evidence and multiple rules applied to a problem. Fig. 7.4 illustrates three combination scenarios which we need to consider.

In Fig. 7.4(a), several rules all provide evidence which relates to a single hypothesis. In Fig. 7.4(b), we need to consider our belief in a collection of several propositions taken together. In Fig. 7.4(c), the output of one rule provides the input to another.

What formulas should be used to perform these combinations?

Before we answer that question, we need first to describe some properties which we would like the combining functions to satisfy:

1. Since the order in which evidence is collected is arbitrary, the combining functions should be commutative and associative.

2. Until certainty is reached, additional confirming evidence should increase MB (and similarly for disconfirming evidence and MD).

3. If uncertain inferences are chained together, then the result should be less certain than either of the inferences alone.

Having accepted the desirability of these properties, let’s first consider the scenario in Fig. 7.4 (a), in which several pieces of evidence are combined to determine the CF of one hypothesis.

The measures of belief and disbelief in a hypothesis h, given two observations s1 and s2, are computed from:

MB[h, s1 ∧ s2] = 0 if MD[h, s1 ∧ s2] = 1
MB[h, s1 ∧ s2] = MB[h, s1] + MB[h, s2] × (1 − MB[h, s1]) otherwise

MD[h, s1 ∧ s2] = 0 if MB[h, s1 ∧ s2] = 1
MD[h, s1 ∧ s2] = MD[h, s1] + MD[h, s2] × (1 − MD[h, s1]) otherwise

One way to state these formulas in English is that the measure of belief in h is 0 if h is disbelieved with certainty. Otherwise, the measure of belief in h given two observations, is the measure of belief given only one observation plus some increment for the second observation. This increment is computed by first taking the difference between 1 (certainty) and the belief given only the first observation.

This difference is the most which can be added by the second observation. The difference is then scaled by the belief in h given only the second observation. A corresponding explanation can be given, then, for the formula for computing disbelief. From MB and MD, CF can be computed. However, if several sources of corroborating evidence are pooled, the absolute value of CF will increase. If conflicting evidence is introduced, the absolute value of CF will decrease.

Example:

Suppose we make an initial observation corresponding to Fig. 7.4(a) which confirms our belief in h with MB[h, s1] = 0.3. Then MD[h, s1] = 0 and CF[h, s1] = 0.3. Now we make a second observation which also confirms h, with MB[h, s2] = 0.2.

Now:

MB[h, s1 ∧ s2] = 0.3 + 0.2 × (1 − 0.3) = 0.44
MD[h, s1 ∧ s2] = 0.0
CF[h, s1 ∧ s2] = 0.44 − 0.0 = 0.44
We can see from this example how slight confirmatory evidence can accumulate to produce increasingly larger certainty factors.
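The incremental combination just described can be sketched as a small Python function (the name `combine_mb` is ours). Running it on the worked example reproduces the 0.44:

```python
def combine_mb(mb1, mb2, md_combined=0.0):
    """Combine two measures of belief in the same hypothesis h.

    Returns 0 if h is disbelieved with certainty; otherwise the first
    belief plus the remaining headroom (1 - mb1) scaled by the second.
    """
    if md_combined == 1.0:
        return 0.0
    return mb1 + mb2 * (1.0 - mb1)

# The worked example: MB[h, s1] = 0.3, then MB[h, s2] = 0.2
mb = combine_mb(0.3, 0.2)        # 0.3 + 0.2 * 0.7 = 0.44
cf = mb - 0.0                    # MD is still 0, so CF[h, s1^s2] = 0.44
print(round(mb, 2), round(cf, 2))
```

Note that the function is commutative and associative, as required by property 1 above: combining 0.3 then 0.2 gives the same result as 0.2 then 0.3.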

Next let’s consider the scenario of Fig. 7.4(b), in which we need to compute the certainty factor of a combination of hypotheses. In particular, this is necessary when we need to know the certainty factor of a rule antecedent which contains several clauses (as, for example, in the staphylococcus rule given above). The combination certainty factor can be computed from its MB and MD.

MYCIN uses the following formulas for the MB of the conjunction and the disjunction of two hypotheses:

MB[h1 ∧ h2, e] = min(MB[h1, e], MB[h2, e])

MB[h1 ∨ h2, e] = max(MB[h1, e], MB[h2, e])

MD can be computed analogously.
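MYCIN combines the MBs of a conjunction by taking the minimum over the conjuncts, and of a disjunction by taking the maximum. A sketch (function names are ours):

```python
def mb_and(mb_h1, mb_h2):
    # Belief in a conjunction is limited by its least-believed conjunct
    return min(mb_h1, mb_h2)

def mb_or(mb_h1, mb_h2):
    # Belief in a disjunction is carried by its best-believed disjunct
    return max(mb_h1, mb_h2)

# e.g. a rule antecedent with two clauses believed at 0.9 and 0.6
print(mb_and(0.9, 0.6), mb_or(0.9, 0.6))  # 0.6 0.9
```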

Finally, we need to consider the scenario in Fig. 7.4(c), in which rules are chained together with the result that the uncertain outcome of one rule provides the input to another. The solution to this problem will also handle the case in which we must assign a measure of uncertainty to initial inputs.

This could, for example, happen in situations where the evidence is the outcome of an experiment or a laboratory test whose results are not completely accurate. In such a case, the certainty factor of the hypothesis must take into account both the strength with which the evidence suggests the hypothesis and the level of confidence in the evidence.

MYCIN provides a chaining rule which is defined as follows. Let MB'[h, s] be the measure of belief in h given that we are absolutely sure of the validity of s. Let e be the evidence which led us to believe in s (for example, the actual reading of the laboratory instruments or the results of applying other rules).

Then:

MB[h, s] = MB′[h, s] × max(0, CF[s, e])
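The chaining rule scales MB′ by the (non-negative) certainty of the premise; a disbelieved premise lends no support at all. A sketch with an illustrative function name:

```python
def chain_mb(mb_prime, cf_premise):
    """MYCIN chaining: MB[h, s] = MB'[h, s] * max(0, CF[s, e])."""
    return mb_prime * max(0.0, cf_premise)

# A rule with MB' = 0.8 whose premise is itself believed only at CF = 0.9
print(round(chain_mb(0.8, 0.9), 2))   # 0.72
# A disbelieved premise (negative CF) contributes nothing
print(chain_mb(0.8, -0.5))            # 0.0
```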

Since initial CFs in MYCIN are estimates which are given by experts who write the rules, it is not really necessary to state a more precise definition of what a CF means than the one we have already given.

The original work did, however, provide one by defining MB (which can be thought of as a proportionate decrease in disbelief in h as a result of e) as:

MB[h, e] = 1 if P(h) = 1
MB[h, e] = (max[P(h|e), P(h)] − P(h)) / (1 − P(h)) otherwise

It turns out that these definitions are incompatible with a Bayesian view of conditional probability. Small changes to them, however, make them compatible.

In particular, we can redefine MB as:

MB[h, e] = 1 if P(h) = 1
MB[h, e] = (max[P(h|e), P(h)] − P(h)) / ((1 − P(h)) × P(h|e)) otherwise

The definition of MD should also be changed similarly.

With this reinterpretation, there ceases to be any fundamental conflict between MYCIN’s technique and those suggested by Bayesian statistics. Pure Bayesian statistics, however, usually leads to intractable systems. In a MYCIN rule, the CF represents the contribution of that individual rule to MYCIN’s belief in the hypothesis; in a way, this represents a conditional probability.

But in Bayesian statistics, P(H|E) describes the conditional probability of H given that the only relevant evidence available is E (joint probabilities are required when there is more evidence). Thus the Bayesian formulation is the more rigorous one, while MYCIN’s compromise keeps the computation tractable and works well in practice.

The MYCIN formulas for all three combination scenarios of Fig. 7.4 make the assumption that all rules are independent. The burden of guaranteeing independence (at least to the extent that it matters) is on the rule writer. Each of the combination scenarios is vulnerable when this independence assumption is violated.

This can be analysed by reconsidering the scenario in Fig. 7.4(a). Our example rule has three antecedents with a single CF, rather than being three separate rules; this makes the combination formulas unnecessary. The rule writer chose this form because the three antecedents are not independent.

To see how much difference MYCIN’s independence assumption can make, suppose for a moment that we instead had three separate rules and that the CF of each was 0.6. This could happen and still be consistent with the combined CF of 0.7 if the three conditions overlap substantially.

If we apply the MYCIN combination formula to the three separate rules, we get:

MB[h, s1 ∧ s2] = 0.6 + 0.6 × (1 − 0.6) = 0.84
MB[h, s1 ∧ s2 ∧ s3] = 0.84 + 0.6 × (1 − 0.84) = 0.936

This is a substantially different result from the true value, as expressed by the expert, of 0.7.
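The arithmetic is easy to check with a short sketch of the parallel-combination formula (illustrative code, not MYCIN's):

```python
from functools import reduce

def combine_mb(mb1, mb2):
    # Parallel combination of two beliefs in the same hypothesis
    return mb1 + mb2 * (1.0 - mb1)

# Three supposedly independent rules, each CF 0.6, confirming one hypothesis
cf = reduce(combine_mb, [0.6, 0.6, 0.6])   # 0.84 after two, then 0.936
print(round(cf, 3))                        # 0.936 -- far from the expert's 0.7
```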

Now let’s consider what happens when the independence assumption is violated in the scenario of Fig. 7.4(c).

Let’s consider a concrete example (watering a home lawn) in which:

S: sprinkler was left on last night

W: grass is wet in the morning

R: it rained last night

We can write MYCIN-style rules which describe predictive relationships among these three events:

R1: If the sprinkler was left on last night

then there is suggestive evidence (0.9) that

the grass will be wet this morning

Taken alone, R1 may accurately describe the world. But now consider a second rule:

R2: If the grass is wet this morning

then there is suggestive evidence (0.8) that it rained last night

Taken alone, R2 makes sense when rain is the most common source of water on the grass. But if the two rules are applied together, using MYCIN’s rule for chaining, we get:

MB[W, S] = 0.9 {sprinkler suggests wet}

MB[R, W] = 0.8 × 0.9 = 0.72 {wet suggests rain}

In other words, we believe that it rained because we believe the sprinkler was left on. We get this despite the fact that, if the sprinkler is known to have been left on and to be the cause of the wet grass, there is actually almost no evidence for rain (the wet grass has already been explained by the sprinkler).
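The faulty chain is easy to reproduce in a short sketch (function and variable names are ours):

```python
def chain_mb(mb_prime, cf_premise):
    # MYCIN chaining: attenuate a rule's belief by confidence in its premise
    return mb_prime * max(0.0, cf_premise)

cf_wet_given_sprinkler = 0.9   # R1: sprinkler -> wet grass
mb_rain_given_wet = 0.8        # R2: wet grass -> rain
mb_rain = chain_mb(mb_rain_given_wet, cf_wet_given_sprinkler)
print(round(mb_rain, 2))       # 0.72: strong "evidence" for rain, derived
                               # entirely from believing the sprinkler was on
```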

One of the major advantages of the modularity of the MYCIN rule system is that it allows us to consider individual antecedent/consequent relationships independently of others. In particular, it lets us talk about the implications of a proposition without going back and considering the evidence that supported it.

Unfortunately, this example shows that there is a danger in this approach whenever the justifications of a belief are important to determining its consequences. In this case, we need to know why we believe the grass is wet (because we observed it to be wet as opposed to because we know the sprinkler was on) in order to determine whether the wet grass is evidence for it having just rained.

A word of caution: this example illustrates a specific rule structure which almost always causes trouble and should be avoided. Our rule R1 describes a causal relationship between sprinkler and wetness (the sprinkler causes wet grass). Rule R2, although it looks the same, actually describes an inverse-causality relationship (wet grass is caused by rain and thus is evidence for its cause).

We can safely form a chain of evidence that reasons from a cause to its effects. But evidence should not be used to reason back to the cause or symptom of an event without any new information. To avoid this problem, many rule-based systems either limit their rules to one structure or clearly partition the two kinds so that they cannot interfere with each other. Bayesian networks suggest a systematic solution to this problem.

We can summarise this discussion of certainty factors and rule-based systems as follows (it will be appreciated more fully once you have studied expert systems). The approach makes strong independence assumptions which make it relatively easy to use; at the same time, those assumptions create dangers unless rules are written carefully so that important dependencies are captured.

The approach can serve as the basis of practical application programs. It did so in MYCIN. It has also done so in a broad array of other systems built on the EMYCIN platform, a generalisation (often called a shell) of MYCIN with all the domain-specific knowledge, expressed as rules, stripped out. One reason this framework is useful, despite its limitations, is that in an otherwise robust system the exact numbers used do not appear to matter very much.

The other reason is that the rules were carefully designed to avoid the major pitfalls we have just described. One other interesting thing about this approach is that it appears to mimic quite well the way people manipulate certainties.