After reading this article you will learn about the use of statistics for the improvement in the methods of communication.

Statistics:

Inventors are usually practical people; their forte is the ability to convert theoretical concepts into practical forms. Faraday, although a self-taught man from a humble background, became a famous inventor, but it was left to James Clerk Maxwell to lay the theoretical foundations for Faraday's work. It is most helpful if the theoretical background is clearly explained and understood.

In physical systems, this usually means understanding the mathematical and statistical analysis involved. We shall, there­fore, refresh the elements of statistics and mathematics that we are likely to encounter in our attempt to understand the issues in communications and networking.


While it is expected that the concepts of differential and integral calculus are sufficiently clear and do not need revision except when used with vectors, the concepts and rules governing vectors and vector analysis need to be reviewed since they are not used regularly and are required for understanding Maxwell’s equations.

Random Processes:

Any realistic model of a real-world phenomenon must take into account the possibility of randomness. Most processes in nature are random. That is, more often than not, the quantities we are interested in will not be predictable in advance but, rather, will exhibit an inherent variation that should be taken into account when analysing this model.

The term random is used to describe something that is erratic and unpredictable. Random signals are encountered in every practical application in communications. Nevertheless, these random events can be represented in such a way that some sense may be made out of their representation.


Sample Space and Events:

Suppose that we are about to perform an experiment whose outcome is not predictable in advance. While the outcome may not be known in advance, suppose that we know the set of all the possible outcomes. This set of all the possible outcomes is known as the sample space.

The simplest examples of experiments are the tossing of a coin or the tossing of a six-sided die. In the first case, that is, tossing a coin, the sample space consists of {H, T}, where H represents Head and T represents Tail as outcomes. In case of a six-sided die, the sample space consists of {1, 2, 3, 4, 5, 6}. These facts will help to explain the concept of probability.

Mean, Variance and Standard Deviation:


We shall concern ourselves with issues of randomness, probability and probability distribution, and discuss the various measures that are used in statistical analysis.

In order to describe the characteristics of any random group, certain measures have been defined. These start with the measure ‘mean’ or average. The mean of any statistical population is the sum of the values describing each observation divided by the total number of observations.

Assume that one wishes to compare the performance of two classes of students in an examination. If the first class has n1 students, then the grades obtained by each student in the examination under consideration are added and the sum is divided by the number of students who took that examination, that is, by n1. Thus the mean is

µ1 = (x1 + x2 + … + xn1)/n1

where xi is the grade of the ith student.

Similarly, we calculate the corresponding value of the mean for the second class, say µ2.


These means can then be compared to judge which of the two classes has performed better. However, the mean may not be an adequate measure of a group. It is possible that a very few students have got full marks, whereas a very large number have got abysmally low marks, but the mean may be high because of the few very high marks.

For example, consider a class of, say, 10 students. Assume that 3 students out of these 10 have got 100 marks each, whereas the remaining 7 have all got zeroes. Then the average marks are (3 x 100 + 0)/10, that is, the mean is 30.

In the second group, also of 10 students, while two have got 0 each, the remaining 8 have got an average of 40 marks each. Therefore, the total marks obtained are 40 x 8 + 0 and the mean is 32. This would tend to imply that the two classes have performed almost identically.
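As a quick sketch, the two hypothetical classes above can be checked in Python (the mark lists simply restate the example):

```python
# Class 1: 3 students with 100 marks, 7 with 0. Class 2: 2 with 0, 8 with 40.
class1 = [100] * 3 + [0] * 7
class2 = [0] * 2 + [40] * 8

def mean(xs):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(xs) / len(xs)

print(mean(class1))  # 30.0
print(mean(class2))  # 32.0
```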


Unfortunately, that is not true. Therefore, to measure the dispersion of data, another measure, called variance, is added. The variance v for a group of n1 students is given by

v = (1/n1) Σ (xi − µ1)²

where xi represents each individual data observation.

Thus the variance indicates a measure of the dispersion of individual data items about the mean.

For normal descriptive purposes, the square root of the variance, called the standard deviation, is taken as the measure of dispersion, since it expresses the dispersion in the same units as the individual data items. Thus mean and standard deviation are taken to be adequate measures to describe any frequency distribution. The standard deviation σ is, therefore

σ = √v = √[(1/n1) Σ (xi − µ1)²]
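Continuing the same two-class example, a minimal sketch of the population variance and standard deviation (dividing by n, as in the formula above) shows how different the dispersions are despite the similar means:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Population variance: average squared deviation from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(xs))

class1 = [100] * 3 + [0] * 7   # mean 30, marks widely dispersed
class2 = [0] * 2 + [40] * 8    # mean 32, marks tightly clustered

print(std_dev(class1))  # ≈ 45.8
print(std_dev(class2))  # 16.0
```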

Probability Models:

Consider an experiment whose sample space is S. For each event E of the sample space S, we assume that a number P(E) is defined and satisfies the following three conditions:

1. 0 ≤ P(E) ≤ 1

2. P(S) = 1

3. For any sequence of mutually exclusive events E1, E2, … (that is, events for which EnEm = Ø when n ≠ m),

P(E1 U E2 U …) = P(E1) + P(E2) + …

where P(E) is the probability of the event E and P(En) is the probability of the event En.

An example may explain the above statements more clearly. Assume that we have an unbiased coin. Then the likelihood of getting a head in a toss is the same as that of getting a tail. The sample space consists of {H, T}. Therefore

P({H}) = P({T}) = 1/2

On the other hand, if we have a biased coin such that the likelihood of getting a head is twice that of getting a tail, then

P({H}) = 2/3, P({T}) = 1/3

The definition of probability given above as being a function defined on the events of a sample space is a formal one but it turns out that these probabilities have a nice intuitive property.

If our experiment is repeated over and over again then, with probability 1 (that is, with certainty), the proportion of time that event E occurs is just P(E). Since E and EC are always mutually exclusive and since E U EC = S, we have from the above definition

P(EC) = 1 − P(E)

that is, the probability that an event does not occur is one minus the probability that it does occur. We shall now consider P(E U F), that is, the probability of all points in E plus the probability of all points in F. Since all points that are in both E and F will be counted twice in P(E) + P(F), we have

P(E U F) = P(E) + P(F) − P(EF) … (2.6)

It may be noted, however, that when E and F are mutually exclusive, that is, when EF = Ø, Eqn. 2.6 reduces to

P(E U F) = P(E) + P(F)

Conditional Probability:

Consider a situation where we have two fair six-sided dice. Since they are fair, when they are both thrown together, there can be 36 possible outcomes each with the probability of occurrence of 1/36. Suppose we observe that the first die shows four.

Knowing this information, what is the probability that the total of the two dice is six? To calculate this probability we reason as follows: Given that the initial die is 4, there can be at most six possible outcomes of our experiment—namely, (4,1), (4,2), (4,3), (4,4), (4,5) and (4,6).

Since each of these outcomes was equally likely to occur, they should remain equally likely. That is, given that the first die is a 4, the conditional probability of each of the outcomes (4,1), …, (4,6) is 1/6, while the conditional probability of each of the other points in the sample space is 0. Hence the desired probability is 1/6.
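The argument can be verified by brute-force enumeration; this sketch simply counts outcomes within the reduced sample space:

```python
from itertools import product

# All 36 equally likely outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

F = [o for o in outcomes if o[0] == 4]       # first die shows 4
EF = [o for o in F if o[0] + o[1] == 6]      # ...and the total is 6

# Conditioning reduces to counting within the reduced sample space F.
p_E_given_F = len(EF) / len(F)
print(p_E_given_F)  # ≈ 0.1667, that is, 1/6
```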

If we let E and F represent the event that the sum of the dice is six and the event that the first die is 4 respectively, then the probability just obtained is called the conditional probability that E occurs given that F has occurred, and is denoted by P(E/F).

A general formula for P(E/F) which is valid for all events E and F is derived in the same manner as above—namely, if the event F occurs, then in order for E to occur it is necessary that the actual occurrence be a point in both E and F, that is, it must be in EF.

Now, as we know that F has occurred, it follows that F is our new sample space and hence the probability that the event EF occurs will equal the probability of EF relative to the probability of F. Thus

P(E/F) = P(EF)/P(F) … (2.7)

It may be noted that the right-hand side of Eqn. 2.7, and hence P(E/F), is defined only when P(F) > 0. Let us now illustrate the application of this result with some examples.

Example 1:

Assume that a box contains 10 identical-looking cards, each card carrying a number 1 through 10 so that no two cards carry the same number. One card is drawn at random and we are told that the number on that card is at least 5. What is the conditional probability that it is 10?

Solution:

Let E denote the event that the number on the drawn card is 10 and let F be the event that it is at least 5. Therefore, the required probability is P(E/F). Now from Eqn. 2.7,

P(E/F) = P(EF)/P(F)

However, EF = E since the number on the card will be both 10 and at least 5 if and only if it is 10.

Therefore:

P(E/F) = P(E)/P(F) = (1/10)/(6/10) = 1/6
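A corresponding sketch for the card example, again just counting within the reduced sample space:

```python
cards = range(1, 11)                  # ten cards numbered 1..10
F = [c for c in cards if c >= 5]      # number is at least 5
EF = [c for c in F if c == 10]        # ...and equal to 10

p_F = len(F) / 10                     # 6/10
p_EF = len(EF) / 10                   # 1/10
p_E_given_F = p_EF / p_F
print(p_E_given_F)  # ≈ 0.1667, that is, 1/6
```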

Example 2:

Three men at a party throw their pens into a pool. The pool is then mixed up and each man picks up a pen at random. What is the probability that none of the three picks up his own pen?

Solution:

We shall first calculate the probability that at least one man picks up his own pen. Let Ei, i = 1, 2, 3, denote the event that the ith man ends up with his own pen. In order to calculate the probability P(E1 U E2 U E3), we note that

P(E1 U E2 U E3) = P(E1) + P(E2) + P(E3) − P(E1E2) − P(E1E3) − P(E2E3) + P(E1E2E3) … (2.8)

Consider Eqn. 2.8, and first the terms P(EiEj) = P(Ei)P(Ej/Ei). The probability that the ith man selects his own pen is clearly 1/3 since he is equally likely to select any of the three pens.

On the other hand, given that the ith man has selected his own pen, there are two remaining pens that the jth man may select from, one of which is his own. Therefore, the probability that he will select his own pen is 1/2, so P(Ej/Ei) = 1/2 and

P(EiEj) = P(Ei)P(Ej/Ei) = (1/3)(1/2) = 1/6

To calculate P(E1E2E3) we note that

P(E1E2E3) = P(E1)P(E2/E1)P(E3/E1E2)

However, given that the first two men get their own pens, it follows that the third man will automatically get his own pen. Therefore, P(E3/E1E2) = 1 and

P(E1E2E3) = (1/3)(1/2)(1) = 1/6

Now we have

P(E1 U E2 U E3) = 3(1/3) − 3(1/6) + 1/6 = 1 − 1/2 + 1/6 = 2/3

Hence the probability that none of the three men picks up his own pen is 1 − 2/3 = 1/3.
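The result can be confirmed by enumerating all 3! equally likely assignments of pens to men:

```python
from itertools import permutations

men = [0, 1, 2]
perms = list(permutations(men))  # 3! = 6 equally likely assignments

# Assignment p gives pen p[i] to man i; p[i] == i means he got his own pen.
none_own = [p for p in perms if all(p[i] != i for i in men)]
at_least_one = [p for p in perms if any(p[i] == i for i in men)]

print(len(at_least_one) / len(perms))  # ≈ 0.667, that is, 2/3
print(len(none_own) / len(perms))      # ≈ 0.333, that is, 1/3
```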

Bayes’ Formula:

Let E and F be events. We may express E as E = EF U EFC because for a point to be in E it must either be in both E and F, or be in E but not in F. Since EF and EFC are mutually exclusive, we have

P(E) = P(EF) + P(EFC)

= P(E/F)P(F) + P(E/FC)P(FC) … (2.9)

= P(E/F)P(F) + P(E/FC)[1 − P(F)] … (2.10)

The probability that E will occur is thus the sum of the probabilities that it occurs when F has occurred and when F has not occurred.

Statement 2.9 states that the probability of the event E is a weighted average of the conditional probability of E given that F has occurred and the conditional probability of E given that F has not occurred, each conditional probability being given as much weight as the event it is conditioned on has of occurring.

Equation 2.10 can be further generalized as follows:

Assume that F1, F2, …, Fn are mutually exclusive events such that

F1 U F2 U … U Fn = S

In other words, exactly one of the events F1, F2, …, Fn will occur. We thus obtain

P(E) = P(E/F1)P(F1) + P(E/F2)P(F2) + … + P(E/Fn)P(Fn) … (2.11)

Equation 2.11 shows how, for given events F1, F2, …, Fn of which one and only one must occur, we can calculate P(E) by first conditioning upon which one of the Fi occurs. That is, it states that P(E) is equal to a weighted average of P(E/Fi), each term being weighted by the probability of the event on which it is conditioned. Suppose now that E has occurred and we are interested in determining which one of the Fj has also occurred. By Eqn. 2.11 we have that

P(Fj/E) = P(EFj)/P(E) = P(E/Fj)P(Fj) / [P(E/F1)P(F1) + … + P(E/Fn)P(Fn)] … (2.12)

This equation is known as Bayes’ formula.
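The numbers in this sketch (P(F) = 0.3 and the two conditional probabilities) are invented purely for illustration; it just applies the total probability rule and Bayes' formula to a two-hypothesis case:

```python
p_F = 0.3              # hypothetical prior probability of F
p_Fc = 1 - p_F
p_E_given_F = 0.8      # assumed conditional probabilities, for illustration
p_E_given_Fc = 0.2

# Total probability: P(E) = P(E/F)P(F) + P(E/F^C)P(F^C)
p_E = p_E_given_F * p_F + p_E_given_Fc * p_Fc

# Bayes' formula: P(F/E) = P(E/F)P(F) / P(E)
p_F_given_E = p_E_given_F * p_F / p_E
print(p_E)           # 0.38
print(p_F_given_E)   # ≈ 0.632
```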

Random Variables:

The word “Random” is frequently used to describe erratic and apparently unpredictable variations in observations in some experiments. A function whose domain is a sample space and whose range is some set of real numbers is called a random variable of the experiment.

We need a probabilistic description of random variables that works equally well for both discrete and continuous random variables. For example, if the random variable is X, consider the event X ≤ x for a given x. We denote the probability of this event by P(X ≤ x). Obviously, this probability is a function of the dummy variable x.

It frequently occurs that in performing an experiment we are mainly interested in some function of the outcome of the experiment. To take a very simple example suppose we roll two dice, then we are interested in the resultant total obtained and not in the value obtained by each die. These quantities of interest are real-valued functions defined on the sample space and are known as random variables.

Since the value of a random variable is determined by the outcome of the experiment, we may assign probabilities to the possible values of the random variable. Random variables may be discrete or continuous. In conducting an experiment it is convenient to assign to it a variable whose value is determined by the outcome, because we do not have advance knowledge of that outcome, other than that it may take on a value within a certain range.

We also need to understand the term Probability density function or Probability mass function. This is a mathematical function that describes the probability that a system will take on a specific value, or set of values.

Suppose, for example, that the probability dP that a molecule of gas will be found with velocity components u, v and w along the x, y and z coordinates respectively is given by the product of a distribution function and the infinitesimal volume du dv dw; that is, dP = f(u, v, w) du dv dw, in which f(u, v, w) is the distribution function describing the velocity of the molecule and dP is the probability of finding the x component of velocity between u and u + du, the y component between v and v + dv, and the z component between w and w + dw.

This example describes how probability distribution functions can be used. Incidentally, this example is from the Maxwell-Boltzmann distribution law.

Discrete Random Variables:

A random variable that can take on a countable number of possible values is said to be discrete. For a discrete random variable X, we define a probability mass function p(a) of X by

p(a) = P{X = a}

This implies that if X must assume one of the values x1, x2, …, then

p(xi) > 0 for i = 1, 2, …

p(x) = 0 for all other values of x

Since X must take on one of the values xi, we have

p(x1) + p(x2) + … = 1

Similarly, the cumulative distribution function F can be expressed in terms of p(a) by

F(a) = Σ p(xi), the sum being taken over all xi ≤ a … (2.13)

While the above details explain the concept of discrete random variables, the examples taken are by no means complete. It must be remembered that there are many discrete random distributions that occur naturally. We will discuss some of them.

Bernoulli Random Variable:

Suppose that a trial, or an experiment, whose outcome can be classified as either a “success” or a “failure” is performed. If we let X = 1 when the outcome is a success and X = 0 when it is a failure, then the probability mass function of X is given by

p(0) = P{X = 0} = 1 − p

p(1) = P{X = 1} = p … (2.14)

where p, 0 ≤ p ≤ 1, is the probability that the trial is a “success”.

A random variable X is said to be a Bernoulli random variable if the probability mass function is given by Eqn. 2.14 for some p ϵ (0,1).

Binomial Random Variable:

Suppose that n independent trials, each of which results in a “success” with probability p and in a “failure” with probability 1 − p, are to be performed. If X represents the number of successes that occur in the n trials, then X is said to be a binomial random variable with parameters (n, p). The probability mass function of a binomial random variable having parameters (n, p) is given by

p(i) = C(n, i) p^i (1 − p)^(n−i), i = 0, 1, …, n … (2.15)

where C(n, i) = n!/[(n − i)! i!] equals the number of groups of i objects that can be selected from a set of n objects. The validity of Eqn. 2.15 may be verified by first noting that the probability of any particular sequence of the n outcomes containing i successes and n − i failures is, by the assumed independence of trials, p^i (1 − p)^(n−i). Equation 2.15 then follows since there are C(n, i) different sequences of the n outcomes leading to i successes and n − i failures. For example, if n = 3 and i = 2, then there are C(3, 2) = 3 ways in which the three trials can result in two successes, namely the outcomes (s, s, f), (s, f, s) and (f, s, s), where the outcome (s, s, f) means that the first two trials are successes and the third one a failure. Since each of the three outcomes has probability p²(1 − p) of occurring, the desired probability is thus

p(2) = 3p²(1 − p)

It may be noted that, from the binomial theorem, the probabilities sum to one, that is

p(0) + p(1) + … + p(n) = [p + (1 − p)]^n = 1

If X is a binomial random variable with parameters (n,p), then we say that X has a binomial distribu­tion with parameters (n,p).

Example:

It is known that all items produced by a certain machine will be defective with probability 0.1, independently of each other. What is the probability that in a sample of three items, at most one will be defective?

Solution:

If X is the number of defective items in the sample, then X is a binomial random variable with parameters (3, 0.1). Hence, the desired probability is given by

P{X ≤ 1} = P{X = 0} + P{X = 1} = C(3, 0)(0.1)^0(0.9)^3 + C(3, 1)(0.1)^1(0.9)^2 = 0.729 + 0.243 = 0.972
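The arithmetic can be checked with a short sketch using Python's `math.comb`:

```python
from math import comb

def binomial_pmf(i, n, p):
    """P{X = i} for a binomial random variable with parameters (n, p)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

# P{X <= 1} for a sample of n = 3 items with defect probability p = 0.1
p_at_most_one = binomial_pmf(0, 3, 0.1) + binomial_pmf(1, 3, 0.1)
print(round(p_at_most_one, 3))  # 0.972
```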

Geometric Random Variable:

Suppose that independent trials, each having probability p of succeeding, are performed until a success occurs. If we let X be the number of trials required until the first success occurs, then X is said to be a geometric random variable. Its probability mass function is given by

p(n) = P{X = n} … (2.17)

= (1 − p)^(n−1) p, n = 1, 2, … … (2.18)

Equation 2.17 above follows since, in order for X to equal n, it is necessary and sufficient that the first n − 1 trials are failures and the nth trial is a success. Equation 2.18 follows since the outcomes of the successive trials are assumed to be independent. To check that p(n) is a probability mass function, we note that

p(1) + p(2) + … = p[1 + (1 − p) + (1 − p)² + …] = p · 1/[1 − (1 − p)] = 1
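As a numeric sanity check (p = 0.3 is an arbitrary choice), the infinite sum can be truncated at a long horizon, where the remaining tail is negligible:

```python
def geometric_pmf(n, p):
    """P{X = n} = (1 - p)^(n - 1) * p: first success occurs on trial n."""
    return (1 - p) ** (n - 1) * p

# The probabilities sum (essentially) to 1 over a long enough horizon.
total = sum(geometric_pmf(n, 0.3) for n in range(1, 200))
print(total)  # ≈ 1.0
```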

Poisson Random Variable:

A random variable X taking on one of the values 0, 1, 2, 3, … is said to be a Poisson random variable with parameter λ if, for some λ > 0,

p(i) = P{X = i} = e^(−λ) λ^i / i!, i = 0, 1, 2, … … (2.19)

Equation 2.19 defines a probability mass function since

p(0) + p(1) + … = e^(−λ)[1 + λ + λ²/2! + …] = e^(−λ) e^λ = 1

The Poisson random variable has a wide range of applications in a diverse number of areas.

The important property of the Poisson random variable is that it may be used to approximate a binomial random variable when the parameter n is large and p is small. To see this, suppose that X is a binomial random variable with parameters (n, p), and let λ = np. Then

P{X = i} = [n!/((n − i)! i!)] p^i (1 − p)^(n−i) = [n!/((n − i)! i!)] (λ/n)^i (1 − λ/n)^(n−i)

Now if n is large and p is small,

P{X = i} ≈ e^(−λ) λ^i / i!
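A short comparison, with n = 100 and p = 0.02 chosen only for illustration, shows how close the binomial and Poisson mass functions are in this regime:

```python
from math import comb, exp, factorial

n, p = 100, 0.02     # large n, small p
lam = n * p          # λ = np = 2

def binom(i):
    """Exact binomial P{X = i} with parameters (n, p)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

def poisson(i):
    """Poisson approximation with parameter λ = np."""
    return exp(-lam) * lam**i / factorial(i)

for i in range(4):
    print(i, round(binom(i), 4), round(poisson(i), 4))
```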

Continuous Random Variables:

Continuous random variables are those whose set of possible values is uncountable. Let X be a continuous random variable; then there exists a non-negative function f(x), defined for all real x ϵ (−∞, ∞), having the property that, for any set B of real numbers,

P{X ϵ B} = ∫B f(x) dx … (2.24)

The function f(x) is called the probability density function of the random variable X.

In other words, Eqn. 2.24 states that the probability that X will be in B may be obtained by integrating the probability density function over the set B. Since X must assume some value, f(x) must satisfy

P{X ϵ (−∞, ∞)} = ∫ from −∞ to ∞ of f(x) dx = 1

All probability statements about X can be answered in terms of f(x). For instance, letting B = (a, b), we obtain from Eqn. 2.24

P{a ≤ X ≤ b} = ∫ from a to b of f(x) dx

If we let a = b in the above equation, we obtain

P{X = a} = ∫ from a to a of f(x) dx = 0

In words this implies that the probability that a continuous random variable will assume any par­ticular value is zero. Let us now look at some examples of continuous random variables that occur frequently in probability theory.

Uniform Random Variable:

A random variable is said to be uniformly distributed over the interval (0, 1) if its probability density function is given by

f(x) = 1 for 0 < x < 1, and f(x) = 0 otherwise

The above statement implies that the probability that X is in any particular subinterval of (0, 1) is equal to the length of that subinterval. In general, if X is a uniform random variable on the interval (a, b), its probability density function is given by

f(x) = 1/(b − a) for a < x < b, and f(x) = 0 otherwise

Example:

Calculate the cumulative distribution function of a random variable uniformly distributed over (α, β).
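A sketch of the answer: the uniform CDF is 0 below α, rises linearly as (a − α)/(β − α) on (α, β), and is 1 above β. The function below (names illustrative) implements this piecewise form:

```python
def uniform_cdf(a, alpha, beta):
    """F(a) for X uniform on (alpha, beta): 0 below, linear inside, 1 above."""
    if a <= alpha:
        return 0.0
    if a >= beta:
        return 1.0
    return (a - alpha) / (beta - alpha)

print(uniform_cdf(5, 0, 10))  # 0.5
```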

Exponential Random Variables:

A continuous random variable whose probability density function is given, for some λ > 0, by

f(x) = λe^(−λx) for x ≥ 0, and f(x) = 0 for x < 0

is said to be an exponential random variable with parameter λ. We will make use of this random variable when we study channel allocation.

For the present we shall only give the cumulative distribution function F:

F(a) = P{X ≤ a} = 1 − e^(−λa), a ≥ 0
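In code, the exponential CDF F(a) = 1 − e^(−λa) looks like:

```python
from math import exp

def exponential_cdf(a, lam):
    """F(a) = 1 - e^(-lambda * a) for a >= 0, and 0 otherwise."""
    return 1 - exp(-lam * a) if a >= 0 else 0.0

print(exponential_cdf(1.0, 1.0))  # ≈ 0.632
```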

Gamma Random Variables:

A continuous random variable whose probability density function is given by

f(x) = λe^(−λx)(λx)^(α−1)/Γ(α) for x ≥ 0, and f(x) = 0 for x < 0

for some λ > 0, α > 0 is called a gamma random variable with parameters α, λ. The quantity Γ(α) is called the gamma function and is defined by

Γ(α) = ∫ from 0 to ∞ of e^(−x) x^(α−1) dx

Normal Random Variable:

X is said to be a normally distributed random variable with parameters µ and σ² if the density of X is given by

f(x) = [1/(√(2π) σ)] e^(−(x − µ)²/(2σ²)), −∞ < x < ∞

This probability density function is a bell-shaped curve symmetrical around the mean µ.

An important fact about normal random variables is that if X is normally distributed with parameters µ and σ², then Y = αX + β is normally distributed with parameters αµ + β and α²σ².
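A direct transcription of the normal density (the values µ = 0, σ² = 1 below are used only to illustrate the symmetry about the mean):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma2):
    """Density of a normal random variable with mean mu and variance sigma2."""
    return exp(-(x - mu) ** 2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)

# The bell-shaped curve peaks at the mean and is symmetric about it.
print(normal_pdf(0, 0, 1))                         # ≈ 0.3989
print(normal_pdf(1, 0, 1) == normal_pdf(-1, 0, 1)) # True
```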

Normal Density Function

Expectation of a Random Variable:

The expected value of a random variable is defined in the same spirit whether the variable is discrete or continuous; we shall consider the discrete case first. If X is a discrete random variable having a probability mass function p(x), then the expected value of X is defined by

E[X] = Σ x p(x), the sum being taken over all possible values x

The expected value of X is thus the weighted average of the possible values of X, each value being weighted by the probability of its happening.
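For example, the expected value of a fair six-sided die can be computed with a small helper (the names are illustrative):

```python
def expectation(pmf):
    """E[X] = sum of x * p(x) over the possible values of X."""
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: each face has probability 1/6.
die = {x: 1 / 6 for x in range(1, 7)}
print(expectation(die))  # 3.5
```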

Similarly, in the continuous case, the expected value is

E[X] = ∫ from −∞ to ∞ of x f(x) dx

This completes the review of statistics that we are likely to encounter while discussing various topics in communication and networking.