Traffic Engineering: Probability, Sampling, Normal Distribution and Allied Notes

Traffic engineering deals extensively with data obtained from surveys and studies related to traffic, which are useful in the design and operation of highways. Statistical methods have proved to be a powerful tool in the analysis and interpretation of these data. Elementary concepts of statistics are considered here, with particular reference to their application to traffic studies, wherever possible.

1. Probability:

Knowledge of the concept of probability is basic to the understanding of statistical methods. Probability theory applies to events that happen by chance or randomly. Possible outcomes of the toss of a coin, a card game, or dice are examples. Some ideas of probability are encountered in the laws of combinations and permutations. Suppose that we want to know the probability of heads occurring when we toss a coin.

There are only two possibilities of occurrence, heads and tails, and assuming that circumstances do not favour either, the probability is one in two, or 1/2, or 0.5. If we have to choose one of four alternatives at random, and if we are not biased towards any one of them, the chance or probability of any one of them happening is one in four (1/4 or 0.25). Thus, the probability of any event is the ratio of the number of favourable outcomes to the total number of all possible outcomes.

A probability of zero indicates an impossibility while a value of unity indicates a certainty. A probability of 1/2 indicates equal chance for the occurrence or otherwise of an event.

Laws of Probability:

(i) The minimum and maximum values of the probability are zero and 1 –

0 ≤ p(x) ≤ 1

Here, p(x) means the probability of the occurrence of an event x, or a value of the random variable x.

(ii) If p(x) is the probability of occurrence of an event x, the probability of its non-occurrence is 1 – p(x); this is denoted by the symbol p(–x).

(iii) If an event X is a composite event, a collection of simple events x1, x2, …, xn, then the probability of the event X is the sum of the probabilities of the simple events which constitute X –

p(X) = p(x1) + p(x2) + … + p(xn)

(iv) The total probability of one of two events x1 and x2, which are mutually exclusive (that is, if x1 occurs, x2 does not, and vice versa), is the sum of the probabilities of x1 and x2

p(x1 or x2) = p(x1) + p(x2)

This can be extended to more than two mutually exclusive events in a similar way.

(v) If two events x1 and x2 are independent, that is the occurrence of one has no influence on the occurrence of the other, the probability that both will occur together is –

p(x1 and x2) = p(x1x2) = p(x1) × p(x2)

This is called joint probability.
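
As a check on these laws, the sketch below (an illustration, not part of the original text) enumerates the 36 equally likely outcomes of rolling two dice and verifies laws (i), (ii), (iv) and (v) directly from the definition of probability as the ratio of favourable outcomes to all possible outcomes; the events chosen (sums of 7 and 11, each die showing 6) are assumptions for the example.

```python
# A minimal sketch: probability as favourable/total outcomes, checked
# against the laws above by enumerating two dice (36 equally likely outcomes).
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))  # all (die1, die2) pairs

def p(event):
    """Probability of an event: favourable outcomes / total outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def sum_is_7(o): return o[0] + o[1] == 7
def sum_is_11(o): return o[0] + o[1] == 11
def first_6(o): return o[0] == 6
def second_6(o): return o[1] == 6

assert 0 <= p(sum_is_7) <= 1                            # law (i)
assert p(lambda o: not sum_is_7(o)) == 1 - p(sum_is_7)  # law (ii)
assert (p(lambda o: sum_is_7(o) or sum_is_11(o))        # law (iv):
        == p(sum_is_7) + p(sum_is_11))                  # mutually exclusive events add
assert (p(lambda o: first_6(o) and second_6(o))         # law (v):
        == p(first_6) * p(second_6))                    # independent events multiply

print(p(sum_is_7))  # 1/6
```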

Probability Distribution:

Let us consider a simple count of vehicles arriving at a particular point on a highway in a chosen fixed interval of time, say 30 seconds. Let the probabilities of 1, 2, 3, 4, 5, 6, 7 and 8 or more vehicles arriving at the point be –

If these values are plotted with the number of vehicles (n) on the x-axis and the corresponding probability p(n) on the y-axis, we get a figure as shown below (Fig. 4.1).

In this case, the random variable—the number of vehicles arriving at a point in a chosen interval of time—is a discrete quantity; therefore, the probabilities are represented by vertical lines (just as rectangles in the case of a histogram).

If the random variable is a continuous function (for example, headway between vehicles moving at different and variable speeds), the relation between the variables and their probabilities can be plotted on a smooth curve. If the continuous variable is designated x and the probability is p(x), their relationship can be represented by a curve, which is called the probability function (Fig. 4.2).
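
For the discrete case, a short sketch with assumed probabilities p(n) (illustrative values, not the figures behind Fig. 4.1) shows how such a distribution is handled numerically: the probabilities must sum to unity, and the expected number of arrivals is the sum of n·p(n).

```python
# A minimal sketch: an assumed discrete distribution of the number of
# vehicles n arriving in a 30-second interval (hypothetical values).
p_n = {0: 0.10, 1: 0.20, 2: 0.25, 3: 0.20, 4: 0.12, 5: 0.08, 6: 0.03, 7: 0.02}

# The probabilities over all possible outcomes must sum to 1.
assert abs(sum(p_n.values()) - 1.0) < 1e-9

# Expected (mean) number of arrivals per interval.
mean_arrivals = sum(n * p for n, p in p_n.items())
print(f"Expected arrivals per interval: {mean_arrivals:.2f}")
```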

2. Sampling:

If the distribution of the variable is known, all the parameters needed for its analysis can be determined. But it is not always known; in that case, the parameters must be estimated from a set of observations drawn from the population.

The number of observations is necessarily finite; this is called the sample size. The larger the sample size, the better the estimates of the statistical parameters determined from the observations.

Sample Statistics:

Two groups of ‘sample statistics’ may be calculated: one for location and one for dispersion.

Sample Statistics for Location:

They tend to locate a representative value of the sample.

These are given below:

(i) Mean:

This is the simplest and the most commonly used. It is defined as –

x̅ = (x1 + x2 + … + xn)/n = (1/n) Σ xi

Here xi are the observations, n is the sample size, and x̅ is the mean. The mean of a set of independent observations is a good estimate of the distribution mean, μ, of the population of the random variable (i.e., the mean when the number of observations is infinitely large).

(ii) Median:

This is another measure of location. The median, xm, is obtained by arranging the values in the sample in the order of magnitude. If the sample size n is odd, the middle value is the median; if it is even, the mean of the middle two values is the median. In other words, there are as many observations larger than the median as there are smaller than it.

(iii) Mode:

This is the value that occurs most often in the random sample.

(iv) Midrange:

The difference between the largest and the smallest values of the observation is called the ‘range’. The value midway along the range is the midrange, which can sometimes be used as a measure of the location. It is simply the arithmetic mean of the largest and smallest values of the observations.
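
A minimal sketch of these four measures of location, for an assumed sample of spot speeds (the data and the variable are illustrative, not from the text):

```python
# Measures of location for an assumed sample of spot speeds (km/h).
import statistics

speeds = [48, 52, 55, 55, 57, 60, 62, 66]    # hypothetical sample, n = 8 (even)

mean = statistics.mean(speeds)               # arithmetic mean
median = statistics.median(speeds)           # mean of the middle two values, since n is even
mode = statistics.mode(speeds)               # most frequent value (55 occurs twice)
midrange = (min(speeds) + max(speeds)) / 2   # mean of smallest and largest values

print(mean, median, mode, midrange)          # 56.875 56.0 55 57.0
```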

Sample Statistics for Dispersion:

Dispersion, scatter or spread is an important characteristic of a sample.

The following sample statistics are used as measures of dispersion:

(i) Range:

We have already defined range as the simplest measure of dispersion. But the following are considered better measures.

(ii) Mean Deviation:

This is conventionally called the average error. It is defined as the arithmetic mean of the absolute values of the deviations from any measure of location (usually, the mean).

Thus, the average deviation from the mean for a sample of n observations is –

mean deviation = (1/n) Σ |vi|

(vi represents the deviation of observation xi from the mean x̅, i.e., vi = xi – x̅.)

(iii) Standard Deviation:

This is a geometrically significant parameter of a distribution (σx). It is defined as –

σx = √[Σ vi² / (n – 1)] = √[Σ (xi – x̅)² / (n – 1)]

This is the standard deviation of any one observation. The standard deviation of the mean, x̅, on the other hand, is given by –

σx̅ = σx/√n

(This is also called the standard error.)

This is always less than that for a single observation. It is the most useful measure of dispersion.

(iv) Variance:

This is the square of the standard deviation. Like the standard deviation, it can be calculated for a single observation or for the sample mean.
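
The corresponding measures of dispersion, computed for the same assumed sample of spot speeds (a sketch, using the n – 1 divisor for the sample standard deviation):

```python
# Measures of dispersion for an assumed sample of spot speeds (km/h).
import math
import statistics

speeds = [48, 52, 55, 55, 57, 60, 62, 66]
n = len(speeds)
x_bar = statistics.mean(speeds)

data_range = max(speeds) - min(speeds)              # range
mean_dev = sum(abs(x - x_bar) for x in speeds) / n  # mean (average) deviation
std_dev = statistics.stdev(speeds)                  # sample standard deviation, n - 1 divisor
variance = statistics.variance(speeds)              # square of the standard deviation
std_error = std_dev / math.sqrt(n)                  # standard deviation of the mean

print(data_range, mean_dev, round(std_dev, 3),
      round(variance, 3), round(std_error, 3))
```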

3. Normal Distribution:

This is a standard probability curve in statistical analysis of the distribution of a continuous variable. Normal or nearly normal distributions are of common occurrence in physical phenomena. It is useful in sampling, since it is observed that, irrespective of the distribution of the population of a continuous variable, the means of random samples from the population tend to assume a normal distribution (the central limit theorem). Another advantage is that it provides a good approximation for other distributions.

A typical graphical representation of normal distribution is shown in Fig. 4.3.

The equation for a normal distribution curve is that of the Gaussian function, given below –

p(x) = [1/(σx √(2π))] exp[– (x – μx)²/(2σx²)], for – ∞ < x < ∞

This is characterised by two parameters — the distribution mean μx and the standard deviation σx. The value of p(x) is maximum at x = μx, about which the curve is symmetrical. The curve is bell-shaped and μx fixes the location along the x-axis.

The points of inflection of the curve lie at a distance σx on either side of μx, which is a parameter of location. The standard deviation, σx, is a measure of the spread or dispersion of the distribution. It reflects the degree of variation in the measurements. The larger the value of σx, the larger the range of variation in the measurements. The square of the standard deviation is called the variance.

A special case arises when μ = 0 and σ = 1, which yields the standard normal distribution, shown in Fig. 4.4.

If (x – μ)/σ is designated as the standard normal variate, z, the equation of the standard normal distribution curve is given by –

p(z) = [1/√(2π)] exp(– z²/2)

For this standard normal curve, the area under any part of the curve may be obtained from tables specially prepared for this purpose. Cumulative values of these areas, P(z), are also available. The data from these tables find application in spot speed studies, for instance in calculating the probability that a speed lies between certain limits.
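
As a sketch of this application (the mean and standard deviation of the speeds are assumed values, and math.erf stands in for the printed tables):

```python
# Probability that a spot speed lies between two limits, using the
# standard normal cumulative distribution (math.erf instead of tables).
import math

def normal_cdf(z):
    """Cumulative probability P(Z <= z) for the standard normal variate z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 56.9, 5.8      # assumed mean and standard deviation of spot speeds (km/h)
lower, upper = 50.0, 65.0  # speed limits of interest

z1 = (lower - mu) / sigma  # convert each limit to the standard normal variate
z2 = (upper - mu) / sigma
probability = normal_cdf(z2) - normal_cdf(z1)
print(f"P({lower} < speed < {upper}) = {probability:.4f}")
```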

4. Relative Precision and Confidence Limits:

Relative precision of a particular observation is the ratio of its precision to the value of the observation itself. Thus, if d is the value of an observation and σd its standard deviation, (σd/d) is the relative precision of that observation. Relative precision is dimensionless; it is expressed as a fraction or as a percentage. This may be specified first and used to determine tolerances before observations are made. This helps the engineer to decide upon both the equipment and observation techniques.

Precision can also be understood by the shape or spread of the probability distribution of the observations. A small spread indicates high precision, while a large spread implies low precision, as shown in Fig. 4.5. In other words, the higher the precision, the smaller the standard deviation and vice-versa.

The concept of uncertainty or confidence limits is another useful index of the reliability of an observation. For a normal distribution, it is expressed with reference to the standard deviation of the observation.

From Fig. 4.6, it is observed that the area under the curve between the ordinates at (μ – σ) and (μ + σ) is 0.6826. Therefore, the probability that an observation falls between these values is 0.6826. This may be expressed thus –

P(μ – σ < x < μ + σ) = 0.6826

It is common to express uncertainty by fixing the probability to a round value such as 0.5 or 0.9, and determining the corresponding multiplier of σ.

Fig. 4.7 shows four such confidence limits – 50%, 90%, 95%, and 99%.

These are expressed as follows:

50%: μ ± 0.674 σ

90%: μ ± 1.645 σ

95%: μ ± 1.960 σ

99%: μ ± 2.576 σ

The range for the 50% uncertainty used to be called the ‘probable error’, but now only the terms uncertainty and confidence limits are used.

The 50% uncertainty is the range -0.674 σ to 0.674 σ and the 90% uncertainty is from -1.645 σ to 1.645 σ.

This means that the probability is 50% that an observation is within ± 0.674 σ of the population mean or true value μ, and it is 90% that the measurement is within ± 1.645 σ of the true value.
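
The multipliers quoted above can be recovered numerically; the sketch below (an illustration, not part of the original text) finds the multiplier of σ for any confidence level by bisection on the normal curve, using the identity P(|z| ≤ k) = erf(k/√2).

```python
# Recover the multiplier of sigma for a given confidence level by bisection.
import math

def prob_within(k):
    """P(-k*sigma < deviation < +k*sigma) under a normal distribution."""
    return math.erf(k / math.sqrt(2.0))

def multiplier(confidence, lo=0.0, hi=10.0):
    """Bisect for k such that prob_within(k) equals the given confidence."""
    for _ in range(60):  # 60 halvings are ample for full double precision
        mid = (lo + hi) / 2.0
        if prob_within(mid) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

for c in (0.50, 0.90, 0.95, 0.99):
    print(f"{c:.0%} confidence: within +/- {multiplier(c):.3f} sigma")
# Prints 0.674, 1.645, 1.960 and 2.576 respectively.
```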

5. Weight of an Observation:

In practice, some observations are more precise than others because of better equipment, improved techniques, or favourable field conditions. It is therefore desirable to assign numbers to observations indicating their relative degrees of reliability or trustworthiness. Such a number is the weight of the observation.

We have seen that there is an inverse relationship between precision and standard deviation; on the other hand, there is a direct relationship between precision and the weight of an observation: the greater the weight of an observation, the higher its precision, and vice-versa. This should be obvious from the definition of the weight. Consequently, the weight must be inversely related to the variance.

If there are several independent observations with their variances known, their relative weights may be calculated directly as the reciprocals of their variances. It is common to assign a unit weight to one of the observations, and adjust the weights of all the others in such a way that they are round numbers greater than unity.
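
A minimal sketch of this rule, with assumed variances for four independent observations:

```python
# Relative weights as reciprocals of (assumed) variances, scaled so that
# the least precise observation carries unit weight.
variances = [4.0, 2.0, 1.0, 0.5]            # assumed variances of four observations

raw_weights = [1.0 / v for v in variances]  # weight inversely proportional to variance
scale = min(raw_weights)                    # unit weight for the least precise observation
weights = [w / scale for w in raw_weights]

print(weights)  # [1.0, 2.0, 4.0, 8.0] -- round numbers greater than unity
```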

In some cases, variances are not known, and weights must be assigned to observations based on their relative precision. The engineer uses a scientific approach and logic wherever possible, or else his discretion and experience. Weights are useful in adjusting measurements or observations for possible errors.

6. The Most Probable Value:

From the preceding discussion, some general laws of probability may be stated:

(i) Small errors are more probable, i.e., such errors occur more often than large ones.

(ii) Large errors are less probable as they occur infrequently; for normally distributed errors, unusually large values could be mistakes rather than random errors.

(iii) Positive and negative errors of the same magnitude occur with equal frequency, i.e., they are equally probable.

In certain cases, the true value of a quantity is never known. However, the most probable value of the quantity may be calculated from a set of observations for this quantity. This set may be unweighted and hence considered to be of equal or unit weight; or it may be weighted, with the weights assigned on an acceptable or reasonable basis by the engineer. The procedure for determining the most probable value is a little different in the two cases.

For observations of equal weight, the most probable value is simply the arithmetic mean of the set of observations. For observations that have been assigned unequal weights, the most probable value is the weighted arithmetic mean of the observations.
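
A sketch of both cases, using hypothetical repeated measurements and hypothetical weights:

```python
# Most probable value for equal and for unequal weights (assumed data).
observations = [25.42, 25.45, 25.40, 25.44]  # hypothetical repeated measurements
weights = [1, 2, 3, 2]                       # hypothetical relative weights

# Equal weights: the most probable value is the arithmetic mean.
mpv_equal = sum(observations) / len(observations)

# Unequal weights: the most probable value is the weighted arithmetic mean.
mpv_weighted = sum(w * x for w, x in zip(weights, observations)) / sum(weights)

print(f"{mpv_equal:.4f}  {mpv_weighted:.4f}")
```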

These results can be intuitively inferred from the general laws of probability, or they can be derived from the principle of least squares.

7. The Principle of Least Squares:

The most probable value of a quantity can be determined using the ‘principle of least squares’. ‘Residuals’ can be computed if the most probable value is determined. A residual is simply the difference between the measured or observed value of a quantity and its most probable value.

In fact, using residuals, the principle of least squares may be stated thus:

“The most probable value of a quantity is such that the sum of the squares of the residuals is a minimum.”

For weighted observations, it is stated:

“The most probable value of a quantity is such that the sum of the weighted squares of the residuals is a minimum.”

Residuals are theoretically similar to errors, except that the former can be calculated; errors cannot be, because the true value of a quantity is never known. Hence, in the analysis and adjustment of measurements or observations, one deals with residuals rather than errors.
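
The principle can be checked numerically; the sketch below (illustrative data) scans candidate values around the arithmetic mean and confirms that none gives a smaller sum of squared residuals.

```python
# Numerical check: the arithmetic mean minimises the sum of squared residuals.
observations = [25.42, 25.45, 25.40, 25.44]  # hypothetical equal-weight observations
mean = sum(observations) / len(observations)

def sum_sq_residuals(candidate):
    """Sum of squares of residuals v_i = x_i - candidate."""
    return sum((x - candidate) ** 2 for x in observations)

# Scan candidates in steps of 0.001 on either side of the mean.
candidates = [mean + step / 1000.0 for step in range(-50, 51)]
best = min(candidates, key=sum_sq_residuals)
print(f"mean = {mean:.4f}, minimising candidate = {best:.4f}")  # they coincide
```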

Detailed consideration of the principle of least squares and its application to the determination of the most probable value for interrelated or conditioned quantities, which may be directly or indirectly observed, is beyond the scope of the present treatment.

8. Other Distributions:

A few distributions other than normal distribution also find application in traffic engineering problems. Two such are the ‘binomial distribution’ and the ‘Poisson distribution’.

(i) Binomial Distribution:

This is based on a few simple assumptions and it is valid only for discrete values of the number of times an event can occur in a given number of trials.

This can be represented as given below:

p(r) = nCr · p^r · q^(n – r)

Here, n is the number of trials, r the number of occurrences of the event, p the probability of its occurrence in a single trial, and q = 1 – p the probability of its non-occurrence.
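
A sketch of this formula (the turning-movement example and its probability are assumed for illustration):

```python
# Binomial probability of exactly r occurrences in n independent trials.
from math import comb

def binomial_pmf(r, n, p):
    """P(exactly r occurrences), with q = 1 - p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Example: probability that exactly 2 of 10 vehicles turn right at a
# junction, if each turns right with (assumed) probability 0.3.
print(f"{binomial_pmf(2, 10, 0.3):.4f}")  # 0.2335
```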

(ii) Poisson Distribution:

This is the limiting case of the more generalised binomial distribution for the probability of the occurrence of a particular event in a specified number of trials. This was enunciated by Poisson, a French mathematician. As n becomes large, while the value of np remains a finite constant, the binomial distribution approaches the Poisson distribution as a limit.

The Poisson distribution is stated as –

p(r) = (m^r · e^(–m))/r!

where m = np is the mean number of occurrences and r! denotes the factorial of r.

Poisson distribution is found to be very useful in dealing with the random properties of traffic; for example, the arrival pattern of vehicles on a highway.
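
A sketch of such an arrival calculation (the mean arrival rate m is an assumed value):

```python
# Poisson probabilities for the number of vehicle arrivals in an interval.
from math import exp, factorial

def poisson_pmf(r, m):
    """P(exactly r arrivals) when the mean number of arrivals is m = np."""
    return exp(-m) * m**r / factorial(r)

m = 2.4  # assumed mean arrivals per 30-second interval
for r in range(5):
    print(f"P({r} arrivals) = {poisson_pmf(r, m):.4f}")
```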

For evolving relationships between random variables based on a set of observations, linear regression analysis (determining the best-fit straight line between an independent variable and a dependent variable), multiple linear regression analysis (involving two or more independent variables), and nonlinear regression analysis (determining the best-fit curve as the presumed relationship between the independent and dependent variables) may be used along with the principle of least squares to solve certain complex problems in traffic engineering; however, a detailed treatment of these methods is beyond the present scope. A sketch of the simplest of these, the least-squares fit of a straight line, is given below.
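
The following sketch fits y = a + bx to hypothetical paired observations (for example, traffic volume against journey time; all data are assumed):

```python
# Least-squares best-fit straight line y = a + b*x for assumed paired data.
def linear_fit(xs, ys):
    """Least-squares estimates of intercept a and slope b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx               # slope that minimises the sum of squared residuals
    a = mean_y - b * mean_x       # line passes through the point (mean_x, mean_y)
    return a, b

xs = [200, 400, 600, 800, 1000]  # assumed traffic volumes (veh/h)
ys = [5.1, 6.0, 7.2, 8.1, 8.9]   # assumed journey times (min)
a, b = linear_fit(xs, ys)
print(f"y = {a:.3f} + {b:.5f} x")
```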