# Distribution of Sample Means in Large Samples

In practical sampling work, the composition of the universe from which the sample is drawn is seldom known. Therefore, the sampling distribution of the means will also be unknown. This would seem to rule out the possibility of constructing interval estimates of the type discussed above.

However, mathematicians have derived a theorem called the Central Limit Theorem, which makes it possible to construct interval estimates whose properties are known sufficiently well for most practical purposes. In rather crude terms, the Central Limit Theorem states that the distribution of sample means for large samples will be approximately a normal distribution.

This implies that, in most practical situations, the investigator can assume the distribution of sample means (derived from probability samples) will be adequately approximated by a normal distribution. Before explaining how this is done, it will be necessary to explore certain features of this approximating normal distribution.

Characteristic Features of the Approximating Normal Distribution:

To construct a confidence interval estimate, researchers must know several features of the approximating distribution. These are illustrated in the figure.

Three distributions are shown here. Curve A represents the (hypothetical) universe of all household incomes in a large suburb of a major city. This distribution is highly skewed (asymmetrical). The overwhelming majority of incomes are less than the average household income. A few very affluent households have extremely large incomes, so the tail of this universe extends very far to the right. (Such positive skewness is characteristic of the distribution of item values in many universes encountered in marketing research.)

The investigator wishes to estimate the mean income of households (i.e. the mean of curve A) based on a random sample from that universe. The figure also shows the approximating normal distributions of sample means for a medium-sized sample of 100 (curve B) and a larger sample of 400 (curve C). The figure illustrates certain general features of the distribution of sample means based on large simple random samples. There are four important characteristics to be noted:

1) The mean of the distribution of the sample means is equal to the universe mean. The figure shows the universe mean at the vertical dashed line. The means of curve B and curve C also lie there, because the sample mean is an unbiased estimate of the universe mean.
2) The distributions of sample means (curve B, curve C) are symmetrically distributed about the universe mean.
3) In a distribution of sample means, there is a general tendency for sample means to occur in the vicinity of the universe mean (i.e. small deviations of sample means from the universe mean are more likely than large deviations).
4) As the sample size gets larger, the distribution of sample means becomes more tightly clustered around the universe mean. Comparison of the shapes of curve B and curve C illustrates this feature.

To summarize, the normal approximation to the distribution of sample means is a symmetrical distribution centered on the mean of the universe sampled. In general, small deviations of the sample mean from the universe mean are more probable than large deviations. The larger the sample size, the more tightly the distribution of sample means clusters around the universe mean. For most purposes, the Central Limit Theorem assures the validity of these conclusions when large simple random samples are used.
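The features summarized above can be checked with a small simulation. The sketch below is only illustrative: the lognormal "universe" of household incomes, its parameters, the universe size of 20,000, and the helper name `simulate_sample_means` are all assumptions made for the example, not part of the original discussion. It draws repeated simple random samples of 100 and 400 (matching curves B and C) and compares the resulting distributions of sample means.

```python
import random
import statistics


def simulate_sample_means(universe, sample_size, n_samples, rng):
    """Draw n_samples simple random samples (without replacement)
    and return the mean of each one."""
    return [
        statistics.fmean(rng.sample(universe, sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical, positively skewed universe of household incomes.
# The lognormal shape and its parameters are assumptions for illustration.
universe = [rng.lognormvariate(11.0, 0.8) for _ in range(20_000)]
universe_mean = statistics.fmean(universe)
universe_median = statistics.median(universe)

means_b = simulate_sample_means(universe, 100, 1_000, rng)  # like curve B
means_c = simulate_sample_means(universe, 400, 1_000, rng)  # like curve C

# Spread of each distribution of sample means.
spread_b = statistics.stdev(means_b)
spread_c = statistics.stdev(means_c)

print(f"universe mean:            {universe_mean:,.0f}")
print(f"universe median:          {universe_median:,.0f}")
print(f"mean of sample means (B): {statistics.fmean(means_b):,.0f}")
print(f"mean of sample means (C): {statistics.fmean(means_c):,.0f}")
print(f"spread of sample means (B, n=100): {spread_b:,.0f}")
print(f"spread of sample means (C, n=400): {spread_c:,.0f}")
```

In a run of this sketch, the universe median should fall below the universe mean (the skewness of curve A), both sets of sample means should center near the universe mean, and the spread for samples of 400 should be noticeably smaller than for samples of 100.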

Standard Deviation:

When constructing a confidence interval estimate of the universe mean, researchers also use another characteristic of the approximating normal distribution. That characteristic is its standard deviation. Recall that the standard deviation (σ) of any distribution or universe of items is a measure of the dispersion or variability of the items in that universe. It is defined as:

$$
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - M)^2}{N - 1}}
$$

where $\sigma$ = universe standard deviation,
$x_i$ = value of the $i$th item in the universe,
$M$ = universe mean,
$N$ = number of items in the universe.
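As a worked illustration of this definition, the following minimal sketch computes the standard deviation step by step, using the N – 1 divisor exactly as the formula above is written. The function name `standard_deviation` and the five income values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math


def standard_deviation(items):
    """Standard deviation with an N - 1 divisor, as in the formula above."""
    n = len(items)
    mean = sum(items) / n                       # M, the mean of the items
    squared_deviations = sum((x - mean) ** 2 for x in items)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical incomes in thousands; the mean is 50.
incomes = [30, 40, 50, 60, 70]

# Squared deviations: 400 + 100 + 0 + 100 + 400 = 1000; 1000 / 4 = 250.
sigma = standard_deviation(incomes)
print(round(sigma, 2))  # about 15.81, the square root of 250
```

Python's standard library function `statistics.stdev` uses the same N – 1 divisor, so it should give the same result on these values.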