applet-magic.com
Thayer Watkins
Silicon Valley
& Tornado Alley
USA

The Probability Distribution of the Sample Quartiles
as a Function of Sample Size

The purpose of the material below is to illustrate what happens to the distribution of sample quartiles as the size of the samples increases. This purpose is accomplished by drawing 2000 samples, computing their lower quartiles and constructing the histogram of those sample quartiles. (Each time the screen is refreshed a new batch of 2000 samples is created.)

The random variable is uniformly distributed from -0.5 to +0.5; i.e.,


p(x) = 1 for -0.5≤x≤+0.5
p(x) = 0 for all other values of x
 

For a sample of size n the quartile is found by ranking the sample values from smallest to largest. For n=4k+1 the quartile is the value in the (k+1)-th place in the ranking. Thus for n=5 the second value in the ranking is taken. For n=9 the third value in the ranking is taken.
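The ranking rule can be sketched in a few lines of Python. This is only an illustration; the function name `sample_lower_quartile` is hypothetical, and the sketch assumes the sample size has the form n = 4k+1 so that the (k+1)-th ranked value is well defined.

```python
def sample_lower_quartile(values):
    """Return the (k+1)-th smallest value of a sample of size n = 4k + 1."""
    n = len(values)
    if n % 4 != 1:
        raise ValueError("sample size must be of the form 4k + 1")
    k = (n - 1) // 4
    # sorted() ranks from smallest to largest; 0-based index k is place k+1.
    return sorted(values)[k]


# n = 5: the second value in the ranking is taken.
print(sample_lower_quartile([0.3, -0.2, 0.1, -0.4, 0.0]))
# n = 9: the third value in the ranking is taken.
print(sample_lower_quartile([9, 1, 8, 2, 7, 3, 6, 4, 5]))
```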

Statistical Simulations

Below are shown the histograms for samples of various sizes.
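A simulation of this kind can be approximated with the Python standard library. The sketch below is not the code behind the original page; the function names, the bin count, the choice of sample sizes and the text-based histogram are all assumptions made for illustration. It draws 2000 samples of size n from the uniform distribution on [-0.5, +0.5], takes the (k+1)-th ranked value of each, and prints a rough histogram.

```python
import random


def simulate_quartiles(n, n_samples=2000, rng=None):
    """Lower quartiles of n_samples uniform samples, each of size n = 4k + 1."""
    rng = rng or random.Random()
    k = (n - 1) // 4
    return [
        sorted(rng.uniform(-0.5, 0.5) for _ in range(n))[k]
        for _ in range(n_samples)
    ]


def text_histogram(data, bins=20, lo=-0.5, hi=0.5, width=50):
    """Render a crude histogram with one text line per bin."""
    counts = [0] * bins
    for x in data:
        i = min(int((x - lo) / (hi - lo) * bins), bins - 1)
        counts[i] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left = lo + i * (hi - lo) / bins  # left edge of the bin
        lines.append(f"{left:+.2f} | " + "#" * round(width * c / peak))
    return "\n".join(lines)


rng = random.Random(0)  # fixed seed so that a run can be repeated
for n in (5, 21, 101):  # illustrative sample sizes of the form 4k + 1
    print(f"n = {n}")
    print(text_histogram(simulate_quartiles(n, rng=rng)))
    print()
```

With larger n the bars should bunch more tightly around x = -0.25, the lower quartile of the uniform distribution on [-0.5, +0.5].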

Analysis

Let p(x) be the probability density function for a random variable X and let P(x) be the cumulative probability function; i.e.,


$P(x) = \int_{-\infty}^{x} p(z)\,dz.$
 

The quartile of a distribution, denoted as $x_{quart}$, is defined as the value of x such that there is a one-fourth probability of getting a smaller value and a three-fourths probability of getting a larger value than $x_{quart}$. In other words, $P(x_{quart}) = 1/4$.

The probability density q(x) that the sample quartile has a value of x for a sample of size n, with n of the form 4k+1, is the probability density p(x) times the probability that 3(n-1)/4 of the other sample values are above x and (n-1)/4 are below x; i.e.,


$q(x) = c_n\,P(x)^{(n-1)/4}\,(1-P(x))^{3(n-1)/4}\,p(x)$
which is the same as
$q(x) = c_n\,[P(x)(1-P(x))^3]^{(n-1)/4}\,p(x)$
 

where $c_n$ is a coefficient that represents the number of ways a sample of 3(n-1)/4 values above x and (n-1)/4 below x can be arranged.
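For the uniform distribution on [-0.5, +0.5] the density q(x) can be evaluated directly. The sketch below assumes the standard order-statistic form of the coefficient, c_n = n!/(k!(3k)!) with k = (n-1)/4, which is not written out in the text above; the function names are hypothetical. As a sanity check it integrates q(x) numerically with a midpoint rule, and each result should come out very close to 1.

```python
from math import factorial


def quartile_density(x, n):
    """q(x) for the uniform distribution on [-0.5, 0.5], with n = 4k + 1."""
    if not -0.5 <= x <= 0.5:
        return 0.0
    k = (n - 1) // 4
    # Assumed coefficient: one value at x, k below it and 3k above it.
    c_n = factorial(n) // (factorial(k) * factorial(3 * k))
    P = x + 0.5  # cumulative probability of the uniform distribution
    p = 1.0      # density of the uniform distribution on its support
    return c_n * P**k * (1 - P) ** (3 * k) * p


def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


for n in (5, 9, 21):
    total = integrate(lambda x: quartile_density(x, n), -0.5, 0.5)
    print(n, round(total, 6))
```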

The term $[P(x)(1-P(x))^3]^{(n-1)/4}$ reaches its maximum at the same value of x for which $P(x)(1-P(x))^3$ reaches its maximum. This occurs at the value of P for which the derivative of $P(1-P)^3$ with respect to P is zero; i.e.,

$(1-P)^3 - 3P(1-P)^2 = 0$
dividing by $(1-P)^2$ gives
$(1-P) - 3P = 0$
which reduces to $4P = 1$
and hence $P = 1/4$.

Thus the term $[P(x)(1-P(x))^3]^{(n-1)/4}$ reaches its maximum for x such that P(x)=1/4; i.e., for x equal to the quartile, $x_{quart}$.
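The location of the maximum can also be checked numerically. The following sketch, with a hypothetical helper `g` and an arbitrary grid of 1001 points, searches for the P that maximizes P(1-P)^3 and then shows how raising the term to the power k = (n-1)/4 suppresses values away from the peak; the comparison point P = 0.35 is chosen only for illustration.

```python
def g(P):
    """The term P(1-P)^3 whose maximum locates the quartile."""
    return P * (1 - P) ** 3


# Grid search over P = 0.000, 0.001, ..., 1.000.
grid = [i / 1000 for i in range(1001)]
best = max(grid, key=g)
print("maximizing P:", best)

# Height of the term at P = 0.35 relative to the peak, for growing exponents.
for k in (1, 5, 25, 125):
    ratio = (g(0.35) / g(0.25)) ** k
    print(f"k = {k:3d}  relative height at P = 0.35: {ratio:.3e}")
```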

As the sample size increases the probability density function for the sample quartile becomes more concentrated near the quartile for the probability density function p(x). Likewise the dispersion of the probability density function for the sample quartile becomes smaller as the sample size increases and the limit is zero dispersion as the sample size increases without bound.

For large enough n the values of p(x) away from $x_{quart}$ become irrelevant; with probability approaching one, the quartile of the sample lies arbitrarily close to $x_{quart}$.
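For the uniform case the shrinking dispersion can be made concrete. The sketch below relies on a standard result not derived in this text: the r-th ranked value of n uniform values on [0, 1] follows a Beta(r, n-r+1) distribution, so that with r = k+1 and n = 4k+1 the parameters are a = k+1 and b = 3k+1. The function name is hypothetical, and the results are shifted by -0.5 to match the interval [-0.5, +0.5].

```python
from math import sqrt


def quartile_mean_sd(n):
    """Mean and standard deviation of the sample quartile, uniform on [-0.5, 0.5]."""
    k = (n - 1) // 4
    a, b = k + 1, 3 * k + 1            # assumed Beta parameters of the (k+1)-th value
    mean = a / (a + b) - 0.5           # Beta mean, shifted to [-0.5, 0.5]
    var = a * b / ((a + b) ** 2 * (a + b + 1))  # Beta variance (unaffected by shift)
    return mean, sqrt(var)


for n in (5, 21, 101, 1001, 10001):
    mean, sd = quartile_mean_sd(n)
    print(f"n = {n:6d}  mean = {mean:+.4f}  sd = {sd:.4f}")
```

Under these assumptions the mean moves toward -0.25, the quartile of the uniform distribution, while the standard deviation falls roughly in proportion to one over the square root of n.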

The restriction of the analysis to samples of size n=4k+1 is not a significant limitation.

Conclusions

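The distribution of the sample quartile becomes concentrated at the quartile of the distribution p(x) as the sample size increases. For a finite sample the expected value of the sample quartile need not equal the population quartile exactly (in the uniform case with n=5, the second-ranked value has an expected cumulative probability of 1/3 rather than 1/4), but the difference shrinks as n grows. In other words, the sample quartile is an asymptotically unbiased estimate of the population quartile. Furthermore the limit of the dispersion of the distribution of sample quartiles as sample size increases without bound is zero.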

