as a Function of Sample Size
The purpose of the material below is to illustrate what happens to the distribution of sample medians as the size of the samples increases. This purpose is accomplished by drawing 2000 samples, computing their medians and constructing the histogram of those sample medians. (Each time the screen is refreshed a new batch of 2000 samples is created.)
The random variable is uniformly distributed from -0.5 to +0.5; i.e.,
$$p(x) = \begin{cases} 1 & \text{for } -0.5 \le x \le 0.5 \\ 0 & \text{otherwise} \end{cases}$$
For a sample of size n the median is found by ranking the sample values. For n odd the median is the value in the (n+1)/2 place in the ranking. For n even the median is taken to be the average of the values at the n/2 and (n/2)+1 places in the ranking. Thus for n=3 the second value in the ranking is taken. For n=4 the average of the second and third in the ranking is taken.
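The ranking rule can be sketched in a few lines of Python. The function name `sample_median` is chosen here only for illustration; it is not taken from the original page.

```python
def sample_median(values):
    """Median by ranking: the middle value for odd n,
    the mean of the two middle values for even n."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("sample must not be empty")
    mid = n // 2  # 0-based index of the (n/2 + 1)-th ranked value
    if n % 2 == 1:
        # n odd: the value in place (n+1)/2 of the ranking (1-based)
        return ranked[mid]
    # n even: average of the values in places n/2 and n/2 + 1 (1-based)
    return (ranked[mid - 1] + ranked[mid]) / 2


# n = 3: the second value in the ranking is taken
print(sample_median([3, -1, 2]))
# n = 4: the average of the second and third values in the ranking
print(sample_median([4, -2, 1, 3]))
```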
Below are shown the histograms for samples of various sizes.
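The experiment described above can be approximated with the following sketch, which uses only the Python standard library. The function names, the bin count and the text-based rendering are assumptions made for illustration; the original page's own plotting code is not known.

```python
import random
import statistics


def simulate_medians(n, batches=2000, seed=None):
    """Draw `batches` samples of size n from Uniform(-0.5, 0.5)
    and return the list of their medians."""
    rng = random.Random(seed)
    return [
        statistics.median([rng.uniform(-0.5, 0.5) for _ in range(n)])
        for _ in range(batches)
    ]


def text_histogram(values, bins=20, lo=-0.5, hi=0.5, width=50):
    """Count values into equal-width bins on [lo, hi] and
    return (counts, printable lines)."""
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[i] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left = lo + i * (hi - lo) / bins
        lines.append(f"{left:+.2f} {'#' * round(width * c / peak)}")
    return counts, lines


for n in (1, 3, 11, 51):
    _, lines = text_histogram(simulate_medians(n))
    print(f"n = {n}")
    print("\n".join(lines))
```

Each run produces a new batch of 2000 samples unless a `seed` is passed, so the histograms vary slightly from run to run while narrowing around zero as n grows.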
Let p(x) be the probability density function for a random variable X and let P(x) be the cumulative probability function; i.e.,
$$P(x) = \int_{-\infty}^{x} p(z)\,dz$$
The median of a distribution, denoted as xmed, is defined as the value of x such that there are equal probabilities of getting a larger value and getting a smaller value than xmed. In other words, P(xmed)=0.5.
The probability density q(x) that the sample median has a value of x for a sample size of n, n being odd, is the probability density p(x) times the probability that (n-1)/2 of the sample values are above x and (n-1)/2 are below x; i.e.,
$$q(x) = c_n\, p(x)\, [P(x)]^{(n-1)/2}\, [1-P(x)]^{(n-1)/2} = c_n\, p(x)\, [P(x)(1-P(x))]^{(n-1)/2}$$
where $c_n$ is a coefficient that represents the number of ways a sample with (n-1)/2 values above x and (n-1)/2 below x can be arranged.
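As a check on this density, the sketch below evaluates q(x) for the uniform distribution on [-0.5, 0.5], where p(x) = 1 and P(x) = x + 0.5. The explicit coefficient n!/[((n-1)/2)!]^2 is the standard order-statistic count and is an assumption here, since the page does not spell it out. The function names are illustrative.

```python
from math import factorial


def median_density_uniform(x, n):
    """Density q(x) of the sample median for odd n, with samples
    drawn from Uniform(-0.5, 0.5)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not -0.5 <= x <= 0.5:
        return 0.0
    m = (n - 1) // 2
    # Assumed coefficient: n! / (m!)^2, the standard order-statistic count
    c_n = factorial(n) // (factorial(m) ** 2)
    P = x + 0.5   # cumulative probability of the uniform distribution
    p = 1.0       # density of the uniform distribution on [-0.5, 0.5]
    return c_n * p * (P * (1.0 - P)) ** m


def integrate_density(n, steps=10000):
    """Midpoint-rule integral of q(x) over [-0.5, 0.5]; should be near 1."""
    h = 1.0 / steps
    return h * sum(
        median_density_uniform(-0.5 + (i + 0.5) * h, n) for i in range(steps)
    )


for n in (1, 3, 11, 51):
    print(n, median_density_uniform(0.0, n), integrate_density(n))
```

If the assumed coefficient is right, the integral stays close to 1 for every odd n while the peak value at x = 0 grows, which is the narrowing seen in the histograms.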
The term $[P(x)(1-P(x))]^{(n-1)/2}$ reaches its maximum for the value of x such that P(x)=0.5; i.e., for the median of the probability distribution p(x). Denote that median value as xmed. Because q(x) is the product of $[P(x)(1-P(x))]^{(n-1)/2}$ and p(x), q(x) might reach a maximum for some value of x other than xmed. But as n increases the term $[P(x)(1-P(x))]^{(n-1)/2}$ becomes more and more concentrated around xmed. All of the values of P(x)(1−P(x)) are less than or equal to 0.25, the value attained at xmed. For values of x with P(x) not equal to 0.5 the values of P(x)(1−P(x)) are smaller than 0.25, so raising them to higher powers shrinks them faster than it shrinks the value at xmed.
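As a rough numerical illustration (the point P(x) = 0.6 is chosen here only as an example), compare the factor at a point where P(x) = 0.6 with its value at the median, where P(x) = 0.5 and P(x)(1−P(x)) = 0.25:

```latex
\[
\left[\frac{0.6 \times 0.4}{0.25}\right]^{(n-1)/2} = 0.96^{(n-1)/2} \approx
\begin{cases}
0.82 & n = 11 \\
0.13 & n = 101 \\
1.4 \times 10^{-9} & n = 1001
\end{cases}
\]
```

Even a point only modestly away from the median contributes a negligible relative weight once n is large.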
For large enough n the value of p(x) away from xmed becomes irrelevant; the median of the sample has to be arbitrarily close to xmed. Likewise the dispersion of the distribution of the sample median has to become smaller and smaller as the sample size increases and approaches zero as a limiting value.
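The shrinking dispersion can be checked numerically. The sketch below compares the simulated standard deviation of the sample median with the value sqrt(1/(4(n+2))), which holds for odd n and the uniform distribution on [-0.5, 0.5]. That closed form is a standard result for the median of uniform samples and is not stated on the original page; the function name `median_sd` is illustrative.

```python
import random
import statistics


def median_sd(n, batches=2000, seed=0):
    """Standard deviation of the medians of `batches` samples of size n
    drawn from Uniform(-0.5, 0.5)."""
    rng = random.Random(seed)
    medians = [
        statistics.median([rng.uniform(-0.5, 0.5) for _ in range(n)])
        for _ in range(batches)
    ]
    return statistics.pstdev(medians)


for n in (1, 3, 11, 51, 201):
    # Assumed closed form for odd n and uniform samples: Var = 1 / (4 (n + 2))
    exact = (1.0 / (4 * (n + 2))) ** 0.5
    print(f"n = {n:3d}  simulated sd = {median_sd(n):.4f}  closed form = {exact:.4f}")
```

Both columns fall toward zero roughly in proportion to 1/sqrt(n).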
Restricting the analysis to samples of odd size is not a significant limitation.
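For a distribution that is symmetric about its median, such as the uniform distribution used here, the expected value of the median of the sample is equal to the median of the distribution p(x); more generally, the sample median approaches the median of the distribution as the sample size increases. Furthermore the limit of the dispersion of the distribution of sample medians as sample size increases without bound is zero.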
For the distribution of other sample statistics see Sample Statistics, Sample Quartile and Sample Percentile.
HOME PAGE OF Thayer Watkins