San José State University
Department of Economics
Thayer Watkins
Silicon Valley
& Tornado Alley

The Effect of Averaging Variables
Which Are the Cumulative Sum
of Random Disturbances

Consider variables which are of the form

T(t) = T(t-1) + U(t)
and thus
T(t) = U(0) + U(1) + U(2) + … + U(t-1) + U(t)

where the U(s)'s are independent variables, random or otherwise.

Now consider averaging over intervals. First take two-period intervals.

T(t) = U(0) + U(1) + U(2) + … + U(t-1) + U(t)
T(t+1) = U(0) + U(1) + U(2) + … + U(t-1) + U(t) + U(t+1)

Therefore the moving average T̄(t) is given by

T̄(t) = ½[T(t)+T(t+1)] = U(0) + U(1) + U(2) + … + U(t-1) + U(t) + ½U(t+1)

The weight of U(t) in the average is twice that of U(t+1).

The formulas for three-period and four-period averages are

T̄(t) = (1/3)[T(t)+T(t+1)+T(t+2)] = U(0) + U(1) + U(2) + … + U(t-1) + U(t) + (2/3)U(t+1) + (1/3)U(t+2)
T̄(t) = (1/4)[T(t)+T(t+1)+T(t+2)+T(t+3)] = U(0) + U(1) + U(2) + … + U(t-1) + U(t) + (3/4)U(t+1) + (1/2)U(t+2) + (1/4)U(t+3)

The weight of the first disturbance, U(t), is three and four times, respectively, that of the last disturbance in the average.

The general formula is clear:

T̄(t) = (1/n)[T(t)+T(t+1)+T(t+2)+ … +T(t+(n-1))] = U(0) + U(1) + U(2) + … + U(t-1) + U(t) + [(n-1)/n]U(t+1) + [(n-2)/n]U(t+2) + … + (1/n)U(t+(n-1))

For annual averages of monthly values the disturbances during Januaries have twelve times the weight of disturbances during Decembers, and for annual averages of daily values the disturbances on January firsts have 365 times the weight of disturbances occurring on December thirty-firsts. Likewise, for daily averages of hourly values the disturbances occurring between midnight and 1 A.M. have 24 times the weight of disturbances occurring between 11 P.M. and midnight. This suggests that for statistical analysis it is not a good idea to work with interval averages. Instead the values at a specified point in the interval, say the ends of the intervals or the midpoints of the intervals, should be used.
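The general formula can be checked by direct computation. What follows is a minimal sketch, assuming the forward-looking n-period average defined above. The function name `average_weights` and the use of exact fractions are choices made for this illustration only. Each T(t+i) is expressed as a vector of coefficients on U(0), …, U(t+n-1), and those vectors are averaged.

```python
from fractions import Fraction

def average_weights(t, n):
    """Weights on U(0), ..., U(t+n-1) in (1/n)[T(t) + ... + T(t+n-1)].

    T(m) = U(0) + ... + U(m), so T(m) has coefficient 1 on U(0..m).
    (Hypothetical helper, written only for this illustration.)
    """
    weights = [Fraction(0)] * (t + n)
    for i in range(n):                 # T(t), T(t+1), ..., T(t+n-1)
        for j in range(t + i + 1):     # T(t+i) contains U(0..t+i)
            weights[j] += Fraction(1, n)
    return weights

w = average_weights(t=5, n=12)
print([str(x) for x in w[5:]])   # weights on U(t), ..., U(t+n-1)
print(w[5] / w[-1])              # ratio of first to last weight in the interval
```

With n = 12 the weights within the interval fall from 1 to 1/12, so the printed ratio is 12, matching the January versus December comparison.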

The First Differences of Interval Averages

Consider the two-period averages T̄(t)=½[T(t)+T(t+1)]. Since

T̄(t)=½[T(t)+T(t+1)] = U(0) + U(1) + U(2) + … + U(t-1) + U(t) + ½U(t+1)
T̄(t+1)=½[T(t+1)+T(t+2)] = U(0) + U(1) + U(2) + … + U(t-1) + U(t) + U(t+1) + ½U(t+2)

it follows that

T̄(t+1)−T̄(t) = U(t+1)+½U(t+2)−½U(t+1)
or, equivalently
T̄(t+1)−T̄(t) = ½U(t+1)+½U(t+2)

Likewise

T̄(t+2)−T̄(t+1) = ½U(t+2)+½U(t+3)

Because (T̄(t+1)−T̄(t)) and (T̄(t+2)−T̄(t+1)) both depend upon ½U(t+2) there is a positive serial correlation for the first differences of the averages even if there is no serial correlation for the U(t)'s.

Also since (T̄(t+1)−T̄(t)) and T̄(t) both depend upon U(t+1) there will be a positive correlation between the change in T̄(t) and its value. There would be no such correlation between the unaveraged T(t) and (T(t+1)−T(t)). Thus averaging introduces spurious correlations into the statistical series.
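The size of this induced serial correlation can be worked out from the weights alone. The sketch below assumes the U(t)'s are uncorrelated with a common variance, so that the correlation between two linear combinations of the U's is the normalized dot product of their weight vectors. The helper name `corr_from_weights` is hypothetical, and `Tbar` in the comments stands for the averaged series.

```python
from math import sqrt

def corr_from_weights(a, b):
    """Correlation of sum(a[j]*U(j)) and sum(b[j]*U(j)) for uncorrelated,
    equal-variance U's: dot(a, b) / (|a| |b|). Hypothetical helper."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / sqrt(sum(x * x for x in a) * sum(y * y for y in b))

# Weights on U(t+1), U(t+2), U(t+3) for two successive first differences
# of the two-period averages.
d1 = [0.5, 0.5, 0.0]   # Tbar(t+1) - Tbar(t)   = 0.5 U(t+1) + 0.5 U(t+2)
d2 = [0.0, 0.5, 0.5]   # Tbar(t+2) - Tbar(t+1) = 0.5 U(t+2) + 0.5 U(t+3)
print(corr_from_weights(d1, d2))   # lag-one correlation of the differences

# For comparison, successive differences of the unaveraged series are
# U(t+1) and U(t+2), which share no disturbance.
print(corr_from_weights([1.0, 0.0], [0.0, 1.0]))
```

Under these assumptions the first value is 0.5 and the second is 0: two-period averaging alone produces a lag-one correlation of one half in the first differences.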

The serial correlation can extend beyond a one-period lag. Consider now averaging over a three-period interval, here centered on t. Then

T̄(t) = (1/3)[T(t-1)+T(t)+T(t+1)]

This means

T̄(t) = U(0) + U(1) + U(2) + … + U(t-2) + U(t-1) + (2/3)U(t) + (1/3)U(t+1)
where this can be more clearly represented as
T̄(t) = T(t-2) + (3/3)U(t-1) + (2/3)U(t) + (1/3)U(t+1)

Likewise

T̄(t+1) = T(t-2) + U(t-1) + (3/3)U(t) + (2/3)U(t+1) + (1/3)U(t+2)
and therefore
T̄(t+1)−T̄(t) = (0)U(t-1) + (1/3)U(t) + (1/3)U(t+1) + (1/3)U(t+2)
or, equivalently
T̄(t+1)−T̄(t) = (1/3)[U(t)+U(t+1)+U(t+2)]

Similarly

T̄(t+2)−T̄(t+1) = (1/3)[U(t+1)+U(t+2)+U(t+3)]
T̄(t+3)−T̄(t+2) = (1/3)[U(t+2)+U(t+3)+U(t+4)]

This means there will be a positive correlation between [T̄(t+1)−T̄(t)] and [T̄(t+2)−T̄(t+1)] and also between [T̄(t+1)−T̄(t)] and [T̄(t+3)−T̄(t+2)] because of their common dependencies.
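A small simulation illustrates the point. The sketch below rests on assumptions chosen for this illustration: independent standard normal disturbances, a series of 20000 periods, and an arbitrary seed. The function names `moving_average` and `autocorr` are hypothetical. Because each first difference of a three-period average is one third of the sum of three consecutive disturbances, differences one period apart share two of the three disturbances and differences two periods apart share one. The correlations should therefore be near 2/3 and 1/3, and near zero at lag three.

```python
import random

def moving_average(series, n):
    """n-period averages (1/n)[x(t) + ... + x(t+n-1)]."""
    return [sum(series[i:i + n]) / n for i in range(len(series) - n + 1)]

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    m = sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x)
    cov = sum((x[i] - m) * (x[i + lag] - m) for i in range(len(x) - lag))
    return cov / var

random.seed(0)                       # arbitrary seed, for repeatability
n, length = 3, 20000                 # illustrative choices
walk, level = [], 0.0
for _ in range(length):
    level += random.gauss(0.0, 1.0)  # U(t): independent standard normals
    walk.append(level)

avg = moving_average(walk, n)
diffs = [b - a for a, b in zip(avg, avg[1:])]
for lag in (1, 2, 3):
    print(lag, round(autocorr(diffs, lag), 3), "theory:", max(n - lag, 0) / n)
```

The same script with n = 1 (no averaging) should give sample correlations near zero at every lag, which isolates the averaging as the source of the serial correlation.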

The Trend Derived from the Difference of the Averages over an Interval

Suppose an estimate of the trend in a variable T(t) is defined by

k = [T̄(t+s)−T̄(t)]/s

where T̄(t) is the average of T over an interval of n periods and T(t) is given by

T(t) = T(t-1) + U(t)

The true trend is defined to be the common expected value of the U(t)'s; i.e., K = E{U(t)}. One question is whether k is an unbiased estimate of K; i.e., is the expected value of k equal to K? A second question is what the standard deviation of k is and how it depends upon s and n.

From the definitions T(t) = T(t-1) + U(t) and T̄(t) = (1/n)[T(t)+T(t+1)+T(t+2)+ … +T(t+(n-1))] it was shown previously that

T̄(t) = T(t-1) + Σ_{j=0}^{n-1} [(n-j)/n]U(t+j)
and consequently
T̄(t+s) = T(t+s-1) + Σ_{j=0}^{n-1} [(n-j)/n]U(t+s+j)

Therefore the difference of the averages is

T̄(t+s)−T̄(t) = Σ_{j=0}^{n-1} [(n-j)/n]U(t+s+j) + [T(t+s-1)−T(t-1)] − Σ_{j=0}^{n-1} [(n-j)/n]U(t+j)

The term [T(t+s-1)−T(t-1)] is just the sum of the values for U from t to t+s-1. The values for t to t+n-1 correspond to the values in the second summation on the right. Thus, for s ≥ n and with a little rearrangement,

T̄(t+s)−T̄(t) = Σ_{j=0}^{n-1} [j/n]U(t+j) + Σ_{j=n}^{s-1} U(t+j) + Σ_{j=0}^{n-1} [(n-j)/n]U(t+s+j)

Stated differently

T̄(t+s)−T̄(t) = Σ_{j=0}^{s+n-1} w_j U(t+j)

where w_j = j/n for 0 ≤ j ≤ n-1, w_j = 1 for n ≤ j ≤ s-1, and w_j = (s+n-j)/n for s ≤ j ≤ s+n-1.


The question is what the sum of the weights w_j is. Let H be the number of intervals, H=s/n. There are H-1 intervals for which the weights are unity. Therefore the sum of their weights is n(H-1). The weights in the first and last intervals can be combined into pairs whose weights sum to unity, since j/n + (n-j)/n = 1; therefore the sum of the weights in the first and last intervals is equal to n. Thus the sum of all the weights is equal to n(H-1)+n or nH, which is the same as s.

Therefore the expected value of the trend k is

E{k} = (1/s) Σ_{j=0}^{s+n-1} w_j E{U(t+j)} = (s/s)K = K

Thus k is an unbiased estimate of K.
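The claim that the weights sum to s can be confirmed without the pairing argument. The following sketch builds the weights of the difference of the averages directly from the definition of the n-period average rather than from the piecewise formula. The function name `trend_weights` and the particular (n, s) pairs are assumptions made for illustration, and `Tbar` in the comments stands for the averaged series.

```python
from fractions import Fraction

def trend_weights(n, s):
    """Weights w_j on U(t+j), j = 0..s+n-1, in Tbar(t+s) - Tbar(t).

    Built directly from the definition: Tbar(t) averages T(t..t+n-1) and
    T(m) - T(t-1) = U(t) + ... + U(m). Hypothetical helper.
    """
    def avg(start):                    # weights of Tbar(t+start) - T(t-1)
        w = [Fraction(0)] * (s + n)
        for i in range(n):
            for j in range(start + i + 1):
                w[j] += Fraction(1, n)
        return w
    return [a - b for a, b in zip(avg(s), avg(0))]

for n, s in [(2, 2), (3, 9), (12, 60)]:
    w = trend_weights(n, s)
    print(n, s, sum(w) == s)           # sum of weights equals s, so E{k} = K
```

Each line prints True. For n = 2 and s = 2 the weights are 0, ½, 1, ½, which sum to 2.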

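For serially uncorrelated U(t)'s the variance σ_k² of k is given by

σ_k² = (1/s²)(Σ_{j=0}^{s+n-1} w_j²)σ²

where σ² is the common variance of the U(t)'s.

The sum of the squared weights, divided by s², is given by

(1/s²)Σ_{j=0}^{s+n-1} w_j² = (n-1)(2n-1)/(3ns²) + (s-n+1)/s²
which can be rewritten as
(1/s²)Σ_{j=0}^{s+n-1} w_j² = (1/s) + [(n-1)(2n-1)/(3n) - (n-1)]/s²
(1/s²)Σ_{j=0}^{s+n-1} w_j² = (1/s){1 - (n²-1)/(3ns)}

Here the first term collects the fractional weights j/n, j = 1, …, n-1, which occur once at each end of the span, and the second term counts the s-n+1 unit weights. Thus for n>1 the variance of k is smaller than the value σ²/s that would hold for s equal weights of 1/s, by a factor that approaches unity as s increases and that falls further below unity as n increases.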
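The sum of squared weights can also be evaluated by direct summation, which is a useful guard on the closed-form algebra. The sketch below assumes s ≥ n, uncorrelated U(t)'s with common variance σ², and k = [T̄(t+s)−T̄(t)]/s, so that the variance of k is σ² times the sum of (w_j/s)². The function names are hypothetical, and `Tbar` in the comments stands for the averaged series.

```python
from fractions import Fraction

def trend_weights(n, s):
    """Piecewise weights w_j, j = 0..s+n-1, of Tbar(t+s) - Tbar(t), s >= n."""
    first = [Fraction(j, n) for j in range(n)]          # j/n
    middle = [Fraction(1)] * (s - n)                    # unit weights
    last = [Fraction(n - j, n) for j in range(n)]       # (n-j)/n
    return first + middle + last

def variance_factor(n, s):
    """Var(k) / sigma^2 for uncorrelated U's: (1/s^2) * sum of w_j^2."""
    return sum(w * w for w in trend_weights(n, s)) / (s * s)

for n, s in [(1, 10), (2, 2), (3, 9), (12, 60)]:
    print(n, s, variance_factor(n, s), "equal weights:", Fraction(1, s))
```

For n = 2 and s = 2, for instance, the weights are 0, ½, 1, ½ and the computed factor is 3/8, against 1/2 for equal weights. With n = 1 there is no averaging and the two values coincide.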
