San José State University
Department of Economics

applet-magic.com
Thayer Watkins
Silicon Valley
USA

 Estimating the Parameters of a Bent Line in Regression Analysis

## The Unconstrained Case

Suppose y is a function of x but the slope of the relationship changes at x=k1. A regression line for such a function is obtained by defining a new variable x1 such that

#### x1 = 0 if x < k1 and x1 = x − k1 for k1 ≤ x

The variable x1 can be defined more succinctly as x1=u(x-k1), where u(z) is the function such that u(z)=0 if z<0 and u(z)=z for z≥0.

The regression of y on x and x1 gives an equation such as

#### y = d0 + c0x + c1x1

The coefficient c1 gives the change in the slope of the relationship at k1.

Thus the slopes of the relationship are:

#### dy/dx = c0 for x < k1 and dy/dx = c0+c1 for x ≥ k1
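As an illustration, the following is a minimal sketch of the procedure in plain Python. The data are hypothetical and noise-free (slope 2 below k1 = 5 and slope 0.5 above it), and the helper `least_squares` is an illustrative stand-in for whatever regression routine is actually used (for instance `numpy.linalg.lstsq`); neither comes from the original text.

```python
def u(z):
    """Ramp function from the text: 0 for z < 0, z for z >= 0."""
    return z if z >= 0 else 0.0


def least_squares(columns, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting.
    Illustrative helper only; assumes the columns are linearly independent."""
    k = len(columns)
    # Augmented matrix [X'X | X'y]
    a = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns]
         + [sum(p * q for p, q in zip(ci, y))]
         for ci in columns]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [v - f * w for v, w in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical data: y = 1 + 2x for x < 5, with the slope dropping by 1.5 at x = 5.
k1 = 5.0
x = [float(i) for i in range(11)]
y = [1.0 + 2.0 * xi - 1.5 * u(xi - k1) for xi in x]

ones = [1.0] * len(x)            # column for the intercept d0
x1 = [u(xi - k1) for xi in x]    # the generated bend variable

# Regress y on a constant, x and x1.
d0, c0, c1 = least_squares([ones, x, x1], y)

slope_below = c0        # dy/dx for x < k1
slope_above = c0 + c1   # dy/dx for x >= k1
print(d0, c0, c1, slope_below, slope_above)
```

Because the data lie exactly on a bent line, the fitted c1 recovers the change in slope at k1; with noisy data the same regression gives the least-squares estimate of that change.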

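For more than one bend point the procedure is analogous: one variable of this kind is defined for each bend point, and the slope of the relationship at any level of x is c0 plus the sum of the coefficients of the bend variables whose bend points lie at or below that level of x.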
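Written out for bend points k1 < k2 < … < km, the general case can be stated as follows. The index m and the summation notation are additions for illustration; the symbols otherwise follow the text.

```latex
y = d_0 + c_0 x + \sum_{j=1}^{m} c_j \, u(x - k_j),
\qquad
\frac{dy}{dx} = c_0 + \sum_{j \,:\, k_j \le x} c_j .
```

With two bend points, for example, the slope is c0 for x < k1, c0+c1 for k1 ≤ x < k2, and c0+c1+c2 for x ≥ k2.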

## The Constrained Case

This covers the case in which the slopes of the relationship over different intervals are required to be the same. For example, suppose y is a function of time such that there are bend points at k1 and k2. Furthermore, suppose the slope of the relationship in the third interval (k2<x) has to be the same as the slope in the first interval (0<x<k1).

Now consider the case in which a regression of y on the variables x, z and w has to be of the form

#### y = a + bx + bz + cw

This form is the same as

#### y = a + b(x+z) + cw

That is to say, y must be regressed on (x+z) and w. Adding two variables together forces their regression coefficients to be the same.

Likewise if the regression has to be of the form

#### y = a + bx − bz + cw then y = a + b(x−z) + cw

Now consider again the previous example in which the slope in the third interval of a trend line is required to be the same as the slope in the first interval. First, two additional variables x1 and x2 need to be defined as

#### x1 = u(x-k1) and x2 = u(x-k2)

An unconstrained regression would yield an equation of the form

#### y = d0 + c0x + c1x1 + c2x2

The slope of the relationship in the first interval is c0 and in the third interval it is c0+c1+c2. For the slopes in the first and third interval to be equal requires that

#### c0 = c0+c1+c2 which reduces to c2 = −c1

Therefore the regression equation is of the form

#### y = d0 + c0x + c1x1 − c1x2 which reduces to y = d0 + c0x + c1(x1−x2)

Thus y must be regressed on x and (x1−x2).
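The constrained regression can be sketched in plain Python as below. The data are hypothetical and noise-free (bend points at k1 = 4 and k2 = 8, slopes 1, 3 and 1), and `least_squares` is an illustrative helper rather than part of the original text; a library routine such as `numpy.linalg.lstsq` would normally take its place.

```python
def u(z):
    """Ramp function from the text: 0 for z < 0, z for z >= 0."""
    return z if z >= 0 else 0.0


def least_squares(columns, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting.
    Illustrative helper only; assumes the columns are linearly independent."""
    k = len(columns)
    # Augmented matrix [X'X | X'y]
    a = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns]
         + [sum(p * q for p, q in zip(ci, y))]
         for ci in columns]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [v - f * w for v, w in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical data: slope 1 up to k1, slope 3 between k1 and k2, slope 1 after k2.
k1, k2 = 4.0, 8.0
x = [float(i) for i in range(13)]
y = [3.0 + 1.0 * xi + 2.0 * (u(xi - k1) - u(xi - k2)) for xi in x]

ones = [1.0] * len(x)                             # column for the intercept d0
x1_minus_x2 = [u(xi - k1) - u(xi - k2) for xi in x]  # the combined variable (x1 − x2)

# Regress y on a constant, x and (x1 − x2); the constraint c2 = −c1 is built in.
d0, c0, c1 = least_squares([ones, x, x1_minus_x2], y)
c2 = -c1

slopes = (c0, c0 + c1, c0 + c1 + c2)  # first, second and third interval
print(d0, c0, c1, slopes)
```

Only one coefficient is estimated for the pair of bend variables, so the first and third slopes coincide by construction regardless of the data.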

This method can be generalized.

Suppose the cyclic relationship is of the form […]. Then the regression would use a generated variable of the form […]. The other variable in the regression would just be the trend variable.