San José State University
Thayer Watkins
Silicon Valley
& Tornado Alley

The Introduction to Matrix Calculus:
The Extension of Calculus Operations
to Matrices

Taking derivatives of matrix functions is a whole order of complexity greater than taking derivatives in ordinary calculus. Take, for example, the simplest of matrix functions:

N = M²
where N and M are n×n matrices.

The increments ΔN are given in terms of the increments ΔM by

ΔN = MΔM + ΔM·M + (ΔM)²
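As a quick sanity check of this expansion, the sketch below compares (M+ΔM)² − M² with MΔM + ΔM·M + (ΔM)² for one pair of 2×2 integer matrices chosen arbitrarily for illustration; integer entries keep the comparison exact. The list-of-rows representation and the helper names matmul, add and sub are assumptions of this sketch, not part of the original text.

```python
# Exact check of ΔN = MΔM + ΔM·M + (ΔM)² for N = M², using integer
# matrices (lists of rows) so that no rounding error can enter.

def matmul(A, B):
    """Product of two n×n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    """Elementwise sum A + B."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    """Elementwise difference A − B."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

M = [[1, 2], [3, 4]]
DeltaM = [[0, 1], [5, -2]]  # an arbitrary finite increment

MplusD = add(M, DeltaM)
lhs = sub(matmul(MplusD, MplusD), matmul(M, M))   # (M+ΔM)² − M²
rhs = add(add(matmul(M, DeltaM), matmul(DeltaM, M)),
          matmul(DeltaM, DeltaM))                 # MΔM + ΔM·M + (ΔM)²
print(lhs == rhs)  # True
```

Because the check is exact, it holds for any finite increment, not only small ones; the (ΔM)² term is what disappears in the limit.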

As the increment ΔM shrinks to a matrix of infinitesimals dM, the second-order term (ΔM)² becomes negligible and the relation reduces to

dN = MdM + dM·M

This is quite different from what might be expected by analogy with the calculus of real variables, namely

dN = 2MdM
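The difference between the two formulas can be seen numerically. The sketch below estimates dN for N = M² by a forward difference in a direction D and compares the estimate with MdM + dM·M and with the naive 2MdM. The matrices M and D and the step size eps are arbitrary values assumed for illustration, and the helpers matmul, lin and maxdiff are hypothetical names introduced here.

```python
# Forward-difference estimate of dN for N = M², compared with the
# correct differential M·dM + dM·M and the naive guess 2M·dM.

def matmul(A, B):
    """Product of two n×n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lin(a, A, b, B):
    """Linear combination a·A + b·B of two matrices."""
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def maxdiff(A, B):
    """Largest absolute elementwise difference between A and B."""
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))

M = [[1.0, 2.0], [3.0, 4.0]]
D = [[0.0, 1.0], [5.0, -2.0]]  # direction of dM; does not commute with M
eps = 1e-6

# (N(M + eps·D) − N(M)) / eps approximates dN for dM = D
Mp = lin(1.0, M, eps, D)
fd = lin(1.0 / eps, matmul(Mp, Mp), -1.0 / eps, matmul(M, M))

correct = lin(1.0, matmul(M, D), 1.0, matmul(D, M))  # M·dM + dM·M
naive = lin(2.0, matmul(M, D), 0.0, D)               # 2M·dM

print(maxdiff(fd, correct))  # small: the residual is eps·D² plus rounding
print(maxdiff(fd, naive))    # large: naive − fd ≈ [M, D] = M·D − D·M
```

If D were chosen to commute with M (for instance a multiple of M), the two candidates would coincide.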

Something of this form can be recovered by using the commutator of two matrices,

[A, B] = AB − BA
and hence
BA = AB − [A, B]
BA = AB + [B, A]

Taking A = M and B = dM gives dM·M = MdM − [M, dM], and therefore

dN = 2MdM − [M, dM]
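The commutator form can be checked exactly in the same way. This sketch, again using arbitrary integer matrices and hypothetical helper names (matmul, lin, commutator), confirms that MdM + dM·M equals 2MdM − [M, dM] and that the correction term [M, dM] is nonzero when M and dM do not commute.

```python
# Exact check of dN = 2M·dM − [M, dM] for N = M², with integer matrices.

def matmul(A, B):
    """Product of two n×n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lin(a, A, b, B):
    """Linear combination a·A + b·B."""
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def commutator(A, B):
    """[A, B] = AB − BA."""
    return lin(1, matmul(A, B), -1, matmul(B, A))

M = [[1, 2], [3, 4]]
dM = [[0, 1], [5, -2]]

lhs = lin(1, matmul(M, dM), 1, matmul(dM, M))       # M·dM + dM·M
rhs = lin(2, matmul(M, dM), -1, commutator(M, dM))  # 2M·dM − [M, dM]
print(lhs == rhs)          # True
print(commutator(M, dM))   # [[7, -7], [21, -7]]
```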

What is immediately clear is that derivatives involving matrices are awkward. Something in the nature of (∂N/∂M), written as a matrix, would be an n²×n² matrix consisting of n² blocks, each an n×n array representing (∂nᵢⱼ/∂M). It would be better to regard (∂N/∂M) as a four-dimensional object called a tensor. Better still, the relationship between changes in N and changes in M can be displayed, as above, as a relationship between the differentials dN and dM.
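To make the size of this object concrete, the sketch below builds, for N = M², the four-index array of partial derivatives ∂nᵢⱼ/∂mₖₗ = δᵢₖmₗⱼ + mᵢₖδⱼₗ (obtained by differentiating nᵢⱼ = Σₚ mᵢₚmₚⱼ), lays it out as an n²×n² matrix of n² blocks, and checks that contracting it with dM reproduces MdM + dM·M. The index layout, the test matrices, and the helper names are choices made for this illustration only.

```python
# The derivative of N = M² as a four-index array T[i][j][k][l] = ∂n_ij/∂m_kl,
# and the same numbers laid out as an n²×n² block matrix.

def matmul(A, B):
    """Product of two n×n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    """Elementwise sum A + B."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def delta(a, b):
    """Kronecker delta."""
    return 1 if a == b else 0

def derivative_tensor(M):
    """∂n_ij/∂m_kl = δ_ik·m_lj + m_ik·δ_jl, from n_ij = Σ_p m_ip·m_pj."""
    n = len(M)
    return [[[[delta(i, k) * M[l][j] + M[i][k] * delta(j, l)
               for l in range(n)] for k in range(n)]
             for j in range(n)] for i in range(n)]

def contract(T, dM):
    """dn_ij = Σ_kl (∂n_ij/∂m_kl)·dm_kl."""
    n = len(dM)
    return [[sum(T[i][j][k][l] * dM[k][l] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

def block_matrix(T):
    """n²×n² layout: block (i, j) is the n×n array ∂n_ij/∂M."""
    n = len(T)
    return [[T[r // n][c // n][r % n][c % n] for c in range(n * n)]
            for r in range(n * n)]

M = [[1, 2], [3, 4]]
dM = [[0, 1], [5, -2]]
T = derivative_tensor(M)
J = block_matrix(T)

print(len(J), len(J[0]))                                     # 4 4
print(contract(T, dM) == add(matmul(M, dM), matmul(dM, M)))  # True
```

Even for 2×2 matrices the array has 16 entries, which is why the differential form dN = MdM + dM·M is the more economical way to state the result.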

Similarly, if P = M³, then

dP = M²dM + MdM·M + dM·M²

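The generalization to Q = Mᵏ is obvious though awkward: dQ is a sum of k terms, in each of which dM occupies one of the k positions in the product,

dQ = Mᵏ⁻¹dM + Mᵏ⁻²dM·M + ⋯ + dM·Mᵏ⁻¹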
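One way to confirm the general pattern is to compare the k-term sum of Mʲ·dM·Mᵏ⁻¹⁻ʲ (j = 0, …, k−1) with the first-order part of (M + ε·dM)ᵏ, computed with dual numbers in which ε² = 0. The dual-number approach is a standard forward-mode differentiation device swapped in here for illustration, not a method from the original text; the matrices and helper names are again arbitrary.

```python
# dQ for Q = M^k, computed two ways:
#  (1) the sum of k terms with dM in each position of the product;
#  (2) dual numbers: carry (value, derivative) pairs, using
#      (A + εB)(C + εE) = AC + ε(AE + BC) because ε² = 0.

def matmul(A, B):
    """Product of two n×n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    """Elementwise sum A + B."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def zeros(n):
    return [[0] * n for _ in range(n)]

def mpow(M, k):
    """M^k by repeated multiplication (M^0 = I)."""
    R = identity(len(M))
    for _ in range(k):
        R = matmul(R, M)
    return R

def dpower_sum(M, dM, k):
    """Sum over j = 0..k-1 of M^j · dM · M^(k-1-j)."""
    total = zeros(len(M))
    for j in range(k):
        total = add(total, matmul(matmul(mpow(M, j), dM), mpow(M, k - 1 - j)))
    return total

def dpower_dual(M, dM, k):
    """ε-part of (M + ε·dM)^k."""
    val, der = identity(len(M)), zeros(len(M))
    for _ in range(k):
        # (val + ε·der)(M + ε·dM) = val·M + ε(val·dM + der·M)
        val, der = matmul(val, M), add(matmul(val, dM), matmul(der, M))
    return der

M = [[1, 2], [3, 4]]
dM = [[0, 1], [5, -2]]
print(all(dpower_sum(M, dM, k) == dpower_dual(M, dM, k) for k in range(1, 7)))  # True
```

For k = 3 the sum reduces to M²dM + MdM·M + dM·M², the formula for dP given above.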

The useful matrix exponential function is defined as

exp(M) = I + M + M²/2! + M³/3! + ⋯ + Mᵏ/k! + ⋯

where I is the n×n identity matrix.
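For small matrices the series can be summed directly. The sketch below truncates it after a fixed number of terms (the default terms=30 is an arbitrary choice) and builds each term Mᵏ/k! from the previous one. This is adequate only for matrices of modest size and norm; library routines such as SciPy's scipy.linalg.expm use a more robust scaling-and-squaring method. The test matrix generates a plane rotation, chosen because its exponential is known in closed form.

```python
import math

# exp(M) by summing the truncated series I + M + M²/2! + ... + M^(terms-1)/(terms-1)!

def matmul(A, B):
    """Product of two n×n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_series(M, terms=30):
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # I
    term = [row[:] for row in result]                                       # M^0/0!
    for k in range(1, terms):
        # M^k/k! = (M^(k-1)/(k-1)!) · M / k
        term = [[x / k for x in row] for row in matmul(term, M)]
        result = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(result, term)]
    return result

t = 0.7
E = expm_series([[0.0, -t], [t, 0.0]])  # should approximate [[cos t, -sin t], [sin t, cos t]]
print(abs(E[1][0] - math.sin(t)) < 1e-12)  # True
```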

Clearly the increment exp(M+ΔM) − exp(M) of the exponential function is going to be complicated even when the increment ΔM is infinitesimal, because M and ΔM need not commute.

For the special case in which ΔM commutes with M,

exp(M+ΔM) = exp(ΔM)exp(M) = exp(M)exp(ΔM)
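The role of the commuting assumption can be seen numerically. In this sketch the matrices are arbitrary choices for illustration: ΔM is taken to be 0.1M + 0.2I, a polynomial in M and therefore commuting with M, and the identity holds to rounding error; for the non-commuting pair A = [[0, 1], [0, 0]] and B = [[0, 0], [1, 0]], exp(A + B) and exp(A)exp(B) differ. The exponential is again a truncated power series, an assumption adequate only for small matrices like these.

```python
# Compare exp(M + ΔM) with exp(ΔM)·exp(M) when ΔM commutes with M
# and when it does not. exp is computed from the truncated power series.

def matmul(A, B):
    """Product of two n×n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lin(a, A, b, B):
    """Linear combination a·A + b·B."""
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def expm_series(M, terms=40):
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]  # M^k/k!
        result = lin(1.0, result, 1.0, term)
    return result

def maxdiff(A, B):
    """Largest absolute elementwise difference between A and B."""
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))

Id = [[1.0, 0.0], [0.0, 1.0]]
M = [[0.5, 1.0], [0.3, 0.2]]
DeltaM = lin(0.1, M, 0.2, Id)  # a polynomial in M, so it commutes with M

both = expm_series(lin(1.0, M, 1.0, DeltaM))
print(maxdiff(both, matmul(expm_series(DeltaM), expm_series(M))))  # rounding-level
print(maxdiff(both, matmul(expm_series(M), expm_series(DeltaM))))  # rounding-level

A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]  # AB ≠ BA
print(maxdiff(expm_series(lin(1.0, A, 1.0, B)),
              matmul(expm_series(A), expm_series(B))))  # clearly nonzero
```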

(To be continued.)