applet-magic.com Thayer Watkins Silicon Valley & Tornado Alley USA

The Introduction to Matrix Calculus:
The Extension of Calculus Operations
to Matrices

Taking derivatives of matrix functions is considerably more complicated than taking derivatives in ordinary calculus,
chiefly because matrix multiplication is not commutative. Take, for example, one of the simplest matrix functions,

N = M²
where N and M
are n×n matrices

The increments ΔN are given in terms of the increments ΔM by

ΔN = MΔM + ΔM·M + (ΔM)²

As the increments ΔM shrink to a matrix of infinitesimals dM, the second-order term (ΔM)² becomes negligible and the relation reduces to

dN = MdM + dM·M
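
The first-order relation can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated 3×3 matrices; the helper names `square_increment` and `square_differential` are illustrative and not part of the original text. It compares the exact increment of M² with MdM + dM·M and with the scalar-style guess 2MdM.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
M = rng.standard_normal((n, n))
D = rng.standard_normal((n, n))  # direction of the increment (arbitrary choice)

def square_increment(M, dM):
    """Exact increment of N = M^2 for a finite increment dM."""
    return (M + dM) @ (M + dM) - M @ M

def square_differential(M, dM):
    """First-order part of the increment: M dM + dM M."""
    return M @ dM + dM @ M

eps = 1e-6
dM = eps * D

exact = square_increment(M, dM)
linear = square_differential(M, dM)
naive = 2 * M @ dM  # what the scalar rule d(x^2) = 2x dx would suggest

# The linear form misses only the second-order term (dM)^2.
err_linear = np.linalg.norm(exact - linear)
# The naive form is already wrong at first order unless M and dM commute.
err_naive = np.linalg.norm(exact - naive)

print("error of M dM + dM M:", err_linear)
print("error of 2 M dM:     ", err_naive)
```

For generic matrices the first error should scale like the square of the increment, while the second should scale like the increment itself.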

This is quite different from what the analogy with the calculus of real variables would suggest, namely

dN = 2MdM

A formula of that form can be recovered by using the commutator of two matrices,

[A, B] = AB − BA
and hence
BA = AB − [A, B]
or
BA = AB + [B, A]

Thus dM·M is equal to MdM − [M, dM]
and

dN = 2MdM − [M, dM]
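
As a quick sanity check on the commutator form, here is a small sketch assuming NumPy and arbitrary random matrices; the function name `commutator` is chosen for illustration. It confirms that the two expressions for dN agree and that the correction term vanishes when dM commutes with M.

```python
import numpy as np

def commutator(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    return A @ B - B @ A

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
dM = 1e-3 * rng.standard_normal((3, 3))

# Two ways of writing the differential of N = M^2.
two_sided = M @ dM + dM @ M
with_commutator = 2 * M @ dM - commutator(M, dM)
print(np.allclose(two_sided, with_commutator))

# If dM is a polynomial in M (here a multiple of M^2) it commutes with M,
# so the commutator vanishes and dN reduces to the scalar-like 2 M dM.
dM_commuting = 1e-3 * (M @ M)
print(np.allclose(commutator(M, dM_commuting), 0.0, atol=1e-10))
```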

What is immediately clear is that derivatives involving matrices are awkward. Something in the nature of (∂N/∂M) as a matrix would be
an n²×n² matrix consisting of n² blocks, each of which represents (∂n_{ij}/∂M). It would be better to consider
(∂N/∂M) as a four-dimensional object called a tensor. Better yet, the relationship between the changes in N and the changes in
M should be displayed, as above, in terms of the relationship between the differentials dN and dM.
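
To make the four-index object concrete for N = M², the sketch below builds T with entries T[i,j,k,l] = ∂n_{ij}/∂m_{kl} = δ_{ik}m_{lj} + m_{ik}δ_{jl}, which follows from n_{ij} = Σ_p m_{ip}m_{pj}. It assumes NumPy; the index ordering and the row-major flattening into an n²×n² matrix are choices made here for illustration, not a layout prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
M = rng.standard_normal((n, n))
dM = 1e-3 * rng.standard_normal((n, n))
I = np.eye(n)

# T[i, j, k, l] = d n_ij / d m_kl for N = M^2.
T = np.einsum("ik,lj->ijkl", I, M) + np.einsum("ik,jl->ijkl", M, I)

# Contracting the tensor with dM over (k, l) gives the differential dN.
dN_tensor = np.einsum("ijkl,kl->ij", T, dM)
dN_direct = M @ dM + dM @ M
print(np.allclose(dN_tensor, dN_direct))

# One possible n^2 x n^2 matrix form: rows indexed by (i, j), columns by (k, l),
# acting on the row-major flattening of dM.
J = T.reshape(n * n, n * n)
print(J.shape)
print(np.allclose(J @ dM.reshape(-1), dN_direct.reshape(-1)))
```

The differential form MdM + dM·M carries the same information as the n⁴ entries of T in a far more compact way.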

We see that if P=M³ then

dP = M²dM + MdM·M + dM·M²

The generalization to Q=M^{k} is straightforward though awkward: dQ is the sum of the k terms M^{j}dM·M^{k−1−j} for j = 0, 1, …, k−1.
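
The pattern for a general power places dM in each of the k possible positions in the product. Below is a minimal sketch of that sum, assuming NumPy; the function name `power_differential` and the test matrices are hypothetical choices. It is compared against the cubic formula above and against a finite increment for k = 5.

```python
import numpy as np
from numpy.linalg import matrix_power

def power_differential(M, dM, k):
    """First-order differential of M^k: sum over j of M^j dM M^(k-1-j)."""
    total = np.zeros_like(M, dtype=float)
    for j in range(k):
        total += matrix_power(M, j) @ dM @ matrix_power(M, k - 1 - j)
    return total

rng = np.random.default_rng(3)
M = rng.standard_normal((3, 3))
dM = 1e-6 * rng.standard_normal((3, 3))

# k = 3 reproduces the three-term formula for dP.
dP_formula = M @ M @ dM + M @ dM @ M + dM @ M @ M
print(np.allclose(power_differential(M, dM, 3), dP_formula))

# k = 5: compare with the exact finite increment of M^5.
exact = matrix_power(M + dM, 5) - matrix_power(M, 5)
linear = power_differential(M, dM, 5)
rel_err = np.linalg.norm(exact - linear) / np.linalg.norm(linear)
print("relative error of the first-order form:", rel_err)
```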

The useful matrix exponential function is defined as

exp(M) = I + M + M²/2! + M³/3! + … + M^{k}/k! + …

where I is the n×n identity matrix.
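
The series definition translates directly into code. The following is a rough sketch assuming NumPy and a fixed truncation of the series at 30 terms; the name `expm_series` and the truncation length are assumptions made for illustration. Truncation is adequate only for matrices of modest norm; library routines such as `scipy.linalg.expm` use more robust methods.

```python
import numpy as np

def expm_series(M, terms=30):
    """Truncated series I + M + M^2/2! + ... + M^(terms-1)/(terms-1)!."""
    n = M.shape[0]
    term = np.eye(n)   # current term M^k / k!, starting with k = 0
    total = np.eye(n)
    for k in range(1, terms):
        term = term @ M / k   # M^k / k! from M^(k-1) / (k-1)!
        total = total + term
    return total

# For a diagonal matrix the exponential acts entry by entry on the diagonal.
diag_M = np.diag([1.0, 2.0])
print(expm_series(diag_M))

# For a nilpotent matrix (square equal to zero) the series stops after I + M.
nilpotent = np.array([[0.0, 1.0], [0.0, 0.0]])
print(expm_series(nilpotent))
```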

Clearly the increment in the exponential function, exp(M+ΔM)−exp(M), is going to be complicated even when the increments in M are
infinitesimal, because each power M^{k} in the series contributes k separate terms to the differential.
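
A numerical experiment gives a feel for this. The sketch below assumes NumPy and truncated series of 25 terms; the names `expm_series` and `expm_differential` are illustrative, and the term-by-term differentiation simply applies the rule for powers of M to each term of the series. It compares the exact increment with that term-by-term differential and with the scalar-style guess exp(M)·dM.

```python
import numpy as np
from numpy.linalg import matrix_power

def expm_series(M, terms=25):
    """Truncated power series for exp(M)."""
    term = np.eye(M.shape[0])
    total = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        total = total + term
    return total

def expm_differential(M, dM, terms=25):
    """Term-by-term differential: sum over k of d(M^k) / k!."""
    total = np.zeros_like(M, dtype=float)
    factorial = 1.0
    for k in range(1, terms):
        factorial *= k
        # d(M^k) = sum over j of M^j dM M^(k-1-j)
        dMk = sum(matrix_power(M, j) @ dM @ matrix_power(M, k - 1 - j)
                  for j in range(k))
        total += dMk / factorial
    return total

rng = np.random.default_rng(4)
M = 0.5 * rng.standard_normal((3, 3))   # modest norm so truncation is harmless
dM = 1e-6 * rng.standard_normal((3, 3))

exact = expm_series(M + dM) - expm_series(M)
err_linear = np.linalg.norm(exact - expm_differential(M, dM))
err_naive = np.linalg.norm(exact - expm_series(M) @ dM)

print("error of term-by-term differential:", err_linear)
print("error of exp(M) dM:                ", err_naive)
```

For generic matrices the scalar-style guess exp(M)·dM should fail at first order, since dM does not commute with M.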