3. QR Decomposition#

3.1. Overview#

This lecture describes the QR decomposition and how it relates to

  • Orthogonal projection and least squares

  • A Gram-Schmidt process

  • Eigenvalues and eigenvectors

We’ll write some Python code to help consolidate our understanding.

3.2. Matrix Factorization#

The QR decomposition (also called the QR factorization) of a matrix is a decomposition of a matrix into the product of an orthogonal matrix and a triangular matrix.

A QR decomposition of a real matrix A takes the form

$$
A = Q R
$$

where

  • $Q$ is an orthogonal matrix (so that $Q^T Q = I$)

  • R is an upper triangular matrix
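
To make these properties concrete, here is a minimal peek at a QR decomposition computed by scipy.linalg.qr (the same routine we will compare against later); the small matrix used here is just an arbitrary illustration of our own.

import numpy as np
from scipy.linalg import qr

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

Q, R = qr(A)

print(np.allclose(Q.T @ Q, np.eye(2)))  # Q is orthogonal: Q^T Q = I
print(np.allclose(np.triu(R), R))       # R is upper triangular
print(np.allclose(Q @ R, A))            # the product QR recovers A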

We’ll use a Gram-Schmidt process to compute a QR decomposition.

Because doing so is educational, we’ll write our own Python code to do the job.

3.3. Gram-Schmidt process#

We’ll start with a square matrix A.

If a square matrix A is nonsingular, then a QR factorization is unique up to the signs of the columns of Q and the corresponding rows of R.

We’ll deal with a rectangular matrix A later.

In fact, the algorithm we describe below also works when A is rectangular rather than square.

3.3.1. Gram-Schmidt process for square A#

Here we apply a Gram-Schmidt process to the columns of matrix A.

In particular, let

$$
A = \begin{bmatrix} a_1 \mid a_2 \mid \cdots \mid a_n \end{bmatrix}
$$

Let $\| \cdot \|$ denote the $L^2$ norm.

The Gram-Schmidt algorithm repeatedly combines the following two steps in a particular order

  • normalize a vector to have unit norm

  • orthogonalize the next vector

To begin, we set $u_1 = a_1$ and then normalize:

$$
u_1 = a_1, \quad e_1 = \frac{u_1}{\| u_1 \|}
$$

We orthogonalize first to compute $u_2$ and then normalize to create $e_2$:

$$
u_2 = a_2 - (a_2 \cdot e_1) e_1, \quad e_2 = \frac{u_2}{\| u_2 \|}
$$

We invite the reader to verify that $e_1$ is orthogonal to $e_2$ by checking that $e_1 \cdot e_2 = 0$.
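
As a quick numerical sketch of this first orthogonalization step (using two arbitrary made-up vectors, not part of any later example), we can compute $e_1$ and $e_2$ by hand and confirm that their inner product is zero up to floating point error.

import numpy as np

a1 = np.array([3.0, 1.0])
a2 = np.array([2.0, 2.0])

u1 = a1
e1 = u1 / np.linalg.norm(u1)   # normalize

u2 = a2 - (a2 @ e1) * e1       # orthogonalize against e1
e2 = u2 / np.linalg.norm(u2)   # normalize

print(e1 @ e2)                 # approximately zero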

The Gram-Schmidt procedure continues iterating.

Thus, for $k = 2, \ldots, n-1$ we construct

$$
u_{k+1} = a_{k+1} - (a_{k+1} \cdot e_1) e_1 - \cdots - (a_{k+1} \cdot e_k) e_k,
\quad e_{k+1} = \frac{u_{k+1}}{\| u_{k+1} \|}
$$

Here $(a_j \cdot e_i)$ can be interpreted as the linear least squares regression coefficient of $a_j$ on $e_i$

  • it is the inner product of $a_j$ and $e_i$ divided by the inner product of $e_i$ with itself, which equals 1 because our normalization has assured that $e_i \cdot e_i = 1$.

  • this regression coefficient has an interpretation as being a covariance divided by a variance
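
As a small check of this interpretation (with made-up vectors of our own, and using numpy's least squares solver), the inner product $(a_j \cdot e_i)$ should agree with the coefficient from regressing $a_j$ on the unit-norm regressor $e_i$.

import numpy as np

e_i = np.array([1.0, 2.0, 2.0])
e_i = e_i / np.linalg.norm(e_i)   # unit-norm regressor
a_j = np.array([0.5, -1.0, 3.0])

coef_inner = a_j @ e_i            # the inner product (a_j . e_i)
coef_ols, *_ = np.linalg.lstsq(e_i.reshape(-1, 1), a_j, rcond=None)

print(coef_inner, coef_ols[0])    # the two coefficients agree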

It can be verified that

$$
A
= \begin{bmatrix} a_1 \mid a_2 \mid \cdots \mid a_n \end{bmatrix}
= \begin{bmatrix} e_1 \mid e_2 \mid \cdots \mid e_n \end{bmatrix}
\begin{bmatrix}
a_1 \cdot e_1 & a_2 \cdot e_1 & \cdots & a_n \cdot e_1 \\
0 & a_2 \cdot e_2 & \cdots & a_n \cdot e_2 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_n \cdot e_n
\end{bmatrix}
$$

Thus, we have constructed the decomposition

$$
A = Q R
$$

where

$$
Q = \begin{bmatrix} e_1 \mid e_2 \mid \cdots \mid e_n \end{bmatrix}
$$

and

$$
R = \begin{bmatrix}
a_1 \cdot e_1 & a_2 \cdot e_1 & \cdots & a_n \cdot e_1 \\
0 & a_2 \cdot e_2 & \cdots & a_n \cdot e_2 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_n \cdot e_n
\end{bmatrix}
$$

3.3.2. A not square#

Now suppose that A is an n×m matrix where m>n.

Then a QR decomposition is

$$
A
= \begin{bmatrix} a_1 \mid a_2 \mid \cdots \mid a_m \end{bmatrix}
= \begin{bmatrix} e_1 \mid e_2 \mid \cdots \mid e_n \end{bmatrix}
\begin{bmatrix}
a_1 \cdot e_1 & a_2 \cdot e_1 & \cdots & a_n \cdot e_1 & a_{n+1} \cdot e_1 & \cdots & a_m \cdot e_1 \\
0 & a_2 \cdot e_2 & \cdots & a_n \cdot e_2 & a_{n+1} \cdot e_2 & \cdots & a_m \cdot e_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_n \cdot e_n & a_{n+1} \cdot e_n & \cdots & a_m \cdot e_n
\end{bmatrix}
$$

which implies that

$$
\begin{aligned}
a_1 & = (a_1 \cdot e_1) e_1 \\
a_2 & = (a_2 \cdot e_1) e_1 + (a_2 \cdot e_2) e_2 \\
\vdots & \\
a_n & = (a_n \cdot e_1) e_1 + (a_n \cdot e_2) e_2 + \cdots + (a_n \cdot e_n) e_n \\
a_{n+1} & = (a_{n+1} \cdot e_1) e_1 + (a_{n+1} \cdot e_2) e_2 + \cdots + (a_{n+1} \cdot e_n) e_n \\
\vdots & \\
a_m & = (a_m \cdot e_1) e_1 + (a_m \cdot e_2) e_2 + \cdots + (a_m \cdot e_n) e_n
\end{aligned}
$$

3.4. Some Code#

Now let’s write some homemade Python code to implement a QR decomposition by deploying the Gram-Schmidt process described above.

import numpy as np
from scipy.linalg import qr
def QR_Decomposition(A):
    n, m = A.shape # get the shape of A

    Q = np.empty((n, n)) # initialize matrix Q
    u = np.empty((n, n)) # initialize matrix u

    u[:, 0] = A[:, 0]
    Q[:, 0] = u[:, 0] / np.linalg.norm(u[:, 0])

    for i in range(1, n):

        u[:, i] = A[:, i]
        for j in range(i):
            u[:, i] -= (A[:, i] @ Q[:, j]) * Q[:, j] # get each u vector

        Q[:, i] = u[:, i] / np.linalg.norm(u[:, i]) # compute each e vector

    R = np.zeros((n, m))
    for i in range(n):
        for j in range(i, m):
            R[i, j] = A[:, j] @ Q[:, i]

    return Q, R

The preceding code is fine but can benefit from some further housekeeping.

We want to do this because later in this notebook we will compare results from our homemade code above with those produced by the QR routine that the Python scipy package delivers.

There can be sign differences between the Q and R matrices produced by different numerical algorithms.

All of these are valid QR decompositions because the sign differences cancel out when we form the product QR.

However, to make the results from our homemade function and the qr routine in scipy comparable, let’s require that Q have positive diagonal entries.

We do this by adjusting the signs of the columns in Q and the rows in R appropriately.
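
To see why such sign flips are legitimate, let $D$ be a diagonal matrix whose diagonal entries are $\pm 1$. Then $D D = I$, so

$$
A = Q R = Q D D R = (Q D)(D R)
$$

and $(QD, DR)$ is again a valid QR decomposition: $QD$ is still orthogonal and $DR$ is still upper triangular, with the signs of the corresponding columns of $Q$ and rows of $R$ flipped.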

To accomplish this we’ll define a pair of functions.

def diag_sign(A):
    "Compute the signs of the diagonal of matrix A"

    D = np.diag(np.sign(np.diag(A)))

    return D

def adjust_sign(Q, R):
    """
    Adjust the signs of the columns in Q and rows in R to
    impose positive diagonal of Q
    """

    D = diag_sign(Q)

    Q[:, :] = Q @ D
    R[:, :] = D @ R

    return Q, R

3.5. Example#

Now let’s do an example.

A = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
# A = np.array([[1.0, 0.5, 0.2], [0.5, 0.5, 1.0], [0.0, 1.0, 1.0]])
# A = np.array([[1.0, 0.5, 0.2], [0.5, 0.5, 1.0]])

A
array([[1., 1., 0.],
       [1., 0., 1.],
       [0., 1., 1.]])
Q, R = adjust_sign(*QR_Decomposition(A))
Q
array([[ 0.70710678, -0.40824829, -0.57735027],
       [ 0.70710678,  0.40824829,  0.57735027],
       [ 0.        , -0.81649658,  0.57735027]])
R
array([[ 1.41421356,  0.70710678,  0.70710678],
       [ 0.        , -1.22474487, -0.40824829],
       [ 0.        ,  0.        ,  1.15470054]])

Let’s compare outcomes with what the scipy package produces

Q_scipy, R_scipy = adjust_sign(*qr(A))
print('Our Q: \n', Q)
print('\n')
print('Scipy Q: \n', Q_scipy)
Our Q: 
 [[ 0.70710678 -0.40824829 -0.57735027]
 [ 0.70710678  0.40824829  0.57735027]
 [ 0.         -0.81649658  0.57735027]]


Scipy Q: 
 [[ 0.70710678 -0.40824829 -0.57735027]
 [ 0.70710678  0.40824829  0.57735027]
 [ 0.         -0.81649658  0.57735027]]
print('Our R: \n', R)
print('\n')
print('Scipy R: \n', R_scipy)
Our R: 
 [[ 1.41421356  0.70710678  0.70710678]
 [ 0.         -1.22474487 -0.40824829]
 [ 0.          0.          1.15470054]]


Scipy R: 
 [[ 1.41421356  0.70710678  0.70710678]
 [ 0.         -1.22474487 -0.40824829]
 [ 0.          0.          1.15470054]]

The above outcomes give us the good news that our homemade function agrees with what scipy produces.

Now let’s do a QR decomposition for a rectangular matrix A that is n×m with m>n.

A = np.array([[1, 3, 4], [2, 0, 9]])
Q, R = adjust_sign(*QR_Decomposition(A))
Q, R
(array([[ 0.4472136 , -0.89442719],
        [ 0.89442719,  0.4472136 ]]),
 array([[ 2.23606798,  1.34164079,  9.8386991 ],
        [ 0.        , -2.68328157,  0.4472136 ]]))
Q_scipy, R_scipy = adjust_sign(*qr(A))
Q_scipy, R_scipy
(array([[ 0.4472136 , -0.89442719],
        [ 0.89442719,  0.4472136 ]]),
 array([[ 2.23606798,  1.34164079,  9.8386991 ],
        [ 0.        , -2.68328157,  0.4472136 ]]))
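
As a quick sanity check of the column expansions displayed above (our own addition, reusing the Q, R, and A just computed), each column $a_j$ of the rectangular $A$ should be recovered as $\sum_i (a_j \cdot e_i) e_i$, which is exactly Q @ R[:, j].

for j in range(A.shape[1]):
    print(np.allclose(Q @ R[:, j], A[:, j]))   # each column of A is recovered

np.allclose(Q @ R, A)   # equivalently, the product QR reproduces A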

3.6. Using QR Decomposition to Compute Eigenvalues#

Now for a useful fact about the QR algorithm.

The following iterations on the QR decomposition can be used to compute eigenvalues of a square matrix A.

Here is the algorithm:

  1. Set $A_0 = A$ and form $A_0 = Q_0 R_0$.

  2. Form $A_1 = R_0 Q_0$. Note that $A_1 = R_0 Q_0 = Q_0^{-1} Q_0 R_0 Q_0 = Q_0^{-1} A_0 Q_0$, so $A_1$ is similar to $A_0$ and therefore has the same eigenvalues.

  3. Form $A_1 = Q_1 R_1$ (i.e., form the QR decomposition of $A_1$).

  4. Form $A_2 = R_1 Q_1$ and then $A_2 = Q_2 R_2$.

  5. Iterate to convergence.

  6. Compute eigenvalues of $A$ and compare them to the diagonal values of the limiting $A_n$ found from this process.

Remark: this algorithm is close to one of the most efficient ways of computing eigenvalues!

Let’s write some Python code to try out the algorithm

def QR_eigvals(A, tol=1e-12, maxiter=1000):
    "Find the eigenvalues of A using QR decomposition."

    A_old = np.copy(A)
    A_new = np.copy(A)

    diff = np.inf
    i = 0
    while (diff > tol) and (i < maxiter):
        A_old[:, :] = A_new
        Q, R = QR_Decomposition(A_old)

        A_new[:, :] = R @ Q

        diff = np.abs(A_new - A_old).max()
        i += 1

    eigvals = np.diag(A_new)

    return eigvals

Now let’s try the code and compare the results with what numpy.linalg.eigvals gives us.

Here goes

# experiment with one random A matrix
A = np.random.random((3, 3))
sorted(QR_eigvals(A))
[-0.6339352457654999, 0.2857791957067247, 1.3315117001699257]

Compare with the numpy package.

sorted(np.linalg.eigvals(A))
[-0.6339352457654991, 0.2857791957067246, 1.3315117001699248]

3.7. QR and PCA#

There are interesting connections between the QR decomposition and principal components analysis (PCA).

Here are some.

  1. Let $X$ be a $k \times n$ random matrix where the $j$th column is a random draw from $\mathcal{N}(\mu, \Sigma)$, where $\mu$ is a $k \times 1$ vector of means and $\Sigma$ is a $k \times k$ covariance matrix. We want $n \gg k$ – this is an “econometrics example”.

  2. Form $X = Q R$ where $Q$ is $k \times k$ and $R$ is $k \times n$.

  3. Form the eigenvalues of $R R^\prime$, i.e., we’ll compute $R R^\prime = \tilde P \Lambda \tilde P^\prime$.

  4. Form $X X^\prime = Q \tilde P \Lambda \tilde P^\prime Q^\prime$ and compare it with the eigen decomposition $X X^\prime = P \hat \Lambda P^\prime$.

  5. It will turn out that $\Lambda = \hat \Lambda$ and that $P = Q \tilde P$.
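
To see why conjecture 5 should hold, combine steps 2 and 3: since $X = Q R$,

$$
X X^\prime = Q R R^\prime Q^\prime = Q \tilde P \Lambda \tilde P^\prime Q^\prime = (Q \tilde P) \Lambda (Q \tilde P)^\prime
$$

and $Q \tilde P$ is orthogonal (a product of orthogonal matrices), so the right-hand side is an eigen decomposition of $X X^\prime$ with eigenvector matrix $Q \tilde P$ and eigenvalue matrix $\Lambda$.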

Let’s verify conjecture 5 with some Python code.

Start by simulating a random (n,k) matrix X.

k = 5
n = 1000

# generate some random moments
𝜇 = np.random.random(size=k)
C = np.random.random((k, k))
Σ = C.T @ C
# X is random matrix where each column follows multivariate normal dist.
X = np.random.multivariate_normal(𝜇, Σ, size=n)
X.shape
(1000, 5)

Let’s apply the QR decomposition to X.

Q, R = adjust_sign(*QR_Decomposition(X.T))

Check the shapes of Q and R.

Q.shape, R.shape
((5, 5), (5, 1000))

Now we can construct $R R^\prime = \tilde P \Lambda \tilde P^\prime$ and form an eigen decomposition.

RR = R @ R.T

𝜆, P_tilde = np.linalg.eigh(RR)
Λ = np.diag(𝜆)

We can also apply the decomposition to $X X^\prime = P \hat \Lambda P^\prime$.

XX = X.T @ X

𝜆_hat, P = np.linalg.eigh(XX)
Λ_hat = np.diag(𝜆_hat)

Compare the eigenvalues that are on the diagonals of $\Lambda$ and $\hat \Lambda$.

𝜆, 𝜆_hat
(array([   8.7138404 ,  139.48713424,  696.92519758, 1241.07419969,
        7679.48828897]),
 array([   8.7138404 ,  139.48713424,  696.92519758, 1241.07419969,
        7679.48828897]))

Let’s compare $P$ and $Q \tilde P$.

Again we need to be careful about sign differences between the columns of $P$ and $Q \tilde P$.

QP_tilde = Q @ P_tilde

np.abs(P @ diag_sign(P) - QP_tilde @ diag_sign(QP_tilde)).max()
2.9559688030644793e-15

Let’s verify that $X X^\prime$ can be decomposed as $Q \tilde P \Lambda \tilde P^\prime Q^\prime$.

QPΛPQ = Q @ P_tilde @ Λ @ P_tilde.T @ Q.T
np.abs(QPΛPQ - XX).max()
7.73070496506989e-12