A.  MATRIX STRUCTURE AND NOTATION

1) A matrix is a rectangular arrangement of numbers.  The matrix is usually denoted by a capital letter.

                        A =                              D =

2) The dimensions of a matrix are given by the number of rows and columns in the matrix (i.e. the dimensions are r by c). For the matrices above,

                                    A  is  2 by 2

                                    D  is  4 by 3

3) The individual elements of a matrix can be referred to by specifying the row and column in which the element occurs. Lower case letters are used to represent individual elements, and should match the upper case letter used to denote the matrix. For example, individual elements from matrices A and D above can be referred to as,

                                    a11 =  1

                                    a21  =  7

                                    d22  =  6

                                    d12  =  2
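To make this notation concrete, the following is a minimal NumPy sketch. Only the four elements listed above (a11, a21, d22, d12) are taken from the examples; the remaining values are illustrative, and note that NumPy numbers rows and columns from 0 rather than 1.

import numpy as np

# Illustrative matrices; only a11, a21, d12 and d22 match the examples above.
A = np.array([[1, 3],
              [7, 5]])            # 2 by 2

D = np.array([[4, 2, 8],
              [9, 6, 1],
              [3, 7, 2],
              [5, 0, 4]])         # 4 by 3

print(A.shape)                    # (2, 2)  ->  r by c
print(D.shape)                    # (4, 3)

# NumPy indexes from 0, so the handout's a11 is A[0, 0], a21 is A[1, 0], etc.
print(A[0, 0])                    # a11 = 1
print(A[1, 0])                    # a21 = 7
print(D[1, 1])                    # d22 = 6
print(D[0, 1])                    # d12 = 2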

B.  TYPES OF MATRICES

1) Square matrix - the number of rows equals the number of columns.  Matrix A above is a square matrix (2 by 2), matrix D is not (4 by 3).  A symmetric matrix is an important variation of the square matrix.  In a symmetric matrix, the value in position "ij" equals the value in position "ji" (where i ≠ j).  For example, if c31 = 5 then c13 is also 5.

2) Scalar - a single number can be thought of as a 1 by 1 matrix and is called a scalar.

3) Vector - a single column or single row of numbers is called a vector.  The dimensions of a row vector are (1 by c), where "c" is the number of columns, and the dimensions of a column vector (r by 1), where "r" is the number of rows.

4) Identity matrix - this special square matrix consists of all ones on the main diagonal, or principal diagonal, and zeros in all the off diagonal positions. The following are examples of identity matrices,

                        E  =                        F  = 

The diagonal matrix is a generalization of the identity matrix.  A diagonal matrix can have any values on the main diagonal but, like the identity matrix, has zeros in all the off diagonal positions.
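A short NumPy sketch of these special matrices (the values used are illustrative):

import numpy as np

E = np.eye(2)                     # 2 by 2 identity matrix
F = np.eye(3)                     # 3 by 3 identity matrix
G = np.diag([4.0, 9.0, 2.0])      # diagonal matrix: any values on the diagonal, zeros elsewhere

# A symmetric matrix equals its transpose (the value in position ij equals the value in position ji).
C = np.array([[1., 4., 5.],
              [4., 2., 6.],
              [5., 6., 3.]])
print(np.array_equal(C, C.T))     # True for a symmetric matrix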

C. MATRIX TRANSPOSE

The transpose of a matrix is a new matrix such that the rows of the original matrix become the columns of the transpose matrix.  The transpose matrix is denoted with the same letter as the original matrix followed by a prime (e.g. the transpose of X is X').

                        D =                                    D' =
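In NumPy the transpose is the .T attribute; a quick check with the same illustrative 4 by 3 matrix used earlier:

import numpy as np

D = np.array([[4, 2, 8],
              [9, 6, 1],
              [3, 7, 2],
              [5, 0, 4]])         # 4 by 3

Dt = D.T                          # D' : the rows of D become the columns of D'
print(Dt.shape)                   # (3, 4)
print(Dt[1, 0] == D[0, 1])        # True: element (2,1) of D' equals element (1,2) of D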

D.  MATRIX ADDITION AND SUBTRACTION

Matrices to be added or subtracted must be of the same dimensions.  Each element of the first matrix (aij) is added to, or has subtracted from it, the corresponding element of the second matrix (bij).

            A =    B =    A+B =  =
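A minimal NumPy sketch of element-by-element addition and subtraction (the values of A and B here are made up):

import numpy as np

A = np.array([[1, 3],
              [7, 5]])
B = np.array([[2, 0],
              [4, 6]])            # must have the same dimensions as A

print(A + B)                      # each element of A plus the corresponding element of B
print(A - B)                      # each element of B subtracted from the corresponding element of A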

E.  MATRIX MULTIPLICATION

Multiplication by a scalar - in this type of multiplication each element of the matrix is simply multiplied by the scalar value.

            A =    B  =  [7]           A * B  =  7 * =    

Element by element multiplication - matrix multiplication is not usually done by multiplying each i,jth element of one matrix by the corresponding i,jth element of the second matrix.  This is called elementwise multiplication; it is not the normal mode of matrix multiplication and should not be used unless specifically requested.

The standard method of matrix multiplication requires that the number of columns in the first matrix equal the number of rows in the second matrix.  If the first matrix is (r1 by c1) and the second is (r2 by c2), then in order to multiply the matrices, c1 must equal r2.  The resulting matrix will have the dimensions (r1 by c2).

Multiplication is accomplished by summing the cross products of each row of the first matrix and each column of the second matrix.

                        A  =  X  =   

Since A is 3 rows by 2 columns, and X is 2 by 2, then the columns of the first matrix equals the rows of the second matrix, and the matrices may be multiplied. 

A*X =  *  =   =

 

the new dimensions for the product of A * X are,

                              must be equal
                              ↓         ↓
                        (3  x  2)  *  (2  x  2)
                         ↑                   ↑
                        new dimensions:  (3 x 2)

Note that though we can multiply A * X, we could not have done the multiplication the other way (i.e. X * A), since the dimensions would not have matched.  That is, we could pre-multiply X by A, but could not pre-multiply A by X.
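The dimension rule and the row-by-column cross products can be checked with NumPy's matrix product operator @. The matrices below are illustrative, with A taken as 3 by 2 and X as 2 by 2 to match the example above.

import numpy as np

A = np.array([[1, 4],
              [2, 5],
              [3, 6]])            # 3 by 2 (illustrative values)

X = np.array([[7, 9],
              [8, 0]])            # 2 by 2 (illustrative values)

# Columns of A (2) equal rows of X (2), so A * X is defined; the product is 3 by 2.
P = A @ X
print(P.shape)                    # (3, 2)

# Each element of the product is a sum of cross products of a row of A and a column of X,
# e.g. P[0, 0] = 1*7 + 4*8.
print(P[0, 0], 1*7 + 4*8)

# The multiplication cannot be done the other way: X has 2 columns but A has 3 rows.
try:
    X @ A
except ValueError as err:
    print("X * A is not defined:", err)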

F. SIMPLE MATRIX INVERSION (2 by 2 matrix only)

Matrices are not "divided", but may be inverted.  Instead of "dividing" A by B, one would multiply A by the inverse of B.  The inverse of a (2 by 2) matrix is given by,

                        A  =  [ a   b ]          A⁻¹  =  (1 / ((a*d) – (b*c))) [  d   –b ]
                              [ c   d ]                                        [ –c    a ]

The scalar value resulting from the calculation "(a*d) – (b*c)" is called the determinant.  The matrix cannot be inverted unless the inverse of the determinant exists (is defined).  It will not exist in a case such as the one below, where the determinant is zero, since 1/0 is not defined.

            A  =                 A⁻¹  =   

This occurs in regression when two variables are perfectly linearly related. 

An example of the inversion of a 2 * 2 matrix is given below. 

            B  =      B⁻¹  =  

Note that a matrix times its inverse (i.e. B * B⁻¹) results in an identity matrix.  By definition, the inverse of a matrix G is a matrix which, when multiplied by G, produces an identity matrix, or G*G⁻¹ = I. 
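The 2 by 2 formula is easy to verify numerically. The sketch below uses an illustrative matrix B, applies the determinant formula directly, and checks the result against NumPy's general inversion routine.

import numpy as np

B = np.array([[3., 1.],
              [5., 2.]])                      # illustrative 2 by 2 matrix

# determinant: (a*d) - (b*c)
det = B[0, 0]*B[1, 1] - B[0, 1]*B[1, 0]       # 3*2 - 1*5 = 1

# 2 by 2 inverse: swap the diagonal, negate the off-diagonal, divide by the determinant
B_inv = np.array([[ B[1, 1], -B[0, 1]],
                  [-B[1, 0],  B[0, 0]]]) / det

print(B @ B_inv)                              # identity matrix: B * B^-1 = I
print(np.linalg.inv(B))                       # same result from NumPy's general routine

# A matrix whose determinant is zero (rows linearly related) cannot be inverted.
S = np.array([[2., 4.],
              [1., 2.]])
print(np.linalg.det(S))                       # 0, so 1/det is not defined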

G. SIMPLE LINEAR REGRESSION

Solving a simple linear regression with matrices requires the same values used for an algebraic solution from summation notation formulas.  These are;

            n ,   ΣX ,   ΣY ,   ΣX² ,   ΣXY ,   ΣY²

where n is the size of the sample of data. To obtain these values in the matrix form we start with the matrix equivalent of the individual values of X and Y, the raw data matrices. 

                                      

The column of ones is necessary, and represents the intercept. Omitting this column would force the regression through the origin. The next step in the calculations is to obtain the X'X, X'Y and Y'Y matrices. These calculations provide the sums of squares and cross products.

            X'X  =  [ n     ΣX  ]          X'Y  =  [ ΣY  ]          Y'Y  =  [ ΣY² ]
                    [ ΣX    ΣX² ]                  [ ΣXY ]

The regression coefficients, b0 and b1, are then given by, B = (X'X)⁻¹X'Y, where

            B  =  [ b0 ]
                  [ b1 ]

and since the determinant of X'X is

            n*ΣX² – (ΣX)²  =  n*Sxx

where Sxx is the corrected sum of squares of X. 

Then

            (X'X)⁻¹  =  (1 / (n*Sxx)) [  ΣX²   –ΣX ]
                                      [ –ΣX      n ]

and the regression coefficients can be obtained by,

            B  =  (X'X)⁻¹ X'Y  =  [ b0 ]
                                  [ b1 ]

where b1 = Sxy/Sxx and b0 = Ȳ – b1*X̄, the familiar algebraic results.

The remaining calculation usually needed to complete the simple linear regression is the sum of squared deviations, or error sum of squares.  The matrix formula is

            SSE  =  Y'Y – B'X'Y  =  ΣY² – [b0  b1] [ ΣY  ]
                                                   [ ΣXY ]

                  =  ΣY² – (b0*ΣY + b1*ΣXY)  =  UCSS Total  –  UCSS Regression

 

These calculations produce the same algebraic equations for b0, b1, and SSE that are given in most statistics texts.  The advantage of using the matrix version of the formulas is that the matrix equations given above will work equally well for multiple regression with two or more independent variables.
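The matrix formulas above translate directly into a few lines of NumPy. The sketch below uses made-up data (not from the handout) and computes B = (X'X)⁻¹X'Y, SSE and R² exactly as described.

import numpy as np

# Illustrative raw data.
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 5., 4., 6.])
n = len(y)

# Raw data matrices: X includes the column of ones for the intercept.
X = np.column_stack([np.ones(n), x])
Y = y.reshape(-1, 1)

XtX = X.T @ X                      # contains n, sum(X) and sum(X^2)
XtY = X.T @ Y                      # contains sum(Y) and sum(XY)
YtY = float(Y.T @ Y)               # sum(Y^2)

B = np.linalg.inv(XtX) @ XtY       # B = (X'X)^-1 X'Y  ->  [b0, b1]'
SSE = YtY - float(B.T @ XtY)       # SSE = Y'Y - B'X'Y

CF = y.sum()**2 / n                # correction factor
SSTotal = YtY - CF                 # corrected total SS
SSReg = float(B.T @ XtY) - CF      # corrected regression SS
R2 = SSReg / SSTotal

print(B.ravel())                   # b0, b1
print(SSE, R2)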

The ANOVA table calculated with matrix formulas is

                                Uncorrected                      Corrected
    Source           d.f.      Sum of Squares          d.f.     Sum of Squares
    Regression         2       B'X'Y                     1      B'X'Y – CF
    Error             n–2      Y'Y – B'X'Y             n–2      Y'Y – B'X'Y
    Total               n      Y'Y                     n–1      Y'Y – CF

            where the correction factor is calculated as usual, CF = (ΣY)² / n.

The value for R² is calculated as R² = SS Regression / SS Total, and is often expressed as a percent. Note that this calculation employs corrected sums of squares for both SSRegression and SSTotal. 

The Mean Squares (MS) for the SSRegression and SSError are calculated by dividing the SS (corrected sums of squares) by their d.f. (degrees of freedom).  The test of the hypothesis H0: β1 = 0 is then calculated as; 

            F  =  MS Regression / MS Error

or

            t  =  b1 / √VAR(b1)

where VAR(b1) is obtained from the VARIANCE-COVARIANCE matrix. 

The VARIANCE-COVARIANCE matrix is calculated from the (X'X)⁻¹ matrix, 

            (X'X)⁻¹  =  [ c00   c01 ]
                        [ c10   c11 ]

where the cij values are called Gaussian multipliers.  The VARIANCE-COVARIANCE matrix is then calculated from this matrix by multiplying by the MSError. 

            VarCov  =  MSError * (X'X)⁻¹  =  [ MSE*c00   MSE*c01 ]
                                             [ MSE*c10   MSE*c11 ]

The individual values then provide the variances and covariances such that

            MSE*c00  =  Variance of b0  =  VAR(b0)

            MSE*c11  =  Variance of b1  =  VAR(b1),  and

            MSE*c01  =  MSE*c10  =  Covariance of b0 and b1  =  COV(b0,b1)

It is important to note that the variances and covariances calculated from the (X'X)⁻¹ matrix are for the bi (the estimates of the βi), not for the Xi values. Also, COV(b0,b1) ≠ COV(X0,X1).
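Continuing the made-up simple regression data from the earlier sketch, the variance - covariance matrix of the coefficients and the t test for H0: β1 = 0 can be computed as follows.

import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 5., 4., 6.])
n = len(y)

X = np.column_stack([np.ones(n), x])
Y = y.reshape(-1, 1)

XtX_inv = np.linalg.inv(X.T @ X)             # elements are the Gaussian multipliers c00, c01, c10, c11
B = XtX_inv @ X.T @ Y
SSE = float(Y.T @ Y) - float(B.T @ X.T @ Y)
MSE = SSE / (n - 2)

VarCov = MSE * XtX_inv                       # variance - covariance matrix of b0 and b1
var_b0, var_b1 = VarCov[0, 0], VarCov[1, 1]
cov_b0_b1 = VarCov[0, 1]

t = float(B[1]) / np.sqrt(var_b1)            # t statistic for H0: beta1 = 0
print(var_b0, var_b1, cov_b0_b1, t)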

 


H. MULTIPLE REGRESSION

Application of matrix procedures to multiple regression first requires calculation of the X'X, X'Y and Y'Y matrices for the dependent variable Y and the independent variables X1 and X2.  For a 2 factor multiple regression, these matrices are;

            X'X  =  [ n       ΣX1       ΣX2    ]        X'Y  =  [ ΣY   ]        Y'Y  =  [ ΣY² ]
                    [ ΣX1     ΣX1²      ΣX1X2  ]                [ ΣX1Y ]
                    [ ΣX2     ΣX1X2     ΣX2²   ]                [ ΣX2Y ]

As with the simple linear regression, these sums, sums of squares and cross products are required by any method of fitting multiple regression. Once these values are obtained, application of formulas for an algebraic solution is relatively easy for a two-factor model.  However, matrix procedures are more easily expanded to more than two independent variables than are summation notation formulas.

The inversion technique we will use is called the sweepout technique, and it requires the application of "row operations".  Row operations consist of (1) multiplying any row by a scalar value, and (2) adding or subtracting any row from any other row. These are the only operations required to complete the sweepout technique after the matrices have been obtained and augmented. 

Obtaining a maximum of information from the technique requires reducing the X'X matrix one column at a time to an identity matrix. However, the values of the regression coefficients, error sum of squares and inverse matrix will be correct even if the row operations are not applied in a column by column reduction.

By "sweeping" out each column of the X'X matrix one by one to obtain an identity matrix, the sequentially adjusted sums of squares error can also be obtained.  This requires augmenting the X'X matrix with the X'Y matrix and an identity matrix prior to applying the row operations. The complete augmented matrix is given below.  The matrix has separate sections that are recognizable as matrices seen earlier.  This type of sectioned matrix is called a partitioned matrix.  

            [ X'X    X'Y    I ]
            [ Y'X    Y'Y    0 ]

Sections of the matrix may be left off if less information is required.  For example, if only the regression coefficients are needed, then the sweepout technique need be applied only to the matrix,

            [ X'X    X'Y ]

and if only the inverse is required, the only matrix needed is

            [ X'X    I ].

The regression coefficients and sum of squares error can be obtained by sweeping out the matrix,

            [ X'X    X'Y ]
            [ Y'X    Y'Y ].

If the above matrix is swept out column by column, then it will also provide the sequentially adjusted sums of squares.  Only the use of the complete augmented matrix provides the inverted X'X matrix necessary to obtain the variance - covariance matrix, confidence limits and other types of sums of squares. 
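The sweepout can be written as a short, generic routine. The sketch below (the function name sweepout is ours) follows the description above: augment X'X with X'Y, an identity matrix and the Y'Y row, then apply the two row operations column by column. The data are randomly generated for illustration and are not the Snedecor and Cochran example.

import numpy as np

def sweepout(XtX, XtY, YtY):
    """Column-by-column sweepout of the complete augmented matrix
       [ X'X  X'Y  I ]
       [ Y'X  Y'Y  0 ]
    Returns the regression coefficients, SSE, (X'X)^-1 and the
    sequentially adjusted sums of squares (correction factor first)."""
    k = XtX.shape[0]
    M = np.vstack([np.hstack([XtX, XtY, np.eye(k)]),
                   np.hstack([XtY.T, YtY, np.zeros((1, k))])])
    seq_ss = []
    for j in range(k):
        before = M[k, k]                       # current value in the Y'Y position
        M[j, :] = M[j, :] / M[j, j]            # row operation 1: scale the pivot row so value(j,j) = 1
        for i in range(k + 1):
            if i != j:
                M[i, :] -= M[i, j] * M[j, :]   # row operation 2: subtract a multiple of the pivot row
        seq_ss.append(before - M[k, k])        # reduction in Y'Y = SS adjusted for this column
    B = M[:k, k:k + 1]                         # regression coefficients replace X'Y
    SSE = M[k, k]                              # remaining value in the Y'Y position
    XtX_inv = M[:k, k + 1:]                    # (X'X)^-1 replaces the identity block
    return B, SSE, XtX_inv, seq_ss

# Illustrative two-factor data (n = 17, but not the textbook example).
rng = np.random.default_rng(1)
n = 17
X = np.column_stack([np.ones(n), rng.normal(5, 2, n), rng.normal(20, 4, n)])
Y = (2 + 0.5*X[:, 1] + 0.1*X[:, 2] + rng.normal(0, 1, n)).reshape(-1, 1)

B, SSE, XtX_inv, seq_ss = sweepout(X.T @ X, X.T @ Y, Y.T @ Y)
print(B.ravel())                                   # b0, b1, b2
print(SSE, seq_ss)                                 # error SS and sequential SS: CF, SS(X1|X0), SS(X2|X0 X1)
print(np.allclose(X.T @ X @ XtX_inv, np.eye(3)))   # check: X'X * (X'X)^-1 = I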

The technique will be illustrated with an example using data from Snedecor and Cochran (1981; ex. 17.2.1). The example will employ the complete augmented matrix.  The original data matrices are;

            X'X =                X'Y =                Y'Y = [103075]

The augmented matrix to be swept is then,

The first step in the sweepout technique is to multiply through the first row by the inverse of 17.  This will result in a value of 1 in the first row - first column.  A multiple of this new first row is then subtracted from each of the other rows (2, 3 and 4).  The multiplier should be such that value(i,1)–[value(1,1)*multiplier] = 0 for i ≠ 1.

 

The multiplier which accomplishes this is simply the value(i,1) since the new value(1,1) is unity (1). Therefore, every value(i,j) will be processed in the same way. The calculations would be,

            for row 2: value(2,j) – (value(1,j)  * 118.2)

            for row 3: value(3,j) – (value(1,j)  *   700)

            for row 4: value(4,j) – (value(1,j)  *  1295)

 

After applying these transformations we obtain the following matrix,

COLUMN 1 SWEEP

At this point the effect of X0 (the intercept) has been removed from the model.  The value replacing Y'Y is 4426.471.  This is the corrected sum of squares of Y (i.e. Y'Y was 103075, and has now been corrected for the mean, yielding 4426.471). 

The sweepout now proceeds to the second column.  A value of 1 is needed in the second column - second row to proceed with the development of the identity matrix. This is obtained by multiplying through the second row by the inverse of the value presently in that position (i.e. 1519.30).  Then, the appropriate multiple of the new row 2 is subtracted from each of the other rows. Note that the first column remains unchanged since the value subtracted is always a multiple of zero.

 

COLUMN 2 SWEEP

The sweep then proceeds with the third column. Once again a value of 1 is required in row 3, column 3, and all rows other than row 3 will have a multiple of row 3 subtracted from them.

 

COLUMN 3 SWEEP

 

Once this swept out matrix has been obtained, most of the commonly desired calculations follow easily.  Some of these results are discussed below. 

 

There are also several checks which can be done on the calculations.  As the matrix is swept out, the null matrix (the matrix of zeros in the original augmented matrix) is replaced by the negative values of the regression coefficients if the calculations have been done correctly. As a second check, the product of the original X'X matrix and its inverse should produce an identity matrix (i.e. X'X * (X'X)⁻¹ = I).

REGRESSION COEFFICIENTS

The regression coefficients are produced during the sweepout, replacing the X'Y matrix. The model for the analysis above is,

                        Ŷ  =  b0 + b1X1 + b2X2

                       

SEQUENTIALLY ADJUSTED SUMS OF SQUARES 

As each column is swept out, the sums of squares are "adjusted" for the factor removed. The first sweep adjusts for the intercept (i.e. the value n on the diagonal of X'X), so the reduction in Y'Y is the correction factor or the adjustment for the mean.

                        e.g.  C.F. = 103075 - 4426.470 = 98648.530

The second sweep adjusts for the second term in the X matrix, usually X1, and the reduction in the error term is the sum of squares attributable to X1 (given that X0 is already in the model). 

                        e.g.  SS(X1|X0) = 4426.470 - 2131.236 = 2295.234

The third sweep adjusts for X2 and the reduction in the sum of squares is attributable to X2 (given that X0 and X1 are already in the model).

                        e.g.  SS(X2|X0 X1) = 2131.236 - 2101.291 = 29.945 

Finally, the remaining sum of squares is the error sum of squares

                        SSE = 2101.291

Note that since the variables are adjusted sequentially, the sums of squares obtained are dependent on the order in which the variables are entered. That is, if we had entered X2 first and X1 second, the sums of squares attributable to these two variables would not be the same as the results obtained above.  Only the correction factor would be the same (since the intercept would have been entered first in both models).

Each adjustment of the sum of squares takes one degree of freedom. The residual sum of squares has (n–k) degrees of freedom, where n is the number of observations, and k is the number of sweeps, or the number of columns in the X'X matrix.  The mean square error is then,

                        MSE  =  SSE / (n–k)  =  2101.291 / (17–3)  =  150.092

PARTIAL SUMS OF SQUARES

Since the sequentially adjusted sums of squares are dependent on the order in which the variables are entered, another value of interest is the partial sum of squares or the uniquely attributable sum of squares.  This is simply the sum of squares that would be accounted for by each variable if it had been entered into the model in last place.  This value could be obtained by reversing the sweep operation, and observing the change in the sum of squares as each variable was swept back into the model. 

The only change in the sum of squares when variable Xi is swept back into the model is  bi²/cii,

So this calculation will give the partial SS due to variable Xi without actually doing all the calculations necessary to reverse the sweepout technique.  The elements (cii) are obtained from the (X'X)⁻¹ matrix and are called Gaussian multipliers. 

The partial SS due to X2 above does not change since it was the variable entered in the last position.  The partial SS due to X1 would be calculated as,

                        SS(X1|X0 X2)  =  b1² / c11
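A minimal sketch of this shortcut, using the same made-up two-factor data as the sweepout example: each partial SS is the squared coefficient divided by its Gaussian multiplier from (X'X)⁻¹.

import numpy as np

rng = np.random.default_rng(1)
n = 17
X = np.column_stack([np.ones(n), rng.normal(5, 2, n), rng.normal(20, 4, n)])
Y = 2 + 0.5*X[:, 1] + 0.1*X[:, 2] + rng.normal(0, 1, n)

XtX_inv = np.linalg.inv(X.T @ X)             # cii values are the Gaussian multipliers
B = XtX_inv @ X.T @ Y

partial_SS_X1 = B[1]**2 / XtX_inv[1, 1]      # SS(X1 | X0 X2)
partial_SS_X2 = B[2]**2 / XtX_inv[2, 2]      # SS(X2 | X0 X1)
print(partial_SS_X1, partial_SS_X2)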

 

VARIANCE - COVARIANCE MATRIX 

Another major result of the sweepout technique is the inverse of the X'X matrix.  Multiplying this matrix by the mean square error (MSE) gives the variance - covariance matrix of the regression coefficients.

            e.g.       VarCov  =  MSE * (X'X)⁻¹  =

                       

so, Var(b0) = 97.0149,  Var(b1) = 0.1175,  Var(b2) = 0.0618,  Cov(b1,b2) = 0.0340, etc. 

The variance - covariance matrix can also be used to obtain confidence intervals about estimates of Ŷ for particular values of X1 and X2. The most versatile approach is to use matrix algebra in these calculations.  The equation is

                        Var(Ŷ)  =  L (X'X)⁻¹ L' * MSE

where L is a vector of X values corresponding to Ŷ.  It may also be a vector of hypothesized X values for which a variance is needed. 

For example, if we wish to predict the response (Ŷ) and its variance when X1 = 4 and X2 = 24, first we would calculate the response,

                         

Using L = [ 1  4  24 ]  (note that a 1 is included for the intercept), the variance of the estimate is then,

                        Var(Ŷ)  =  L (X'X)⁻¹ L' * MSE  =  24.678

and the standard error is  √24.678  =  4.9677. 
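The same calculation in NumPy, again with the made-up two-factor data rather than the handout's numbers, so the printed values will differ from those above.

import numpy as np

rng = np.random.default_rng(1)
n = 17
X = np.column_stack([np.ones(n), rng.normal(5, 2, n), rng.normal(20, 4, n)])
Y = 2 + 0.5*X[:, 1] + 0.1*X[:, 2] + rng.normal(0, 1, n)

XtX_inv = np.linalg.inv(X.T @ X)
B = XtX_inv @ X.T @ Y
MSE = np.sum((Y - X @ B)**2) / (n - 3)

L = np.array([1.0, 4.0, 24.0])               # 1 for the intercept, then X1 = 4 and X2 = 24
y_hat = L @ B                                # predicted response
var_y_hat = L @ XtX_inv @ L * MSE            # Var(Y-hat) = L (X'X)^-1 L' * MSE
se_y_hat = np.sqrt(var_y_hat)                # standard error
print(y_hat, var_y_hat, se_y_hat)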

 

The sweepout technique is not the only method of matrix inversion. However, its application to the augmented matrix described above is a relatively simple and versatile method of obtaining most of the results commonly desired from a multiple regression analysis. 

 

REFERENCES

Goodnight, J. H. 1978. The sweep operator: Its importance in statistical computing. In: Proc. Eleventh Annual Symposium on the INTERFACE, Gallant, A. R. and T. M. Gerig (eds.), Institute of Statistics, North Carolina State University, Raleigh, N.C.




Modified: August 16, 2004
James P. Geaghan