220C - Spring 2009
Example R code for multivariate regression

This gives a quick review of working with multivariate models in R. 

If you want some fancier graphical displays, you can also check out the 'heplots' package.  

 
Code related to HW3

Below is some code you might find useful for homework 3.

## defining a matrix:
##
## The 'matrix' command allows you to pass a list of data and format it into
## a matrix.  It will fill the data down rows by default unless you set
## byrow=TRUE.  If you tell it to make a matrix with more entries than the
## the data you provide will fill, it will repeat the data.

## Example 1:
x = matrix( c(1,2,3,4), nrow=2, ncol=2)  # Creates a 2x2 matrix
x

x = matrix(c(1,2,3,4), nrow=4,ncol=2) # Repeats the data to create a 4x2 matrix
x

## Example 2:
x = matrix( c(1,2,3,4), nrow=2, ncol=2, byrow=TRUE)
x

x = matrix(c(1,2,3,4), nrow=4,ncol=2, byrow=TRUE)
x

## Example 3:
x = matrix (c(6,9,10,6,8,3), nrow=3, ncol=2, byrow=TRUE)
x


## Matrix multiplication is performed using the %*% notation.  You can take a
## transpose of a matrix using the function t().  You can find the inverse of
## a matrix using the solve() command.
##
## All of the usual operators will work on a matrix, by on an
## element-by-element basis, NOT via matrix operations.

## Example 4:

x*x  # Note that this shouldn't be possible, since the dimensions don't
     # conform.  This performs element-wise multiplication.

x%*%x   # Here we try to do a matrix multiplication and get an appropriate
      # error message.

x%*%t(x) # By taking a transpose we can obtain a valid matrix multiplication

## Other useful functions include:  rowMeans
##                                  colMeans
##                                  rowSums
##                                  colSums
## Note that the capitalization is important.  These functions give you
## row-by-row and column-by-column means and sums.

## Example 5: center our data x using colMeans and assembling it into a
## matrix of the right size.

(means = matrix(colMeans(x), byrow=TRUE, nrow=3, ncol=2))

(Centered = x - means)

## Example 6: Create the S-matrix:

(S = (1/2)*t(Centered)%*%Centered)

## We can find S^-1 using solve:

(Sinv = solve(S))

## Now calculate the T statistic for the point (0,0)

U0 = matrix( c(0,0), nrow=1)

(U0mean = colMeans(x)-U0)

3*U0mean%*%Sinv%*%t(U0mean)

## This will be distributed as a constant times an f distribution with
## 2 and (3-1) degrees of freedom: (n-1)p / (n-p) * F_(p,n-p)

qf(.95, 2, 2)

(3-1)*2/(3-2) * qf(.95, 2, 2)

## We have that our T^2 value of 112 exceeds 76, so we reject the
## hypothesis that the data is centered.

## Let's look at a different set of data:

x = matrix(c(3,4,2,6,8,2,4,5,1,2,4,7), nrow=4, ncol=3)

## Now let's say we wish to investigate jointly x1-x2 and x1-x3.  Make the
## contrast vector a:

A = matrix( c(1, -1, 0, 1, 0, -1), nrow=2, byrow=T)

x_new = t(A%*%t(x))

## Are they both centered at zero?  Calculate a T^2:
## First get the centered x_new:

Centered = x_new - matrix(colMeans(x_new), byrow=T, nrow=4, ncol=2)

## Now our S matrix and its inverse
S = (1/3)*t(Centered)%*%Centered
Sinv = solve(S)

## Now our mean vector that we want to test
U0 = matrix(c(0,0),nrow=1)

## Finally calculate our T^2 stat
(T2 = 4*(colMeans(x_new) - U0)%*%Sinv%*%t(colMeans(x_new)-U0))

qf(0.95, 2, 2)

((4-1)*2 / (4-2)) * qf(0.95, 2, 2)

## So we accept the null hypothesis that the differences x1-x2 and x1-x3
## both seem to be centered at (0,0)

################# MANOVA code:

# Here we construct some data, 20 cases from 1 MVN distribution and
# 20 from another.  In this case we're introducing independence, but this works
# fine with dependence as well.
x1 = c(rnorm(20, 0, 1), rnorm(20, 3, 1))
x2 = c(rnorm(20, 1, 4) ,rnorm(20, -1, 2))
x3 = c(rnorm(20, 12, 1), rnorm(20, 10, 1))

# Now combine them all into 1 response.
Y = cbind(x1,x2,x3)

# We'll make a set of indicator variables -- zeros and ones -- to distinguish
# the two groups
indic = c(rep(0,20), rep(1,20))

# Now call manova in much the same way as we do e.g. lm
m1 = manova(Y~indic)

# Show a test summary, indicating that (unsurprisingly) our "indic" variable
# is highly significant, indicating that the means in the two groups are
# different.  We're using Wilk's test since that's what's in the book.  There's
# also the Pillai, Roy, and Hotelling-Lawley tests.
summary(m1, test="Wilks")

# Show an AOV breakdown for each individual response.  Note that x2 isn't as
# significant, which shouldn't be surprising since the variances were fairly
# large.
summary.aov(m1)

 
Office hours
Office hours will be from 1pm to 2pm in my office, South Hall graduate student tower, 5431-R.