|
Below is some code you might find useful for homework 3. ## defining a matrix: ## ## The 'matrix' command allows you to pass a list of data and format it into ## a matrix. It will fill the data down rows by default unless you set ## byrow=TRUE. If you tell it to make a matrix with more entries than the ## the data you provide will fill, it will repeat the data.
## Example 1: x = matrix( c(1,2,3,4), nrow=2, ncol=2) # Creates a 2x2 matrix x
x = matrix(c(1,2,3,4), nrow=4,ncol=2) # Repeats the data to create a 4x2 matrix x
## Example 2: x = matrix( c(1,2,3,4), nrow=2, ncol=2, byrow=TRUE) x
x = matrix(c(1,2,3,4), nrow=4,ncol=2, byrow=TRUE) x
## Example 3: x = matrix (c(6,9,10,6,8,3), nrow=3, ncol=2, byrow=TRUE) x
## Matrix multiplication is performed using the %*% notation. You can take a ## transpose of a matrix using the function t(). You can find the inverse of ## a matrix using the solve() command. ## ## All of the usual operators will work on a matrix, by on an ## element-by-element basis, NOT via matrix operations.
## Example 4:
x*x # Note that this shouldn't be possible, since the dimensions don't # conform. This performs element-wise multiplication.
x%*%x # Here we try to do a matrix multiplication and get an appropriate # error message.
x%*%t(x) # By taking a transpose we can obtain a valid matrix multiplication
## Other useful functions include: rowMeans ## colMeans ## rowSums ## colSums ## Note that the capitalization is important. These functions give you ## row-by-row and column-by-column means and sums.
## Example 5: center our data x using colMeans and assembling it into a ## matrix of the right size.
(means = matrix(colMeans(x), byrow=TRUE, nrow=3, ncol=2))
(Centered = x - means)
## Example 6: Create the S-matrix:
(S = (1/2)*t(Centered)%*%Centered)
## We can find S^-1 using solve:
(Sinv = solve(S))
## Now calculate the T statistic for the point (0,0)
U0 = matrix( c(0,0), nrow=1)
(U0mean = colMeans(x)-U0)
3*U0mean%*%Sinv%*%t(U0mean)
## This will be distributed as a constant times an f distribution with ## 2 and (3-1) degrees of freedom: (n-1)p / (n-p) * F_(p,n-p)
qf(.95, 2, 2)
(3-1)*2/(3-2) * qf(.95, 2, 2)
## We have that our T^2 value of 112 exceeds 76, so we reject the ## hypothesis that the data is centered.
## Let's look at a different set of data:
x = matrix(c(3,4,2,6,8,2,4,5,1,2,4,7), nrow=4, ncol=3)
## Now let's say we wish to investigate jointly x1-x2 and x1-x3. Make the ## contrast vector a:
A = matrix( c(1, -1, 0, 1, 0, -1), nrow=2, byrow=T)
x_new = t(A%*%t(x))
## Are they both centered at zero? Calculate a T^2: ## First get the centered x_new:
Centered = x_new - matrix(colMeans(x_new), byrow=T, nrow=4, ncol=2)
## Now our S matrix and its inverse S = (1/3)*t(Centered)%*%Centered Sinv = solve(S)
## Now our mean vector that we want to test U0 = matrix(c(0,0),nrow=1)
## Finally calculate our T^2 stat (T2 = 4*(colMeans(x_new) - U0)%*%Sinv%*%t(colMeans(x_new)-U0))
qf(0.95, 2, 2)
((4-1)*2 / (4-2)) * qf(0.95, 2, 2)
## So we accept the null hypothesis that the differences x1-x2 and x1-x3 ## both seem to be centered at (0,0) ################# MANOVA code: # Here we construct some data, 20 cases from 1 MVN distribution and # 20 from another. In this case we're introducing independence, but this works # fine with dependence as well. x1 = c(rnorm(20, 0, 1), rnorm(20, 3, 1)) x2 = c(rnorm(20, 1, 4) ,rnorm(20, -1, 2)) x3 = c(rnorm(20, 12, 1), rnorm(20, 10, 1))
# Now combine them all into 1 response. Y = cbind(x1,x2,x3)
# We'll make a set of indicator variables -- zeros and ones -- to distinguish # the two groups indic = c(rep(0,20), rep(1,20))
# Now call manova in much the same way as we do e.g. lm m1 = manova(Y~indic)
# Show a test summary, indicating that (unsurprisingly) our "indic" variable # is highly significant, indicating that the means in the two groups are # different. We're using Wilk's test since that's what's in the book. There's # also the Pillai, Roy, and Hotelling-Lawley tests. summary(m1, test="Wilks")
# Show an AOV breakdown for each individual response. Note that x2 isn't as # significant, which shouldn't be surprising since the variances were fairly # large. summary.aov(m1)
|