|
Thursday, 27 November 2008 07:59 |
attach(stackloss) library(MASS)
## We won't cover doing least squares since everyone should be pretty ## comfortable with that at this point.
###################################### ## Huber method ######################################
## The MASS library contains the rlm function, ("robust linear models") ## which will let us do the Huber estimator without a lot of grief.
m_huber <- rlm(stack.loss~Air.Flow+Water.Temp+Acid.Conc., data=stackloss) summary(m_huber)
## If we want to play with the constant c, then we need to invoke rlm ## a bit differently
# First we inspect psi.huber to see the defaults psi.huber
# define a new weight function with a different constant, in this case 1.1 psi.huber2<-function(u)psi.huber(u, k=1.1)
# Observe the difference psi.huber(c(1,1.1,1.11, 1.345, 1.346)) psi.huber2(c(1,1.1,1.11, 1.345, 1.346))
# refit m_huber_2 <- rlm(stack.loss~., data=stackloss, psi=psi.huber2)
###################################### ## LAD ######################################
## We could just recycle the rlm function, since it does basic m-estimation, ## by passing it a different m function. Recall that we can get a weight ## function of 1/u for LAD...
psi.LAD <- function(u)1/u
m_LAD <- rlm(stack.loss~., data=stackloss, psi=psi.LAD)
## But we run into a missing weights problem (division by zero). There's two ## ways to fix this. The 'cheap trick' way is to simply assign the weight ## function the biggest weight possible if u=0...
psi.LAD2 <- function(u, deriv=0) return(abs(pmin(.Machine$double.xmax, 1/u))) m_LAD_hack <- rlm(stack.loss~., data=stackloss, psi=psi.LAD2) summary(m_LAD_hack)
## A better way is probably to get the quantreg package and use the rq ## function in that.
library(quantreg) m_LAD <- rq(stack.loss~., data=stackloss) summary(m_LAD)
## It turns out our nasty hack wasn't actually all that bad an approximation.
################### ## Least trimmed squares ###################
## Faraway recommends the 'lqs' package for this, which has since been rolled ## into MASS, so no need to install yet another package. You can use the ## ltsreg as in faraway, or you can use the lqs function (which does exactly ## the same thing; ltsreg is just a 'wrapper' function for lqs at this time)
## There is a catch, though. m_lts1 = ltsreg(stack.loss ~., data=stackloss) m_lts2 = ltsreg(stack.loss ~., data=stackloss) m_lts1$coeff m_lts2$coeff
## if you have more than one predictor, it's a complex problem to get the ## minimization exact. R does a random search to try and minimize it, with ## the net effect that you might see a small difference in results each time ## you run it. For SMALL data sets you can add 'nsamp=exact' to the command:
m_lts_exact = ltsreg(stack.loss ~., data=stackloss, nsamp='exact')
## But the computational time goes up something link k^n, where n is the size ## of the data set, so it doesn't take much to make it totally unreasonable.
############################################################ ### Bootstrapping ############################################################ ## I'm only doing these two for the linear model. You should also think about what might ## be appropriate to do for the other models/estimation techniques. ################### ##### Case resampling ###################
repl = 1000 coeff=matrix(0,1000,4)
for(i in 1:repl){ temp_model <- lm(stack.loss ~ ., data = stackloss[sample(21, rep=T),]) coeff[i,] = temp_model$coeff }
par(mfrow=c(2,2)) plot(density(coeff[,1]), main="Intercept") plot(density(coeff[,2]), main="Air.Flow") plot(density(coeff[,3]), main="Water.temp") plot(density(coeff[,4]), main="Acid.Conc.")
################### ##### Error resampling ###################
## We'll just use the raw residuals. Is this the best choice?
lm_resids = lm(stack.loss ~., data=stackloss)$resi lm_coeffs = lm(stack.loss ~., data=stackloss)$coeff
for(i in 1:repl){ newYi = as.matrix(cbind(rep(1,21), Air.Flow, Water.Temp, Acid.Conc.)) %*% (as.matrix(lm_coeffs)) + lm_resids[sample(21, rep=T)] temp_model <- lm(newYi ~ Air.Flow + Water.Temp + Acid.Conc.) coeff[i,] = temp_model$coeff } par(mfrow=c(2,2)) plot(density(coeff[,1]), main="Intercept") plot(density(coeff[,2]), main="Air.Flow") plot(density(coeff[,3]), main="Water.temp") plot(density(coeff[,4]), main="Acid.Conc.")
|
|
Thursday, 20 November 2008 18:39 |
|
Here's the code: ## Local regression # # When we can't fit a linear model, often it makes sense to fit a piecewise # model, where we estimate the function at a particular point by using # information from nearby points. # # Two examples of this are spline fitting and LOESS (or LOWESS) regression. # We already applied spline fitting to the mcycle data in MASS (and Dr. Wang teaches # an excellent class on smoothing splines), so let's look at LOESS. # ######### # LOESS regression ######### # We can approximate any function in a neighborhood about a single data point # by it's taylor series expansion. Denote the expansion about x at the point xi # # f(xi|x) ~~ a0 + a1(xi-x) + 1/2 a2(xi - x)^2 + 1/6 a3(xi - x)^3 + ... # # We will truncate this at the second term (although we can use more if we # want) and try to find the coefficients a0, a1, a2. # # We want to consider points "in the neighborhood" of x, and apply more # weight to those points that are closer to x. There are many weight functions # that can be used, but we're going to use the most common one, the "tri-cube" # function: # # w(x) = (1 - abs(x)^3)^3 for |x| < 1 # 0 for |x| >= 1 # # Let's also introduce a "bandwidth" parameter -- h -- which governs "how # far" in each direction we're willing to look (the size of the "neighborhood" # around x). # # For each point xi around x, we'll assign it weight w((xi-x)/h); note # that for |xi-x|>h, this will make the weight 0 (since w(x) = 0 for |x| > 1).
library(MASS) data(mcycle) attach(mcycle) plot(mcycle)
# Let's find the local fit around 14 msec; roughly when the first 'elbow' in # in the data is, and use a bandwidth of 1.5.
BW = 5 ctr = 15
# Here's our weight function w <- function(x) max((1 - abs(x)^3)^3,0)
# And here's a plot of it tx = as.matrix(seq(-1,1, len=100)) plot(apply(tx,1,w))
# generate the scaled delta-times dt = as.matrix((times - ctr)/BW)
# and the list of weights we need for WLS wlist = apply(dt, 1, w)
# Now make our predictors: (xi-x), (xi-x)^2 X = cbind( times-ctr, (times-ctr)^2)
# And use the 'weights' option in lm to do WLS regression M_local = lm(accel~X, weights = wlist) summary(M_local) B = M_local$coeff
local_x = as.matrix(seq(ctr-BW/5, ctr+BW/5, len=5*BW)) local_fit<-function(x) B[1] + B[2]*(x-ctr) + (B[3]/2)*(x-ctr)^2
pred = apply(local_x, 1, local_fit) plot(local_x,pred, xlim=c(min(times), max(times)), ylim=c(min(accel), max(accel))) points(mcycle)
## Obviously, if we shift the center a bit we should expect a different fit, # but it should be close.
ctr = 13 BW = 5
B2 = lm(accel~cbind( times-ctr, (times-ctr)^2), weights = apply(as.matrix((times - ctr)/BW), 1, w))$coeff local_fit2<-function(x) B2[1] + B2[2]*(x-ctr) + (B2[3]/2)*(x-ctr)^2 local_x2 = as.matrix(seq(ctr-BW/5, ctr+BW/5, len=5*BW)) pred2 = apply(local_x2, 1, local_fit2)
plot(c(local_x, local_x2), c(pred, pred2), xlim=c(min(times), max(times)), ylim=c(min(accel), max(accel))) points(mcycle, pty=2)
ctr = 16 BW = 5
B3 = lm(accel~cbind( times-ctr, (times-ctr)^2), weights = apply(as.matrix((times - ctr)/BW), 1, w))$coeff local_fit3<-function(x) B3[1] + B3[2]*(x-ctr) + (B3[3]/2)*(x-ctr)^2 local_x3 = as.matrix(seq(ctr-BW/5, ctr+BW/5, len=5*BW)) pred3 = apply(local_x3, 1, local_fit3)
plot(c(local_x, local_x2, local_x3), c(pred, pred2,pred3), xlim=c(min(times), max(times)), ylim=c(min(accel), max(accel))) points(mcycle, pch=46)
local_fit3(16)
## Of course, we can also just use the built-in R package. "span" is the smoothing parameter, and in this case # we are using 25% of the data per point, for a bandwidth of about 14 mloess = loess(accel~times, degree=2, span=0.25) x = seq(min(times), max(times), len=200) plot(times, mloess$fit, type='l',xlim=c(min(times), max(times)), ylim=c(min(accel), max(accel))) points(mcycle)
summary(mloess)
####################### ### #######################
library(assist) ssr(accel~times, kernel = cubic(times), scale = 1)
times2 = (times-min(times))/diff(range(times))
spline = ssr(accel~times2, rk = cubic(times2), scale = 1)
plot(spline, ask=1) 2 0 points(mcycle$accel~times2) |
|
Thursday, 06 November 2008 05:13 |
|
Here's the R code for today's section: library(MASS) attach(mcycle) plot(accel~times)
# This is obviously not linear, fitting a # linear model is inappropriate
m1 = lm(accel~times) summary(m1)
par(mfrow=c(2,2)) plot(m1)
# Terrible: trends in the residuals, clear # nonconstant variance, poor normality # # # Let's try fitting a polynomial model: # y ~ 1 + x + x^2 + x^3 # # We'll use the I() function in the model; this # insures that the actual timepoint squared gets fit # rather than the model term being squared
# lm(y ~ x*y) <- fit y = b1 + b2*x + b3*y +b4*x*y # lm(y ~ I(x*y)) <- fit y= b1 + b2*x*y
m2 = lm(accel ~ times + I(times^2) + I(times^3))
summary(m2) plot(m2)
par(mfrow=c(1,1)) plot(m2$fit~times, type='l') points(accel~times)
# We still have something that looks significant # but the diagnostics don't look that good; the # pattern is still evident in the residuals v # fitted plot. # # Fit a higher polynomial?
m3 = lm(accel ~ times + I(times^2) + I(times^3) + I(times^4) + I(times^5)) par(mfrow=c(2,2)) summary(m3) plot(m3)
par(mfrow=c(1,1)) plot(m3$fit~times, type='l') points(accel~times)
m4 = lm(accel ~ times + I(times^2) + I(times^3) + I(times^4) + I(times^5) + I(times^6) + I(times^7) + I(times^8) + I(times^9)) par(mfrow=c(2,2)) summary(m4) plot(m4)
par(mfrow=c(1,1)) plot(m4$fit~times, type='l') points(accel~times)
X = as.matrix(cbind(rep(1,133), times, times^2, times^3, times^4)) eigen(solve((t(X)%*%X)))
# Note the huge difference in eigenvalues
X2 = as.matrix(cbind(rep(1,133), times, times^2, times^3, times^4, times^5, times^6, times^7)) eigen(solve((t(X2)%*%X2)))
# And the difference gets even worse; near singlular # matrix --> bad fits and high variance
# Another problem -- we've got (obviously) highly # colinear predictors; note the near-zero eigenvalues # # So what can we do? # This is a case where linear models can't help. This # would be more appropriately handled via time series # analysis or nonparametric analysis (which we won't # get into here, but here's a spline model just for # fun).
library(assist) times2 = times/max(times) splinemodel = ssr(accel~times2, rk=cubic(times2)) plot(splinemodel, ask=TRUE) 2 0 points(accel~times2)
################################################################# ################################################################# #################################################################
attach(trees)
LVolume = log(Volume) m1 = lm(LVolume ~ Girth*I(Girth^2)) m2 = lm(LVolume ~ Girth+I(Girth^2)+I(Girth^3))
summary(m1) summary(m2)
# Note that these work out the same -- including # interaction terms between two polynomial terms is # slightly pointless.
m3 = lm(LVolume ~ (Girth+I(Girth^2))*(Height + I(Height^2))) summary(m3) par(mfrow=c(2,2)) plot(m3)
drop1(m3, test='Chisq') drop1(m3, test='F') drop1(m3)
drop1(m3, k=log(31))
step(m3, direction="both")
|
|
Thursday, 30 October 2008 01:03 |
|
Here's the R code from class today, covering two different kinds of nonparametric randomization tests. A = c(13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3) B = c(14, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6)
Diff = A - B Dlist = NULL
for (lc in 1:10000) { flips = 2*rbinom(10, 1, 0.5) - 1 Dlist = c(Dlist, mean(Diff*flips)) }
md = mean(Diff)
sum(Dlist<md)/10000 sum(Dlist>-md)/10000
#########
library(boot)
head(claridge) hist(claridge$hand)
RH = mean(claridge[claridge[,2]==1, 1]) LH = mean(claridge[claridge$hand!=1, 1])
HDlist = NULL
for(i in 1:50000) { hand_r = sample(claridge$hand) HDlist = c(HDlist, mean(claridge[hand_r == 1,1]) - mean(claridge[hand_r!=1,1])) }
sum( (RH-LH) > HDlist)/10000 sum( -(RH-LH) < HDlist)/10000
sum( (RH-LH) > HDlist)/50000 + sum( -(RH-LH) < HDlist)/50000
|
|
Wednesday, 22 October 2008 23:39 |
|
Here's the R code for today: ### First we fix the seed so that we get the same random numbers every time set.seed(1) N = 5 ## Generate a fairly tidy regression xreal = (runif(N)-0.5)*10 yreal = 2.2 * xreal + rnorm(N, 0, 1)
## Nice and linear plot(xreal,yreal)
## Add a high-leverage point that doesn't follow the orginal regression x = c(xreal, 10) y = c(yreal, 3.2*10)
## Note that -- when we 'zoom out' like this -- it still looks pretty linear plot(x,y)
## Here's our "real" linear fit m1 = lm(yreal~xreal) summary(m1)
## And here's our fit with the outliers; notice that it's still ## highly significant. m2 = lm(y~x) summary(m2)
## If we just look at the residuals, we don't see anything wrong. The high ## leverage of the outlier has forced the fit to conform to that outlying ## point, masking it's effect plot(x,m2$res)
## Make our predictor variable matrix... xmat = matrix(c(rep(1, N+1), x), N+1, 2) ## Transform it to a hat matrix... hat = xmat%*%solve(t(xmat)%*%xmat)%*%t(xmat) ## And use the diag command to extract just the diagonal elements; notice the ## high value for the last point diag(hat)
## Or save time and use the influence function... infs = influence(m2) infs$hat
## The diagonal matrix sums to the number of parameters in the model (2: a mean ## term and an intercept term sum(diag(hat))
## We have 12 data points, so our cutoff is... 2*2/(N+1) ## about 1/3 is our cutoff for high leverage. Our outlier qualifies.
## Let's look at standardized residuals: plot(stdres(m2)) ## This still doesn't look too outrageous
## If we plot the jackknife residuals: plot(studres(m2)) ## Now our bad point is obviously out of place.
## If you've forgotten the incredibly useful 'names' function, now's a good time ## to remind yourself of it... m2summary = summary(m2) names(m2summary) m2summary$sig
## Calculate standardized residuals by hand, just for practice r = m2$resid/(m2summary$sigma * sqrt(1 - infs$hat))
cbind(r, stdres(m2))
## Cook's distance: cd = cooks.distance(m2) plot(x,cd)
################################# ################################# ################################# ## Co-linearity N=20 x1 = (runif(N)-0.5)*10 x2 = x1 + 3+(runif(N)-0.5)*2
y = 3*x1 + x2 + rnorm(N)
## Checking the correlations we see that they're actually pretty bad... cor(cbind(y,x1,x2))
## But fit the mode anyway m3 = lm(y~x1+x2) summary(m3) # Note the large variance on each term
xmat = matrix( c(rep(1, N), x1, x2), N, 3) eigen((t(xmat)%*%xmat)) ## We have eigenvalues that differ by an order of magnitude or so
## The cheap fix: drop one or the other co-linear terms. This generally isn't the # ideal approach, but it can sometimes help. Ridge regression is better. m3a = lm(y~x1) summary(m3a)
m3b = lm(y~x2) summary(m3b)
|
|
Thursday, 16 October 2008 18:25 |
|
Here's the R code I used in section. You can just copy and paste this to a script file: library(MASS) data(mammals) attach(mammals)
plot(body,brain)
plot(density(body)) plot(density(brain))
##These look exponential-ish, so let's try a log transform ## and look at the densities again plot(density(log(body))) plot(density(log(brain)))
plot(log(body), log(brain))
lbrain=log(brain) lbody=log(body)
linmodel <- lm(brain~body) logmodel <- lm(lbrain~lbody)
summary(linmodel) summary(logmodel)
qqnorm(body) qqnorm(lbody)
qqnorm(brain) qqnorm(lbrain)
##Look at all the additional data stored about the linear model names(linmodel)
## We can access them with the '$' operator
linmodel$coefficients linmodel$fitted.values linmodel$residuals
## Some commands that can be used with lm objects
## Diagnostic plots par(mfrow=c(2,2)) plot(linmodel) plot(logmodel)
## ANOVA tests anova(linmodel) anova(logmodel)
## Confidence intervals for parameters confint(linmodel) confint(logmodel)
## Extract residuals through a function resid(linmodel) resid(logmodel)
## Standardized residuals plot(stdres(linmodel)) plot(stdres(logmodel))
## Studentized residuals plot(studres(linmodel)) plot(studres(logmodel))
## cook's distance of the data points plot(cooks.distance(linmodel)) plot(cooks.distance(logmodel))
## Find prediction and confidence intervals for new data
newlindata = data.frame(body=c(1, 10, 100)) predict(linmodel, newlindata,interval="prediction")
newlogdata = data.frame(lbody=c(0.1, 0.2, 0.3, 0.4)) predict(logmodel, newlogdata,interval="prediction")
predict(logmodel, interval="confidence", level=0.99999)
##################### ##################### ##################### rm(list=ls(all=TRUE))
data(cement)
attach(cement)
x0=rep(1,13)
x=as.matrix(cbind(x0,x1,x2,x3,x4))
(x)%*%solve(t(x)%*%x)%*%t(x)
### remember that our hat matrix x[(x'x)^-1]x' is what we use to calculate our ### regression coefficients. This looks ok. But...
solve(t(x)%*%x)
### There's a large difference here; nearly singular eigen(t(x)%*%x)
summary(lm(y~x))
### Really sensitive to small changes x2=x x2[12, 4]=11 x2[12, 2]=9
summary(lm(y~x2))
par(mfrow=c(2,2)) plot(lm(y~x)) plot(lm(y~x2))
|
|
Friday, 26 September 2008 20:12 |
|
Hi, and welcome to 220A. My office hours will be from 1:30 to 2:30 pm on Wednesdays. I'll be using this website mostly to post R code and examples. If you don't already have R, it's available on computers in the Rachev room, on most computers in the HSSB computer lab, and also freely available here. This will be the main environment used in the 220ABC series, so you should become familiar with it as quickly as possible. A useful 'cheat sheet' is available here, as well as some examples here (note that some of the later examples use GTK+, which may not be installed on your computer; it's pretty large, so I recommend against installing it unless you know what it is and know you'll use it later). If you have any questions, please email me and I'll respond as quickly as possible. |
|