A retrospective sample of males in a heart-disease high-risk region of the Western Cape, South Africa. There are roughly two controls per case of CHD. Many of the CHD positive men have undergone blood pressure reduction treatment and other programs to reduce their risk factors after their CHD event. In some cases the measurements were made after these treatments. These data are taken from a larger dataset, described in Rousseauw et al, 1983, South African Medical Journal.
sbp
systolic blood pressure
tobacco
cumulative tobacco (kg)
ldl
low densiity lipoprotein cholesterol
adiposity
famhist
family history of heart disease (Present, Absent)
typea
type-A behavior
obesity
alcohol
current alcohol consumption
age
age at onset
chd
response, coronary heart disease
Let's import the data.
#You can import data directly from the EOSL website
#loc="http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/SAheart.data"
loc="C:/Users/bli/Desktop/AppStat/datasets/SAheart.data"
tmp1=read.table(loc, sep=",",head=T,row.names=1)
head(tmp1)
## sbp tobacco ldl adiposity famhist typea obesity alcohol age chd
## 1 160 12.00 5.73 23.11 Present 49 25.30 97.20 52 1
## 2 144 0.01 4.41 28.61 Absent 55 28.87 2.06 63 1
## 3 118 0.08 3.48 32.28 Present 52 29.14 3.81 46 0
## 4 170 7.50 6.41 38.03 Present 51 31.99 24.26 58 1
## 5 134 13.60 3.50 27.78 Present 60 25.99 57.34 49 1
## 6 132 6.20 6.47 36.21 Present 62 30.77 14.14 45 0
Let's first change the famhist
to binary 0 and 1, and save the \(X\) and \(Y\) separately (convenient for glmnet
working environment.)
table(tmp1$famhist)
##
## Absent Present
## 270 192
tmp1$famhist=as.numeric(tmp1$famhist=="Present")
table(tmp1$famhist)
##
## 0 1
## 270 192
x=as.matrix(tmp1[,1:9])
y=tmp1[,10]
library(glmnet)
fit = glmnet(x, y, family = "binomial", alpha=1) #LASSO type penalty
plot(fit, xvar="norm") #Solution path
Let's run CV to select optimal \(\lambda\).
set.seed(1) #able to reproduce CV results
cv.fit = cv.glmnet(x, y, family = "binomial", type.measure = "class")
plot(cv.fit)
cv.fit
##
## Call: cv.glmnet(x = x, y = y, type.measure = "class", family = "binomial")
##
## Measure: Misclassification Error
##
## Lambda Index Measure SE Nonzero
## min 0.04824 15 0.2662 0.01612 5
## 1se 0.05295 14 0.2771 0.01512 5
coef(cv.fit, s = "lambda.min")
## 10 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) -3.026249239
## sbp .
## tobacco 0.042431208
## ldl 0.077881736
## adiposity .
## famhist 0.484727396
## typea 0.004487505
## obesity .
## alcohol .
## age 0.031404116
pred1=predict(cv.fit,newx=x,s="lambda.min",type="class") #class
table(pred1,y)
## y
## pred1 0 1
## 0 282 99
## 1 20 61
pred2=predict(cv.fit,newx=x,s="lambda.min",type="response") #probability
library(AUC)
plot(roc(pred2,factor(y)))
legend("bottomright",paste("AUC:",round(auc(roc(pred2,factor(y))),d=2),sep=""))