1          ***********************************************************************;
2 *** EXST7034 Homework Example ***;
3 *** Problem from Neter, Kutner, Nachtsheim, Wasserman 1996, #14.9 ***;
4 ***********************************************************************;
5
6 OPTIONS LS=99 PS=256 NOCENTER NODATE NONUMBER;
7
8 DATA ONE; INFILE CARDS MISSOVER;
9 TITLE1 'EXST7034 - Class Example, NKNW 14.9 : Toxicity experiment';
10 * LABEL X = 'Dose level (log scale)';
11 * LABEL R = 'No of insects which died';
12 * LABEL N = 'No of insects exposed';
13 * LABEL P = 'Mortality proportion';
14 INPUT X R N;
15 P = R / N;
16 LOGIT = LOG(P/(1-P));
17 WT = N*P*(1-P);
18 *** NOTE: approx. variance of LOGIT is 1/(n*p*(1-p)); WT is its reciprocal ***;
19 CARDS;
NOTE: The data set WORK.ONE has 6 observations and 6 variables.
NOTE: DATA statement used:
real time 0.06 seconds
cpu time 0.06 seconds
19 ! RUN;
26 ;
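The DATA step above computes, for each dose level, the observed mortality proportion P, its empirical logit, and the weight WT = n*p*(1-p). A minimal Python sketch of the same arithmetic follows. It assumes the six (X, R, N) rows shown in the PROC PRINT listing below, because the CARDS lines themselves are not echoed in this log.

```python
import math

# (X, R, N) rows as printed by PROC PRINT: log dose, deaths, insects exposed
rows = [(1, 28, 250), (2, 53, 250), (3, 93, 250),
        (4, 126, 250), (5, 172, 250), (6, 197, 250)]

def empirical_logit_row(x, r, n):
    """Return (X, P, LOGIT, WT) as the DATA step defines them."""
    p = r / n                      # mortality proportion
    logit = math.log(p / (1 - p))  # log odds of death
    wt = n * p * (1 - p)           # reciprocal of the approximate variance of LOGIT
    return x, p, logit, wt

table = [empirical_logit_row(*row) for row in rows]
for x, p, logit, wt in table:
    print(f"{x}  {p:.3f}  {logit:9.5f}  {wt:7.3f}")
```

The LOGIT and WT columns match those printed later for data set NEXT1.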
27 PROC PRINT DATA=ONE; VAR X R N P; TITLE2 'Sorted Raw Data Listing'; RUN;
NOTE: There were 6 observations read from the data set WORK.ONE.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used:
real time 0.04 seconds
cpu time 0.04 seconds
EXST7034 - Class Example, NKNW 14.9 : Toxicity experiment
Sorted Raw Data Listing

Obs    X      R      N        P
  1    1     28    250    0.112
  2    2     53    250    0.212
  3    3     93    250    0.372
  4    4    126    250    0.504
  5    5    172    250    0.688
  6    6    197    250    0.788

28 PROC REG DATA=ONE; TITLE2 'As a Simple linear regression';
29 MODEL P = X; RUN;
NOTE: 6 observations read.
NOTE: 6 observations used in computations.
29 ! QUIT;
NOTE: The PROCEDURE REG printed page 2.
NOTE: PROCEDURE REG used:
real time 0.09 seconds
cpu time 0.09 seconds
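The simple linear regression reported below treats P itself as the response, ignoring that the variance of P, p(1-p)/n, changes with dose. As a hedged cross-check, the least-squares estimates can be reproduced from the six proportions in the data listing with the textbook formulas slope = Sxy/Sxx and intercept = ybar - slope*xbar. This is plain Python, not SAS.

```python
# Observed mortality proportions P = R/N at X = 1..6 (from the data listing)
x = [1, 2, 3, 4, 5, 6]
p = [0.112, 0.212, 0.372, 0.504, 0.688, 0.788]

def ols(xs, ys):
    """Simple linear regression: return (intercept, slope, SSE)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(xs, ys))
    return intercept, slope, sse

b0, b1, sse = ols(x, p)
print(f"Intercept {b0:.5f}  X {b1:.5f}  SSE {sse:.5f}")
```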
EXST7034 - Class Example, NKNW 14.9 : Toxicity experiment
As a Simple linear regression

The REG Procedure
Model: MODEL1
Dependent Variable: P

Analysis of Variance
                          Sum of         Mean
Source           DF      Squares       Square    F Value    Pr > F
Model             1      0.34862      0.34862     677.88    <.0001
Error             4      0.00206   0.00051429
Corrected Total   5      0.35068

Root MSE           0.02268    R-Square    0.9941
Dependent Mean     0.44600    Adj R-Sq    0.9927
Coeff Var          5.08472

Parameter Estimates
                     Parameter    Standard
Variable      DF      Estimate       Error    t Value    Pr > |t|
Intercept      1      -0.04800     0.02111      -2.27      0.0854
X              1       0.14114     0.00542      26.04      <.0001

30 PROC REG DATA=ONE; TITLE2 'As a weighted regression on logits';
31 MODEL LOGIT = X; WEIGHT WT;
32 OUTPUT OUT=NEXT1 PRED=YHAT; RUN;
NOTE: 6 observations read.
NOTE: 6 observations used in computations.
NOTE: The data set WORK.NEXT1 has 6 observations and 7 variables.
NOTE: The PROCEDURE REG printed page 3.
NOTE: PROCEDURE REG used:
real time 0.08 seconds
cpu time 0.08 seconds
EXST7034 - Class Example, NKNW 14.9 : Toxicity experiment
As a weighted regression on logits

The REG Procedure
Model: MODEL1
Dependent Variable: LOGIT
Weight: WT

Analysis of Variance
                          Sum of         Mean
Source           DF      Squares       Square    F Value    Pr > F
Model             1    293.43459    293.43459     809.32    <.0001
Error             4      1.45028      0.36257
Corrected Total   5    294.88487

Root MSE           0.60214    R-Square    0.9951
Dependent Mean    -0.13651    Adj R-Sq    0.9939
Coeff Var       -441.09139

Parameter Estimates
                     Parameter    Standard
Variable      DF      Estimate       Error    t Value    Pr > |t|
Intercept      1      -2.64011     0.09501     -27.79      <.0001
X              1       0.67308     0.02366      28.45      <.0001

NOTE: exp(0.673076) = 1.960257809 is the estimated odds ratio: each one-unit increase in X (log dose) multiplies the odds of death by about 1.96.
NOTE: LD50 is the dose at which the predicted mortality is 50%. At P = 0.5 the log odds are log(0.5/(1-0.5)) = log(1) = 0,
so 0 = -2.64011 + 0.67308*X50, which gives X50 = 2.64011 / 0.67308 = 3.922 (in the log-dose units of X).
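As a hedged check on the weighted fit and the two NOTE calculations, the sketch below recomputes the weighted least-squares estimates from the same six rows (weights WT = n*p*(1-p)), then the 50% mortality dose as -b0/b1 and the odds ratio as exp(b1). It is a plain-Python illustration, not the SAS computation; small last-digit differences from the printout come from rounding.

```python
import math

# Data from the listing: log dose X, deaths R, insects exposed N
x = [1, 2, 3, 4, 5, 6]
r = [28, 53, 93, 126, 172, 197]
n = [250] * 6

p = [ri / ni for ri, ni in zip(r, n)]
y = [math.log(pi / (1 - pi)) for pi in p]          # empirical logits (LOGIT)
w = [ni * pi * (1 - pi) for ni, pi in zip(n, p)]   # weights (WT)

def wls(xs, ys, ws):
    """Weighted least squares for y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(ws)
    xw = sum(wi * xi for wi, xi in zip(ws, xs)) / sw   # weighted mean of x
    yw = sum(wi * yi for wi, yi in zip(ws, ys)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(ws, xs))
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(ws, xs, ys))
    b1 = sxy / sxx
    return yw - b1 * xw, b1

b0, b1 = wls(x, y, w)
ld50 = -b0 / b1            # X at which the fitted log odds equal 0 (P = 0.5)
odds_ratio = math.exp(b1)  # multiplicative change in the odds per unit of X
print(f"b0={b0:.5f} b1={b1:.5f} X50={ld50:.3f} OR={odds_ratio:.4f}")
```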

33 data next1; set next1;
34 title1 'Predicted values for Logistic regression';
35 odds = exp(yhat);
36 PredProb = odds / (1 + odds);
37 run;
NOTE: There were 6 observations read from the data set WORK.NEXT1.
NOTE: The data set WORK.NEXT1 has 6 observations and 9 variables.
NOTE: DATA statement used:
real time 0.05 seconds
cpu time 0.05 seconds
38 proc print data=next1; run;
NOTE: There were 6 observations read from the data set WORK.NEXT1.
NOTE: The PROCEDURE PRINT printed page 4.
NOTE: PROCEDURE PRINT used:
real time 0.01 seconds
cpu time 0.01 seconds
39 OPTIONS PS=45;
39 ! PROC PLOT DATA=NEXT1; PLOT P*X='o' PredProb*X='p' / OVERLAY;
40 TITLE2 'Plot of the raw data'; RUN; OPTIONS PS=256;
NOTE: There were 6 observations read from the data set WORK.NEXT1.
NOTE: The PROCEDURE PLOT printed page 5.
NOTE: PROCEDURE PLOT used:
real time 0.00 seconds
cpu time 0.00 seconds

Predicted values for Logistic regression

Obs   X     R     N       P      LOGIT       WT       YHAT      odds   PredProb
  1   1    28   250   0.112   -2.07047   24.864   -1.96703   0.13987    0.12271
  2   2    53   250   0.212   -1.31291   41.764   -1.29395   0.27418    0.21518
  3   3    93   250   0.372   -0.52365   58.404   -0.62088   0.53747    0.34958
  4   4   126   250   0.504    0.01600   62.496    0.05220   1.05358    0.51305
  5   5   172   250   0.688    0.79079   53.664    0.72527   2.06530    0.67377
  6   6   197   250   0.788    1.31291   41.764    1.39835   4.04852    0.80192
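The DATA step that created this listing converts each fitted log odds YHAT back to a probability with odds = exp(YHAT) and PredProb = odds/(1+odds). A small Python sketch of that inverse-logit step, using the rounded coefficients from the weighted regression, so the last digit can differ slightly from the listing:

```python
import math

def inverse_logit(log_odds):
    """Convert a fitted log odds (YHAT) to a probability, as the DATA step does."""
    odds = math.exp(log_odds)
    return odds / (1 + odds)

# YHAT = -2.64011 + 0.67308*X from the weighted regression on logits
b0, b1 = -2.64011, 0.67308
for x in range(1, 7):
    yhat = b0 + b1 * x
    print(f"X={x}  YHAT={yhat:8.5f}  PredProb={inverse_logit(yhat):.5f}")
```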

Predicted values for Logistic regression
Plot of the raw data

[Plot: P*X (symbol 'o') and PredProb*X (symbol 'p') overlaid; vertical axis P, 0.0 to 0.8; horizontal axis X, 1 to 6.]


1 ***********************************************;
2 *** Logistic Regression - Flu Shot example ***;
3 *** NKNW example 14.11 ***;
4 ***********************************************;
5
6 options ps=256 ls=111 nocenter nodate nonumber;
7
8 TITLE1 'Logistic Regression - NKNW Problem 14.11';
9 data FLU; infile cards missover;
10 input GotShot Age HealthAwareness;
11 Cards;
NOTE: The data set WORK.FLU has 50 observations and 3 variables.
NOTE: DATA statement used:
real time 0.05 seconds
cpu time 0.05 seconds
62 ;
63 proc logistic data=flu DESCENDING alpha=0.01;
64 TITLE2 'Logistic regression on Flu shot data';
65 model GotShot = Age HealthAwareness;
66 output out=next1 PREDICTED=yhat Lower=lcl Upper=ucl;
67 run;
NOTE: PROC LOGISTIC is modeling the probability that GotShot=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 50 observations read from the data set WORK.FLU.
NOTE: The data set WORK.NEXT1 has 50 observations and 7 variables.
NOTE: The PROCEDURE LOGISTIC printed page 1.
NOTE: PROCEDURE LOGISTIC used:
real time 0.21 seconds
cpu time 0.10 seconds



Logistic Regression - NKNW Problem 14.11
Logistic regression on Flu shot data

The LOGISTIC Procedure

Model Information
Data Set WORK.FLU
Response Variable GotShot
Number of Response Levels 2
Number of Observations 50
Model binary logit
Optimization Technique Fisher's scoring

Response Profile
Ordered                 Total
  Value    GotShot  Frequency
      1          1         21
      2          0         29
Probability modeled is GotShot=1.

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
                 Intercept    Intercept and
Criterion             Only       Covariates
AIC                 70.029           38.416
SC                  71.941           44.152
-2 Log L            68.029           32.416


Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 35.6129 2 <.0001
Score 26.4760 2 <.0001
Wald 11.6027 2 0.0030
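The fit statistics are linked by simple formulas: AIC = -2 Log L + 2k and SC = -2 Log L + k*ln(n), with k parameters and n = 50 observations, and the likelihood ratio statistic is the drop in -2 Log L from the intercept-only model to the full model, on 2 df. A short Python check using the printed values, which are rounded to three decimals:

```python
import math

n_obs = 50
neg2ll_intercept_only = 68.029   # -2 Log L, intercept only (1 parameter)
neg2ll_full = 32.416             # -2 Log L, intercept + Age + HealthAwareness (3 parameters)

def aic(neg2ll, k):
    return neg2ll + 2 * k

def sc(neg2ll, k, n):
    # Schwarz criterion (BIC): penalty k*ln(n)
    return neg2ll + k * math.log(n)

lr_chisq = neg2ll_intercept_only - neg2ll_full   # likelihood ratio test, df = 3 - 1 = 2
p_value = math.exp(-lr_chisq / 2)                # upper tail of chi-square with 2 df is exp(-x/2)
print(f"AIC {aic(neg2ll_intercept_only, 1):.3f} {aic(neg2ll_full, 3):.3f}")
print(f"SC  {sc(neg2ll_intercept_only, 1, n_obs):.3f} {sc(neg2ll_full, 3, n_obs):.3f}")
print(f"LR chi-square {lr_chisq:.3f} on 2 df, p = {p_value:.2e}")
```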

Analysis of Maximum Likelihood Estimates
                                  Standard          Wald
Parameter          DF   Estimate     Error    Chi-Square    Pr > ChiSq
Intercept           1   -21.5826    6.4176       11.3102        0.0008
Age                 1     0.2218    0.0744        8.8958        0.0029
HealthAwareness     1     0.2035    0.0627       10.5248        0.0012

Odds Ratio Estimates
                      Point          99% Wald
Effect             Estimate     Confidence Limits
Age                   1.248     1.031       1.512
HealthAwareness       1.226     1.043       1.441
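Each odds ratio is exp(estimate), and the 99% Wald limits (99% because of alpha=0.01 on the PROC LOGISTIC statement) are exp(estimate +/- z*SE), with z the 0.995 standard normal quantile. A hedged Python reproduction from the rounded estimates and standard errors in the maximum likelihood table:

```python
import math
from statistics import NormalDist

alpha = 0.01                                # matches alpha=0.01 on PROC LOGISTIC
z = NormalDist().inv_cdf(1 - alpha / 2)     # about 2.5758

estimates = {                               # (estimate, standard error) from the table
    "Age": (0.2218, 0.0744),
    "HealthAwareness": (0.2035, 0.0627),
}
results = {}
for name, (b, se) in estimates.items():
    odds_ratio = math.exp(b)
    lower, upper = math.exp(b - z * se), math.exp(b + z * se)
    results[name] = (odds_ratio, lower, upper)
    print(f"{name:16s} {odds_ratio:.3f} ({lower:.3f}, {upper:.3f})")
```

For example, with HealthAwareness held fixed, each additional year of age multiplies the estimated odds of getting a flu shot by about 1.248.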

Association of Predicted Probabilities and Observed Responses
Percent Concordant 92.3 Somers' D 0.847
Percent Discordant 7.6 Gamma 0.849
Percent Tied 0.2 Tau-a 0.421
Pairs 609 c 0.924
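The association measures are built from the 21 x 29 = 609 pairs that match one vaccinated subject with one non-vaccinated subject. A rough Python reconstruction from the printed, rounded percentages follows; Gamma is left out because the one-decimal percentages are too coarse to reproduce its third decimal.

```python
# Counts and percentages from the Response Profile and association table
n_events, n_nonevents = 21, 29
n_total = n_events + n_nonevents
pairs = n_events * n_nonevents               # event/non-event pairs compared: 609
pct_concordant, pct_discordant, pct_tied = 92.3, 7.6, 0.2

c = (pct_concordant + 0.5 * pct_tied) / 100           # concordance index (ROC area)
somers_d = (pct_concordant - pct_discordant) / 100     # (concordant - discordant) / pairs
all_pairs = n_total * (n_total - 1) / 2                # every pair of the 50 observations
tau_a = somers_d * pairs / all_pairs                   # same numerator, larger denominator
print(pairs, f"{c:.3f}", f"{somers_d:.3f}", f"{tau_a:.3f}")
```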



68
69 proc sort data=next1 nodupkey; by AGE HealthAwareness; run;
NOTE: 3 observations with duplicate key values were deleted.
NOTE: There were 50 observations read from the data set WORK.NEXT1.
NOTE: The data set WORK.NEXT1 has 47 observations and 7 variables.
NOTE: PROCEDURE SORT used:
real time 0.05 seconds
cpu time 0.04 seconds
70 proc print data=next1; var AGE HealthAwareness yhat lcl ucl;
71 TITLE2 'Listing of one kept value for each value of exam1';
72 run;
NOTE: There were 47 observations read from the data set WORK.NEXT1.
NOTE: The PROCEDURE PRINT printed page 2.
NOTE: PROCEDURE PRINT used:
real time 0.01 seconds
cpu time 0.01 seconds



Logistic Regression - NKNW Problem 14.11
Listing of one kept value for each value of exam1

Obs Age HealthAwareness yhat lcl ucl

1 28 58 0.02736 0.00072 0.52409
2 31 48 0.00710 0.00011 0.32114
3 33 45 0.00601 0.00009 0.29040
4 34 50 0.02046 0.00069 0.38684
5 34 54 0.04502 0.00242 0.47852
6 35 44 0.00763 0.00014 0.29425
7 36 71 0.70022 0.17618 0.96228
8 38 40 0.00659 0.00012 0.27249
9 38 45 0.01801 0.00065 0.34060
10 39 59 0.28328 0.06179 0.70344
11 40 64 0.57712 0.19315 0.88610
12 41 36 0.00568 0.00009 0.26275
13 41 70 0.85241 0.37321 0.98246
14 42 32 0.00315 0.00003 0.23639
15 42 42 0.02361 0.00109 0.34831
16 42 61 0.53594 0.20534 0.83770
17 43 49 0.11144 0.01650 0.48389
18 44 27 0.00178 0.00001 0.21931
19 45 35 0.01119 0.00028 0.31012
20 45 41 0.03695 0.00236 0.38406
21 46 43 0.06712 0.00672 0.43350
22 46 44 0.08104 0.00943 0.44961
23 46 47 0.13969 0.02527 0.50423
24 46 56 0.50339 0.23553 0.76932
25 46 59 0.65114 0.33557 0.87338
26 46 68 0.92096 0.53292 0.99167
27 47 47 0.16853 0.03525 0.52925
28 47 56 0.55856 0.28192 0.80307
29 48 33 0.01444 0.00041 0.34474
30 49 50 0.36770 0.14313 0.66937
31 49 59 0.78403 0.45460 0.94052
32 49 70 0.97147 0.64250 0.99845
33 52 60 0.89642 0.55594 0.98356
34 52 61 0.91385 0.57467 0.98813
35 53 32 0.03496 0.00151 0.46393
36 53 49 0.53530 0.23928 0.80838
37 53 69 0.98539 0.70558 0.99947
38 54 48 0.53984 0.22911 0.82241
39 54 71 0.99215 0.74156 0.99982
40 56 46 0.54890 0.20498 0.85169
41 56 57 0.91943 0.56618 0.99008
42 57 54 0.88553 0.50840 0.98301
43 57 59 0.95536 0.62068 0.99644
44 63 40 0.62892 0.14248 0.94532
45 63 43 0.75733 0.23614 0.96923
46 64 34 0.38424 0.04117 0.90068
47 64 70 0.99895 0.82875 0.99999
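Each yhat above is the fitted probability 1/(1+exp(-(b0 + b1*Age + b2*HealthAwareness))) for that covariate pattern. A minimal Python sketch using the coefficients as printed; because they are rounded to four decimals, the results agree with the listing only to about two or three decimals. The limits lcl and ucl also need the estimated covariance matrix of the coefficients, which this output does not print, so they are not reproduced here.

```python
import math

# Rounded maximum likelihood estimates from PROC LOGISTIC
b0, b_age, b_ha = -21.5826, 0.2218, 0.2035

def predicted_prob(age, health_awareness):
    """Estimated P(GotShot = 1) for one covariate pattern."""
    log_odds = b0 + b_age * age + b_ha * health_awareness
    return 1 / (1 + math.exp(-log_odds))

for age, ha in [(28, 58), (46, 56), (64, 70)]:
    print(age, ha, f"{predicted_prob(age, ha):.5f}")
```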



74 proc sort data=flu; by age; run;
NOTE: There were 50 observations read from the data set WORK.FLU.
NOTE: The data set WORK.FLU has 50 observations and 3 variables.
NOTE: PROCEDURE SORT used:
real time 0.03 seconds
cpu time 0.03 seconds
75 proc sort data=next1; by age; run;
NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used:
real time 0.00 seconds
cpu time 0.00 seconds
76 proc means data=flu; var GotShot age HealthAwareness;
77 run;
NOTE: There were 50 observations read from the data set WORK.FLU.
NOTE: The PROCEDURE MEANS printed page 3.
NOTE: PROCEDURE MEANS used:
real time 0.02 seconds
cpu time 0.02 seconds




Logistic Regression - NKNW Problem 14.11
Listing of one kept value for each value of exam1

The MEANS Procedure
Variable N Mean Std Dev Minimum Maximum
-------------------------------------------------------------------------------------
GotShot 50 0.4200000 0.4985694 0 1.0000000
Age 50 46.4200000 8.7972398 28.0000000 64.0000000
HealthAwareness 50 51.2400000 11.7726074 27.0000000 71.0000000
-------------------------------------------------------------------------------------




78 proc means data=flu noprint; by age; var GotShot;
79 output out=next2 n=n mean=mean var=var; run;
NOTE: There were 50 observations read from the data set WORK.FLU.
NOTE: The data set WORK.NEXT2 has 25 observations and 6 variables.
NOTE: PROCEDURE MEANS used:
real time 0.04 seconds
cpu time 0.04 seconds
80 data next2; set next2;
81 PredLogOddsAge = -21.5826+0.2218*age+0.2035*51.24;
82 FluShotOddsAge = exp(PredLogOddsAge);
83 FluShotProbAge = FluShotOddsAge / (1 + FluShotOddsAge);
84 run;
NOTE: There were 25 observations read from the data set WORK.NEXT2.
NOTE: The data set WORK.NEXT2 has 25 observations and 9 variables.
NOTE: DATA statement used:
real time 0.05 seconds
cpu time 0.05 seconds
85 proc print data=next2;
86 TITLE2 'Listing of one kept value for each value of exam1';
TITLE3 'Note: Predicted probabilities are adjusted for HealthAwareness';
87 run;
NOTE: There were 25 observations read from the data set WORK.NEXT2.
NOTE: The PROCEDURE PRINT printed page 4.
NOTE: PROCEDURE PRINT used:
real time 0.01 seconds
cpu time 0.01 seconds
88
89 data two; set next1 next2; run;
NOTE: There were 47 observations read from the data set WORK.NEXT1.
NOTE: There were 25 observations read from the data set WORK.NEXT2.
NOTE: The data set WORK.TWO has 72 observations and 15 variables.
NOTE: DATA statement used:
real time 0.03 seconds
cpu time 0.03 seconds
90 options ps=56 ls=111;
91 proc plot data=next2; plot FluShotProbAge*age='x' mean*age='o' / overlay;
92 TITLE2 'Plot of observed probs (o) and predicted values (p)';
93 TITLE3 'Note: observed probabilities unadjusted for HealthAwareness';
94 run;




Logistic Regression - NKNW Problem 14.11
Listing of one kept value for each value of AGE
Note: Predicted probabilities are adjusted for HealthAwareness

Obs Age _TYPE_ _FREQ_ n mean var PredLogOddsAge FluShotOddsAge FluShotProbAge
1 28 0 1 1 0.00000 . -4.94486 0.0071 0.00707
2 31 0 1 1 0.00000 . -4.27946 0.0139 0.01366
3 33 0 1 1 0.00000 . -3.83586 0.0216 0.02113
4 34 0 2 2 0.00000 0.00000 -3.61406 0.0269 0.02624
5 35 0 1 1 0.00000 . -3.39226 0.0336 0.03254
6 36 0 1 1 0.00000 . -3.17046 0.0420 0.04029
7 38 0 3 3 0.00000 0.00000 -2.72686 0.0654 0.06141
8 39 0 1 1 1.00000 . -2.50506 0.0817 0.07550
9 40 0 2 2 0.00000 0.00000 -2.28326 0.1020 0.09252
10 41 0 2 2 0.50000 0.50000 -2.06146 0.1273 0.11290
11 42 0 3 3 0.00000 0.00000 -1.83966 0.1589 0.13709
12 43 0 1 1 0.00000 . -1.61786 0.1983 0.16550
13 44 0 1 1 0.00000 . -1.39606 0.2476 0.19844
14 45 0 2 2 0.00000 0.00000 -1.17426 0.3090 0.23609
15 46 0 6 6 0.66667 0.26667 -0.95246 0.3858 0.27839
16 47 0 2 2 0.50000 0.50000 -0.73066 0.4816 0.32505
17 48 0 1 1 0.00000 . -0.50886 0.6012 0.37546
18 49 0 3 3 1.00000 0.00000 -0.28706 0.7505 0.42872
19 52 0 2 2 1.00000 0.00000 0.37834 1.4599 0.59347
20 53 0 3 3 0.66667 0.33333 0.60014 1.8224 0.64569
21 54 0 2 2 0.50000 0.50000 0.82194 2.2749 0.69465
22 56 0 3 3 0.66667 0.33333 1.26554 3.5450 0.77998
23 57 0 2 2 1.00000 0.00000 1.48734 4.4253 0.81568
24 63 0 2 2 0.50000 0.50000 2.81814 16.7457 0.94365
25 64 0 2 2 0.50000 0.50000 3.03994 20.9040 0.95435
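The DATA step that built NEXT2 plugs each age into the fitted equation with HealthAwareness fixed at its sample mean, 51.24, while PROC MEANS supplies the observed proportion (mean) and its sample variance (var) for each age. A hedged Python sketch of both calculations; for a 0/1 variable, the n-1 divisor that PROC MEANS uses by default gives var = n/(n-1)*p*(1-p).

```python
import math

b0, b_age, b_ha = -21.5826, 0.2218, 0.2035
mean_ha = 51.24                      # mean HealthAwareness from PROC MEANS

def prob_at_mean_ha(age):
    log_odds = b0 + b_age * age + b_ha * mean_ha    # PredLogOddsAge
    odds = math.exp(log_odds)                       # FluShotOddsAge
    return log_odds, odds, odds / (1 + odds)        # FluShotProbAge

def sample_variance_binary(k, n):
    """Sample variance (divisor n - 1) of n values coded 0/1 with k ones."""
    p = k / n
    return n / (n - 1) * p * (1 - p)

print(prob_at_mean_ha(28))           # listing shows -4.94486, 0.0071, 0.00707
print(sample_variance_binary(4, 6))  # Age 46: listing shows mean 0.66667, var 0.26667
```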

Logistic Regression - NKNW Problem 14.11
Plot of observed probs (o) and predicted values (p)
Note: observed probabilities unadjusted for HealthAwareness

[Plot: FluShotProbAge*Age (symbol 'x') and mean*Age (symbol 'o') overlaid; vertical axis FluShotProbAge, 0.0 to 1.0; horizontal axis Age, 25 to 65.]