1          **************************************************************************;
2          *** Logistic regression example -                                      ***;
3          *** Data is from Statistical Methods II classes in recent years        ***;
4          *** The objective is to determine the probability of getting an "A"    ***;
5          ***    in the class from the grade on the first exam.                  ***;
6          **************************************************************************;
7
8          options ps=256 ls=88 nocenter nodate nonumber;
9
10         data grades; infile
10       ! "C:\Geaghan\EXST\EXST7015New\Fall2002\SAS\05-LogisticReg.DAT" missover;
11              TITLE1 'EXST7015: Probability  of A grade in EXST7015';
12           input Semester $ Exam1 Grade_A $;
13              if exam1 eq . then delete;
14              interval = 5; Score1 = int(exam1/interval)*interval + (interval/2);
15              if score1 gt 100 then score1=100;
16              indicator = 0; if Grade_A eq 'TRUE' then indicator = 1;
17         cards;
NOTE: The infile "C:\Geaghan\EXST\EXST7015New\Fall2002\SAS\05-LogisticReg.DAT" is:
      File Name=C:\Geaghan\EXST\EXST7015New\Fall2002\SAS\05-LogisticReg.DAT,
      RECFM=V,LRECL=256
NOTE: 424 records were read from the infile
      "C:\Geaghan\EXST\EXST7015New\Fall2002\SAS\05-LogisticReg.DAT".
      The minimum record length was 0.
      The maximum record length was 14.
NOTE: The data set WORK.GRADES has 423 observations and 6 variables.
NOTE: DATA statement used:
      real time           0.06 seconds
      cpu time            0.06 seconds
17       !        run;
18         ;
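
The DATA step above groups exam scores into 5-point bins and labels each bin by its midpoint, with the top bin capped at 100. A minimal sketch of the same arithmetic on a few hand-picked scores follows; the data set name `bins` and the score list are made up for illustration and are not part of the original program.

```sas
/* Hypothetical check of the binning rule used in the GRADES data step. */
data bins;
   interval = 5;
   do exam1 = 52, 79, 80, 84.5, 99, 100;
      /* lower edge of the 5-point bin plus half the bin width */
      score1 = int(exam1/interval)*interval + (interval/2);
      /* a score of exactly 100 would map to 102.5, so cap it */
      if score1 gt 100 then score1 = 100;
      output;
   end;
run;

proc print data=bins noobs; var exam1 score1; run;
```

Under this rule 52 maps to 52.5, 79 to 77.5, 80 and 84.5 both to 82.5, 99 to 97.5, and 100 to the capped value 100, which is why the frequency table has a separate row for Score1 = 100.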
19         proc sort data=grades; by exam1; run;
NOTE: There were 423 observations read from the data set WORK.GRADES.
NOTE: The data set WORK.GRADES has 423 observations and 6 variables.
NOTE: PROCEDURE SORT used:
      real time           0.05 seconds
      cpu time            0.05 seconds
21         proc freq data=grades; table score1*Grade_A / norow nocol nopercent;
22              TITLE2 'Simple frequencies by 5 point groupings';
23         run;
NOTE: There were 423 observations read from the data set WORK.GRADES.
NOTE: The PROCEDURE FREQ printed page 1.
NOTE: PROCEDURE FREQ used:
      real time           0.09 seconds
      cpu time            0.09 seconds
 


EXST7015: Probability  of A grade in EXST7015
Simple frequencies by 5 point groupings
 
The FREQ Procedure
Table of Score1 by Grade_A
Score1     Grade_A
Frequency|FALSE   |TRUE    |  Total
---------+--------+--------+
    52.5 |      1 |      0 |      1
---------+--------+--------+
    57.5 |      4 |      1 |      5
---------+--------+--------+
    62.5 |      5 |      0 |      5
---------+--------+--------+
    67.5 |     12 |      1 |     13
---------+--------+--------+
    72.5 |     25 |      1 |     26
---------+--------+--------+
    77.5 |     40 |      7 |     47
---------+--------+--------+
    82.5 |     51 |     14 |     65
---------+--------+--------+
    87.5 |     41 |     45 |     86
---------+--------+--------+
    92.5 |     23 |     88 |    111
---------+--------+--------+
    97.5 |      7 |     51 |     58
---------+--------+--------+
     100 |      2 |      4 |      6
---------+--------+--------+
Total         211      212      423


25         proc means data=grades mean max min std stderr print; var exam1;
26              TITLE2 'Raw data mean';
27         run;
NOTE: There were 423 observations read from the data set WORK.GRADES.
NOTE: The PROCEDURE MEANS printed page 2.
NOTE: PROCEDURE MEANS used:
      real time           0.02 seconds
      cpu time            0.02 seconds
 
EXST7015: Probability  of A grade in EXST7015
Raw data mean
 
The MEANS Procedure
                         Analysis Variable : Exam1
        Mean         Maximum         Minimum         Std Dev       Std Error
----------------------------------------------------------------------------
  85.8628842     100.0000000      52.0000000       9.0178926       0.4384649
----------------------------------------------------------------------------
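
As a quick check on the table above, the standard error should equal the standard deviation divided by the square root of the sample size, assuming all n = 423 observations have a non-missing Exam1 (missing scores were deleted in the DATA step):

```latex
\[
SE(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{9.0178926}{\sqrt{423}} \approx \frac{9.0179}{20.567} \approx 0.4385
\]
```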
 
 
29         proc logistic data=grades DESCENDING; TITLE2 'Logistic regression';
30            model Grade_A = exam1;
31            output out=next1 PREDICTED=yhat Lower=lcl Upper=ucl;
32         run;
NOTE: PROC LOGISTIC is modeling the probability that Grade_A='TRUE'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 423 observations read from the data set WORK.GRADES.
NOTE: The data set WORK.NEXT1 has 423 observations and 10 variables.
NOTE: The PROCEDURE LOGISTIC printed page 3.
NOTE: PROCEDURE LOGISTIC used:
      real time           0.08 seconds
      cpu time            0.08 seconds
 
 
EXST7015: Probability  of A grade in EXST7015
Logistic regression
 
The LOGISTIC Procedure
 
              Model Information
Data Set                      WORK.GRADES
Response Variable             Grade_A
Number of Response Levels     2
Number of Observations        423
Model                         binary logit
Optimization Technique        Fisher's scoring
 
          Response Profile
 Ordered                      Total
   Value     Grade_A      Frequency
       1     TRUE               212
       2     FALSE              211
 
Probability modeled is Grade_A='TRUE'.
 
                    Model Convergence Status
         Convergence criterion (GCONV=1E-8) satisfied.
 


         Model Fit Statistics
                              Intercept
               Intercept         and
Criterion        Only        Covariates
AIC              588.400        425.407
SC               592.448        433.502
-2 Log L         586.400        421.407
 
        Testing Global Null Hypothesis: BETA=0
Test                 Chi-Square       DF     Pr > ChiSq
Likelihood Ratio       164.9934        1         <.0001
Score                  132.7164        1         <.0001
Wald                    96.1179        1         <.0001
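
The fit statistics and the likelihood ratio test above are tied together by simple arithmetic. The sketch below assumes the usual definitions, AIC = -2 Log L + 2k and SC = -2 Log L + k ln(n), with k = 2 parameters in the fitted model (k = 1 for intercept only) and n = 423; small differences in the last digit come from rounding in the printed values.

```latex
\begin{align*}
\chi^2_{LR} &= (-2\log L_{0}) - (-2\log L_{1}) = 586.400 - 421.407 = 164.993 \\
\mathrm{AIC}_{0} &= 586.400 + 2(1) = 588.400 \\
\mathrm{AIC}_{1} &= 421.407 + 2(2) = 425.407 \\
\mathrm{SC}_{0} &= 586.400 + 1\cdot\ln(423) \approx 586.400 + 6.047 = 592.447 \\
\mathrm{SC}_{1} &= 421.407 + 2\cdot\ln(423) \approx 421.407 + 12.095 = 433.502
\end{align*}
```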
 
             Analysis of Maximum Likelihood Estimates
                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1    -16.9098      1.7443       93.9760        <.0001
Exam1         1      0.1952      0.0199       96.1179        <.0001
 
           Odds Ratio Estimates
             Point          95% Wald
Effect    Estimate      Confidence Limits
Exam1        1.216       1.169       1.264
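
Written out, the estimates above give the fitted model below. The odds ratio and its Wald limits follow from exponentiating the slope and slope plus or minus 1.96 standard errors. The last line, the exam score at which the predicted probability of an A reaches one half, is derived here and does not appear in the SAS output.

```latex
\begin{align*}
\operatorname{logit}(\hat{p}) &= \ln\frac{\hat{p}}{1-\hat{p}} = -16.9098 + 0.1952\,\mathrm{Exam1} \\
\hat{p} &= \frac{1}{1 + e^{-(-16.9098 + 0.1952\,\mathrm{Exam1})}} \\
\widehat{OR} &= e^{0.1952} \approx 1.216 \\
\text{95\% Wald limits} &= e^{0.1952 \pm 1.96(0.0199)} \approx (1.169,\ 1.264) \\
\hat{p} = 0.5 &\iff \mathrm{Exam1} = \frac{16.9098}{0.1952} \approx 86.6
\end{align*}
```

Each additional point on the first exam therefore multiplies the estimated odds of an A by about 1.22.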
 
Association of Predicted Probabilities and Observed Responses
Percent Concordant     82.8    Somers' D    0.681
Percent Discordant     14.7    Gamma        0.698
Percent Tied            2.4    Tau-a        0.341
Pairs                 44732    c            0.841
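
The association measures above can be approximately rebuilt from the printed percentages. This is a sketch using the standard definitions, where t is the number of pairs with different responses, N = 423, and n_c, n_d, n_t are the concordant, discordant and tied pairs. Because the percentages are rounded to one decimal, the results agree with the output only to about the third decimal.

```latex
\begin{align*}
t &= 212 \times 211 = 44732 \\
\text{Somers' } D &= \frac{n_c - n_d}{t} \approx 0.828 - 0.147 = 0.681 \\
\gamma &= \frac{n_c - n_d}{n_c + n_d} \approx \frac{0.681}{0.975} \approx 0.698 \\
\tau_a &= \frac{n_c - n_d}{\tfrac{1}{2}N(N-1)} \approx \frac{0.681 \times 44732}{89253} \approx 0.341 \\
c &= \frac{n_c + \tfrac{1}{2}\,n_t}{t} \approx 0.828 + 0.5(0.024) = 0.840 \quad (\text{printed as } 0.841)
\end{align*}
```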
 
 
 
34         proc sort data=next1 nodupkey; by exam1; run;
NOTE: 380 observations with duplicate key values were deleted.
NOTE: There were 423 observations read from the data set WORK.NEXT1.
NOTE: The data set WORK.NEXT1 has 43 observations and 10 variables.
NOTE: PROCEDURE SORT used:
      real time           0.04 seconds
      cpu time            0.04 seconds
35         proc print data=next1; var yhat lcl ucl;
36              TITLE2 'Listing of one kept value for each value of exam1';
37         run;
NOTE: There were 43 observations read from the data set WORK.NEXT1.
NOTE: The PROCEDURE PRINT printed page 4.
NOTE: PROCEDURE PRINT used:
      real time           0.02 seconds
      cpu time            0.02 seconds
 
 


EXST7015: Probability  of A grade in EXST7015
Listing of one kept value for each value of exam1
 

Obs      yhat       lcl        ucl
  1    0.00116    0.00029    0.00469
  2    0.00208    0.00058    0.00748
  3    0.00307    0.00092    0.01021
  4    0.00373    0.00116    0.01192
  5    0.00453    0.00146    0.01392
  6    0.00550    0.00185    0.01625
  7    0.00983    0.00371    0.02579
  8    0.01192    0.00468    0.03005
  9    0.01752    0.00743    0.04073
 10    0.02121    0.00936    0.04736
 11    0.02567    0.01178    0.05501
 12    0.03103    0.01482    0.06383
 13    0.03747    0.01862    0.07396
 14    0.04518    0.02337    0.08557
 15    0.05439    0.02928    0.09883
 16    0.06535    0.03663    0.11392
 17    0.07833    0.04571    0.13103
 18    0.09363    0.05689    0.15032
 19    0.11156    0.07057    0.17197
 20    0.13243    0.08717    0.19613
 21    0.15651    0.10716    0.22290
 22    0.18403    0.13095    0.25238
 23    0.21516    0.15891    0.28459
 24    0.24995    0.19128    0.31951
 25    0.28829    0.22807    0.35706
 26    0.32993    0.26906    0.39710
 27    0.37442    0.31367    0.43940
 28    0.42114    0.36104    0.48368
 29    0.46932    0.41001    0.52950
 30    0.51807    0.45935    0.57629
 31    0.56648    0.50791    0.62325
 32    0.61365    0.55474    0.66941
 33    0.65879    0.59923    0.71372
 34    0.70121    0.64098    0.75520
 35    0.74045    0.67981    0.79309
 36    0.77617    0.71563    0.82693
 37    0.80825    0.74845    0.85656
 38    0.83670    0.77830    0.88205
 39    0.86165    0.80529    0.90365
 40    0.88332    0.82954    0.92174
 41    0.90198    0.85120    0.93673
 42    0.91794    0.87045    0.94904
 43    0.93149    0.88747    0.95909
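
The listing above omits Exam1, so the rows are hard to match to scores. Since NEXT1 was sorted by exam1 and the scores range from 52 to 100, observation 1 should be Exam1 = 52 and observation 43 should be Exam1 = 100. A minimal sketch that recomputes yhat from the printed coefficients follows; the data set name `check` and the chosen scores are illustrative only, and the rounded coefficients mean agreement with the listing is limited to about three or four decimals.

```sas
/* Hypothetical check: predicted probabilities from the rounded estimates. */
data check;
   b0 = -16.9098;   /* intercept from the estimates table */
   b1 =   0.1952;   /* slope for Exam1 */
   do exam1 = 52, 86, 87, 100;
      /* inverse logit of the linear predictor */
      yhat = 1 / (1 + exp(-(b0 + b1*exam1)));
      output;
   end;
run;

proc print data=check noobs; var exam1 yhat; format yhat 7.5; run;
```

The values for 86 and 87 straddle 0.5, matching observations 29 and 30 in the listing. Adding exam1 to the VAR statement of the original PROC PRINT would make this correspondence visible directly.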


 
 
39         proc sort data=grades; by score1; run;
NOTE: There were 423 observations read from the data set WORK.GRADES.
NOTE: The data set WORK.GRADES has 423 observations and 6 variables.
NOTE: PROCEDURE SORT used:
      real time           0.04 seconds
      cpu time            0.04 seconds
40         proc sort data=next1; by score1; run;
NOTE: There were 43 observations read from the data set WORK.NEXT1.
NOTE: The data set WORK.NEXT1 has 43 observations and 10 variables.
NOTE: PROCEDURE SORT used:
      real time           0.03 seconds
      cpu time            0.03 seconds
41         proc means data=grades noprint; by score1; var indicator;
42              output out=next2 n=n mean=mean var=var; run;
NOTE: There were 423 observations read from the data set WORK.GRADES.
NOTE: The data set WORK.NEXT2 has 11 observations and 6 variables.
NOTE: PROCEDURE MEANS used:
      real time           0.04 seconds
      cpu time            0.04 seconds
43        
44         data two; set next1 next2; run;
NOTE: There were 43 observations read from the data set WORK.NEXT1.
NOTE: There were 11 observations read from the data set WORK.NEXT2.
NOTE: The data set WORK.TWO has 54 observations and 15 variables.
NOTE: DATA statement used:
      real time           0.05 seconds
      cpu time            0.05 seconds
45         options ps=56 ls=111;
46         proc plot data=two;  plot yhat*exam1='x' mean*score1='o' / overlay;
47              TITLE2 'Plot of observed means (o) and predicted values (x)';
48         run;
 
 


EXST7015: Probability  of A grade in EXST7015
Plot of observed means (o) and predicted values (x)
 
                                   Plot of yhat*Exam1.   Symbol used is 'x'.
                                   Plot of mean*Score1.  Symbol used is 'o'.
 
      |
      |
  1.0 +
      |
      |                                                                                                     x
      |                                                                                                 x x
      |                                                                                               xo
      |                                                                                             x
      |                                                                                           x
E 0.8 +                                                                                         x
s     |                                                                                      ox
t     |                                                                                     x
i     |                                                                                   x
m     |
a     |                                                                                 x                   o
t     |
e 0.6 +                                                                               x
d     |                                                                             x
      |
P     |                                                                           xo
r     |
o     |                                                                         x
b     |                                                                       x
a 0.4 +
b     |                                                                     x
i     |                                                                   x
l     |
i     |                                                                 x
t     |                                                               x
y     |                                                             x    o
  0.2 +                o
      |                                                           x
      |                                                       xox
      |                                                     x
      |                                                 x x
      |                                    o      x x x
      |                                 x x x x x    o
  0.0 +     xo    x   x x x x    ox x
      |
      --+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--
       50        55        60        65        70        75        80        85        90        95        100
                                                        Exam1
NOTE: 54 obs had missing values.
 


 


Modified: August 16, 2004
James P. Geaghan