The dataset for today is ex1123.csv.  I recommend you use PROC REG. The data step is as follows. 

data AirPollution; length city $ 18;

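   /* "datalines" reads the rows placed after the datalines statement; firstobs=2 skips the header row. */
   infile datalines missover DSD dlm="," firstobs=2;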

   input CITY $ MORT PRECIP EDUC NONWHITE NOX SO2;

datalines;

run;

 

The variable “city” will not play a role in this analysis, but it may be useful as a label in graphics and for locating individual observations. 

 

1) Start by sorting the raw data by CITY and listing the raw data.  The alphabetical order will make locating individual cities easier. 

2) First fit the full model, “MODEL MORT = PRECIP EDUC NONWHITE NOX SO2;”.  From this beginning, do the following. 

3) On this full model, check the VIFs (variance inflation factors).  Do they indicate a problem? 

4) If there are any non-significant variables (α = 0.05), make a copy of the PROC REG above and remove the least significant variable.  Do not delete the original full model above; leave it in your program.  Repeat this step as often as necessary (one variable at a time) until all variables in the model are significant. 

5) For the last model from part 4 (with all terms significant), output the values for “student”, “rstudent”, “cookd”, “leverage” and “dffits”.  The “keywords” are the same as those used in the preceding sentence, except for leverage, which has the keyword “h”.   You are welcome to borrow code from my examples (see the example on SAT scores). 

            As VREF reference values, use “2.7” for student and rstudent (the approximate 99% level of t for around 56 degrees of freedom), “1” for cookd and dffits, and “0.17” for leverage (the approximate median level of 2*p/n where p is 4 to 6).   

6) Plot these values.  Again, borrowed code may save you some time. 
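
The steps above can be sketched roughly as follows. This is only an outline under assumptions, not the required solution: the data set name "diag", the index variable "obs", and the sample size of 60 in the last step are my own placeholders, and the regressor list in the second PROC REG is a stand-in for whatever terms survive part 4. The VIF option, the OUTPUT keywords, and VREF= are standard, but check them against the class examples before relying on this.

```sas
/* Part 1: sort by city and list the raw data */
proc sort data=AirPollution;
   by city;
run;

proc print data=AirPollution;
run;

/* Parts 2-3: full model with variance inflation factors */
proc reg data=AirPollution;
   model MORT = PRECIP EDUC NONWHITE NOX SO2 / vif;
run;
quit;

/* Part 5: pattern for saving the diagnostics.
   PLACEHOLDER model: replace the regressor list with the terms
   that remain after part 4. "diag" is a made-up data set name. */
proc reg data=AirPollution;
   model MORT = PRECIP EDUC NONWHITE NOX SO2;
   output out=diag student=student rstudent=rstudent
                   cookd=cookd h=leverage dffits=dffits;
run;
quit;

/* Part 6: add an observation index (hypothetical variable "obs"),
   then plot one diagnostic against it with reference lines.
   Repeat the PLOT statement for the other diagnostics. */
data diag;
   set diag;
   obs = _n_;
run;

proc gplot data=diag;
   plot rstudent*obs / vref=-2.7 2.7;
run;
quit;

/* Optional check of the reference values.
   ASSUMPTION: n = 60 observations, which is not stated in this handout. */
data _null_;
   t99 = tinv(0.995, 56);      /* two-sided 99% t quantile, 56 df */
   put t99= 6.3;
   do p = 4 to 6;
      cutoff = 2 * p / 60;     /* leverage guideline 2*p/n */
      put p= cutoff= 5.3;
   end;
run;
```

If n really is 60, the three values of 2*p/n work out to 0.133, 0.167 and 0.200, so 0.17 is the middle one.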

 

Questions to answer in your homework, by email, or on this page to be turned in (your choice). 

Leave all models in your program and email me a copy.  I will check to see which one you left as having all significant variables. 

1) Did the VIF indicate problems?  Indicate    YES  or  NO

2) Were the atmospheric pollutants, NOX and SO2, implicated as having a possible correlation with mortality?  In your answer, specify which one, or both.  Of course, such a correlation does not prove a causal relationship.  Give the P-value _______________

3) Are there any problems suggested by the residual and influence diagnostics?  YES  or  NO

            Which one(s) indicate problems?  __________________________

4) Which city had the largest RSTUDENT value?  _______________

5) Which city had the largest LEVERAGE value?  _______________

6) Which city had the largest DFFITS value?  _______________

7) Which city had the largest COOKD value?  _______________
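
One possible way to locate the city for questions 4 to 7 is to sort the saved diagnostics and print the top few rows. The sketch below assumes a data set named "diag" (a hypothetical name) that contains city and a variable rstudent created with the RSTUDENT= keyword. Ranking by absolute value is my reading of "largest"; drop the abs() step if the signed value is intended.

```sas
/* Assumes "diag" holds city and rstudent (names are hypothetical) */
data ranked;
   set diag;
   abs_rstudent = abs(rstudent);   /* magnitude, ignoring sign */
run;

proc sort data=ranked;
   by descending abs_rstudent;
run;

/* The first row is the most extreme observation */
proc print data=ranked(obs=5);
   var city rstudent;
run;
```

The same pattern applies to the leverage, DFFITS and Cook's D variables by swapping the variable name.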