Original program from the SAS Program Editor.
**********************************************************************;
*** EXST7005 Regression Example ***;
*** Redfin Pickerel, and other fish, accumulate parasites ***;
*** on their fins. These parasites attach and stay with ***;
*** the fish throughout its life until the fish is eaten ***;
*** and the parasite continues its life cycle. ***;
*** - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -***;
*** If parasites are accumulated at a constant rate, older ***;
*** fish should have more parasites. Test this hypothesis. ***;
*** OBJECTIVES: ***;
*** 1) Determine if older fish have more parasites. ***;
*** 2) Estimate the rate of accumulation of parasites. ***;
*** 3) Place a confidence interval on this estimate ***;
*** 4) Estimate the intercept with confidence interval. ***;
*** 5) Determine how many parasites a 10 year old fish would have. ***;
*** 6) Place a confidence interval on the 10 year old fish estimate***;
*** 7) Determine if a linear model is adequate. ***;
*** 8) An old published article states that the rate of accumul. ***;
*** should be about 5 per year. Test our estimate against 5. ***;
**********************************************************************;
options ps=256 ls=99 nocenter nodate nonumber nolabel;
TITLE1 'Example of Simple linear Regression (SLR)';
DATA ONE; INFILE CARDS MISSOVER;
TITLE2 'Rate of parasite accumulation in Redfin Pickerel';
INPUT AGE PARASITE;
LABEL AGE = 'Fish age from scales reading';
LABEL PARASITE = 'Pectoral fin parasites / sq cm';
CARDS;
1 3
2 7
3 8
3 12
3 10
4 15
4 14
5 16
6 17
6 15
6 16
7 19
7 21
8 18
9 17
9 20
0 .
10 .
;
PROC PRINT DATA=ONE;
TITLE3 'Data Listing for Fish Parasite Regression'; RUN;
PROC REG DATA=ONE LINEPRINTER;
TITLE3 'Fish Parasite example using REG with CLM';
MODEL PARASITE=AGE / clb; *** CLI CLM P R; ID AGE;
TEST AGE=5;
OUTPUT OUT=NEXT P=P R=E STUDENT=student rstudent=rstudent
lcl=lcl lclm=lclm ucl=ucl uclm=uclm;
RUN; OPTIONS PS=35; TITLE4 'Plots of raw data & residuals';
PLOT PREDICTED.*AGE='P' PARASITE*AGE='O' / OVERLAY;
PLOT RESIDUAL.*AGE='E';
RUN; QUIT;
proc print data=next;
TITLE4 'Listing of output from PROC REG';
var age parasite P E student rstudent lcl lclm ucl uclm; run;
OPTIONS PS=61;
PROC UNIVARIATE DATA=NEXT NORMAL PLOT; VAR E;
TITLE4 'Residual analysis with PROC UNIVARIATE';
RUN;
PROC GLM DATA=ONE;
TITLE3 'Fish Parasite example using GLM with CLI';
MODEL PARASITE=AGE / P CLI ALPHA=.01; ID AGE;
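CONTRAST 'HO: B1 = 5' AGE 5;
/* Caution: CONTRAST tests L*beta = 0, so the coefficient 5 tests 5*B1 = 0, */
/* which is H0: B1 = 0 (its F equals the model F). The test of B1 = 5 is    */
/* the TEST AGE=5 statement in PROC REG above.                              */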
RUN; QUIT;
GOPTIONS DEVICE=CGMflwa GSFMODE=REPLACE GSFNAME=OUT1 NOPROMPT noROTATE
ftext='TimesRoman' ftitle='TimesRoman'; /* GSFNAME must match the FILENAME fileref OUT1 */
FILENAME OUT1 'F:\Fall2003\_Disk_Fall03\slrci2.cgm';
PROC GPLOT DATA=one; TITLE1 'Regression with confidence bands';
PLOT parasite*age=1 parasite*age=2 / OVERLAY HAXIS=AXIS1 VAXIS=AXIS2;
AXIS1 LABEL=('Age (years)') ORDER=0 TO 10 BY 1;
AXIS2 LABEL=('Parasites') ORDER=0 TO 25 BY 5;
SYMBOL1 V=dot c=red I=RLclm95 L=1 W=5 mode=include;
SYMBOL2 V=none c=blue I=RLcli95 L=1 W=5 mode=include; run;
GOPTIONS GSFNAME=OUT2;
FILENAME OUT2 'F:\Fall2003\_Disk_Fall03\resplot2.cgm';
PROC GPLOT DATA=next;
TITLE1 'Residual plot';
PLOT e*age / HAXIS=AXIS1 VAXIS=AXIS2 vref=0;
AXIS1 LABEL=('Age (years)') ORDER=0 TO 10 BY 1;
AXIS2 LABEL=('Parasite residuals');
SYMBOL1 V=dot c=red I=none L=1 W=5 mode=include; run;
quit;
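As an independent cross-check of what PROC REG reports for objectives 1 to 4 and 8, the least-squares quantities can be recomputed by hand. The sketch below is in Python rather than SAS and uses only the standard library; the helper name fit_slr is hypothetical, and the 95% t critical value for 14 error degrees of freedom (2.144787) is hard-coded from a t table because the standard library has no t quantile function. Only the 16 complete observations are used, matching PROC REG, which drops the AGE=0 and AGE=10 rows where PARASITE is missing.

```python
# Cross-check of the PROC REG fit for PARASITE = b0 + b1*AGE, standard
# library only. The 16 complete (AGE, PARASITE) pairs are copied from the
# CARDS data; the AGE=0 and AGE=10 rows (PARASITE missing) are left out,
# as PROC REG does.
import math

AGE = [1, 2, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7, 8, 9, 9]
PARASITE = [3, 7, 8, 12, 10, 15, 14, 16, 17, 15, 16, 19, 21, 18, 17, 20]

# Assumption: two-sided 95% t critical value for 14 error df, taken from a
# t table because the standard library has no t quantile function.
T_975_DF14 = 2.144787


def fit_slr(x, y):
    """Hypothetical helper: least-squares fit and ANOVA pieces for y = b0 + b1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b1 = sxy / sxx                      # slope: rate of accumulation per year
    b0 = ybar - b1 * xbar               # intercept
    ss_model = sxy ** 2 / sxx           # 1 df
    ss_error = syy - ss_model           # n - 2 df
    mse = ss_error / (n - 2)
    return {
        "b0": b0, "b1": b1,
        "ss_model": ss_model, "ss_error": ss_error, "ss_total": syy,
        "mse": mse, "f": ss_model / mse, "r2": ss_model / syy,
        "se_b0": math.sqrt(mse * (1 / n + xbar ** 2 / sxx)),
        "se_b1": math.sqrt(mse / sxx),
    }


fit = fit_slr(AGE, PARASITE)
for name, se in (("b0", "se_b0"), ("b1", "se_b1")):
    est, half = fit[name], T_975_DF14 * fit[se]
    print(f"{name} = {est:.5f}  SE = {fit[se]:.5f}  "
          f"95% CL = ({est - half:.5f}, {est + half:.5f})")
print(f"F = {fit['f']:.2f}  R-Square = {fit['r2']:.4f}  MSE = {fit['mse']:.5f}")

# Objective 8, H0: B1 = 5. PROC REG's TEST AGE=5 reports F = t**2.
t_b1_vs_5 = (fit["b1"] - 5) / fit["se_b1"]
print(f"t = {t_b1_vs_5:.3f}, F = t^2 = {t_b1_vs_5 ** 2:.2f} on 1 and 14 df")
```

The printed estimates, standard errors, 95% limits, F and R-Square are meant to be compared with the Analysis of Variance and Parameter Estimates tables from PROC REG; the F for H0: B1 = 5 is simply the square of (b1 - 5)/SE(b1).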
Below are the SAS log and the output from the SAS Output window.
1 **********************************************************************;
2 *** EXST7005 Regression Example ***;
3 *** Redfin Pickerel, and other fish, accumulate parasites ***;
4 *** on their fins. These parasites attach and stay with ***;
5 *** the fish throughout its life until the fish is eaten ***;
6 *** and the parasite continues its life cycle. ***;
7 *** - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -***;
8 *** If parasites are accumulated at a constant rate, older ***;
9 *** fish should have more parasites. Test this hypothesis. ***;
10 *** OBJECTIVES: ***;
11 *** 1) Determine if older fish have more parasites. ***;
12 *** 2) Estimate the rate of accumulation of parasites. ***;
13 *** 3) Place a confidence interval on this estimate ***;
14 *** 4) Estimate the intercept with confidence interval. ***;
15 *** 5) Determine how many parasites a 10 year old fish would have. ***;
16 *** 6) Place a confidence interval on the 10 year old fish estimate***;
17 *** 7) Determine of a linear model is adequate. ***;
18 *** 8) An old published article states that the rate of accumul. ***;
19 *** should be about 5 per year. Test our estimate against 5. ***;
20 **********************************************************************;
21
22 options ps=256 ls=99 nocenter nodate nonumber nolabel;
23 TITLE1 'Example of Simple linear Regression (SLR)';
24
25 DATA ONE; INFILE CARDS MISSOVER;
26 TITLE2 'Rate of parasite accumulation in Redfin Pickerel';
27 INPUT AGE PARASITE;
28 LABEL AGE = 'Fish age from scales reading';
29 LABEL PARASITE = 'Pectoral fin parasites / sq cm';
30 CARDS;

NOTE: The data set WORK.ONE has 18 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

49 ;
50 PROC PRINT DATA=ONE;
51 TITLE3 'Data Listing for Fish Parasite Regression'; RUN;

NOTE: There were 18 observations read from the data set WORK.ONE.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Example of Simple linear Regression (SLR)
Rate of parasite accumulation in Redfin Pickerel
Data Listing for Fish Parasite Regression

Obs    AGE    PARASITE

  1      1           3
  2      2           7
  3      3           8
  4      3          12
  5      3          10
  6      4          15
  7      4          14
  8      5          16
  9      6          17
 10      6          15
 11      6          16
 12      7          19
 13      7          21
 14      8          18
 15      9          17
 16      9          20
 17      0           .
 18     10           .

52
53 PROC REG DATA=ONE LINEPRINTER;
54 TITLE3 'Fish Parasite example using REG with CLM';
55 MODEL PARASITE=AGE / clb; *** CLI CLM P R; ID AGE;
56 TEST AGE=5;
57 OUTPUT OUT=NEXT P=P R=E STUDENT=student rstudent=rstudent
58 lcl=lcl lclm=lclm ucl=ucl uclm=uclm;
59 RUN;

NOTE: 18 observations read.
NOTE: 2 observations have missing values.
NOTE: 16 observations used in computations.

59 ! OPTIONS PS=35; TITLE4 'Plots of raw data & residuals';
60 PLOT PREDICTED.*AGE='P' PARASITE*AGE='O' / OVERLAY;
61 PLOT RESIDUAL.*AGE='E';
62 RUN;
62 ! QUIT;

NOTE: The data set WORK.NEXT has 18 observations and 10 variables.
NOTE: The PROCEDURE REG printed pages 2-5.
NOTE: PROCEDURE REG used (Total process time):
      real time           0.04 seconds
      cpu time            0.04 seconds

Example of Simple linear Regression (SLR)
Rate of parasite accumulation in Redfin Pickerel
Fish Parasite example using REG with CLM

The REG Procedure
Model: MODEL1
Dependent Variable: PARASITE

                         Analysis of Variance

                                Sum of           Mean
Source                 DF      Squares         Square    F Value    Pr > F

Model                   1    301.94955      301.94955      54.86    <.0001
Error                  14     77.05045        5.50360
Corrected Total        15    379.00000

Root MSE              2.34598    R-Square     0.7967
Dependent Mean       14.25000    Adj R-Sq     0.7822
Coeff Var            16.46299

                                  Parameter Estimates

                  Parameter     Standard
Variable    DF     Estimate        Error    t Value    Pr > |t|     95% Confidence Limits

Intercept    1      4.77125      1.40769       3.39      0.0044      1.75205      7.79045
AGE          1      1.82723      0.24669       7.41      <.0001      1.29813      2.35632

Test 1 Results for Dependent Variable PARASITE

                              Mean
Source             DF       Square    F Value    Pr > F

Numerator           1    910.38705     165.42    <.0001
Denominator        14      5.50360

Example of Simple linear Regression (SLR)
Rate of parasite accumulation in Redfin Pickerel
Fish Parasite example using REG with CLM
Plots of raw data & residuals

[Line-printer plot: observed PARASITE (O) and Predicted Value of PARASITE (P) versus AGE, overlaid; AGE 1.0 to 9.0, vertical axis 0 to 30]

[Line-printer plot: RESIDUAL (E) versus AGE; AGE 1.0 to 9.0, vertical axis -5.0 to 5.0]

63 proc print data=next;
64 TITLE4 'Listing of output from PROC REG';
65 var age parasite P E student rstudent lcl lclm ucl uclm; run;

NOTE: There were 18 observations read from the data set WORK.NEXT.
NOTE: The PROCEDURE PRINT printed page 6.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Example of Simple linear Regression (SLR)
Rate of parasite accumulation in Redfin Pickerel
Fish Parasite example using REG with CLM
Listing of output from PROC REG

Obs  AGE  PARASITE        P         E   student  rstudent      lcl     lclm      ucl     uclm

  1    1         3   6.5985  -3.59848  -1.77879  -1.94833   0.9586   4.0507  12.2384   9.1463
  2    2         7   8.4257  -1.42571  -0.66902  -0.65524   2.9719   6.3218  13.8795  10.5296
  3    3         8  10.2529  -2.25294  -1.02107  -1.02274   4.9389   8.5436  15.5670  11.9623
  4    3        12  10.2529   1.74706   0.79180   0.78068   4.9389   8.5436  15.5670  11.9623
  5    3        10  10.2529  -0.25294  -0.11464  -0.11052   4.9389   8.5436  15.5670  11.9623
  6    4        15  12.0802   2.91983   1.29626   1.33156   6.8558  10.6741  17.3046  13.4863
  7    4        14  12.0802   1.91983   0.85231   0.84348   6.8558  10.6741  17.3046  13.4863
  8    5        16  13.9074   2.09261   0.92144   0.91614   8.7200  12.6456  19.0948  15.1692
  9    6        17  15.7346   1.26538   0.55925   0.54503  10.5304  14.4053  20.9389  17.0640
 10    6        15  15.7346  -0.73462  -0.32468  -0.31405  10.5304  14.4053  20.9389  17.0640
 11    6        16  15.7346   0.26538   0.11729   0.11308  10.5304  14.4053  20.9389  17.0640
 12    7        19  17.5619   1.43815   0.64577   0.63176  12.2875  15.9801  22.8362  19.1436
 13    7        21  17.5619   3.43815   1.54382   1.63316  12.2875  15.9801  22.8362  19.1436
 14    8        18  19.3891  -1.38908  -0.64222  -0.62818  13.9934  17.4406  24.7848  21.3376
 15    9        17  21.2163  -4.21631  -2.03920  -2.34368  15.6514  18.8391  26.7812  23.5936
 16    9        20  21.2163  -1.21631  -0.58826  -0.57400  15.6514  18.8391  26.7812  23.5936
 17    0         .   4.7713         .         .         .  -1.0967   1.7520  10.6392   7.7905
 18   10         .  23.0435         .         .         .  17.2657  20.2035  28.8213  25.8836

66 OPTIONS PS=61;
67 PROC UNIVARIATE DATA=NEXT NORMAL PLOT; VAR E;
68 TITLE4 'Residual analysis with PROC UNIVARIATE';
69 RUN;

NOTE: The PROCEDURE UNIVARIATE printed pages 7-9.
NOTE: PROCEDURE UNIVARIATE used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Example of Simple linear Regression (SLR)
Rate of parasite accumulation in Redfin Pickerel
Fish Parasite example using REG with CLM
Residual analysis with PROC UNIVARIATE

The UNIVARIATE Procedure
Variable: E

                         Moments

N                          16    Sum Weights                 16
Mean                        0    Sum Observations             0
Std Deviation      2.26642816    Variance            5.13669661
Skewness           -0.3183952    Kurtosis            -0.7591259
Uncorrected SS     77.0504492    Corrected SS        77.0504492
Coeff Variation             .    Std Error Mean      0.56660704

              Basic Statistical Measures

    Location                    Variability

Mean     0.000000     Std Deviation            2.26643
Median   0.006220     Variance                 5.13670
Mode      .           Range                    7.65446
                      Interquartile Range      3.24084

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t         0    Pr > |t|     1.0000
Sign           M         0    Pr >= |M|    1.0000
Signed Rank    S         4    Pr >= |S|    0.8603

                  Tests for Normality

Test                  --Statistic---    -----p Value------

Shapiro-Wilk          W     0.961962    Pr < W       0.6975
Kolmogorov-Smirnov    D     0.149185    Pr > D      >0.1500
Cramer-von Mises      W-Sq  0.038869    Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq  0.248615    Pr > A-Sq   >0.2500

Quantiles (Definition 5)

Quantile       Estimate

100% Max     3.43814789
99%          3.43814789
95%          3.43814789
90%          2.91983414
75% Q3       1.83344851
50% Median   0.00621977
25% Q1      -1.40739461
10%         -3.59847961
5%          -4.21630961
1%          -4.21630961
0% Min      -4.21630961

        Extreme Observations

------Lowest-----    -----Highest-----

   Value     Obs        Value     Obs

-4.21631      15      1.74706       4
-3.59848       1      1.91983       7
-2.25294       3      2.09261       8
-1.42571       2      2.91983       6
-1.38908      14      3.43815      13

              Missing Values

                     -----Percent Of-----
Missing                            Missing
Value      Count     All Obs          Obs

.              2       11.11       100.00

Stem Leaf                  #             Boxplot
   3 4                     1                |
   2 19                    2                |
   1 3479                  4             +-----+
   0 3                     1             *--+--*
  -0 73                    2             |     |
  -1 442                   3             +-----+
  -2 3                     1                |
  -3 6                     1                |
  -4 2                     1                |
     ----+----+----+----+

[Normal probability plot of E: residuals (-4.5 to 3.5) versus normal quantiles (-2 to +2)]

71 PROC GLM DATA=ONE;
72 TITLE3 'Fish Parasite example using GLM with CLI';
73 MODEL PARASITE=AGE / P CLI ALPHA=.01; ID AGE;
74 CONTRAST 'HO: B1 = 5' AGE 5;
75 RUN;
75 ! QUIT;

NOTE: The PROCEDURE GLM printed pages 10-13.
NOTE: PROCEDURE GLM used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

Example of Simple linear Regression (SLR)
Rate of parasite accumulation in Redfin Pickerel
Fish Parasite example using GLM with CLI

The GLM Procedure

Number of observations    18

NOTE: Due to missing values, only 16 observations can be used in this analysis.
Dependent Variable: PARASITE

                                   Sum of
Source                 DF         Squares     Mean Square    F Value    Pr > F

Model                   1     301.9495508     301.9495508      54.86    <.0001
Error                  14      77.0504492       5.5036035
Corrected Total        15     379.0000000

R-Square     Coeff Var      Root MSE    PARASITE Mean

0.796701      16.46299      2.345976         14.25000

Source                 DF       Type I SS     Mean Square    F Value    Pr > F

AGE                     1     301.9495508     301.9495508      54.86    <.0001

Source                 DF     Type III SS     Mean Square    F Value    Pr > F

AGE                     1     301.9495508     301.9495508      54.86    <.0001

Contrast               DF     Contrast SS     Mean Square    F Value    Pr > F

HO: B1 = 5              1     301.9495508     301.9495508      54.86    <.0001

                                Standard
Parameter        Estimate          Error    t Value    Pr > |t|

Intercept     4.771250864     1.40769370       3.39      0.0044
AGE           1.827228749     0.24668872       7.41      <.0001

Observation   AGE        Observed       Predicted        Residual

        1       1      3.00000000      6.59847961     -3.59847961
        2       2      7.00000000      8.42570836     -1.42570836
        3       3      8.00000000     10.25293711     -2.25293711
        4       3     12.00000000     10.25293711      1.74706289
        5       3     10.00000000     10.25293711     -0.25293711
        6       4     15.00000000     12.08016586      2.91983414
        7       4     14.00000000     12.08016586      1.91983414
        8       5     16.00000000     13.90739461      2.09260539
        9       6     17.00000000     15.73462336      1.26537664
       10       6     15.00000000     15.73462336     -0.73462336
       11       6     16.00000000     15.73462336      0.26537664
       12       7     19.00000000     17.56185211      1.43814789
       13       7     21.00000000     17.56185211      3.43814789
       14       8     18.00000000     19.38908086     -1.38908086
       15       9     17.00000000     21.21630961     -4.21630961
       16       9     20.00000000     21.21630961     -1.21630961
       17 *     0               .      4.77125086               .
       18 *    10               .     23.04353836               .
                            99% Confidence Limits for
Observation   AGE        Individual Predicted Value

        1       1     -1.22936390     14.42632313
        2       2      0.85616543     15.99525129
        3       3      2.87734381     17.62853041
        4       3      2.87734381     17.62853041
        5       3      2.87734381     17.62853041
        6       4      4.82900575     19.33132597
        7       4      4.82900575     19.33132597
        8       5      6.70754602     21.10724320
        9       6      8.51140616     22.95784055
       10       6      8.51140616     22.95784055
       11       6      8.51140616     22.95784055
       12       7     10.24130132     24.88240289
       13       7     10.24130132     24.88240289
       14       8     11.90011489     26.87804682
       15       9     13.49249521     28.94012400
       16       9     13.49249521     28.94012400
       17 *     0     -3.37312377     12.91562550
       18 *    10     15.02427676     31.06279995

* Observation was not used in this analysis

Sum of Residuals                             -0.0000000
Sum of Squared Residuals                     77.0504492
Sum of Squared Residuals - Error SS          -0.0000000
PRESS Statistic                             110.4690933
First Order Autocorrelation                   0.3362460
Durbin-Watson D                               1.1402481
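Objectives 5 and 6 rest on two different standard errors at a new age x0: the mean response uses MSE*(1/n + (x0 - xbar)^2/Sxx), which gives PROC REG's LCLM/UCLM, while a single new fish adds 1 inside the parentheses, which gives the CLI limits PROC GLM printed with ALPHA=.01. The Python sketch below recomputes both for AGE=10 under stated assumptions: the helper names are hypothetical, and the t critical values for 14 df (2.144787 at 95%, 2.976843 at 99%) are hard-coded from a t table. It also computes the F of a slope contrast with coefficient c, which shows why CONTRAST 'HO: B1 = 5' AGE 5 reproduces the model F: a CONTRAST tests c*B1 = 0, and the coefficient cancels.

```python
# Sketch: confidence limits for the mean response (CLM) and for one new
# fish (CLI) at a new AGE, plus the F of a slope CONTRAST, recomputed from
# the 16 complete observations with the Python standard library.
import math

AGE = [1, 2, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7, 8, 9, 9]
PARASITE = [3, 7, 8, 12, 10, 15, 14, 16, 17, 15, 16, 19, 21, 18, 17, 20]

# Assumption: two-sided t critical values for 14 error df (keyed by alpha),
# hard-coded from a t table; the standard library has no t quantile function.
T_CRIT_DF14 = {0.05: 2.144787, 0.01: 2.976843}


def fit_line(x, y):
    """Hypothetical helper: return (b0, b1, xbar, sxx, mse, n) for y = b0 + b1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, xbar, sxx, sse / (n - 2), n


def predict_with_limits(x, y, x0, alpha):
    """Return (yhat, clm, cli) at x0; clm and cli are (lower, upper) tuples."""
    b0, b1, xbar, sxx, mse, n = fit_line(x, y)
    yhat = b0 + b1 * x0
    lever = 1 / n + (x0 - xbar) ** 2 / sxx
    t = T_CRIT_DF14[alpha]
    half_clm = t * math.sqrt(mse * lever)        # mean of all fish aged x0
    half_cli = t * math.sqrt(mse * (1 + lever))  # a single new fish aged x0
    return yhat, (yhat - half_clm, yhat + half_clm), (yhat - half_cli, yhat + half_cli)


def slope_contrast_f(x, y, c):
    """F for a GLM-style CONTRAST with coefficient c on the slope (H0: c*B1 = 0)."""
    _, b1, _, sxx, mse, _ = fit_line(x, y)
    contrast_ss = (c * b1) ** 2 / (c ** 2 / sxx)   # c cancels out
    return contrast_ss / mse


yhat10, clm95, _ = predict_with_limits(AGE, PARASITE, 10, 0.05)
_, _, cli99 = predict_with_limits(AGE, PARASITE, 10, 0.01)
print(f"AGE=10: predicted PARASITE = {yhat10:.4f}")
print(f"  95% CLM: {clm95[0]:.4f} to {clm95[1]:.4f}")
print(f"  99% CLI: {cli99[0]:.4f} to {cli99[1]:.4f}")
print(f"Contrast F, coefficient 5: {slope_contrast_f(AGE, PARASITE, 5):.2f}")
print(f"Contrast F, coefficient 1: {slope_contrast_f(AGE, PARASITE, 1):.2f}")
```

The CLI half-width is always wider than the CLM half-width at the same age because of the extra 1 under the square root; at AGE=0 the 95% CLM reproduces the intercept's 95% confidence limits from the Parameter Estimates table.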
77 GOPTIONS DEVICE=CGMflwa GSFMODE=REPLACE GSFNAME=OUT NOPROMPT noROTATE
78 ftext='TimesRoman' ftitle='TimesRoman';
79
80 FILENAME OUT1 'F:\Fall2003\_Disk_Fall03\slrci2.cgm';
81 PROC GPLOT DATA=one; TITLE1 'Regression with confidence bands';
82 PLOT parasite*age=1 parasite*age=2 / OVERLAY HAXIS=AXIS1 VAXIS=AXIS2;
83 AXIS1 LABEL=('Age (years)') ORDER=0 TO 10 BY 1;
84 AXIS2 LABEL=('Parasites') ORDER=0 TO 25 BY 5;
85 SYMBOL1 V=dot c=red I=RLclm95 L=1 W=5 mode=include;
86 SYMBOL2 V=none c=blue I=RLcli95 L=1 W=5 mode=include; run;
NOTE: Regression equation : PARASITE = 4.771251 + 1.827229*AGE.
NOTE: 2 observation(s) contained a MISSING value for the PARASITE * AGE request.
NOTE: Regression equation : PARASITE = 4.771251 + 1.827229*AGE.
NOTE: 2 observation(s) contained a MISSING value for the PARASITE * AGE request.
WARNING: GSFNAME OUT has not been assigned.
NOTE: GSFNAME OUT temporarily assigned to F:\Fall2003\_Disk_Fall03\sasgraph.cgm.
NOTE: 82 RECORDS WRITTEN TO F:\Fall2003\_Disk_Fall03\sasgraph.cgm
87
88
89 GOPTIONS GSFNAME=OUT2;
90 FILENAME OUT2 'F:\Fall2003\_Disk_Fall03\resplot2.cgm';
NOTE: There were 18 observations read from the data set WORK.ONE.
NOTE: PROCEDURE GPLOT used:
real time 0.93 seconds
91 PROC GPLOT DATA=next;
92 TITLE1 'Residual plot';
93 PLOT e*age / HAXIS=AXIS1 VAXIS=AXIS2 vref=0;
94 AXIS1 LABEL=('Age (years)') ORDER=0 TO 10 BY 1;
95 AXIS2 LABEL=('Parasite residuals');
96 SYMBOL1 V=dot c=red I=none L=1 W=5 mode=include; run;
NOTE: 2 observation(s) contained a MISSING value for the E * AGE request.
NOTE: 21 RECORDS WRITTEN TO F:\Fall2003\_Disk_Fall03\resplot2.cgm
97 quit;
NOTE: There were 18 observations read from the data set WORK.NEXT.
NOTE: PROCEDURE GPLOT used:
real time 0.16 seconds
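For objective 7, the residual plot written to resplot2.cgm can be read alongside the residual summaries at the end of the PROC GLM output (PRESS, first order autocorrelation, Durbin-Watson D) and the STUDENT= values from the PROC REG OUTPUT statement. The Python sketch below recomputes them from the 16 complete observations taken in data-set order, which matters for the autocorrelation and Durbin-Watson statistics; the function name residual_diagnostics is hypothetical, and the formulas are the textbook ones for simple linear regression.

```python
# Sketch: residual diagnostics for PARASITE = b0 + b1*AGE, standard library
# only, using the 16 complete observations in data-set order.
import math

AGE = [1, 2, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7, 8, 9, 9]
PARASITE = [3, 7, 8, 12, 10, 15, 14, 16, 17, 15, 16, 19, 21, 18, 17, 20]


def residual_diagnostics(x, y):
    """Hypothetical helper: residuals, PRESS, autocorrelation, DW, studentized residuals."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    mse = sse / (n - 2)
    # Leverage (hat diagonal) for simple linear regression.
    hat = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # PRESS: sum of squared leave-one-out prediction errors e_i / (1 - h_i).
    press = sum((e / (1 - h)) ** 2 for e, h in zip(resid, hat))
    # Both statistics below compare each residual with the one before it.
    dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / sse
    autocorr = sum(resid[i] * resid[i - 1] for i in range(1, n)) / sse
    # Internally studentized residuals, as in PROC REG STUDENT=.
    student = [e / math.sqrt(mse * (1 - h)) for e, h in zip(resid, hat)]
    return {"resid": resid, "press": press, "dw": dw,
            "autocorr": autocorr, "student": student}


diag = residual_diagnostics(AGE, PARASITE)
print(f"PRESS = {diag['press']:.4f}")
print(f"First order autocorrelation = {diag['autocorr']:.4f}")
print(f"Durbin-Watson D = {diag['dw']:.4f}")
for obs, (age, r) in enumerate(zip(AGE, diag["student"]), start=1):
    flag = "  <- |student| > 2" if abs(r) > 2 else ""
    print(f"Obs {obs:2d}  AGE {age}: student = {r:8.5f}{flag}")
```

In this data only observation 15 (AGE 9, 17 parasites) has a studentized residual beyond 2 in absolute value, which matches the STUDENT column in the PROC PRINT listing of WORK.NEXT.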