The objective of this assignment is to run a regression analysis and to carry out selected hypothesis tests. You should already have a program that does the following:

a) Reads three variables (DATE, INTERVAL, DURATION) from the data file ex0723.csv.
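
A minimal sketch of the input step, assuming ex0723.csv is in C:\temp\, is comma-delimited with one header row, and lists its columns in the order DATE, INTERVAL, DURATION. The path, the header row, and reading DATE as text are all assumptions; adjust them to match your copy of the file.

```sas
/* ASSUMPTIONS: file location, one header row, column order DATE INTERVAL DURATION */
data geyser;
   infile 'C:\temp\ex0723.csv' dsd firstobs=2;   /* DSD: comma-delimited; FIRSTOBS=2 skips the header */
   input date :$12. interval duration;           /* DATE read as text (assumed format) */
run;

proc print data=geyser(obs=5);   /* quick check that the values were read correctly */
run;
```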

b) Includes the usual comments and title, and sends HTML output to the C:\temp\ directory.
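
One way to set this up, shown as a sketch: the ODS HTML statement opens an HTML destination that writes to C:\temp\, and ODS HTML CLOSE finishes the file. The header comment fields and the file name geyser.html are hypothetical placeholders.

```sas
/*  Program: geyser regression (hypothetical header; fill in your own) */
/*  Author:  ____________   Date: ____________                        */

ods html path='C:\temp\' body='geyser.html';   /* geyser.html is a hypothetical name */
title1 'Old Faithful: eruption duration vs. waiting interval';

/* ... DATA step and PROC steps go here ... */

ods html close;   /* close the destination so the HTML file is complete */
```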

c) Produces a scatter plot of the data using statements similar to the following:

options ps=51 ls=111;        /* page size and line size for the line-printer plot */
proc plot data=geyser;
   plot duration * interval;   /* vertical axis * horizontal axis */
run;

d) Runs a regression analysis using statements similar to the following: 

title2 'Regression with labels and simple output';
proc reg data=geyser;
   model duration = interval;
run;
quit;   /* PROC REG is interactive; QUIT ends the procedure */

1) Modify your previous program to do the following. Everything can be done in SAS.

a) Get estimates of the slope and intercept, with tests of each against a hypothesized value of zero, and a 95 percent confidence interval for each.
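
A sketch of part a), assuming the GEYSER data set from the existing program. PROC REG prints the t test of each parameter against zero by default; the CLB option on the MODEL statement adds confidence limits for the parameter estimates, which are 95% limits unless ALPHA= is changed.

```sas
title2 'Parameter estimates, t tests against zero, and 95% confidence limits';
proc reg data=geyser;
   model duration = interval / clb;   /* CLB: confidence limits for intercept and slope */
run;
quit;
```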

b) The data in this example are given in minutes for both the interval and the eruption duration. Test the hypothesis that each additional minute of “interval” (the waiting time) produces one additional second of eruption, that is, that the slope equals one second per minute. Note that, as a decimal, one second is 1/60 = 0.016666667 minutes.
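
A sketch of part b), assuming the GEYSER data set from the existing program. The TEST statement in PROC REG tests a linear hypothesis about the coefficients; the label Slope_1sec is a hypothetical name. SAS reports the result as an F test with 1 numerator degree of freedom; for a single coefficient F = t², so its p-value equals that of the two-sided t test.

```sas
title2 'Test of H0: slope = 1 second (0.016666667 minute) per minute';
proc reg data=geyser;
   model duration = interval;
   /* H0: coefficient on INTERVAL = 1/60 minute of eruption per minute of waiting */
   Slope_1sec: test interval = 0.016666667;
run;
quit;
```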

c) Current Internet information for Old Faithful claims that the geyser erupts about every 45 minutes. What duration can be expected after a 45-minute wait? Get this predicted value and place a 95% interval on the duration of an individual eruption following a 45-minute wait (that is, a prediction interval for a single eruption, not a confidence interval for the mean duration).
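
A sketch of part c) using a common PROC REG technique: append an observation with INTERVAL = 45 and a missing DURATION. Rows with a missing response are left out of the fit but still receive predicted values. In the OUTPUT statement, LCL= and UCL= give limits for an individual new response (LCLM= and UCLM= would give limits for the mean). The data set and variable names new45, geyser45, pred45, pred_duration, lower95, and upper95 are hypothetical, and the final WHERE clause assumes the original data contain no missing durations.

```sas
/* Hypothetical new observation: 45-minute wait, duration unknown */
data new45;
   interval = 45;
   duration = .;
run;

data geyser45;
   set geyser new45;   /* DATE is simply missing for the appended row */
run;

title2 'Predicted duration and 95% prediction interval for a 45-minute wait';
proc reg data=geyser45;
   model duration = interval;
   output out=pred45 p=pred_duration lcl=lower95 ucl=upper95;
run;
quit;

proc print data=pred45;
   where missing(duration);   /* assumes only the appended row has a missing DURATION */
   var interval pred_duration lower95 upper95;
run;
```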

d) The positive slope for this regression means that the eruption lasts longer when the interval between eruptions is longer. How much longer (in seconds) is the eruption for each additional minute of waiting time (i.e., of the interval between eruptions)?
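
Because DURATION and INTERVAL are both in minutes, the slope is in minutes of eruption per minute of waiting; multiplying it by 60 converts it to seconds per minute. A sketch that captures the estimate through ODS OUTPUT (PROC REG's ParameterEstimates table) and does the conversion in a DATA step; the data set names pe and slope_sec and the variable sec_per_min are hypothetical.

```sas
title2 'Slope expressed in seconds of eruption per minute of waiting';
ods output ParameterEstimates=pe;   /* save the parameter estimates table */
proc reg data=geyser;
   model duration = interval;
run;
quit;

data slope_sec;
   set pe;
   where upcase(variable) = 'INTERVAL';   /* keep the slope row, whatever the name's case */
   sec_per_min = 60 * estimate;           /* minutes per minute x 60 = seconds per minute */
run;

proc print data=slope_sec;
   var variable estimate sec_per_min;
run;
```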

e) Get a Shapiro-Wilk test of normality for the regression residuals.
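
A sketch of part e), assuming the GEYSER data set from the existing program: save the residuals with the OUTPUT statement in PROC REG, then request the Tests for Normality table (which includes Shapiro-Wilk) with the NORMAL option of PROC UNIVARIATE. The names resids and resid are hypothetical.

```sas
title2 'Shapiro-Wilk test on the regression residuals';
proc reg data=geyser;
   model duration = interval;
   output out=resids r=resid;   /* R= saves the residuals */
run;
quit;

proc univariate data=resids normal;   /* NORMAL adds the Tests for Normality table */
   var resid;
run;
```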

When the program is complete, email me a copy and be sure to save a copy for yourself.

In addition to a copy of the program file, add the following six comments to the end of the program, filling in the blanks or choosing the appropriate option. You can copy these statements directly into your program and modify them as needed.

** 1) The value of the slope does/does not differ significantly from zero.  ;

** 2) The value of the slope is __________________ .  The lower and upper bounds are ___________________ and ___________________ .  ;

** 3) Each additional minute in the interval does/does not produce an additional one (1) second of eruption duration.  The P value for this test is ________________ .  ;

** 4) The predicted duration of an eruption following a 45-minute wait is _________________ .  The lower and upper bounds are ________________ and ________________ .  ;

** 5) The slope for this regression indicates that the eruption lasts an additional _______________ seconds for each additional minute of waiting time. ;

** 6) The assumption of normality is/is not met.  The P value for this test is ________ .  ;