EXST 7005 Lab #11, spring 2004
Lab #11: Post ANOVA Techniques
Due: Week of April 19


Lab #10 introduced 1-way ANOVA as a method of evaluating the means of two or more populations. This is a powerful technique with applications to many areas of statistical analysis. This week we will look at some of the additional information that can be obtained from the analysis of variance.
A simple ANOVA program containing only class and model statements will return the ANOVA table detailing source, degrees of freedom, Type I and III sums of squares, mean squares, F-values for individual tests, and their associated p-values. While this is all useful information, the overall F-test only tells us whether or not the means are equal across all of the classes of interest. If the F-test shows that the means are not all equal, we need to determine which means differ and draw meaningful conclusions from that information.
Using a means statement in the glm procedure will provide the mean value of the dependent variable at each treatment level. Additionally, we can request post hoc tests as options in this statement to perform a variety of multiple comparisons that provide information about the differences between treatment levels. To do multiple comparisons, type a slash (/) after the effect name in the means statement and then indicate which post hoc test you want SAS to perform (for example lsd, tukey, bon, scheffe, or duncan). Detailed descriptions of these tests begin on page 253 of your textbook and on page 179 of your course notes. For this lab, we will request several of these tests so that you can get a feel for the differences between them. Another set of options available in the means statement allows us to test for homogeneity of variance. Again after the slash, type hovtest= followed by bartlett, bf, levene, or obrien. The separate welch option requests Welch's variance-weighted ANOVA, which does not assume equal variances. Descriptions of these tests can be found on page 179 of your course notes.
The lsmeans statement provides least-squares means for the main effects. In balanced designs these are the same as the raw means, but in unbalanced designs with more than one factor they can differ, and lsmeans should be used instead. Options available after the slash (/) in the lsmeans statement include the standard error of each mean (stderr), p-values for pairwise differences (pdiff), and the multiple-comparison adjustment, if any, that you want SAS to apply (adjust=). Contrasts can be used to test other meaningful comparisons, and are discussed at length beginning on page 242 of your textbook.
Below is a sample SAS program and output for you to become familiar with before writing and interpreting your own.


The following program uses one-way ANOVA to analyze the different amounts of insulin released as a result of varying levels of glucose concentration.


dm 'log;clear;output;clear';
options nodate nocenter nonumber ls=100 ps=100;
title 'Example of 1-Way ANOVA';

data one;
input c$ y @@;
cards;
l 1.59 l 1.73 l 3.64 l 1.97
m 3.36 m 4.01 m 3.49 m 2.89
h 3.92 h 4.82 h 3.87 h 5.39
;

proc print;
title2'Insulin Release as a function of glucose concentrations';
run;

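proc glm;
title2'GLM Example';
class c;                              * levels are used in sorted order: h, l, m;
model y=c;
output out=two p=yhat residual=resid; * save predicted values and residuals;
* homogeneity-of-variance tests, Welch ANOVA, and five multiple-comparison methods;
means c/hovtest=bartlett hovtest=bf hovtest=levene hovtest=obrien welch lsd tukey bon scheffe duncan;
lsmeans c/stderr pdiff adjust=tukey;
* coefficients follow the sorted level order (h l m), so 1 1 -2 compares h and l with m;
contrast 'Average of Low & Med vs. Average of High' c 1 1 -2;
run;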

proc univariate data=two normal plot;
title2'Tests of Assumptions';
var resid;
run;

options ls=70 ps=40;
proc plot data=two;
plot resid*yhat;
run;

quit;



Next are selected portions of the output, each followed by a brief description of what it means.


A.

Dependent Variable: y

                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F

Model                        2     10.29665000      5.14832500       9.31    0.0064

Error                        9      4.97935000      0.55326111

Corrected Total             11     15.27600000


This is the same type of ANOVA table seen in Lab #10. The p-value (0.0064) is below 0.05, so we reject the hypothesis that all means are equal and need further tests to determine which means differ.
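
As a check on the table above, the F-value and its p-value can be reproduced from the two mean squares. The following is a minimal sketch and is not part of the lab program; the variable names are illustrative, and the mean squares are copied from the output.

```sas
* Reproduce the F-value and p-value from the printed mean squares;
data _null_;
ms_model = 5.14832500;    * Model mean square, copied from the table;
ms_error = 0.55326111;    * Error mean square, copied from the table;
f = ms_model / ms_error;  * F is MS(Model) divided by MS(Error);
p = 1 - probf(f, 2, 9);   * upper-tail area with 2 and 9 degrees of freedom;
put f= p=;                * write both values to the log;
run;
```

The values written to the log should round to the 9.31 and 0.0064 printed by PROC GLM.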


B.

Example of 1-Way ANOVA
GLM Example

The GLM Procedure

         Levene's Test for Homogeneity of y Variance
        ANOVA of Squared Deviations from Group Means

                      Sum of        Mean
Source        DF     Squares      Square    F Value    Pr > F

c              2      0.5407      0.2703       0.91    0.4352
Error          9      2.6626      0.2958


        O'Brien's Test for Homogeneity of y Variance
         ANOVA of O'Brien's Spread Variable, W = 0.5

                      Sum of        Mean
Source        DF     Squares      Square    F Value    Pr > F

c              2      0.9612      0.4806       0.58    0.5771
Error          9      7.3961      0.8218


   Brown and Forsythe's Test for Homogeneity of y Variance
       ANOVA of Absolute Deviations from Group Medians

                      Sum of        Mean
Source        DF     Squares      Square    F Value    Pr > F

c              2      0.2056      0.1028       0.38    0.6975
Error          9      2.4670      0.2741


Bartlett's Test for Homogeneity of y Variance

Source        DF    Chi-Square    Pr > ChiSq

c              2        1.2700        0.5299


The above are the tests for homogeneity of variance requested as options in the means statement of the program. In this case all four tests lead to the same conclusion (none rejects the hypothesis of equal variances; every p-value exceeds 0.4), but that is not always the case. Your choice of tests will depend on the nature of your research.


C.

Bonferroni (Dunn) t Tests for y

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type
II error rate than REGWQ.


Alpha                              0.05
Error Degrees of Freedom              9
Error Mean Square              0.553261
Critical Value of t             2.93332
Minimum Significant Difference   1.5428


Means with the same letter are not significantly different.


Bon Grouping    Mean      N    c

     A        4.5000      4    h
     A
B    A        3.4375      4    m
B
B             2.2325      4    l


This is Bonferroni's post-hoc method. Notice the A's and B's alongside the means. Means that share a letter are not significantly different. So in this case low and medium do not differ significantly, nor do medium and high, but low and high do: their difference (4.5000 - 2.2325 = 2.2675) exceeds the minimum significant difference of 1.5428. The other post-hoc methods are interpreted in the same manner. The first 'paragraph' of output lists the values used in the calculations (alpha, error degrees of freedom, error mean square, critical value, and minimum significant difference), while the second portion shows the results.
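
The critical value and minimum significant difference in section C can be reproduced by hand. The sketch below assumes the usual Bonferroni rule for equal group sizes: alpha is split across the k(k-1)/2 pairwise comparisons, and the critical t is multiplied by the standard error of a difference between two means. The variable names are illustrative; the inputs are copied from the output.

```sas
* Sketch: Bonferroni critical t and minimum significant difference;
data _null_;
alpha = 0.05;                        * experimentwise error rate;
k     = 3;                           * number of treatment levels;
n     = 4;                           * observations per level;
dfe   = 9;                           * error degrees of freedom;
mse   = 0.553261;                    * error mean square;
m     = k*(k-1)/2;                   * number of pairwise comparisons;
tcrit = tinv(1 - alpha/(2*m), dfe);  * two-sided critical t at alpha/m;
msd   = tcrit*sqrt(2*mse/n);         * critical t times SE of a difference;
put tcrit= msd=;
run;
```

The log values should agree, up to rounding, with the Critical Value of t (2.93332) and Minimum Significant Difference (1.5428) reported above.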


D.

The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Tukey

                         Standard                  LSMEAN
c        y LSMEAN           Error    Pr > |t|      Number

h      4.50000000      0.37190762      <.0001           1
l      2.23250000      0.37190762      0.0002           2
m      3.43750000      0.37190762      <.0001           3


       Least Squares Means for effect c
     Pr > |t| for H0: LSMean(i)=LSMean(j)

            Dependent Variable: y

i/j              1             2             3

   1                      0.0050        0.1629
   2        0.0050                      0.1085
   3        0.1629        0.1085


This section shows one of the results from the lsmeans statement. Notice the LSMEAN number in the right-hand column of the upper table. This is a group identifier, so high is 1, low is 2, and medium is 3. The Pr > |t| column in the upper table tests whether each individual mean is zero, which is rarely of interest here. The lower table shows Tukey-adjusted p-values for the comparison of each possible pair of treatment level means. In this case you can see that only 1 and 2 are significantly different from one another (p = 0.0050). Looking back to the upper table, you can see that this means that high and low are different.
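
The standard error and the adjusted p-value in section D can be approximated by hand. This sketch assumes equal group sizes and assumes that the Tukey adjustment equals the upper tail of the studentized range distribution evaluated at sqrt(2) times |t|; the call to probmc with the 'RANGE' distribution reflects that assumption, and all variable names are illustrative.

```sas
* Sketch: LS-mean standard error and Tukey-adjusted p-value for high vs. low;
data _null_;
mse = 0.55326111;                  * error mean square from section A;
n   = 4;                           * observations per level;
dfe = 9;                           * error degrees of freedom;
k   = 3;                           * number of treatment levels;
se_mean = sqrt(mse/n);             * standard error of one LS-mean;
se_diff = sqrt(2*mse/n);           * standard error of a difference of two means;
t = (4.5000 - 2.2325)/se_diff;     * t statistic for high minus low;
q = sqrt(2)*abs(t);                * convert t to a studentized range statistic;
p_tukey = 1 - probmc('RANGE', q, ., dfe, k);  * upper-tail probability;
put se_mean= t= p_tukey=;
run;
```

The value of se_mean should match the 0.37190762 printed for each level, and p_tukey should be close to the 0.0050 shown for the comparison of groups 1 and 2 if the assumptions above hold.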


E.

Dependent Variable: y

Contrast                                       DF    Contrast SS    Mean Square   F Value   Pr > F

Average of Low & Med vs. Average of High        1     0.01353750     0.01353750      0.02   0.8792


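The contrast output shows the label that was specified and the corresponding F-test. Note, however, that SAS applies contrast coefficients to the class levels in sorted order, which here is h, l, m (the same order as the LSMEAN numbers in section D). The coefficients 1 1 -2 therefore compare the average of high and low with medium, not the average of low and medium with high as the label says. For the comparison actually tested (p = 0.8792), we fail to reject the null hypothesis that mean insulin release at the medium concentration equals the average of the means at the low and high concentrations. To test the hypothesis named in the label, the coefficients would need to be -2 1 1.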
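
Because PROC GLM assigns contrast coefficients to class levels in sorted order (h, l, m here, as the LSMEAN numbers in section D show), the order of the coefficients matters. The sketch below is one way to request both comparisons with labels that match their coefficients; the labels are illustrative, and the data are the same as in the sample program.

```sas
* Sketch: contrast coefficients written in the sorted level order h, l, m;
data one;
input c$ y @@;
cards;
l 1.59 l 1.73 l 3.64 l 1.97
m 3.36 m 4.01 m 3.49 m 2.89
h 3.92 h 4.82 h 3.87 h 5.39
;

proc glm data=one;
class c;
model y=c;
* high gets -2, low and medium get 1 each;
contrast 'Avg of Low & Med vs. High' c -2 1 1;
* medium gets -2, high and low get 1 each;
contrast 'Avg of High & Low vs. Med' c 1 1 -2;
run;
quit;
```

The second contrast uses the same coefficients as the sample program, so it should reproduce the contrast sum of squares of 0.01353750 shown in section E.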




ASSIGNMENT


The following data describe test scores of students in classes taught by five different instructors.


Instr 1   Instr 2   Instr 3   Instr 4   Instr 5
   11.6       8.5      14.5      12.3      13.9
   10.0       9.7      13.3      12.9      16.1
   10.5       6.7      14.5      11.4      14.3
   10.6       7.5      14.8      12.4      13.7
   10.7       6.7      14.4      11.6      14.9

1. Input and print the data.
2. Test the hypothesis that the mean scores of all five classes are equal. Remember to output the residuals and test them for normality. Report your conclusion, along with the F-value, p-value, and degrees of freedom.
3. If the means are not equal, use pairwise comparisons to determine which means are different. Try both means and lsmeans, along with several different methods of testing. Report the appropriate conclusions along with support from your SAS output.
4. Construct a contrast to test the hypothesis that the average of Classes 3 & 4 equals the average of Classes 1 & 5 and report your conclusions.