Lab #11: Post-ANOVA Techniques
Due: Next Week (November 11, 2003)


Recall from last lab the statements required to perform an analysis of variance.

proc glm data = _____;
  class group;
  model response = group;
run;

If we wanted to test whether the residuals were normally distributed, we added the following statement to proc glm (after the model statement) to output the predicted values and residuals to a new data set, which we then analyzed using proc plot and proc univariate.

output out=two p=yhat r=e;
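
As a rough sketch of where this statement sits and how the new data set is then used (the data set name one is a placeholder assumed here, since the lab leaves it blank):

proc glm data = one;                 /* "one" is an assumed data set name */
  class group;
  model response = group;
  output out=two p=yhat r=e;         /* yhat = predicted values, e = residuals */
run;

proc univariate data = two normal;   /* the normal option requests tests of normality */
  var e;
run;

proc plot data = two;
  plot e*yhat;                       /* residuals (vertical) against predicted values */
run;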

This gave us the data required to test two assumptions.  First, we tested whether or not the residuals were normally distributed, which was accomplished using the univariate procedure.  We could also check this graphically by using the residual plot.  The residual plot also gives us an idea of whether or not the variances were equal over all levels of the class variable.  The residual plot constructed last week indicated that this might not be the case.  We can also include a statement in the glm procedure that will test this assumption for us.

means group / hovtest=bartlett hovtest=bf hovtest=levene(type=abs) hovtest=obrien welch;

  • This statement can be typed anywhere between the model statement and the run statement.
  • Levene's test is one of the most commonly used tests for homogeneous variances.
  • O'Brien's and Brown & Forsythe's tests are modifications of Levene's test.
  • Bartlett's test for equality of variances can be inaccurate if the data are not normally distributed.
  • Welch's test is a weighted ANOVA that is meant to address potential problems of non-homogeneous variances.  Its null hypothesis is that the group means are equal, not that the variances are equal.
  • For the HOV tests, the null hypothesis states that the variances are homogeneous, so you reject this null hypothesis (and conclude that the variances differ) when the p-value is less than 0.05.
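
One possible way to lay these tests out, shown only as a sketch, is to give each test its own means statement so that each one produces a separate block of output.  The data set name one is assumed for illustration:

proc glm data = one;                        /* "one" is an assumed data set name */
  class group;
  model response = group;
  means group / hovtest=levene(type=abs);   /* Levene's test on absolute deviations */
  means group / hovtest=bf;                 /* Brown & Forsythe's test */
  means group / hovtest=obrien;             /* O'Brien's test */
  means group / hovtest=bartlett welch;     /* Bartlett's test, plus Welch's ANOVA */
run;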

Another analysis we may be interested in is comparing the individual means for each level of the class variable.  When you perform an ANOVA you will often find that the class variable is significant (p-value less than 0.05), indicating that the mean of at least one group differs from the others.  This is important, but we usually also want to know which means differ from which, and we may want to recommend a specific product (e.g. the one with the highest or lowest mean, provided it is statistically different from the others).  To do this you can use a means statement in proc glm.  Note that the means statement is not always appropriate: when analyzing multiple factors (i.e. multiple class variables) and unbalanced data we will use the lsmeans statement, which adjusts the means for the other effects in the model.  Once again, the means statement can be typed anywhere between the model statement and the run statement:

means group / lsd tukey bon scheffe duncan;

  • These different types of post-ANOVA comparisons differ in how conservative they are (more conservative, lower probability of making a Type I error) and how powerful they are (more powerful, higher probability of detecting a difference between two means).
  • These comparisons assign grouping letters to each mean.  Two means are significantly different if they do not share any grouping letter.
  • Alternatively, these tests often report a minimum significant difference; two means that differ by more than this quantity are significantly different.
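
As a hedged illustration, a single adjustment can also be requested on its own, and the lsmeans counterpart mentioned above might look like the following sketch.  The cldiff, pdiff, and adjust= options are not covered in this lab, and the data set name one is assumed:

proc glm data = one;                    /* "one" is an assumed data set name */
  class group;
  model response = group;
  means group / tukey cldiff;           /* Tukey comparisons shown as confidence intervals for each difference */
  lsmeans group / pdiff adjust=tukey;   /* least-squares means with Tukey-adjusted pairwise p-values */
run;

With cldiff, two means are significantly different when the confidence interval for their difference does not contain zero.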

This statement provides all pairwise comparisons, but we are often interested in comparing linear combinations of more than two means.  To do this we use a contrast statement.  For example, suppose the variable group has four levels.  By default SAS orders the levels in ascending sorted order, and we must give one coefficient for each level in that order (i.e. if the levels were A B C D, the coefficient for A comes first, followed by B, etc.).  Suppose we wanted to test whether the mean of group one plus the mean of group two equals twice the mean of group four (equivalently, whether the average of groups one and two equals the mean of group four).  Our statement would be as follows:

contrast 'Group 1 & 2 vs. Group 4' group 1 1 0 -2;

We can also get an estimate for the linear combination being tested as follows:

estimate 'Group 1 & 2 vs. Group 4' group 1 1 0 -2;

  • The information between the quotation marks ( ' ) is simply a label for your output.
  • After this, you specify the name of the variable for which you want to make a contrast.  In this case, we are contrasting levels of group, so this is the variable name you enter here.
  • Following group are the coefficients.  Remember you should have one coefficient for each level of the class variable (in this case there were four levels).  The sum of all coefficients must equal 0.
  • The null hypothesis states that the weighted sum of the means with positive coefficients is equal to the weighted sum of the means with negative coefficients, where the weights are the absolute values of the coefficients.  In this case, the null is:  1*Mean(A) + 1*Mean(B) = 2*Mean(D)
  • If the p-value is less than 0.05, you reject the null hypothesis, which suggests that the relationship outlined in the linear combination is not true.
  • Typically you only test contrasts that are suggested by the context of the problem itself and will not choose contrasts at random.
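
The estimate statement above reports 1*Mean(A) + 1*Mean(B) - 2*Mean(D).  If the average of the first two groups minus the mean of the fourth is easier to interpret, one option (a sketch, not something this lab requires) is the divisor= option of the estimate statement, which divides every coefficient by the given value.  The label is made up for illustration:

estimate 'Avg of Group 1 & 2 vs. Group 4' group 1 1 0 -2 / divisor=2;

With divisor=2 the estimated quantity becomes 0.5*Mean(A) + 0.5*Mean(B) - 1*Mean(D).  The test of whether it equals zero is unchanged.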

Assignment

Scientists were interested in examining the effect of glucose on the release of insulin.  Twelve identical pancreas specimens were selected and randomly assigned to one of three glucose treatments: Low, Med, and Hi.  The amount of insulin released by the tissue was recorded.  The data appear below:

Glucose Conc.    Amount of Insulin Released
Low                    1.59
Low                    1.73
Low                    3.64
Low                    1.97
Med                    3.36
Med                    4.01
Med                    3.49
Med                    2.89
Hi                     3.92
Hi                     4.82
Hi                     3.87
Hi                     5.39
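
The table above could be read in with a data step along the following lines.  This is only a sketch: the data set name insulin and the variable name amount are assumptions made here, not names required by the lab.

data insulin;               /* "insulin" is an assumed data set name */
  input glucose $ amount;   /* the $ marks glucose as a character variable */
  datalines;
Low 1.59
Low 1.73
Low 3.64
Low 1.97
Med 3.36
Med 4.01
Med 3.49
Med 2.89
Hi 3.92
Hi 4.82
Hi 3.87
Hi 5.39
;
run;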

SAS Program
 

  1. Input the data.  Remember that glucose concentration is a character variable.
  2. Use proc glm to determine whether the amount of insulin released was the same for all glucose concentrations. Include a statement to output your residuals into a new data set.
  3. If the amount of insulin released was different, include a statement to conduct all pairwise comparisons (you need to include all adjustments covered).

means Glucose / lsd tukey bon scheffe duncan;

  4. In the glm procedure, include a statement to test whether or not the variances are homogeneous (include all types of HOV tests covered).

means Glucose / hovtest=bartlett hovtest=bf hovtest=levene(type=abs) hovtest=obrien welch;

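  5. In the glm procedure, include a statement to test the following hypothesis:  2*MEAN(Hi) = 1*MEAN(Low) + 1*MEAN(Med).  By default SAS sorts the levels of glucose in ascending order (Hi, Low, Med), so the coefficients must be listed in that order.

contrast 'Glucose Low & Med vs. Glucose Hi' glucose -2 1 1;

estimate 'Glucose Low & Med vs. Glucose Hi' glucose -2 1 1;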

  6. Use proc plot to construct a residual plot.
  7. Use proc univariate to determine whether or not the residuals are normal.

Questions
 

  1. Is the amount of insulin released different for the different glucose concentrations?  Include the p-value.
  2. Compare the amount of insulin released between the Med and Low glucose treatments (Hint: this is a pairwise comparison).  Are they significantly different based on the LSD test?  Are they significantly different based on Tukey's test?
  3. Using the results from the HOV tests and the residual plot, are the variances homogeneous?
  4. Test your linear combination (contrast).  Report your p-value.
  5. Are the residuals normal?  Include information from your output used to make your conclusion.