Two-way ANOVA and Nonparametric Inferences
EXAMPLE: An agricultural scientist is interested in the corn yield when three different fertilizers are available and corn is planted in four different soil types. The questions he is interested in answering are:
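Because we are applying two treatments to our population, we will use two-way ANOVA to analyze this type of problem. We will consider two types of two-way ANOVA: the additive model, in which the two factors do not interact, and a model that allows the two factors to interact.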
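If we use the model
$$X_{ij} = \mu_{ij} + \epsilon_{ij}, \qquad i = 1, \dots, I, \quad j = 1, \dots, J,$$
we have to estimate the $IJ$ means $\mu_{ij}$ and the error variance $\sigma^2$
(a total of $IJ+1$ parameters) using only $IJ$ observations!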
Since we can't estimate all of our parameters, we will change models
(slightly) to
$$X_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij},$$
where $\alpha_i$ is the effect of factor A and $\beta_j$ is the effect
of factor B. Now we only have to estimate $I+J+1$
parameters, which is possible. (Actually, we also assume
$\sum_{i=1}^{I} \alpha_i = 0$ and $\sum_{j=1}^{J} \beta_j = 0$,
which leaves us with only $I+J-1$ parameters to estimate.)
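To make the estimation step concrete, here is a minimal sketch of the usual least-squares estimates for the additive model with one observation per cell: the overall mean is estimated by the grand mean, each factor A effect by a row mean minus the grand mean, and each factor B effect by a column mean minus the grand mean. The function name `additive_fit` and the data table are hypothetical and chosen only for illustration.

```python
def additive_fit(x):
    """Least-squares estimates for x[i][j] = mu + alpha_i + beta_j + error."""
    I, J = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (I * J)        # estimate of mu
    alpha = [sum(row) / J - grand for row in x]         # row mean minus grand mean
    beta = [sum(x[i][j] for i in range(I)) / I - grand  # column mean minus grand mean
            for j in range(J)]
    return grand, alpha, beta


# Hypothetical 3 x 4 table: rows are levels of factor A, columns are levels of factor B.
table = [
    [10.0, 12.0, 11.0, 15.0],
    [14.0, 16.0, 15.0, 19.0],
    [12.0, 14.0, 13.0, 17.0],
]
mu_hat, alpha_hat, beta_hat = additive_fit(table)
print(mu_hat, alpha_hat, beta_hat)
```

By construction the estimated effects for each factor sum to zero, which mirrors the constraint assumed in the model.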
A slightly more general additive model is
$$X_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk}, \qquad k = 1, \dots, K,$$
where $K$ is the number of replications at each combination of factor A
and factor B levels.
NOTE: When $K$ is small, especially when $K=1$, we are forced to use the additive model. There will be more about this in Section 10.2.2.
The ANOVA table for the additive model is given by
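The relevant null hypotheses are
$$H_{0A}: \alpha_1 = \cdots = \alpha_I = 0 \qquad \text{and} \qquad H_{0B}: \beta_1 = \cdots = \beta_J = 0,$$
and are tested by
$F_A = MSA/MSE$ and $F_B = MSB/MSE$ (each factor's mean square divided by the error mean square), respectively. In words, these hypotheses are that factor A has no effect on the mean response and that factor B has no effect on the mean response.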
EXAMPLE: In a study of automobile traffic and air pollution, air samples taken at four different times and at five different locations were analyzed to obtain the amount of particulate matter present in the air. Is there any difference in true average amount of particulate matter present in the air due either to different sampling times or to different locations?
Notice that in this case, both F ratios, $F_A$ and $F_B$,
are significantly greater than one. Thus, there is an effect due both to
time and location.
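As a rough sketch of how the two F ratios arise in the additive model with one observation per cell, the code below computes the sums of squares, degrees of freedom, and F ratios from a table of observations. The data are hypothetical (they are not the air pollution measurements), and the function name `additive_anova` is an assumption made for illustration. Each F ratio would then be compared with the appropriate F critical value, which is not computed here.

```python
def additive_anova(x):
    """Sums of squares and F ratios for the additive two-way model, one observation per cell."""
    I, J = len(x), len(x[0])
    grand = sum(map(sum, x)) / (I * J)
    row_means = [sum(row) / J for row in x]
    col_means = [sum(x[i][j] for i in range(I)) / I for j in range(J)]

    ssa = J * sum((m - grand) ** 2 for m in row_means)      # factor A (rows)
    ssb = I * sum((m - grand) ** 2 for m in col_means)      # factor B (columns)
    sst = sum((v - grand) ** 2 for row in x for v in row)   # total
    sse = sst - ssa - ssb                                   # error

    df_a, df_b, df_e = I - 1, J - 1, (I - 1) * (J - 1)
    mse = sse / df_e
    f_a = (ssa / df_a) / mse
    f_b = (ssb / df_b) / mse
    return {"SSA": ssa, "SSB": ssb, "SSE": sse,
            "df": (df_a, df_b, df_e), "F_A": f_a, "F_B": f_b}


# Hypothetical 3 x 4 table of observations (rows: factor A, columns: factor B).
obs = [
    [10.0, 12.0, 11.0, 16.0],
    [14.0, 17.0, 15.0, 19.0],
    [12.0, 14.0, 14.0, 17.0],
]
result = additive_anova(obs)
print(result)
```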
When the additive model holds, there is no interaction between factors A and B. In other words, the effect of factor A is the same no matter what the level of factor B is. When the additive model doesn't hold, we have to go to a model which allows A and B to interact.
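We will use the model
$$X_{ijk} = \mu_{ij} + \epsilon_{ijk},$$
where $i = 1, \dots, I$, $j = 1, \dots, J$, and $k = 1, \dots, K$,
but represent it in the form
$$X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk},$$
where $\gamma_{ij}$ is the interaction of factors A and B.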
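The relevant null hypotheses are
$$H_{0AB}: \gamma_{ij} = 0 \text{ for all } i, j, \qquad H_{0A}: \alpha_1 = \cdots = \alpha_I = 0, \qquad H_{0B}: \beta_1 = \cdots = \beta_J = 0$$
(no interaction, no factor A effect, and no factor B effect),
and are tested by their respective F values in the following ANOVA table.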
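For a balanced design with the same number of replications in every cell, the entries of such a table can be sketched as follows. This is an illustration under that balance assumption, not a general implementation; the function name `interaction_anova` and the data are hypothetical.

```python
def interaction_anova(x):
    """x[i][j] is a list of K replicate observations for cell (i, j); balanced design assumed."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / K for j in range(J)] for i in range(I)]   # cell means
    grand = sum(map(sum, cell)) / (I * J)
    a = [sum(cell[i]) / J for i in range(I)]                          # factor A level means
    b = [sum(cell[i][j] for i in range(I)) / I for j in range(J)]     # factor B level means

    ss = {
        "A": J * K * sum((m - grand) ** 2 for m in a),
        "B": I * K * sum((m - grand) ** 2 for m in b),
        "AB": K * sum((cell[i][j] - a[i] - b[j] + grand) ** 2
                      for i in range(I) for j in range(J)),
        "E": sum((v - cell[i][j]) ** 2
                 for i in range(I) for j in range(J) for v in x[i][j]),
    }
    df = {"A": I - 1, "B": J - 1, "AB": (I - 1) * (J - 1), "E": I * J * (K - 1)}
    mse = ss["E"] / df["E"]
    f = {name: (ss[name] / df[name]) / mse for name in ("A", "B", "AB")}
    return ss, df, f


# Hypothetical 2 x 2 design with K = 2 replications per cell.
data = [
    [[4.0, 6.0], [8.0, 10.0]],
    [[6.0, 8.0], [16.0, 18.0]],
]
ss, df, f = interaction_anova(data)
print(ss, df, f)
```

A common practice is to look at the interaction F ratio first, since a strong interaction changes how the main-effect tests should be read.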
If we have small samples, the one- and two-sample t tests and the test for comparing K means are all valid only if we are sampling from normal populations. This week we study methods for comparing the distributions of populations that do not require normality (or any other distributional assumption). There are two basic points to be made:
1. The distribution-free methods are valid for any distribution of
the populations being compared; that is, if we specify a certain
$\alpha$ value, then the true type I error probability is $\alpha$.
2. If the populations being compared do in fact have the normal distribution, then the previous methods (t tests and so on) are in fact better than the distribution-free methods we will study. They are better in the sense that if the populations are different, then the parametric procedures have a better chance of concluding they are different (that is, they are more powerful).
These two points are illustrated in the "Comparing Parametric and Nonparametric Tests" concept lab.
Given n pairs of data, the sign test tests the hypothesis that the median of the differences in the pairs is zero. The test statistic is the number of positive differences. If the null hypothesis is true, then the numbers of positive and negative differences should be approximately the same. In fact, under the null hypothesis the number of positive differences will have a binomial distribution with parameters n and p = 1/2. StataQuest will return the p-value associated with the test statistic.
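The sign test is simple enough to compute by hand. The sketch below counts the positive differences and computes a two-sided p-value from the binomial distribution with p = 1/2. Dropping differences that are exactly zero is a common convention and is an assumption here; the function name `sign_test` and the differences are hypothetical.

```python
from math import comb


def sign_test(diffs):
    """Two-sided sign test for a zero median of paired differences."""
    nonzero = [d for d in diffs if d != 0]   # assumption: zero differences are dropped
    n = len(nonzero)
    positives = sum(1 for d in nonzero if d > 0)
    # Under the null hypothesis, positives ~ Binomial(n, 1/2).
    tail = min(positives, n - positives)
    p_value = min(1.0, 2 * sum(comb(n, k) for k in range(tail + 1)) / 2 ** n)
    return positives, n, p_value


# Hypothetical paired differences.
differences = [1.2, 0.8, -0.3, 2.1, 0.5, 1.7, -0.9, 0.4, 1.1, 0.6]
positives, n, p_value = sign_test(differences)
print(positives, n, p_value)
```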
A similar test of the hypothesis that the median difference in paired data is zero, the Wilcoxon signed-rank test, consists of sorting the absolute values of the differences from smallest to largest, assigning ranks to the absolute values (rank 1 to the smallest, rank 2 to the next smallest, and so on), and then finding the sum of the ranks of the positive differences. If the null hypothesis is true, the sum of the ranks of the positive differences should be about the same as the sum of the ranks of the negative differences. Again, StataQuest will return the p-value of the test.
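The ranking procedure just described can be sketched as follows. The code ranks the absolute differences (using average ranks for ties), sums the ranks of the positive differences, and obtains an exact two-sided p-value by enumerating every assignment of signs, which is only practical for small n. Dropping zero differences and the exact-enumeration approach are assumptions of this sketch; the function names and data are hypothetical.

```python
from itertools import product


def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def signed_rank_test(diffs):
    d = [v for v in diffs if v != 0]            # assumption: zero differences are dropped
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    expected = sum(ranks) / 2                   # mean of w_plus under the null hypothesis
    observed_dev = abs(w_plus - expected)
    # Under the null hypothesis every pattern of signs is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=len(d)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** len(d)


# Hypothetical paired differences.
differences = [1.2, 0.8, -0.3, 2.1, 0.5, 1.7, -0.9, 0.4]
w_plus, p_value = signed_rank_test(differences)
print(w_plus, p_value)
```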
The Wilcoxon-Mann-Whitney rank sum test is used in place of a two-sample
t test when the populations being compared are not normal. It requires
independent random samples of sizes $n_1$ and $n_2$. The test is very
simple and consists of combining the two samples into
one sample of size $n_1 + n_2$, sorting the result, assigning ranks to the
sorted values (giving the average rank to any 'tied' observations), and
then letting T be the sum of the ranks for the observations in the first
sample. If the two populations have the same distribution, then the
average rank in the first sample should be close to the average rank in
the second sample. StataQuest returns a p value for the null hypothesis
that the two distributions are the same.
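As an illustration of the rank sum procedure, the sketch below pools two samples, assigns average ranks, and sums the ranks of the first sample. For small samples an exact two-sided p-value can be found by enumerating every way of choosing which pooled ranks belong to the first sample; that enumeration is an assumption of this sketch and is not necessarily how StataQuest computes its p value. The function names and data are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def rank_sum_test(sample1, sample2):
    combined = list(sample1) + list(sample2)
    ranks = average_ranks(combined)
    n1, n = len(sample1), len(combined)
    t = sum(ranks[:n1])                      # rank sum of the first sample
    expected = n1 * (n + 1) / 2              # mean of t under the null hypothesis
    observed_dev = abs(t - expected)
    extreme = total = 0
    for idx in combinations(range(n), n1):   # every possible first sample of ranks
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed_dev - 1e-12:
            extreme += 1
    return t, extreme / total


# Hypothetical samples; the value 4.5 appears in both, so it receives an average rank.
first = [3.1, 4.5, 2.8, 5.0]
second = [6.2, 5.9, 7.4, 4.5, 6.8]
t, p_value = rank_sum_test(first, second)
print(t, p_value)
```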
The Kruskal-Wallis test is the nonparametric version of one-way ANOVA and
is a straightforward generalization of the Wilcoxon test for two
independent samples. If we have K independent samples of sizes
$n_1, \dots, n_K$, with $N = n_1 + \cdots + n_K$, we combine all the samples
into one large sample, sort the result from
smallest to largest and assign ranks (again assigning the average rank
to any observation in a group of tied observations), and then find
$\bar{R}_i$, the average of the ranks of the observations in the ith sample.
The test statistic is then
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{K} n_i \left( \bar{R}_i - \frac{N+1}{2} \right)^2,$$
and we reject the null hypothesis that all K distributions are
the same if $H \ge \chi^2_{\alpha, K-1}$.
Again, StataQuest will return the p value for the test.
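A minimal sketch of the Kruskal-Wallis computation follows, using the usual form of the statistic (written H here) with no correction for ties. The function names and the three samples are hypothetical. The resulting value is compared with an upper chi-square critical value with K-1 degrees of freedom; for three samples at the 5% level that critical value is 5.991.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def kruskal_wallis_h(samples):
    """Kruskal-Wallis statistic without a tie correction."""
    combined = [v for sample in samples for v in sample]
    ranks = average_ranks(combined)
    n_total = len(combined)
    weighted, start = 0.0, 0
    for sample in samples:
        n_i = len(sample)
        mean_rank = sum(ranks[start:start + n_i]) / n_i   # average rank in this sample
        weighted += n_i * (mean_rank - (n_total + 1) / 2) ** 2
        start += n_i
    return 12 / (n_total * (n_total + 1)) * weighted


# Hypothetical samples from K = 3 populations.
groups = [
    [1.1, 2.3, 1.8, 2.0],
    [3.5, 4.1, 3.9, 2.9],
    [5.2, 4.8, 6.0, 5.5],
]
h = kruskal_wallis_h(groups)
reject = h >= 5.991   # chi-square critical value, 2 degrees of freedom, alpha = 0.05
print(h, reject)
```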
Applicable StataQuest Commands:
Statistics > ANOVA > Two-way (this also includes interaction plots)
Statistics > Nonparametric test > Sign test (for Wilcoxon signed-rank test)
Statistics > Nonparametric test > Mann-Whitney (for Wilcoxon-Mann-Whitney rank sum test)
Statistics > Nonparametric test > Kruskal-Wallis (for Kruskal-Wallis test)
The webmaster and author of this Math Help site is Graeme McRae.