Here's what you'll find in this section:
There are two basic questions asked by inferential statistics:
A random sample of size n from a population is a set of n elements from the population that are chosen in such a way that every set of n elements has the same probability of being chosen.
Computers are very good at selecting random samples, and we will use StataQuest throughout the course to choose samples.
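To make the definition concrete, here is a minimal sketch in Python (not StataQuest) of drawing a random sample of size n. The population of ID numbers 1 through 100 and the seed are made up for illustration.

```python
import random

# Hypothetical population: ID numbers 1..100 (made up for illustration).
population = list(range(1, 101))
n = 10

rng = random.Random(42)  # fixed seed so the draw can be repeated

# random.sample draws n distinct elements without replacement, so that
# every set of n elements has the same probability of being chosen.
sample = rng.sample(population, n)
print(sample)
```

Changing the seed gives a different sample, but each possible set of 10 IDs remains equally likely.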
A statistic is a number calculated from a sample. Examples include the sample mean $\bar{x}$, the sample variance $s^2$, and so on.
Much of what we do in this course consists of
In particular, we consider the following situations:
One remarkable fact about the normal distribution is that if we took many samples of size n from a population having mean $\mu$ and variance $\sigma^2$ (any distribution we want), then the population of $\bar{x}$'s would be approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$. The larger n is, the better the approximation is.
These facts are known collectively as the Central Limit Theorem and allow us to make inferences about population means using the normal distribution, no matter what the distribution of the population being sampled. See the ``Central Limit Theorem'' concept lab for more about this.
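The Central Limit Theorem can be checked by simulation. The sketch below is in Python rather than StataQuest, and the skewed exponential population, the sample size, and the number of samples are all assumptions chosen for illustration. It draws many samples and compares the mean and variance of the sample means with the population mean and with the population variance divided by n.

```python
import random
import statistics

rng = random.Random(0)          # fixed seed so the run is repeatable

# Assumed population for illustration: exponential with rate 1,
# which is strongly skewed and has mean 1 and variance 1.
mu = 1.0
sigma2 = 1.0
n = 30                          # size of each sample
num_samples = 5000              # how many samples we draw

sample_means = []
for _ in range(num_samples):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.variance(sample_means)

print(f"mean of the sample means:     {mean_of_means:.4f}  (theory: {mu})")
print(f"variance of the sample means: {var_of_means:.4f}  (theory: {sigma2 / n:.4f})")
```

A histogram of `sample_means` should look roughly bell-shaped even though the population itself is far from normal.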
A particularly useful example of the Central Limit Theorem is when we are sampling from a 0-1 population. In this case, the number of 1's observed has the binomial distribution, which is difficult to make calculations from. But notice that $\bar{x}$ for the sample is in fact the sample proportion p, and the Central Limit Theorem says that p is approximately normal with mean equal to the mean of the 0-1 population (also known as $\pi$, the proportion of 1's in the population) and variance $\pi(1-\pi)/n$. See the ``Z, t, Chi-square, F'' concept lab.
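The 0-1 case can be simulated the same way. In this Python sketch the population proportion of 1's (0.3), the sample size, and the number of samples are assumed values, not taken from the course. The mean of each 0-1 sample is its sample proportion, and the variance of those proportions is compared with (population proportion)(1 - population proportion)/n.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable

pop_prop = 0.3           # assumed proportion of 1's in the 0-1 population
n = 50                   # size of each sample
num_samples = 5000       # how many samples we draw

def sample_proportion():
    """Draw one sample of n zeros and ones and return its proportion of 1's."""
    draws = [1 if rng.random() < pop_prop else 0 for _ in range(n)]
    return sum(draws) / n   # the mean of a 0-1 sample is the proportion of 1's

props = [sample_proportion() for _ in range(num_samples)]

mean_p = statistics.fmean(props)
var_p = statistics.variance(props)
theory_var = pop_prop * (1 - pop_prop) / n

print(f"mean of sample proportions:     {mean_p:.4f}  (theory: {pop_prop})")
print(f"variance of sample proportions: {var_p:.5f}  (theory: {theory_var:.5f})")
```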
The basic idea of statistical inference is that we can determine (using what are called sampling distributions) the likely values of a number that measures how far a statistic is from the corresponding parameter. For example, we can measure how far the statistic $\bar{x}$ is from the parameter $\mu$ by calculating the number (called a ``transformed statistic'')
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
and noting that if $\bar{x}$ is close to $\mu$, then Z should be close to 0. Similarly, we can measure how close $s^2$ is to $\sigma^2$ by calculating
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
which should be close to n-1 if $s^2$ is close to $\sigma^2$ (we will see in a minute why we use the symbols Z and $\chi^2$ to represent the numbers).
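As a small illustration, the Python sketch below computes both transformed statistics for one simulated sample. It assumes the usual forms, Z = (sample mean - population mean) / (sigma / sqrt(n)) and chi-square = (n-1) * (sample variance) / sigma^2. The population values (mean 100, standard deviation 15) and the sample size are assumptions chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(2)            # fixed seed so the run is repeatable

# Assumed population parameters and sample size (for illustration only).
mu, sigma, n = 100.0, 15.0, 25

sample = [rng.gauss(mu, sigma) for _ in range(n)]

xbar = statistics.fmean(sample)   # sample mean
s2 = statistics.variance(sample)  # sample variance (divides by n - 1)

# Transformed statistics: z is near 0 when xbar is near mu,
# and chi2 is near n - 1 when s2 is near sigma squared.
z = (xbar - mu) / (sigma / math.sqrt(n))
chi2 = (n - 1) * s2 / sigma ** 2

print(f"xbar = {xbar:.2f}, Z = {z:.3f}")
print(f"s2 = {s2:.2f}, chi-square = {chi2:.2f}  (n - 1 = {n - 1})")
```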
In the table below, we write down a number of transformed statistics and what they should be close to. You may wonder why we use these transformations rather than some simple measure of distance such as $\bar{x} - \mu$. The answer is that statisticians have learned over the past 100 years that the more complicated transformations listed in the table allow them to find the desired likely values, while simple distance measures are much more difficult to work with.

So what good are these transformed statistics? As we said, we know what they should be close to if our statistic is close to the true parameter. The miracle is that (if certain assumptions are met) statisticians have mathematically determined intervals of the real line into which a transformed statistic will fall with a specified probability.
For example, the first transformed statistic is labeled Z because statisticians have shown that if the population is normally distributed, then the transformed statistic has the Z distribution (the standard normal curve). Thus if we repeatedly selected random samples of size n and calculated Z for each one, then we know that 95% of the samples will have a Z between -1.96 and 1.96. (You will use the ``Sampling Distribution'' concept lab to experiment with this idea.)
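The 95% claim can be tried out by simulation. This Python sketch assumes a normal population with made-up parameters (mean 50, standard deviation 10) and samples of size 16. It computes Z = (sample mean - population mean) / (sigma / sqrt(n)) for each of many samples and counts how often Z lands between -1.96 and 1.96.

```python
import math
import random
import statistics

rng = random.Random(3)            # fixed seed so the run is repeatable

# Assumed normal population and sample size (for illustration only).
mu, sigma, n = 50.0, 10.0, 16
num_samples = 10000

inside = 0
for _ in range(num_samples):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    z = (xbar - mu) / (sigma / math.sqrt(n))
    if -1.96 <= z <= 1.96:
        inside += 1

coverage = inside / num_samples
print(f"fraction of samples with Z in [-1.96, 1.96]: {coverage:.3f}")
```

The printed fraction should be close to 0.95 and should settle nearer to it as `num_samples` grows.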
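Thus, how close is $\bar{x}$ to $\mu$ in this situation? We saw earlier this week that 95% of the area under a Z curve falls between -1.96 and 1.96. This tells us that 95% of all samples will have
$$-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96,$$
that is, 95% of all samples will have $\bar{x}$ within $1.96\,\sigma/\sqrt{n}$ of $\mu$. For example, 95% of all samples of 25 IQ's (remember that IQ's are thought to be normally distributed with $\mu = 100$ and $\sigma = 15$) will have $\bar{x}$ in $100 \pm 1.96(15/\sqrt{25})$, that is, 95% of all samples will have $\bar{x}$ within $1.96 \times 3 = 5.88$ of $\mu$.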
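The arithmetic behind the IQ example can be written out in a few lines of Python. The values 100 for the IQ mean and 15 for the IQ standard deviation are the conventional ones and are assumed here.

```python
import math

# Conventional IQ parameters (assumed) and the sample size from the example.
mu, sigma, n = 100, 15, 25

standard_error = sigma / math.sqrt(n)   # 15 / 5 = 3
margin = 1.96 * standard_error          # half-width of the 95% range

lower = mu - margin
upper = mu + margin

print(f"standard error of the sample mean: {standard_error:.2f}")
print(f"95% of sample means fall between {lower:.2f} and {upper:.2f}")
```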
Applicable StataQuest Commands:
Data > Generate/Replace > Random numbers: to generate random Normals, Binomials, etc.
Data > Generate/Replace > Formula: to generate z-scores
Calculator > Statistical tables > Normal: to find probabilities
Calculator > Inverse statistical tables > Normal: to find z-scores
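For readers without StataQuest, the same kinds of calculations can be sketched with Python's standard library. These are rough counterparts only, not a description of how StataQuest works, and the small data set is made up.

```python
import random
import statistics
from statistics import NormalDist

rng = random.Random(4)                       # fixed seed so the run is repeatable

# Random normals: five draws from a normal with mean 100, sd 15 (assumed values).
random_normals = [rng.gauss(100, 15) for _ in range(5)]

# z-scores for a small made-up data set.
data = [12.0, 15.0, 9.0, 14.0, 10.0]
m = statistics.fmean(data)
s = statistics.stdev(data)
z_scores = [(x - m) / s for x in data]

# Normal table: area under the standard normal curve between -1.96 and 1.96.
std_normal = NormalDist()                    # mean 0, standard deviation 1
area = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

# Inverse normal table: the z-score with 97.5% of the area to its left.
z_975 = std_normal.inv_cdf(0.975)

print(f"area between -1.96 and 1.96: {area:.4f}")
print(f"z-score for cumulative probability 0.975: {z_975:.4f}")
```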
The webmaster and author of this Math Help site is Graeme McRae.