Let \(\hat{\mu}_Y\) be an estimator of \(\mu_Y\). The estimator \(\hat{\mu}_Y\) is said to be unbiased if
\[\mathrm{E}(\hat{\mu}_Y) = \mu_Y\]
where \(\mathrm{E}(\hat{\mu}_Y)\) is the expectation of the sampling distribution of \(\hat{\mu}_Y\).
\(\overline{Y}\) is an unbiased estimator of \(\mu_Y\).
In Lecture 2, we have already shown that \(\mathrm{E}(\overline{Y}) = \mu_Y\) when \(Y_i \sim IID(\mu_Y, \sigma^2_Y)\) for \(i=1, \ldots, n\).
\(Y_1\) is also an unbiased estimator of \(\mu_Y\): \(\mathrm{E}(Y_1) = \mu_Y\) when \(Y_1\) is drawn from \(IID(\mu_Y, \sigma^2_Y)\).
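For intuition, here is a minimal Python simulation (not part of the lecture; the normal population and the values \(\mu_Y = 5\), \(\sigma_Y = 2\) are arbitrary choices) comparing the sampling distributions of the two unbiased estimators:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_Y, sigma_Y, n, reps = 5.0, 2.0, 50, 100_000  # illustrative values

# Draw `reps` independent samples of size n from a normal population
# and compute both estimators on each sample.
samples = rng.normal(mu_Y, sigma_Y, size=(reps, n))
ybar = samples.mean(axis=1)  # sample mean of each sample
y1 = samples[:, 0]           # first observation of each sample

# Both averages are close to mu_Y = 5, illustrating unbiasedness,
# but Ybar varies far less across samples: Var = sigma^2/n vs sigma^2.
print(ybar.mean(), y1.mean())
print(ybar.var(), y1.var())
```

Both estimators average out to \(\mu_Y\), but \(\overline{Y}\) is far less dispersed across samples, since \(\mathrm{var}(\overline{Y}) = \sigma^2_Y/n\).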
\(\hat{\mu}_Y\) is a consistent estimator of \(\mu_Y\) if \(\hat{\mu}_Y\) converges in probability to \(\mu_Y\).
That is, \(\hat{\mu}_Y\) is consistent if \[\hat{\mu}_Y \xrightarrow{\text{ p }} \mu_Y \text{ as } n \rightarrow \infty\]
\(\overline{Y}\) is a consistent estimator of \(\mu_Y\).
The law of large numbers ensures that \(\overline{Y} \xrightarrow{\text{ p }} \mu_Y\) when \(Y_i \sim IID(\mu_Y, \sigma^2_Y)\) for \(i=1, \ldots, n\) and \(\sigma^2_Y < \infty\).
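A quick sketch of the law of large numbers at work, again with made-up parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
mu_Y, sigma_Y = 5.0, 2.0  # illustrative values

# As n grows, the sample mean concentrates around mu_Y (consistency),
# whereas the single observation Y_1 never improves with n.
for n in (10, 100, 10_000, 1_000_000):
    y = rng.normal(mu_Y, sigma_Y, size=n)
    print(n, y.mean())
```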
\(\overline{Y}\) is also the least squares estimator of \(\mu_Y\); that is, it solves \(\min_m \sum^n_{i=1} (Y_i - m)^2\). The first-order condition for the minimization problem is
\[ \frac{d}{dm}\sum^n_{i=1} (Y_i - m)^2 = -2\sum^n_{i=1} (Y_i - m) = 0 \]
whose solution is \(m = \overline{Y}\).
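As a numerical sanity check (the sample below is simulated with hypothetical values), a grid search over candidate values of \(m\) lands on the sample mean:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(5.0, 2.0, size=100)  # an arbitrary illustrative sample

def sse(m):
    """Sum of squared deviations of the sample from a candidate value m."""
    return np.sum((y - m) ** 2)

grid = np.linspace(y.min(), y.max(), 10_001)
m_star = grid[np.argmin([sse(m) for m in grid])]
print(m_star, y.mean())  # the minimizer agrees with Ybar up to grid spacing
```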
The language of hypothesis testing
One thing to keep in mind is that when a hypothesis test favors the null, we usually do not say we "accept the null hypothesis"; instead, we say we "fail to reject the null".
So given that \(\sigma_Y\) is known, the z-statistic is computed as
\[ z = \frac{\overline{Y} - \mu_{Y,0}}{\sigma_{\overline{Y}}} = \frac{\overline{Y} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}} \]
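For concreteness, a small Python sketch with hypothetical numbers (\(\overline{Y} = 5.3\), \(\mu_{Y,0} = 5\), \(n = 100\), and \(\sigma_Y = 2\) assumed known, as the formula requires):

```python
from math import sqrt

# Hypothetical numbers; sigma_Y is treated as known here.
ybar, mu_0, sigma_Y, n = 5.3, 5.0, 2.0, 100

z = (ybar - mu_0) / (sigma_Y / sqrt(n))
print(z)  # 1.5, to be compared against standard normal critical values
```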
The sample variance \(s^2_Y\) is an estimator of the population variance \(\sigma^2_Y\), computed as
\[ s^2_Y = \frac{1}{n-1}\sum^n_{i=1} (Y_i - \overline{Y})^2 \]
The sample variance, \(s^2_Y\), is a consistent estimator of the population variance, that is,
\[ s^2_Y \xrightarrow{\text{ p }} \sigma^2_Y \text{ as } n \rightarrow \infty \]
The standard error of \(\overline{Y}\), denoted as \(SE(\overline{Y})\) or \(\hat{\sigma}_{\overline{Y}}\), is an estimator of the standard deviation of \(\overline{Y}\), \(\sigma_{\overline{Y}}=\sigma_Y/\sqrt{n}\), with \(s_Y\) replacing \(\sigma_Y\).
\[ SE(\overline{Y}) = \hat{\sigma}_{\overline{Y}} = \frac{s_Y}{\sqrt{n}} \]
\[ t = \frac{\overline{Y} - \mu_{Y,0}}{SE(\overline{Y})} = \frac{\overline{Y} - \mu_{Y,0}}{s_Y/\sqrt{n}} \]
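A minimal sketch of the same computation in Python, using a simulated sample (all values illustrative); note that `ddof=1` gives the \(1/(n-1)\) sample variance defined above:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(5.3, 2.0, size=100)  # simulated sample, illustrative values
mu_0 = 5.0                          # hypothesized null value

s_Y = y.std(ddof=1)              # sample standard deviation (1/(n-1) form)
se_ybar = s_Y / np.sqrt(len(y))  # SE(Ybar) = s_Y / sqrt(n)
t = (y.mean() - mu_0) / se_ybar
print(t)
```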
Figure 1: An illustration of a two-sided test
The power of the test is the probability that the test correctly rejects the null when the alternative is true. That is,
\[\text{power} = 1 - \mathrm{Pr}(\text{type II error})\]
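Under the large-sample normal approximation, the power of a two-sided 5% test can be computed directly. A sketch with hypothetical values for the null and the alternative:

```python
from math import sqrt
from scipy.stats import norm

# Illustrative values: null mu_0 and a specific alternative mu_1.
mu_0, mu_1, sigma_Y, n = 5.0, 5.5, 2.0, 100
delta = (mu_1 - mu_0) / (sigma_Y / sqrt(n))  # mean shift of the t-statistic

# The two-sided 5% test rejects when |t| > 1.96; under the alternative
# the statistic is approximately N(delta, 1), so the power is:
power = norm.cdf(delta - 1.96) + norm.cdf(-delta - 1.96)
print(power)  # ~0.71 for these numbers
```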
The p-value provides more information than the significance level.
In fact, the p-value is also called the marginal significance level, which is the smallest significance level at which you can reject the null hypothesis.
Mathematically, the p-value for the two-sided test is computed as
\[ p\text{-value} = \mathrm{Pr}_{H_0}\left( |t| > |t^{act}| \right) = 2\Phi(-|t^{act}|) \]
where \(t^{act}\) is the value of the t-statistic computed from the observed sample and \(\Phi\) is the standard normal c.d.f.
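In code, with `scipy.stats.norm.cdf` playing the role of \(\Phi\) (the value of \(t^{act}\) below is made up):

```python
from scipy.stats import norm

t_act = 1.5  # illustrative value of the computed t-statistic
p_value = 2 * norm.cdf(-abs(t_act))
print(p_value)  # ~0.134: the smallest level at which H_0 can be rejected
```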
Step 3: we plug in the definition of \(t\) and solve \(|t| \leq 1.96\) for \(\mu_{Y,0}\), which gives
\[ \overline{Y} - 1.96\,SE(\overline{Y}) \;\leq\; \mu_{Y,0} \;\leq\; \overline{Y} + 1.96\,SE(\overline{Y}) \]
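The same 95% interval computed in Python on a simulated sample (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(5.3, 2.0, size=100)  # simulated sample, illustrative values

se = y.std(ddof=1) / np.sqrt(len(y))
ci = (y.mean() - 1.96 * se, y.mean() + 1.96 * se)  # 95% CI for mu_Y
print(ci)
```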
Let \(Y_{m, i}\) for \(i=1, \ldots, n_m\) be \(n_m\) i.i.d. samples from the population of earnings of male college graduates, i.e.,
\[ Y_{m,i} \sim IID(\mu_m, \sigma^2_m) \text{ for } i=1,\ldots,n_m \]
Let \(Y_{w, j}\) for \(j=1, \ldots, n_w\) be \(n_w\) i.i.d. samples from the population of earnings of female college graduates, i.e.,
\[ Y_{w,j} \sim IID(\mu_w, \sigma^2_w) \text{ for } j=1,\ldots,n_w \]
The hypothesis to be tested is whether the mean earnings for the male and female graduates differ by a certain amount, that is,
\[ H_0: \mu_m - \mu_w = d_0,\; \text{ vs. }\: H_1: \mu_m - \mu_w \neq d_0 \]
When \(\sigma^2_m\) and \(\sigma^2_w\) are unknown, we use the t-statistic \[ t = \frac{(\overline{Y}_m - \overline{Y}_w) - d_0}{SE(\overline{Y}_m - \overline{Y}_w)} \xrightarrow{d} N(0, 1) \] where \[ SE(\overline{Y}_m - \overline{Y}_w) = \sqrt{\frac{s^2_m}{n_m} + \frac{s^2_w}{n_w}} \]
Calculate the p-value: the p-value for the two-sided test is calculated as
\[ p\text{-value} = 2\Phi(-|t|) \]
The 95% confidence interval for \(d = \mu_m - \mu_w\) is
\[ (\overline{Y}_m - \overline{Y}_w) \pm 1.96SE(\overline{Y}_m - \overline{Y}_w) \]
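Putting the two-sample steps together in one Python sketch; the earnings distributions below are entirely made up for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
# Made-up earnings samples; all parameters are purely illustrative.
y_m = rng.normal(44.0, 9.0, size=1200)
y_w = rng.normal(41.0, 8.0, size=1000)
d_0 = 0.0  # null hypothesis: no gap in mean earnings

diff = y_m.mean() - y_w.mean()
se_diff = np.sqrt(y_m.var(ddof=1) / len(y_m) + y_w.var(ddof=1) / len(y_w))

t = (diff - d_0) / se_diff
p_value = 2 * norm.cdf(-abs(t))                        # two-sided p-value
ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)    # 95% CI for d
print(t, p_value, ci)
```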
Figure 2: The scatterplot between test scores and student-teacher ratios
We should emphasize that the correlation coefficient is a measure of linear association between \(X\) and \(Y\).
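A simulation makes the point: below, \(Y\) depends perfectly on \(X\), yet nonlinearly, so the correlation coefficient is near zero (the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(-1, 1, size=10_000)
y = x ** 2  # perfect nonlinear dependence of y on x

# The correlation coefficient is near zero: it only picks up *linear*
# association, even though y is fully determined by x here.
print(np.corrcoef(x, y)[0, 1])
```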
Figure 3: Scatterplots for four hypothetical data sets