For a two-sided test of \(\beta_1\), the null and alternative hypotheses are
\[ H_0: \beta_1 = \beta_{1,0} \text{ vs. } H_1: \beta_1 \neq \beta_{1,0} \]
The general form of the t-statistic is
\[ t = \frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of the estimator}} \]
The t-statistic for testing \(\beta_1\) is
\[ t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)} \]
where \(SE(\hat{\beta}_1)\) is the standard error of \(\hat{\beta}_1\).
The p-value is the probability of observing a value of \(\hat{\beta}_1\) at least as different from \(\beta_{1,0}\) as the estimate actually computed (\(\hat{\beta}^{act}_1\)), assuming that the null hypothesis is correct.
Figure 1: Calculating the p-value of a two-sided test when \(t^{act}=-4.38\)
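As a quick illustration, the two-sided p-value for the t-statistic in Figure 1 can be computed in R with the large-sample normal approximation (a minimal sketch; \(t^{act} = -4.38\) is taken from the figure caption):

```r
# Two-sided p-value: Pr(|Z| > |t_act|) under the standard normal approximation
t_act <- -4.38
p_value <- 2 * pnorm(-abs(t_act))
p_value   # about 1.2e-05
```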
For a one-sided test, we can set up the null hypothesis and the one-sided alternative hypothesis as
\[ H_0: \beta_1 = \beta_{1,0} \text{ vs. } H_1: \beta_1 < \beta_{1,0} \]
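For this left-sided alternative, only the lower tail contributes to the p-value; a minimal R sketch (reusing \(t^{act} = -4.38\) purely for illustration):

```r
# One-sided (left-tailed) p-value: Pr(Z < t_act) under H0
t_act <- -4.38
p_value_left <- pnorm(t_act)
p_value_left   # about 5.9e-06, half of the two-sided p-value here
```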
The 95% confidence interval for the change in \(Y\) when \(X\) changes by \(\Delta X\) is
\[ \hat{\beta}_1 \Delta X \pm 1.96\, SE(\hat{\beta}_1) \times \Delta X \]
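A hedged R sketch of this calculation from a fitted model; model1, testscr, str, and classdata are the names used in the example code later in this section, and delta_x is a hypothetical change in the regressor:

```r
# Fit the regression and pull out the estimate and its standard error
model1 <- lm(testscr ~ str, data = classdata)   # assumes classdata is already loaded
est       <- summary(model1)$coefficients
beta1_hat <- est["str", "Estimate"]
se_beta1  <- est["str", "Std. Error"]           # homoskedasticity-only SE from summary()

# 95% confidence interval for beta_1, then for the effect of a change delta_x in X
ci_beta1  <- beta1_hat + c(-1.96, 1.96) * se_beta1
delta_x   <- 2                                  # hypothetical change in X
ci_effect <- ci_beta1 * delta_x
```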
A binary variable, also called a dummy variable, a categorical variable, or an indicator variable, takes on the value one if some condition is true and zero otherwise.
The null vs. alternative hypotheses
\[ H_0:\, \beta_1 = 0 \text{ vs. } H_1:\, \beta_1 \neq 0 \]
The t-statistic
\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} \]
The 95% confidence interval
\[ \hat{\beta}_1 \pm 1.96 SE(\hat{\beta}_1) \]
We use a binary variable \(D\) to represent small and large classes, with \(D = 1\) for small classes and \(D = 0\) for large classes.
The regression of test scores on \(D\) can then be estimated by OLS.
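A hedged sketch of this estimation in R, assuming the classdata data frame used elsewhere in this section; the cutoff of 20 students per teacher used to define a "small" class is purely illustrative. The coefficient on D, its standard error, and the t-statistic for the hypotheses above appear directly in the summary output.

```r
# Define the dummy: D = 1 for small classes, D = 0 for large classes (illustrative cutoff)
classdata$D <- as.numeric(classdata$str < 20)

# Regress test scores on the binary regressor
model_d <- lm(testscr ~ D, data = classdata)
summary(model_d)    # intercept: mean score for large classes; slope: difference in means
confint(model_d)    # confidence intervals, including the one for the coefficient on D
```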
Figure 2: Homoskedasticity
Figure 3: Heteroskedasticity
Recall that we can write \(\hat{\beta}_1\) as
\[ \hat{\beta}_1 = \beta_1 + \frac{\sum_i (X_i - \bar{X}) u_i}{\sum_i (X_i - \bar{X})^2} \]
If \(u_i\) for \(i=1, \ldots, n\) are homoskedastic and \(\sigma^2_u\) is known, then
\[ \mathrm{Var}(\hat{\beta}_1 \mid X_1, \ldots, X_n) = \frac{\sigma^2_u}{\sum_i (X_i - \bar{X})^2} \]
The homoskedasticity-only estimator of the variance of \(\hat{\beta}_1\) is
\[ \tilde{\sigma}^2_{\hat{\beta}_1} = \frac{s^2_u}{\sum_i (X_i - \bar{X})^2}, \quad \text{where } s^2_u = \frac{1}{n-2}\sum_i \hat{u}_i^2 \]
The heteroskedasticity-robust standard error is
\[ SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}^2_{\hat{\beta}_1}} \]
where
\[ \hat{\sigma}^2_{\hat{\beta}_1} = \frac{1}{n} \times \frac{\frac{1}{n-2}\sum_i (X_i - \bar{X})^2 \hat{u}_i^2}{\left[\frac{1}{n}\sum_i (X_i - \bar{X})^2\right]^2} \]
which is also referred to as the Eicker-Huber-White standard error.
In R, you can use the following code:

```r
library(lmtest)      # coeftest()
library(sandwich)    # vcovHC() for heteroskedasticity-robust covariance matrices

model1 <- lm(testscr ~ str, data = classdata)
coeftest(model1, vcov = vcovHC(model1, type = "HC1"))
```
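To connect this output with the robust-variance formula above, the HC1 variance of \(\hat{\beta}_1\) can also be computed by hand; a sketch assuming the same model1 and classdata as above:

```r
x     <- classdata$str
u_hat <- residuals(model1)
n     <- length(u_hat)

# Heteroskedasticity-robust (HC1) variance of beta_1-hat, following the formula above
var_robust <- (1 / n) * ((1 / (n - 2)) * sum((x - mean(x))^2 * u_hat^2)) /
  (mean((x - mean(x))^2))^2

sqrt(var_robust)                                   # robust SE computed by hand
sqrt(vcovHC(model1, type = "HC1")["str", "str"])   # should agree with the line above
```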
Recall the least squares assumptions: for \(i = 1, \ldots, n\),
1. \(E(u_i \mid X_i) = 0\);
2. \((X_i, Y_i)\) are i.i.d. draws from their joint distribution;
3. large outliers are unlikely, i.e., \(X_i\) and \(Y_i\) have nonzero finite fourth moments.
For \(\mathbf{X} = [X_1, \ldots, X_n]\), the Gauss-Markov conditions are
1. \(E(u_i \mid \mathbf{X}) = 0\);
2. \(\mathrm{Var}(u_i \mid \mathbf{X}) = \sigma^2_u\) with \(0 < \sigma^2_u < \infty\) (homoskedasticity);
3. \(E(u_i u_j \mid \mathbf{X}) = 0\) for \(i \neq j\).
The Gauss-Markov Theorem for \(\hat{\beta}_1\):
If the Gauss-Markov conditions hold, then the OLS estimator \(\hat{\beta}_1\) is the Best (most efficient) Linear conditionally Unbiased Estimator (BLUE).
Any linear estimator \(\tilde{\beta}_1\) can be written as
\[ \tilde{\beta}_1 = \sum_{i=1}^n a_i Y_i \]
where the weights \(a_i\) for \(i = 1, \ldots, n\) depend on \(X_1, \ldots, X_n\) but not on \(Y_1, \ldots, Y_n\).
That \(\tilde{\beta}_1\) is conditionally unbiased means that
\[ E(\tilde{\beta}_1 \mid X_1, \ldots, X_n) = \beta_1 \]
By the Gauss-Markov conditions,
\[ E(\tilde{\beta}_1 \mid X_1, \ldots, X_n) = \beta_0 \sum_i a_i + \beta_1 \sum_i a_i X_i \]
so conditional unbiasedness requires \(\sum_i a_i = 0\) and \(\sum_i a_i X_i = 1\).
\(\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = \frac{\sum_i (X_i - \bar{X})Y_i}{\sum_i (X_i - \bar{X})^2} = \sum_i \hat{a}_i Y_i\)
where the weights are \[ \hat{a}_i = \frac{X_i - \bar{X}}{\sum_i (X_i - \bar{X})^2}, \text{ for } i = 1, \ldots, n \]
Since \(\hat{\beta}_1\) is a linear conditionally unbiased estimator, we must have
\[ \sum_i \hat{a}_i = 0 \text{ and } \sum_i \hat{a}_i X_i = 1 \]
which is easily verified (see the sketch below).
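A minimal numerical check with simulated data (all names here are hypothetical):

```r
set.seed(123)
x <- rnorm(50)                                   # simulated regressor
a_hat <- (x - mean(x)) / sum((x - mean(x))^2)    # OLS weights from the formula above

sum(a_hat)       # essentially 0 (up to floating-point error)
sum(a_hat * x)   # essentially 1
```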
Any violation of the Gauss-Markov conditions results in an OLS estimator that is no longer BLUE.
| Violation | Typical causes | Consequences | Remedies |
|---|---|---|---|
| \(E(u \mid X) \neq 0\) | omitted variables, endogeneity | biased estimates | add omitted regressors, IV methods |
| \(\mathrm{Var}(u_i \mid X)\) not constant | heteroskedasticity | inefficient estimates | WLS, GLS, HCCME |
| \(E(u_{i}u_{j} \mid X) \neq 0\) for \(i \neq j\) | autocorrelation | inefficient estimates | GLS, HAC standard errors |
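As an illustration of one remedy from the table, weighted least squares can be run directly with lm() when the form of the heteroskedasticity is assumed known; a sketch that assumes, purely for illustration, that \(\mathrm{Var}(u_i \mid X_i)\) is proportional to str:

```r
# WLS: weight each observation by the inverse of its assumed error variance
model_wls <- lm(testscr ~ str, data = classdata, weights = 1 / str)
summary(model_wls)
```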
The classical assumptions of least squares estimation add normality to the Gauss-Markov conditions: for \(i = 1, 2, \ldots, n\), the errors \(u_i\) are i.i.d. \(N(0, \sigma^2_u)\) and independent of \(X_1, \ldots, X_n\).
The t-statistic: \[t = \frac{\hat{\beta}_1 - \beta_{1,0}}{\hat{\sigma}_{\hat{\beta}_1}}\]
where \(\hat{\sigma}^2_{\hat{\beta}_1} = \frac{s^2_u}{\sum_i (X_i - \bar{X})^2}\) and \(s^2_u = \frac{1}{n-2}\sum_i \hat{u}_i^2 = SER^2\).
When the classical least squares assumptions hold, the t-statistic has the exact distribution of \(t(n-2)\), i.e., the Student's t distribution with \((n-2)\) degrees of freedom.
\[ t = \frac{\hat{\beta}_1 - \beta_{1,0}}{\hat{\sigma}_{\hat{\beta}_1}} \sim t(n-2) \]
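Under these classical assumptions, p-values and critical values therefore come from the \(t(n-2)\) distribution rather than the standard normal; a minimal R sketch with hypothetical numbers:

```r
n      <- 30        # hypothetical sample size
t_stat <- -2.1      # hypothetical value of the t-statistic

p_value  <- 2 * pt(-abs(t_stat), df = n - 2)   # exact two-sided p-value under t(n-2)
crit_val <- qt(0.975, df = n - 2)              # 5% two-sided critical value (about 2.05 here)
```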