Created: 2017-03-30 Thu 18:04
Merriam-Webster gives the following definition of the word "regress":
A very simple functional form of a conditional expectation is a linear function. That is, we can model the conditional mean as follows,
\[ E(Y \mid X = x) = \beta_0 + \beta_1 x \]
The above equation is a simple linear regression function.
Let's introduce a regression analysis with the application of test scores versus class sizes in California school districts.
Can reducing class sizes increase students' test scores?
Compute the expected values of test scores, given the different class sizes.
The effect of class size on test scores is
The population regression function or the population regression line
We can lump all these factors into a single term, and set up a simple linear regression model as follows,
\[TestScore = \beta_0 + \beta_1 ClassSize + OtherFactors\]
Now that we have set up the simple linear regression model, what do \(\beta_1\) and \(\beta_0\) represent in the model?
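Holding the other factors constant, a change in class size of \(\Delta ClassSize\) changes the test score by \(\Delta TestScore = \beta_1 \Delta ClassSize\). Then, we get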
\[ \beta_1 = \frac{\Delta TestScore}{\Delta ClassSize} \]
That is, \(\beta_1\) measures the change in the test score resulting from a one-unit change in the class size.
When \(TestScore\) and \(ClassSize\) are two continuous variables, we can write \(\beta_1\) as
\[\beta_1 = \frac{\mathrm{d} TestScore}{\mathrm{d} ClassSize} \]
Then a simple linear regression model that associates \(Y\) with \(X\) is
\[Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \ldots, n\]
 
Figure 1: The Population Regression Line
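The OLS estimators are the solution to the following minimization problem:
\[\min_{b_0, b_1} S(b_0, b_1) = \sum_{i=1}^n (Y_i - b_0 - b_1 X_i)^2\]
where \(S(b_0, b_1)\) is a function of \(b_0\) and \(b_1\) that measures the sum of squared prediction mistakes over all \(n\) observations.
Evaluated at the optimal solution \((\hat{\beta}_0, \hat{\beta}_1)\), the FOCs are
\[\frac{\partial S}{\partial b_0}(\hat{\beta}_0, \hat{\beta}_1) = -2 \sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0\]
\[\frac{\partial S}{\partial b_1}(\hat{\beta}_0, \hat{\beta}_1) = -2 \sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) X_i = 0\]
From the first condition, we have
\[\hat{\beta}_0 = \overline{Y} - \hat{\beta}_1 \overline{X}\]
From the second condition, we have
\[\sum_{i=1}^n X_i Y_i - \hat{\beta}_0 \sum_{i=1}^n X_i - \hat{\beta}_1 \sum_{i=1}^n X_i^2 = 0\]
Substituting the expression for \(\hat{\beta}_0\) and collecting terms in \(\hat{\beta}_1\), we have
\[\hat{\beta}_1 \left( \sum_{i=1}^n X_i^2 - n \overline{X}^2 \right) = \sum_{i=1}^n X_i Y_i - n \overline{X}\, \overline{Y}\]
In sum, the OLS estimators for \(\beta_0\) and \(\beta_1\) are
\[\hat{\beta}_1 = \frac{\sum_{i=1}^n (X_i - \overline{X})(Y_i - \overline{Y})}{\sum_{i=1}^n (X_i - \overline{X})^2}, \qquad \hat{\beta}_0 = \overline{Y} - \hat{\beta}_1 \overline{X}\]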
\[\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\]
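The closed-form OLS formulas, \(\hat{\beta}_1 = \sum (X_i - \overline{X})(Y_i - \overline{Y}) / \sum (X_i - \overline{X})^2\) and \(\hat{\beta}_0 = \overline{Y} - \hat{\beta}_1 \overline{X}\), translate almost line by line into code. The following is a minimal sketch rather than a reference implementation; the function name `ols_fit` and the five data points are hypothetical and are not taken from the California data set.

```python
def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations over sum of squared deviations of X.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = s_xy / s_xx
    # Intercept: makes the fitted line pass through (x_bar, y_bar).
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Hypothetical toy data (class size, test score), for illustration only.
x = [15.0, 17.0, 19.0, 21.0, 23.0]
y = [670.0, 662.0, 655.0, 651.0, 642.0]

beta0_hat, beta1_hat = ols_fit(x, y)
y_hat = [beta0_hat + beta1_hat * xi for xi in x]  # predicted values
```

For these toy numbers the slope is negative, so the fitted line predicts lower scores for larger classes.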
| | Population | Sample |
|---|---|---|
| Regression functions | \(\beta_{0} + \beta_{1}X_{i}\) | \(\hat{\beta}_0 + \hat{\beta}_1 X_i\) | 
| Parameters | \(\beta_{0}\), \(\beta_{1}\) | \(\hat{\beta}_{0}\), \(\hat{\beta}_{1}\) | 
| Errors vs residuals | \(u_{i}\) | \(\hat{u}_{i}\) | 
| The regression model | \(Y_i = \beta_0 + \beta_1 X_i + u_i\) | \(Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_{i}\) | 
\[TestScore = \beta_0 + \beta_1 ClassSize + OtherFactors\]
The commonly used summary statistics computed here include the mean, standard deviation, median, minimum, maximum, and quantiles (percentiles).
| | Average | Std. dev. | 25% | 50% | 75% |
|---|---|---|---|---|---|
| TestScore | 654.16 | 19.05 | 640.05 | 654.45 | 666.66 | 
| STR | 19.64 | 1.89 | 18.58 | 19.72 | 20.87 | 
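Summary statistics of this kind can be computed with Python's standard `statistics` module. The sketch below is an illustration under stated assumptions: the helper `summarize` and the seven student-teacher ratios are hypothetical, and the choice of the sample standard deviation (dividing by \(n-1\)) and of the "inclusive" quantile method is assumed rather than confirmed by the notes, so the numbers in the table above may follow a different convention.

```python
import statistics


def summarize(values):
    """Mean, sample standard deviation, and quartiles of a list of numbers."""
    # method="inclusive" interpolates linearly between order statistics,
    # treating the data as including the minimum and maximum.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # divides by n - 1
        "25%": q1,
        "50%": q2,
        "75%": q3,
    }


# Hypothetical student-teacher ratios, for illustration only.
str_values = [17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0]
summary = summarize(str_values)
```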
 
\[\widehat{TestScore} = 698.93 - 2.28 \times STR\]
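To see how the estimated line is used, the sketch below plugs numbers into \(\widehat{TestScore} = 698.93 - 2.28 \times STR\). The coefficients come from the equation above; the function names and the example inputs (a ratio of 20 and a reduction of 2 students per teacher) are hypothetical choices made for illustration.

```python
BETA0_HAT = 698.93  # estimated intercept from the fitted line
BETA1_HAT = -2.28   # estimated slope on the student-teacher ratio


def predict_test_score(student_teacher_ratio):
    """Predicted test score at a given student-teacher ratio."""
    return BETA0_HAT + BETA1_HAT * student_teacher_ratio


def predicted_change(delta_str):
    """Predicted change in test score for a change delta_str in the ratio."""
    return BETA1_HAT * delta_str


score_at_20 = predict_test_score(20.0)  # 698.93 - 2.28 * 20
gain_from_cut = predicted_change(-2.0)  # cutting the ratio by 2 students
```

Under this fitted line, a district with 20 students per teacher is predicted to score about 653.33, and cutting the ratio by 2 raises the predicted score by about 4.56 points.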
 
\[\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i = (Y_i - \overline{Y}) - \hat{\beta}_1 (X_i - \overline{X})\]
\[\sum_{i=1}^n \hat{u}_i = \sum_{i=1}^n (Y_i - \overline{Y}) - \hat{\beta}_1 \sum_{i=1}^n (X_i - \overline{X}) = 0\]
Note that \(Y_i = \hat{Y}_i + \hat{u}_i\). So \[\sum_{i=1}^n Y_i = \sum_{i=1}^n \hat{Y}_i + \sum_{i=1}^n \hat{u}_i = \sum_{i=1}^n \hat{Y}_i\] It follows that \(\overline{\hat{Y}} = (1/n)\sum_{i=1}^n \hat{Y}_i = \overline{Y}\).
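These two algebraic properties, \(\sum \hat{u}_i = 0\) and \(\overline{\hat{Y}} = \overline{Y}\), hold for any sample with variation in \(X\), so they are easy to check numerically. The sketch below does so on hypothetical data; `fit_line` is an illustrative helper, and the equalities hold only up to floating-point rounding.

```python
def fit_line(x, y):
    """Closed-form OLS intercept and slope for one regressor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


# Hypothetical data, for illustration only.
x = [14.0, 16.5, 18.0, 19.5, 21.0, 24.0]
y = [672.0, 668.0, 650.0, 659.0, 648.0, 640.0]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

sum_residuals = sum(residuals)           # zero up to rounding error
mean_fitted = sum(fitted) / len(fitted)  # equals the sample mean of y
mean_y = sum(y) / len(y)
```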
\(R^2 = 0\) when \(\hat{\beta}_1 = 0\).
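The boundary case can be illustrated in code. The sketch below assumes the usual definition \(R^2 = ESS/TSS\), with \(ESS = \sum (\hat{Y}_i - \overline{Y})^2\) and \(TSS = \sum (Y_i - \overline{Y})^2\); the function name `r_squared` and both data sets are hypothetical. When \(\hat{\beta}_1 = 0\), every fitted value equals \(\overline{Y}\), so \(ESS = 0\).

```python
def r_squared(x, y):
    """R^2 = ESS / TSS for a simple OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    ess = sum((fi - y_bar) ** 2 for fi in fitted)  # explained sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)       # total sum of squares
    return ess / tss


x = [1.0, 2.0, 3.0, 4.0]
# Symmetric pattern: the sample covariance with x is zero, so the slope is zero.
r2_flat = r_squared(x, [1.0, 2.0, 2.0, 1.0])
# Exact linear relationship: the regression explains all the variation in y.
r2_perfect = r_squared(x, [3.0, 5.0, 7.0, 9.0])
```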
 
Figure 4: An illustration of \(E(u|X=x)=0\)
\[ E(u_i | X_i) = 0 \Rightarrow \mathrm{Cov}(u_i, X_i) = 0 \]
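A simple proof:
\[\mathrm{Cov}(u_i, X_i) = E(u_i X_i) - E(u_i)E(X_i) = E\left[X_i E(u_i|X_i)\right] - E\left[E(u_i|X_i)\right] E(X_i) = 0\]
where the law of iterated expectation is used twice at the second equality.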
It follows that \[\mathrm{Cov}(u_i, X_i) \neq 0 \Rightarrow E(u_i|X_i) \neq 0\]
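A small simulation can make the implication \(E(u_i|X_i) = 0 \Rightarrow \mathrm{Cov}(u_i, X_i) = 0\) concrete. In the sketch below, the error is constructed as \(u = X e\) with \(e\) independent of \(X\), which is an assumption chosen for illustration: the conditional mean of \(u\) is zero even though its conditional variance depends on \(X\). The sample size, seed, and helper `sample_cov` are likewise hypothetical.

```python
import random

random.seed(0)
n = 20000

x = [random.gauss(0.0, 1.0) for _ in range(n)]
# u = X * e with e independent of X, so E(u | X) = X * E(e) = 0,
# although Var(u | X) = X**2 varies with X.
u = [xi * random.gauss(0.0, 1.0) for xi in x]


def sample_cov(a, b):
    """Sample covariance with the n - 1 divisor."""
    m = len(a)
    a_bar, b_bar = sum(a) / m, sum(b) / m
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (m - 1)


cov_ux = sample_cov(u, x)  # close to zero, up to sampling noise
```

The sample covariance is not exactly zero in any finite sample; it only shrinks toward zero as \(n\) grows.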
\[0 < E(X^4_i) < \infty \text{ and } 0 < E(Y_i^4) < \infty\]
 
Figure 5: How an outlier can influence the OLS estimates
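The sensitivity of OLS to a single extreme observation can be reproduced with a few lines of code. This is a sketch on made-up data, not the data behind the figure: ten points lie exactly on the line \(Y = 2X\), and one hypothetical outlier at \((30, 0)\) is then appended.

```python
def ols_slope(x, y):
    """Closed-form OLS slope for one regressor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Ten hypothetical points exactly on the line Y = 2X.
x_clean = [float(i) for i in range(1, 11)]
y_clean = [2.0 * xi for xi in x_clean]
slope_clean = ols_slope(x_clean, y_clean)

# Append one outlier far from the line and far from the other X values.
x_out = x_clean + [30.0]
y_out = y_clean + [0.0]
slope_with_outlier = ols_slope(x_out, y_out)
```

Because OLS squares the residuals, the single distant point dominates the fit: the slope drops from 2 to a negative value.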
\[\hat{\beta}_1 = \frac{\sum_{i=1}^n (X_i - \overline{X})(Y_i - \overline{Y})}{\sum_{i=1}^n (X_i - \overline{X})^2}\]
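The numerator in \(\hat{\beta}_1\) is
\[\sum_{i=1}^n (X_i - \overline{X})(Y_i - \overline{Y}) = \sum_{i=1}^n (X_i - \overline{X})\left[\beta_1 (X_i - \overline{X}) + (u_i - \overline{u})\right] = \beta_1 \sum_{i=1}^n (X_i - \overline{X})^2 + \sum_{i=1}^n (X_i - \overline{X}) u_i\]
since \(Y_i - \overline{Y} = \beta_1 (X_i - \overline{X}) + (u_i - \overline{u})\) and \(\sum_{i=1}^n (X_i - \overline{X}) \overline{u} = 0\).
Then
\[\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^n (X_i - \overline{X}) u_i}{\frac{1}{n}\sum_{i=1}^n (X_i - \overline{X})^2}\]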
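We can prove that \(\hat{\beta}_1\) is asymptotically normally distributed as \[ \hat{\beta}_1 \xrightarrow{ \text{ d }} N\left( \beta_1, \sigma^2_{\hat{\beta}_1}\right) \] where
\[\sigma^2_{\hat{\beta}_1} = \frac{1}{n} \frac{\mathrm{Var}\left((X_i - \mu_X) u_i\right)}{\left[\mathrm{Var}(X_i)\right]^2}\]
and \(\mu_X = E(X_i)\).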
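Similarly, we can show that \[\hat{\beta}_0 \xrightarrow{\text{ d }} N(\beta_0, \sigma^2_{\hat{\beta}_0})\] where
\[\sigma^2_{\hat{\beta}_0} = \frac{1}{n} \frac{\mathrm{Var}(H_i u_i)}{\left[E(H_i^2)\right]^2}, \qquad H_i = 1 - \left(\frac{\mu_X}{E(X_i^2)}\right) X_i\]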
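The large-sample behavior of \(\hat{\beta}_1\) can be explored with a Monte Carlo experiment. The sketch below rests on assumptions that are not in the notes: \(X_i\) and \(u_i\) are drawn as independent standard normals, the true coefficients are \(\beta_0 = 1\) and \(\beta_1 = 2\), and the sample size, replication count, seed, and function names are arbitrary. Under these assumptions the standard large-sample variance \(\frac{1}{n} \mathrm{Var}((X_i - \mu_X) u_i) / [\mathrm{Var}(X_i)]^2\) reduces to \(1/n\).

```python
import random
import statistics


def ols_slope(x, y):
    """Closed-form OLS slope for one regressor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def simulate_slopes(n, reps, beta0, beta1, seed):
    """Draw `reps` samples of size n and return the OLS slope from each."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [beta0 + beta1 * xi + rng.gauss(0.0, 1.0) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


slopes = simulate_slopes(n=100, reps=2000, beta0=1.0, beta1=2.0, seed=42)
mean_slope = statistics.mean(slopes)     # should be close to beta1 = 2
var_slope = statistics.variance(slopes)  # compare with asymptotic_var
asymptotic_var = 1.0 / 100               # 1/n under the assumptions above
```

A histogram of `slopes` should look roughly bell-shaped and centered at 2, with a spread close to the asymptotic value. The match is approximate at \(n = 100\), not exact.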