Figure 1: An illustration of a random variable
Random variables can take on different types of values: a random variable may be discrete or continuous.
The probability mass function. Let \(X\) be a discrete random variable. The probability distribution of \(X\) (or the probability mass function), \(p(x)\), is
\[ p(x) = \mathrm{Pr}(X = x) \]
For example,
\(X\) | 1 | 2 | 3 | Sum |
---|---|---|---|---|
\(\mathrm{P}(x)\) | 0.25 | 0.50 | 0.25 | 1 |
The cumulative probability distribution (or the cumulative distribution function, c.d.f.):
Let \(F(x)\) be the c.d.f. of \(X\). Then \(F(x) = \mathrm{Pr}(X \leq x)\).
\(X\) | 1 | 2 | 3 | Sum |
---|---|---|---|---|
\(\mathrm{P}(x)\) | 0.25 | 0.50 | 0.25 | 1 |
C.d.f. | 0.25 | 0.75 | 1 | -- |
Figure 2: The c.d.f. of a discrete random variable
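As a small illustrative sketch (not part of the original notes), the table above can be reproduced in Python by storing the p.m.f. and accumulating it into the c.d.f.:

```python
# P.m.f. of the example discrete random variable X from the table above.
pmf = {1: 0.25, 2: 0.50, 3: 0.25}

# A valid p.m.f. is non-negative and sums to one.
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# The c.d.f. F(x) = Pr(X <= x) is the running sum of the p.m.f.
cdf = {}
running_total = 0.0
for x in sorted(pmf):
    running_total += pmf[x]
    cdf[x] = running_total

print(cdf)  # {1: 0.25, 2: 0.75, 3: 1.0}
```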
The Bernoulli distribution. A Bernoulli random variable takes on only two values, 0 and 1, with
\[ \mathrm{Pr}(X = 1) = p, \quad \mathrm{Pr}(X = 0) = 1 - p \]
where \(0 \leq p \leq 1\).
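As a quick check (an illustrative sketch assuming numpy is available, with \(p = 0.3\) chosen arbitrarily), simulated Bernoulli draws have sample mean close to \(p\) and sample variance close to \(p(1-p)\):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3  # assumed success probability for this sketch

# A Bernoulli(p) draw is a Binomial(1, p) draw.
draws = rng.binomial(n=1, p=p, size=100_000)

print(draws.mean())  # close to p = 0.3
print(draws.var())   # close to p * (1 - p) = 0.21
```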
Figure 3: The p.d.f. and c.d.f. of a continuous random variable (the normal distribution)
Let \(\mathrm{E}(X) = \mu_X\). Then the variance of \(X\),
\[ \mathrm{Var}(X) = \sigma_X^2 = \mathrm{E}\left[(X - \mu_X)^2\right] \]
Let \(Y = a + bX\). Then
\[ \mathrm{E}(Y) = \mu_Y = a + b\mu_X, \qquad \mathrm{Var}(Y) = \sigma_Y^2 = b^2\sigma_X^2 \]
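A minimal simulation sketch of these two results, reusing the three-point distribution from the earlier table (the constants \(a = 2\) and \(b = 3\) are just example values):

```python
import numpy as np

rng = np.random.default_rng(1)
values = np.array([1.0, 2.0, 3.0])
probs = np.array([0.25, 0.50, 0.25])

# Exact moments from the p.m.f.
mu_X = np.sum(values * probs)                 # E(X) = 2.0
var_X = np.sum((values - mu_X) ** 2 * probs)  # Var(X) = 0.5

# Linear transformation Y = a + b X.
a, b = 2.0, 3.0
mu_Y = a + b * mu_X      # E(Y) = a + b * E(X)
var_Y = b ** 2 * var_X   # Var(Y) = b^2 * Var(X)

# Monte Carlo check.
X = rng.choice(values, size=200_000, p=probs)
Y = a + b * X
print(mu_Y, Y.mean())    # both about 8.0
print(var_Y, Y.var())    # both about 4.5
```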
The skewness of a distribution provides a mathematical way to describe how much a distribution deviates from symmetry.
\[ \text{Skewness} = \mathrm{E}(X - \mu_X)^{3}/\sigma_{X}^{3} \]
The kurtosis of the distribution of a random variable \(X\) measures how much of the variance of \(X\) arises from extreme values; a distribution with large kurtosis has "heavy" tails.
\[ \text{Kurtosis} = \mathrm{E}(X - \mu_X)^{4}/\sigma_{X}^{4} \]
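For illustration (a sketch, not from the notes), these two moments can be estimated from simulated samples; for a normal distribution the skewness is about 0 and the kurtosis about 3, while a right-skewed exponential sample gives clearly larger values:

```python
import numpy as np

def skewness(x):
    """Sample estimate of E[(X - mu)^3] / sigma^3."""
    x = np.asarray(x)
    mu, sigma = x.mean(), x.std()
    return np.mean((x - mu) ** 3) / sigma ** 3

def kurtosis(x):
    """Sample estimate of E[(X - mu)^4] / sigma^4."""
    x = np.asarray(x)
    mu, sigma = x.mean(), x.std()
    return np.mean((x - mu) ** 4) / sigma ** 4

rng = np.random.default_rng(2)
normal_draws = rng.standard_normal(200_000)
skewed_draws = rng.exponential(size=200_000)  # right-skewed distribution

print(skewness(normal_draws), kurtosis(normal_draws))  # about 0 and 3
print(skewness(skewed_draws), kurtosis(skewed_draws))  # about 2 and 9
```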
The joint probability distribution of two discrete random variables \(X\) and \(Y\) is \(\mathrm{Pr}(X = x, Y = y)\). For example, consider the joint distribution of weather conditions and commuting times:

| | Rain (\(X=0\)) | No rain (\(X=1\)) | Total |
|---|---|---|---|
| Long commute (\(Y=0\)) | 0.15 | 0.07 | 0.22 |
| Short commute (\(Y=1\)) | 0.15 | 0.63 | 0.78 |
| Total | 0.30 | 0.70 | 1 |
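The following sketch stores the joint distribution above as an array (the row/column layout is chosen here for illustration) and recovers the marginal distributions of \(X\) and \(Y\) by summing over the appropriate axis:

```python
import numpy as np

# Rows: Y = 0 (long commute), Y = 1 (short commute)
# Columns: X = 0 (rain), X = 1 (no rain)
joint = np.array([[0.15, 0.07],
                  [0.15, 0.63]])

marginal_X = joint.sum(axis=0)  # Pr(X = x): [0.30, 0.70]
marginal_Y = joint.sum(axis=1)  # Pr(Y = y): [0.22, 0.78]

print(marginal_X, marginal_Y)
print(joint.sum())              # probabilities sum to 1.0
```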
For any two events \(A\) and \(B\), the conditional probability of \(A\) given \(B\) is defined as
\[ \mathrm{Pr}(A \mid B) = \frac{\mathrm{Pr}(A \cap B)}{\mathrm{Pr}(B)} \]
Figure 5: An illustration of conditional probability
For discrete random variables, the conditional mean of \(Y\) given \(X=x\) is
\[ \mathrm{E}(Y \mid X = x) = \sum_{i=1}^{k} y_i \, \mathrm{Pr}(Y = y_i \mid X = x) \]
For continuous random variables, it is computed as
\[ \mathrm{E}(Y \mid X = x) = \int y \, f_{Y \mid X}(y \mid x) \, \mathrm{d}y \]
where \(f_{Y \mid X}(y \mid x)\) is the conditional density of \(Y\) given \(X = x\).
The law of iterated expectation:
\[ \mathrm{E}(Y) = \mathrm{E} \left[ \mathrm{E}(Y|X) \right] \]
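Using the rain/commute table and the same array layout as above, a short sketch of the conditional mean and of the law of iterated expectations:

```python
import numpy as np

joint = np.array([[0.15, 0.07],   # Y = 0 row
                  [0.15, 0.63]])  # Y = 1 row
y_values = np.array([0.0, 1.0])

marginal_X = joint.sum(axis=0)        # Pr(X = x)
cond_Y_given_X = joint / marginal_X   # Pr(Y = y | X = x), column by column
E_Y_given_X = (y_values[:, None] * cond_Y_given_X).sum(axis=0)

# Law of iterated expectations: E(Y) = E[E(Y|X)]
E_Y_direct = (y_values * joint.sum(axis=1)).sum()
E_Y_iterated = (E_Y_given_X * marginal_X).sum()

print(E_Y_given_X)               # [0.5, 0.9]
print(E_Y_direct, E_Y_iterated)  # both 0.78
```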
The covariance of two discrete random variables \(X\) and \(Y\) is
\[ \mathrm{Cov}(X, Y) = \sigma_{XY} = \mathrm{E}\left[(X - \mu_X)(Y - \mu_Y)\right] = \sum_{i=1}^{k}\sum_{j=1}^{l} (x_j - \mu_X)(y_i - \mu_Y)\,\mathrm{Pr}(X = x_j, Y = y_i) \]
The correlation coefficient of \(X\) and \(Y\) is
\[ \mathrm{corr}(X, Y) = \rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\left[\mathrm{Var}(X)\mathrm{Var}(Y)\right]^{1/2}} = \frac{\sigma_{XY}}{\sigma_{X}\sigma_{Y}} \]
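Continuing the rain/commute example, a sketch that computes \(\sigma_{XY}\) and \(\rho_{XY}\) directly from the joint table:

```python
import numpy as np

joint = np.array([[0.15, 0.07],
                  [0.15, 0.63]])
x_values = np.array([0.0, 1.0])  # columns: rain, no rain
y_values = np.array([0.0, 1.0])  # rows: long commute, short commute

p_x = joint.sum(axis=0)
p_y = joint.sum(axis=1)
mu_X = (x_values * p_x).sum()    # 0.70
mu_Y = (y_values * p_y).sum()    # 0.78

# Cov(X, Y) = sum over all cells of (x - mu_X)(y - mu_Y) Pr(X = x, Y = y)
cov = sum(joint[i, j] * (x_values[j] - mu_X) * (y_values[i] - mu_Y)
          for i in range(2) for j in range(2))
var_X = ((x_values - mu_X) ** 2 * p_x).sum()
var_Y = ((y_values - mu_Y) ** 2 * p_y).sum()
corr = cov / np.sqrt(var_X * var_Y)

print(cov, corr)  # 0.084 and about 0.44
```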
If \(X\) and \(Y\) are independent, then \(\mathrm{Pr}(Y = y \mid X = x) = \mathrm{Pr}(Y = y)\), and therefore
\[ \mathrm{E}(XY) = \mathrm{E}(X)\mathrm{E}(Y) \]
It follows that \(\mathrm{Cov}(X,Y) = \mathrm{E}(XY) - \mathrm{E}(X) \mathrm{E}(Y) = 0\) and \(\mathrm{corr}(X, Y)=0\).
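A quick simulation sketch of this fact (assuming numpy): for independently drawn samples the sample correlation is close to zero, though not exactly zero in any finite sample:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=500_000)
Y = rng.exponential(size=500_000)  # generated independently of X

print(np.corrcoef(X, Y)[0, 1])     # close to 0
```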
The following properties of \(\mathrm{E}(\cdot)\), \(\mathrm{Var}(\cdot)\) and \(\mathrm{Cov}(\cdot)\) are useful in calculations:
\[ \mathrm{E}(a + bX + cY) = a + b\mu_X + c\mu_Y \]
\[ \mathrm{Var}(a + bY) = b^2\sigma_Y^2 \]
\[ \mathrm{Var}(aX + bY) = a^2\sigma_X^2 + 2ab\sigma_{XY} + b^2\sigma_Y^2 \]
\[ \mathrm{E}(XY) = \sigma_{XY} + \mu_X\mu_Y \]
\[ \mathrm{Cov}(a + bX + cV, Y) = b\sigma_{XY} + c\sigma_{VY} \]
\[ \left|\mathrm{corr}(X, Y)\right| \leq 1 \]
Figure 6: The normal probability density
Generally, for any two numbers \(c_1 < c_2\), letting \(d_1 = (c_1 - \mu)/\sigma\) and \(d_2 = (c_2 - \mu)/\sigma\), we have for \(X \sim N(\mu, \sigma^2)\)
\[ \mathrm{Pr}(c_1 \leq X \leq c_2) = \Phi(d_2) - \Phi(d_1) \]
where \(\Phi(\cdot)\) is the c.d.f. of the standard normal distribution.
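For a concrete example (the parameter values \(\mu = 1\), \(\sigma = 2\), \(c_1 = 0\), \(c_2 = 3\) are assumed here, and scipy is assumed to be available), the probability can be computed from the standard normal c.d.f. \(\Phi(\cdot)\):

```python
from scipy.stats import norm

mu, sigma = 1.0, 2.0  # assumed parameters for the example
c1, c2 = 0.0, 3.0

d1 = (c1 - mu) / sigma  # -0.5
d2 = (c2 - mu) / sigma  #  1.0

prob = norm.cdf(d2) - norm.cdf(d1)  # Phi(d2) - Phi(d1)
print(prob)                         # about 0.5328

# The same probability, computed without standardizing by hand.
print(norm.cdf(c2, loc=mu, scale=sigma) - norm.cdf(c1, loc=mu, scale=sigma))
```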
Figure 7: The probability density function of chi-squared distributions
Figure 8: The probability density function of Student's t distributions
Figure 9: The probability density function of F distributions
Figure 10: An illustration of the law of large numbers
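To illustrate the law of large numbers numerically (a sketch; the Bernoulli parameter \(p = 0.78\) is just an example value), the sample mean of i.i.d. draws gets closer to the population mean as the sample size grows:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 0.78  # population mean of a Bernoulli(p) random variable

for n in (10, 100, 10_000, 1_000_000):
    sample = rng.binomial(1, p, size=n)
    print(n, sample.mean())  # sample mean approaches 0.78 as n grows
```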
Let \(F_1, F_2, \ldots, F_n\) be a sequence of cumulative distribution functions corresponding to a sequence of random variables, \(S_1, S_2, \ldots, S_n\). Then the sequence of random variables \(\{S_n\}\) is said to converge in distribution to a random variable \(S\) (denoted as \(S_n \xrightarrow{\text{d}} S\)) if the distribution functions \(\{F_n\}\) converge to \(F\), the distribution function of \(S\). We can write it as
\[ S_n \xrightarrow{\text{d}} S \text{ if and only if } \lim_{n \rightarrow \infty}F_n(x)=F(x) \]
where the limit holds at all points \(x\) at which \(F\) is continuous.
The CLT states that if \(Y_1, Y_2, \ldots, Y_n\) are i.i.d. random samples from a probability distribution with finite mean \(\mu_Y\) and finite variance \(\sigma^2_Y\), i.e., \(0 < \sigma^2_Y < \infty\), and \(\overline{Y} = (1/n)\sum_{i=1}^{n}Y_i\), then
\[ \sqrt{n}(\overline{Y}-\mu_Y) \xrightarrow{\text{d}} N(0, \sigma^2_Y) \]
It follows that, since \(\sigma_{\overline{Y}} = \sqrt{\mathrm{Var}(\overline{Y})} = \sigma_Y/\sqrt{n}\),
\[ \frac{\overline{Y} - \mu_Y}{\sigma_{\overline{Y}}} \xrightarrow{\text{d}} N(0, 1) \]
Figure 11: An illustration of the central limit theorem
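As an illustrative simulation of the CLT (a sketch; the exponential distribution and the sample sizes are arbitrary choices), the standardized sample mean of i.i.d. draws from a skewed distribution is approximately standard normal for large \(n\):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 1_000, 10_000
mu, sigma = 1.0, 1.0  # mean and standard deviation of the exponential(1) distribution

# Standardized sample means: sqrt(n) * (Ybar - mu) / sigma, one per replication.
samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma

print(z.mean(), z.std())           # close to 0 and 1
print(np.mean(np.abs(z) <= 1.96))  # close to 0.95, as for N(0, 1)
```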