| Concept | Definition / Formula |
| :--- | :--- |
| Density hist | $\text{Rel freq} / \text{length}$ |
| Independent | $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$ |
| Covariance | $cov(x, y) = s_{xy} = \frac{\sum_{k=1}^n (x_k - \bar{x}) \cdot (y_k - \bar{y})}{n - 1}$ |
| Correlation | $cor(x, y) = \rho_{xy} = \frac{cov(x, y)}{\sqrt{var(x) \cdot var(y)}} = \frac{cov(x, y)}{sd(x) \cdot sd(y)} = \frac{s_{xy}}{s_x \cdot s_y}$ |
| Uncorrelated | $cov(x, y) = 0$ |
| Properties of covariance | $cov(X + b, Z) = cov(X, Z)$<br>$cov(a \cdot X + Y, Z) = a \cdot cov(X, Z) + cov(Y, Z)$<br>$var(\alpha \cdot x \pm \beta \cdot y) = \alpha^2 \cdot var(x) \pm 2 \cdot \alpha \cdot \beta \cdot cov(x, y) + \beta^2 \cdot var(y)$ |
| Properties of correlation | $cor(a \cdot X + b, Y) = \text{sign}(a) \cdot cor(X, Y)$ |
| Conv in probability | $\mathbb{P}(\mid X_n - X \mid < \varepsilon) \to 1$ |
| Properties of expected value | $E(X^2) = Var(X) + E(X)^2$ |
| Conv almost surely | $\mathbb{P}\left(\omega \in \Omega : \displaystyle \lim_{n \to +\infty} X_n(\omega) = X(\omega)\right) = 1$ |
| $X$ not given | Use $X_n - X_m$ in the limit |
| $X$ given | Make a probability table and let $n \to \infty$ for $X_n$, $\mid X_n - X \mid$, $(X_n - X)^2$<br>Calculate $E((X_n - X)^2) \to 0$ for q.m. convergence |
| Conv in MS | $MSE(X_n, X) = \mathbb{E}((X_n - X)^2) \to 0$ |
| LLN | $\frac{g(X_1) + g(X_2) + \dots + g(X_n)}{n} \xrightarrow{\mathbb{P},\ a.s.} \mathbb{E}(g(X_1))$ |
| Sum and avg distr | $S_n \approx \mathcal{N}(n\mu, n\sigma^2) \quad \text{and} \quad \bar{X}_n \approx \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$ |
| Standardized CLT | $\frac{S_n - n \cdot \mu}{\sqrt{n} \cdot \sigma} \approx \mathcal{N}(0, 1) \quad \text{and} \quad \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \approx \mathcal{N}(0, 1)$ |
| Sampling distribution | Distribution of some statistic |
| MSE decomposition | $MSE(\hat{\theta}, \theta) = (\text{Bias}(\hat{\theta}, \theta))^2 + \text{Var}_\theta(\hat{\theta})$ |
| Consistent | $\hat{\theta}_n \xrightarrow{\mathbb{P}} \theta \text{ for any } \theta \in \Theta$ |
| Strongly consistent | $\hat{\theta}_n \xrightarrow{a.s.} \theta \text{ for any } \theta \in \Theta$ |
| Weakly or mean square consistent | $\hat{\theta}_n \xrightarrow{q.m.} \theta$ |
| Fisher information | $I(\theta) = -\mathbb{E}\left(\frac{\partial^2}{\partial\theta^2} \ln f(X\mid\theta)\right) = \mathbb{E}\left[\left(\frac{\partial}{\partial\theta} \ln f(X\mid\theta)\right)^2\right]$ |
| [[Fisher Information for a Bernoulli distribution]] | ![[Fisher Information for a Bernoulli distribution]] |
| Lower bound Cramér-Rao | $Var(\hat{\theta}) \geq \frac{1}{n \cdot I(\theta)}$ |
| Variance of sample mean | $Var(\bar{X}) = \frac{Var(X_1)}{n}$ |
| MoM moments | $E(X^k) = \int x^k p(x)\,dx$<br>Normal model<br>MoM Var $= E[X^2] - \text{mean}^2 = \text{sample variance}_n$ |
| Prove efficient | Unbiased $\rightarrow$ Cramér-Rao exact $\rightarrow$ efficient |
| [[Proof of efficiency of Bernoulli]] | ![[Proof of efficiency of Bernoulli]] |
| Prove [[Minimum Variance Unbiased Estimator]] | Unbiased $\rightarrow$ Cramér-Rao exact $\rightarrow$ efficient $\rightarrow$ MVUE |
| Bernoulli information | $\frac{1}{p(1-p)}$ |
| Normal information with variance known | $\frac{1}{\sigma^2}$ |
| Poisson information | $I(\lambda) = \frac{1}{\lambda}$ (from the log-likelihood $\ell(\lambda) = -\lambda + X \ln \lambda - \ln X!$) |
| MoM Estimator | Equate moments, solve for the unknown, take the first working solution. |
| Find MLE | Get the log-likelihood, differentiate to find the argmax |
| Asymptotically unbiased and $var \to 0$ | Consistent |
| CI mean | $\overline{X} \pm \frac{1}{2\sqrt{n\alpha}}$ |
| CI mean, variance given | $\overline{X} \pm \frac{\sigma}{\sqrt{n\alpha}}$ |
| CI piv mean, variance given | $\overline{X} \pm \frac{z_{1-\alpha/2}\,\sigma}{\sqrt{n}}$ |
| CI piv mean, $X$ is normal | $\bar{X} \pm \frac{t_{1-\alpha/2}^{n-1} S}{\sqrt{n}}$ |
| CI piv var, given mean | $\left( \frac{\sum_{k=1}^n (X_k - \mu)^2}{\chi_{n, 1-\alpha/2}^2} , \frac{\sum_{k=1}^n (X_k - \mu)^2}{\chi_{n, \alpha/2}^2} \right)$ |
| CI piv var | $\left( \frac{\sum_{k=1}^n (X_k - \bar{X})^2}{\chi_{n-1, 1-\alpha/2}^2} , \frac{\sum_{k=1}^n (X_k - \bar{X})^2}{\chi_{n-1, \alpha/2}^2} \right)$ |
| Asymptotic CI mean | $\bar{X} \pm \frac{t_{1-\alpha/2}^{n-1} S}{\sqrt{n}}$ |
| ACI using MLE for $\theta$ | $\hat{\theta}_n^{MLE} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{1}{n \cdot \mathcal{I}(\hat{\theta}_n^{MLE})}}$ |
| z-Test | Mean unknown, variance known: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ |
| t-Test | Mean unknown: $t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$ |
| $\chi^2$-Test | Variance unknown, mean known: $\chi^2(n)$<br>Variance unknown, mean unknown: $\chi^2(n-1)$, with $\bar{X}$ in place of $\mu$<br>$\chi^2 = \sum_{k=1}^n \left( \frac{X_k - \mu}{\sigma_0} \right)^2$ |
| Asymptotic test<br>t-test if mean general | $W = \frac{\hat{\theta}^{MLE} - \theta_0}{1/\sqrt{n \cdot \mathcal{I}(\theta_0)}}$, compare against $z$ |
| Asymptotic Bernoulli | $W = \frac{\bar{X} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$ |
| Asymptotic Poisson | $W = \frac{\bar{X} - \lambda_0}{\sqrt{\frac{\lambda_0}{n}}}$ |
| 2-sample t-test for mean | **Equal sigma, $t(n + m - 2)$**:<br>$t = \frac{(\bar{X} - \bar{Y}) - \mu_0}{S_p \cdot \sqrt{\frac{1}{n} + \frac{1}{m}}}$<br>$S_p^2 = \frac{\sum_{k=1}^n (X_k - \bar{X})^2 + \sum_{k=1}^m (Y_k - \bar{Y})^2}{n + m - 2}$ |
| (Welch test) | **Unequal sigma, $t(\nu)$**:<br>$t = \frac{(\bar{X} - \bar{Y}) - \mu_0}{\sqrt{\frac{S_X^2}{n} + \frac{S_Y^2}{m}}}$<br>$\nu = \left\lfloor \frac{\left( \frac{S_X^2}{n} + \frac{S_Y^2}{m} \right)^2}{\frac{(S_X^2/n)^2}{n-1} + \frac{(S_Y^2/m)^2}{m-1}} \right\rfloor$ |
| Paired t-test (diff of mean) | $t = \frac{\bar{D} - \mu_0}{S_D / \sqrt{n}}$ |
| Two-sample proportions | $Z = \frac{(\bar{X} - \bar{Y}) - p_0}{\sqrt{\frac{\bar{X}(1 - \bar{X})}{n} + \frac{\bar{Y}(1 - \bar{Y})}{m}}}$ |
| GoF: Pearson $\chi^2$ | $\chi^2 = \sum_{k=1}^{m} \frac{(n_k - n \cdot p_k^0)^2}{n \cdot p_k^0} = \sum_{k=1}^{m} \frac{(O_k - E_k)^2}{E_k} \sim \chi^2_{m-1}$ |

| GoF Pears. Chi2 | $A_1$ | $A_2$ | ... | $A_m$ |
| :--- | :--- | :--- | :-- | :--- |
| **Observed Freq., $O_k$** | $n_1$ | $n_2$ | ... | $n_m$ |
| **Expected Freq., $E_k$** | $n \cdot p_1^0$ | $n \cdot p_2^0$ | ... | $n \cdot p_m^0$ |

| **Chi2 test of independence**<br>df $= (k - 1) \cdot (m - 1)$ | $\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, with $E_{ij} = \frac{r_i \cdot c_j}{n}$ |
| :--- | :--- |

| $B \setminus A$ | $a_1$ | $a_2$ | ... | $a_m$ | Row sum |
| :--- | :--- | :--- | :--- | :--- | :--- |
| $b_1$ | $O_{11} \vert E_{11}$ | $O_{12} \vert E_{12}$ | ... | $O_{1m} \vert E_{1m}$ | $r_1$ |
| $b_2$ | $O_{21} \vert E_{21}$ | $O_{22} \vert E_{22}$ | ... | $O_{2m} \vert E_{2m}$ | $r_2$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| $b_k$ | $O_{k1} \vert E_{k1}$ | $O_{k2} \vert E_{k2}$ | ... | $O_{km} \vert E_{km}$ | $r_k$ |
| **Col Sum** | $c_1$ | $c_2$ | ... | $c_m$ | **Total** $= n$ |
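Several of the sample formulas above (covariance, correlation, the pooled two-sample t statistic, and the Pearson GoF statistic) can be sketched in plain Python. This is a minimal illustration under my own naming and toy data, not part of the notes:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def svar(xs):
    # sample variance with n - 1 in the denominator
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cov(xs, ys):
    # s_xy = sum_k (x_k - xbar)(y_k - ybar) / (n - 1)
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def cor(xs, ys):
    # rho_xy = s_xy / (s_x * s_y)
    return cov(xs, ys) / math.sqrt(svar(xs) * svar(ys))

def pooled_t(xs, ys, mu0=0.0):
    # equal-sigma two-sample t statistic, t(n + m - 2)
    n, m = len(xs), len(ys)
    sp2 = ((n - 1) * svar(xs) + (m - 1) * svar(ys)) / (n + m - 2)
    return (mean(xs) - mean(ys) - mu0) / math.sqrt(sp2 * (1 / n + 1 / m))

def pearson_chi2(observed, expected):
    # GoF statistic: sum_k (O_k - E_k)^2 / E_k
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]           # exactly linear in x
print(cor(x, y))                   # correlation of perfectly linear data is 1 (up to rounding)
print(pearson_chi2([8, 12], [10, 10]))  # (8-10)^2/10 + (12-10)^2/10 = 0.8
```

Note the deliberate use of the $n - 1$ denominator throughout, matching the sample covariance definition in the table; for the MoM variance row ($\text{sample variance}_n$) the denominator would be $n$ instead.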