Essentials of Statistics
David Brink

Contents

1 Preface
2 Basic concepts of probability theory
    2.1 Probability space, probability function, sample space, event
    2.2 Conditional probability
    2.3 Independent events
    2.4 The Inclusion-Exclusion Formula
    2.5 Binomial coefficients
    2.6 Multinomial coefficients
3 Random variables
    3.1 Random variables, definition
    3.2 The distribution function
    3.3 Discrete random variables, point probabilities
    3.4 Continuous random variables, density function
    3.5 Continuous random variables, distribution function
    3.6 Independent random variables
    3.7 Random vector, simultaneous density, and distribution function
4 Expected value and variance
    4.1 Expected value of random variables
    4.2 Variance and standard deviation of random variables
    4.3 Example (computation of expected value, variance, and standard deviation)
    4.4 Estimation of expected value μ and standard deviation σ by eye
    4.5 Addition and multiplication formulae for expected value and variance
    4.6 Covariance and correlation coefficient
5 The Law of Large Numbers
    5.1 Chebyshev’s Inequality
    5.2 The Law of Large Numbers
    5.3 The Central Limit Theorem
    5.4 Example (distribution functions converge to Φ)
6 Descriptive statistics
    6.1 Median and quartiles
    6.2 Mean value
    6.3 Empirical variance and empirical standard deviation
    6.4 Empirical covariance and empirical correlation coefficient
7 Statistical hypothesis testing
    7.1 Null hypothesis and alternative hypothesis
    7.2 Significance probability and significance level
    7.3 Errors of type I and II
    7.4 Example
8 The binomial distribution Bin(n, p)
    8.1 Parameters
    8.2 Description
    8.3 Point probabilities
    8.4 Expected value and variance
    8.5 Significance probabilities for tests in the binomial distribution
    8.6 The normal approximation to the binomial distribution
    8.7 Estimators
    8.8 Confidence intervals
9 The Poisson distribution Pois(λ)
    9.1 Parameters
    9.2 Description
    9.3 Point probabilities
    9.4 Expected value and variance
    9.5 Addition formula
    9.6 Significance probabilities for tests in the Poisson distribution
    9.7 Example (significant increase in sale of Skodas)
    9.8 The binomial approximation to the Poisson distribution
    9.9 The normal approximation to the Poisson distribution
    9.10 Example (significant decrease in number of complaints)
    9.11 Estimators
    9.12 Confidence intervals
10 The geometrical distribution Geo(p)
    10.1 Parameters
    10.2 Description
    10.3 Point probabilities and tail probabilities
    10.4 Expected value and variance
11 The hypergeometrical distribution HG(n, r, N)
    11.1 Parameters
    11.2 Description
    11.3 Point probabilities and tail probabilities
    11.4 Expected value and variance
    11.5 The binomial approximation to the hypergeometrical distribution
    11.6 The normal approximation to the hypergeometrical distribution
12 The multinomial distribution Mult(n, p1, ..., pr)
    12.1 Parameters
    12.2 Description
    12.3 Point probabilities
    12.4 Estimators
13 The negative binomial distribution NB(n, p)
    13.1 Parameters
    13.2 Description
    13.3 Point probabilities
    13.4 Expected value and variance
    13.5 Estimators
14 The exponential distribution Exp(λ)
    14.1 Parameters
    14.2 Description
    14.3 Density and distribution function
    14.4 Expected value and variance
15 The normal distribution
    15.1 Parameters
    15.2 Description
    15.3 Density and distribution function
    15.4 The standard normal distribution
    15.5 Properties of Φ
    15.6 Estimation of the expected value μ
    15.7 Estimation of the variance σ²
    15.8 Confidence intervals for the expected value μ
    15.9 Confidence intervals for the variance σ² and the standard deviation σ
    15.10 Addition formula
16 Distributions connected with the normal distribution
    16.1 The χ² distribution
    16.2 Student’s t distribution
    16.3 Fisher’s F distribution
17 Tests in the normal distribution
    17.1 One sample, known variance, H0 : μ = μ0
    17.2 One sample, unknown variance, H0 : μ = μ0 (Student’s t test)
    17.3 One sample, unknown expected value, H0 : σ² = σ0²
    17.4 Example
    17.5 Two samples, known variances, H0 : μ1 = μ2
    17.6 Two samples, unknown variances, H0 : μ1 = μ2 (Fisher-Behrens)
    17.7 Two samples, unknown expected values, H0 : σ1² = σ2²
    17.8 Two samples, unknown common variance, H0 : μ1 = μ2
    17.9 Example (comparison of two expected values)
18 Analysis of variance (ANOVA)
    18.1 Aim and motivation
    18.2 k samples, unknown common variance, H0 : μ1 = · · · = μk
    18.3 Two examples (comparison of mean values from three samples)
19 The chi-squared test (or χ² test)
    19.1 χ² test for equality of distribution
    19.2 The assumption of normal distribution
    19.3 Standardized residuals
    19.4 Example (women with five children)
    19.5 Example (election)
    19.6 Example (deaths in the Prussian cavalry)
20 Contingency tables
    20.1 Definition, method
    20.2 Standardized residuals
    20.3 Example (students’ political orientation)
    20.4 χ² test for 2 × 2 tables
    20.5 Fisher’s exact test for 2 × 2 tables
    20.6 Example (Fisher’s exact test)
21 Distribution-free tests
    21.1 Wilcoxon’s test for one set of observations
    21.2 Example
    21.3 The normal approximation to Wilcoxon’s test for one set of observations
    21.4 Wilcoxon’s test for two sets of observations
    21.5 The normal approximation to Wilcoxon’s test for two sets of observations
22 Linear regression
    22.1 The model
    22.2 Estimation of the parameters β0 and β1
    22.3 The distribution of the estimators
    22.4 Predicted values ŷi and residuals êi
    22.5 Estimation of the variance σ²
    22.6 Confidence intervals for the parameters β0 and β1
    22.7 The determination coefficient R²
    22.8 Predictions and prediction intervals
    22.9 Overview of formulae
    22.10 Example
A Overview of discrete distributions
B Tables
    B.1 How to read the tables
    B.2 The standard normal distribution
    B.3 The χ² distribution (values x with F_χ²(x) = 0.500 etc.)
    B.4 Student’s t distribution (values x with F_Student(x) = 0.600 etc.)
    B.5 Fisher’s F distribution (values x with F_Fisher(x) = 0.90)
    B.6 Fisher’s F distribution (values x with F_Fisher(x) = 0.95)
    B.7 Fisher’s F distribution (values x with F_Fisher(x) = 0.99)
    B.8 Wilcoxon’s test for one set of observations
    B.9 Wilcoxon’s test for two sets of observations, α = 5%
C Explanation of symbols
D Index

Many students find that the obligatory Statistics course comes as a shock. The set textbook is difficult, the curriculum is vast, and secondary-school maths feels infinitely far away. “Statistics” offers friendly instruction on the core areas of these subjects. The focus is overview. And the numerous examples give the reader a “recipe”.