# Mathematical Statistics with Resampling and R

## This book bridges the latest software applications with the benefits of modern resampling techniques

Resampling helps students understand the meaning of sampling distributions, sampling variability, P-values, hypothesis tests, and confidence intervals. This groundbreaking book shows how to apply modern resampling techniques to mathematical statistics. Extensively class-tested to ensure an accessible presentation, Mathematical Statistics with Resampling and R utilizes the powerful and flexible computer language R to underscore the significance and benefits of modern resampling techniques.

The book begins by introducing permutation tests and bootstrap methods, which motivate the classical inference methods that follow. Striking a balance between theory, computing, and applications, the authors explore additional topics such as:

• Exploratory data analysis
• Calculation of sampling distributions
• The Central Limit Theorem
• Monte Carlo sampling
• Maximum likelihood estimation and properties of estimators
• Confidence intervals and hypothesis tests
• Regression
• Bayesian methods

Throughout the book, case studies on diverse subjects such as flight delays, birth weights of babies, and telephone company repair times illustrate the real-world relevance of the material. Key definitions and theorems for important probability distributions are collected at the end of the book, and a related website offers additional material, including data sets, R scripts, and helpful teaching hints.

Mathematical Statistics with Resampling and R is an excellent book for courses on mathematical statistics at the upper-undergraduate and graduate levels. It also serves as a valuable reference for applied statisticians in business, economics, biostatistics, and public health who use resampling methods in their everyday work.

Preface

1 Data and Case Studies

1.1 Case Study: Flight Delays

1.2 Case Study: Birth Weights of Babies

1.3 Case Study: Verizon Repair Times

1.4 Sampling

1.5 Parameters and Statistics

1.6 Case Study: General Social Survey

1.7 Sample Surveys

1.8 Case Study: Beer and Hot Wings

1.9 Case Study: Black Spruce Seedlings

1.10 Studies

1.11 Exercises

2 Exploratory Data Analysis

2.1 Basic Plots

2.2 Numeric Summaries

2.2.1 Center

2.2.3 Shape

2.3 Boxplots

2.4 Quantiles and Normal Quantile Plots

2.5 Empirical Cumulative Distribution Functions

2.6 Scatter Plots

2.7 Skewness and Kurtosis

2.8 Exercises

3 Hypothesis Testing

3.1 Introduction to Hypothesis Testing

3.2 Hypotheses

3.3 Permutation Tests

3.3.1 Implementation Issues

3.3.2 One-Sided and Two-Sided Tests

3.3.3 Other Statistics

3.3.4 Assumptions

3.4 Contingency Tables

3.4.1 Permutation Test for Independence

3.4.2 Chi-Square Reference Distribution

3.5 Chi-Square Test of Independence

3.6 Test of Homogeneity

3.7 Goodness-of-Fit: All Parameters Known

3.8 Goodness-of-Fit: Some Parameters Estimated

3.9 Exercises

4 Sampling Distributions

4.1 Sampling Distributions

4.2 Calculating Sampling Distributions

4.3 The Central Limit Theorem

4.3.1 CLT for Binomial Data

4.3.2 Continuity Correction for Discrete Random Variables

4.3.3 Accuracy of the Central Limit Theorem

4.3.4 CLT for Sampling without Replacement

4.4 Exercises

5 The Bootstrap

5.1 Introduction to the Bootstrap

5.2 The Plug-In Principle

5.2.1 Estimating the Population Distribution

5.2.2 How Useful is the Bootstrap Distribution?

5.3 Bootstrap Percentile Intervals

5.4 Two Sample Bootstrap

5.4.1 The Two Independent Populations Assumption

5.5 Other Statistics

5.6 Bias

5.7 Monte Carlo Sampling: The “Second Bootstrap Principle”

5.8 Accuracy of Bootstrap Distributions

5.8.1 Sample Mean: Large Sample Size

5.8.2 Sample Mean: Small Sample Size

5.8.3 Sample Median

5.9 How Many Bootstrap Samples are Needed?

5.10 Exercises

6 Estimation

6.1 Maximum Likelihood Estimation

6.1.1 Maximum Likelihood for Discrete Distributions

6.1.2 Maximum Likelihood for Continuous Distributions

6.1.3 Maximum Likelihood for Multiple Parameters

6.2 Method of Moments

6.3 Properties of Estimators

6.3.1 Unbiasedness

6.3.2 Efficiency

6.3.3 Mean Square Error

6.3.4 Consistency

6.3.5 Transformation Invariance

6.4 Exercises

7 Classical Inference: Confidence Intervals

7.1 Confidence Intervals for Means

7.1.1 Confidence Intervals for a Mean, σ Known

7.1.2 Confidence Intervals for a Mean, σ Unknown

7.1.3 Confidence Intervals for a Difference in Means

7.2 Confidence Intervals in General

7.2.1 Location and Scale Parameters

7.3 One-Sided Confidence Intervals

7.4 Confidence Intervals for Proportions

7.4.1 The Agresti–Coull Interval for a Proportion

7.4.2 Confidence Interval for the Difference of Proportions

7.5 Bootstrap t Confidence Intervals

7.5.1 Comparing Bootstrap t and Formula t Confidence Intervals

7.6 Exercises

8 Classical Inference: Hypothesis Testing

8.1 Hypothesis Tests for Means and Proportions

8.1.1 One Population

8.1.2 Comparing Two Populations

8.2 Type I and Type II Errors

8.2.1 Type I Errors

8.2.2 Type II Errors and Power

8.3 More on Testing

8.3.1 On Significance

8.3.2 Adjustments for Multiple Testing

8.3.3 P-values Versus Critical Regions

8.4 Likelihood Ratio Tests

8.4.1 Simple Hypotheses and the Neyman–Pearson Lemma

8.4.2 Generalized Likelihood Ratio Tests

8.5 Exercises

9 Regression

9.1 Covariance

9.2 Correlation

9.3 Least-Squares Regression

9.3.1 Regression toward the Mean

9.3.2 Variation

9.3.3 Diagnostics

9.3.4 Multiple Regression

9.4 The Simple Linear Model

9.4.1 Inference for α and β

9.4.2 Inference for the Response

9.5 Resampling Correlation and Regression

9.5.1 Permutation Tests

9.5.2 Bootstrap Case Study: Bushmeat

9.6 Logistic Regression

9.6.1 Inference for Logistic Regression

9.7 Exercises

10 Bayesian Methods

10.1 Bayes’ Theorem

10.2 Binomial Data: Discrete Prior Distributions

10.3 Binomial Data: Continuous Prior Distributions

10.4 Continuous Data

10.5 Sequential Data

10.6 Exercises

11.1 Smoothed Bootstrap

11.1.1 Kernel Density Estimate

11.2 Parametric Bootstrap

11.3 The Delta Method

11.4 Stratified Sampling

11.5 Computational Issues in Bayesian Analysis

11.6 Monte Carlo Integration

11.7 Importance Sampling

11.7.1 Ratio Estimate for Importance Sampling

11.7.2 Importance Sampling in Bayesian Applications

11.8 Exercises

Appendix A Review of Probability

A.1 Basic Probability

A.2 Mean and Variance

A.3 The Mean of a Sample of Random Variables

A.4 The Law of Averages

A.5 The Normal Distribution

A.6 Sums of Normal Random Variables

A.7 Higher Moments and the Moment Generating Function

Appendix B Probability Distributions

B.1 The Bernoulli and Binomial Distributions

B.2 The Multinomial Distribution

B.3 The Geometric Distribution

B.4 The Negative Binomial Distribution

B.5 The Hypergeometric Distribution

B.6 The Poisson Distribution

B.7 The Uniform Distribution

B.8 The Exponential Distribution

B.9 The Gamma Distribution

B.10 The Chi-Square Distribution

B.11 The Student’s t Distribution

B.12 The Beta Distribution

B.13 The F Distribution

B.14 Exercises

Appendix C Distributions Quick Reference

Solutions to Odd-Numbered Exercises

Bibliography

Index
