Statistics: An Introduction Using R, 2nd Edition



"...I know of no better book of its kind..." (Journal of the Royal Statistical Society, Vol 169 (1), January 2006)

A revised and updated edition of this bestselling introductory textbook on statistical analysis using the leading free software package R.

This new edition of a bestselling title offers a concise introduction to a broad array of statistical methods, at a level that is elementary enough to appeal to a wide range of disciplines. Step-by-step instructions help the non-statistician to fully understand the methodology. The book covers the full range of statistical techniques likely to be needed to analyse the data from research projects, including elementary material like t-tests and chi-squared tests, intermediate methods like regression and analysis of variance, and more advanced techniques like generalized linear modelling.

Includes numerous worked examples and exercises within each chapter.

Preface

Chapter 1 Fundamentals

Everything Varies

Significance

Good and Bad Hypotheses

Null Hypotheses

p Values

Interpretation

Model Choice

Statistical Modelling

Maximum Likelihood

Experimental Design

The Principle of Parsimony (Occam’s Razor)

Observation, Theory and Experiment

Controls

Replication: It’s the ns that Justify the Means

How Many Replicates?

Power

Randomization

Strong Inference

Weak Inference

How Long to Go On?

Pseudoreplication

Initial Conditions

Orthogonal Designs and Non-Orthogonal Observational Data

Aliasing

Multiple Comparisons

Summary of Statistical Models in R

Organizing Your Work

Housekeeping within R

References

Further Reading

Chapter 2 Dataframes

Selecting Parts of a Dataframe: Subscripts

Sorting

Summarizing the Content of Dataframes

Summarizing by Explanatory Variables

First Things First: Get to Know Your Data

Relationships

Looking for Interactions between Continuous Variables

Graphics to Help with Multiple Regression

Interactions Involving Categorical Variables

Further Reading

Chapter 3 Central Tendency

Further Reading

Chapter 4 Variance

Degrees of Freedom

Variance

Variance: A Worked Example

Variance and Sample Size

Using Variance

A Measure of Unreliability

Confidence Intervals

Bootstrap

Non-constant Variance: Heteroscedasticity

Further Reading

Chapter 5 Single Samples

Data Summary in the One-Sample Case

The Normal Distribution

Calculations Using z of the Normal Distribution

Plots for Testing Normality of Single Samples

Inference in the One-Sample Case

Bootstrap in Hypothesis Testing with Single Samples

Student’s t Distribution

Higher-Order Moments of a Distribution

Skew

Kurtosis

Reference

Further Reading

Chapter 6 Two Samples

Comparing Two Variances

Comparing Two Means

Student’s t Test

Wilcoxon Rank-Sum Test

Tests on Paired Samples

The Binomial Test

Binomial Tests to Compare Two Proportions

Chi-Squared Contingency Tables

Fisher’s Exact Test

Correlation and Covariance

Correlation and the Variance of Differences between Variables

Scale-Dependent Correlations

Reference

Further Reading

Chapter 7 Regression

Linear Regression

Linear Regression in R

Calculations Involved in Linear Regression

Partitioning Sums of Squares in Regression: SSY = SSR + SSE

Measuring the Degree of Fit, r²

Model Checking

Transformation

Polynomial Regression

Non-Linear Regression

Generalized Additive Models

Influence

Further Reading

Chapter 8 Analysis of Variance

One-Way ANOVA

Shortcut Formulas

Effect Sizes

Plots for Interpreting One-Way ANOVA

Factorial Experiments

Pseudoreplication: Nested Designs and Split Plots

Split-Plot Experiments

Random Effects and Nested Designs

Fixed or Random Effects?

Removing the Pseudoreplication

Analysis of Longitudinal Data

Derived Variable Analysis

Dealing with Pseudoreplication

Variance Components Analysis (VCA)

References

Further Reading

Chapter 9 Analysis of Covariance

Further Reading

Chapter 10 Multiple Regression

The Steps Involved in Model Simplification

Caveats

Order of Deletion

Carrying Out a Multiple Regression

A Trickier Example

Further Reading

Chapter 11 Contrasts

Contrast Coefficients

An Example of Contrasts in R

A Priori Contrasts

Treatment Contrasts

Model Simplification by Stepwise Deletion

Contrast Sums of Squares by Hand

The Three Kinds of Contrasts Compared

Reference

Further Reading

Chapter 12 Other Response Variables

Introduction to Generalized Linear Models

The Error Structure

The Linear Predictor

Fitted Values

A General Measure of Variability

The Link Function

Canonical Link Functions

Akaike’s Information Criterion (AIC) as a Measure of the Fit of a Model

Further Reading

Chapter 13 Count Data

A Regression with Poisson Errors

Analysis of Deviance with Count Data

The Danger of Contingency Tables

Analysis of Covariance with Count Data

Frequency Distributions

Further Reading

Chapter 14 Proportion Data

Analyses of Data on One and Two Proportions

Averages of Proportions

Count Data on Proportions

Odds

Overdispersion and Hypothesis Testing

Applications

Logistic Regression with Binomial Errors

Proportion Data with Categorical Explanatory Variables

Analysis of Covariance with Binomial Data

Further Reading

Chapter 15 Binary Response Variable

Incidence Functions

ANCOVA with a Binary Response Variable

Further Reading

Chapter 16 Death and Failure Data

Survival Analysis with Censoring

Further Reading

Appendix Essentials of the R Language

R as a Calculator

Built-in Functions

Numbers with Exponents

Modulo and Integer Quotients

Assignment

Rounding

Infinity and Things that Are Not a Number (NaN)

Missing Values (NA)

Operators

Creating a Vector

Named Elements within Vectors

Vector Functions

Summary Information from Vectors by Groups

Subscripts and Indices

Working with Vectors and Logical Subscripts

Addresses within Vectors

Trimming Vectors Using Negative Subscripts

Logical Arithmetic

Repeats

Generate Factor Levels

Generating Regular Sequences of Numbers

Matrices

Character Strings

Writing Functions in R

Arithmetic Mean of a Single Sample

Median of a Single Sample

Loops and Repeats

The ifelse Function

Evaluating Functions with apply

Testing for Equality

Testing and Coercing in R

Dates and Times in R

Calculations with Dates and Times

Understanding the Structure of an R Object Using str

Reference

Further Reading

Index


