# Robust Correlation: Theory and Applications

## Books

This book presents material on both the analysis of the classical concepts of correlation and the development of their robust versions. It also discusses the related concepts of correlation matrices, partial correlation, canonical correlation, and rank correlations, together with the corresponding robust and non-robust estimation procedures. Every chapter contains a set of examples with simulated and real-life data.

Key features:

• Makes modern and robust correlation methods readily available and understandable to practitioners, specialists, and consultants working in various fields.
• Focuses on implementation of methodology and application of robust correlation with R.
• Introduces the main approaches in robust statistics, such as Huber’s minimax approach and Hampel’s approach based on influence functions.
• Explores various robust estimates of the correlation coefficient including the minimax variance and bias estimates as well as the most B- and V-robust estimates.
• Contains applications of robust correlation methods to exploratory data analysis, multivariate statistics, statistics of time series, and to real-life data.
• Includes an accompanying website featuring computer code and datasets.
• Features exercises and examples throughout the text using both small and large data sets.

Theoretical and applied statisticians, as well as specialists in multivariate statistics, robust statistics, robust time series analysis, data analysis, and signal processing, will benefit from this book. Practitioners who use correlation-based methods in their work and postgraduate students in statistics will also find it useful.

Preface

Acknowledgements

1 Introduction

1.1 Historical Remarks

1.2 Ontological Remarks

1.2.1 Forms of data representation

1.2.2 Types of data statistics

1.2.3 Principal aims of statistical data analysis

1.2.4 Prior information about data distributions and related approaches to statistical data analysis

References

2 Classical Measures of Correlation

2.1 Preliminaries

2.2 Pearson’s Correlation Coefficient: Definitions and Interpretations

2.2.1 Introductory remarks

2.2.2 Correlation via regression

2.2.3 Correlation via the coefficient of determination

2.2.4 Correlation via the variances of the principal components

2.2.5 Correlation via the cosine of the angle between the variable vectors

2.2.6 Correlation via the ratio of two means

2.2.7 Pearson’s correlation coefficient between random events

2.3 Nonparametric Measures of Correlation

2.3.1 Introductory remarks

2.3.2 The quadrant correlation coefficient

2.3.3 The Spearman rank correlation coefficient

2.3.4 The Kendall 𝜏-rank correlation coefficient

2.3.5 Concluding remark

2.4 Informational Measures of Correlation

2.5 Summary

References

3 Robust Estimation of Location

3.1 Preliminaries

3.2 Huber’s Minimax Approach

3.2.1 Introductory remarks

3.2.2 Minimax variance M-estimates of location

3.2.3 Minimax bias M-estimates of location

3.2.4 L-estimates of location

3.2.5 R-estimates of location

3.2.6 The relations between M-, L- and R-estimates of location

3.2.7 Concluding remarks

3.3 Hampel’s Approach Based on Influence Functions

3.3.1 Introductory remarks

3.3.2 Sensitivity curve

3.3.3 Influence function and its properties

3.3.4 Local measures of robustness

3.3.5 B- and V-robustness

3.3.6 Global measure of robustness: the breakdown point

3.3.7 Redescending M-estimates

3.3.8 Concluding remark

3.4 Robust Estimation of Location: A Sequel

3.4.1 Introductory remarks

3.4.2 Huber’s minimax variance approach in distribution density models of a non-neighborhood nature

3.4.3 Robust estimation of location in distribution models with a bounded variance

3.4.4 On the robustness of robust solutions: stability of least informative distributions

3.4.5 Concluding remark

3.5 Stable Estimation

3.5.1 Introductory remarks

3.5.2 Variance sensitivity

3.5.3 Estimation stability

3.5.4 Robustness of stable estimates

3.5.5 Maximin stable redescending M-estimates

3.5.6 Concluding remarks

3.6 Robustness Versus Gaussianity

3.6.1 Introductory remarks

3.6.2 Derivations of the Gaussian distribution

3.6.3 Properties of the Gaussian distribution

3.6.4 Huber’s minimax approach and Gaussianity

3.6.5 Concluding remarks

3.7 Summary

References

4 Robust Estimation of Scale

4.1 Preliminaries

4.1.1 Introductory remarks

4.1.2 Estimation of scale in data analysis

4.1.3 Measures of scale defined by functionals

4.2 M- and L-Estimates of Scale

4.2.1 M-estimates of scale

4.2.2 L-estimates of scale

4.3 Huber Minimax Variance Estimates of Scale

4.3.1 Introductory remarks

4.3.2 The least informative distribution

4.3.3 Minimax variance M- and L-estimates of scale

4.4 Highly Efficient Robust Estimates of Scale

4.4.1 Introductory remarks

4.4.2 The median of absolute deviations and its properties

4.4.3 The quartile of pair-wise absolute differences Qn estimate and its properties

4.4.4 M-estimate approximations to the Qn estimate: MQ𝛼n, FQ𝛼n, and FQn estimates of scale

4.5 Monte Carlo Experiment

4.5.1 A remark on the Monte Carlo experiment accuracy

4.5.2 Monte Carlo experiment: distribution models

4.5.3 Monte Carlo experiment: estimates of scale

4.5.4 Monte Carlo experiment: characteristics of performance

4.5.5 Monte Carlo experiment: results

4.5.6 Monte Carlo experiment: discussion

4.5.7 Concluding remarks

4.6 Summary

References

5 Robust Estimation of Correlation Coefficients

5.1 Preliminaries

5.2 Main Groups of Robust Estimates of the Correlation Coefficient

5.2.1 Introductory remarks

5.2.2 Direct robust counterparts of Pearson’s correlation coefficient

5.2.3 Robust correlation via nonparametric measures of correlation

5.2.4 Robust correlation via robust regression

5.2.5 Robust correlation via robust principal component variances

5.2.6 Robust correlation via two-stage procedures

5.2.7 Concluding remarks

5.3 Asymptotic Properties of the Classical Estimates of the Correlation Coefficient

5.3.1 Pearson’s sample correlation coefficient

5.3.2 The maximum likelihood estimate of the correlation coefficient at the normal

5.4 Asymptotic Properties of Nonparametric Estimates of Correlation

5.4.1 Introductory remarks

5.4.2 The quadrant correlation coefficient

5.4.3 The Kendall rank correlation coefficient

5.4.4 The Spearman rank correlation coefficient

5.5 Bivariate Independent Component Distributions

5.5.1 Definition and properties

5.5.2 Independent component and Tukey gross-error distribution models

5.6 Robust Estimates of the Correlation Coefficient Based on Principal Component Variances

5.7 Robust Minimax Bias and Variance Estimates of the Correlation Coefficient

5.7.1 Introductory remarks

5.7.2 Minimax property

5.7.3 Concluding remarks

5.8 Robust Correlation via Highly Efficient Robust Estimates of Scale

5.8.1 Introductory remarks

5.8.2 Asymptotic bias and variance of generalized robust estimates of the correlation coefficient

5.8.3 Concluding remarks

5.9 Robust M-Estimates of the Correlation Coefficient in Independent Component Distribution Models

5.9.1 Introductory remarks

5.9.2 The maximum likelihood estimate of the correlation coefficient in independent component distribution models

5.9.3 M-estimates of the correlation coefficient

5.9.4 Asymptotic variance of M-estimators

5.9.5 Minimax variance M-estimates of the correlation coefficient

5.9.6 Concluding remarks

5.10 Monte Carlo Performance Evaluation

5.10.1 Introductory remarks

5.10.2 Monte Carlo experiment set-up

5.10.3 Discussion

5.10.4 Concluding remarks

5.11 Robust Stable Radical M-Estimate of the Correlation Coefficient of the Bivariate Normal Distribution

5.11.1 Introductory remarks

5.11.2 Asymptotic characteristics of the stable radical estimate of the correlation coefficient

5.11.3 Concluding remarks

5.12 Summary

References

6 Classical Measures of Multivariate Correlation

6.1 Preliminaries

6.2 Covariance Matrix and Correlation Matrix

6.3 Sample Mean Vector and Sample Covariance Matrix

6.4 Families of Multivariate Distributions

6.4.1 Construction of multivariate location-scatter models

6.4.2 Multivariate symmetrical distributions

6.4.3 Multivariate normal distribution

6.4.4 Multivariate elliptical distributions

6.4.5 Independent component model

6.4.6 Copula models

6.5 Asymptotic Behavior of Sample Covariance Matrix and Sample Correlation Matrix

6.6 First Uses of Covariance and Correlation Matrices

6.7 Working with the Covariance Matrix–Principal Component Analysis

6.7.1 Principal variables

6.7.2 Interpretation of principal components

6.7.3 Asymptotic behavior of the eigenvectors and eigenvalues

6.8 Working with Correlations–Canonical Correlation Analysis

6.8.1 Canonical variates and canonical correlations

6.8.2 Testing for independence between subvectors

6.9 Conditionally Uncorrelated Components

6.10 Summary

References

7 Robust Estimation of Scatter and Correlation Matrices

7.1 Preliminaries

7.2 Multivariate Location and Scatter Functionals

7.3 Influence Functions and Asymptotics

7.4 M-functionals for Location and Scatter

7.5 Breakdown Point

7.6 Use of Robust Scatter Matrices

7.6.1 Ellipticity assumption

7.6.2 Robust correlation matrices

7.6.3 Principal component analysis

7.6.4 Canonical correlation analysis

7.7 Further Uses of Location and Scatter Functionals

7.8 Summary

References

8 Nonparametric Measures of Multivariate Correlation

8.1 Preliminaries

8.2 Univariate Signs and Ranks

8.3 Marginal Signs and Ranks

8.4 Spatial Signs and Ranks

8.5 Affine Equivariant Signs and Ranks

8.6 Summary

References

9 Applications to Exploratory Data Analysis: Detection of Outliers

9.1 Preliminaries

9.2 State of the Art

9.2.1 Univariate boxplots

9.2.2 Bivariate boxplots

9.3 Problem Setting

9.4 A New Measure of Outlier Detection Performance

9.4.1 Introductory remarks

9.4.2 H-mean: motivation, definition and properties

9.5 Robust Versions of the Tukey Boxplot with Their Application to Detection of Outliers

9.5.1 Data generation and performance measure

9.5.2 Scale and shift contamination

9.5.3 Real-life data results

9.5.4 Concluding remarks

9.6 Robust Bivariate Boxplots and Their Performance Evaluation

9.6.1 Bivariate FQ-boxplot

9.6.2 Bivariate FQ-boxplot performance

9.6.3 Measuring the elliptical deviation from the convex hull

9.7 Summary

References

10 Applications to Time Series Analysis: Robust Spectrum Estimation

10.1 Preliminaries

10.2 Classical Estimation of a Power Spectrum

10.2.1 Introductory remarks

10.2.2 Classical nonparametric estimation of a power spectrum

10.2.3 Parametric estimation of a power spectrum

10.3 Robust Estimation of a Power Spectrum

10.3.1 Introductory remarks

10.3.2 Robust analogs of the discrete Fourier transform

10.3.3 Robust nonparametric estimation

10.3.4 Robust estimation of power spectrum through the Yule–Walker equations

10.3.5 Robust estimation through robust filtering

10.4 Performance Evaluation

10.4.1 Robustness of the median Fourier transform power spectra

10.4.2 Additive outlier contamination model

10.4.3 Disorder contamination model

10.4.4 Concluding remarks

10.5 Summary

References

11 Applications to Signal Processing: Robust Detection

11.1 Preliminaries

11.1.1 Classical approach to detection

11.1.2 Robust minimax approach to hypothesis testing

11.1.3 Asymptotically optimal robust detection of a weak signal

11.2 Robust Minimax Detection Based on a Distance Rule

11.2.1 Introductory remarks

11.2.2 Asymptotic robust minimax detection of a known constant signal with the 𝜌-distance rule

11.2.3 Detection performance in asymptotics and on finite samples

11.2.4 Concluding remarks

11.3 Robust Detection of a Weak Signal with Redescending M-Estimates

11.3.1 Introductory remarks

11.3.2 Detection error sensitivity and stability

11.3.3 Performance evaluation: a comparative study

11.3.4 Concluding remarks

11.4 A Unified Neyman–Pearson Detection of Weak Signals in a Fusion Model with Fading Channels and Non-Gaussian Noises

11.4.1 Introductory remarks

11.4.2 Problem setting: an asymptotic fusion rule

11.4.3 Asymptotic performance analysis

11.4.4 Numerical results

11.4.5 Concluding remarks

11.5 Summary

References

12 Final Remarks

12.1 Points of Growth: Open Problems in Multivariate Statistics

12.2 Points of Growth: Open Problems in Applications

Index
