# Statistical Implications of Turing's Formula

## Features a broad introduction to recent research on Turing’s formula and presents modern applications in statistics, probability, information theory, and other areas of modern data science

Turing's formula is perhaps the only known method for estimating distributional characteristics beyond the range of observed data without making any parametric or semiparametric assumptions. This book presents a clear introduction to Turing's formula and its connections to statistics. It includes topics relevant to a variety of fields, such as information theory; statistics; probability; computer science, including artificial intelligence and machine learning; big data; biology; ecology; and genetics. The author examines many core statistical issues within modern data science from Turing's perspective. Long-standing problems such as entropy and mutual information estimation, diversity index estimation, domains of attraction on general alphabets, and tail probability estimation are treated systematically in light of the most up-to-date understanding of Turing's formula. Featuring numerous exercises and examples throughout, the book summarizes the known properties of Turing's formula and explains how and when it works well; discusses the approach derived from Turing's formula for estimating a variety of quantities, most of which come from information theory but are also important for machine learning and ecological applications; and uses Turing's formula to estimate certain heavy-tailed distributions.
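As a minimal illustration of the idea (a sketch, not code from the book): in its basic form, Turing's formula estimates the total probability of the letters that did not appear in a sample of size n by N1/n, where N1 is the number of letters observed exactly once. The function name `turing_missing_mass` and the toy sample are assumptions made for this example.

```python
from collections import Counter


def turing_missing_mass(sample):
    """Basic Turing's formula: N1 / n.

    N1 is the number of distinct letters seen exactly once in the sample
    and n is the sample size. The ratio estimates the total probability
    of all letters that were not observed at all.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n


# Toy sample (hypothetical data): letters of a word treated as observations.
sample = list("abracadabra")
estimate = turing_missing_mass(sample)
print(f"Estimated unseen probability: {estimate:.4f}")
```

In this toy sample, two of the eleven observations are letters seen only once, so the estimate is 2/11. No assumption about the shape of the underlying distribution is needed to compute it.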

In summary, this book:

• Features a unified and broad presentation of Turing’s formula, including its connections to statistics, probability, information theory, and other areas of modern data science

• Presents the statistical estimation of information-theoretic quantities

• Demonstrates, from Turing's perspective, the estimation of several statistical functions, such as Simpson's indices, Shannon's entropy, general diversity indices, mutual information, and Kullback–Leibler divergence

• Includes numerous exercises and examples throughout with a fundamental perspective on the key results of Turing’s formula
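For orientation, two of the quantities named in the list above can be written down with the naive plug-in approach, which substitutes observed relative frequencies for the unknown probabilities. This is only a baseline sketch, not the estimators the book develops from Turing's perspective. The function names, the use of natural logarithms, and the choice of the sum of squared probabilities as the Simpson's index are assumptions made for this example.

```python
import math
from collections import Counter


def _relative_frequencies(sample):
    """Observed relative frequency of each distinct letter in the sample."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    return [c / n for c in Counter(sample).values()]


def plug_in_entropy(sample):
    """Plug-in estimate of Shannon's entropy, in nats."""
    return -sum(p * math.log(p) for p in _relative_frequencies(sample))


def plug_in_simpson(sample):
    """Plug-in estimate of Simpson's index, taken here as sum of p_k squared."""
    return sum(p * p for p in _relative_frequencies(sample))


# Toy sample (hypothetical data): two letters, equally frequent.
sample = ["a", "a", "b", "b"]
print(plug_in_entropy(sample))   # log(2) for two equally frequent letters
print(plug_in_simpson(sample))   # 0.5
```

Plug-in estimates like these ignore letters that were never observed, which is one reason estimation beyond the range of the data is of interest.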

Statistical Implications of Turing's Formula is an ideal reference for researchers and practitioners who need a review of the many critical statistical issues of modern data science. It is also an appropriate learning resource for biologists, ecologists, and geneticists who work with the concept of diversity and its estimation, and it can be used as a textbook for graduate courses in mathematics, probability, statistics, computer science, artificial intelligence, machine learning, big data, and information theory.

Zhiyi Zhang, PhD, is Professor of Mathematics and Statistics at The University of North Carolina at Charlotte. He is an active consultant in both industry and government on a wide range of statistical issues, and his current research interests include Turing's formula and its statistical implications; probability and statistics on countable alphabets; nonparametric estimation of entropy and mutual information; tail probability and biodiversity indices; and applications involving extracting statistical information from low-frequency data space. He earned his PhD in Statistics from Rutgers University.

Preface

1 Turing’s Formula

1.1 Turing’s Formula

1.2 Univariate Normal Laws

1.3 Multivariate Normal Laws

1.4 Turing’s Formula Augmented

1.5 Goodness-of-Fit by Counting Zeros

1.6 Remarks

1.7 Exercises

2 Estimation of Simpson’s Indices

2.1 Generalized Simpson’s Indices

2.2 Estimation of Simpson’s Indices

2.3 Normal Laws

2.4 Illustrative Examples

2.5 Remarks

2.6 Exercises

3 Estimation of Shannon’s Entropy

3.1 A Brief Overview

3.2 The Plug-In Entropy Estimator

3.2.1 When K Is Finite

3.2.2 When K Is Countably Infinite

3.3 Entropy Estimator in Turing’s Perspective

3.3.1 When K Is Finite

3.3.2 When K Is Countably Infinite

3.4 Appendix

3.4.1 Proof of Lemma 3.2

3.4.2 Proof of Lemma 3.5

3.4.3 Proof of Corollary 3.5

3.4.4 Proof of Lemma 3.14

3.4.5 Proof of Lemma 3.18

3.5 Remarks

3.6 Exercises

4 Estimation of Diversity Indices

4.1 A Unified Perspective on Diversity Indices

4.2 Estimation of Linear Diversity Indices

4.3 Estimation of Rényi’s Entropy

4.4 Remarks

4.5 Exercises

5 Estimation of Information

5.1 Introduction

5.2 Estimation of Mutual Information

5.2.1 The Plug-In Estimator

5.2.2 Estimation in Turing’s Perspective

5.2.3 Estimation of Standardized Mutual Information

5.2.4 An Illustrative Example

5.3 Estimation of Kullback–Leibler Divergence

5.3.1 The Plug-In Estimator

5.3.2 Properties of the Augmented Plug-In Estimator

5.3.3 Estimation in Turing’s Perspective

5.3.4 Symmetrized Kullback–Leibler Divergence

5.4 Tests of Hypotheses

5.5 Appendix

5.5.1 Proof of Theorem 5.12

5.6 Exercises

6 Domains of Attraction on Countable Alphabets

6.1 Introduction

6.2 Domains of Attraction

6.3 Examples and Remarks

6.4 Appendix

6.4.1 Proof of Lemma 6.3

6.4.2 Proof of Theorem 6.2

6.4.3 Proof of Lemma 6.6

6.5 Exercises

7 Estimation of Tail Probability

7.1 Introduction

7.2 Estimation of Pareto Tail

7.3 Statistical Properties of AMLE

7.4 Remarks

7.5 Appendix

7.5.1 Proof of Lemma 7.7

7.5.2 Proof of Lemma 7.9

7.6 Exercises

References

Author Index

Subject Index
