Last year, Wiley was proud to publish Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. In this rich, entertaining primer, former Columbia University professor and Predictive Analytics World founder Eric Siegel reveals the power and perils of prediction:
• What type of mortgage risk Chase Bank predicted before the recession.
• Predicting which people will drop out of school, cancel a subscription, or get divorced before they are even aware of it themselves.
• Why early retirement decreases life expectancy and vegetarians miss fewer flights.
• Five reasons why organizations, including one health insurance company, predict death.
• How US Bank, European wireless carrier Telenor, and Obama’s 2012 campaign calculated the way to most strongly influence each individual.
• How IBM’s Watson computer used predictive modeling to answer questions and beat the human champs on TV’s Jeopardy!
• How companies ascertain untold, private truths — how Target figures out you’re pregnant and Hewlett-Packard deduces you’re about to quit your job.
• How judges and parole boards rely on crime-predicting computers to decide who stays in prison and who goes free.
• What’s predicted by the BBC, Citibank, ConEd, Facebook, Ford, Google, IBM, the IRS, Match.com, MTV, Netflix, Pandora, PayPal, Pfizer, and Wikipedia.
A truly omnipresent science, predictive analytics affects everyone, every day. Although largely unseen, it drives millions of decisions, determining who to call, mail, investigate, incarcerate, set up on a date, or medicate.
Predictive analytics transcends human perception. This book’s final chapter answers the riddle: What often happens to you that cannot be witnessed, and that you can’t even be sure has happened afterward — but that can be predicted in advance?
Statistics Views interviews Professor Siegel about the success of the book and his own career in analytics.
1. Congratulations on the publication of your book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, which reveals the power and perils of prediction. Predictive analytics affects everyone, every day. Although largely unseen, it drives millions of decisions, determining who to call, mail, investigate, incarcerate, set up on a date, or medicate. How did the writing process begin? Who should read the book and why?
My objective was to write an introduction for everyone, yet one that is fully substantive and conceptually complete across the gamut of fascinating topics related to this technology. My goal was to make the book accessible and of interest to any and all readers, regardless of background. This includes non-business readers simply interested in “popular science” or interested in industry trends, i.e., how our world is changing because of what organizations are doing with data about us as consumers, patients, voters, and law enforcement suspects.
On the other hand, I also wanted the book to serve business users and even would-be technical practitioners of the art. So, although not a “how-to,” the book provides the prerequisite conceptual background for those who may go on to learn the hands-on practice. It is also of interest to those who are already established experts in predictive analytics, with new case studies, three chapters on advanced, news-breaking topics, a comprehensive industry overview, and new coverage of the ethical and privacy-related issues that arise when harnessing predictive analytics’ power.
2. The guide is extremely varied, covering everything from mortgage risk, to how companies can ascertain untold truths (such as whether there will be a new addition to the family), to how judges and parole boards rely on crime-predicting computers to decide who stays in prison and who goes free. Was making its appeal so broad one of your original objectives in writing this guide?
Yes, absolutely. It’s just awe-inspiring how broadly applicable predictive analytics is. There are so many kinds of human behaviour that can be predicted – such as those within my book’s title, Click, Buy, Lie, and Die – and so many kinds of mass-scale organizational operations that can be rendered more effective by driving them with these predictions. In this way, predictive analytics is providing a great deal of value across most commercial sectors, as well as within healthcare, government, law enforcement, and political campaigning.
3. What is it about the area of predictive analytics that fascinates you?
Having left academia umpteen years ago to pursue the “real world” commercial deployment of predictive analytics, I have developed a split personality. On the one hand, the clear-cut value and directly established ROI of driving operational decisions with predictive scores is exciting. It is just plain fun to see the theories put into practice and pay off. On the other hand, I’m still deep down a research scientist who thinks that the idea of a computer automatically learning how to predict from data (which is an encoding of experience, i.e., an abundance of both positive and negative learning examples) is pretty much the coolest thing in science there is.
4. Why is this book of particular interest now?
Mostly because of the first of those two reasons the topic is fascinating: the value. With data growing at a wild pace that is only accelerating, and organizations ramping up so quickly on how to leverage predictive analytics, the potential is great and is indeed being realized left and right. There is also a growing awareness of the ethical and privacy concerns around how to safely harness this power, and I devoted Chapter 2 to that topic.
5. Please could you tell us more about your educational background and what it was that brought you to recognise statistics as a discipline in the first place?
I was a full-time faculty member at Columbia University’s computer science department for three years, and before that I spent six years in the same department getting my Ph.D. (during which I also did a fair amount of teaching). During the first couple of years of graduate school there, I realized that my main interest is machine learning, which is essentially the research-and-development term for predictive modeling. Its capabilities and design rely heavily on statistics and probability, as well as other aspects of computer science.
6. You are founder of Predictive Analytics World and Executive Editor of the Predictive Analytics Times, which makes the how and why of predictive analytics understandable and captivating. Can you tell us about some of your favourite pieces that have been written for the Times?
Sure. Here are a few recent ones that are exemplary regarding what is critical in the field, and what is trending.
http://www.predictiveanalyticsworld.com/patimes/why-overfitting-is-more-dangerous-than-just-poor-accuracy-part-i/
http://www.predictiveanalyticsworld.com/patimes/it-is-a-mistake-to-ask-the-wrong-question/
http://www.predictiveanalyticsworld.com/patimes/the-data-behind-data-scientists-top-kaggle-performers/
7. You are also a former Columbia University professor, who used to sing educational songs to his students. What did these entail?
These musical ditties entailed dense, hopefully clever lyrics that not only rhyme but also cover the substance of certain topics in computer science. They also entail cheesy music such as rock ballads, Broadway-style musicals, raps, and patters. I consider myself quite a good amateur musician or a rotten professional musician, depending on your perspective. Although 15 years old, the songs’ teachings still apply, and you can hear some of them here: http://www.cs.columbia.edu/~evs/songs/
8. Which tools and techniques would you recommend an up-and-coming statistician learn? Do you have any specific recommendations for learning tools with Big Data capabilities, a topic that is constantly in the news now as essential to learn?
Beyond my book, we’ve collected some getting-started resources together as The Predictive Analytics Guide (www.pawcon.com/guide). The concepts come first; the choice of tool comes later.
9. Are there people or events that have been influential in your career?
In the commercial world, there are some amazing “hot shot” consultants in predictive analytics, such as John Elder and Dean Abbott, who have become regular speakers at Predictive Analytics World (www.pawcon.com). In the academic world, my thesis advisor at Columbia University, Kathleen McKeown, showed me how to conceptualize research contributions. Also, Carnegie Mellon’s Tom Mitchell, who founded their Machine Learning department and wrote the first conceptually complete, friendly academic textbook on the topic, really showed me how to coherently package together and conceptualize the established core technical methods.