Data Mining for Business Analytics: An interview with the authors of the two exciting new editions on XLMiner and JMP Pro

Features

  • Author: Statistics Views
  • Date: 13 May 2016

This month Wiley is proud to publish two books: the third edition of Data Mining for Business Analytics: Concepts, Techniques, and Applications in XLMiner by Galit Shmueli, Peter C. Bruce and Nitin R. Patel, and the first edition of Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro by the same authors and Mia L. Stephens.

Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner®, Third Edition presents an applied approach to data mining and predictive analytics with clear exposition, hands-on exercises, and real-life case studies. Readers will work with all of the standard data mining methods using the Microsoft® Office Excel® add-in XLMiner® to develop predictive models and learn how to obtain business value from Big Data.

In turn, Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® presents an applied and interactive approach to data mining. Featuring hands-on applications with JMP Pro®, a statistical package from the SAS Institute, the book uses engaging, real-world examples to build a theoretical and practical understanding of key data mining methods, especially predictive models for classification and prediction. Topics include data visualization, dimension reduction techniques, clustering, linear and logistic regression, classification and regression trees, discriminant analysis, naive Bayes, neural networks, uplift modeling, ensemble models, and time series forecasting.

Alison Oliver talks to the authors of these dual publications about the writing process and their background in statistics.


1. Congratulations on the dual publication of the third edition of Data Mining for Business Analytics: Concepts, Techniques, and Applications in XLMiner and the first edition of Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro. Both books present an applied and interactive approach to data mining. How did the XLMiner edition come about in the first place and how did this lead on to the JMP Pro edition?

The seeds for this project lie in a course Nitin Patel and Dimitris Bertsimas developed at MIT, back in 2002. The title of the course was Data Mining: Algorithms and Applications. This was before the wave of “big data analytics” really hit – as late as 2011 the Master's programs in analytics could be counted on one hand. Now there are over 80. Nitin’s experience was in business schools, and he saw that business analysts, not just programmers, would need to have some facility with data mining techniques. This meant using Excel, so he drafted a set of course notes, and spearheaded the initial development of XLMiner, the leading data mining tool for Excel (now owned by Frontline – it is part of their Solver suite). Peter Bruce was a work colleague of Nitin’s at the time and joined the book+software project. Around the same time, Galit Shmueli was designing and teaching a data mining elective course at the University of Maryland’s Smith School of Business. She was struggling to find the right textbook and software for MBA students. Nitin’s course notes, plus the XLMiner tool, fit the bill.

Serendipitously, the three authors met in 2003 at the MSMESB conference at Georgetown University on teaching statistics at business schools. The connection was immediate and we decided to create the textbook that we and others needed, using XLMiner, which placed a comprehensive set of data mining algorithms and facilities within easy reach of any Excel user (it is now also available in a web version). The JMP connection came from Curt Hinrichs – he originally commissioned the book project, then moved on to JMP, and has advocated tirelessly for a JMP version ever since.

2. What were the primary objectives that you had in mind when you started the book project?

We sought to produce a data-mining roadmap to be used by the business analyst for whom Excel is the primary analytical tool. The goal was to introduce MBA-level audiences to the business possibilities and challenges that can be tackled with data mining, and to give them basic familiarity with data mining approaches, principles, and methods. From the start, the book was intended to provide a hands-on guide to data mining methods, with a particular focus on prediction. We wanted to feature realistic data, a rich group of datasets, and invite the reader to get his or her hands dirty.

3. For the XLMiner title, what can we expect from this latest edition?

In this new edition we’ve added material on ensemble methods, social network analysis, uplift modeling, collaborative filtering, and text mining. We’ve also added cases and updated most of the other chapters.

Each book has detailed summaries that supply an outline of key topics at the beginning of each chapter; end-of-chapter examples and exercises that allow readers to expand their comprehension of the presented material; a companion website with over two dozen data sets, exercise and case study solutions, and slides for instructors as well as data-rich case studies to illustrate various applications of data mining techniques. Please could you give us a taster of an example used in which data mining can be a valuable tool?

Here’s one that we feature in the new edition that is timely in the current political environment. “Microtargeting” of individual voters, as opposed to blanket targeting of entire population groups (e.g. Hispanics, women, etc.), has developed rapidly in the last 12 years. One feature of microtargeting is uplift modeling – determining which message works best for a given voter, given a set of predictor values for that voter. It’s a combination of A/B testing and predictive modeling.
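To make the uplift idea concrete, here is a minimal Python sketch. It is not taken from either book: the function name `uplift_by_segment`, the segment labels, and the toy counts are all hypothetical. It assumes the simplest possible form of uplift estimation, in which voters are grouped by a single predictor and the uplift for each group is the response rate among those who received the message minus the response rate among those who did not.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """Estimate uplift per segment from A/B test records.

    records: iterable of (segment, got_message, responded) tuples.
    Returns {segment: treated response rate - control response rate}.
    """
    # counts[segment][got_message] = [number of responders, number of voters]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, got_message, responded in records:
        cell = counts[segment][got_message]
        cell[0] += int(responded)
        cell[1] += 1

    uplift = {}
    for segment, groups in counts.items():
        treated, control = groups[True], groups[False]
        if treated[1] == 0 or control[1] == 0:
            continue  # uplift cannot be estimated without both groups
        uplift[segment] = treated[0] / treated[1] - control[0] / control[1]
    return uplift


# Hypothetical toy data: (segment, got_message, responded)
records = (
    [("young", True, True)] * 30 + [("young", True, False)] * 70
    + [("young", False, True)] * 10 + [("young", False, False)] * 90
    + [("older", True, True)] * 20 + [("older", True, False)] * 80
    + [("older", False, True)] * 25 + [("older", False, False)] * 75
)

result = uplift_by_segment(records)
for segment, value in sorted(result.items()):
    print(f"{segment}: estimated uplift {value:+.2f}")
```

In this made-up data the message appears to help with one segment and to backfire slightly with the other, which is the kind of distinction uplift modeling is meant to surface. A realistic model would use many predictors and a fitted classifier rather than a single grouping variable.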

Another example from the new edition is a case on cab sharing and forecasting demand for when and where cabs are needed, so that predictions are needed for many areas at many time points. Data mining is very useful in such settings.

4. Who should read the books and why?

These books will be of interest to:

- Professionals in business and other organizations seeking to extract value from “big data,” particularly analysts and managers who are not programmers.
- Students in MBA programs who want to gain skills in analytics
- Students in graduate and certificate programs in analytics and data science
- Researchers in areas where data mining is not typically taught - such as the social sciences - where “big data” are now becoming part of research and predictive modeling is beginning to appear

5. Why are these books of particular interest now?

Academia and managerial ranks in business are finally catching up with the need to teach and acquire skills in the analytics realm. The demand for professionals skilled in analytics has been outstripping supply for the better part of a decade, and universities, boot camps, online education institutions and business training programs need the resources to teach these skills.

We also note that the demand for good data mining textbooks is strong not only in North America and Europe, but also in Asia, where the number of analytics education programs has been growing at a very fast pace.

6. Were there areas of the books that you found more challenging to write, and if so, why?

Getting good data is a challenge. The same datasets tend to be used over and over, because they are well-behaved and they are available. We have been involved with this book for more than a decade now, and we’ve been able to come up with some good actual datasets motivated by real business problems. We’ve been able to keep the data and problems fresh by continuously integrating materials from the courses that we teach, which include student team projects and collaborations with industry partners.

7. If there is one piece of information or advice that you would want your reader to take away and remember after reading these titles, what would that be?

Always remember that the statistical or machine learning method that you are studying is a means to an end, and that a well-specified business problem always trumps a fancy statistical method, if the latter provides an answer to the wrong question.

8. What is it about the area of data mining that fascinates you?

It is how the whole area of analytics and prediction has assumed such a central and visible place in our economy and society. Seeing Amazon’s purchase recommendations, watching Google complete search terms for you, and speculating on how the NSA makes use of cell phone data move out of the realm of magic and become things you can ponder in concrete terms.

From the research point of view, it is fascinating to see how data mining is slowly seeping into fields such as management and the social sciences, which have stuck exclusively to explanatory modeling. The addition of data mining creates controversies, misunderstandings, and lots of opportunities for new discoveries.

9. What will be your next book-length undertaking?

Peter is working on a book called “50 Essential Concepts in Statistics for Data Scientists” with his brother, Andrew Bruce.
Galit just completed a book coauthored with Ron Kenett on “Information Quality: The Potential of Data and Analytics to Generate Knowledge”. She is also working on the next editions of her “Practical Time Series Forecasting” textbooks.

10. Please could you both tell us about your educational backgrounds and what inspired you to pursue your careers in your respective disciplines?

Peter: For me, it was pure serendipity – my educational background was mostly in Russian, and my early career was in the US Foreign Service. I went back to business school, connected with an early pioneer in resampling methods (Julian Simon) and ended up in statistics – nearly all self-taught.

Galit: My graduate studies were in statistics at the Israel Institute of Technology (Technion), with a strong engineering flavor. In 2000, I started my first academic job at Carnegie Mellon University’s statistics department, where I was first exposed to data mining and to applications in healthcare and marketing. Finally, I moved to the University of Maryland’s business school, where I became heavily involved in interdisciplinary research with business school colleagues. These three diverse environments stimulated not only my research but also my teaching. I had to learn how to motivate and explain data analysis (statistics or data mining) for engineers, social scientists, and business students. Immersing myself among such diverse users of data has been inspiring and a source of endless ideas for new research and teaching methods.

11. Please could you tell us more about your current research interests? What are you working on currently?

Galit: My research and teaching in engineering, social science, and business schools have led to my most important insight and research on “To Explain or To Predict?”. The main point is that most scientific fields confuse prediction and causal explanation in terms of the methods they use for data analysis and performance evaluation. Social scientists typically do only explanatory modeling but think they can predict. Computer scientists do predictive modeling, but often think they can causally explain. My initial work was to clarify the distinctions to different audiences and highlight the necessity of both explanatory and predictive modeling to scientific progress. My most recent work is focused on “hacking data mining for causality”, where I investigate the adaptation of predictive methods for causal explanatory goals, especially in big data. One example is adapting classification and regression trees for impact assessment studies that involve self-selection. Another is detecting potential Simpson’s paradoxes in big data using classification and regression trees. Looking at trees from such angles is new and apparently pretty useful.

Peter: I’m running a business, and it is engaging to work on solving business problems using data and to learn about cutting-edge research that might turn into a new Statistics.com course. Regrettably, I have little time for research that lends itself to academic publication.

12. Are there people that have been influential in your career?

Peter: Julian Simon opened my eyes to the discipline of statistics, seen as an outsider through the prism of resampling methods. Nitin Patel proved an invaluable mentor in the shift to data mining, and I recall in particular an old saying that remained always on his whiteboard – “With all thy getting, get understanding.” And working with Galit Shmueli has been a tremendous experience – she has boundless industry and good cheer, and can transform any chore into a learning adventure.

Galit: There have been so many influential people in my research and teaching career. I’ve been extremely lucky to work with many brilliant, creative, and kind colleagues around the world with many backgrounds: statistics, information systems, econometrics, marketing, medicine, computer science, psychology, and more. The diversity of their knowledge, approaches, and experiences has shaped my thoughts (and will hopefully continue to reshape them). Working on this textbook since its first edition has been an incredible journey with endless learning. Nitin’s notes opened my eyes to a completely different usage of linear regression than what I knew at the time. Peter and I have had many eye-opening and engaging discussions and he has been an incredible coauthor to work with. Co-authoring is like a long bootcamp: it creates deep and long-lasting bonds (when successful). Through teaching with the book I have learned so much from my students at the University of Maryland, the Indian School of Business, Statistics.com, and now National Tsing Hua University.

14. Why did you write a JMP Pro version of this book?

Mia: The Shmueli et al. textbook is extremely popular in undergraduate- and graduate-level analytics courses, and JMP is widely used there as well. Many professors who teach with JMP have adopted the book. So, it made a lot of sense to develop a JMP-based version of this already established and successful book.

15. Mia - How did you become a co-author?

Over the years, I have done a lot of work with professors who teach with JMP, including many who teach with this book and have been asking for a JMP version. When Curt Hinrichs suggested the idea of writing a JMP-based version of the book, I naturally jumped at the opportunity to work with Galit and her co-authors. It's been a wonderful learning experience, and the response of early readers has already been extremely positive.

This website is provided by John Wiley & Sons Limited, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ (Company No: 00641132, VAT No: 376766987)

Published features on StatisticsViews.com are checked for statistical accuracy by a panel from the European Network for Business and Industrial Statistics (ENBIS), to whom Wiley and StatisticsViews.com express their gratitude. The panel members are: Ron Kenett, David Steinberg, Shirley Coleman, Irena Ograjenšek, Fabrizio Ruggeri, Rainer Göb, Philippe Castagliola, Xavier Tort-Martorell, Bart De Ketelaere, Antonio Pievatolo, Martina Vandebroek, Lance Mitchell, Gilbert Saporta, Helmut Waldl and Stelios Psarakis.