This month, Wiley is proud to publish the second edition of Making Sense of Data I: A Practical Guide to Exploratory Data Analysis and Data Mining by Glenn J. Myatt and Wayne P. Johnson.
With a focus on the needs of educators and students, Making Sense of Data presents the steps and issues that need to be considered in order to successfully complete a data analysis or data mining project. This second edition focuses on basic data analysis approaches that are necessary to complete a diverse range of projects. New examples have been added to illustrate the different approaches, and there is considerably more emphasis on hands-on software tutorials to provide real-world exercises. Through a related Web site, the book is accompanied by the Traceis software, data sets, and tutorials; PowerPoint slides for classroom use; and other supplementary material to support educational classes.
The authors provide clear explanations that guide readers to make timely and accurate decisions from data in almost every field of study. A step-by-step approach aids professionals in carefully analysing data and implementing results, leading to the development of smarter business decisions. The topical coverage has been revised throughout to ensure only basic data analysis approaches are discussed, and new appendices have been added on the Traceis software, as well as new tutorials using a variety of data sets with the software. Additional examples of data preparation, tables and graphs, statistics, grouping, and prediction have been included, and the topics of multiple linear regression and logistic regression have been added to provide a range of widely used and transparent approaches to performing classification and regression.
Statistics Views talks to co-author Glenn Myatt on this second edition.
1. Congratulations on the second edition of your book, Making Sense of Data, which provides a practical, step-by-step approach to the issues that need to be considered in order to successfully complete a data analysis or data mining project. How did the writing process begin?
In working with many organizations over the years, I found many professionals who were routinely analysing data even though their primary occupation was not data analysis. It was clear that this would be a growing trend because the volume of information needing analysis was expanding. The writing process began as an attempt to support these domain experts in making sense of their data and to provide a process and set of methods that would help them take the steps necessary for their projects to succeed. These steps also included important considerations not related to data analysis methods, such as how to successfully deploy the results from an analysis.
2. Who should read the book and why?
The book is aimed at business professionals, scientists, engineers, and other professionals who wish to understand what is important in data analysis or data mining projects, regardless of size. The book provides a straightforward, step-by-step approach to analysing data in projects typically performed as a team, helping these professionals make better decisions faster.
3. With the release of the second edition, what can the reader expect in this new version?
The second edition reorganizes the content and expands it in some areas. While the first edition structured the content around different topical areas such as data visualization and statistics, this edition organizes the content based on types of analysis, from simple to complex. The emphasis has changed from describing methods as tools to describing the methods in the context of how they might be used. There is also expanded content for a number of useful methods, including logistic regression and the Kendall tau correlation coefficient, and a new section that provides a series of hands-on tutorials using freely available software that was developed for use with the book.
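The Kendall tau correlation coefficient mentioned here measures how consistently two variables rank a set of observations. As a brief illustration (a minimal sketch, not taken from the book or the Traceis software), the basic tau-a variant, which assumes no tied values, can be written in a few lines of Python:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over total pairs.

    Assumes no tied values; ranges from -1 (reversed order) to +1 (same order).
    """
    concordant = discordant = 0
    # Compare every pair of observations: a pair is concordant when both
    # variables order the two observations the same way, discordant otherwise.
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Mostly agreeing rankings: 8 concordant and 2 discordant of 10 pairs.
print(kendall_tau([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # → 0.6
```

In practice one would reach for a library routine that also handles ties, but the pairwise-comparison idea above is what makes the statistic transparent to non-specialists, which matches the book's emphasis.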
4. With a comprehensive collection of methods from both data analysis and data mining disciplines, this book successfully describes the steps that need to be taken, and appropriately treats technical topics to accomplish effective decision making from data. Was it always the goal of you and your co-author, Wayne Johnson, to make the book very practical in its approach?
Yes, this was always our goal. We incorporated this practical aspect into the book in a number of ways. We tied different concepts and methods, as they were introduced, back to practical examples from a range of disciplines. The chapters provide the background materials and worked examples, and are supplemented with both paper and software tutorials using data sets we provided, which readers can replace with their own.
5. The book now also has a related website (http://www.makingsenseofdata.com/) with the Traceis software, data sets, and tutorials; PowerPoint slides for classroom use; and other supplementary material to support educational classes. Please could you give us a taste of one of these lectures?
The material has been developed into a 10-week lecture course that could be used as, or adapted into, an introductory course on data analysis.
6. You are the cofounder and Chief Scientific Officer of Leadscope, Inc. Please could you tell us more about Leadscope and your role there?
This work involves the development of specialized data analysis software, databases, and computer models that are used across the pharmaceutical, chemical, and cosmetics industries as well as by government agencies. The work includes collaborating with customers and partners to deploy data analysis and data mining solutions, including computer models. These models are also being used as part of regulatory submissions. My role also includes working with regulators and sponsors to ensure that the models satisfy the requirements.
7. Why is this book of particular interest now?
There has been and will continue to be a significant increase in the amount of electronic data that professionals will need to analyse. Skills to analyse and process this data in order to make better decisions are now essential for many occupations and disciplines.
8. What were your main objectives during the writing process? What did you set out to achieve in reaching your readers?
The main objective was to provide a practical resource that is accessible to all involved in a data analysis and data mining project, including those with no background in data analysis. The writing process focused on outlining the process and the steps considered necessary for the success of the project. Due to the book’s focus on domain experts, new concepts were introduced and illustrated with practical examples. Since it’s important to understand how the methods work and when they should be applied, each algorithm is described in detail and supplemented with paper exercises. The other objective was to integrate the material in the book with software that provided a hands-on way to explore data.
9. Were there areas of the second edition that you found more challenging to write, and if so, why?
It is always challenging to distil the complex, from hierarchical agglomerative clustering to logistic regression, across domains in a simple and straightforward manner without losing important details.
10. What is it about this area that fascinates you?
The breadth of the subject matter and the difference it is making across all fields. It is particularly rewarding to enable interdisciplinary teams to look at data together. Team members bring different perspectives to data analysis, and the synthesis of individual insights is powerful. It requires different types of tools for the analysis of the data, including data visualizations, data analysis and mining methods, and so on. It also requires tools to support communicating the results when the solutions are deployed to others outside the team that developed them.
11. What will be your next book-length undertaking?
We will start to look into refreshing the contents in the second book in this series that focuses on more advanced methods and applications.
12. Please could you tell us more about your educational background and what was it that brought you to recognise statistics as a discipline in the first place?
My formal training is in computer science and artificial intelligence; however, throughout my career I have developed data analysis and data mining solutions that support the life sciences, including analysis tools to support the early stages of drug discovery and computational models for the assessment of chemical safety. In all these areas, I have found it essential to make valid, consistent, and transparent decisions from ever-increasing volumes of data. Statistics is essential for good judgment by helping to find and separate the patterns that are significant from those that are not.