Last month, Wiley was proud to publish the latest title in its popular ‘Dummies’ series, Data Science For Dummies, written by one of our very own writers for Statistics Views. Lillian Pierson, P.E., who writes regularly for our Engineering and Environmental sections, is an entrepreneurial data scientist and professional environmental engineer. She’s the founder of Data-Mania, a start-up that focuses mainly on web analytics, data-driven growth services, data journalism, and data science training services. She also covers the topics of data science, analytics, and statistics for prominent organizations like IBM and UBM.
Jobs in data science abound, but few people have the data science skills needed to fill these increasingly important roles in organizations. Data Science For Dummies is the perfect starting point for IT professionals and students interested in making sense of their organization’s massive data sets and applying their findings to real-world business scenarios. From uncovering rich data sources to managing large amounts of data within hardware and software limitations, ensuring consistency in reporting, merging various data sources, and beyond, you’ll develop the know-how you need to effectively interpret data and tell a story that can be understood by anyone in your organization.
- Provides a background in data science fundamentals before moving on to working with relational databases and unstructured data and preparing your data for analysis
- Details different data visualization techniques that can be used to showcase and summarize your data
- Explains both supervised and unsupervised machine learning, including regression, model validation, and clustering techniques
- Includes coverage of big data processing tools like MapReduce, Hadoop, Dremel, Storm, and Spark
It’s a big, big data world out there. Statistics Views talks to Lillian Pierson about how Data Science For Dummies can help you harness the power of data and gain a competitive edge for your organization.
1. Congratulations on the publication of your book, Data Science For Dummies, which shows readers how data science can give them in-depth insight into their business. How did the writing process begin?
Kyle Looper from Wiley approached me about writing the book. The timing was fun and amazing: I was actually boarding a plane from Bangkok to Bali and checking my email one last time before turning my phone off. The book deal arrived, and I was so excited that I did not want to turn my phone off! I was apprehensive too, as I wondered how I was going to tackle this.
So I began by carrying out a careful analysis of my readers and their needs, as well as an analysis of competing titles on the market and what they were not able to provide to readers. I thought about what would be interesting and useful to me and came up with an outline, then a strategy for which subtopics I needed to include to really help my readers achieve their personal goals. The outline came back with minor amendments. I was first contacted in July 2014, and five months later the book was in.
2. Who should read the book and why?
• Data scientists. The book surveys each area of data science in easy-to-read language and tells you which specific goals you can achieve by taking on other areas of the field. It is an overview of the space; let it serve as a road map while you develop your professional expertise.
• Novice data analysts who want to get started in data science.
• Recent grads and members of the current workforce who want to build the skills they need to stay relevant in their respective fields. Building even a few strengths in data science is likely to make you a much more marketable employee and help you achieve impressive results in your work.
• Business managers and decision-makers who are curious about how data science and advanced analytics could improve their business.
These were all the people I had in mind when writing the book.
3. At the beginning, you provide a background in data science fundamentals before moving on to working with relational databases and unstructured data and preparing one’s data for analysis. Was one of your original objectives in writing this book to alert readers that it is essential to assess their data first?
At the start of the book I introduce the data science field, and then quickly move on to contrasting it with data engineering. That’s because data science and data engineering are two separate disciplines, both of which are completely essential to creating value from big data. The distinction between these fields is clear-cut and needs to be cemented in readers’ minds before going into deeper discussions on data science.
After this I wrote a chapter on how you can use data science to create value in a business setting, because that is the interest of most of my readers. I think more readers will come from the business sector, which traditionally has not been very quantitative, whereas those who come from a scientific background have more everyday experience in using data.
In Part 2, I take a nuts-and-bolts approach to using stats, math, clustering, classification, and machine learning to extract value from raw data. Structured versus unstructured data is not central to these discussions.
4. The book also details different data visualization techniques that can be used to showcase and summarize one’s data, and explains both supervised and unsupervised machine learning, including regression, model validation, and clustering techniques. I have spoken to lecturers who wish to demonstrate such techniques to their students but are pressed for time. What would your advice be? Is there a quick, easy way?
The concepts are not that complicated. These could be covered in 2 or 3 lectures. If you are talking about teaching this in practice, that is a different story. In engineering school, our professors did not teach us how to do what we were required to do. They gave us a brief overview of the concept and then told us to use this or that method, approach, or application to solve our homework problems. If we were lucky, we would get a lab session with a TA who might help, if we were lucky enough to understand what they were saying.
Mostly we learned by doing and by hitting our heads against the proverbial brick wall. We helped each other learn to do it because there simply was not time for a professor to hold our hands through the whole thing. Data science is complex, and if a person is not curious and driven, they will never get far no matter who their professor is. My best advice is to require your students to figure out whatever it is you want them to learn during your class. You’ll never have time to hold their hands through it all, but if it is a matter of passing or failing, they will find a way to figure out what they need to do, no matter how complex it is. That is my experience.
5. Big Data is on the lips of many statisticians. The availability of masses of data has enormous potential, but are there ethical risks?
Yes, there are some ethical risks. These mostly stem from breaches of privacy rights and the misuse of data to support the propagation of biased and dishonest arguments. At the same time, people are also realising that there is no such thing as privacy, which is a fact you need to get over. Cookies track everything you do on the internet.
Big Data will create opportunities in jobs and funding, as people are now starting to recognise the value in their data.
6. What is it about the area of data science that fascinates you?
I love data science for the results it helps you achieve. While understanding data for data’s sake is somewhat interesting, I am a very results-driven person. Data science fascinates me because of the results it can be used to achieve: it can save lives in civic, natural disaster, and other humanitarian applications; it can protect our environment from man-made harms; and it can grow the visibility and bottom line of ecommerce businesses.
7. Why is this book of particular interest now?
In the recent past, technologies for generating, capturing, storing, processing, and analysing data have advanced significantly and become far more widely available. Now that data is widely and cheaply available, people and organizations have begun using it to get the upper hand over their competitors. That means people and organizations that are not using data to stay competitive are at serious risk, whether they realize it or not: they can quickly become obsolete as they are superseded by more nimble, data-savvy organizations.
The era of Big Data and data science represents a period of sink-or-swim. Business organizations that want to survive need staff who are skilled in using data to achieve optimal results in all lines of business. This calls for training and skill development for existing staff, and for new hires who graduate from university with sharp quantitative skills.
8. What were your main objectives during the writing process? What did you set out to achieve in reaching your readers?
I really wanted this book to be fun, interesting, and practical. I am very passionate about data science, and my goal was to write a book that would spread and share my excitement with the book’s readers – so that they really understand why and how this is such a pivotal and exciting new area of study. To achieve this, I used lots of fun real-world examples and a friendly, approachable, and conversational tone.
Many times, data science and its approaches are presented in an overly technical tone that can be quite boring and even scary to non-practitioners who need to learn these skills. It does not have to be that way.
9. Were there areas of the book that you found more challenging to write, and if so, why?
The discussion on spatial statistics was extremely difficult for me to write. That’s because spatial statistics is an incredibly complex and difficult field. Although I have used spatial statistics in my work, I found it very difficult to really capture and explain spatial statistical concepts in one small chapter geared towards newbies. Luckily, the world’s foremost expert in spatial statistics, Dr. Pierre Goovaerts, is a friend of mine. He was generous and kind enough to step in and help me get this chapter up to par for what readers needed and deserved. I am very proud of the chapter on spatial statistics, as it’s the most succinct and easy-to-understand explanation I’ve ever seen.
10. What will be your next book-length undertaking?
Well, my next major undertaking is a series of data science training courses for online or in-person instruction. I am dedicated to opening up the field of data science.
Data science issues can be presented in a clear, upfront way, or they can be presented in a highly technical, complicated manner that would not help anyone.
All most people really need is some basic quantitative (statistics and math) and programming (R and/or Python) skills; from there, they just need drive, curiosity, and subject matter expertise in their respective fields. The statistics, math, and programming skills that are required aren’t really that complicated. Learning them doesn’t have to be boring, scary, or overly burdensome. Through my company, Data-Mania, I will be offering training and courses that take a fun and easy-to-understand approach to imparting these skills to students.
11. Please could you tell us more about your educational background and what was it that brought you to recognise statistics as a discipline in the first place?
I have a bachelor’s degree and professional license in Environmental Engineering. I was always interested in environmental studies. While I was studying at the University of Texas at Austin, Governor Bush was organizing the transport of nuclear waste from Maine for burial on a fault along the Rio Grande, a river along which indigenous communities live and cultivate the land. Unsurprisingly, many environmental activists opposed this agreement, and that got me interested in containment vessels and geochemistry. How was this going to affect the environment, and how could better systems be designed to contain the nuclear waste? I learnt that this type of problem was tackled within environmental engineering, and that field became my biggest passion.
I first started using statistics to support the work we were doing in environmental engineering. Namely, my first two applications of statistics in engineering were:
• To prove or disprove the effectiveness of a lake bioremediation project in Central Florida.
• To develop coefficients for equations that described the infiltration rates and retention volumes of rainfall through a green roof (for different rainfall intensities and across different seasons of the year) at the University of Central Florida.
12. Could you please tell us more about yourself in your current role?
I founded and manage Data-Mania, an information services business that is currently focused on offering training to recent grads and working professionals. This training is focused on quickly getting enrolees up to speed in the data science skills they need to stay relevant and marketable in their fields. I’ve also been very busy delivering international talks on data science and serving as an ambassador for major brands that operate in the data space. I am in the process of branching off into the ecommerce products space with my new Amazon Store. Beyond that, I am managing and overseeing Data-Mania’s growth engineering services.
13. Are there people or events that have been influential in your career?
Yes: Patrick Meier, Jake Porway, and Chris Guillebeau. Also, there was a teacher back in 10th grade, Dr. Mary Walker, who showed me that creative women could also be brilliant quants. She showed me that, as a blond, Caucasian woman in the USA, I could do more than the people in my direct environment suggested. Namely, I could be an intelligent, independent, quantitative, and self-sufficient woman. I am forever grateful for the ways she inspired me.