Dr Nathan Yau is best known for his popular blog, Flowing Data, which explores how statisticians, designers, data scientists, and others use analysis, visualization, and exploration to understand data and ourselves. The blog has roughly 45,000 followers on Twitter with around 550,000 page views per month.
Yau obtained his PhD in statistics from UCLA. His dissertation was on personal data collection and how we can use visualization in an everyday context. His work has since expanded to more general types of data, visualization, and design for a growing audience.
He has also written three books on visualization for Wiley and the series continues to grow. Statistics Views talks to Dr Yau about running Flowing Data and his work.
1. Congratulations on the success of your book, Visualize This: The Flowing Data Guide to Design, Visualization and Statistics which is a practical guide on visualization and how to approach real-world data. How did the writing process begin?
For FlowingData, I feature the work of others along with my own visualization projects. One of the most common questions I get is, “How did you make that?” So I tried to answer that basic question with Visualize This. There are a lot of books about visualization, but when I wrote Visualize This, only a small percentage described how to visualize data.
2. Who should read the book and why?
Those who want to present or explore data.
The book contains tutorials which teach the reader how to create graphics that tell stories with real data. The reader can learn how to make statistical graphics in R, design in Illustrator, and create interactive graphics in JavaScript and Flash & Actionscript.
3. Was making the guide hands-on one of your original objectives in writing it?
Yeah, hands-on with real data was my main objective. I imagined going back to my old self, when I was first starting to learn visualization. Like many, I began with the Edward Tufte books and some related ones, but when it was time to actually work with data, I felt stuck.
4. You begin, though, with a description of the huge growth of data and visualization in industry, news, and government, and the book reveals opportunities for those who tell stories with data. The book then moves on to actual human stories and trends in data and statistics. Did you draw upon your own experience as a graduate student to write these stories or are they those from friends and colleagues?
When I started graduate school, there was no iPhone yet and smartphones were barely a thing. And then a couple of years in, there was a boom. I remember I was in a meeting with my workgroup – we were exploring how we could repurpose mobile phones as data collection devices – and someone asked, “Should we take a look at this iPhone thing?” So yeah, graduate school played a role, but I tend to keep my eye on this stuff since it’s my job now with FlowingData.
5. What is it about the area of data visualization that fascinates you?
A lot of things, but in short, I’m curious how visualization can be used to help people who aren’t data professionals understand data. How much complexity can they take?
6. Why is this book of particular interest now?
There’s a lot of data to play with.
7. What were your main objectives during the writing process? What did you set out to achieve in reaching your readers?
I knew a lot of readers of the book would be FlowingData readers, where I tend to keep things light. I wanted the book to match the tone of the site. Writing and talking about data tends to be dry and sometimes a challenge to sit through. It’s fun though.
8. Were there areas of the book that you found more challenging to write, and if so, why?
People new to visualization really like “rules” to follow. Like use only this chart for this type of data or never use this chart. In practice, the “rules” have a lot of bend to them, but it’s hard to convey that to someone who hasn’t worked with a lot of data yet.
9. What will be your next book-length undertaking?
Well, Data Points was published by Wiley after Visualize This. I also introduced a new 4-week course on learning visualization in R: http://flowingdata.com/2015/05/06/introducing-a-course-for-visualization-in-r/. Like Visualize This, it focuses on the how. Other than that, I’ll hold off on any more writing projects for now. I want to focus on my own data projects for a while.
10. Please could you tell us more about your educational background and what was it that brought you to recognise statistics as a discipline in the first place?
I have a PhD in statistics from UCLA. My dissertation was focused on using visualization for personal data collection. I first really got into statistics during my second year of college. I had a really good professor for an introduction to stat class, and for some reason, the concepts clicked for me. And there’s that saying that goes something like, “If what you love to do seems like a chore for everyone else, pursue that thing as a career.”
11. As you mentioned, you have also written several other books for Wiley, including Data Points: Visualization that Means Something and Data Fluency: Empowering your Organization with Effective Data Communication. Of which book are you the proudest?
Tough. If I have to pick, Visualize This. I wrote the book in the middle of my PhD, and I never thought I would or could write a book in the first place.
12. How can you see the teaching of data visualization changing in years to come?
I graduated a couple of years ago. But I never actually took a formal visualization course. They just weren’t offered in my program. That seems to be changing. And with evolving data availability comes new data visualization.
13. What career path are you considering now?
I do FlowingData full-time. It started while I was still in school, and so once I finished, I suddenly had a ton of time to work on my own projects.
14. Which tools and techniques would you recommend an upcoming statistician learn? Do you have any specific recommendations for learning tools with Big Data capabilities, a topic that is constantly in the news as something necessary to learn?
I’m not sold on the Big Data concept, and I think it’s fading into becoming just data again. But for tools of the future, visualization specifically, I recommend R for static graphics and D3.js (a JavaScript library) for interactive graphics on the web. And for the visualization-focused people, I recommend learning statistics beyond hypothesis tests and bell curves.
15. Are there people or events that have been influential in your career?
During my first year of graduate school, there was a guest speaker, Mark Hansen, who presented visualization work. It was a combination of art, computer science, and statistics, and I was sold. I went home and looked for all I could on the topic, and Mark became my adviser. I think the main thing he taught me was that applications for data stretch much farther than most people think.