"The use of real datasets and exploring questions of wide appeal is what gets students excited to want to learn more:" An interview with Mine Çetinkaya-Rundel

Mine Çetinkaya-Rundel is Director of Undergraduate Studies and an Associate Professor of the Practice in the Department of Statistical Science at Duke University as well as a Data Scientist and Professional Educator at RStudio.

She received her Ph.D. in Statistics from the University of California, Los Angeles, and a B.S. in Actuarial Science from New York University’s Stern School of Business. Her work focuses on innovation in statistics pedagogy, with an emphasis on student-centered learning, computation, reproducible research, and open-source education.

She primarily works on developing pedagogical approaches for teaching statistics with a focus on the introductory statistics classroom, such as active learning, flipped classroom, and team-based learning. She also works on research projects that aim to assess the effectiveness of these approaches with respect to learning, retention, and self-efficacy.

Professor Çetinkaya-Rundel has co-authored three open-source statistics textbooks as part of the OpenIntro project at the introductory college and advanced high school level. Also as part of the open education effort, she has been developing and teaching various massive open online courses.

Alison Oliver talks to Professor Çetinkaya-Rundel about her career in statistics so far, from the challenges of developing successful online courses to advice for students starting out in statistics.

1. You studied for a B.S. in Actuarial Science at New York University’s Stern School of Business followed by a PhD in Statistics at UCLA. What was it that first introduced you to statistics as a discipline and led you to pursue statistics as a career?

It wasn’t a very intentional path, to be honest. Despite studying at a business school, I wasn’t very passionate about the business aspect of things; I enjoyed math more and was stronger in it. Actuarial Science was the “mathiest” major offered in the business school, and that’s why I decided to pursue it. Then I got a job as a consulting actuary – I really enjoyed parts of my job but I didn’t feel like corporate office work was a good fit for me. After a couple of years of working there I started thinking about what else I could do. I had always enjoyed teaching and had done a lot of tutoring in undergrad, so I thought: what can I do that would allow me to teach but potentially do other things too? If I could get into a PhD program, I could go down the path of academia, or I could decide that’s not for me and go back to industry. With the skillset that I had from having studied Actuarial Science, it seemed like either a PhD in math or statistics would be the right fit for me. I liked the idea of pursuing statistics because working with data – asking impactful questions and answering them with data – as opposed to focusing on the theoretical aspects of a graduate program in math seemed like a better fit for me, so I ultimately decided that statistics was the right path.

2. After also lecturing at UCLA as a graduate student, you then moved on to Duke University as Assistant Professor and are now Associate Professor. What led to the move to Duke and what do you enjoy about working there?

I started teaching at UCLA as a teaching assistant and then was given the opportunity to be the instructor of record for my own course, both within the UCLA Extension program where I taught evening classes and for UCLA undergrads during the summer and regular semester. A few graduate students get this opportunity. All of this provided me with incredibly valuable experience on what it takes to actually run a course instead of being a teaching assistant for it. The year that I was graduating, Duke had a search for a Professor of the Practice and I decided that what I wanted to do was to focus more of my energy on teaching, course design, and curriculum development as opposed to statistical methodology research, so I thought the position was a perfect fit for me.

3. What does the role of Director of Undergraduate Studies involve and please could you tell us more about how you are incorporating computation into the undergraduate statistics curriculum?

The Director of Undergraduate Studies role involves writing a lot of emails… Just kidding, but it’s a big part of it. I came into this role two years ago. The first year was spent trying to figure out what it even means. From what I could gather, it means thinking not just about your own courses, but about the undergraduate curriculum as a whole. This involves thinking about what the entry and exit points should be for students. It also involves collecting data from them in a way that allows them to reflect on their undergraduate studies before they graduate. For example, we ask them questions like “Were there any gaps in the curriculum for you? Where were they?”. Using this data we’re able to look at what we can do in the undergraduate experience so students don’t feel like there are gaps, whether there is discontinuity or repetition in the content of the courses, and whether that is good repetition (because some of it is) or bad repetition (happening because the faculty have no idea what others are covering in their courses). All of this allows me to say “The major is clearly working, it’s been successful; our numbers of majors have been increasing. But are there things that we should do now that it’s the tenth year of the major at Duke? Is this the right set of core courses that we want to offer? Are there other electives that we want to offer?” In addition to advising and some administrative work, this type of introspection is what I’ve been focusing my energy on as the Director of Undergraduate Studies in my department.

The idea of a new gateway course that I have been developing and will be teaching in Spring 2018 comes from this exercise. A couple of years ago we added a statistical computing elective that is designed for students to take in their third or fourth year. It’s been an incredibly popular course – many students who took it say on their exit surveys that it’s the most useful course they took at Duke! However, this course is designed to come later in the curriculum. So, the new gateway course that I’ll be teaching aims to introduce computational data analysis skills to the students early on, so that they can reap the benefits of that exposure throughout the rest of their studies.

4. You have designed a number of open online courses using R where you create your own real data sets. What are the main challenges in teaching statistics online that you have encountered?

The Coursera courses that I have online are built on a course that I had been teaching at Duke. The Duke course serves a very diverse audience of students in terms of majors and levels. However, once you put this material online on a platform like Coursera, the level of diversity in terms of background in quantitative topics (as well as many other facets) goes up immensely. Some of them are people for whom this is truly the only quantitative course they’re taking. Some are people who have a PhD in physics or some other technical field, which is something I didn’t expect prior to putting an intro stats course online. I clearly had a narrower vision, and I’m so happy to have been proven wrong on that. However, the increased diversity of the course audience really made me consider how the content is presented.

Another thing that’s challenging about teaching an online course is that you can’t say, “If anything is unclear, come by office hours.” There are no office hours. We did have a few Google Hangout sessions but these were more to just put a face on the course; I didn’t expect to be able to reach such massive numbers of people with that kind of tool.

Therefore, everything has to be extremely clear. So, I went through this process of writing very clear learning objectives, and making sure each one of those was covered in the videos. You almost have to write everything that you might say in office hours to help your students, for example, “Oh you made a mistake here, why don’t you go back and review this?” Building that all into the system is a lot of work, but it makes the course better. In fact, my course at Duke became a better course because of the enhancements I made based on what I observed about how people were approaching the material on Coursera.

Teaching R online is a whole other challenge. I was used to having students raise their hand and say “What does this mean?” when they encountered an error. Enhancing the R labs to include answers to common questions students might encounter as they work through them has helped a lot. But on top of that, the course has these fantastic mentors who are people who took the course previously and now hang out in the forums and help other learners. With their help, the course has become pretty self-sustaining, which is really nice.

5. What are the rewards in teaching these courses?

Teaching an online course has certainly helped my on-campus teaching, because it really made me rethink teaching in general. I think hundreds of thousands of people have been through this course, and when I say through this course I don’t mean they have done every single exercise and completed every single assessment. MOOCs tend to get a bad rap for having a low completion rate. If you literally divide the number of people who did every single exercise by the number of people who ever clicked on “join the course,” that percentage is a low number. However, the fact that so many people clicked tells me they’re interested in learning something about this content, and the feedback that I’ve received from the students, in general, has been very positive. That is, of course, very rewarding. Perhaps the greatest reward in teaching a MOOC is the ability to reach so many more people than I could ever have just teaching in a brick-and-mortar setting.

6. What are the most popular lessons that the students respond to that you would recommend to others teaching?

Students like learning R. I am fairly certain that is the biggest reason why my MOOC has been so successful. It was one of the first MOOCs that taught R for doing data analysis and statistics and assumed no background in any of these. The use of real datasets and exploring questions of wide appeal is what gets students excited to want to learn more. Basically, give students a question they actually want to answer, and they will put in the effort to learn the methodology, theory, and tooling to answer that question. Everyone loves working with the NYC flights data, and so do I! I do a modeling activity on how students rate professors and how the attractiveness of the professor is related to that; students often get a kick out of it. Questions from the General Social Survey make for good lessons because they often ask interesting questions about people’s views on matters of social importance, at least in the US, like same-sex marriage or gun laws.

Another lesson that I give both in my online course and my on-campus courses is a mini lesson on Bayesian inference. It’s for a discrete problem, which means you don’t need calculus to work through it. But it makes people think about “What do we mean by a prior?”, “What does posterior probability mean?”, “What do these mean in the context of real data?”, “What does it really mean when a medical test result comes out positive?” and “What is the likelihood that you actually have that disease, and if you get retested, how should you consider that information in the retesting procedure?” Students really like this lesson as it stretches their thinking about conditional probabilities, which can be notoriously difficult to reason with. Most intro stats curricula focus on frequentist statistics and highlight only p-values to make decisions about data. I would strongly encourage injecting a bit of Bayesian inference, even if just to open the students’ minds to thinking about other ways of making decisions with data.
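
As an illustration of the kind of discrete Bayesian problem described above, here is a minimal Python sketch of the medical test question. The prevalence, sensitivity, and false positive rate are hypothetical numbers chosen for the example, not figures from the interview, and the function name is made up for illustration. The sketch also assumes that repeated test results are independent given the person's true disease status.

```python
from fractions import Fraction


def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Bayes' rule for P(disease | positive test) with a yes/no disease status."""
    true_positive = prior * sensitivity
    false_positive = (1 - prior) * false_positive_rate
    return true_positive / (true_positive + false_positive)


# Hypothetical inputs, assumed for illustration only.
prior = Fraction(1, 100)                # 1% of people have the disease
sensitivity = Fraction(95, 100)         # P(positive | disease)
false_positive_rate = Fraction(5, 100)  # P(positive | no disease)

# First positive result: update the prior to a posterior.
after_first = posterior_given_positive(prior, sensitivity, false_positive_rate)

# Retest: the posterior from the first test serves as the prior for the second
# (assumes the two results are independent given the true disease status).
after_second = posterior_given_positive(after_first, sensitivity, false_positive_rate)

print(f"After one positive test:  {float(after_first):.3f}")
print(f"After two positive tests: {float(after_second):.3f}")
```

With these assumed inputs, one positive result raises the probability of disease from 1% to only about 16%, because false positives from the large healthy group outnumber true positives. A second positive result raises it to about 78%, which is the retesting idea in a nutshell: yesterday's posterior becomes today's prior.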

Additionally, students respond well to topics like randomization-based methods, because it’s something we can do by shuffling cards in the class. There’s a gender discrimination example that I do in class that is based on a small enough data set—it’s 48 observations, so less than 52, which is a deck of cards. This means we can actually simulate this in class using a deck of cards and then build the simulation distribution. This is done before I ever talk to students about how you would code this up in R, and they seem to really respond to that well.
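
The card-shuffling activity can be sketched in code as well. The following Python example is a rough illustration, not the author's classroom materials (which use R). The interview gives only the sample size of 48; the split into two groups of 24 with 21 and 14 promotions is assumed here for illustration, and the function name, seed, and number of repetitions are arbitrary choices.

```python
import random


def shuffle_differences(promoted_a, n_a, promoted_b, n_b, reps=10_000, seed=2018):
    """Simulate differences in promotion rates under 'group does not matter'.

    Each card is one outcome (1 = promoted, 0 = not promoted). Shuffling the
    deck and dealing n_a cards to group A mimics the in-class card activity.
    """
    rng = random.Random(seed)
    n_promoted = promoted_a + promoted_b
    n_total = n_a + n_b
    deck = [1] * n_promoted + [0] * (n_total - n_promoted)

    diffs = []
    for _ in range(reps):
        rng.shuffle(deck)
        dealt_a = sum(deck[:n_a])       # promotions dealt to group A
        dealt_b = n_promoted - dealt_a  # the rest go to group B
        diffs.append(dealt_a / n_a - dealt_b / n_b)
    return diffs


# Assumed counts for illustration: 21 of 24 vs. 14 of 24 promoted.
observed_diff = 21 / 24 - 14 / 24
diffs = shuffle_differences(21, 24, 14, 24)

# Share of shuffles at least as extreme as what was observed
# (small tolerance guards against floating-point rounding).
p_value = sum(d >= observed_diff - 1e-12 for d in diffs) / len(diffs)

print(f"Observed difference: {observed_diff:.3f}")
print(f"Simulated one-sided p-value: {p_value:.4f}")
```

The list `diffs` plays the role of the simulation distribution built in class: each entry is one reshuffle of the deck. If only a small share of shuffles produces a difference as large as the observed one, chance alone is a poor explanation for the gap.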

7. What do you think have been the most important recent developments in the field and will these influence your teaching in future years?

I think that one of the most important recent developments in statistics is the introduction of tools that lower the barrier to entry to computation. More and more developers seem to be developing tools with adaptability in mind. The tidyverse suite of packages has been a fantastic development for teaching R to newcomers. I no longer have to spend time having students go between data frames and matrices just to make the code work. The consistent syntax allows them to expand on what they learned in class because the next function is designed to work in a similar way to the one we covered in class. If developers continue to build tools keeping in mind how people learn, I think we’ll be able to get more and more people hooked on using computing to reason with data, and that is a fantastic outcome both for data literacy and for scientific advancement.

8. You have worked on a variety of applied statistics research that has arisen from statistical consulting to working with medical school researchers. What are you working on currently?

I’m not working on anything currently, but we recently wrapped up a project with a collaborator in pediatric ophthalmology. I did the statistical analysis part of the project. I found my collaborators for this project via a statistical consulting course in the department that I used to teach. I still get the emails that come to our consulting center, and sometimes when an email says “I’m working on such-and-such project, do you have some availability to help out with it?” and it seems like an interesting problem that I can work through in the time I have available, I like to participate. I like medical applications – my dissertation research was on public health data and I like reading about health research. Another collaboration I had previously was with evolutionary anthropologists; that was pretty cool as well. It’s so interesting to work with researchers who actually go out into the field and observe chimps and record data and then we get to take a look at it with statistical tools. In short, I tend to pick and choose applied projects based on what I have time for and how interesting I find the dataset to be. It’s been a rewarding experience to do a bit of this on the side to make sure my applied stats skills stay current. Plus, consulting is another great way to find out what people who are not in the field feel comfortable with vs. have difficulty with; this type of information informs my teaching as well.

9. What has been the best book in statistics you have ever read?

This is a hard question; I don’t have a good way of identifying a book as “best”. But I can say that I actually really enjoy reading books about statistics or data that are written for a lay audience. It’s probably because I’m always thinking about what is a good example to use in class when introducing a particular topic, and such books tend to have unexpected examples. I’m thinking of books like “The Lady Tasting Tea” and “Naked Statistics”.

For computing in particular, “R for Data Science” is probably my favorite. It’s so nice to have a resource that I can learn from and share with students in my intro data science course. The presentation in the book is really appropriate for audiences at all levels.

10. What would you recommend to young people who want to start a career in statistics?

Oh, this is a hard question too. And my answer is two-fold depending on whether this person is in college or not.

If you’re a current student, my strongest recommendation is to start early. Take a stats course early and see how you like it. This is really true for any discipline you might be interested in. This will allow you to navigate the core courses early on and then be able to take electives that you are truly interested in. Additionally, statistics is a major that complements other majors so well. As you work with real data in your stats courses you might say, “Hey, I’m really interested in studies that involve polling data”. Maybe this means you’re also interested in political science, so you might consider some quantitative courses from that discipline as well. Starting early allows for that. Lastly, if at some point you decide you want to continue on to graduate studies in statistics, you also need a strong mathematical foundation, which means you might consider taking some high-level math courses.

If you’re not a current student, the pathway is a bit different. But nowadays the issue isn’t a lack of resources to learn statistics; it’s almost an overabundance. Do a bit of research on what online resources are worthwhile, and note that what works for one person might not for another. So first try to figure out what your learning style is, and what you can make time for (say, if you are currently working, or have family obligations), and schedule your learning around those. I suppose this is general learning advice, not specific to statistics, but it’s important to think about nonetheless.

And regardless of which category you fall under, I can’t overemphasize the importance of computation. Learn a computing language that is designed for working with data. R is my choice, but it’s not the only answer. The important thing is to focus on working with data, as that is what statistics is all about. When you’re learning the language, focus on acquiring best practices (like reproducible analysis, version control, organization, etc.) along with the methods. It is so much easier to develop good habits early on compared to after you have a set (but not ideal) way of doing things. And these best practices will also reduce the frustration in your learning.

Try to keep up to date with what’s new in the field as much as possible. Don’t think things will be over your head; even just reading a few blogs can be very helpful.

I would also say that an applied data analysis project can be a great way to really understand the beginning-to-end process of what a statistician does. An independent study project or an online course with a project component are just two of the many ways you can go about working on applied data analysis. If you’re the person doing all the work from acquiring the data, cleaning it, importing it, analyzing it, and writing up the results, you will get to see this process in full.

Lastly, don’t underestimate the importance of communication. If you can’t communicate your results as a statistician or a data scientist, it doesn’t really matter how good they are. Writing as you develop your analysis is a helpful way of getting some experience with it. Reading also helps immensely!

11. Who are the people who have been influential in your career?

One of my mentors that I’ve learned a lot from and still work very closely with is Rob Gould from UCLA. He has been immensely influential and very helpful in my career. He is the Undergraduate Vice Chair at UCLA. I’ve learned so much from him about statistics education. He is also the person who gave me the many opportunities to teach and helped me find my passion in teaching. Among many other things, I work closely with him on DataFest. It’s been an amazing collaboration. If you don’t know of DataFest, look it up, it’s all the rage! And if you’d like to host one at your institution, let me know!

Another person I would list is my thesis advisor from UCLA, Jan de Leeuw. I have certainly learned a lot from him in terms of statistics methodology, but beyond that, his dedication to open source development, open education, and open research has been incredibly inspiring for me. At times when I’ve felt conflicted about whether a project is worth the effort, I often ask myself, “what would Jan say?”

Dalene Stangl was my first mentor at Duke. She was the Director of Undergraduate Studies when I started here, and perhaps the most important thing I’ve learned from her is the importance of establishing a good rapport with students. As I interact with students, I always have her as a role model for good advising.

In terms of statistics education, two other people who I take so much inspiration from are Jo Hardin and Nick Horton. Nick is incredible at having a pulse on everything that is happening in this area, and Jo always asks great questions that make me stop and think “seriously, what is the big picture here”. I always walk away from a conversation with them with a thousand things to follow up on and work on.

We probably don’t have enough time for me to talk about all the ways Jenny Bryan and Hadley Wickham’s work has been inspirational in my career; I could go on and on forever. Hadley works on so many things that influence how I navigate working with and teaching R, but if I had to pick one aspect of his work to highlight, I would say it’s the development of packages that use a consistent grammar. It is such bliss to be able to teach R and not sound like a robot when reading the code. I love how the syntax of the tidyverse mimics natural language. And Jenny, among many other things, is fantastic at creating extensive, thoughtful, and accessible documentation for her teaching and making it openly available to everyone. I draw so much inspiration from how she teaches workflow and organization.

Copyright: Image appears courtesy of Professor Çetinkaya-Rundel

Stats & Data Science Views
