“The coupling of innovation and open data science tools for science has been fascinating to me and hard to tease apart”: An interview with Dr Julia Stewart-Lowndes

Dr Julia Stewart-Lowndes is a marine ecologist, data scientist, and Senior Fellow at the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California at Santa Barbara, USA. As Director of Openscapes, which she created as a Mozilla Fellow, Julia works to increase the value and practice of open data science within scientific communities. She earned her PhD at Stanford University in 2012 studying drivers and impacts of Humboldt squid in a changing climate.

Alison Oliver talks to Dr Stewart-Lowndes about her career so far.

1. You studied for a PhD at Stanford focusing on drivers and impacts of Humboldt squid in a changing climate. What inspired your interest in environmental science, and what led you to pursue a career in it?

I grew up near Monterey in California so I was able to go to the seashore regularly and spend time exploring and seeing patterns. I became really interested in how humans interact with the environment and how there’s still so much we don’t know, and as everything changes, there’s still so much we need to understand in order to be able to live sustainably.

I’m lucky because the part of Monterey where I grew up is also where I did my PhD, at Stanford University’s Hopkins Marine Station. I think the Monterey Bay Aquarium had a big influence on me because it’s this amazing aquarium where they focus a lot on education and conservation in their messaging, so you actually learn about the species and the habitat and how it interacts and how it’s changing. It’s all in one thing, so it’s educational, and it’s all local things on display. Most things you see on display you can then go to tidepools yourself and see.

2. You are currently a marine data scientist at the National Center for Ecological Analysis and Synthesis. What led to this move and what does your role entail?

When I set out to become a marine biologist in grad school, I didn’t really expect it to be so data-heavy. I got into marine biology because I was passionate about the outdoors, trying to understand things, and contributing to our greater understanding. I expected observation, hypotheses, learning and reading. But modern science actually means data analysis all day long, combining your data with other people’s data to answer the questions you’re asking. The data analysis side was a struggle for me, and it was difficult to accept that it was part of my scientific future.

However, I did really enjoy the power of coding once I overcame being demoralized and resistant to the idea. And then being able to get this job at the National Center for Ecological Analysis and Synthesis was really cool, because they do data analysis on a larger and more collaborative scale. I think of it as kind of upcycling data: NCEAS takes data that has already been collected and analyzed for different studies, and combines them to ask different questions. That was a really cool part of going to NCEAS.

3. You were awarded a Mozilla Fellowship to help set up Openscapes. Please could you tell us how this came about and more about Openscapes?

Mozilla caught my eye with their Science Lab; they were really supporting scientists to get together and teach each other how to code. The reason I got my fellowship was my role in the community now, because as I started to learn R, I got involved with rOpenSci, RStudio, and R-Ladies, which are really big community members.

I think being involved in these communities and trying to create others and engage with people is how I got the Mozilla fellowship. I had this vision of wanting to help scientists get more involved because it’s undiscovered territory for scientists: coding can be social and have all these resources that can be really great for science. So I went to Mozilla with this vision of wanting to bring the R community ethos and tools to science and bridge that, but I didn’t really have a plan of what I would do. It was after I was in the fellowship that I was part of the Mozilla Open Leaders program and really understood the architecture of what my plan would be.

4. Please could you tell us what you are working on currently?

I am not doing my own research right now. Openscapes is my focus as not only a mentorship program for scientists but also a mentorship program for teams, so the idea is that if you get the lead of the research group, whether that’s faculty, a lecturer or a program manager, you have that lead with their team members together and everybody learns what’s possible with R, with teamwork and with community.

We discuss what’s possible with specific R packages and trainings and, depending on their role within that team, the lab members who are doing data analysis every day have time and space to explore these tools. They get better at it and find different ways so that they can have shared practices as a research group or team, even if they are not working on the same deliverable.

The faculty or lecturer goal is to see how powerful these things are with their labs and allow their labs the time and space to do it and become advocates for it more broadly. It’s really about trying to obtain a more collaborative effort, but there’s also a big element of trust there.

I can imagine if I was starting a lab as a new faculty, there’s so many elements that you’re responsible for and you need your lab to do as you’re getting set up. If you are not comfortable with R or these new practices with GitHub, there could be this feeling that you need to learn it first and then teach it to your lab. If you’re a new faculty, you don’t have time to learn GitHub and to do your research on what communities exist or what packages are ideal, etc., so it’s really about trying to remove that burden and this pressure that they need to become an expert beforehand. It’s trying to have everybody see the value and trust each other to work together on it.

5. In your blogs and research, you talk a lot about bridging environmental science with open and reproducible science. Please could you tell us how open data science tools have made improvements to research over the past decade?

The coupling of innovation and open data science tools for science has been fascinating to me and hard to tease apart. With the Ocean Health Index (ohi-science.org), we had a really ambitious plan for our method for quantifying how healthy oceans are to be used all over the world. We want it to be used by the UN, because we think it’s a good method for comparing across countries at the national scale, but it’s also adaptable for smaller scales, so the government of Colombia can use it to assess their provinces, and so can the government in Sweden, etc. Everybody can tailor this method to their situation using their own data and developing models around the data.

So that’s the vision. But to actually make that happen, the way we were working circa 2012 made it not possible, because we were doing data preparation largely in Excel and writing up documentation in a Word document, making it into a PDF and emailing it to people, and then “oh it’s out of date” so you have to email them again and tell them that this is a new version of the PDF. It’s a lot of one-on-one curation trying to figure out how to communicate with people – including within our own team. But now, with open data science, we all have our code online and they know where to find it, it’s up to date, it’s the most recent version, and they can use GitHub to work on it. At the same time, we can provide guidance. We also were able to use R and GitHub not only traditionally within the GitHub interface, but to create websites, training books and tutorials and all these things that are always online and up to date.

So instead of you having to email and say, “Here’s an updated version of this PDF,” you’re like, “Here’s the link, and the link is always the most recent” so it’s just broadened what’s possible.

Now we have been able to realize that kind of dream in 20 countries all over the world using our science and our code to assess how healthy their oceans are and do so in a way that they’re not reinventing the wheel but can start where there’s architecture that we’ve created. It’s really cool.

6. Your talk at useR! 2019 last summer where you were one of the keynote speakers was entitled ‘R for better science in less time’. If there was one thing you would wish the attendees would take away from your talk, what would it be?

I really think that creating this welcoming environment for teams to engage in data science is critical. Being welcomed into a situation where people are coding and innovating has changed my life.

7. How has the use of R and other statistical software evolved over your working career so far? What have been the developments that have especially helped your research?

The tidyverse has been so influential for me. I tried learning R in graduate school, back in 2008. With no coding background, it was really difficult for me to understand the syntax, and I’d stumble over the classes of variables and not have any clue, even just conceptually, what a problem would be or how to fix it. What helped was being able to see the steps of data analysis, where importing your data is a step, which is a discrete thing, and so is tidying your data, which is a big task. But if you wrangle your data to be tidy, then you’ve got this whole suite of analytical tools that you can use to actually ask your questions rather than trying to start off with the analysis and creating bespoke approaches to accommodate whatever format your data is in.

So that framework is so helpful for me. But then also the tidyverse itself lets you chain together steps of the analytical process with the `dplyr` and `tidyr` packages. The “pipe operator” `%>%` lets you chain together bits of logic so your analysis reads like a story. It reads: “Take the data, and then select these columns, and then filter out these years, and then summarize by totals” or whatever, and you can read it like a story. That’s been the biggest sort of coding innovation in terms of what has made all of this much easier for me to wrap my head around and to teach other people. It’s so empowering and utterly different from my experience learning on my own, and I want everyone to know about it.
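
Editor’s illustration (not part of the interview): a minimal sketch of what such a story-like chain might look like in R. The `catch` data frame, its column names and its numbers are all invented for this example; only the `dplyr` package and the `%>%` operator come from the answer above.

```r
library(dplyr)

# Hypothetical data: made-up yearly counts for two regions.
catch <- data.frame(
  region = c("north", "north", "south", "south", "south"),
  year   = c(2009, 2011, 2009, 2011, 2012),
  count  = c(10, 20, 5, 15, 25),
  notes  = c("a", "b", "c", "d", "e")
)

totals <- catch %>%                  # take the data,
  select(region, year, count) %>%    # and then select these columns,
  filter(year >= 2010) %>%           # and then filter out the earlier years,
  group_by(region) %>%               # and then group by region,
  summarize(total = sum(count))      # and then summarize by totals

print(totals)
```

Each `%>%` can be read aloud as “and then”, so the comments on the right follow the same sentence structure as the spoken description. With these invented numbers, the result has one row per region, with totals of 20 for north and 40 for south.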

8. What do you think have been the most important recent developments in the field of environmental science and will these influence your research in future years?

I think climate change has been the most influential thing. I finished my PhD in 2012, and I feel like at that point, climate change wasn’t always the driving narrative of research. But now there’s just been this urgency which is critical but also extremely sad.

It’s everything. If you’re studying where fish live for fisheries that feed people all over the world, you’re immediately thinking about climate change. If you’re studying how disease is transmitted through mosquitos, you’re thinking about how climate change is affecting that. If you’re thinking about how disease travels in forests and drought, you’re thinking about climate change. It connects everything.

9. Your research has been published in numerous journals: is there a particular article that you are most proud of?

I think that it has to be the “Better Science in Less Time” article in Nature Ecology & Evolution that we published in 2017, because I feel like that has been the defining point in my research life. It catalyzed a shift into focusing my time on helping other scientists work better rather than doing my own research. Definitely a defining moment in my life, and I love that it speaks to people. Openscapes is basically turning that paper from a static PDF into a mentorship program to help people work that way. So it is something I’m very proud of.

10. What has been the best book in environmental science you have ever read?

My friend from graduate school has just written this amazing book about the evolution of cephalopods: Squid Empire by Danna Staaf. The length of time that they lived and the diversity just overshadows dinosaurs. Dinosaurs are a tiny moment in time and just a few species compared to the breadth and awesomeness of cephalopods, and her book just brings that to life and it’s so cool. That’s what’s on my mind right now.

11. What would you recommend to young people who wish to start a career in environmental science?

Spend time outside and observe things: patterns of the birds that come to your birdfeeder, your trees. Just seeing patterns and how they change and then thinking about it in a data framework, like if you had a question about how things are changing in terms of the number of birds that come, how would you set up an experimental design for that?

I think having a good grounding in natural history and the environment is really good when it is coupled with good coding skills. You don’t even have to be a naturalist; just enjoy the environment.

12. Who are the people who have been influential in your career?

Steve Haddock has been incredibly influential. He is a scientist at the Monterey Bay Aquarium Research Institute (MBARI). He was a friend of mine in graduate school and he wrote a book called Practical Computing for Biologists, and that was really one of the reasons I was able to finish graduate school – his friendship and mentorship. I was the audience that book was written for: I’m a biologist, I want to study marine or the environment, but I have never learned how to use a computer well, and now that’s a huge part of being a scientist, so how do you upskill and learn about computing really effectively? The book really helped me ease into coding and programming, so he has been really influential and definitely helped me to get onto the path I am on now.

Copyright: Image copyright of Elliot Lowndes, appears courtesy of Dr Stewart-Lowndes