Stat is delighted to present the first-ever peer-reviewed compilation of work presented at the Symposium for Data Science and Statistics (SDSS), an annual conference that brings together data scientists, statisticians, computer scientists, and others interested in the interface between computing and statistics. This virtual special issue of 18 articles collects the work of a variety of presenters at the conference, representing all six of the subject tracks around which the symposium was organized: Computational Statistics, Data Visualization, Education, Machine Learning, Practice & Applications, and Software & Data Science Technology.
The interdisciplinary nature of much SDSS content is typified by the special issue papers from the Education track. These two articles take a distinctly data science-based approach to statistics education topics: Hillis et al. apply machine learning to assess the effectiveness of supplemental instruction sessions in an introductory statistics course, whereas Kuter and Wedrychowicz offer advice, based on their experience, about organizing a student hackathon on a limited budget. The Visualization track contributions are similarly varied: Vanderplas et al. develop statistical methodology for formally testing the perception of graphical displays of data, while Williams discusses methods for graphically comparing samples or populations in big-data contexts.
Among the three Software & Data Science Technology track papers, two introduce new R packages: Knudson et al. discuss a Monte Carlo-based package for estimating generalized linear mixed models along with a primer on the relevant methodology, whereas Bertin and Baumer present a meta-package that can help organize one’s computing-based research so that it is reproducible by others. In addition, Abousalh-Neto et al. present two case studies that illustrate how the capabilities of SAS may be extended in combination with publicly available software to produce interactive data exploration tools for users.
The Practice & Applications track is represented by four articles that showcase applications of state-of-the-art machine learning methods and algorithms to predicting the lifespan of Drosophila (Zhang et al.), studying polycrystalline materials (Matuk et al.), modeling asthma exacerbation in the City of Houston (Schedler and Ensor), and visualizing the food landscape of Durham, North Carolina (Graves et al.). The Computational Statistics track presents a new analysis of the estimation bias in first-order bifurcating autoregressive models (Elbayoumi and Mostafa).
The Machine Learning track includes six articles that develop new machine learning methods and algorithms. Altosaar et al. propose a flexible and scalable class of models for recommending items with attributes and apply it to building a meal recommender for a diet tracking app. Coleman et al. propose locally optimized random forests and apply them to the problem of forecasting power outages during hurricanes. Diers and Pigorsch propose a self-supervised learning approach for outlier detection that transforms the unsupervised problem into a supervised problem. Rao and Reimherr study the problem of fitting functional models with sparsely and irregularly sampled functional data. Haghbin et al. introduce functional singular spectrum analysis for analyzing functional time series. Yancey et al. discuss several directions for improving the k‐nearest neighbors algorithm and implement these capabilities in their regtools package.
Finally, we wish to thank all of the authors who submitted their work to this special issue. Though we have managed to publish this special issue unusually quickly relative to many statistics journals, the process was nonetheless slower than we had hoped because we had to work out many of the details as we went. There is always a bit of risk in committing to the first iteration of any new venture. We hope that these authors, and the readers of Stat, feel that the venture has been worth it, and that next year’s SDSS conference leads to a similarly high-quality collection of published articles.
As this special issue was in its final stage of preparation, we learned of the death on Feb. 9, 2021 of Jim Harner of West Virginia University’s Statistics Program. Jim was passionate about the interface between statistics and computing and, in recent years, was a strong advocate for remaking the so-called Interface Symposia, which had taken place most years since 1967, as the SDSS. We send our sincere condolences to his colleagues, friends, and family. His presence at future SDSS meetings will be missed, and we dedicate this special issue to his memory.
Co-editors: David R. Hunter, Lingzhou Xue and Helen Hao Zhang