Lay abstract for Statistics in Medicine article: Differential expression of single-cell RNA-seq data using Tweedie models

Each week, we publish lay abstracts of new articles from our prestigious portfolio of journals in statistics. The aim is to highlight the latest research to a broader audience in an accessible format.
The article featured today is from Statistics in Medicine with the full article now available to read here.
Mallick, H, Chatterjee, S, Chowdhury, S, Chatterjee, S, Rahnavard, A, Hicks, SC. Differential expression of single-cell RNA-seq data using Tweedie models. Statistics in Medicine. 2022; 41(18): 3492–3510. doi:10.1002/sim.9430
Despite recent success, most published single-cell differential expression analysis methods are platform-specific, making it difficult for practitioners to choose among the many available analysis pipelines.

This is a critical gap to address given the growing prevalence of cross-platform single-cell RNA sequencing (scRNA-seq) datasets, which exhibit differing patterns of underlying biological and technical variability. In this paper, we present a novel, platform-agnostic approach to differential expression analysis. Key to our approach is the use of self-adaptive Tweedie models that take into account the rich diversity of data distributions generated by rapidly evolving scRNA-seq technologies.
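
To give a feel for why the Tweedie family suits this kind of data, here is a minimal sketch in Python (not the Tweedieverse implementation, which is in R). It relies on a standard property of Tweedie distributions: the variance is phi * mu^p, and for a power parameter p strictly between 1 and 2 the distribution is a compound Poisson-gamma, which produces exact zeros alongside continuous positive values. The function names and the parameter values in the usage note are hypothetical and chosen only for illustration.

```python
import math
import random


def tweedie_cpg_params(mu, phi, p):
    """Map Tweedie (mean mu, dispersion phi, power p) to compound
    Poisson-gamma parameters. Only valid for 1 < p < 2."""
    if not 1 < p < 2:
        raise ValueError("compound Poisson-gamma form requires 1 < p < 2")
    lam = mu ** (2 - p) / (phi * (2 - p))   # Poisson rate for the number of jumps
    shape = (2 - p) / (p - 1)               # gamma shape of each jump
    scale = phi * (p - 1) * mu ** (p - 1)   # gamma scale of each jump
    return lam, shape, scale


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_tweedie(mu, phi, p, n, seed=0):
    """Draw n values: each is a sum of a Poisson number of gamma jumps,
    so a draw with zero jumps is exactly 0."""
    rng = random.Random(seed)
    lam, shape, scale = tweedie_cpg_params(mu, phi, p)
    draws = []
    for _ in range(n):
        jumps = sample_poisson(lam, rng)
        draws.append(sum((rng.gammavariate(shape, scale) for _ in range(jumps)), 0.0))
    return draws
```

For example, `sample_tweedie(mu=2.0, phi=1.0, p=1.5, n=20000)` returns a mix of exact zeros and positive real values whose average is close to 2. The theoretical share of zeros is `exp(-lam)`, so moving p, mu, or phi changes how zero-heavy the data are, which is the kind of flexibility the paragraph above refers to.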

Through simulation studies, we demonstrate that our method (Tweedieverse) outperforms existing differential expression analysis methods while controlling error rates at the desired levels. By re-analyzing public single-cell gene expression datasets, we demonstrate that our approach can yield biological findings that existing approaches do not reveal.

Tweedieverse is scalable to datasets with tens of thousands of genes measured on tens of thousands of cells. The open source software implementation of our method (in the R programming language) is available at:

Beyond scRNA-seq studies, Tweedieverse can be applied to other structurally similar data modalities such as metagenomics, metatranscriptomics, and spatial transcriptomics with similar zero-inflated and semi-continuous data distributions. We conjecture that due to their flexibility, the proposed Tweedie models would retain their strong empirical and theoretical properties in these other settings.
