This is a critical gap to address given the growing prevalence of cross-platform single-cell RNA sequencing (scRNA-seq) datasets, which differ in their underlying biological and technical variability. In this paper, we present a novel, platform-agnostic approach to differential expression analysis. Key to our approach is the use of self-adaptive Tweedie models that account for the rich diversity of data distributions generated by rapidly evolving scRNA-seq technologies.
Through simulation studies, we demonstrate that our method (Tweedieverse) outperforms existing differential expression analysis methods while controlling error rates at the desired levels. By re-analyzing public single-cell gene expression datasets, we demonstrate that our approach can yield biological discoveries that existing approaches do not reveal.
Tweedieverse scales to datasets with tens of thousands of genes measured on tens of thousands of cells. The open-source software implementation of our method (in the R programming language) is available at https://github.com/himelmallick/Tweedieverse.
Beyond scRNA-seq studies, Tweedieverse can be applied to other structurally similar data modalities, such as metagenomics, metatranscriptomics, and spatial transcriptomics, that share zero-inflated and semi-continuous data distributions. We conjecture that, owing to their flexibility, the proposed Tweedie models will retain their strong empirical and theoretical properties in these other settings.