RNA-seq has been an increasingly popular high-throughput platform to identify differentially expressed (DE) genes, which is much more reproducible and accurate than the previous microarray technology. Yet, a number of statistical issues remain to be resolved in data analysis, largely due to the high-throughput data volume and over-dispersion of read counts. These problems become more challenging for those biologists who use RNA-seq to measure genome-wide expression profiles in different combinations of sampling resources (species or genotypes) or treatments. In this paper, the author first reviews the statistical methods available for detecting DE genes, which have implemented negative binomial (NB) models and/or quasi-likelihood (QL) approaches to account for the over-dispersion problem in RNA-seq samples. The author then studies how to carry out the DE test in the context of phylogeny, i.e., RNA-seq samples are from a range of species as phylogenetic replicates. The author proposes a computational framework to solve this phylo-DE problem: While an NB model is used to account for data over-dispersion within biological replicates, over-dispersion among phylogenetic replicates is taken into account by QL, plus some special treatments for phylogenetic bias. This work helps to design cost-effective RNA-seq experiments in the field of biodiversity or phenotype plasticity that may involve hundreds of species under a phylogenetic framework.
© The Author 2015. Published by Oxford University Press. For Permissions, please email: email@example.com.