RESEARCH ARTICLE


Statistical Methods for Overdispersion in mRNA-Seq Count Data



Hui Zhang*, Stanley B. Pounds, Li Tang
Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA




Creative Commons License
© 2013 Tang et al.

open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to this author at the Department of Biostatistics, St. Jude Children's Research Hospital, 262 Danny Thomas Place, MS 768, Memphis, TN 38105, USA; Tel: 901-595-6736; Fax: 901-595-8843; E-mail: hui.zhang@stjude.org


Abstract

Recent developments in next-generation sequencing (NGS) technologies have opened the door to ultra-high-throughput mRNA sequencing (mRNA-seq) of the whole transcriptome. mRNA-seq enables researchers to search comprehensively for the underlying biological determinants of disease and, ultimately, to discover novel preventive and therapeutic solutions. Unfortunately, because mRNA-seq data are complex, data generation has outpaced current analytical capacity, slowing research in this area. Thus, there is an urgent need for novel statistical methodology that addresses the problems specific to mRNA-seq data. This review addresses one common challenge: overdispersion in mRNA-seq count data. We review current methods for modeling overdispersion, including the negative binomial model, the quasi-likelihood Poisson method, and the two-stage adaptive method; introduce the related statistical theory; and discuss the application of these methods to mRNA-seq count data.

Keywords: Count response, mRNA-seq, negative binomial theory, overdispersion, Poisson, quasi-likelihood.