RESEARCH ARTICLE
Statistical Methods for Overdispersion in mRNA-Seq Count Data
Hui Zhang*, Stanley B. Pounds, Li Tang
Article Information
Identifiers and Pagination:
Year: 2013
Volume: 7
Issue: Suppl-1, M3
First Page: 34
Last Page: 40
Publisher ID: TOBIOIJ-7-34
DOI: 10.2174/1875036201307010034
Article History:
Received Date: 06/08/2013
Revision Received Date: 06/09/2013
Acceptance Date: 15/09/2013
Electronic publication date: 13/12/2013
Collection year: 2013
open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
Recent developments in Next-Generation Sequencing (NGS) technologies have opened the door to ultra-high-throughput sequencing of mRNA (mRNA-seq) across the whole transcriptome. mRNA-seq has enabled researchers to search comprehensively for the underlying biological determinants of diseases and, ultimately, to discover novel preventive and therapeutic solutions. Unfortunately, given the complexity of mRNA-seq data, data generation has outpaced current analytical capacity, slowing research in this area. Thus, there is an urgent need to develop novel statistical methodology that addresses problems specific to mRNA-seq data. This review addresses one common challenge: overdispersion in mRNA-seq count data. We review current methods for modeling overdispersion, such as the negative binomial model, the quasi-likelihood Poisson method, and the two-stage adaptive method; introduce the related statistical theory; and discuss their applications to mRNA-seq count data.
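To make the notion of overdispersion concrete, the following is a minimal sketch rather than any method taken from the review itself. Under a Poisson model the variance of a count equals its mean, so a sample variance well above the sample mean suggests overdispersion. The sketch assumes two common variance forms: Var(Y) = phi * mu for the quasi-likelihood Poisson approach and Var(Y) = mu + phi * mu^2 for the negative binomial. The function name `dispersion_summary` and the read counts are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def dispersion_summary(counts):
    """Summarize overdispersion in a list of counts for a single gene.

    Assumed variance forms (illustrative, not taken from the article):
      quasi-Poisson:      Var(Y) = phi * mu
      negative binomial:  Var(Y) = mu + phi * mu**2
    """
    m = mean(counts)
    v = variance(counts)  # unbiased sample variance (n - 1 denominator)
    # Variance-to-mean ratio: equals the Pearson-based quasi-Poisson
    # dispersion estimate when the model has an intercept only.
    ratio = v / m
    # Method-of-moments estimate of the negative binomial dispersion,
    # truncated at zero when the data are not overdispersed.
    nb_phi = max((v - m) / m**2, 0.0)
    return {"mean": m, "variance": v, "ratio": ratio, "nb_phi": nb_phi}


# Hypothetical read counts for one gene across six replicate samples.
counts = [12, 30, 7, 45, 19, 25]
summary = dispersion_summary(counts)
print(summary)
```

A ratio close to 1 would be consistent with Poisson variation; here the sample variance is several times the mean, which is the kind of pattern the methods reviewed in this article are designed to model. Moment-based estimates like these are unstable with few replicates, so they serve only as a rough diagnostic.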