A Cloud Computing System to Quickly Implement New Microarray Data Pre-processing Methods

Dajie Luo1, #, Prithish Banerjee1, #, E. James Harner1, James A. Mobley2, Dongquan Chen3, 4, *
1 Department of Statistics, West Virginia University, Morgantown, WV 26505, USA
2 Department of Surgery, University of Alabama at Birmingham (UAB), USA
3 Biostatistics and Bioinformatics Shared Facility, Comprehensive Cancer Center and
4 Division of Preventive Medicine, UAB, Birmingham, AL 35294, USA

© 2012 Luo et al.

open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to this author at the Division of Preventive Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA; Tel: (205) 975-7131; Fax: (205) 934-4262; E-mail:
# These authors contributed equally to this work.



Pre-processing of raw microarray data, including normalization, is crucial to microarray data analysis. Building newly developed algorithms into commercial software or locally developed systems takes time and effort. Although most new algorithms are released as shareable R packages, many biologists find it difficult to apply them as soon as they become available. Currently, we rely on statisticians and experienced programmers to develop and implement code that accesses those R packages. A robust procedure is therefore needed to implement pre-processing methods quickly as they appear. The emerging cloud computing concept points to a new way of providing an easily accessible service to biologists without requiring any programming knowledge in R.


Building on our earlier Java-based software tool, JavaStat, we developed a prototype Internet-based application for uploading data and carrying out pre-processing tasks, including normalization, statistical analyses, and plots. More importantly, R packages, e.g., those for newly developed normalization methods and the GC robust multi-array average (GC-RMA) algorithm for exon arrays, can be incorporated into the system easily, with limited input from a biologist or a programmer. The data are stored in the cloud and the R code runs on the server.
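The idea of adding a new R-based method without changing the client can be illustrated with a minimal sketch. It is not the JavaStat implementation: the registry class, the script file names, and the directory paths are all hypothetical, and it assumes only that the server runs R scripts through the standard `Rscript` command-line tool. The sketch builds the command but does not execute it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PreprocessingModule:
    """One server-side pre-processing method (hypothetical layout)."""
    name: str         # label offered to the user, e.g. "gcrma"
    script: str       # R script the server would run (assumed file name)
    description: str  # short text shown in the web interface


class ModuleRegistry:
    """Maps method names to R scripts, so adding a method is one registration."""

    def __init__(self) -> None:
        self._modules: dict[str, PreprocessingModule] = {}

    def register(self, module: PreprocessingModule) -> None:
        if module.name in self._modules:
            raise ValueError(f"module already registered: {module.name}")
        self._modules[module.name] = module

    def available(self) -> list[str]:
        # Sorted names, e.g. for a drop-down list in the web client.
        return sorted(self._modules)

    def build_command(self, name: str, input_dir: str, output_file: str) -> list[str]:
        # Build (but do not run) the Rscript call for the chosen method.
        if name not in self._modules:
            raise KeyError(f"unknown pre-processing method: {name}")
        module = self._modules[name]
        return ["Rscript", module.script, input_dir, output_file]


# Hypothetical registrations; the script names are assumptions.
registry = ModuleRegistry()
registry.register(PreprocessingModule("rma", "rma_normalize.R", "RMA normalization"))
registry.register(PreprocessingModule("gcrma", "gcrma_normalize.R", "GC-RMA normalization"))

command = registry.build_command("gcrma", "uploads/run1", "results/run1.csv")
print(command)
```

In a design like this, supporting a newly published method would amount to placing one R script on the server and adding one `register` call, while the biologist only selects a method name in the browser.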


The emerging cloud computing concept offers a new way to provide an easily accessible and up-to-date service to biologists, as demonstrated by our JavaStat system, which incorporates new pre-processing packages as they appear. Users can access the application, including any newly incorporated module, through the Web. We expect this and similar systems to greatly decrease turnaround time and improve the accessibility of newly developed R packages for pre-processing algorithms.

Keywords: Microarray, normalization, software, Java-based.