RESEARCH ARTICLE


ECGpred: Correlation and Prediction of Gene Expression from Nucleotide Sequence



Gajendra Pal Singh Raghava1,2,*, Da Jeong Hwang1, Joon Hee Han1
1 Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31 Hyo-Ja Dong, Pohang 790-784, Republic of Korea
2 Bioinformatics Centre, Institute of Microbial Technology, Sector 39A, Chandigarh-160036, India





© 2008 Singh Raghava et al.

This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to this author at the Bioinformatics Centre, Institute of Microbial Technology, Sector 39A, Chandigarh, India; Tel: +91-172-2690557; Fax: +91-172-2690632; E-mail: raghava@imtech.res.in


Abstract

Developing systems that predict gene expression from the large amount of available microarray data is an important challenge. In the present study, a support vector machine (SVM) based method has been developed to predict the expression of genes from their nucleotide sequence. In this method, an SVM was trained on microarray data for a set of genes, and the trained SVM was then used to predict the expression of other genes of the same organism under the same condition. SVM models were developed using the nucleotide, dinucleotide and trinucleotide composition of genes and achieved correlation coefficients (r) of 0.25, 0.70 and 0.82, respectively, between predicted and experimentally determined gene expression. Besides overall trinucleotide composition, we also used codon composition in each forward reading frame, achieving r = 0.86, 0.83 and 0.73 between predicted and actual expression using trinucleotide composition from the first, second and third frames, respectively. The method was developed on 4807 genes of Saccharomyces cerevisiae obtained from Holstege et al. (1998) and evaluated using 5-fold cross-validation. A web server, ECGpred, has been developed to allow users to explore the relationship between expression and various components of genes, such as coding/non-coding regions and transcription factors (http://www.imtech.res.in/raghava/ecgpred/).
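The abstract names four kinds of input features (nucleotide, dinucleotide, trinucleotide and frame-specific codon composition), scores predictions with the correlation coefficient r, and evaluates with 5-fold cross-validation. The Python sketch below illustrates those pieces under stated assumptions, not as ECGpred's actual implementation: composition is expressed as percentages, codons in frame f are read as non-overlapping triplets starting at position f, and genes are assigned to folds by index modulo 5. All function names are hypothetical, and the SVM training step itself is omitted.

```python
from itertools import product
from math import sqrt

NUCLEOTIDES = "ACGT"


def all_kmers(k):
    """All 4**k words over A, C, G, T in lexicographic order (4, 16 or 64 for k = 1, 2, 3)."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]


def _percent(counts, words):
    """Convert raw counts to percentages (assumed normalization; not stated in the source)."""
    total = sum(counts.values())
    return [100.0 * counts[w] / total if total else 0.0 for w in words]


def kmer_composition(seq, k):
    """Overlapping k-mer composition: k=1 nucleotide, k=2 dinucleotide, k=3 trinucleotide."""
    seq = seq.upper()
    words = all_kmers(k)
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other ambiguity codes
            counts[word] += 1
    return _percent(counts, words)


def frame_codon_composition(seq, frame):
    """Codon composition in forward frame 1, 2 or 3 (assumed: non-overlapping triplets)."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    seq = seq.upper()
    words = all_kmers(3)
    counts = dict.fromkeys(words, 0)
    # Start at offset frame-1 and step by 3; stop while a full codon still fits.
    for i in range(frame - 1, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    return _percent(counts, words)


def pearson_r(predicted, observed):
    """Pearson correlation between predicted and measured expression values."""
    n = len(predicted)
    mp, mo = sum(predicted) / n, sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_o)


def k_fold_splits(n_genes, k=5):
    """Yield (train, test) index lists; gene i goes to test fold i % k (assumed assignment)."""
    for fold in range(k):
        test = [i for i in range(n_genes) if i % k == fold]
        train = [i for i in range(n_genes) if i % k != fold]
        yield train, test
```

For example, `kmer_composition("ATGGCTAAAT", 1)` returns `[40.0, 10.0, 20.0, 30.0]` (A, C, G, T). In a full pipeline, the 4-, 16- or 64-dimensional vectors for the training folds would be fed to an SVM regressor (for instance scikit-learn's `SVR`, a stand-in choice rather than the authors' tool), and `pearson_r` would compare its predictions on each held-out fold with the measured expression.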

Keywords: Gene expression, Correlation, Nucleotide, Dinucleotide, Trinucleotide, Codon composition, Saccharomyces cerevisiae, Prediction, Microarray data, Support vector machine.