RESEARCH ARTICLE
ECGpred: Correlation and Prediction of Gene Expression from Nucleotide Sequence
Gajendra Pal Singh Raghava*, 1, 2, Da Jeong Hwang1, Joon Hee Han1
Article Information
Identifiers and Pagination:
Year: 2008
Volume: 2
First Page: 64
Last Page: 71
Publisher ID: TOBIOIJ-2-64
DOI: 10.2174/1875036200802010064
Article History:
Received Date: 18/07/2008
Revision Received Date: 31/07/2008
Acceptance Date: 11/09/2008
Electronic publication date: 09/10/2008
Collection year: 2008
open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
Developing systems that predict gene expression from the large volume of available microarray data has become an unavoidable problem. In the present study, a support vector machine (SVM) based method was developed to predict the expression of a gene from its nucleotide sequence. The SVM was trained on microarray data for a set of genes, and the trained model was then used to predict the expression of other genes of the same organism under the same condition. SVM models developed using the nucleotide, dinucleotide, and trinucleotide composition of genes achieved correlation coefficients (r) of 0.25, 0.70, and 0.82, respectively, between predicted and experimentally determined gene expression. Besides overall trinucleotide composition, we also tried codon composition in each forward reading frame and achieved correlations of r = 0.86, 0.83, and 0.73 between predicted and actual expression using trinucleotide composition from the first, second, and third frames, respectively. The method was developed on 4807 genes of Saccharomyces cerevisiae obtained from Holstege et al. (1998) and evaluated using 5-fold cross-validation. A web server, ECGpred, has been developed to allow users to explore the relationship between expression and various components of genes, such as coding/non-coding regions and transcription factors (http://www.imtech.res.in/raghava/ecgpred/).
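The sequence features named in the abstract can be illustrated with a minimal sketch. It is not the ECGpred implementation: the function names (`kmer_composition`, `frame_codon_composition`, `pearson_r`) are hypothetical, the restriction to an A/C/G/T alphabet and the skipping of ambiguous bases are assumptions, and the SVM training and 5-fold cross-validation steps are left out. The sketch only shows how nucleotide (k = 1), dinucleotide (k = 2), trinucleotide (k = 3), and frame-specific codon composition vectors might be computed, and how a correlation coefficient r between predicted and measured expression could be obtained.

```python
from itertools import product
from math import sqrt


def kmer_composition(seq, k, step=1, offset=0):
    """Fractional frequency of every k-mer over the ACGT alphabet.

    Returns a list of 4**k values ordered lexicographically (A, C, G, T).
    Assumption: words containing other symbols (e.g. N) are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(offset, len(seq) - k + 1, step):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def frame_codon_composition(seq, frame):
    """Trinucleotide composition read in one forward frame (frame = 0, 1 or 2)."""
    return kmer_composition(seq, 3, step=3, offset=frame)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    gene = "ATGAAACGTTAG"  # toy sequence, not taken from the data set
    print(len(kmer_composition(gene, 1)))         # 4 features
    print(len(kmer_composition(gene, 2)))         # 16 features
    print(len(kmer_composition(gene, 3)))         # 64 features
    print(len(frame_codon_composition(gene, 0)))  # 64 features, first frame
```

Each gene then becomes a fixed-length vector (4, 16, or 64 values) that a regression model such as an SVM could take as input, with `pearson_r` applied to the predicted and measured expression values of held-out genes.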