Yeast Gene Function Prediction from Different Data Sources: An Empirical Comparison

Ying Liu*
The Open Bioinformatics Journal, 09 June 2011. Research Article. DOI: 10.2174/1875036201105010069


Different data sources have been used to learn gene function. Although combining heterogeneous data sets to infer gene function has been widely studied, no empirical comparison has determined the relative effectiveness of different types of data for gene function prediction. In this paper, we report a comparative study of yeast gene function prediction using microarray data, phylogenetic data, literature text data, and a combination of these three data sources. Text data outperformed both microarray data and phylogenetic data in gene function prediction (p<0.01), as measured by sensitivity, accuracy, and correlation coefficient. There was no significant difference between the results derived from microarray data and phylogenetic data (p>0.05). The combined data yielded lower prediction performance than text data alone. In addition, feature selection did not improve the prediction performance of support vector machines.
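
For readers unfamiliar with the three evaluation measures named above, the following is a minimal sketch of how they can be computed from the confusion-matrix counts of a binary classifier. The abstract does not state which correlation coefficient was used; the sketch assumes the Matthews correlation coefficient, which is common in this literature. The function name `binary_metrics` and the counts are hypothetical and chosen for illustration only; they are not results from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, accuracy, mcc) from confusion-matrix counts.

    mcc is the Matthews correlation coefficient (an assumption; the
    abstract only says "correlation coefficient").
    """
    total = tp + fp + tn + fn
    # Sensitivity: fraction of true positives that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: fraction of all predictions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, accuracy, mcc


# Hypothetical counts for one functional class (illustration only).
sens, acc, mcc = binary_metrics(tp=40, fp=10, tn=140, fn=10)
print(f"sensitivity={sens:.3f} accuracy={acc:.3f} mcc={mcc:.3f}")
```

Because functional classes are typically small relative to the whole genome, accuracy alone can look high even for a classifier that misses most positives, which is one reason to report sensitivity and a correlation coefficient alongside it.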

Keywords: Gene Prediction, Microarray data, Phylogenetic Data, Literature Text Data, Comparative Study.