RESEARCH ARTICLE


Yeast Gene Function Prediction from Different Data Sources: An Empirical Comparison



Ying Liu*
Department of Mathematics and Information Sciences, University of North Texas at Dallas, 7300 University Hills Blvd, Dallas, TX 75241, USA




Creative Commons License
© 2011 Ying Liu

open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to this author at the Department of Mathematics and Information Sciences, University of North Texas at Dallas, 7300 University Hills Blvd, Dallas, TX 75241, USA; Tel: 972-338-1573; Fax: 972-338-1911; E-mail: ying.liu@unt.edu


Abstract

Various data sources have been used to infer gene function. Although combining heterogeneous data sets to infer gene function has been widely studied, no empirical comparison has determined the relative effectiveness of different data types for gene function prediction. In this paper, we report a comparative study of yeast gene function prediction using different data sources: microarray data, phylogenetic data, literature text data, and a combination of the three. Our results showed that text data outperformed both microarray data and phylogenetic data in gene function prediction (p < 0.01), as measured by sensitivity, accuracy, and correlation coefficient. There was no significant difference between the results derived from microarray data and those derived from phylogenetic data (p > 0.05). Combining the three data sources decreased prediction performance relative to text data alone. In addition, we showed that feature selection did not improve the prediction performance of support vector machines.

Keywords: Gene prediction, Microarray data, Phylogenetic data, Literature text data, Comparative study.