Yeast Gene Function Prediction from Different Data Sources: An Empirical Comparison
Abstract
Many data sources have been used to infer gene function. Although combining heterogeneous data sets to infer gene function has been widely studied, no empirical comparison has established the relative effectiveness of different data types for gene function prediction. In this paper, we report a comparative study of yeast gene function prediction using microarray data, phylogenetic data, literature text data, and a combination of all three. Text data outperformed both microarray data and phylogenetic data (p < 0.01) as measured by sensitivity, accuracy, and correlation coefficient. There was no significant difference between the results derived from microarray data and phylogenetic data (p > 0.05). The combined data yielded lower prediction performance than text data alone. In addition, we showed that feature selection did not improve the prediction performance of support vector machines.
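For readers unfamiliar with the three evaluation measures named in the abstract, the following minimal sketch shows how they might be computed for a single binary function class (gene belongs to the class or not). It is illustrative only: the function name `binary_metrics` and the toy labels are hypothetical, and the sketch assumes that "correlation coefficient" refers to the Matthews correlation coefficient, which the abstract does not state.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute sensitivity, accuracy, and Matthews correlation coefficient.

    y_true and y_pred are equal-length sequences of 0/1 labels, where 1 means
    the gene is annotated with (or predicted to have) the function class.
    Assumption: the correlation coefficient is taken to be the MCC.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Tally the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Sensitivity: fraction of true positives that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: fraction of all predictions that were correct.
    accuracy = (tp + tn) / len(y_true)
    # MCC: defined as 0 here when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"sensitivity": sensitivity, "accuracy": accuracy, "mcc": mcc}


# Toy labels for eight hypothetical genes (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting a correlation coefficient alongside accuracy is useful when positive examples are rare, as is typical for gene function classes, because accuracy alone can look high for a classifier that predicts almost everything as negative.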