RESEARCH ARTICLE
Yeast Gene Function Prediction from Different Data Sources: An Empirical Comparison
Ying Liu*
Article Information
Identifiers and Pagination:
Year: 2011
Volume: 5
First Page: 69
Last Page: 76
Publisher ID: TOBIOIJ-5-69
DOI: 10.2174/1875036201105010069
Article History:
Received Date: 07/12/2011
Revision Received Date: 26/04/2011
Acceptance Date: 08/05/2011
Electronic publication date: 09/06/2011
Collection year: 2011
open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
A variety of data sources have been used to infer gene function. Although combining heterogeneous data sets to infer gene function has been widely studied, there has been no empirical comparison of the relative effectiveness of different data types for gene function prediction. In this paper, we report a comparative study of yeast gene function prediction using microarray data, phylogenetic data, literature text data, and a combination of these three data sources. Our results showed that text data outperformed both microarray data and phylogenetic data in gene function prediction (p < 0.01), as measured by sensitivity, accuracy, and correlation coefficient. There was no significant difference between the results derived from microarray data and those derived from phylogenetic data (p > 0.05). The combined data led to lower prediction performance than text data alone. In addition, we showed that feature selection did not improve the prediction performance of support vector machines.
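For readers unfamiliar with the three evaluation measures named in the abstract, the following is a minimal sketch of how they could be computed from the counts of a binary confusion matrix. It assumes that "correlation coefficient" refers to the Matthews correlation coefficient, which the abstract does not state explicitly; the function name `binary_metrics` and its arguments are hypothetical and are not taken from the article.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, accuracy, correlation coefficient).

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives, and false negatives for one functional class.
    The correlation coefficient here is the Matthews correlation
    coefficient (an assumption; the abstract does not name the variant).
    """
    total = tp + fp + tn + fn

    # Sensitivity: fraction of genes truly in the class that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0

    # Accuracy: fraction of all genes that were classified correctly.
    accuracy = (tp + tn) / total if total else 0.0

    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return sensitivity, accuracy, mcc
```

Because functional classes are typically small relative to the whole genome, accuracy alone can look high for a classifier that predicts almost nothing; reporting sensitivity and a correlation coefficient alongside it guards against that.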