Protein-Protein Interaction Prediction using PCA and SVR-PHCS
Saeideh Mahmoudian, Abdulaziz Yousef, Nasrollah Moghadam Charkari*
Identifiers and Pagination:
Year: 2015
First Page: 1
Last Page: 12
Publisher ID: TOBIOIJ-9-1
Article History:
Received Date: 13/02/2014
Revision Received Date: 28/02/2014
Acceptance Date: 28/10/2014
Electronic publication date: 23/01/2015
Collection year: 2015
open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protein-Protein Interactions (PPIs) play a key role in many biological systems, so identifying PPIs is critical for understanding cellular processes. Many experimental techniques have been applied to identify PPIs, but the data extracted using these techniques are incomplete and noisy. Consequently, a number of computational methods, including machine learning classification techniques, have been developed to reduce the noise in the data and to predict new PPIs.
Because using regression methods to solve classification problems has produced good results in other applications, this paper applies a regression view to the PPI prediction classification problem. A new approach is proposed using Principal Component Analysis (PCA) and Support Vector Regression (SVR), with the SVR improved by a new Parallel Hierarchical Cube Search (PHCS) method. First, the PCA algorithm is applied to select an optimal subset of features, which reduces processing time and lessens the effect of noise. Then, the PPIs are predicted using SVR. To improve the performance of SVR, the new PHCS method is used to select appropriate values of the SVR parameters. The proposed method achieves a classification accuracy of 74.505% on the KUPS (The University of Kansas Proteomics Service) dataset, which outperforms the other methods.
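The abstract does not spell out how PHCS works, so the following is only a minimal sketch of one plausible reading: a coarse-to-fine search over a three-dimensional cube of SVR parameters, with the grid points of each level scored in parallel. Everything named here is an assumption made for illustration, not the authors' implementation: the functions `cube_grid` and `hierarchical_cube_search`, the choice of searching log10(C), log10(gamma) and log10(epsilon), the halving schedule, and the toy objective `toy_score`, which stands in for the cross-validated accuracy of an SVR trained on PCA-reduced features.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical optimum in (log10 C, log10 gamma, log10 epsilon) space.
# It exists only so the toy objective below has a known best point.
HYPOTHETICAL_OPTIMUM = (1.0, -2.0, -1.0)


def toy_score(point):
    """Stand-in objective (higher is better).

    In a real pipeline this would train an SVR with the given parameters
    on PCA-reduced features and return cross-validated accuracy.
    """
    return -sum((p - o) ** 2 for p, o in zip(point, HYPOTHETICAL_OPTIMUM))


def cube_grid(center, half_width, points_per_axis=3):
    """Return evenly spaced grid points inside a cube around `center`."""
    axes = []
    for c, h in zip(center, half_width):
        step = 2 * h / (points_per_axis - 1)
        axes.append([c - h + i * step for i in range(points_per_axis)])
    return list(itertools.product(*axes))


def hierarchical_cube_search(score, center, half_width, levels=4, workers=4):
    """Coarse-to-fine search: score a cube, recentre on the best point, shrink."""
    best_point = tuple(center)
    best_score = score(best_point)
    for _ in range(levels):
        grid = cube_grid(best_point, half_width)
        # Score all grid points of this level concurrently.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            scores = list(pool.map(score, grid))
        i = max(range(len(grid)), key=scores.__getitem__)
        if scores[i] > best_score:
            best_point, best_score = grid[i], scores[i]
        # Halve the cube so the next level searches a finer neighbourhood.
        half_width = [h / 2 for h in half_width]
    return best_point, best_score


best_point, best_score = hierarchical_cube_search(
    toy_score, center=(0.0, 0.0, 0.0), half_width=(4.0, 4.0, 4.0)
)
print("best (log10 C, log10 gamma, log10 epsilon):", best_point)
```

Each level evaluates only 27 points, so four levels cost about 108 evaluations, whereas a single flat grid at the final resolution over the same cube would need far more. With a real SVR objective, a process pool rather than a thread pool would likely be the better fit, since model training is CPU-bound.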