REVIEW ARTICLE


The Development and Progress in Machine Learning for Protein Subcellular Localization Prediction



Le He¹, Xiyu Liu²,*
¹ Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
² Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA


Creative Commons License
© 2022 He and Liu.

open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to this author at the Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; E-mail: xiyuliu@usc.edu


Abstract

Protein subcellular localization prediction is a novel and promising research area concerned with identifying where proteins reside within the cell, for example in the nucleus, in the cytoplasm, or on the cell membrane. With the rapid development of next-generation sequencing technology, new protein sequences are being discovered at an ever-increasing rate. Traditional wet-lab experimental methods alone are no longer sufficient to determine the subcellular localization of these new proteins. High-throughput computational methods that predict protein subcellular localization quickly and accurately are therefore urgently needed. This review summarizes the development of prediction methods for protein subcellular localization over the past decades, describes how various machine learning methods have been applied in this field, and compares the properties and performance of well-known predictors. The review is organized around three main types of methods: sequence-based methods, knowledge-based methods, and fusion methods. Special attention is given to gene ontology (GO)-based methods and the PLoc series of methods. Finally, the review discusses future directions for protein subcellular localization prediction.

Keywords: Protein subcellular localization, Machine learning, Gene ontology, Deep learning, mGOASVM, PLoc-Deep-mHum.