The Development and Progress in Machine Learning for Protein Subcellular Localization Prediction

Le He¹, Xiyu Liu²,*
1 Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
2 Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA

© 2022 He and Liu.

open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to this author at the Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; E-mail:


Protein subcellular localization prediction is a novel and promising research area whose goal is to determine where a protein resides inside the cell, for example in the nucleus, in the cytoplasm, or on the cell membrane. With the rapid development of next-generation sequencing technology, new protein sequences are being discovered continuously, and traditional wet-lab experimental methods alone can no longer keep pace with determining the subcellular localization of these new proteins. There is therefore an urgent need for high-throughput computational methods that predict protein subcellular localization quickly and accurately. This review summarizes the development of prediction methods for protein subcellular localization over the past decades, explains how various machine learning methods have been applied in this field, and compares the properties and performance of several well-known predictors. The review is organized around three main types of methods: sequence-based methods, knowledge-based methods, and fusion methods, with particular attention to gene ontology (GO)-based methods and the PLoc series of methods. Finally, this review discusses future directions for protein subcellular localization prediction.
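To make the three families of methods concrete, the following is a minimal Python sketch under stated assumptions, not the feature representation of any particular predictor reviewed here. It assumes two of the simplest textbook features: amino acid composition as a sequence-based feature, and a binary indicator vector over a small GO-term vocabulary as a knowledge-based feature, fused by plain concatenation. The function names and the three-term vocabulary are illustrative only; published GO-based predictors use considerably more elaborate schemes.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Sequence-based feature: fraction of each standard residue in the sequence.

    Non-standard letters (e.g. X, B, Z) are ignored in this sketch.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def go_term_vector(annotations: set[str], vocabulary: list[str]) -> list[float]:
    """Knowledge-based feature (hypothetical): 1.0 if the protein carries a GO term, else 0.0."""
    return [1.0 if term in annotations else 0.0 for term in vocabulary]


def fused_features(sequence: str, annotations: set[str], vocabulary: list[str]) -> list[float]:
    """Fusion (simplest form): concatenate the sequence and knowledge feature vectors."""
    return amino_acid_composition(sequence) + go_term_vector(annotations, vocabulary)


if __name__ == "__main__":
    # GO cellular-component terms: nucleus, cytoplasm, plasma membrane.
    vocab = ["GO:0005634", "GO:0005737", "GO:0005886"]
    features = fused_features("MKKLLA", {"GO:0005634"}, vocab)
    print(len(features))  # 20 composition values + 3 GO indicators
```

Plain concatenation is only the most basic way to fuse the two sources; the resulting fixed-length vector can then be passed to any standard classifier, such as a support vector machine.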

Keywords: Protein subcellular localization, Machine learning, Gene ontology, Deep learning, mGOASVM, PLoc-Deep-mHum.