Mining the Proteome of Haemophilus ducreyi for Identification of Potential Drug Targets

Chancroid is a highly infectious sexually transmitted disease (STD) caused by the bacterium Haemophilus ducreyi and is prevalent in Africa, the United States and some parts of South Asia. Chancroid has been recognized as a cofactor for human immunodeficiency virus (HIV) transmission. There is therefore a need to develop an efficient drug against chancroid, which could also reduce HIV prevalence in populations where chancroid is a prime source of HIV infection. The availability of the complete proteome of H. ducreyi has enabled in silico analysis to identify potential vaccine candidates and drug targets. Our study revealed 1226 proteins in H. ducreyi to be nonhomologous to the human proteome. Screening these proteins against the Database of Essential Genes (DEG) identified 451 essential proteins. Analysis of these essential proteins using the KEGG Automatic Annotation Server (KAAS) revealed 40 proteins of H. ducreyi as potential drug targets, as they are involved in pathogen-specific metabolic pathways. Subcellular localization prediction for the 451 essential proteins revealed that 11 proteins lie on the outer membrane of the pathogen and could be potential vaccine candidates. Functional family prediction for the 50 putative uncharacterized essential proteins of H. ducreyi by the SVM-Prot web server classified 3 of the 50 as transmembrane proteins, which may be potential drug targets. Identification of potential inhibitors against these targets through virtual screening may result in the detection of novel lead compounds effective against H. ducreyi to combat


INTRODUCTION
Chancroid, or soft chancre (ulcus molle), is a sexually transmitted disease caused by the facultative anaerobic gram-negative coccobacillus H. ducreyi. This disease is common in many tropical and subtropical countries of Africa [1][2][3][4] and Southeast Asia [5][6][7] and has been associated with isolated outbreaks of genital ulcer disease in both North America [8] and Europe [9]. The disease is characterized by painful ulcers of the genitalia, often found in association with painful regional lymphadenopathy. The significance of the disease increased with the finding that chancroid may be an important cofactor in the heterosexual transmission of human immunodeficiency virus (HIV) [10]. H. ducreyi facilitates HIV transmission by providing an accessible portal of entry, promoting viral shedding, and recruiting macrophages and CD4 cells to the skin [11][12][13][14][15]. Detection and treatment of chancroid are urgently needed because the disease increases the risk of transmission of HIV, the cause of AIDS, a life-threatening disease.
The Centers for Disease Control and Prevention (CDC) recommends azithromycin, ceftriaxone and erythromycin for the treatment of chancroid [16]. However, single-dose treatment with these drugs was found to be unsuccessful in more than 20% of chancroid therapies [17][18][19]. Furthermore, de novo mutations and horizontal gene transfer in H. ducreyi may lead to the development of drug resistance genes [17,20]. There is thus a growing need to identify novel therapeutic drug targets and drugs against H. ducreyi to combat chancroid.
Large amounts of genomic and proteomic data for different organisms are available in the public domain, and ongoing genome projects are adding to these data at a brisk pace. The completion of the human genome project has revolutionized drug discovery against life-threatening human pathogens [21]. To date, the complete genome sequences of more than one thousand microbes have been determined, and another two thousand microbial genome projects are in progress [22]. The genomic and proteomic sequence information of pathogens can aid the detection and characterization of novel therapeutic targets and vaccine candidates.
Subtractive genomic analysis [23] is a widely used in silico method that predicts highly conserved genes that are essential for pathogenic microorganisms and are involved in their replication and survival. These genes are also important components of various metabolic pathways and mechanisms in the pathogen but have no similarity to host genes. Inhibiting these essential genes, which are crucial to sustaining cellular life, will be fatal for the pathogenic microorganism. This approach significantly reduces the time required for detection and characterization of potential drug targets.
The genome subtraction approach has been effectively employed to detect novel drug targets in Pseudomonas aeruginosa [23]. The current study on H. ducreyi is based on a proteome subtraction approach. Differential pathway analysis, subcellular localization prediction and functional family classification of putative uncharacterized essential proteins were performed on the complete proteome of H. ducreyi to find potential vaccine candidates and therapeutic drug targets for the prevention of chancroid.

Retrieval of Proteomes of Host and Pathogen
The complete proteome of H. ducreyi was retrieved from SwissProt [24] and the complete Homo sapiens proteome was downloaded from NCBI [25]. The prokaryotic essential proteins were retrieved from the Database of Essential Genes (DEG) [26] (http://tubic.tju.edu.cn/deg) using an in-house PERL script.

Identification of Essential Proteins in H. ducreyi
The complete proteome of H. ducreyi was clustered at a 60% threshold using CD-HIT [27] to identify paralogs or duplicate proteins within the proteome. The paralogs were excluded and the remaining set of proteins was subjected to BlastP against the complete proteome of Homo sapiens with an expectation value (E-value) cut-off of 10^-4. The resulting dataset had no significant similarity to the proteome of Homo sapiens. BlastP analysis of these nonhomologous protein sequences of H. ducreyi was then performed against DEG with an E-value cut-off of 10^-10. A minimum bit-score cut-off of 100 was used to retain only essential proteins, nonhomologous to H. sapiens, from the H. ducreyi proteome. An in-house PERL script was used to parse the BlastP results.
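The two BlastP filters described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the in-house PERL script used in the study: it assumes the BLAST results are available in the standard 12-column tabular format (`-outfmt 6`), and the function names, protein identifiers and sample hits are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

# Cut-offs taken from the text above.
HUMAN_EVALUE_CUTOFF = 1e-4   # BlastP against the H. sapiens proteome
DEG_EVALUE_CUTOFF = 1e-10    # BlastP against DEG
DEG_MIN_BITSCORE = 100.0     # minimum bit score for a DEG hit


def parse_tabular_hits(lines: Iterable[str]) -> Iterator[Tuple[str, float, float]]:
    """Yield (query_id, evalue, bitscore) from BLAST tabular (-outfmt 6) lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Columns 11 and 12 of the standard format hold the E-value and bit score.
        yield fields[0], float(fields[10]), float(fields[11])


def queries_with_hit(lines, max_evalue, min_bitscore=0.0) -> Set[str]:
    """Return query IDs that have at least one hit passing both cut-offs."""
    return {
        query
        for query, evalue, bitscore in parse_tabular_hits(lines)
        if evalue <= max_evalue and bitscore >= min_bitscore
    }


def essential_nonhomologs(all_ids, human_hits, deg_hits) -> Set[str]:
    """Proteins with no significant human hit but a significant DEG hit."""
    homologous = queries_with_hit(human_hits, HUMAN_EVALUE_CUTOFF)
    nonhomologous = set(all_ids) - homologous
    essential = queries_with_hit(deg_hits, DEG_EVALUE_CUTOFF, DEG_MIN_BITSCORE)
    return nonhomologous & essential


# Hypothetical toy data: three pathogen proteins with made-up hits.
proteins = ["HD_0001", "HD_0002", "HD_0003"]
human_blast = [
    "HD_0001\thumanA\t45.0\t200\t110\t2\t1\t200\t5\t204\t1e-30\t120.0",
]
deg_blast = [
    "HD_0002\tDEG_x\t60.0\t300\t120\t1\t1\t300\t1\t300\t1e-50\t250.0",
    "HD_0003\tDEG_y\t30.0\t80\t56\t3\t1\t80\t10\t89\t1e-11\t60.0",
]
targets = essential_nonhomologs(proteins, human_blast, deg_blast)
print(sorted(targets))
```

In this toy input, the first protein is discarded because of its human hit and the third because its DEG hit falls below the bit-score cut-off, so only the second protein is retained.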

Metabolic Pathway Analysis
Metabolic pathway analysis of the essential proteins of H. ducreyi was performed with the KEGG Automatic Annotation Server (KAAS) [28]. The metabolic pathways of Homo sapiens and H. ducreyi were compared to identify essential proteins involved only in pathogen-specific metabolic pathways.
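The host-pathogen pathway comparison amounts to a set difference. The sketch below is one possible reading of that step, assuming each essential protein has already been annotated with a set of pathway labels; the function names, protein identifiers and pathway labels are hypothetical placeholders, not KAAS output.

```python
from typing import Dict, Set


def unique_pathways(pathogen: Set[str], host: Set[str]) -> Set[str]:
    """Pathways annotated in the pathogen but absent from the host."""
    return pathogen - host


def proteins_in_unique_pathways_only(
    protein_pathways: Dict[str, Set[str]], host: Set[str]
) -> Dict[str, Set[str]]:
    """Keep proteins whose annotated pathways are all absent from the host."""
    return {
        protein: pathways
        for protein, pathways in protein_pathways.items()
        if pathways and pathways.isdisjoint(host)
    }


# Hypothetical annotations; the labels are placeholders, not KEGG identifiers.
host_pathways = {"pw_shared_1", "pw_shared_2"}
annotations = {
    "HD_0002": {"pw_unique_1"},
    "HD_0104": {"pw_unique_1", "pw_shared_1"},
    "HD_0377": {"pw_shared_2"},
}
pathogen_pathways = set().union(*annotations.values())
distinct = unique_pathways(pathogen_pathways, host_pathways)
selected = proteins_in_unique_pathways_only(annotations, host_pathways)
print(sorted(distinct), sorted(selected))
```

A protein that also participates in a pathway shared with the host is excluded here, which is the strict interpretation of "pathogen-specific pathways only"; a looser criterion would keep any protein with at least one unique pathway.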

Subcellular Localization Prediction
Subcellular localization of the essential protein sequences of H. ducreyi was predicted with the Proteome Analyst Specialized Subcellular Localization Server v2.5 (PA-SUB) [29] to identify surface and membrane-associated proteins that could be probable vaccine candidates.
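Once localization predictions are available, selecting vaccine candidates is a simple filter. The following sketch assumes the predictions have already been reduced to a mapping from protein identifier to a localization label; the label spelling, the identifiers and the function name are hypothetical, and real PA-SUB output would need its own parser.

```python
from typing import Dict, List


def outer_membrane_candidates(predictions: Dict[str, str]) -> List[str]:
    """Return IDs whose predicted localization is the outer membrane."""
    return sorted(
        protein
        for protein, location in predictions.items()
        if location.strip().lower() == "outer membrane"
    )


# Hypothetical predictions for three essential proteins.
predicted = {
    "HD_0002": "cytoplasm",
    "HD_0104": "Outer membrane",
    "HD_0377": "inner membrane",
}
candidates = outer_membrane_candidates(predicted)
print(candidates)
```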

Protein Functional Family Classification
Functional family classification of the putative uncharacterized essential proteins was performed using the SVMProt web server (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi) [30]. SVMProt uses a Support Vector Machine (SVM) to classify a protein into a functional family from its primary sequence information.
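To illustrate how a primary sequence can be turned into a fixed-length numeric input for an SVM, the sketch below computes the amino acid composition of a sequence. This encoding is an assumption chosen for simplicity and is only a stand-in: the descriptors actually used by SVMProt are not detailed here, and the function name and toy peptide are hypothetical.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_vector(sequence: str) -> List[float]:
    """Return the fraction of each of the 20 standard residues in a sequence."""
    sequence = sequence.upper()
    counts = Counter(residue for residue in sequence if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy peptide, not an H. ducreyi sequence.
features = composition_vector("MKKLLA")
print(len(features), round(sum(features), 6))
```

Every protein, whatever its length, maps to a vector of 20 fractions, which is the kind of fixed-size input a classifier such as an SVM requires.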

RESULTS & DISCUSSION
The results obtained through the subtractive proteomic approach, metabolic pathway analysis and subcellular localization prediction are listed in Table 1. The purpose of this analysis was to detect the essential proteins of H. ducreyi. The filtered essential proteins of the pathogen, nonhomologous to human proteins, represent potential drug targets.
The complete proteome of H. ducreyi contains 1694 proteins. Of these, 1660 proteins were retained as non-redundant by the CD-HIT program at the 60% threshold, and 451 proteins were identified as essential for the pathogen. These 451 proteins were further analyzed, and their subcellular localization was predicted using the PA-SUB server to locate outer membrane proteins that could be probable vaccine candidates. Eleven outer membrane proteins were identified; they are listed in Supplementary Table 1. Membrane proteins have vital roles in cellular communication, signal transduction, and the transport of ions, metabolites and other molecules. They represent a large pool of potential therapeutic drug targets because of their involvement in major biological processes in the cell.
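The successive reductions reported in this study can be tabulated as a fraction of the starting proteome. The short sketch below uses only counts stated in this paper (including the 1226 nonhomologous proteins given in the abstract); the stage labels and variable names are illustrative.

```python
# Protein counts at each filtering stage, as reported in this paper.
funnel = [
    ("complete proteome", 1694),
    ("non-redundant (CD-HIT, 60%)", 1660),
    ("nonhomologous to human", 1226),
    ("essential (DEG)", 451),
    ("outer membrane", 11),
]

total = funnel[0][1]
for label, count in funnel:
    # Percentage of the complete proteome remaining at this stage.
    print(f"{label:30s} {count:5d} {100 * count / total:6.1f}%")

# Proteins removed as paralogs or duplicates by the clustering step.
removed_as_paralogs = funnel[0][1] - funnel[1][1]
print("removed by CD-HIT:", removed_as_paralogs)
```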
Protein functional family prediction provides important information regarding structure, activity and metabolic roles [31]. The essential proteins identified in the pathogen include 50 putative uncharacterized proteins. Protein family classification allows a probable function to be assigned to an uncharacterized protein. These 50 putative uncharacterized proteins were characterized and classified by the SVM-Prot web server. Among the 50 proteins, 1 is classified as a DNA repair protein, 5 as DNA-binding proteins, 6 as transferases, 1 as an ATP-binding protein, 8 as lipid-binding proteins, 3 as transmembrane proteins, 2 as hydrolases, 1 as a zinc-binding protein, 2 as proteins with oxidoreductive properties, 1 as a metal-binding protein and 1 as an electrochemical potential-driven transporter (Supplementary Table 2). The transmembrane and transporter proteins predicted by the SVM-Prot web server [31] may be considered potential drug targets.
Information about disease-specific target proteins and metabolites involved in different metabolic pathways provides detailed insight for discovering novel therapeutic targets and helps in understanding their interactions with other molecules in performing specific tasks [32]. Metabolic pathway analysis of the essential proteins of H. ducreyi was performed with the KEGG Automatic Annotation Server (KAAS). Comparative analysis of the metabolic pathways of the host and the pathogen revealed 11 pathways that are distinctive to H. ducreyi, and 40 proteins were found to be present exclusively in these pathogen-specific metabolic pathways. The results are summarized in Supplementary Table 3.
Fructose 1,6-bisphosphate (FBP) aldolase (E.C. 4.1.2.13) plays an important role in catalyzing reactions of the Calvin cycle, glycolysis and gluconeogenesis. Based on the reaction mechanism, FBP aldolases are divided into two groups: Class I aldolases form a Schiff-base intermediate with the substrate, whereas Class II aldolases use a divalent metal ion to stabilize the carbanion intermediate. Plants and animals lack Class II aldolases, which can therefore be considered feasible drug targets [32]. FBP aldolases of H. ducreyi are found to be involved in many metabolic pathways, viz. glycolysis, gluconeogenesis, the pentose phosphate pathway, and fructose and mannose metabolism. Molecular modeling of derivatives in the enzymes' active site has also been used to design a new generation of inhibitors [33].
Acetyl-coenzyme A carboxylases (ACCs) are responsible for fatty acid metabolism in most living organisms. They have therefore been regarded as possible targets for drug discovery against a variety of human diseases, including diabetes, obesity, cancer and microbial infections [34]. ACC can also be considered a potential therapeutic target against chancroid, as it is found to be essential in H. ducreyi but not in humans.
The Phosphoenolpyruvate Carbohydrate Phosphotransferase system (PTS) is found exclusively in bacteria and plays a major role in catalyzing the transport and phosphorylation of numerous monosaccharides, disaccharides, amino sugars, polyols and other sugar derivatives [35]. The H. ducreyi PTS contains five proteins, of which the glucose-specific PTS system component and the mannose-specific phosphotransferase IIAB component show activity in multiple metabolic pathways, such as glycolysis, gluconeogenesis, starch and sucrose metabolism, amino sugar and nucleotide sugar metabolism, and fructose and mannose metabolism. These two proteins can therefore be analyzed further and considered as potential drug targets.
Moving from in silico identification of putative drug targets to their validation as potential drug targets is an intricate challenge [36]. Promising technologies for identification and validation of protein targets include two-dimensional gel electrophoresis, affinity purification, mass spectrometry, protein arrays, isotope-encoding, two-hybrid systems, information technology and activity-based assays [37]. These proteomic techniques can be applied in the drug discovery pipeline for identification and validation of potential drug targets, drug-protein interactions and drug specificity.

CONCLUSION
Large-scale genome sequencing projects have increased the availability of completely sequenced genomic and proteomic data in the public domain. Screening and analysis of these large biological sequence datasets provide new opportunities to understand and combat both infectious and genetic diseases in humans. There is a growing need for new drugs and vaccines to treat and prevent emerging and neglected infectious diseases. Subtractive genomics is a powerful tool for exploring new therapeutic targets. The current study, based on a subtractive proteomics approach, helped identify and characterize potential essential proteins that could be targets for efficient drug design against H. ducreyi to combat chancroid. Furthermore, molecular modeling of the drug targets may reveal the active sites best suited to targeting by simulation-based drug design. Screening these potential targets against DrugBank might be useful in the discovery of potential therapeutic compounds against H. ducreyi.

Table 1. Summary of Subtractive Proteomic and Metabolic Pathway Analysis of H. ducreyi Proteins

Summary of Analysis                                  Result
Pathways unique to H. ducreyi                        11
Outer membrane essential proteins of H. ducreyi