RESEARCH ARTICLE


TULIP Software and Web Server: Automatic Classification of Protein Sequences Based on Pairwise Comparisons and Z-Value Statistics



Grando Delphine1, Ortet Philippe2, Joubert Fourie3, Maréchal Eric1,*, Bastien Olivier1,*
1 UMR 5168 CNRS-CEA-INRA-Université Joseph Fourier; Institut de Recherches en Technologies et Sciences pour le Vivant, CEA Grenoble, 17 rue des Martyrs, 38054, Grenoble Cedex 09, France
2 UMR 6191 CNRS-CEA-Université Aix-Marseille II, Institut de Biologie Environnementale et Biotechnologies, CEA Cadarache, 13108 Saint Paul-lez-Durance, France
3 Bioinformatics and Computational Biology Unit, Department of Biochemistry, University of Pretoria, 0002, Pretoria, South Africa





© 2009 Delphine et al.

open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to this author at the UMR 5168 CNRS-CEA-INRA-Université Joseph Fourier; Institut de Recherches en Technologies et Sciences pour le Vivant, CEA Grenoble, 17 rue des Martyrs, 38054, Grenoble Cedex 09, France; E-mail: eric.marechal@cea.fr, olivier.bastien@cea.fr


Abstract

A configuration space of homologous protein sequences (CSHP) was recently constructed from pairwise comparisons, with probabilities deduced from Z-value statistics (Monte Carlo methods applied to pairwise comparisons) and following evolutionary assumptions. A Z-value cut-off is applied so that proteins are placed in the CSHP only when the similarity between a pair of sequences is significant according to the Theorem of the Upper Limit of a score Probability (TULIP theorem). A classification can then be deduced from the positions of similar protein sequences in the CSHP and visualized as trees, called TULIP trees. In previous case studies, TULIP trees were shown to be consistent with phylogenetic trees. To date, however, no tool has been made available to compute TULIP trees following this model. Methods that cluster proteins based on pairwise comparisons and evolutionary assumptions should be useful both for evaluation and for the future improvements they might inspire. We developed a web server allowing local or online computation of TULIP trees based on the CSHP probabilities. The input is a set of homologous protein sequences in multi-FASTA format. Pairwise comparisons are conducted using the Smith-Waterman method, with 100 to 1,000 sequence shufflings to estimate pairwise Z-values. The resulting Z-value matrix is used to infer a tree, which is then written to a file. The output therefore consists of a Z-value matrix, a distance matrix, a TULIP tree file in NEWICK format, and a TULIP tree visualisation. The TULIP server provides an easy-to-use interface to the TULIP software and allows protein sequences to be classified based on pairwise alignments and evolutionary assumptions. TULIP trees are consistent with phylogenies in numerous cases, but they can be inconsistent for multi-domain proteins in which some domains have been conserved in all branches.
Following the MIAPA (Minimum Information About a Phylogenetic Analysis) recommendations, TULIP trees therefore cannot be considered conventional phylogenetic trees. A major strength of the TULIP classification is its statistical validity when analysing samples that include both compositionally unbiased and biased sequences (i.e. sequences with biased amino acid distributions), such as sequences from Plasmodium falciparum. The TULIP web server is a service of the Malaria Portal of the University of Pretoria, South Africa, and is available at http://malport.bi.up.ac.za/TULIP/.
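The pairwise Z-value step summarized above (a Smith-Waterman score compared with the scores of shuffled sequences) can be illustrated with a minimal Python sketch. Here Z = (S - μ) / σ, where S is the observed local alignment score and μ and σ are the mean and standard deviation of the scores obtained after shuffling. The sketch assumes a toy scoring scheme (match +2, mismatch -1, linear gap -2) in place of whatever substitution matrix and gap penalties the TULIP software actually uses, shuffles only the second sequence of each pair, and uses hypothetical function names (`smith_waterman_score`, `z_value`, `z_value_matrix`). It is not the TULIP implementation and stops at the Z-value matrix, without the CSHP distance matrix or tree inference.

```python
import random
import statistics


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The match/mismatch/gap values are hypothetical placeholders, not the
    substitution matrix or gap model used by the TULIP software.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cell scores are floored at zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def z_value(a: str, b: str, n_shuffles: int = 100, seed: int = 0) -> float:
    """Monte Carlo Z-value: (observed - mean of shuffled) / stdev of shuffled.

    Only b is shuffled; shuffling keeps its amino acid composition intact.
    """
    if n_shuffles < 2:
        raise ValueError("need at least two shuffles to estimate a stdev")
    rng = random.Random(seed)
    observed = smith_waterman_score(a, b)
    letters = list(b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # random permutation of b, same composition
        scores.append(smith_waterman_score(a, "".join(letters)))
    mu = statistics.mean(scores)
    sigma = statistics.stdev(scores)
    if sigma == 0:
        # Degenerate case, e.g. a homopolymer that every shuffle leaves unchanged.
        return float("inf") if observed > mu else 0.0
    return (observed - mu) / sigma


def z_value_matrix(seqs: dict[str, str],
                   n_shuffles: int = 100) -> dict[tuple[str, str], float]:
    """Z-values for every ordered pair of distinct sequences (not symmetric)."""
    return {
        (x, y): z_value(seqs[x], seqs[y], n_shuffles)
        for x in seqs
        for y in seqs
        if x != y
    }


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    toy = {
        "seqA": "MKTAYIAKQRQISFVKSHFSRQ",
        "seqB": "MKTAYIAKQRQLSFVKAHFSRQ",
        "seqC": "GSHMLEDPVDAFQLATTYRNW",
    }
    for (x, y), z in sorted(z_value_matrix(toy).items()):
        print(f"Z({x}, {y}) = {z:.2f}")
```

Because only the second sequence is shuffled, Z(x, y) and Z(y, x) generally differ, which is why the sketch fills both off-diagonal entries. Since a shuffle preserves amino acid composition, the reference distribution reflects each sequence's own composition rather than a generic background.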