TULIP Software and Web Server: Automatic Classification of Protein Sequences Based on Pairwise Comparisons and Z-Value Statistics

Delphine Grando; Philippe Ortet; Fourie Joubert; Eric Maréchal; Olivier Bastien

The Open Bioinformatics Journal • 30 Jun 2009 • RESEARCH ARTICLE • DOI: 10.2174/1875036200903010018

A configuration space of homologous protein sequences (CSHP) has recently been constructed from pairwise comparisons, with probabilities deduced from Z-value statistics (Monte Carlo methods applied to pairwise comparisons) and following evolutionary assumptions. A Z-value cut-off is applied so that proteins are placed in the CSHP only when the similarity between pairs of sequences is significant according to the Theorem of the Upper Limit of a score Probability (TULIP theorem). From the positions of similar protein sequences in the CSHP, a classification can be deduced and visualized as trees, called TULIP trees. In previous case studies, TULIP trees were shown to be consistent with phylogenetic trees. To date, no tool has been available to compute TULIP trees following this model. Making available methods that cluster proteins based on pairwise comparisons and evolutionary assumptions should be useful both for their evaluation and for the future improvements they might inspire.

We developed a web server that allows the local or online computation of TULIP trees based on the CSHP probabilities. The input is a set of homologous protein sequences in multi-FASTA format. Pairwise comparisons are conducted using the Smith-Waterman method, with 100 to 1,000 sequence shufflings to estimate each pairwise Z-value. The resulting Z-value matrix is used to infer a tree, which is then written to a file. The output therefore consists of a Z-value matrix, a distance matrix, a TULIP tree file in NEWICK format, and a TULIP tree visualisation.

The TULIP server provides an easy-to-use interface to the TULIP software and allows a classification of protein sequences based on pairwise alignments and following evolutionary assumptions. TULIP trees are consistent with phylogenies in numerous cases, but they can be inconsistent for multi-domain proteins in which some domains have been conserved in all branches. TULIP trees therefore cannot be considered conventional phylogenetic trees in the sense of the MIAPA (Minimum Information About a Phylogenetic Analysis) recommendations. A major strength of the TULIP classification is its statistical validity when analysing samples that include both compositionally unbiased and biased sequences (i.e. with biased amino acid distributions), such as sequences from Plasmodium falciparum. The TULIP web server is a service of the Malaria Portal of the University of Pretoria, South Africa, and is available at http://malport.bi.up.ac.za/TULIP/.
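The Z-value estimation described in the abstract can be illustrated with a minimal sketch. It is not the TULIP implementation: the scoring scheme (simple match/mismatch values and a linear gap penalty instead of a substitution matrix), the choice to shuffle only the second sequence, and the function names `smith_waterman_score` and `z_value` are all assumptions made for illustration.

```python
import random
import statistics


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (toy scoring, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def z_value(a, b, n_shuffles=100, seed=0):
    """Monte Carlo Z-value: (observed - mean) / sd over shuffles of b.

    Assumption: only the second sequence is shuffled, which preserves
    its amino acid composition while destroying residue order.
    """
    rng = random.Random(seed)
    observed = smith_waterman_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        shuffled_scores.append(smith_waterman_score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        # Degenerate case: every shuffle gave the same score.
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / sd
```

Calling `z_value(seq_a, seq_b, n_shuffles=1000)` for every pair in a multi-FASTA input would fill a Z-value matrix of the kind the abstract describes; each entry would then be compared against the chosen Z-value cut-off before the pair is placed in the CSHP.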

© 2009 Delphine et al.
