Pandit: a database of protein and associated nucleotide domains with inferred trees
Open Access
- 12 August 2003
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 19 (12) , 1556-1563
- https://doi.org/10.1093/bioinformatics/btg188
Abstract
Motivation: A large, high-quality database of homologous sequence alignments with good estimates of their corresponding phylogenetic trees will be a valuable resource to those studying phylogenetics. It will allow researchers to compare current and new models of sequence evolution across a large variety of sequences. The large quantity of data may provide inspiration for new models and methodology to study sequence evolution, and may allow general statements about the relative effect of different molecular processes on evolution.
Results: The Pandit 7.6 database contains 4341 families of sequences derived from the seed alignments of the Pfam database, which holds amino acid alignments of families of homologous protein domains (Bateman et al., 2002). Each family in Pandit includes three alignments: an alignment of amino acid sequences that matches the corresponding Pfam seed alignment; an alignment of DNA sequences containing the coding sequences of the Pfam alignment, where these could be recovered (82.9% of the sequences taken from Pfam overall); and an alignment of amino acid sequences restricted to those sequences for which a DNA sequence could be recovered. Each alignment has an associated estimate of its phylogenetic tree. Tree topologies were obtained with the neighbor-joining method, using maximum likelihood estimates of the evolutionary distances, and branch lengths were then calculated with a standard maximum likelihood approach.
Availability: The Pandit database is available for browsing and download via its home page at http://www.ebi.ac.uk/goldman-srv/pandit/
Contact: simon@ebi.ac.uk
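The topology step can be illustrated with a minimal neighbor-joining sketch in Python. It assumes a precomputed, symmetric distance matrix is already available. In Pandit those distances were maximum likelihood estimates, but this sketch accepts any distances and does not re-estimate branch lengths by maximum likelihood afterwards, so the lengths it reports are the plain neighbor-joining ones. The function name `neighbor_joining` and the Newick output are illustrative choices, not Pandit's code.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Neighbor-joining tree (Saitou and Nei, 1987) as a Newick string.

    names: unique leaf labels (no Newick punctuation).
    dist:  symmetric matrix (list of lists) of pairwise distances.
    """
    if len(names) < 3:
        raise ValueError("need at least three sequences")
    # dm[x][y] = distance between nodes x and y; internal nodes are keyed
    # by the Newick string of the subtree they root.
    dm = {a: {b: dist[i][j] for j, b in enumerate(names) if j != i}
          for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # r(x): total distance from x to all other active nodes
        r = {x: sum(dm[x][y] for y in nodes if y != x) for x in nodes}
        # Join the pair minimising Q(x, y) = (n - 2) d(x, y) - r(x) - r(y)
        x, y = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dm[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from x and y to their new parent node u
        lx = 0.5 * dm[x][y] + (r[x] - r[y]) / (2 * (n - 2))
        ly = dm[x][y] - lx
        u = f"({x}:{lx:.4f},{y}:{ly:.4f})"
        # Distances from u to every remaining node
        dm[u] = {}
        for k in nodes:
            if k not in (x, y):
                dm[u][k] = dm[k][u] = 0.5 * (dm[x][k] + dm[y][k] - dm[x][y])
        nodes = [k for k in nodes if k not in (x, y)] + [u]
    # Three nodes remain: join them at a central (unrooted) node
    a, b, c = nodes
    la = 0.5 * (dm[a][b] + dm[a][c] - dm[b][c])
    lb = 0.5 * (dm[a][b] + dm[b][c] - dm[a][c])
    lc = 0.5 * (dm[a][c] + dm[b][c] - dm[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"
```

On an additive distance matrix, neighbor joining recovers the tree that generated the distances. Keying internal nodes by their Newick substrings keeps the bookkeeping short, but it assumes leaf names are unique and free of Newick punctuation.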
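Building the DNA alignment from an amino acid alignment, and restricting an alignment to the sequences with recovered DNA, can be sketched as below. The abstract does not describe Pandit's recovery procedure, so `thread_codons` and `restrict_alignment` are hypothetical helpers resting on stated assumptions: each recovered coding sequence is already trimmed to exactly the codons of the aligned residues (no stop codon, no introns), gaps are written as `-` or `.` as in Pfam-style alignments, and columns that become all-gap after sequences are dropped are removed (the abstract does not say whether Pandit applies this last step).

```python
GAP_CHARS = "-."  # assumed gap symbols, as in Pfam-style alignments


def thread_codons(aligned_protein, cds):
    """Map an unaligned coding sequence onto an aligned protein sequence.

    Each residue is replaced by its codon and each gap by '---'.
    Assumes cds encodes exactly the ungapped residues (no stop, no introns).
    """
    residues = [ch for ch in aligned_protein if ch not in GAP_CHARS]
    if len(cds) != 3 * len(residues):
        raise ValueError("coding sequence length does not match protein")
    out, pos = [], 0
    for ch in aligned_protein:
        if ch in GAP_CHARS:
            out.append("---")
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


def restrict_alignment(alignment, keep):
    """Keep only the named sequences, then drop columns that are all gaps."""
    rows = {name: seq for name, seq in alignment.items() if name in keep}
    if not rows:
        return {}
    length = len(next(iter(rows.values())))
    cols = [i for i in range(length)
            if any(seq[i] not in GAP_CHARS for seq in rows.values())]
    return {name: "".join(seq[i] for i in cols) for name, seq in rows.items()}
```

Threading codons through the protein alignment, rather than aligning the DNA independently, keeps the nucleotide and amino acid alignments column-for-column consistent.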