InParanoid 7: new algorithms and tools for eukaryotic orthology analysis
Open Access
- 5 November 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 38 (Database issue), pp. D196-D203
- https://doi.org/10.1093/nar/gkp931
Abstract
The InParanoid project gathers proteomes of completely sequenced eukaryotic species plus Escherichia coli and calculates pairwise ortholog relationships among them. The new release 7.0 of the database has grown by an order of magnitude over the previous version. It now includes 100 species and their collective 1.3 million proteins, organized into 42.7 million pairwise ortholog groups. The InParanoid algorithm itself has been revised and is now both more specific and more sensitive. Based on results from our recent benchmarking of low-complexity filters in homology assignment, we developed a two-pass BLAST approach. It makes use of high-precision compositional score matrix adjustment but avoids the alignment truncation that this adjustment sometimes causes. We have also updated the InParanoid web site (http://InParanoid.sbc.su.se): several features have been added, response times have been improved and the site now has a new, clearer look. As the number of ortholog databases has grown, comparing these resources has become difficult because they lack standardized source data and represent ortholog relationships in incompatible ways. To facilitate data exchange and comparisons among ortholog databases, we have developed and are making available two XML schemas: SeqXML for the input sequences and OrthoXML for the output ortholog clusters.
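The abstract does not spell out how pairwise ortholog groups are built. As a rough illustration only, the Python sketch below shows the general idea InParanoid-style clustering rests on. Reciprocal best hits between two species act as seed orthologs. Same-species sequences that score higher against a seed than the seed scores against its ortholog are added as inparalogs. This is a simplified assumption, not the InParanoid 7 implementation. It omits the two-pass BLAST scoring, overlap and score cutoffs, confidence values and cluster-merging rules. The score dictionaries and the function names `best_hits`, `inparalogs` and `ortholog_clusters` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]        # (query id, target id)
Scores = Dict[Pair, float]    # hypothetical: BLAST-like bit scores keyed by pair


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to its highest-scoring target (first pair seen wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def inparalogs(seed: str, within: Scores, cutoff: float, taken: Set[str]) -> Set[str]:
    """Same-species sequences scoring above the cutoff against the seed."""
    return {
        target
        for (query, target), score in within.items()
        if query == seed and target != seed and score > cutoff and target not in taken
    }


def ortholog_clusters(
    ab: Scores,  # species A queries against species B
    ba: Scores,  # species B queries against species A
    aa: Scores,  # species A against itself
    bb: Scores,  # species B against itself
) -> List[Tuple[Set[str], Set[str]]]:
    """Simplified two-species clustering: reciprocal best hits plus inparalogs."""
    best_ab, best_ba = best_hits(ab), best_hits(ba)
    # Seed orthologs: a's best hit in B is b, and b's best hit in A is a.
    seeds = [(a, b) for a, b in best_ab.items() if best_ba.get(b) == a]
    # Strongest seeds first, so they claim shared inparalogs before weaker ones.
    seeds.sort(key=lambda pair: ab[pair], reverse=True)
    taken: Set[str] = set()
    clusters: List[Tuple[Set[str], Set[str]]] = []
    for a, b in seeds:
        if a in taken or b in taken:
            continue  # seed already absorbed into a stronger cluster
        cutoff = ab[(a, b)]  # assumption: the A->B seed score is the inparalog threshold
        group_a = {a} | inparalogs(a, aa, cutoff, taken)
        group_b = {b} | inparalogs(b, bb, cutoff, taken)
        taken |= group_a | group_b
        clusters.append((group_a, group_b))
    return clusters


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    ab = {("A1", "B1"): 100, ("A1", "B2"): 80, ("A2", "B1"): 90, ("A3", "B2"): 85}
    ba = {("B1", "A1"): 100, ("B1", "A2"): 90, ("B2", "A1"): 80, ("B2", "A3"): 85}
    aa = {("A1", "A2"): 150, ("A1", "A3"): 40}
    print(ortholog_clusters(ab, ba, aa, {}))
```

With these toy scores, A1/B1 and A3/B2 are reciprocal best hits and become seeds. A2 joins the A1 cluster as an inparalog because its within-species score against A1 (150) exceeds the A1-B1 seed score (100). A3 does not join, because its score against A1 (40) falls below that cutoff. Processing seeds in descending score order is a design choice made here so that each sequence ends up in at most one cluster.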