PseudoPipe: an automated pseudogene identification pipeline
Open Access
- 30 March 2006
- journal article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (12) , 1437-1439
- https://doi.org/10.1093/bioinformatics/btl116
Abstract
Motivation: Mammalian genomes contain many ‘genomic fossils’, i.e. pseudogenes. These are disabled copies of functional genes that have been retained in the genome by gene duplication or retrotransposition events. Pseudogenes are important resources in understanding the evolutionary history of genes and genomes.

Results: We have developed a homology-based computational pipeline (‘PseudoPipe’) that can search a mammalian genome and identify pseudogene sequences in a comprehensive and consistent manner. The key steps in the pipeline involve using BLAST to rapidly cross-reference potential ‘parent’ proteins against the intergenic regions of the genome and then processing the resulting ‘raw hits’: eliminating redundant ones, clustering together neighbors, and associating and aligning clusters with a unique parent. Finally, pseudogenes are classified based on a combination of criteria including homology, intron-exon structure, and existence of stop codons and frameshifts.

Availability: The PseudoPipe program is implemented in Python and is available for download.

Contact: Mark.Gerstein@yale.edu or zhaolei.zhang@utoronto.ca
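The hit-processing steps named under Results (eliminating redundant hits and clustering neighbors) can be illustrated with a minimal Python sketch. This is not the PseudoPipe source code: the `Hit` record, the function names `drop_redundant` and `cluster_neighbors`, the use of E-value to choose among overlapping hits, and the `max_gap` threshold of 60 bases are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """One raw BLAST hit of a parent protein against the genome (hypothetical record)."""
    chrom: str
    strand: str
    start: int     # genomic start, inclusive
    end: int       # genomic end, inclusive
    parent: str    # ID of the query ("parent") protein
    evalue: float  # BLAST E-value; smaller is better


def drop_redundant(hits: List[Hit]) -> List[Hit]:
    """Among hits that overlap on the same chromosome and strand, keep the best E-value.

    Assumption: E-value is the tie-breaker; the real pipeline may use other criteria.
    """
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        overlaps_kept = any(
            k.chrom == hit.chrom
            and k.strand == hit.strand
            and hit.start <= k.end
            and k.start <= hit.end
            for k in kept
        )
        if not overlaps_kept:
            kept.append(hit)
    return kept


def cluster_neighbors(hits: List[Hit], max_gap: int = 60) -> List[List[Hit]]:
    """Group hits of the same parent that lie close together on one chromosome and strand.

    A hit joins the current cluster when its start is at most `max_gap` bases
    past the cluster's furthest end. `max_gap` is a hypothetical threshold.
    """
    clusters: List[List[Hit]] = []
    cluster_end = 0
    ordered = sorted(hits, key=lambda h: (h.chrom, h.strand, h.parent, h.start))
    for hit in ordered:
        if clusters:
            last = clusters[-1][-1]
            same_locus = (last.chrom, last.strand, last.parent) == (
                hit.chrom, hit.strand, hit.parent)
            if same_locus and hit.start - cluster_end <= max_gap:
                clusters[-1].append(hit)
                cluster_end = max(cluster_end, hit.end)
                continue
        clusters.append([hit])
        cluster_end = hit.end
    return clusters


# Toy input: coordinates and protein IDs are invented for the example.
raw_hits = [
    Hit("chr1", "+", 100, 200, "P1", 1e-20),
    Hit("chr1", "+", 150, 220, "P2", 1e-5),    # overlaps the first hit, weaker E-value
    Hit("chr1", "+", 230, 300, "P1", 1e-10),   # close to the first hit
    Hit("chr1", "+", 5000, 5100, "P1", 1e-8),  # far from the others
]
clusters = cluster_neighbors(drop_redundant(raw_hits))
for cluster in clusters:
    print(cluster[0].parent, cluster[0].start, cluster[-1].end, len(cluster))
```

In this sketch each cluster already carries a single parent because clustering is keyed on the parent ID; associating a cluster with a unique parent when several candidate proteins hit the same locus, and the subsequent alignment and classification steps, are left out.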