Transition Priors for Protein Hidden Markov Models: An Empirical Study towards Maximum Discrimination
- 1 January 2004
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 11 (1), 181-193
- https://doi.org/10.1089/106652704773416957
Abstract
Insertions and deletions in a profile hidden Markov model (HMM) are modeled by transition probabilities between insert, delete and match states. These are estimated by combining observed data and prior probabilities. The transition prior probabilities can be defined either ad hoc or by maximum likelihood (ML) estimation. We show that the choice of transition prior greatly affects the HMM's ability to discriminate between true and false hits. HMM discrimination was measured using the HMMER 2.2 package applied to 373 families from Pfam. We measured the discrimination between true members and noise sequences employing various ML transition priors and also systematically scanned the parameter space of ad hoc transition priors. Our results indicate that ML priors produce far from optimal discrimination, and we present an empirically derived prior that considerably decreases the number of misclassifications compared to ML. Most of the difference stems from the probabilities for exiting a delete state. The ML prior, which is unaware of noise sequences, estimates a delete-to-delete probability that is relatively high and does not penalize noise sequences enough for optimal discrimination.
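The statement that transition probabilities combine observed data with a prior can be illustrated with a minimal sketch. It assumes the prior acts as a single set of Dirichlet pseudocounts added to the observed transition counts, which is one common way to do this and not necessarily the exact procedure used in the study. The function name `posterior_mean`, the transition labels `DM` (delete-to-match) and `DD` (delete-to-delete), and all numbers are hypothetical and are not taken from the paper.

```python
def posterior_mean(counts, prior):
    """Combine observed transition counts with Dirichlet pseudocounts.

    counts: observed transitions out of one state, e.g. {"DM": 3, "DD": 1}
    prior:  pseudocounts for the same transitions (hypothetical values)
    Returns the posterior mean probability for each transition in `prior`.
    """
    # Only transitions named in the prior are considered.
    total = sum(counts.get(k, 0.0) + prior[k] for k in prior)
    return {k: (counts.get(k, 0.0) + prior[k]) / total for k in prior}


# Hypothetical counts for transitions leaving a delete state.
observed = {"DM": 3, "DD": 1}

# Two illustrative priors: one neutral, one that penalizes delete-to-delete.
neutral_prior = {"DM": 0.5, "DD": 0.5}
low_dd_prior = {"DM": 1.5, "DD": 0.5}

print(posterior_mean(observed, neutral_prior))
print(posterior_mean(observed, low_dd_prior))
```

With few observations the prior dominates the estimate, so a prior that puts less weight on delete-to-delete lowers the resulting delete-to-delete probability. This is the kind of effect the abstract attributes to the choice of transition prior.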