Statistical Resolution of Ambiguous HLA Typing Data
Open Access
- 29 February 2008
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 4 (2) , e1000016
- https://doi.org/10.1371/journal.pcbi.1000016
Abstract
High-resolution HLA typing plays a central role in many areas of immunology, such as in identifying immunogenetic risk factors for disease, in studying how the genomes of pathogens evolve in response to immune selection pressures, and also in vaccine design, where identification of HLA-restricted epitopes may be used to guide the selection of vaccine immunogens. Perhaps one of the most immediate applications is in direct medical decisions concerning the matching of stem cell transplant donors to unrelated recipients. However, high-resolution HLA typing is frequently unavailable due to its high cost or the inability to re-type historical data. In this paper, we introduce and evaluate a method for statistical, in silico refinement of ambiguous and/or low-resolution HLA data. Our method, which requires an independent, high-resolution training data set drawn from the same population as the data to be refined, uses linkage disequilibrium in HLA haplotypes as well as four-digit allele frequency data to probabilistically refine HLA typings. Central to our approach is the use of haplotype inference. We introduce new methodology to this area, improving upon the Expectation-Maximization (EM)-based approaches currently used within the HLA community. Our improvements are achieved by using a parsimonious parameterization for haplotype distributions and by smoothing the maximum likelihood (ML) solution. These improvements make it possible to scale the refinement to a larger number of alleles and loci in a more computationally efficient and stable manner. We also show how to augment our method in order to incorporate ethnicity information (as HLA allele distributions vary widely according to race/ethnicity as well as geographic area), and demonstrate the potential utility of this experimentally. A tool based on our approach is freely available for research purposes at http://microsoft.com/science. 
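The refinement idea in the abstract can be illustrated with a toy sketch. It is not the authors' model or tool. It assumes phased two-locus haplotypes, pre-2010 allele names such as `A*0201`, and simple additive smoothing as a stand-in for the paper's smoothing of the ML solution. The function names, the pseudocount, and the training counts are all hypothetical.

```python
from collections import Counter


def smoothed_haplotype_freqs(training_haplotypes, pseudocount=0.5):
    """Estimate haplotype frequencies from high-resolution training data.

    Additive smoothing over the observed haplotypes is an assumption made
    for this sketch; it is not the smoothing used in the paper.
    """
    counts = Counter(training_haplotypes)
    total = sum(counts.values()) + pseudocount * len(counts)
    return {h: (c + pseudocount) / total for h, c in counts.items()}


def two_digit(allele):
    """Truncate a four-digit allele name to two digits: 'A*0201' -> 'A*02'."""
    locus, digits = allele.split("*")
    return f"{locus}*{digits[:2]}"


def refine(low_res_haplotype, freqs):
    """Posterior over four-digit haplotypes compatible with a two-digit one.

    Applies Bayes' rule: keep the haplotypes whose two-digit truncation
    matches the observation, then renormalize their frequencies.
    """
    target = tuple(low_res_haplotype)
    compatible = {
        h: p for h, p in freqs.items()
        if tuple(two_digit(a) for a in h) == target
    }
    z = sum(compatible.values())
    return {h: p / z for h, p in compatible.items()} if z > 0 else {}


# Hypothetical training set of phased four-digit haplotypes (HLA-A, HLA-B).
training = (
    [("A*0201", "B*0702")] * 6
    + [("A*0205", "B*0702")] * 2
    + [("A*0201", "B*4402")] * 2
)
freqs = smoothed_haplotype_freqs(training)

# Refine a low-resolution (two-digit) haplotype into a distribution
# over the four-digit haplotypes seen in training.
posterior = refine(("A*02", "B*07"), freqs)
for haplotype, prob in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(haplotype, round(prob, 3))
```

Because the B locus is linked to the A locus in the training haplotypes, the posterior favors `A*0201` over `A*0205` more strongly than a per-locus allele frequency alone would if the linkage differed. A realistic treatment would also have to handle unphased genotypes and more loci, which this sketch omits.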
At the core of the human adaptive immune response is the train-to-kill mechanism, in which specialized immune cells are sensitized to recognize small peptides from foreign sources (e.g., from HIV or bacteria). Following this sensitization, these immune cells are activated to kill other cells that contain and display the same foreign peptide. However, for sensitization and killing to occur, the foreign peptide must be “paired up” with another of the infected person's specialized immune molecules, an HLA molecule. The way in which peptides interact with these HLA molecules determines whether and how an immune response will be generated. There is a huge repertoire of such HLA molecules, and almost no two people have the same set. Furthermore, a person's HLA type can influence, for example, their susceptibility to disease or the success of a transplant. Obtaining high-quality HLA data for patients is often difficult, however, because of the great cost and specialized laboratories required, or because the data are historical and cannot be retyped with modern methods. Therefore, we introduce a statistical model that uses existing high-quality HLA data to infer higher-quality HLA data from lower-quality data.