Statistical Resolution of Ambiguous HLA Typing Data

Open Access

29 February 2008

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 4 (2) , e1000016
https://doi.org/10.1371/journal.pcbi.1000016

Abstract

High-resolution HLA typing plays a central role in many areas of immunology, such as in identifying immunogenetic risk factors for disease, in studying how the genomes of pathogens evolve in response to immune selection pressures, and also in vaccine design, where identification of HLA-restricted epitopes may be used to guide the selection of vaccine immunogens. Perhaps one of the most immediate applications is in direct medical decisions concerning the matching of stem cell transplant donors to unrelated recipients. However, high-resolution HLA typing is frequently unavailable due to its high cost or the inability to re-type historical data. In this paper, we introduce and evaluate a method for statistical, in silico refinement of ambiguous and/or low-resolution HLA data. Our method, which requires an independent, high-resolution training data set drawn from the same population as the data to be refined, uses linkage disequilibrium in HLA haplotypes as well as four-digit allele frequency data to probabilistically refine HLA typings. Central to our approach is the use of haplotype inference. We introduce new methodology to this area, improving upon the Expectation-Maximization (EM)-based approaches currently used within the HLA community. Our improvements are achieved by using a parsimonious parameterization for haplotype distributions and by smoothing the maximum likelihood (ML) solution. These improvements make it possible to scale the refinement to a larger number of alleles and loci in a more computationally efficient and stable manner. We also show how to augment our method in order to incorporate ethnicity information (as HLA allele distributions vary widely according to race/ethnicity as well as geographic area), and demonstrate the potential utility of this experimentally. A tool based on our approach is freely available for research purposes at http://microsoft.com/science. 

Author Summary

At the core of the human adaptive immune response is the train-to-kill mechanism, in which specialized immune cells are sensitized to recognize small peptides from foreign sources (e.g., from HIV or bacteria). Following this sensitization, these immune cells are activated to kill other cells that contain and display the same foreign peptide. However, for sensitization and killing to occur, the foreign peptide must be "paired up" with one of the infected person's other specialized immune molecules, an HLA molecule. The way in which peptides interact with these HLA molecules determines whether and how an immune response will be generated. There is a huge repertoire of such HLA molecules, with almost no two people having the same set. Furthermore, a person's HLA type can determine, for example, their susceptibility to disease or the success of a transplant. However, obtaining high-quality HLA data for patients is often difficult because of the great cost and specialized laboratories required, or because the data are historical and cannot be retyped with modern methods. Therefore, we introduce a statistical model that uses existing high-quality HLA data to infer higher-quality HLA data from lower-quality data.
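
To make the refinement idea concrete, the following is a minimal sketch, not the method from the paper. It assumes a toy training set of already-phased two-locus haplotypes (the paper instead infers haplotypes statistically), and it uses simple additive smoothing as a stand-in for the paper's smoothing of the maximum likelihood solution. The function names, the `alpha` parameter, and the haplotype counts are all hypothetical and chosen only for illustration. The sketch shows how linkage disequilibrium can shift the probability of an ambiguous allele once the allele at a linked locus is known.

```python
from collections import Counter


def smoothed_haplotype_probs(training_haplotypes, alleles_a, alleles_b, alpha=0.5):
    """Estimate two-locus haplotype probabilities with additive smoothing.

    Assumption: additive (pseudo-count) smoothing is used here only as a
    simple stand-in; it is not the smoothing scheme described in the paper.
    """
    counts = Counter(training_haplotypes)
    total = len(training_haplotypes) + alpha * len(alleles_a) * len(alleles_b)
    return {
        (a, b): (counts[(a, b)] + alpha) / total
        for a in alleles_a
        for b in alleles_b
    }


def refine(candidates_a, observed_b, probs):
    """Posterior over candidate four-digit A alleles given the linked B allele."""
    weights = {a: probs[(a, observed_b)] for a in candidates_a}
    norm = sum(weights.values())
    return {a: w / norm for a, w in weights.items()}


# Toy, made-up training data: phased (A, B) haplotypes from one population.
training = (
    [("A*0201", "B*4402")] * 6
    + [("A*0205", "B*5801")] * 3
    + [("A*0201", "B*5801")] * 1
)
alleles_a = ["A*0201", "A*0205"]
alleles_b = ["B*4402", "B*5801"]

probs = smoothed_haplotype_probs(training, alleles_a, alleles_b)

# A low-resolution typing "A*02" is ambiguous between the two A alleles;
# the B allele on the same haplotype is known to be B*5801.
posterior = refine(alleles_a, "B*5801", probs)
for allele, p in posterior.items():
    print(f"{allele}: {p:.2f}")
```

In this toy data A*0201 is the more common allele overall, yet conditioning on B*5801 makes A*0205 the more probable resolution. A real application would have to handle unphased genotypes, more loci, and ambiguity at several loci at once, which is where haplotype inference becomes necessary.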
