p Value Adjustments for Multiple Tests in Multivariate Binomial Models

1 September 1989

Research article, published by Taylor & Francis in the Journal of the American Statistical Association

Vol. 84, No. 407, pp. 780–786
https://doi.org/10.1080/01621459.1989.10478837

Abstract

Data from rodent carcinogenicity (preclinical) and clinical studies involving new drugs may be modeled as having come from multivariate binomial distributions. In two-year rodent carcinogenicity studies, there are typically 20–50 tissues examined for occurrence of any of several possible lesions. For a particular treatment group, the number of occurrences of a particular lesion at a particular tissue may be modeled as binomial, and the vector of such frequencies may be considered multivariate binomial with unspecified dependence structure. The same model may also apply to clinical side-effects data; in this case the marginal frequencies may represent occurrences of events ranging from headaches to ingrown toenails. Frequently, the goal of such studies is to isolate site-specific significant differences between treatment and control groups. For example, in rodent carcinogenicity analyses it is generally not sufficient to claim that a new compound causes an increase in tumors at some unspecified site; rather, the report should identify the particular sites where unusual increases are noted. Such an analysis requires a separate test for each site, and false significances may easily occur when multiple tests are performed. When a marginal significance criterion p ≤ .05 is used, experimentwise false significance rates as large as 44% have been reported (Haseman, Winbush, and O'Donnell 1986). Others have reported much lower experimentwise rates; for example, Gart, Chu, and Tarone (1979) reported 8%–10% for each sex and species combination of a two-sex, two-species experiment. In this article it is proposed that the experimentwise false significance rate be controlled by adjusting all p values for the multiplicity of testing using vector-based resampling methods.
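A rough calculation shows why multiplicity matters. The sketch below assumes k independent tests with continuous statistics, so each null p value is uniform; real carcinogenicity data satisfy neither assumption, because lesion counts are discrete and sites are dependent. Under those assumptions the chance of at least one p ≤ α is 1 − (1 − α)^k, and the Bonferroni union bound is kα. The function names are illustrative only.

```python
# Experimentwise (familywise) false significance rate for k tests at level alpha,
# ASSUMING independent tests whose null p values are exactly uniform.


def fwer_independent(k: int, alpha: float = 0.05) -> float:
    """P(at least one p <= alpha) when every null holds and tests are independent."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_bound(k: int, alpha: float = 0.05) -> float:
    """Union (Bonferroni) upper bound on the same probability, capped at 1."""
    return min(1.0, k * alpha)


for k in (1, 10, 20, 50):
    print(f"k={k:2d}  independent={fwer_independent(k):.3f}  "
          f"bound={bonferroni_bound(k):.2f}")
```

Discreteness typically pulls the actual rate below the independent-continuous figure, since a site with very few lesions may be unable to attain p ≤ .05 at all.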
This analysis extends the bootstrap method described by Westfall (1985) to the multisample case, with particular application to models useful in clinical and preclinical biopharmaceutical analyses; it is also similar to the methodology proposed by Brown and Fears (1981). Assuming no differences between treatment and control groups (the null case), one may conveniently estimate the multivariate binomial distribution or the permutation distribution via vector resampling. Using this estimated distribution, one may easily estimate (via Monte Carlo) the probability that the smallest p value in the study is smaller than any given threshold. An adjusted p value is then defined as the probability that the smallest p value in the study is less than or equal to the observed p value for the given test. This methodology is compared to the usual Bonferroni-style adjustments, and it is demonstrated that these adjustments are grossly conservative in certain instances because they fail to account for dependence between tests and for the discreteness of the data. Results of bootstrap and permutation resampling adjustments tend to be similar, particularly for large sample sizes. The approaches are philosophically different: Bootstrap resampling is preferable if an unconditional analysis is desired [Upton (1982) demonstrated, for the univariate two-sample case, that unconditional tests have actual Type I error rates closer to nominal and greater statistical power], whereas permutation resampling gives essentially exact results and is preferable if a conditional analysis is desired [Yates (1984) gave philosophical arguments for favoring the conditional approach].
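The adjusted p value described above can be written out as follows. The notation is ours, not necessarily the paper's: k sites, observed per-site p values p_1, …, p_k, random p values P_1, …, P_k under the null H_0 of no treatment effect at any site, and p_i^{*(b)} the p value at site i in resample b of B.

```latex
\tilde{p}_j = \Pr\Bigl(\min_{1 \le i \le k} P_i \le p_j \Bigm| H_0\Bigr),
\qquad
\hat{\tilde{p}}_j = \frac{1}{B} \sum_{b=1}^{B}
  \mathbf{1}\Bigl\{\min_{1 \le i \le k} p_i^{*(b)} \le p_j\Bigr\},
\qquad
\tilde{p}_j \le \sum_{i=1}^{k} \Pr\bigl(P_i \le p_j \mid H_0\bigr) \le k\,p_j .
```

The last chain locates the two sources of Bonferroni conservatism. The first inequality ignores overlap among the events {P_i ≤ p_j}, which is large when sites are positively dependent. The second uses Pr(P_i ≤ p_j | H_0) ≤ p_j, which for a sparse discrete site can be far below p_j or even zero.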

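A minimal Python sketch of the vector-resampling min-p adjustment follows. It rests on assumptions the abstract does not state. Each site is tested with a one-sided Fisher exact test for an increase in the treated group. The complete null (no effect at any site) is used throughout. The "bootstrap" option draws whole animal vectors with replacement from the pooled data, which is one reading of estimating the multivariate binomial distribution under the null. The names `fisher_upper_p`, `site_p_values`, `minp_adjusted`, and `bonferroni`, and the toy data, are hypothetical. Resampling whole vectors rather than each site separately is what preserves the unspecified dependence between sites.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]  # one animal: 1 = lesion present at site j, 0 = absent


def fisher_upper_p(x_t: int, n_t: int, total: int, n: int) -> float:
    """One-sided Fisher exact p value P(X >= x_t), X hypergeometric.

    n: animals in both groups, n_t: treated animals,
    total: occurrences at the site in both groups, x_t: occurrences among treated.
    """
    hi = min(total, n_t)
    tail = sum(math.comb(total, x) * math.comb(n - total, n_t - x)
               for x in range(x_t, hi + 1))
    return tail / math.comb(n, n_t)


def site_p_values(treated: Sequence[Row], control: Sequence[Row]) -> List[float]:
    """Separate one-sided 'treated > control' test at every site."""
    n_t, n = len(treated), len(treated) + len(control)
    ps = []
    for j in range(len(treated[0])):
        x_t = sum(row[j] for row in treated)
        total = x_t + sum(row[j] for row in control)
        ps.append(fisher_upper_p(x_t, n_t, total, n))
    return ps


def minp_adjusted(treated: Sequence[Row], control: Sequence[Row],
                  n_resamples: int = 2000, method: str = "permutation",
                  seed: int = 0) -> List[float]:
    """Monte Carlo estimate of P(min_i P*_i <= p_j) for each observed p_j."""
    rng = random.Random(seed)
    observed = site_p_values(treated, control)
    pooled = list(treated) + list(control)
    n_t, n = len(treated), len(pooled)
    min_ps = []
    for _ in range(n_resamples):
        if method == "permutation":
            sample = rng.sample(pooled, n)  # relabel whole vectors, no replacement
        elif method == "bootstrap":
            sample = [rng.choice(pooled) for _ in range(n)]  # with replacement
        else:
            raise ValueError("method must be 'permutation' or 'bootstrap'")
        min_ps.append(min(site_p_values(sample[:n_t], sample[n_t:])))
    # Small tolerance so p values equal in exact arithmetic count as ties.
    return [sum(m <= p + 1e-12 for m in min_ps) / n_resamples for p in observed]


def bonferroni(ps: Sequence[float]) -> List[float]:
    return [min(1.0, len(ps) * p) for p in ps]


# Hypothetical toy data: 20 animals per group, 3 sites.
treated = [(int(i < 8), int(i == 0), int(10 <= i < 13)) for i in range(20)]
control = [(int(i == 19), 0, int(i < 3)) for i in range(20)]

raw = site_p_values(treated, control)
adj = minp_adjusted(treated, control)
print("raw       ", [round(p, 4) for p in raw])
print("min-p adj ", [round(p, 4) for p in adj])
print("Bonferroni", [round(p, 4) for p in bonferroni(raw)])
```

In the toy data the second site has a single occurrence, so its p value can never fall below 0.5. The third site's smallest attainable p value (about 0.0101) is slightly above the first site's observed p value (about 0.0098). Only the first site can therefore produce a resampled minimum that small, so under permutation its adjusted value estimates roughly its raw p value, whereas Bonferroni triples it.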
References (partial list; 18 in total)

A New Probability Model for Determining Exact P-Values for 2 x 2 Contingency Tables When Comparing Binomial Proportions
Biometrics, 1988
Adjusting for Multiplicity of Statistical Tests in the Analysis of Carcinogenicity Studies
Biometrical Journal, 1988
An Improved Sequentially Rejective Bonferroni Test Procedure
Biometrics, 1987
Use of dual control groups to estimate false positive rates in laboratory animal carcinogenicity studies
Fundamental and Applied Toxicology, 1986
A reexamination of false-positive rates for carcinogenesis studies
Fundamental and Applied Toxicology, 1983
Some Asymptotic Theory for the Bootstrap
The Annals of Statistics, 1981
A Biometrics Invited Paper. Assessing Laboratory Evidence for Neoplastic Activity
Biometrics, 1980
Rectangular Confidence Regions for the Means of Multivariate Normal Distributions
Journal of the American Statistical Association, 1967
Tests for Linear Trends in Proportions and Frequencies
Biometrics, 1955
Some Methods for Strengthening the Common χ² Tests
Biometrics, 1954