Generalizing Logistic Regression by Nonparametric Mixing

Abstract
Logistic regression is a common technique for analyzing the effect of a covariate vector x on the number of successes y in m trials when y has a binomial distribution. But at times either the logistic curve does not describe the probability of success p(x) adequately, or m is larger than 1 and y is more variable than the binomial distribution allows. Overdispersion relative to the binomial distribution is possible if the m trials in a set or “litter” are positively correlated, an important covariate is omitted, or x is measured with error. A simple way to accommodate departures from the logit link and overdispersion is to introduce a random intercept α and thus permit a random propensity toward success. When α varies between individual binary trials according to a discrete or multimodal distribution, p(x) has smooth steps and y has a binomial(m, p(x)) distribution. When the random α is constant for a set of m binary trials and varies between sets of m trials according to a discrete or multimodal distribution, p(x) has smooth steps and y is overdispersed relative to the binomial distribution. In this article the distribution of α is left unspecified and estimated by nonparametric maximum likelihood. The estimated distribution of α is discrete, so the distribution of y and all of its properties are easily estimated for any x. Two examples are considered. In the first, the logit link is inadequate, but y appears to be binomial. Hence α is allowed to vary between binary trials. In the second, y's (with m > 1) from the same design point are more dispersed than the binomial distribution would predict, and there are outliers. Allowing α to vary randomly between sets of m trials accounts for the overdispersion and seems to temper the influence of outliers on the estimated probability of success.
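
The contrast between the two mixing schemes can be made concrete with a small numerical sketch. It is an illustration only, not the article's estimation procedure: the three-point distribution for α (`SUPPORT`, `WEIGHTS`), the slope `BETA`, and all function names are hypothetical choices, and the mixing distribution is taken as known rather than estimated by nonparametric maximum likelihood. The code assumes a scalar covariate x and a logit of the form α + βx.

```python
import math

# Hypothetical three-point mixing distribution for the random intercept alpha.
# These values are made up for illustration; they are not estimates from the article.
SUPPORT = [-4.0, 0.0, 4.0]
WEIGHTS = [0.3, 0.4, 0.3]
BETA = 1.0  # hypothetical slope for a scalar covariate x


def logistic(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def binom_pmf(y, m, p):
    """Binomial(m, p) probability of y successes."""
    return math.comb(m, y) * p**y * (1.0 - p) ** (m - y)


def marginal_p(x):
    """p(x): success probability averaged over the discrete distribution of alpha."""
    return sum(w * logistic(a + BETA * x) for a, w in zip(SUPPORT, WEIGHTS))


def pmf_trial_level(y, m, x):
    """alpha drawn independently for each binary trial: y is binomial(m, p(x))."""
    return binom_pmf(y, m, marginal_p(x))


def pmf_set_level(y, m, x):
    """alpha shared by all m trials in a set: y is a finite mixture of binomials."""
    return sum(
        w * binom_pmf(y, m, logistic(a + BETA * x))
        for a, w in zip(SUPPORT, WEIGHTS)
    )


def mean_var(pmf, m, x):
    """Mean and variance of y under the given probability mass function."""
    mean = sum(y * pmf(y, m, x) for y in range(m + 1))
    var = sum((y - mean) ** 2 * pmf(y, m, x) for y in range(m + 1))
    return mean, var


if __name__ == "__main__":
    m, x = 10, 0.0
    print("p(x)        :", marginal_p(x))
    print("trial-level :", mean_var(pmf_trial_level, m, x))
    print("set-level   :", mean_var(pmf_set_level, m, x))
```

Under these assumptions both schemes give the same p(x) and the same mean for y, but only the set-level scheme produces a variance larger than the binomial value m p(x)(1 − p(x)). Plotting `marginal_p` over a range of x would show the smooth steps produced by a well-separated discrete distribution for α.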
