Hauck-Donner phenomenon (was Re: Summary of Robust Regression Algorithms)

Prof Brian Ripley (ripley@stats.ox.ac.uk)
Wed, 7 Jan 1998 14:40:24 GMT

> From <@uconnvm.uconn.edu:kent@darwin.eeb.uconn.edu> Wed Jan 7 12:51 GMT 1998
> To: ripley@stats.ox.ac.uk (Prof Brian Ripley)
> Cc: s-news@utstat.toronto.edu
> Subject: Re: Summary of Robust Regression Algorithms
> From: kent@darwin.eeb.uconn.edu (Kent E. Holsinger)
> >>>>> "Brian" == Prof Brian Ripley <ripley@stats.ox.ac.uk> writes:
> Brian> My best example of this not knowing the literature is the
> Brian> Hauck-Donner (1977) phenomenon: a small t-value in a
> Brian> logistic regression indicates either an insignificant OR a
> Brian> very significant effect, but step.glm assumes the first,
> Brian> and I bet few users of glm() stop to think.
> All right I confess. This is a new one for me. Could some one explain
> the Hauck-Donner effect to me? I understand that the t-values from
> glm() are a Wald approximation and may not be terribly reliable, but I
> don't understand how a small t-value could indicate "either an
> insignificant OR a very significant effect."
> Thanks for the help. It's finding gems like these that make this group
> so extraordinarily valuable.

There is a description in V&R2, pp. 237-8, given below. I guess I was
teasing people to look up Hauck-Donner phenomenon in our index.
(I seem to remember this was new to my co-author too, so you were in
good company. This is why it is such a good example of a fact which
would be useful to know but hardly anyone does. Don't ask me how I
knew: I only know that I first saw this in about 1980.)

There is a little-known phenomenon for binomial GLMs that was pointed
out by Hauck & Donner (1977: JASA 72:851-3). The standard errors and
t values derive from the Wald approximation to the log-likelihood,
obtained by expanding the log-likelihood in a second-order Taylor
expansion at the maximum likelihood estimates. If there are some
\hat\beta_i which are large, the curvature of the log-likelihood at
\hat{\vec{\beta}} can be much less than near \beta_i = 0, and so the
Wald approximation underestimates the change in log-likelihood on
setting \beta_i = 0. This happens in such a way that as |\hat\beta_i|
\to \infty, the t statistic tends to zero. Thus highly significant
coefficients according to the likelihood ratio test may have
non-significant t ratios.
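
A minimal numerical sketch of this, for the simplest case only: one
binomial sample with y successes out of n, \beta = logit(p), testing
\beta = 0. It is written in Python rather than S, and the function name
wald_and_lr and the sample sizes are illustrative assumptions, not
anything from V&R2 or glm(). It compares the Wald statistic with the
likelihood ratio statistic as the observed proportion approaches one.

```python
import math

def wald_and_lr(y, n):
    """Wald z and likelihood ratio statistic for H0: beta = 0, where
    beta = logit(p) in a single binomial sample with y successes out of n.
    Requires 0 < y < n so that the maximum likelihood estimate is finite."""
    p_hat = y / n
    beta_hat = math.log(p_hat / (1.0 - p_hat))
    # Standard error from the information n * p * (1 - p) evaluated at the MLE.
    se = 1.0 / math.sqrt(n * p_hat * (1.0 - p_hat))
    z = beta_hat / se
    # Twice the log-likelihood difference between p = p_hat and p = 1/2.
    lr = 2.0 * (y * math.log(2.0 * p_hat)
                + (n - y) * math.log(2.0 * (1.0 - p_hat)))
    return z, lr

if __name__ == "__main__":
    n = 1000  # hypothetical sample size
    for y in (600, 900, 990, 999):
        z, lr = wald_and_lr(y, n)
        print(f"y={y:4d}  beta_hat={math.log(y / (n - y)):6.3f}  "
              f"z={z:6.2f}  LR={lr:8.2f}")
```

In this sketch the likelihood ratio statistic keeps growing as y
approaches n, while z rises and then falls back, because the standard
error grows faster than \hat\beta does.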

To expand a little, if |t| is small it can EITHER mean that the Taylor
expansion works and hence the likelihood ratio statistic is small OR
that |\hat\beta_i| is very large, the approximation is poor and the
likelihood ratio statistic is large. (I was using `significant' as
meaning practically important.) But we can only tell if |\hat\beta_i|
is large by looking at the curvature at \beta_i=0, not at
|\hat\beta_i|. This really does happen: from later on in V&R2:

There is one fairly common circumstance in which both convergence
problems and the Hauck-Donner phenomenon (and trouble with
\sfn{step}) can occur. This is when the fitted probabilities
are extremely close to zero or one. Consider a medical diagnosis
problem with thousands of cases and around fifty binary
explanatory variables (which may arise from coding fewer
categorical factors); one of these indicators is rarely true but
always indicates that the disease is present. Then the
fitted probabilities of cases with that indicator should be one,
which can only be achieved by taking \hat\beta_i = \infty.
The result from \sfn{glm} will be warnings and an estimated
coefficient of around +/- 10 [and an insignificant t value].

That was based on a real-life example, which prompted me to write what
is now stepAIC. Once I had that to try, I found lots of examples.
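
The diagnosis scenario quoted above can be mimicked in a few lines. This
is only a sketch under stated assumptions: it is Python, not S; the
counts (2000 cases without the indicator, 400 of them diseased, and 20
cases with the indicator, all diseased) are invented; and the cap of 10
Newton iterations starting from zero is an assumption standing in for a
fixed iteration limit, not a description of what glm() does internally.
The function names are hypothetical.

```python
import math

def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def binom_loglik(y, n, p):
    """Binomial log-likelihood kernel; empty cells are skipped to avoid log(0)."""
    ll = 0.0
    if y > 0:
        ll += y * math.log(p)
    if n - y > 0:
        ll += (n - y) * math.log(1.0 - p)
    return ll

def fit_indicator_logit(y0, n0, y1, n1, iterations=10):
    """Newton-Raphson for logit(p) = b0 + b1 * x with one 0/1 indicator x.
    (y0, n0): diseased / total with x = 0; (y1, n1): the same with x = 1.
    Returns (b1, se_b1, z, lr) after a fixed number of iterations.
    Many more iterations would eventually underflow the weights."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p0, p1 = expit(b0), expit(b0 + b1)
        w0 = n0 * p0 * (1.0 - p0)
        w1 = n1 * p1 * (1.0 - p1)
        # For this saturated two-group model the Newton step on (b0, b1)
        # reduces to separate Newton steps on the two group logits.
        step0 = (y0 - n0 * p0) / w0
        step1 = (y1 - n1 * p1) / w1
        b0 += step0
        b1 += step1 - step0
    p0, p1 = expit(b0), expit(b0 + b1)
    # Wald standard error of b1 from the inverse information matrix.
    se_b1 = math.sqrt(1.0 / (n0 * p0 * (1.0 - p0))
                      + 1.0 / (n1 * p1 * (1.0 - p1)))
    z = b1 / se_b1
    # Likelihood ratio statistic against the model with b1 = 0.
    p_pool = (y0 + y1) / (n0 + n1)
    ll_null = binom_loglik(y0 + y1, n0 + n1, p_pool)
    ll_fit = binom_loglik(y0, n0, p0) + binom_loglik(y1, n1, p1)
    return b1, se_b1, z, 2.0 * (ll_fit - ll_null)

if __name__ == "__main__":
    b1, se, z, lr = fit_indicator_logit(400, 2000, 20, 20)
    print(f"b1={b1:.2f}  se={se:.1f}  z={z:.2f}  LR={lr:.1f}")
```

With these made-up counts the coefficient grows by roughly one per
iteration and never converges, its standard error is far larger than
the coefficient, and the likelihood ratio statistic is nonetheless
large: a small |t| alongside a strong effect.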

Brian Ripley