The IWLS algorithm used to fit conditional logit models

The package “mclogit” fits conditional logit models using a maximum likelihood estimator. It does this by maximizing the log-likelihood function using an iterative weighted least-squares (IWLS) algorithm, which follows the algorithm used by the glm.fit() function from the “stats” package of R.

If \pi_{ij} is the probability that individual i chooses alternative j from his/her choice set \mathcal{S}_i, where

\pi_{ij}=\frac{\exp(\eta_{ij})}{\sum_k{\in\mathcal{S}_i}\exp(\eta_{ik})}

and if y_{ij} is the dummy variable with equals 1 if individual i chooses alternative j and equals 0 otherwise, the log-likelihood function (given that the choices are identically independent distributed given \pi_{ij}) can be written as

\ell=\sum_{i,j}y_{ij}\ln\pi_{ij}
    =\sum_{i,j}y_{ij}\eta_{ij}-\sum_i\ln\left(\sum_j\exp(\eta_{ij})\right)

If the data are aggregated in the terms of counts such that n_{ij} is the number of individuals with the same choice set and the same choice probabilities \pi_{ij} that have chosen alternative j, the log-likelihood is (given that the choices are identically independent distributed given \pi_{ij})

\ell=\sum_{i,j}n_{ij}\ln\pi_{ij}
    =\sum_{i,j}n_{ij}\eta_{ij}-\sum_in_{i+}\ln\left(\sum_j\exp(\eta_{ij})\right)

where n_{i+}=\sum_{j\in\mathcal{S}_i}n_{ij}.

If

\eta_{ij} = \alpha_1x_{1ij}+\cdots+\alpha_rx_{rij}=\bm{x}_{ij}'\bm{\alpha}

then the gradient of the log-likelihood with respect to the coefficient vector \bm{\alpha} is

\frac{\partial\ell}{\partial\bm{\alpha}}
=
\sum_{i,j}
\frac{\partial\eta_{ij}}{\partial\bm{\alpha}}
\frac{\partial\ell}{\partial\eta_{ij}}
=
\sum_{i,j}
\bm{x}_{ij}
(n_{ij}-n_{i+}\pi_{ij})
=
\sum_{i,j}
\bm{x}_{ij}
n_{i+}
(y_{ij}-\pi_{ij})
=
\bm{X}'\bm{N}(\bm{y}-\bm{\pi})

and the Hessian is

\frac{\partial^2\ell}{\partial\bm{\alpha}\partial\bm{\alpha}'}
=
\sum_{i,j}
\frac{\partial\eta_{ij}}{\partial\bm{\alpha}}
\frac{\partial\eta_{ij}}{\partial\bm{\alpha}'}
\frac{\partial\ell^2}{\partial\eta_{ij}^2}
=
-
\sum_{i,j,k}
\bm{x}_{ij}
n_{i+}
(\delta_{jk}-\pi_{ij}\pi_{ik})
\bm{x}_{ij}'
=
-
\bm{X}'\bm{W}\bm{X}

Here y_{ij} is n_{ij}n_{i+}^{-1}, while \bm{N} is a diagonal matrix with diagonal elements n_{i+}.

Newton-Raphson iterations then take the form

\bm{\alpha}^{(s+1)}
=
\bm{\alpha}^{(s)}
-
\left(
\frac{\partial^2\ell}{\partial\bm{\alpha}\partial\bm{\alpha}'}
\right)^{-1}
\frac{\partial\ell}{\partial\bm{\alpha}}
=
\bm{\alpha}^{(s)}
+
\left(
\bm{X}'\bm{W}\bm{X}
\right)^{-1}
\bm{X}'\bm{N}(\bm{y}-\bm{\pi})

where \bm{\pi} and \bm{W} are evaluated at \bm{\alpha}=\bm{\alpha}^{(s)}.

Multiplying by \bm{X}'\bm{W}\bm{X} gives

\bm{X}'\bm{W}\bm{X}
\bm{\alpha}^{(s+1)}
=
\bm{X}'\bm{W}\bm{X}
\bm{\alpha}^{(s)}
+
\bm{X}'\bm{N}(\bm{y}-\bm{\pi})
=
\bm{X}'\bm{W}
\left(\bm{X}\bm{\alpha}^{(s)}+\bm{W}^-\bm{N}(\bm{y}-\bm{\pi})\right)
=
\bm{X}'\bm{W}\bm{y}^*

where \bm{W}^- is a generalized inverse of \bm{W} and \bm{y}^* is a “working response vector” with elements

y_{ij}^*=\bm{x}_{ij}'\bm{\alpha}^{(s)}+\frac{y_{ij}-\pi_{ij}}{\pi_{ij}}

The IWLS algorithm thus involves the following steps:

  1. Create some suitable starting values for \bm{\pi}, \bm{W}, and \bm{y}^*

  2. Construct the “working dependent variable” \bm{y}^*

  3. Solve the equation

    \bm{X}'\bm{W}\bm{X}
\bm{\alpha}
=
\bm{X}'\bm{W}\bm{y}^*

    for \bm{\alpha}.

  4. Compute updated \bm{\eta}, \bm{\pi}, \bm{W}, and bm{y}^*.

  5. Compute the updated value for the log-likelihood or the deviance

    d=2\sum_{i,j}n_{ij}\ln\frac{y_{ij}}{\pi_{ij}}

  6. If the decrease of the deviance (or the increase of the log-likelihood) is smaller than a given tolerance criterian (typically \Delta d \leq 10^{-7}) stop the algorighm and declare it as converged. Otherwise go back to step 2 with the updated value of \bm{\alpha}.

The starting values for the algorithm used by the mclogit package are constructe as follows:

  1. Set

    \eta_{ij}^{(0)} = \ln (n_{ij}+\tfrac12)
                - \frac1{q_i}\sum_{k\in\mathcal{S}_i}\ln (n_{ij}+\tfrac12)

    (where q_i is the size of the choice set \mathcal{S}_i)

  2. Compute the starting values of the choice probalities \pi_{ij}^{(0)} according to the equation at the beginning of the page

  3. Compute intial values of the working dependent variable according to

    y_{ij}^{*(0)}
=
\eta_{ij}^{(0)}+\frac{y_{ij}-\pi_{ij}^{(0)}}{\pi_{ij}^{(0)}}