# The IWLS algorithm used to fit conditional logit models¶

The package “mclogit” fits conditional logit models by maximum likelihood. It maximizes the log-likelihood function using an iterative weighted least-squares (IWLS) algorithm, which follows the algorithm used by the glm.fit() function from the “stats” package of R.

If $$\pi_{ij}$$ is the probability that individual $$i$$ chooses alternative $$j$$ from his/her choice set $$\mathcal{S}_i$$, where

$\pi_{ij}=\frac{\exp(\eta_{ij})}{\sum_{k\in\mathcal{S}_i}\exp(\eta_{ik})}$

and if $$y_{ij}$$ is the dummy variable that equals 1 if individual $$i$$ chooses alternative $$j$$ and equals 0 otherwise, the log-likelihood function (given that the choices are independent and identically distributed given $$\pi_{ij}$$) can be written as

$\ell=\sum_{i,j}y_{ij}\ln\pi_{ij} =\sum_{i,j}y_{ij}\eta_{ij}-\sum_i\ln\left(\sum_j\exp(\eta_{ij})\right)$
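As a quick numerical illustration, the following Python sketch (the package itself is written in R) evaluates both forms of the log-likelihood for a single individual; the choice set of three alternatives and the linear predictors are made up:

```python
import math

# Hypothetical linear predictors eta_ij for one individual i with a
# choice set of three alternatives (illustrative numbers only)
eta = [0.2, -0.5, 1.1]
y = [0, 0, 1]  # the individual chose the third alternative

# Choice probabilities: pi_ij = exp(eta_ij) / sum_k exp(eta_ik)
denom = sum(math.exp(e) for e in eta)
pi = [math.exp(e) / denom for e in eta]

# The two forms of the log-likelihood shown above agree
ll_direct = sum(y_j * math.log(p_j) for y_j, p_j in zip(y, pi))
ll_expanded = sum(y_j * e_j for y_j, e_j in zip(y, eta)) - math.log(denom)
print(ll_direct, ll_expanded)
```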

If the data are aggregated in terms of counts, such that $$n_{ij}$$ is the number of individuals with the same choice set and the same choice probabilities $$\pi_{ij}$$ that have chosen alternative $$j$$, the log-likelihood (given that the choices are independent and identically distributed given $$\pi_{ij}$$) is

$\ell=\sum_{i,j}n_{ij}\ln\pi_{ij} =\sum_{i,j}n_{ij}\eta_{ij}-\sum_in_{i+}\ln\left(\sum_j\exp(\eta_{ij})\right)$

where $$n_{i+}=\sum_{j\in\mathcal{S}_i}n_{ij}$$.

If

$\eta_{ij} = \alpha_1x_{1ij}+\cdots+\alpha_rx_{rij}=\bm{x}_{ij}'\bm{\alpha}$

then the gradient of the log-likelihood with respect to the coefficient vector $$\bm{\alpha}$$ is

$\frac{\partial\ell}{\partial\bm{\alpha}} = \sum_{i,j} \frac{\partial\eta_{ij}}{\partial\bm{\alpha}} \frac{\partial\ell}{\partial\eta_{ij}} = \sum_{i,j} \bm{x}_{ij} (n_{ij}-n_{i+}\pi_{ij}) = \sum_{i,j} \bm{x}_{ij} n_{i+} (y_{ij}-\pi_{ij}) = \bm{X}'\bm{N}(\bm{y}-\bm{\pi})$

and the Hessian is

$\frac{\partial^2\ell}{\partial\bm{\alpha}\partial\bm{\alpha}'} = \sum_{i,j,k} \frac{\partial\eta_{ij}}{\partial\bm{\alpha}} \frac{\partial^2\ell}{\partial\eta_{ij}\partial\eta_{ik}} \frac{\partial\eta_{ik}}{\partial\bm{\alpha}'} = - \sum_{i,j,k} \bm{x}_{ij} n_{i+} \pi_{ij} (\delta_{jk}-\pi_{ik}) \bm{x}_{ik}' = - \bm{X}'\bm{W}\bm{X}$

Here $$y_{ij}$$ is $$n_{ij}n_{i+}^{-1}$$, $$\bm{N}$$ is a diagonal matrix with diagonal elements $$n_{i+}$$, and $$\bm{W}$$ is the block-diagonal matrix with blocks $$\bm{W}_i=n_{i+}\left(\operatorname{diag}(\bm{\pi}_i)-\bm{\pi}_i\bm{\pi}_i'\right)$$.
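The gradient formula can be checked numerically. The sketch below (Python, with made-up values for $$\bm{X}$$, the counts, and $$\bm{\alpha}$$) compares the analytic gradient $$\sum_j \bm{x}_{ij}(n_{ij}-n_{i+}\pi_{ij})$$ for a single choice set against central finite differences of the log-likelihood:

```python
import math

# One choice set with three alternatives and two coefficients;
# all values are made up for illustration
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # rows are x_ij'
n = [10.0, 4.0, 6.0]                       # counts n_ij
n_plus = sum(n)

def probs(alpha):
    eta = [sum(x_k * a_k for x_k, a_k in zip(x, alpha)) for x in X]
    denom = sum(math.exp(e) for e in eta)
    return [math.exp(e) / denom for e in eta]

def loglik(alpha):
    return sum(n_j * math.log(p_j) for n_j, p_j in zip(n, probs(alpha)))

alpha = [0.2, -0.1]
pi = probs(alpha)

# Analytic gradient: sum_j x_ij (n_ij - n_i+ pi_ij)
grad = [sum(X[j][k] * (n[j] - n_plus * pi[j]) for j in range(3))
        for k in range(2)]

# Central-difference approximation of the same gradient
h = 1e-6
grad_num = []
for k in range(2):
    up, dn = alpha[:], alpha[:]
    up[k] += h
    dn[k] -= h
    grad_num.append((loglik(up) - loglik(dn)) / (2 * h))
```

The two gradients agree up to the finite-difference error.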

Newton-Raphson iterations then take the form

$\bm{\alpha}^{(s+1)} = \bm{\alpha}^{(s)} - \left( \frac{\partial^2\ell}{\partial\bm{\alpha}\partial\bm{\alpha}'} \right)^{-1} \frac{\partial\ell}{\partial\bm{\alpha}} = \bm{\alpha}^{(s)} + \left( \bm{X}'\bm{W}\bm{X} \right)^{-1} \bm{X}'\bm{N}(\bm{y}-\bm{\pi})$

where $$\bm{\pi}$$ and $$\bm{W}$$ are evaluated at $$\bm{\alpha}=\bm{\alpha}^{(s)}$$.

Multiplying by $$\bm{X}'\bm{W}\bm{X}$$ gives

$\bm{X}'\bm{W}\bm{X} \bm{\alpha}^{(s+1)} = \bm{X}'\bm{W}\bm{X} \bm{\alpha}^{(s)} + \bm{X}'\bm{N}(\bm{y}-\bm{\pi}) = \bm{X}'\bm{W} \left(\bm{X}\bm{\alpha}^{(s)}+\bm{W}^-\bm{N}(\bm{y}-\bm{\pi})\right) = \bm{X}'\bm{W}\bm{y}^*$

where $$\bm{W}^-$$ is a generalized inverse of $$\bm{W}$$ and $$\bm{y}^*$$ is a “working response vector” with elements

$y_{ij}^*=\bm{x}_{ij}'\bm{\alpha}^{(s)}+\frac{y_{ij}-\pi_{ij}}{\pi_{ij}}$
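For concreteness, here is how the elements of the working response vector can be computed for a single choice set (a Python sketch with made-up counts and linear predictors, not the package's R code):

```python
import math

# Made-up counts n_ij and current linear predictors eta_ij = x_ij' alpha^(s)
n = [10.0, 4.0, 6.0]
n_plus = sum(n)
eta = [0.3, -0.4, 0.1]

denom = sum(math.exp(e) for e in eta)
pi = [math.exp(e) / denom for e in eta]
y = [n_j / n_plus for n_j in n]  # y_ij = n_ij / n_i+

# y*_ij = x_ij' alpha^(s) + (y_ij - pi_ij) / pi_ij
y_star = [e + (y_j - p_j) / p_j for e, y_j, p_j in zip(eta, y, pi)]
```

Note that $$\sum_j \pi_{ij}(y_{ij}^*-\eta_{ij})=\sum_j(y_{ij}-\pi_{ij})=0$$ within each choice set.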

The IWLS algorithm thus involves the following steps:

1. Create suitable starting values for $$\bm{\eta}$$, $$\bm{\pi}$$, and $$\bm{W}$$
2. Construct the “working dependent variable” $$\bm{y}^*$$
3. Solve the equation

$\bm{X}'\bm{W}\bm{X} \bm{\alpha} = \bm{X}'\bm{W}\bm{y}^*$

for $$\bm{\alpha}$$.

4. Compute updated $$\bm{\eta}$$, $$\bm{\pi}$$, $$\bm{W}$$, and $$\bm{y}^*$$.
5. Compute the updated value for the log-likelihood or the deviance

$d=2\sum_{i,j}n_{ij}\ln\frac{y_{ij}}{\pi_{ij}}$

6. If the decrease of the deviance (or the increase of the log-likelihood) is smaller than a given tolerance criterion (typically $$\Delta d \leq 10^{-7}$$), stop the algorithm and declare it converged. Otherwise, go back to step 2 with the updated value of $$\bm{\alpha}$$.
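The steps above can be sketched as follows for a single choice set with three alternatives and two coefficients (a Python sketch with made-up data, not the package's R implementation; the $$2\times2$$ normal equations are solved by Cramer's rule, whereas a real implementation would use a proper linear solver):

```python
import math

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # rows are x_ij'
n = [10.0, 4.0, 6.0]                      # counts n_ij (made up)
n_plus = sum(n)
y = [n_j / n_plus for n_j in n]           # y_ij = n_ij / n_i+

def probs(eta):
    denom = sum(math.exp(e) for e in eta)
    return [math.exp(e) / denom for e in eta]

def deviance(pi):
    return 2.0 * sum(n_j * math.log(y_j / p_j)
                     for n_j, y_j, p_j in zip(n, y, pi))

# Step 1: starting values (centered log counts, as at the end of this page)
log_n = [math.log(n_j + 0.5) for n_j in n]
eta = [l - sum(log_n) / len(log_n) for l in log_n]

dev_old = float("inf")
for _ in range(25):
    pi = probs(eta)
    # Step 2: working dependent variable
    y_star = [e + (y_j - p_j) / p_j for e, y_j, p_j in zip(eta, y, pi)]
    # Weight matrix W = n_i+ (diag(pi) - pi pi') for this choice set
    W = [[n_plus * pi[j] * ((1.0 if j == k else 0.0) - pi[k])
          for k in range(3)] for j in range(3)]
    # Step 3: solve X'WX alpha = X'W y* (2x2 system via Cramer's rule)
    A = [[sum(X[j][p] * W[j][k] * X[k][q]
              for j in range(3) for k in range(3))
          for q in range(2)] for p in range(2)]
    b = [sum(X[j][p] * W[j][k] * y_star[k]
             for j in range(3) for k in range(3)) for p in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    alpha = [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
             (A[0][0] * b[1] - A[1][0] * b[0]) / det]
    # Step 4: update eta and pi; step 5: deviance; step 6: stopping rule
    eta = [sum(x_k * a_k for x_k, a_k in zip(x, alpha)) for x in X]
    dev = deviance(probs(eta))
    if dev_old - dev < 1e-7:
        break
    dev_old = dev
```

Because this toy model is saturated, the deviance converges to zero in a handful of iterations.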

The starting values for the algorithm used by the mclogit package are constructed as follows:

1. Set

$\eta_{ij}^{(0)} = \ln (n_{ij}+\tfrac12) - \frac1{q_i}\sum_{k\in\mathcal{S}_i}\ln (n_{ik}+\tfrac12)$

(where $$q_i$$ is the size of the choice set $$\mathcal{S}_i$$)

2. Compute the starting values of the choice probabilities $$\pi_{ij}^{(0)}$$ according to the equation at the beginning of the page
3. Compute initial values of the working dependent variable according to

$y_{ij}^{*(0)} = \eta_{ij}^{(0)}+\frac{y_{ij}-\pi_{ij}^{(0)}}{\pi_{ij}^{(0)}}$
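Putting the three steps together for a single choice set (a Python sketch with made-up counts, not the package's R code):

```python
import math

# Made-up counts n_ij for one choice set
n = [10.0, 4.0, 6.0]
n_plus = sum(n)
q = len(n)  # q_i, the size of the choice set

# Step 1: centered log counts
log_n = [math.log(n_j + 0.5) for n_j in n]
eta0 = [l - sum(log_n) / q for l in log_n]

# Step 2: starting choice probabilities via the softmax formula
denom = sum(math.exp(e) for e in eta0)
pi0 = [math.exp(e) / denom for e in eta0]

# Step 3: initial working dependent variable, with y_ij = n_ij / n_i+
y = [n_j / n_plus for n_j in n]
y_star0 = [e + (y_j - p_j) / p_j for e, y_j, p_j in zip(eta0, y, pi0)]
```

By construction the $$\eta_{ij}^{(0)}$$ sum to zero within each choice set, and the $$\pi_{ij}^{(0)}$$ sum to one.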