The IWLS algorithm used to fit conditional logit models
The package “mclogit” fits conditional logit models by maximum likelihood. It does so by maximizing the log-likelihood function with an iterative weighted least-squares (IWLS) algorithm, patterned after the one used by the glm.fit() function from the “stats” package of R.
If \(\pi_{ij}\) is the probability that individual \(i\) chooses alternative \(j\) from his/her choice set \(\mathcal{S}_i\), where

\[\pi_{ij} = \frac{\exp(\eta_{ij})}{\sum_{k\in\mathcal{S}_i}\exp(\eta_{ik})},\]

and if \(y_{ij}\) is the dummy variable that equals 1 if individual \(i\) chooses alternative \(j\) and equals 0 otherwise, then the log-likelihood function (given that the choices are conditionally independent given the \(\pi_{ij}\)) can be written as

\[\ell = \sum_{i,j} y_{ij}\ln\pi_{ij}.\]
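As a concrete illustration, the choice probabilities and one individual's log-likelihood contribution can be computed as follows. This is a NumPy sketch with made-up numbers, not code from the mclogit package:

```python
import numpy as np

def choice_probs(eta):
    """pi_ij = exp(eta_ij) / sum of exp(eta_ik) over the choice set."""
    e = np.exp(eta - eta.max())   # shift by the maximum for numerical stability
    return e / e.sum()

# hypothetical linear predictors for a choice set with three alternatives
eta = np.array([0.5, 0.0, -0.5])
pi = choice_probs(eta)

# dummy vector y_ij: this individual chooses the first alternative
y = np.array([1.0, 0.0, 0.0])
log_lik = np.sum(y * np.log(pi))  # contribution to the log-likelihood
```

Shifting by the maximum before exponentiating leaves the probabilities unchanged but avoids overflow for large linear predictors.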
If the data are aggregated into counts, such that \(n_{ij}\) is the number of individuals with the same choice set and the same choice probabilities \(\pi_{ij}\) that have chosen alternative \(j\), the log-likelihood (again given conditional independence of the choices given the \(\pi_{ij}\)) is

\[\ell = \sum_{i,j} n_{ij}\ln\pi_{ij},\]

where \(n_{i+}=\sum_{j\in\mathcal{S}_i}n_{ij}\).
If

\[\eta_{ij} = \alpha_1 x_{1ij} + \cdots + \alpha_r x_{rij} = \boldsymbol{x}_{ij}'\boldsymbol{\alpha},\]

then the gradient of the log-likelihood with respect to the coefficient vector \(\boldsymbol{\alpha}\) is

\[\frac{\partial\ell}{\partial\boldsymbol{\alpha}} = \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}),\]

and the Hessian is

\[\frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\,\partial\boldsymbol{\alpha}'} = -\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}.\]

Here \(y_{ij}\) is \(n_{ij}n_{i+}^{-1}\), \(\boldsymbol{N}\) is a diagonal matrix with diagonal elements \(n_{i+}\), and \(\boldsymbol{W}\) is a block-diagonal matrix with one block \(\boldsymbol{W}_i = n_{i+}\bigl(\operatorname{diag}(\boldsymbol{\pi}_i)-\boldsymbol{\pi}_i\boldsymbol{\pi}_i'\bigr)\) per choice set.
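For a single choice set these quantities can be sketched in NumPy as follows. The counts and attributes are made up for illustration, and the weight block is assumed to be the standard multinomial form \(n_{i+}\bigl(\operatorname{diag}(\boldsymbol{\pi}_i)-\boldsymbol{\pi}_i\boldsymbol{\pi}_i'\bigr)\):

```python
import numpy as np

# counts n_ij and attributes x_ij for one choice set with three alternatives
n = np.array([30.0, 50.0, 20.0])
X = np.array([[1.0, 0.2],
              [0.0, 1.0],
              [1.0, -0.5]])
alpha = np.array([0.1, -0.3])

eta = X @ alpha
e = np.exp(eta - eta.max())
pi = e / e.sum()                  # pi_ij

n_plus = n.sum()                  # n_{i+}
y = n / n_plus                    # y_ij = n_ij / n_{i+}

# gradient contribution X' N (y - pi), with N = diag(n_{i+})
grad = X.T @ (n_plus * (y - pi))

# Hessian contribution -X'WX with the multinomial weight block
W = n_plus * (np.diag(pi) - np.outer(pi, pi))
hess = -X.T @ W @ X
```

Note that each row of \(\boldsymbol{W}_i\) sums to zero, so \(\boldsymbol{W}\) is singular; this is why a generalized inverse \(\boldsymbol{W}^-\) appears later in the derivation.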
Newton-Raphson iterations then take the form

\[\boldsymbol{\alpha}^{(s+1)} = \boldsymbol{\alpha}^{(s)} + \left(\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}\right)^{-1}\boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}),\]

where \(\boldsymbol{\pi}\) and \(\boldsymbol{W}\) are evaluated at \(\boldsymbol{\alpha}=\boldsymbol{\alpha}^{(s)}\).
Multiplying both sides by \(\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}\) gives

\[\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}\,\boldsymbol{\alpha}^{(s+1)} = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}\,\boldsymbol{\alpha}^{(s)} + \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}) = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*,\]

where \(\boldsymbol{W}^-\) is a generalized inverse of \(\boldsymbol{W}\) and \(\boldsymbol{y}^* = \boldsymbol{X}\boldsymbol{\alpha}^{(s)} + \boldsymbol{W}^-\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})\) is a “working response vector” with elements

\[y_{ij}^* = \eta_{ij} + \frac{y_{ij}-\pi_{ij}}{\pi_{ij}}.\]
The IWLS algorithm thus involves the following steps:

- Create some suitable starting values for \(\boldsymbol{\pi}\), \(\boldsymbol{W}\), and \(\boldsymbol{y}^*\).
- Construct the “working dependent variable” \(\boldsymbol{y}^*\).
- Solve the equation
  \[\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \boldsymbol{\alpha} = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*\]
  for \(\boldsymbol{\alpha}\).
- Compute updated values of \(\boldsymbol{\eta}\), \(\boldsymbol{\pi}\), \(\boldsymbol{W}\), and \(\boldsymbol{y}^*\).
- Compute the updated value of the log-likelihood or of the deviance
  \[d=2\sum_{i,j}n_{ij}\ln\frac{y_{ij}}{\pi_{ij}}\]
- If the decrease of the deviance (or the increase of the log-likelihood) is smaller than a given tolerance criterion (typically \(\Delta d \leq 10^{-7}\)), stop the algorithm and declare it converged. Otherwise go back to the second step with the updated value of \(\boldsymbol{\alpha}\).
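The deviance criterion can be checked directly from the counts and fitted probabilities. A small NumPy sketch with made-up numbers (not package code):

```python
import numpy as np

n = np.array([20.0, 30.0, 10.0])   # counts n_ij for one choice set
pi = np.array([0.4, 0.4, 0.2])     # hypothetical fitted probabilities
y = n / n.sum()                    # observed shares y_ij = n_ij / n_{i+}

# deviance d = 2 * sum n_ij * ln(y_ij / pi_ij)
d = 2.0 * np.sum(n * np.log(y / pi))
```

By Gibbs' inequality the deviance is nonnegative, and it is exactly zero when the fitted probabilities reproduce the observed shares, which makes its decrease a natural convergence criterion.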
The starting values for the algorithm used by the mclogit package are constructed as follows:
- Set
  \[\eta_{ij}^{(0)} = \ln (n_{ij}+\tfrac12) - \frac1{q_i}\sum_{k\in\mathcal{S}_i}\ln (n_{ik}+\tfrac12)\]
  (where \(q_i\) is the size of the choice set \(\mathcal{S}_i\)).
- Compute the starting values of the choice probabilities \(\pi_{ij}^{(0)}\) according to the equation at the beginning of the page.
- Compute initial values of the working dependent variable according to
  \[y_{ij}^{*(0)} = \eta_{ij}^{(0)}+\frac{y_{ij}-\pi_{ij}^{(0)}}{\pi_{ij}^{(0)}}\]
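Putting the pieces together, the whole iteration with these starting values might look as follows. This is a NumPy sketch under the assumptions above (block-diagonal multinomial weights, strictly positive counts); the function name `iwls_sketch` and all data are hypothetical, and this is not the mclogit package's actual implementation:

```python
import numpy as np

def iwls_sketch(X, n, sets, maxit=25, tol=1e-7):
    """IWLS loop for a conditional logit model, as outlined above.
    X:    attributes, one row per (individual, alternative)
    n:    counts n_ij (assumed strictly positive here)
    sets: list of index arrays, one per choice set S_i"""
    # starting values: eta^(0) = ln(n_ij + 1/2), centred within each choice set
    eta = np.log(n + 0.5)
    for s in sets:
        eta[s] -= eta[s].mean()
    dev_old = np.inf
    alpha = np.zeros(X.shape[1])
    for _ in range(maxit):
        A = np.zeros((X.shape[1], X.shape[1]))   # accumulates X'WX
        b = np.zeros(X.shape[1])                 # accumulates X'W y*
        dev = 0.0
        for s in sets:
            e = np.exp(eta[s] - eta[s].max())
            p = e / e.sum()                      # pi_ij
            n_plus = n[s].sum()                  # n_{i+}
            y = n[s] / n_plus                    # y_ij = n_ij / n_{i+}
            W = n_plus * (np.diag(p) - np.outer(p, p))
            ystar = eta[s] + (y - p) / p         # working response
            A += X[s].T @ W @ X[s]
            b += X[s].T @ (W @ ystar)
            dev += 2.0 * np.sum(n[s] * np.log(y / p))
        alpha = np.linalg.solve(A, b)            # solve X'WX alpha = X'W y*
        eta = X @ alpha                          # updated linear predictor
        if abs(dev_old - dev) < tol:             # convergence on the deviance
            break
        dev_old = dev
    return alpha, dev

# hypothetical data: two choice sets of three alternatives each
sets = [np.arange(0, 3), np.arange(3, 6)]
n = np.array([20.0, 30.0, 10.0, 15.0, 25.0, 40.0])
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0],
              [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]])
alpha, dev = iwls_sketch(X, n, sets)
```

At a fixed point of this iteration the normal equations reduce to \(\boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})=\boldsymbol{0}\), i.e. the score equations of the conditional logit model.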