mclogit: Multinomial Logit Models, with or without Random Effects or Overdispersion

The package “mclogit” allows the estimation of the two main varieties of multinomial logit models: baseline-category logit models and conditional logit models. It is published on CRAN. Development takes place on GitHub, where both releases and the development tree can be found.

Baseline-category logit models

Multinomial baseline-category logit models are a generalisation of logistic regression that allow one to model not only binary or dichotomous responses but also polychotomous responses. In addition, they allow one to model responses in the form of counts that have a pre-determined sum. These models are described in Agresti (2002). Estimating these models is also supported by the function multinom() in the R package “nnet” (Venables and Ripley 2002). In the package “mclogit”, the function to estimate these models is called mblogit() (see the relevant manual page). It uses the infrastructure for estimating conditional logit models, exploiting the fact that baseline-category logit models can be re-expressed as conditional logit models.

Baseline-category logit models are constructed as follows. Suppose a categorical dependent variable or response with categories \(j=1,\ldots,q\) is observed for individuals \(i=1,\ldots,n\). Let \(\pi_{ij}\) denote the probability that the value of the dependent variable for individual \(i\) is equal to \(j\); then the baseline-category logit model takes the form:

\[\begin{split}\pi_{ij} = \begin{cases} \dfrac{\exp(\alpha_{j0}+\alpha_{j1}x_{1i}+\cdots+\alpha_{jr}x_{ri})} {1+\sum_{k>1}\exp(\alpha_{k0}+\alpha_{k1}x_{1i}+\cdots+\alpha_{kr}x_{ri})} & \text{for } j>1\\[20pt] \dfrac{1} {1+\sum_{k>1}\exp(\alpha_{k0}+\alpha_{k1}x_{1i}+\cdots+\alpha_{kr}x_{ri})} & \text{for } j=1 \end{cases}\end{split}\]

where the first category (\(j=1\)) is the baseline category.
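The formula can be evaluated directly. The following Python sketch is a plain illustration and not part of “mclogit”, whose estimation functions are written in R. It computes \(\pi_{i1},\ldots,\pi_{iq}\) for a single individual. The function name baseline_category_probs and the coefficient values are hypothetical.

```python
import math

def baseline_category_probs(alpha, x):
    """Return [pi_i1, ..., pi_iq] for one individual i.

    alpha: q-1 coefficient lists, one per non-baseline category j = 2..q,
           each of the form [alpha_j0, alpha_j1, ..., alpha_jr].
    x:     covariate values [x_1i, ..., x_ri] of individual i.
    """
    # Linear predictors alpha_j0 + alpha_j1 x_1i + ... + alpha_jr x_ri for j = 2..q
    etas = [a[0] + sum(a_k * x_k for a_k, x_k in zip(a[1:], x)) for a in alpha]
    # The baseline category contributes exp(0) = 1 to the common denominator
    denom = 1.0 + sum(math.exp(eta) for eta in etas)
    return [1.0 / denom] + [math.exp(eta) / denom for eta in etas]

# Hypothetical example with q = 3 categories and r = 2 covariates
alpha = [[0.5, 1.0, -0.5],   # category 2
         [-1.0, 0.2, 0.3]]   # category 3
probs = baseline_category_probs(alpha, [1.0, 2.0])
print([round(p, 4) for p in probs])
```

Fixing the baseline's linear predictor at zero is what produces the "1 +" in both denominators and makes the probabilities sum to one.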

Equivalently, the model can be expressed in terms of log-odds relative to the baseline category:

\[\ln\frac{\pi_{ij}}{\pi_{i1}} = \alpha_{j0}+\alpha_{j1}x_{1i}+\cdots+\alpha_{jr}x_{ri}.\]

Here the relevant parameters of the model are the coefficients \(\alpha_{jk}\) which describe how the values of independent variables (numbered \(k=1,\ldots,r\)) affect the relative chances of the response taking a value \(j\) versus taking the value \(1\). Note that there is one coefficient for each independent variable and each response other than the baseline category.
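A quick numerical check of the log-odds form and of the parameter count may help. In the Python sketch below, which uses hypothetical coefficient values, \(\ln(\pi_{ij}/\pi_{i1})\) reproduces each linear predictor. Counting the intercepts \(\alpha_{j0}\) as well, a model with \(q\) categories and \(r\) independent variables has \((q-1)(r+1)\) coefficients.

```python
import math

def linear_predictors(alpha, x):
    """alpha_j0 + alpha_j1 x_1i + ... + alpha_jr x_ri for each non-baseline category j."""
    return [a[0] + sum(a_k * x_k for a_k, x_k in zip(a[1:], x)) for a in alpha]

def probs(alpha, x):
    """Category probabilities, with the baseline's linear predictor fixed at 0."""
    etas = [0.0] + linear_predictors(alpha, x)
    denom = sum(math.exp(e) for e in etas)
    return [math.exp(e) / denom for e in etas]

alpha = [[0.5, 1.0, -0.5], [-1.0, 0.2, 0.3]]   # hypothetical: q = 3, r = 2
x = [1.0, 2.0]
p = probs(alpha, x)

# Log-odds of category j versus the baseline category 1
log_odds = [math.log(p_j / p[0]) for p_j in p[1:]]
print(log_odds, linear_predictors(alpha, x))

# One intercept plus r slopes for each of the q - 1 non-baseline categories
q, r = len(alpha) + 1, len(x)
n_coef = (q - 1) * (r + 1)
print(n_coef)
```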

Conditional logit models

Conditional logit models are motivated by a variety of considerations, notably as a way to model binary panel data or responses in case-control studies. The variant supported by the package “mclogit” is motivated by the analysis of discrete choices and goes back to McFadden (1974). Here, a series of individuals \(i=1,\ldots,n\) is observed to have made a choice (represented by a number \(j\)) from a choice set \(\mathcal{S}_i\), the set of alternatives at the individual’s disposal. Each alternative \(j\) in the choice set can be described by the values \(x_{1ij},\ldots,x_{rij}\) of \(r\) attribute variables (where the variables are enumerated as \(h=1,\ldots,r\)). (Note that, in contrast to the baseline-category logit model, these values vary between choice alternatives.) Conditional logit models then posit that individual \(i\) chooses alternative \(j\) from his or her choice set \(\mathcal{S}_i\) with probability

\[\pi_{ij} = \frac{\exp(\alpha_1x_{1ij}+\cdots+\alpha_rx_{rij})} {\sum_{k\in\mathcal{S}_i}\exp(\alpha_1x_{1ik}+\cdots+\alpha_rx_{rik})}.\]

It is worth noting that the conditional logit model does not require that all individuals face the same choice sets, only that the alternatives in each choice set can be distinguished from one another by the attribute variables.
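The following Python sketch makes the point about differing choice sets concrete. The attribute values, the coefficients, and the function name conditional_logit_probs are all hypothetical. The same coefficients apply to every individual, while the number of alternatives varies.

```python
import math

def conditional_logit_probs(alpha, X):
    """Choice probabilities for one individual's choice set S_i.

    alpha: coefficients [alpha_1, ..., alpha_r], shared by all alternatives.
    X:     one attribute row [x_1ij, ..., x_rij] per alternative j in S_i.
    """
    etas = [sum(a * x for a, x in zip(alpha, row)) for row in X]
    m = max(etas)  # subtract the maximum for numerical stability; it cancels in the ratio
    weights = [math.exp(e - m) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

alpha = [0.8, -1.5]  # hypothetical coefficients, r = 2

# Hypothetical choice sets of different sizes: individual 1 faces three
# alternatives, individual 2 only two.
choice_sets = {
    1: [[1.0, 0.2], [0.5, 0.1], [2.0, 0.9]],
    2: [[0.3, 0.4], [1.2, 0.0]],
}
results = {i: conditional_logit_probs(alpha, X) for i, X in choice_sets.items()}
for i, p in results.items():
    print(i, [round(v, 3) for v in p])
```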

The similarities and differences between these models and baseline-category logit models become obvious if one looks at the log-odds relative to the first alternative in the choice set:

\[\ln\frac{\pi_{ij}}{\pi_{i1}} = \alpha_{1}(x_{1ij}-x_{1i1})+\cdots+\alpha_{r}(x_{rij}-x_{ri1}).\]
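The identity can be checked numerically with hypothetical values. The sketch below also illustrates a consequence of the identity. Only the differences \(x_{hij}-x_{hi1}\) enter the model, so adding the same constant to an attribute for every alternative in a choice set leaves the choice probabilities unchanged. Characteristics of the individual that do not vary across alternatives therefore have no effect unless they are combined with alternative-specific terms.

```python
import math

def conditional_logit_probs(alpha, X):
    """Choice probabilities for one choice set; X holds one attribute row per alternative."""
    etas = [sum(a * x for a, x in zip(alpha, row)) for row in X]
    m = max(etas)  # stabilises exp(); cancels between numerator and denominator
    w = [math.exp(e - m) for e in etas]
    total = sum(w)
    return [v / total for v in w]

alpha = [0.8, -1.5]                          # hypothetical coefficients
X = [[1.0, 0.2], [0.5, 0.1], [2.0, 0.9]]     # hypothetical attributes of 3 alternatives
p = conditional_logit_probs(alpha, X)

# ln(pi_ij / pi_i1) versus alpha_1 (x_1ij - x_1i1) + ... + alpha_r (x_rij - x_ri1)
for j in range(1, len(X)):
    lhs = math.log(p[j] / p[0])
    rhs = sum(a * (x_j - x_1) for a, x_j, x_1 in zip(alpha, X[j], X[0]))
    print(f"alternative {j + 1}: {lhs:.10f} vs {rhs:.10f}")

# Shift the first attribute by the same amount for all alternatives
X_shifted = [[row[0] + 10.0, row[1]] for row in X]
p_shifted = conditional_logit_probs(alpha, X_shifted)
print(max(abs(a - b) for a, b in zip(p, p_shifted)))
```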

Conditional logit models appear more parsimonious than baseline-category logit models insofar as they have only one coefficient for each independent variable.¹ In the “mclogit” package, these models can be estimated using the function mclogit() (see the relevant manual page).
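As noted above, mblogit() exploits the fact that a baseline-category logit model can be re-expressed as a conditional logit model. The Python sketch below shows one standard way to do this, which is not necessarily the exact construction used inside “mclogit”. Each response category is treated as an alternative. Its attribute vector holds the individual's covariates, prefixed by 1 for the intercept, in a block reserved for that category, with zeros elsewhere. The baseline category gets an all-zero row. The conditional logit coefficient vector then simply stacks the \(\alpha_{jk}\), so the re-expressed model has \((q-1)(r+1)\) coefficients rather than \(r\). The function names and values are hypothetical.

```python
import math

def baseline_category_probs(alpha, x):
    """Baseline-category logit probabilities; the baseline's linear predictor is 0."""
    etas = [0.0] + [a[0] + sum(a_k * x_k for a_k, x_k in zip(a[1:], x)) for a in alpha]
    denom = sum(math.exp(e) for e in etas)
    return [math.exp(e) / denom for e in etas]

def conditional_logit_probs(beta, X):
    """Conditional logit probabilities; X holds one attribute row per alternative."""
    etas = [sum(b * v for b, v in zip(beta, row)) for row in X]
    denom = sum(math.exp(e) for e in etas)
    return [math.exp(e) / denom for e in etas]

def as_choice_set(x, q):
    """Hypothetical helper: turn covariates x of one individual into q attribute rows.

    Row j has (q-1)*(r+1) entries. For j > 1, the block reserved for category j
    holds (1, x_1i, ..., x_ri) and all other blocks are zero; row 1 is all zeros.
    """
    x_tilde = [1.0] + list(x)
    rows = []
    for j in range(1, q + 1):
        row = []
        for k in range(2, q + 1):
            row.extend(x_tilde if k == j else [0.0] * len(x_tilde))
        rows.append(row)
    return rows

alpha = [[0.5, 1.0, -0.5], [-1.0, 0.2, 0.3]]   # hypothetical: q = 3, r = 2
beta = [coef for a in alpha for coef in a]     # stacked (alpha_20, ..., alpha_2r, alpha_30, ...)
x = [1.0, 2.0]

p_baseline = baseline_category_probs(alpha, x)
p_conditional = conditional_logit_probs(beta, as_choice_set(x, len(alpha) + 1))
print(p_baseline)
print(p_conditional)
```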

My interest in conditional logit models derives from my research into the influence of parties’ political positions on the patterns of voting. Here, the political positions are the attributes of the alternatives, and the choice sets are the sets of parties that run candidates in a country at various points in time. For an application of conditional logit models, see my doctoral thesis (Elff 2006).


¹ It is nevertheless possible to re-express baseline-category logit models as conditional logit models, as is shown on this page.

Random effects in baseline-category logit models and conditional logit models

The “mclogit” package allows for the presence of random effects in baseline-category logit and conditional logit models. In baseline-category logit models, the random effects may represent (unobserved) characteristics that are common to the individuals in clusters, such as regional units or electoral districts. In conditional logit models, random effects may represent attributes that are shared across several choice occasions within the same context of choice. That is, if one analyses voting behaviour across countries, then a random effect specific to the Labour party may represent unobserved attributes of this party in terms of which it differs from (or is more like) the Social Democratic Party of Germany (SPD). My original motivation for working on conditional logit models with random effects was to make it possible to assess the impact of parties’ political positions on the patterns of voting behaviour in various European countries. The results of this research are published in an article in Electoral Studies (Elff 2009).

In its earliest incarnation, the package supported only a very simple random-intercept extension of conditional logit models (or “mixed conditional logit models”, hence the name of the package). These models can be written as

\[\pi_{ij} = \frac{\exp(\eta_{ij})}{\sum_{k\in\mathcal{S}_i}\exp(\eta_{ik})}\]

with

\[\eta_{ij}=\sum_{h}\alpha_{h}x_{hij}+\sum_{k}z_{ik}b_{jk},\]

where \(x_{hij}\) represents values of independent variables, \(\alpha_h\) are coefficients, \(z_{ik}\) are dummy variables (that are equal to \(1\) if \(i\) is in cluster \(k\) and equal to \(0\) otherwise), and \(b_{jk}\) are random effects with a normal distribution with expectation \(0\) and variance parameter \(\sigma^2\).
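Given a draw of the random effects, the choice probabilities are those of an ordinary conditional logit model with shifted linear predictors. The Python sketch below assumes that every individual faces the same three alternatives, so that \(b_{jk}\) can be stored as b[j][k]. Because \(z_{ik}\) is a cluster dummy, \(\sum_k z_{ik}b_{jk}\) reduces to the single effect belonging to individual \(i\)'s cluster. All names and values are hypothetical.

```python
import math
import random

def mixed_cl_probs(alpha, X, cluster, b):
    """Choice probabilities of one individual, conditional on the random effects.

    alpha:   fixed coefficients [alpha_1, ..., alpha_r]
    X:       attribute rows [x_1ij, ..., x_rij], one per alternative j
    cluster: index k of the individual's cluster (z_ik = 1 for this k only)
    b:       random effects, b[j][k] for alternative j and cluster k
    """
    etas = [sum(a * x for a, x in zip(alpha, row)) + b[j][cluster]
            for j, row in enumerate(X)]
    m = max(etas)  # stabilises exp(); cancels in the ratio
    w = [math.exp(e - m) for e in etas]
    total = sum(w)
    return [v / total for v in w]

rng = random.Random(42)               # fixed seed for reproducibility
sigma = 0.7                           # hypothetical value; sigma**2 is the variance parameter
n_alternatives, n_clusters = 3, 4
b = [[rng.gauss(0.0, sigma) for _ in range(n_clusters)] for _ in range(n_alternatives)]

alpha = [0.8, -1.5]                       # hypothetical fixed coefficients
X = [[1.0, 0.2], [0.5, 0.1], [2.0, 0.9]]  # hypothetical attributes, same in all clusters
probs_by_cluster = [mixed_cl_probs(alpha, X, k, b) for k in range(n_clusters)]
for k, p in enumerate(probs_by_cluster):
    print(f"cluster {k}: {[round(v, 3) for v in p]}")
```

Individuals with identical attributes thus receive different choice probabilities in different clusters. Setting all \(b_{jk}\) to zero recovers the plain conditional logit model.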

Later releases also added support for baseline-category logit models (initially only without random effects). In order to support random effects in baseline-category logit models, the package had to be further modified to allow for conditional logit models with random slopes (this is so because baseline-category logit models can be expressed as a particular type of conditional logit model). (The relations between these various model variants will be discussed on a dedicated page as soon as I find the time to write it.)

It should be noted that estimating the parameters of random-effects multinomial logit models (whether of the baseline-category logit variety or the conditional logit variety) involves the considerable challenges already known from the literature on “generalized linear mixed models”. The main challenge is that the likelihood function involves analytically intractable integrals (i.e. there is no way to “solve” or eliminate the integrals from the formula of the likelihood function). This means that either computationally intensive methods for the computation of such integrals have to be used or certain approximations (most notably the Laplace approximation technique and its variants), which may lead to biases in certain situations. The “mclogit” package only supports approximate likelihood-based inference. For most of the package’s history, only the PQL technique based on a (first-order) Laplace approximation was supported; since release 0.8, “mclogit” also supports the MQL technique, which is based on a (first-order) Solomon-Cox approximation. The ideas behind the PQL and MQL techniques are described e.g. in Breslow and Clayton (1993). A dedicated page will describe these techniques as soon as I find the time for this.
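The following example shows what such an intractable integral looks like and how a Laplace approximation handles it. Consider a single cluster in which one scalar random intercept \(b\sim N(0,\sigma^2)\) enters one alternative. The cluster's likelihood contribution is \(\int\prod_i\pi_{iy_i}(b)\,\phi(b;0,\sigma^2)\,db\), where \(y_i\) is the alternative chosen by individual \(i\) and \(\phi\) is the normal density. The Python sketch below compares a Laplace approximation of this one-dimensional integral with brute-force numerical integration. It uses hypothetical data and only illustrates the idea. It is not the PQL or MQL algorithm implemented in “mclogit”.

```python
import math

SIGMA = 0.7                    # hypothetical random-effect standard deviation
BASE_ETAS = [0.5, 0.25, 0.25]  # hypothetical fixed parts alpha_1 x_1ij + ... for 3 alternatives
RE_ALT = 1                     # the random intercept b enters alternative index 1 only
CHOICES = [0, 1, 1, 2, 1]      # hypothetical choices (alternative indices) of 5 individuals

def etas_at(b):
    return [e + (b if j == RE_ALT else 0.0) for j, e in enumerate(BASE_ETAS)]

def log_integrand(b):
    """h(b) = sum_i log pi_{i,y_i}(b) + log N(b; 0, SIGMA^2)."""
    etas = etas_at(b)
    log_denom = math.log(sum(math.exp(e) for e in etas))
    loglik = sum(etas[y] - log_denom for y in CHOICES)
    log_prior = -0.5 * (b / SIGMA) ** 2 - 0.5 * math.log(2 * math.pi * SIGMA ** 2)
    return loglik + log_prior

def h_derivatives(b):
    """Analytic first and second derivatives of h(b)."""
    etas = etas_at(b)
    p = math.exp(etas[RE_ALT]) / sum(math.exp(e) for e in etas)
    n = len(CHOICES)
    n_re = sum(1 for y in CHOICES if y == RE_ALT)
    d1 = n_re - n * p - b / SIGMA ** 2
    d2 = -n * p * (1.0 - p) - 1.0 / SIGMA ** 2
    return d1, d2

def laplace_approximation():
    """exp(h(b_hat)) * sqrt(2 pi / -h''(b_hat)), with b_hat found by Newton's method."""
    b = 0.0
    for _ in range(100):
        d1, d2 = h_derivatives(b)
        step = d1 / d2
        b -= step
        if abs(step) < 1e-12:
            break
    _, d2 = h_derivatives(b)
    return math.exp(log_integrand(b)) * math.sqrt(2.0 * math.pi / -d2)

def trapezoid_integral(lo=-10.0, hi=10.0, n=20000):
    """Brute-force reference value of the same integral."""
    width = (hi - lo) / n
    values = [math.exp(log_integrand(lo + i * width)) for i in range(n + 1)]
    return width * (sum(values) - 0.5 * (values[0] + values[-1]))

approx = laplace_approximation()
reference = trapezoid_integral()
print(approx, reference, approx / reference - 1.0)
```

With a single scalar random effect, brute-force integration is cheap. With many clusters and several random effects, the integrals become multidimensional. This is why approximations such as the Laplace technique are attractive despite their potential for bias.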

References


Agresti, Alan. 2002. Categorical Data Analysis. New York: Wiley.

Breslow, Norman E. and David G. Clayton. 1993. “Approximate Inference in Generalized Linear Mixed Models”. Journal of the American Statistical Association 88(421): 9-25.

Elff, Martin. 2006. Politische Ideologien, soziale Konflikte und Wahlverhalten: Die Bedeutung politischer Angebote der Parteien für den Zusammenhang zwischen sozialen Merkmalen und Parteipräferenzen in zehn westeuropäischen Demokratien. Baden-Baden: Nomos.

Elff, Martin. 2009. “Social Divisions, Party Positions, and Electoral Behaviour”. Electoral Studies 28(2): 297-308.

McFadden, Daniel. 1974. “Conditional Logit Analysis of Qualitative Choice Behaviour”. 105-142 in Frontiers in Econometrics, ed. by Paul Zarembka. New York: Academic Press.

Venables, W. N. and B. D. Ripley. 2002. Modern Applied Statistics with S. New York: Springer.