Logistic Regression, Average Marginal Effects, and the Linear Probability Model - Part II: Coefficients and AMEs of nested models

As mentioned in the previous post, one of the claims made by Mood (2010) is that coefficients of nested models are not comparable, because they tend to increase in magnitude when predictor variables are added to a model, even if the added variables are uncorrelated with those already included. This post examines how big a problem this is and whether it can be avoided by the use of average marginal effects. The investigation is based on a few simulation studies and an empirical example.

A simulation study

The first simulation study examines the change in coefficient estimates between two nested models that involve two normally distributed predictor variables. The study repeatedly fits two logistic regression models and two linear regression models to artificial data. These data are created such that two predictor variables, consisting of i.i.d. standard normal random numbers, affect a binary response variable. Of the two logistic regression models and the two linear regression models, one includes only the first predictor variable and the other includes both. For each logistic regression model fitted to an artificial data set, the average marginal effect is also computed for the predictor that is present in both models.

The following two lines of code activate the “memisc” package and run a code file in which the simulation infrastructure is defined.

library(memisc)
source("simsim.R")
Loading required package: lattice
Loading required package: MASS

Attaching package: ‘memisc’

The following objects are masked from ‘package:stats’:

    contr.sum, contr.treatment, contrasts

The following object is masked from ‘package:base’:

    as.array

The following line defines the logistic function.

logistic <- function(x) 1/(1+exp(-x))

The next few lines define a function for the computation of the average marginal effects of the predictor variables in a logistic regression model. In a logit model the marginal effect of a predictor x_j for observation i is b_j * p_i * (1 - p_i), where p_i is the predicted probability, so the AME of x_j is its coefficient multiplied by the average of p_i * (1 - p_i).

# Note that this function is non-generic and is correct only for
# logistic regression (logit) models
AME <- function(mod) {
    p <- predict(mod,type="response")
    cf <- coef(mod)
    cf <- cf[names(cf)!="(Intercept)"]
    unname(mean(p*(1-p),na.rm=TRUE))*cf
}
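For illustration, here is a quick usage sketch with a small made-up data set (this is not part of the simulation itself); the function returns one AME per non-intercept coefficient:

set.seed(3)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, size = 1, prob = logistic(d$x))
AME(glm(y ~ x, data = d, family = binomial))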

The following lines define the function that is evaluated in each replication of the simulation study. The arguments a, b1, b2 are the intercept and slopes of a "true" logistic regression model, according to which binary responses are generated based on the values of two predictor variables x1 and x2. The arguments mu.x1 and mu.x2 are the expected values of x1 and x2, and n is the simulation sample size. The sample size is set to a large number by default, so that the results of the simulation study mainly reflect the structural features of the model estimates and the average marginal effects, without being swamped by sampling error. The objects glm1 and glm2 are the fitted logistic regression models, while lm1 and lm2 are the fitted linear regression models. The number of iterations in the computation of the ML estimates for the logistic regression models is limited to 6, so that the (occasionally occurring) cases of separation do not lead to runaway coefficient estimates.

fun <- function(a=0,b1=1,b2=1,mu.x1=0,mu.x2=0,n=5000) {
    # Generate two independent normal predictors and a binary response
    x1 <- rnorm(n=n,mean=mu.x1)
    x2 <- rnorm(n=n,mean=mu.x2)
    p <- logistic(a+b1*x1+b2*x2)
    y <- rbinom(n=n,size=1,prob=p)
    # Logistic regressions with one and with both predictors
    glm1 <- glm(y~x1, family = binomial,maxit=6)
    glm2 <- glm(y~x1+x2, family = binomial,maxit=6)
    # Linear probability models with one and with both predictors
    lm1 <- lm(y~x1)
    lm2 <- lm(y~x1+x2)
    
    # Return slope estimates and AMEs (intercepts dropped)
    c(
        b_glm1=coef(glm1)[-1],
        b_glm2=coef(glm2)[-1],
        AME_glm1=AME(glm1),
        AME_glm2=AME(glm2),
        b_lm1=coef(lm1)[-1],
        b_lm2=coef(lm2)[-1]
    )

}
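For illustration (with arbitrarily chosen parameter values), a single call returns one named vector of estimates; simsim() then collects such vectors across replications and conditions:

set.seed(4)
round(fun(b1 = 1, b2 = 1), 3)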

The following code runs a simulation with various settings for the parameters b1 and b2, the coefficients of the response-generating model. The settings comprise zero as well as small and large values for both coefficients.

simres <- simsim(fun,
                 conditions=list(
                     b1=c(0,0.1,.3,0.5,1,1.5,2,2.5,3),
                     b2=c(0,0.1,.3,0.5,1,1.5,2,2.5,3)
                 ),
                 nsim=100)
There were 50 or more warnings (use warnings() to see the first 50)

The simulation results are collected into an array, which the following line of code converts into a data frame.

simres.df <- as.data.frame(simres)

For the visualization of the simulation results, the package "ggplot2" is activated and a theme for the overall appearance of the plots is set.

library(ggplot2)
theme_set(theme_bw())

The following diagram plots the distribution of the differences between the estimates of b1 from a logistic regression fitted with x1 only (b_glm1.x1) and with both x1 and x2 (b_glm2.x1). The difference b_glm2.x1 - b_glm1.x1 expresses how much the estimate of b1 changes when the variable x2 is added to the model. The diagram makes clear that (almost) no change occurs on average if either b1 or b2 (or both) is equal to 0, while the change becomes larger the larger the values of b1 and b2 are.

ggplot(simres.df) + 
  geom_boxplot(aes(b2,b_glm2.x1 - b_glm1.x1,group=b2)) +
  facet_wrap(~b1, labeller = label_both,scales="free_y")

In the same way as the previous diagram plots the differences between the estimates of the coefficient of x1 from glm1 and glm2, the following diagram plots the differences between the average marginal effects of x1 according to these two models. It becomes clear that, at least on average, adding the variable x2 to the model with x1 does not change the AME of x1.

ggplot(simres.df) + 
  geom_boxplot(aes(b2,AME_glm2.x1 - AME_glm1.x1,group=b2)) +
  facet_wrap(~b1, labeller = label_both)

Is the change in the coefficient estimate of x1 the consequence of a bias in the model with both variables x1 and x2 (i.e. glm2) or in the model with only x1 (i.e. glm1)? This can be gleaned from the following diagram, which shows the distribution of the difference between the estimate of the coefficient b1 in the full model glm2 and its "true" value (i.e. the value from which the observations are generated).

ggplot(simres.df) + 
  geom_boxplot(aes(b2,b_glm2.x1-b1,group=b2)) +
  facet_wrap(~b1, labeller = label_both)

The plot shows that on average the difference between the estimate b_glm2.x1 and the true parameter value b1 is zero. This should not be surprising, since the maximum likelihood estimator of the logistic regression coefficients is consistent and asymptotically unbiased for correctly specified models (such as glm2) and the sample size is large (n equal to 5000). Conversely, this means that the estimate b_glm1.x1 from the model without x2 is biased towards zero.
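To see the direction of this bias directly, here is a minimal sketch (not part of the simulation above) that fits both logistic regressions once to a single large artificial sample; the parameter values are chosen only for illustration:

set.seed(1)
n  <- 1e5
x1 <- rnorm(n)
x2 <- rnorm(n)
# "True" model: a = 0, b1 = 1, b2 = 2
y  <- rbinom(n, size = 1, prob = logistic(x1 + 2*x2))
coef(glm(y ~ x1,      family = binomial))["x1"] # attenuated, clearly below 1
coef(glm(y ~ x1 + x2, family = binomial))["x1"] # close to the true value 1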

In the empirical example in the previous blog post we saw that the average marginal effect of a variable can be quite different from the maximum marginal effect (which is attained where the predicted probability equals 0.5) if the predicted probabilities are mostly close to 0 or 1. The following diagram examines how the average marginal effect varies with the true coefficients of the generating model:

# Here the AME's come from glm2, which has the 
# model formula y~x1+x2
ggplot(simres.df) + 
  geom_boxplot(aes(b2,AME_glm2.x1,group=b2)) +
  facet_wrap(~b1, labeller = label_both,scales="free_y")

This shows that the AME of x1 in the full model glm2 not only increases with the coefficient of x1 but also decreases with the coefficient of the other variable, i.e. x2.

But recall that the AME barely changes when a variable is added to a model or removed from it. This implies that the AME of x1 decreases with the coefficient of x2, whether or not x2 is in the model! Even though this may be hard to believe, the next diagram shows that this is indeed the case:

# Note that here the AME's come from glm1, which has the 
# model formula y~x1
ggplot(simres.df) + 
  geom_boxplot(aes(b2,AME_glm1.x1,group=b2)) +
  facet_wrap(~b1, labeller = label_both,scales="free_y")
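The reason can be seen without fitting any model: under the data-generating model the AME of x1 equals b1 times the average of p(1-p), and this average shrinks as b2 grows, because more of the probabilities are pushed towards 0 and 1. A minimal numerical sketch (not part of the simulation above) illustrates this:

# For fixed b1, the factor E[p(1-p)] that scales the AME of x1 shrinks
# as b2 grows, regardless of which model is later fitted to the data
set.seed(5)
x1 <- rnorm(1e5)
x2 <- rnorm(1e5)
sapply(c(0, 1, 2, 3), function(b2) {
    p <- logistic(x1 + b2*x2)   # b1 fixed at 1
    mean(p*(1-p))               # multiplied by b1, this is the AME of x1
})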

A claim by Mood (2010) which gained a lot of attention is that if the influence of the predictor variables in a logistic regression is not "too strong", so that the link between the predictor variables and the predicted probability is (almost) linear, then the AME from a logistic regression and the coefficient from a linear regression should be (almost) identical. The following code creates a diagram that provides an impression of the conditions under which this is (more or less) the case.

ggplot(simres.df) + 
  geom_boxplot(aes(b2,b_lm2.x1-AME_glm2.x1,group=b2)) +
  facet_wrap(~b1, labeller = label_both,scales="free_y")

The diagram just created shows the differences between the AMEs from the logistic regressions and the OLS estimates from the linear regressions. It confirms that AMEs and linear regression coefficients are indeed very close and that there is hardly any systematic variation in the difference between these two quantities.
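As a quick illustration (again not part of the simulation above), fitting the logistic regression and the linear probability model once to a single large artificial sample with moderate coefficients shows how close the two sets of quantities are:

set.seed(6)
x1 <- rnorm(1e5)
x2 <- rnorm(1e5)
y  <- rbinom(1e5, size = 1, prob = logistic(0.5*x1 + 0.5*x2))
AME(glm(y ~ x1 + x2, family = binomial))  # AMEs from the logit model
coef(lm(y ~ x1 + x2))[-1]                 # corresponding OLS/LPM coefficients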

A real-world example

The following explores the issue of comparing coefficients across nested models with "real-world" data. We use the same data as in the previous post: a survey conducted on the occasion of the 1988 Chilean plebiscite, available from the R package "carData".

The following lines activate two packages, “carData” and “memisc”.

library(carData)
library(memisc)

The next few lines create a binary variable for the votes, where 1 represents a vote for Pinochet and 0 represents a vote against him.

Chile %$$%  {
    vote2 <- factor(vote, levels = c("N","Y"))
    vote.bin <- as.integer(vote2 == "Y")
}

The following lines of code define a number of logistic regression models with the binary vote variable (vote.bin) as response and age, sex, and statusquo as predictor variables. There are several nesting relations among these models: glm_1 and glm_2 are both nested in glm_3, glm_1 and glm_4 are nested in glm_5, glm_2 and glm_4 are nested in glm_6, and finally glm_1 through glm_6 are all nested in glm_7. Comparing these models allows us to see how much coefficient estimates are affected when variables are added to a model, and how this varies depending on whether the existing or the added variables have a "strong" impact, i.e. push many of the predicted probabilities closer to 0 or 1.

glm_1 <- glm(vote.bin ~ age, 
             data = Chile, 
             family = binomial)
glm_2 <- glm(vote.bin ~ sex, 
             data = Chile, 
             family = binomial)
glm_3 <- glm(vote.bin ~ age + sex, 
             data = Chile, 
             family = binomial)
glm_4 <- glm(vote.bin ~ statusquo, 
             data = Chile, 
             family = binomial)
glm_5 <- glm(vote.bin ~ age + statusquo, 
             data = Chile, 
             family = binomial)
glm_6 <- glm(vote.bin ~ sex + statusquo, 
             data = Chile, 
             family = binomial)
glm_7 <- glm(vote.bin ~ age + sex + statusquo, 
             data = Chile, 
             family = binomial)


The next lines allow for a comparison of the coefficient estimates of the models specified and estimated above. The comparison suggests that adding even a strong predictor such as status-quo preferences to a model with a weak predictor does not necessarily lead to major changes in the estimates. The most drastic change is that of the coefficient of sex when status-quo preferences are added to the model: in model glm_2 the coefficient is -0.584, while in model glm_6 it is -0.718.

mtable(
    "(1)"=glm_1,
    "(2)"=glm_2,
    "(3)"=glm_3,
    "(4)"=glm_4,
    "(5)"=glm_5,
    "(6)"=glm_6,
    "(7)"=glm_7
) |> show_html()
               (1)         (2)         (3)         (4)         (5)         (6)         (7)
(Intercept)    -0.814***   0.279***    -0.529***   0.215*      -0.200      0.594***    0.233
               (0.133)     (0.070)     (0.141)     (0.100)     (0.266)     (0.146)     (0.296)
age            0.021***                0.022***                0.011                   0.009
               (0.003)                 (0.003)                 (0.007)                 (0.007)
sex: M/F                   -0.584***   -0.610***                           -0.718***   -0.696***
                           (0.097)     (0.098)                             (0.197)     (0.198)
statusquo                                          3.206***    3.196***    3.208***    3.195***
                                                   (0.143)     (0.143)     (0.144)     (0.144)
Log-likelihood -1197.036   -1199.256   -1177.407   -376.294    -374.881    -369.561    -368.591
N              1757        1757        1757        1754        1754        1754        1754

Significance: *** = p < 0.001; ** = p < 0.01; * = p < 0.05

The following lines define a function that allows for a convenient display of various quantities, such as AMEs and correlations.

print_nicely <- function(x, digits=3) {
    x <- formatC(x,format="f",digits=digits)
    x <- gsub("NA","  ",x,fixed=TRUE)
    x <- format(x,justify="right")
    print(x,quote=FALSE)
}

In a similar way as above, the following lines allow us to compare the AMEs between nested models. This comparison yields a surprising result, given the claimed advantage of AMEs, namely that they supposedly change little between nested models: the AME of sex changes quite drastically when status-quo preferences are added to the model (as the comparison of the AMEs from glm_2 and glm_6 indicates). While we found a noticeable change in the coefficient of sex from model glm_2 to model glm_6, it was small relative to the coefficient's size. The change in the AME of sex from model glm_2 to model glm_6, however, is large compared to its initial value of -0.143: in glm_6 the AME of sex is less than a third of that in glm_2. Notably, while the coefficient becomes larger in magnitude, the AME becomes smaller.

collect(
    "(1)"=AME(glm_1),
    "(2)"=AME(glm_2),
    "(3)"=AME(glm_3),
    "(4)"=AME(glm_4),
    "(5)"=AME(glm_5),
    "(6)"=AME(glm_6),
    "(7)"=AME(glm_7)
) |> print_nicely()
          (1)    (2)    (3)    (4)    (5)    (6)    (7)   
age        0.005         0.005         0.001         0.001
sexM             -0.143 -0.146               -0.043 -0.041
statusquo                       0.195  0.193  0.191  0.190

While the simulation study discussed earlier indicates that coefficients differ more between nested models than AMEs do, the example of the Chilean plebiscite seems to suggest otherwise. Yet the predictor variables in the simulation study were, at least at the population level, uncorrelated. The different conclusion from the empirical data may therefore stem from the predictor variables being correlated. However, as the results of the following code show, the correlations among the predictor variables in the empirical example are far from strong.

with(Chile,cor(
    cbind(age,sex,statusquo),
    use="pairwise"
)) |> print_nicely()
          age    sex    statusquo
age        1.000  0.017  0.115   
sex        0.017  1.000 -0.067   
statusquo  0.115 -0.067  1.000   

What can be concluded from this example is that the specific "advantage" of average marginal effects breaks down when the predictor variables are correlated: here an AME (the one related to respondents' sex) changed quite substantially when a predictor was added to the model, while the coefficient changed relatively little. This could be interpreted as the logistic regression coefficient failing to express that the effect of sex on voting for Pinochet is partially mediated by status-quo preferences, while the average marginal effects do reflect this.
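A quick, informal way to probe this mediation reading (not part of the original analysis) is to check whether men and women differ in their status-quo preferences at all:

# Average status-quo preference by sex; a non-negligible difference would be
# the first link of the indirect pathway suggested above
with(Chile, tapply(statusquo, sex, mean, na.rm = TRUE))
# The same comparison expressed as a simple regression
coef(lm(statusquo ~ sex, data = Chile))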

Discussion and conclusion

The simulation study reported in this post suggests the following conclusions:

- Coefficient estimates of a logistic regression change systematically when a further relevant predictor variable is added, even if that variable is uncorrelated with the predictors already in the model; the estimate from the smaller model is attenuated towards zero.
- Average marginal effects, in contrast, hardly change, at least on average, when such a predictor variable is added or removed.
- The AME of a predictor variable nevertheless decreases with the strength of other variables that influence the response, whether or not they are included in the model.
- AMEs from a logistic regression are very close to the corresponding OLS coefficients of a linear probability model.

The empirical example leads to a qualification of these conclusions: if the predictor variables are correlated, it may happen that the relative change in the AME of a predictor variable due to adding an omitted variable is larger than the relative change in the corresponding coefficient. On the other hand, the change in a coefficient caused by adding an omitted variable to the model is not always substantial in size.

That coefficients change when an omitted variable is added to a model, even if it is uncorrelated with the variables already present, may make it difficult if not impossible to decompose the total effect of a predictor variable into a direct and an indirect effect. On the one hand, however, such a decomposition of total effects may only work if total, direct, and indirect effects are all linear (or nearly linear), and it is arguably a very strong (and not very plausible) assumption that all relations between variables in the social sciences are linear. The linearity assumption is all too often motivated by convenience and (for the same reason) remains untested. On the other hand, there seems to be a solution to the (alleged) problem of coefficient incomparability: the KHB technique developed by Karlson, Holm, and Breen (2012).

That AMEs are influenced not only by the coefficient of the predictor variable of interest but also by other variables that influence the response (whether they are included in the model or not) has some consequences that one may consider paradoxical or inconvenient if one sticks to using AMEs to describe the effects of predictor variables on a binary response. These consequences will be uncovered in the next blog post.