===================================
mpred: Generic Predictive Margins
===================================

Many dependent variables of interest in the study of political behaviour and
opinion formation are categorical. Statistical models involving such dependent
variables generally pose a challenge for their interpretation (apart from the fact that
estimation usually is more difficult than in models for numeric dependent
variables). Because of this difficulty of interpretation, one may want to resort
to graphics in which predicted values of a dependent variable are plotted
against values of one or more independent variables of interest. The problem with
such plots based on models for categorical responses is that the pattern of
dependence between the dependent variable and an independent variable of interest
usually is non-linear and typically also depends on the values of other independent
variables that may not be of interest. For example, in a logistic regression of a binary
dependent variable $Y$ on independent variables $X_1$ and $X_2$,

.. math::

   \Pr(Y=1|X_1=x_1,X_2=x_2)=\frac{\exp(\beta_0+\beta_1x_1+\beta_2x_2)}{1+\exp(\beta_0+\beta_1x_1+\beta_2x_2)}

the unit change in the probability of a positive outcome

.. math::

   \Pr(Y=1|X_1=x_1,X_2=x_2)-\Pr(Y=1|X_1=x_1+1,X_2=x_2)

is not constant (as it would be in a linear regression), but depends on the particular value
of $x_1$. Furthermore, it also depends on the particular value of $x_2$. This
variation in the unit change, when identified as the "unit effect" of $x_1$ on
$Y$, has led various authors to claim that the presence of interaction terms
(e.g. $x_1x_2$) in a logistic regression model (or another model for categorical
dependent variables) is neither a necessary nor a sufficient condition for the
existence of interaction effects (citation coming soon).

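
To make this dependence concrete, the following R sketch evaluates the model
above with made-up coefficients $\beta_0=-1$, $\beta_1=0.5$ and $\beta_2=1$
(assumptions chosen for illustration only, not estimates from any data). It
uses only base R, where ``plogis()`` is the logistic distribution function
$\exp(\eta)/(1+\exp(\eta))$; the helper names ``p()`` and ``unit_change()``
are hypothetical.

.. code-block:: r

   # Made-up coefficients (assumption, for illustration only)
   b0 <- -1
   b1 <- 0.5
   b2 <- 1

   # Pr(Y=1 | X1=x1, X2=x2) under the logistic model; plogis(e) = exp(e)/(1+exp(e))
   p <- function(x1, x2) plogis(b0 + b1 * x1 + b2 * x2)

   # Unit change in the orientation used above: Pr(x1) - Pr(x1 + 1)
   # (negative here because b1 > 0)
   unit_change <- function(x1, x2) p(x1, x2) - p(x1 + 1, x2)

   # The unit change varies with x1 while x2 is held at 0 ...
   uc1 <- unit_change(x1 = c(-2, 0, 2), x2 = 0)
   # ... and with x2 while x1 is held at 0, although the model has no x1:x2 term
   uc2 <- unit_change(x1 = 0, x2 = c(-2, 0, 2))
   print(round(uc1, 3))
   print(round(uc2, 3))

In a linear regression, both vectors would instead contain the same value,
$-\beta_1$, three times.
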
A way out of the ensuing complications is to focus instead on the expectation or
the average of this unit change:

.. math::

   \sum_z\Pr(Y=1|X_1=x_1,X_2=z)f(z)-\sum_z\Pr(Y=1|X_1=x_1+1,X_2=z)f(z)

where $f(z)$ is either the density function or the probability mass function of
$X_2$, or the empirical distribution of $X_2$. If the empirical distribution is
used, this difference is a difference between what are also called the
*predictive margins* of $Y$ for $X_1=x_1$ and $X_1=x_1+1$. A predictive margin
for $X_1=x_1$ from the logistic regression model under discussion is defined as

.. math::

   \frac{1}{n}\sum_{i=1}^{n}\Pr(Y=1|X_1=x_1,X_2=z_i)

where $n$ is the sample size and $z_1,\ldots,z_n$ are the sample values of
$X_2$. There is an evident relation between a predictive margin and the
"do-operator" in the terminology of Judea Pearl. In the present context (if
$X_2$ is discrete), this operator is defined by

.. math::

   \Pr(Y=1|\mathrm{do}(X_1=x_1)) := \sum_z\Pr(Y=1|X_1=x_1,X_2=z)\Pr(X_2=z)

where the sum is over the range of $X_2$ so that $\sum_z\Pr(X_2=z)=1$. A
predictive margin could be interpreted as an estimate of $\Pr(Y=1|do(X_1=x_1))$.
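
The following R sketch computes predictive margins in the sense of the
definition above from a fitted ``glm()`` model, using the empirical
distribution of $X_2$: $x_1$ is set to a fixed value for every observation,
the observed values $z_1,\ldots,z_n$ of $X_2$ are kept, and the predicted
probabilities are averaged. The data are simulated, and the variable names and
the helper ``pred_margin()`` are hypothetical; this is not the interface of
this package.

.. code-block:: r

   # Simulated data (hypothetical, for illustration only)
   set.seed(42)
   n <- 500
   d <- data.frame(x1 = rnorm(n), x2 = rbinom(n, size = 1, prob = 0.4))
   d$y <- rbinom(n, size = 1, prob = plogis(-1 + 0.5 * d$x1 + d$x2))

   # Logistic regression of y on x1 and x2
   fit <- glm(y ~ x1 + x2, family = binomial, data = d)

   # Predictive margin for X1 = x: set x1 to x for all units, keep the
   # observed x2 values, and average the predicted probabilities
   pred_margin <- function(x) {
     nd <- d
     nd$x1 <- x
     mean(predict(fit, newdata = nd, type = "response"))
   }

   m0 <- pred_margin(0)
   m1 <- pred_margin(1)

   # Same value computed directly from the formula (1/n) sum_i Pr(Y=1 | x1=0, z_i)
   b <- coef(fit)
   m0_formula <- mean(plogis(b[1] + b[2] * 0 + b[3] * d$x2))

   # Averaged unit change, in the same orientation as above
   print(m0 - m1)
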

This package provides a generic function to compute predictive margins from
models with given covariate settings. Model objects of any class can be used,
provided that the class has a ``predict()`` method that accepts a ``newdata=``
argument. The package is not (yet) available from CRAN_, but only from GitHub_.
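
Because only a ``predict()`` method with a ``newdata=`` argument is required,
the core computation can be written generically. The R sketch below defines a
hypothetical helper, ``margins_sketch()``, for illustration only; it is not the
function provided by this package. Additional arguments are passed on to
``predict()``, e.g. ``type = "response"`` so that a ``glm`` returns
probabilities rather than values on the link scale (the default for
``predict.glm()``).

.. code-block:: r

   # Hypothetical generic helper (NOT the mpred API): works for any model
   # whose predict() method accepts a newdata= argument
   margins_sketch <- function(model, data, var, values, ...) {
     sapply(values, function(v) {
       nd <- data
       nd[[var]] <- v                              # same value of var for all units
       pred <- predict(model, newdata = nd, ...)   # other covariates keep sample values
       mean(pred)                                  # average over the empirical distribution
     })
   }

   # Simulated data (hypothetical, for illustration only)
   set.seed(1)
   n <- 300
   d <- data.frame(x1 = rnorm(n), x2 = runif(n))
   d$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * d$x1 - d$x2))
   d$w <- 2 + 0.5 * d$x1 + d$x2 + rnorm(n)

   fit_logit <- glm(y ~ x1 + x2, family = binomial, data = d)
   fit_lm <- lm(w ~ x1 + x2, data = d)

   ml <- margins_sketch(fit_logit, d, "x1", c(-1, 0, 1), type = "response")
   mm <- margins_sketch(fit_lm, d, "x1", c(-1, 0, 1))
   print(ml)
   print(mm)

For the linear model, successive margins differ by exactly the estimated
coefficient of ``x1``, since averaging a linear predictor over the sample values
of ``x2`` does not change its slope; for the logistic model this is generally
not the case.
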

If you have the package devtools_ installed, you can install the package with

.. code-block:: r

   library(devtools)
   install_github("melff/mpred/pkg")


Documentation
=============

.. toctree::
   :titlesonly:
   :maxdepth: 1

   mpred/manual-pages
   mpred/manual-index

.. _CRAN: https://cran.r-project.org
.. _GitHub: https://github.com/melff/mpred
.. _devtools: https://cran.r-project.org/package=devtools
.. _githubinstall: https://cran.r-project.org/package=githubinstall