Motivation

R is well suited for statistical graphics, the application of advanced data analysis techniques, and Monte Carlo studies of estimators. However, it lacks convenient support for the data management tasks typical of the social sciences and for the simple generation of descriptive statistics. “memisc” facilitates not only the data management tasks typical of survey research, but also the generation of descriptive statistics, which are often the first step in serious social science data analysis. In particular, it facilitates the creation of tables of percentages or other descriptive statistics broken down by subgroups in the data. This is mainly achieved by the function genTable(), which is described in the following section. The section thereafter describes how tables created this way can be exported to LaTeX and HTML.

Creating Tables of Descriptive Statistics

General tables of descriptive statistics can be created with the function genTable(). The syntax of calls to this function is quite similar to that of the function xtabs(). The first argument (tagged formula=) is a formula that determines which descriptive statistics are computed and for which groups: the left-hand side of the formula determines the statistics, and the right-hand side determines the grouping factor(s). The second, optional argument data= determines the data frame or data set from which the descriptive statistics are computed. This is illustrated by the following example, which (like the page on item objects) uses the GLES 2013 election study (see note 1 below). In this example we first create a table of some descriptive statistics of the age distribution of the respondents in each German federal state:

library(memisc)
ZA5702 <- spss.system.file("Data/ZA5702_v2-0-0.sav")
# Select the variables of interest, renaming them at the same time
gles2013work <- subset(ZA5702,
                       select=c(
                           wave                  = survey,
                           gender                = vn1,
                           byear                 = vn2c,
                           bmonth                = vn2b,
                           intent.turnout        = v10,
                           turnout               = n10,
                           voteint.candidate     = v11aa,
                           voteint.list          = v11ba,
                           postal.vote.candidate = v12aa,
                           postal.vote.list      = v12ba,
                           vote.candidate        = n11aa,
                           vote.list             = n11ba,
                           bula                  = bl
                       ))
gles2013work <- within(gles2013work,{
    measurement(byear) <- "interval"
    measurement(bmonth) <- "interval"
    # Age in 2013, minus one year for respondents born after September
    age <- 2013 - byear
    age[bmonth > 9] <- age[bmonth > 9] - 1
})
options(digits=3)
age.tab <- genTable(c(Mean=mean(age),
                      Std.dev=sd(age),
                      Median=median(age))~bula,
                    data=gles2013work)
age.tab
        bula
        Baden-Wuerttemberg Bayern Berlin Brandenburg Bremen Hamburg Hessen
Mean                  54.5   54.4   52.8        59.7   60.4    51.5   56.9
Std.dev               18.9   18.9   19.8        19.3   11.5    18.7   18.5
Median                57.0   56.0   57.0        62.5   63.0    53.0   60.0
        bula
        Mecklenburg-Vorpommern Niedersachsen Nordrhein-Westfalen
Mean                      57.0          55.1                53.9
Std.dev                   19.2          18.4                19.1
Median                    60.5          56.0                55.0
        bula
        Rheinland-Pfalz Saarland Sachsen Sachsen-Anhalt Schleswig-Holstein
Mean               57.2     61.9    58.3           54.7               60.0
Std.dev            18.2     17.3    16.7           17.1               19.9
Median             60.5     65.0    60.5           56.0               65.0
        bula
        Thueringen
Mean          57.8
Std.dev       17.4
Median        60.0

Because this table is wide and wraps across several blocks, it is hard to read, so we transpose it:

age.tab <- t(age.tab)
age.tab

bula                     Mean Std.dev Median
  Baden-Wuerttemberg     54.5    18.9   57.0
  Bayern                 54.4    18.9   56.0
  Berlin                 52.8    19.8   57.0
  Brandenburg            59.7    19.3   62.5
  Bremen                 60.4    11.5   63.0
  Hamburg                51.5    18.7   53.0
  Hessen                 56.9    18.5   60.0
  Mecklenburg-Vorpommern 57.0    19.2   60.5
  Niedersachsen          55.1    18.4   56.0
  Nordrhein-Westfalen    53.9    19.1   55.0
  Rheinland-Pfalz        57.2    18.2   60.5
  Saarland               61.9    17.3   65.0
  Sachsen                58.3    16.7   60.5
  Sachsen-Anhalt         54.7    17.1   56.0
  Schleswig-Holstein     60.0    19.9   65.0
  Thueringen             57.8    17.4   60.0
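Conceptually, the call to genTable() above evaluates the left-hand side of the formula once for each level of the grouping factor on the right-hand side and arranges the results as a matrix with one column per group. The following base-R sketch illustrates that principle on a small made-up data frame (the name toy and its values are hypothetical, not GLES data); it is not a description of how genTable() is actually implemented.

```r
# Hypothetical toy data standing in for gles2013work (not GLES data)
toy <- data.frame(
  age  = c(25, 40, 61, 33, 58, 70),
  bula = c("Bayern", "Bayern", "Bayern", "Bremen", "Bremen", "Bremen")
)

# Evaluate the statistics (the formula's left-hand side) separately
# for each level of the grouping variable (its right-hand side)
age.stats <- sapply(split(toy$age, toy$bula), function(age)
  c(Mean = mean(age), Std.dev = sd(age), Median = median(age)))

# Rows are statistics and columns are groups, as in the genTable() output;
# t() gives the transposed layout
print(age.stats)
print(t(age.stats))
```

Unlike this sketch, genTable() also names the grouping dimension (bula in the output above).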

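Next we create a table of the percentage distribution of second votes (list votes) in each federal state. First, though, we have to prepare the data: the vote variables combine both waves of the study, taking the vote intention or postal vote for respondents in wave 1 and the reported vote for respondents in wave 2, depending on their (intended) turnout.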

gles2013work <- within(gles2013work,{

    # Combine the waves: in wave 1 the vote is taken from the postal-vote
    # or vote-intention items, depending on intent.turnout; in wave 2 from
    # the reported vote. Respondents who (intend to) abstain get code 900.
    candidate.vote <- cases(
        wave == 1 & intent.turnout == 6     -> postal.vote.candidate,
        wave == 1 & intent.turnout %in% 4:5 -> 900,
        wave == 1 & intent.turnout %in% 1:3 -> voteint.candidate,
        wave == 2 & turnout == 1            -> vote.candidate,
        wave == 2 & turnout == 2            -> 900
    )

    list.vote <- cases(
        wave == 1 & intent.turnout == 6     -> postal.vote.list,
        wave == 1 & intent.turnout %in% 4:5 -> 900,
        wave == 1 & intent.turnout %in% 1:3 -> voteint.list,
        wave == 2 & turnout == 1            -> vote.list,
        wave == 2 & turnout == 2            -> 900
    )

    # Recode the original party codes and attach value labels
    candidate.vote <- recode(as.item(candidate.vote),
        "CDU/CSU"   =  1 <- 1,
        "SPD"       =  2 <- 4,
        "FDP"       =  3 <- 5,
        "Grüne"     =  4 <- 6,
        "Linke"     =  5 <- 7,
        "NPD"       =  6 <- 206,
        "Piraten"   =  7 <- 215,
        "AfD"       =  8 <- 322,
        "Other"     = 10 <- 801,
        "No Vote"   = 90 <- 900,
        "WN"        = 98 <- -98,
        "KA"        = 99 <- -99
    )
    list.vote <- recode(as.item(list.vote),
        "CDU/CSU"   =  1 <- 1,
        "SPD"       =  2 <- 4,
        "FDP"       =  3 <- 5,
        "Grüne"     =  4 <- 6,
        "Linke"     =  5 <- 7,
        "NPD"       =  6 <- 206,
        "Piraten"   =  7 <- 215,
        "AfD"       =  8 <- 322,
        "Other"     = 10 <- 801,
        "No Vote"   = 90 <- 900,
        "WN"        = 98 <- -98,
        "KA"        = 99 <- -99
    )

    # WN ("weiß nicht", don't know) and KA ("keine Angabe", no answer)
    # are declared as missing values
    missing.values(candidate.vote) <- 98:99
    missing.values(list.vote) <- 98:99
    measurement(candidate.vote) <- "nominal"
    measurement(list.vote) <- "nominal"

})
Warning in cases(postal.vote.candidate <- wave == 1 & intent.turnout == :
conditions are not exhaustive
Warning in cases(postal.vote.list <- wave == 1 & intent.turnout == 6, 900 <-
wave == : conditions are not exhaustive

(Running this code issues warnings indicating that the conditions are not exhaustive, that is, there are some observations for which none of the conditions in the call to cases() is met. The corresponding elements of the resulting vector are NA for these observations. In the present case this occurs with observations that have missing values in both intent.turnout and turnout.)
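To make the mechanism concrete, here is a base-R sketch of the same idea: each condition selects the observations that receive the corresponding value, and observations matched by no condition are left as NA, with a warning. The helper cases_sketch() and the toy vectors are hypothetical and only borrow the GLES variable names; where conditions overlap, this sketch simply lets the first one win, which need not match how cases() treats overlapping conditions.

```r
# Hypothetical helper, not part of memisc: a simplified stand-in for cases().
# conds: list of logical vectors; vals: list of values (scalars or vectors).
cases_sketch <- function(conds, vals) {
  n <- length(conds[[1]])
  out <- rep(NA_real_, n)
  covered <- rep(FALSE, n)
  for (i in seq_along(conds)) {
    # A condition that evaluates to NA counts as "not met"
    hit <- !covered & !is.na(conds[[i]]) & conds[[i]]
    out[hit] <- rep_len(vals[[i]], n)[hit]
    covered <- covered | hit
  }
  if (!all(covered)) warning("conditions are not exhaustive")
  out
}

# Hypothetical toy data using the GLES variable names
wave           <- c(1, 1, 1, 2, 2, 2)
intent.turnout <- c(2, 5, NA, NA, NA, NA)
turnout        <- c(NA, NA, NA, 1, 2, NA)
voteint.list   <- c(4, NA, NA, NA, NA, NA)
vote.list      <- c(NA, NA, NA, 1, NA, NA)

list.vote <- cases_sketch(
  conds = list(wave == 1 & intent.turnout %in% 1:3,
               wave == 1 & intent.turnout %in% 4:5,
               wave == 2 & turnout == 1,
               wave == 2 & turnout == 2),
  vals  = list(voteint.list, 900, vote.list, 900)
)
# Warns "conditions are not exhaustive": rows 3 and 6 lack turnout information
print(list.vote)
```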

After having set up the data, we get our table of percentages:

vote.tab <- genTable(percent(list.vote)~bula,
                     data=gles2013work)
options(digits=1)
t(vote.tab)

bula                     CDU/CSU   SPD   FDP Grüne Linke   NPD Piraten   AfD
  Baden-Wuerttemberg        27.7  21.8   7.0  17.2   6.0   0.4     2.1   4.6
  Bayern                    36.4  17.7   5.5  10.6   5.3   0.0     2.4   4.0
  Berlin                    26.5  22.3   8.4  10.2  13.9   1.8     1.8   6.6
  Brandenburg               20.4  22.8   2.5   5.6  18.5   0.6     0.6   2.5
  Bremen                    21.7  26.1   0.0  17.4  13.0   0.0     0.0   4.3
  Hamburg                   22.2  35.6   2.2   4.4   6.7   2.2     0.0   4.4
  Hessen                    42.0  26.5   3.0   8.5   4.0   0.0     0.5   3.0
  Mecklenburg-Vorpommern    32.9  19.9   2.1   4.1  17.8   1.4     2.7   1.4
  Niedersachsen             32.7  32.4   3.2   9.5   3.2   0.0     0.7   0.7
  Nordrhein-Westfalen       32.7  31.3   3.4  10.7   3.7   0.4     2.3   1.8
  Rheinland-Pfalz           39.4  21.3   1.6   6.3   8.7   1.6     0.8   3.9
  Saarland                  40.0  40.0   0.0   0.0   0.0   0.0     0.0   0.0
  Sachsen                   49.4  16.6   1.2   3.3  14.2   0.3     1.2   0.9
  Sachsen-Anhalt            27.0  29.5   1.2   8.3  19.1   0.4     0.8   0.4
  Schleswig-Holstein        28.4  25.9   4.3   9.5   4.3   0.0     0.0   5.2
  Thueringen                35.1  15.9   1.6   2.9  22.4   1.2     0.0   2.4

bula                     Other No Vote     N
  Bayern                   2.0    16.0 451.0
  Berlin                   0.6     7.8 166.0
  Brandenburg              1.2    25.3 162.0
  Bremen                   0.0    17.4  23.0
  Hamburg                  2.2    20.0  45.0
  Hessen                   0.0    12.5 200.0
  Mecklenburg-Vorpommern   0.0    17.8 146.0
  Niedersachsen            0.4    17.3 284.0
  Nordrhein-Westfalen      0.7    13.1 563.0
  Rheinland-Pfalz          1.6    15.0 127.0
  Saarland                 0.0    20.0  30.0
  Sachsen                  0.3    12.7 332.0
  Sachsen-Anhalt           0.0    13.3 241.0
  Schleswig-Holstein       0.9    21.6 116.0
  Thueringen               0.8    17.6 245.0
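The layout of this table, with percentages of the list vote within each state plus a row giving the number of cases, can be mimicked in base R with table() and prop.table(). The toy vectors below are hypothetical, and the sketch ignores memisc-specific features such as declared missing values and value labels; NA is simply excluded from the counts.

```r
# Hypothetical toy data (not GLES): list votes and federal states
list.vote <- factor(c("SPD", "CDU/CSU", "CDU/CSU", "No Vote", "SPD", "SPD", NA),
                    levels = c("CDU/CSU", "SPD", "No Vote"))
bula <- c("Bremen", "Bremen", "Bremen", "Bremen", "Hamburg", "Hamburg", "Hamburg")

counts <- table(list.vote, bula)              # NA votes are dropped by table()
pct <- 100 * prop.table(counts, margin = 2)   # percentages within each state
vote.tab <- rbind(pct, N = colSums(counts))   # append the number of valid cases

print(round(vote.tab, 1))
```

Applying t() to the result gives the layout with states as rows, as in the output above.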

It is of course also possible to create multi-dimensional tables, i.e. tables created by grouping by more than one factor:

gles2013work <- within(gles2013work,{
    # We relabel the items, since they are originally in German
    labels(turnout) <- c("Yes, voted"=1, "No, did not vote"=2)
    labels(gender) <- c("Male"=1,"Female"=2)
})
genTable(percent(turnout)~gender+bula,
         data=gles2013work)
, , bula = Baden-Wuerttemberg

                 gender
                 Male Female
Yes, voted         88     85
No, did not vote   12     15
N                  90     61

, , bula = Bayern

                 gender
                 Male Female
Yes, voted         85     80
No, did not vote   15     20
N                  89    129

, , bula = Berlin

                 gender
                 Male Female
Yes, voted        100     85
No, did not vote    0     15
N                  38     52

, , bula = Brandenburg

                 gender
                 Male Female
Yes, voted         83     77
No, did not vote   17     23
N                  36     62

, , bula = Bremen

                 gender
                 Male Female
Yes, voted         91     80
No, did not vote    9     20
N                  11      5

, , bula = Hamburg

                 gender
                 Male Female
Yes, voted         88     76
No, did not vote   12     24
N                  16     21

, , bula = Hessen

                 gender
                 Male Female
Yes, voted         91     81
No, did not vote    9     19
N                  66     48

, , bula = Mecklenburg-Vorpommern

                 gender
                 Male Female
Yes, voted         84     72
No, did not vote   16     28
N                  32     47

, , bula = Niedersachsen

                 gender
                 Male Female
Yes, voted         88     83
No, did not vote   12     17
N                  75     70

, , bula = Nordrhein-Westfalen

                 gender
                 Male Female
Yes, voted         90     82
No, did not vote   10     18
N                 148    158

, , bula = Rheinland-Pfalz

                 gender
                 Male Female
Yes, voted         84     85
No, did not vote   16     15
N                  43     34

, , bula = Saarland

                 gender
                 Male Female
Yes, voted         91     72
No, did not vote    9     28
N                  11     18

, , bula = Sachsen

                 gender
                 Male Female
Yes, voted         88     88
No, did not vote   12     12
N                 103     73

, , bula = Sachsen-Anhalt

                 gender
                 Male Female
Yes, voted         89     81
No, did not vote   11     19
N                  63     73

, , bula = Schleswig-Holstein

                 gender
                 Male Female
Yes, voted         89     85
No, did not vote   11     15
N                  37     33

, , bula = Thueringen

                 gender
                 Male Female
Yes, voted         91     71
No, did not vote    9     29
N                  70     73
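With two grouping factors, the percentages are computed within each combination of gender and state. In base R, the corresponding operation on a three-way table of counts is prop.table() with a margin covering both grouping dimensions. The toy data below are hypothetical; the sketch only illustrates which cells the percentages sum over.

```r
# Hypothetical toy data (not GLES)
turnout <- factor(c("Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes"),
                  levels = c("Yes", "No"))
gender  <- factor(c("Male", "Male", "Male", "Female", "Female", "Female", "Male", "Female"),
                  levels = c("Male", "Female"))
bula    <- c("Bremen", "Bremen", "Bremen", "Bremen",
             "Hamburg", "Hamburg", "Hamburg", "Hamburg")

counts <- table(turnout, gender, bula)
# margin = c(2, 3): percentages sum to 100 within each gender-by-state cell
pct <- 100 * prop.table(counts, margin = c(2, 3))
print(round(pct))
```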

Formatting Tables of Descriptive Statistics

The results of genTable() are objects of class "table", so they can be rearranged into a “flattened” table with the function ftable(). To demonstrate this, we continue the previous example:

gt <- genTable(percent(turnout)~gender+bula,
               data=gles2013work)
# We beautify the table a bit ...
names(dimnames(gt)) <- c("Voted","Gender","State")
gt <- dimrename(gt,"Yes, voted"="Yes",
                   "No, did not vote"="No")
ftable(gt,col.vars = c("Gender","Voted"))
                       Gender Male         Female
                       Voted   Yes  No   N    Yes  No   N
State
Baden-Wuerttemberg              88  12  90     85  15  61
Bayern                          85  15  89     80  20 129
Berlin                         100   0  38     85  15  52
Brandenburg                     83  17  36     77  23  62
Bremen                          91   9  11     80  20   5
Hamburg                         88  12  16     76  24  21
Hessen                          91   9  66     81  19  48
Mecklenburg-Vorpommern          84  16  32     72  28  47
Niedersachsen                   88  12  75     83  17  70
Nordrhein-Westfalen             90  10 148     82  18 158
Rheinland-Pfalz                 84  16  43     85  15  34
Saarland                        91   9  11     72  28  18
Sachsen                         88  12 103     88  12  73
Sachsen-Anhalt                  89  11  63     81  19  73
Schleswig-Holstein              89  11  37     85  15  33
Thueringen                      91   9  70     71  29  73
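ftable() is a standard R function (package stats), so the reshaping step can be tried without memisc. The sketch below builds a small three-way table by hand from four of the percentages shown above (leaving out the N row) and flattens it with the same col.vars specification; the object names are illustrative only.

```r
# A 2 x 2 x 2 table (Voted x Gender x State) built by hand from some of
# the percentages printed above; the N row is omitted for brevity
gt <- as.table(array(
  c(88, 12, 85, 15,    # Baden-Wuerttemberg: Male Yes/No, Female Yes/No
    85, 15, 80, 20),   # Bayern
  dim = c(2, 2, 2),
  dimnames = list(Voted  = c("Yes", "No"),
                  Gender = c("Male", "Female"),
                  State  = c("Baden-Wuerttemberg", "Bayern"))
))

# Gender and Voted become nested column variables; the remaining
# variable (State) defines the rows
ft <- ftable(gt, col.vars = c("Gender", "Voted"))
print(ft)
```

The variable listed first in col.vars varies slowest across the columns, which is why Gender forms the outer header row.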

Arranging the cells of a table with ftable() improves the on-screen appearance of the results of genTable(), but including them in a word processor document or a LaTeX file requires further facilities, which “memisc” provides. To include the flattened table in a LaTeX document, one can convert it to LaTeX code with toLatex() and write the result to a file with writeLines():

ft <- ftable(gt,col.vars = c("Gender","Voted"))
lt <- toLatex(ft,digits=c(1,1,0,1,1,0))
writeLines(lt,con="Voted2013-GenderState.tex")
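The digits argument gives one number of decimal places per column of the flattened table: here one decimal for the percentages and none for the case counts N. The following base-R sketch illustrates that per-column formatting and writes a few LaTeX table rows to a temporary file with writeLines(). It is a hypothetical illustration using values from the table above, not a re-implementation of toLatex(), and the object names are made up.

```r
# Two rows of the flattened table (Male Yes/No/N, Female Yes/No/N),
# taken from the output above
m <- rbind(Bremen  = c(90.9, 9.1, 11, 80.0, 20.0, 5),
           Hamburg = c(87.5, 12.5, 16, 76.2, 23.8, 21))
digits <- c(1, 1, 0, 1, 1, 0)   # one entry per column, as in the toLatex() call

# Format column j with digits[j] decimal places
cells <- vapply(seq_len(ncol(m)),
                function(j) formatC(m[, j], format = "f", digits = digits[j]),
                character(nrow(m)))

# Assemble minimal LaTeX tabular rows: "label & cell & ... \\"
rows <- paste0(rownames(m), " & ", apply(cells, 1, paste, collapse = " & "), " \\\\")
out <- tempfile(fileext = ".tex")
writeLines(rows, con = out)
print(readLines(out))
```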

For HTML output, one can use show_html() (e.g. for inclusion in “knitr” documents) and write_html(); both functions are based on format_html(). Here we continue the example to demonstrate this:

show_html(ft,digits=c(1,1,0,1,1,0))
                       Gender:  Male           Female
State                  Voted:    Yes    No    N   Yes    No    N
Baden-Wuerttemberg              87.8  12.2   90  85.2  14.8   61
Bayern                          85.4  14.6   89  79.8  20.2  129
Berlin                         100.0   0.0   38  84.6  15.4   52
Brandenburg                     83.3  16.7   36  77.4  22.6   62
Bremen                          90.9   9.1   11  80.0  20.0    5
Hamburg                         87.5  12.5   16  76.2  23.8   21
Hessen                          90.9   9.1   66  81.2  18.8   48
Mecklenburg-Vorpommern          84.4  15.6   32  72.3  27.7   47
Niedersachsen                   88.0  12.0   75  82.9  17.1   70
Nordrhein-Westfalen             89.9  10.1  148  82.3  17.7  158
Rheinland-Pfalz                 83.7  16.3   43  85.3  14.7   34
Saarland                        90.9   9.1   11  72.2  27.8   18
Sachsen                         88.3  11.7  103  87.7  12.3   73
Sachsen-Anhalt                  88.9  11.1   63  80.8  19.2   73
Schleswig-Holstein              89.2  10.8   37  84.8  15.2   33
Thueringen                      91.4   8.6   70  71.2  28.8   73
show_html(ft,digits=c(1,1,0,1,1,0),show.titles=FALSE)
                        Male           Female
                         Yes    No    N   Yes    No    N
Baden-Wuerttemberg      87.8  12.2   90  85.2  14.8   61
Bayern                  85.4  14.6   89  79.8  20.2  129
Berlin                 100.0   0.0   38  84.6  15.4   52
Brandenburg             83.3  16.7   36  77.4  22.6   62
Bremen                  90.9   9.1   11  80.0  20.0    5
Hamburg                 87.5  12.5   16  76.2  23.8   21
Hessen                  90.9   9.1   66  81.2  18.8   48
Mecklenburg-Vorpommern  84.4  15.6   32  72.3  27.7   47
Niedersachsen           88.0  12.0   75  82.9  17.1   70
Nordrhein-Westfalen     89.9  10.1  148  82.3  17.7  158
Rheinland-Pfalz         83.7  16.3   43  85.3  14.7   34
Saarland                90.9   9.1   11  72.2  27.8   18
Sachsen                 88.3  11.7  103  87.7  12.3   73
Sachsen-Anhalt          88.9  11.1   63  80.8  19.2   73
Schleswig-Holstein      89.2  10.8   37  84.8  15.2   33
Thueringen              91.4   8.6   70  71.2  28.8   73
# Writing into a HTML file ...
write_html(ft,digits=c(1,1,0,1,1,0),show.titles=FALSE,
           file="Voted2013-GenderState.html")
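For readers curious what such an export amounts to, the following deliberately minimal base-R sketch turns a labelled matrix into an HTML table and writes it to a temporary file. The helper matrix_to_html() is hypothetical and not part of memisc; unlike format_html() it applies no styling and does not escape special HTML characters, so it is only suitable for simple labels and pre-formatted cells.

```r
# Hypothetical helper, not part of memisc: labelled matrix -> HTML lines.
# Assumes labels and cells contain no characters that need HTML escaping.
matrix_to_html <- function(m) {
  header <- paste0("<tr><th></th>",
                   paste0("<th>", colnames(m), "</th>", collapse = ""),
                   "</tr>")
  body <- vapply(seq_len(nrow(m)), function(i)
    paste0("<tr><th>", rownames(m)[i], "</th>",
           paste0("<td>", m[i, ], "</td>", collapse = ""), "</tr>"),
    character(1))
  c("<table>", header, body, "</table>")
}

# Figures for men from the table above, formatted beforehand as strings
m <- rbind(Bremen  = c(Yes = "90.9", No = "9.1", N = "11"),
           Hamburg = c(Yes = "87.5", No = "12.5", N = "16"))
html <- matrix_to_html(m)
out <- tempfile(fileext = ".html")
writeLines(html, con = out)
cat(html, sep = "\n")
```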

Continuing the example with the age statistics:

# age.tab was created earlier
age.ftab <- ftable(age.tab,row.vars=1)
show_html(age.ftab,digits=1,show.titles=FALSE)
                        Mean Std.dev  Median
Baden-Wuerttemberg      54.5    18.9    57.0
Bayern                  54.4    18.9    56.0
Berlin                  52.8    19.8    57.0
Brandenburg             59.7    19.3    62.5
Bremen                  60.4    11.5    63.0
Hamburg                 51.5    18.7    53.0
Hessen                  56.9    18.5    60.0
Mecklenburg-Vorpommern  57.0    19.2    60.5
Niedersachsen           55.1    18.4    56.0
Nordrhein-Westfalen     53.9    19.1    55.0
Rheinland-Pfalz         57.2    18.2    60.5
Saarland                61.9    17.3    65.0
Sachsen                 58.3    16.7    60.5
Sachsen-Anhalt          54.7    17.1    56.0
Schleswig-Holstein      60.0    19.9    65.0
Thueringen              57.8    17.4    60.0

Of course we can also export to LaTeX:

toLatex(age.ftab,digits=1,show.titles=FALSE)
\begin{tabular}{llD{.}{.}{1}D{.}{.}{1}D{.}{.}{1}}
\toprule
&& \multicolumn{1}{c}{Mean}&\multicolumn{1}{c}{Std.dev}&\multicolumn{1}{c}{Median}\\
\midrule
Baden-Wuerttemberg     && 54.5 & 18.9 & 57.0\\
Bayern                 && 54.4 & 18.9 & 56.0\\
Berlin                 && 52.8 & 19.8 & 57.0\\
Brandenburg            && 59.7 & 19.3 & 62.5\\
Bremen                 && 60.4 & 11.5 & 63.0\\
Hamburg                && 51.5 & 18.7 & 53.0\\
Hessen                 && 56.9 & 18.5 & 60.0\\
Mecklenburg-Vorpommern && 57.0 & 19.2 & 60.5\\
Niedersachsen          && 55.1 & 18.4 & 56.0\\
Nordrhein-Westfalen    && 53.9 & 19.1 & 55.0\\
Rheinland-Pfalz        && 57.2 & 18.2 & 60.5\\
Saarland               && 61.9 & 17.3 & 65.0\\
Sachsen                && 58.3 & 16.7 & 60.5\\
Sachsen-Anhalt         && 54.7 & 17.1 & 56.0\\
Schleswig-Holstein     && 60.0 & 19.9 & 65.0\\
Thueringen             && 57.8 & 17.4 & 60.0\\
\bottomrule
\end{tabular}

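The generated code uses \toprule, \midrule and \bottomrule from the LaTeX package booktabs and the D column type from the package dcolumn, so both packages need to be loaded in the preamble of the LaTeX document. After formatting with LaTeX, the table may look like this: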

1. The German Longitudinal Election Study is funded by the German National Science Foundation (DFG) and carried out in close cooperation with the DGfW, German Society for Electoral Studies. Principal investigators are Hans Rattinger (University of Mannheim, until 2014), Sigrid Roßteutscher (University of Frankfurt), Rüdiger Schmitt-Beck (University of Mannheim), Harald Schoen (Mannheim Centre for European Social Research, from 2015), Bernhard Weßels (Social Science Research Center Berlin), and Christof Wolf (GESIS – Leibniz Institute for the Social Sciences, since 2012). Neither the funding organisation nor the principal investigators bear any responsibility for the example code shown here.