Tables of summary statistics
As discussed in earlier examples, svyby can be used to estimate statistics in subpopulations and svymean and svytotal give proportions or totals in subpopulations when used on factor variables. In this example we see how to construct reasonably attractive tables of summary statistics using the output from these functions. These examples use the dclus1 survey design object created in an earlier example.
The first example shows the estimation of proportions in the cells of a contingency table: school type (elementary, middle, high) by whether the school met its "comparable improvement" target. The first step is to construct a single factor variable that specifies all the cells in the table and use svymean to estimate proportions
> a <- svymean(~interaction(stype, comp.imp), design = dclus1)
> a
mean SE
interaction(stype, comp.imp)E.No 0.174863 0.0260
interaction(stype, comp.imp)H.No 0.038251 0.0161
interaction(stype, comp.imp)M.No 0.060109 0.0246
interaction(stype, comp.imp)E.Yes 0.612022 0.0417
interaction(stype, comp.imp)H.Yes 0.038251 0.0161
interaction(stype, comp.imp)M.Yes 0.076503 0.0217
This contains all the numbers we need, but the formatting leaves something to be desired. The ftable function reshapes output like this into a flattened table. We specify the variable names and the labels for each level in the rownames argument:
> b <- ftable(a, rownames = list(stype = c("E", "H",
"M"), comp.imp = c("No", "Yes")))
> b
stype E H M
comp.imp
No mean 0.17486339 0.03825137 0.06010929
SE 0.02599552 0.01607602 0.02457177
Yes mean 0.61202186 0.03825137 0.07650273
SE 0.04167572 0.01605469 0.02167084
The major remaining fault in the table is that too many digits are given. We can convert to percentages and then round to one decimal place:
> round(100 * b, 1)
stype E H M
comp.imp
No mean 17.5 3.8 6.0
SE 2.6 1.6 2.5
Yes mean 61.2 3.8 7.7
SE 4.2 1.6 2.2
The second example deals with a table of means produced by svyby. This is more straightforward, since svyby already knows the variable names and levels. First we estimate the mean of 1999 and 2000 API by school type and comparable improvement target
> a<-svyby(~api99 + api00, ~stype + sch.wide, rclus1, svymean, keep.var=TRUE)
> a
stype sch.wide statistic.api99 statistic.api00 SE1 SE2
E.No E No 601.6667 596.3333 70.04669 64.50553
E.Yes E Yes 608.3485 653.6439 23.67277 22.37296
H.No H No 662.0000 659.3333 40.92204 37.80385
H.Yes H Yes 577.6364 607.4545 57.38815 53.97142
M.No M No 611.3750 606.3750 48.19716 48.27853
M.Yes M Yes 607.2941 643.2353 49.49574 49.34813
Now we convert to a table
> ftable(a)
sch.wide No Yes
api99 api00 api99 api00
stype
E svymean 601.66667 596.33333 608.34848 653.64394
SE 47.27582 43.49010 21.52493 20.31720
H svymean 662.00000 659.33333 577.63636 607.45455
SE 29.23003 27.00275 46.50125 43.70468
M svymean 611.37500 606.37500 607.29412 643.23529
SE 41.11886 41.11686 42.53046 42.12850
For variety, we trim the excess digits using the digits argument to print (the previous approach using round would also work).
> print(ftable(a),digits=3)
sch.wide No Yes
api99 api00 api99 api00
stype
E svymean 601.7 596.3 608.3 653.6
SE 47.3 43.5 21.5 20.3
H svymean 662.0 659.3 577.6 607.5
SE 29.2 27.0 46.5 43.7
M svymean 611.4 606.4 607.3 643.2
SE 41.1 41.1 42.5 42.1
Note that digits specifies the number of significant digits, rather than decimal places.
Thomas Lumley
Last modified: Thu Jul 28 10:48:04 PDT 2005