Domain estimators
Estimating a statistic in a subpopulation with a complex survey sample requires some care, and the resulting standard errors do not depend only on data in the subpopulation. The survey package takes care of all these details transparently, so it is valid simply to take a subset of a survey design object.
Here we again use the dclus1 and rclus1 design objects created in earlier examples. We want to estimate the total number of students in schools that met their "school-wide growth" and "comparable improvement" targets
> svytotal(~enroll, subset(dclus1, sch.wide=="Yes" & comp.imp=="Yes"))
total SE
enroll 2406420 645444
> svytotal(~enroll, subset(rclus1, sch.wide=="Yes" & comp.imp=="Yes"))
total SE
enroll 2406420 645444
Often we want estimates in a set of subpopulations, and svyby will do this. The first argument gives the analysis variables. The second gives the variables that specify subpopulations. The third is the survey design object and the fourth is the analysis function. Any other arguments are passed to the analysis function (eg quantiles=0.5 in the second example below).
> svyby(~api99, ~stype, dclus1, svymean)
stype statistic
E E 607.7917
H H 595.7143
M M 608.6000
> svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5)
stype statistic
E E 615
H H 593
M M 611
As this example shows, the default is not to give standard errors. This is overridden with the argument keep.var=TRUE
> svyby(~api99+api00, ~stype+sch.wide, dclus1, svymean, keep.var=TRUE)
stype sch.wide statistic.api99 statistic.api00 SE1 SE2
E.No E No 601.6667 596.3333 47.27582 43.49010
E.Yes E Yes 608.3485 653.6439 21.52493 20.31720
H.No H No 662.0000 659.3333 29.23003 27.00275
H.Yes H Yes 577.6364 607.4545 46.50125 43.70468
M.No M No 611.3750 606.3750 41.11886 41.11686
M.Yes M Yes 607.2941 643.2353 42.53046 42.12850
Here we have two subpopulation variables that jointly define six subpopulations, and the 1999 and 2000 API means, together with their standard errors, are reported in each subpopulation.
As usual, the same syntax can be used for objects with replicate weights. Here we estimate the total enrollment for each type of school (elementary, middle, high) and give the design effect
> svyby(~enroll,~stype, rclus1,svytotal, deff=TRUE)
stype statistic.enroll DEff
E E 2109717.1 125.913474
H H 535594.9 5.003186
M M 759628.1 13.557221
Design effects are available only for means and totals, and if the analysis function does not compute design effects an error would be given.
Thomas Lumley
Last modified: Tue Apr 12 14:08:36 PDT 2005