svyrecvar.Rd
Compute the variance of a total under multistage sampling, using a recursive descent algorithm.
Matrix of data or estimating functions
Data frame or matrix with cluster ids for each stage
Strata for each stage
Information on population and sample size for each stage, created by as.fpc
Post-stratification information as created by postStratify or calibrate
How to handle strata with a single PSU
If TRUE, compute a one-stage (ultimate-cluster) estimator
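The single-PSU handling is controlled through the survey.lonely.psu option. The sketch below is illustrative only: the data frame toy and its columns are made up for this example, and "adjust" and "certainty" are two of the option values documented for the survey package.

```r
library(survey)

# Hypothetical toy data: stratum "b" contains a single PSU
toy <- data.frame(stratum = c("a", "a", "a", "b"),
                  y       = c(1, 2, 4, 3),
                  w       = c(10, 10, 10, 5))
d <- svydesign(id = ~1, strata = ~stratum, weights = ~w, data = toy)

# "fail": variance estimation is expected to stop with an error;
# the message is captured here instead of aborting the script
old <- options(survey.lonely.psu = "fail")
res_fail <- tryCatch(svytotal(~y, d), error = function(e) conditionMessage(e))

# "adjust": the lonely stratum is centered at the overall mean (conservative)
options(survey.lonely.psu = "adjust")
se_adjust <- as.numeric(SE(svytotal(~y, d)))

# "certainty": the lonely stratum contributes nothing to the variance
options(survey.lonely.psu = "certainty")
se_certainty <- as.numeric(SE(svytotal(~y, d)))

options(old)  # restore the previous setting
c(adjust = se_adjust, certainty = se_certainty)
```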
The main use of this function is to compute the variance of the sum of a set of estimating functions under multistage sampling. The sampling is assumed to be simple or stratified random sampling within clusters at each stage except perhaps the last stage. The variance of a statistic is computed from the variance of estimating functions as described by Binder (1983).
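As a sketch of the linearization idea attributed to Binder (1983), suppose the statistic solves a weighted estimating equation, sum of w_i U_i(theta) over the sample equal to zero. The notation here (w_i, U_i, A) is introduced for illustration and does not appear elsewhere in this page.

```latex
\hat{V}(\hat\theta) \approx
  \hat{A}^{-1}\,
  \hat{V}\!\Big(\sum_{i \in s} w_i\, U_i(\hat\theta)\Big)\,
  \hat{A}^{-\top},
\qquad
\hat{A} = \sum_{i \in s} w_i
  \left.\frac{\partial U_i(\theta)}{\partial \theta}\right|_{\theta = \hat\theta}
```

The middle factor is the variance of an estimated total of the estimating functions, which is the quantity this function computes.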
Use one.stage=TRUE for compatibility with other software that does not perform multi-stage calculations, and set options(survey.ultimate.cluster=TRUE) to make this the default.
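A minimal sketch of switching between the multistage and the ultimate-cluster estimator through the survey.ultimate.cluster option, using the apiclus2 data from the examples on this page. It assumes the option is read when the estimate is computed, not when the design is created.

```r
library(survey)
data(api)

# Two-stage cluster sample with population sizes at both stages
dclus2 <- svydesign(id = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)

# Ultimate-cluster (one-stage) estimator: only the first-stage term is used
old <- options(survey.ultimate.cluster = TRUE)
se_one <- as.numeric(SE(svytotal(~enroll, dclus2, na.rm = TRUE)))
options(old)  # restore the previous setting

# Full recursive estimator: adds the within-cluster contributions
se_multi <- as.numeric(SE(svytotal(~enroll, dclus2, na.rm = TRUE)))

c(one_stage = se_one, multistage = se_multi)
```

Because the within-cluster terms are non-negative, the one-stage standard error should not exceed the multistage one for this design.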
The idea of a recursive algorithm is due to Bellhouse (1985). Texts such as Cochran (1977) and Sarndal et al (1991) describe the decomposition of the variance into a single-stage between-cluster estimator and a within-cluster estimator, and this is applied recursively.
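For illustration, the textbook form of this decomposition for two-stage simple random sampling without replacement can be written as follows. The symbols are assumptions made for this sketch: n of N clusters are sampled, m_i of M_i units are sampled within cluster i, s_1^2 is the sample variance of the estimated cluster totals, and s_{2i}^2 is the sample variance of y within cluster i.

```latex
\hat{V}(\hat{T}) =
  N^2 \left(1 - \frac{n}{N}\right) \frac{s_1^2}{n}
  \;+\;
  \frac{N}{n} \sum_{i \in s} M_i^2 \left(1 - \frac{m_i}{M_i}\right) \frac{s_{2i}^2}{m_i},
\qquad
\hat{t}_i = \frac{M_i}{m_i} \sum_{k \in s_i} y_{ik},
\quad
s_1^2 = \frac{1}{n-1} \sum_{i \in s} \big(\hat{t}_i - \bar{\hat{t}}\big)^2
```

The first term is the single-stage between-cluster estimator; the second has the same form inside each sampled cluster, so with more than two stages it can be expanded again in the same way.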
If one.stage is a positive integer it specifies the number of stages of sampling to use in the recursive estimator.
If pps="brewer", standard errors are estimated using Brewer's approximation for PPS without replacement, option 2 of those described by Berger (2004). The fpc argument must then be specified in terms of sampling fractions, not population sizes (or omitted, but then the pps argument would have no effect and the with-replacement standard errors would be correct).
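A hedged sketch of requesting Brewer's approximation through svydesign, using the election_pps data shipped with the survey package. It assumes the column p in that data set holds the sampling probability, so that it can be passed as fpc in sampling-fraction form.

```r
library(survey)
data(election)

# PPS without replacement, Brewer's approximation: fpc given as sampling fractions
dpps_br <- svydesign(id = ~1, fpc = ~p, data = election_pps, pps = "brewer")
tot_br <- svytotal(~Bush + Kerry, dpps_br)

# Same sampling probabilities, treated as sampling with replacement
dpps_wr <- svydesign(id = ~1, probs = ~p, data = election_pps)
tot_wr <- svytotal(~Bush + Kerry, dpps_wr)

# Point estimates agree; only the standard errors differ
cbind(brewer = SE(tot_br), with_replacement = SE(tot_wr))
```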
A covariance matrix
Bellhouse DR (1985) Computing Methods for Variance Estimation in Complex Surveys. Journal of Official Statistics, 1(3).
Berger, Y.G. (2004), A Simple Variance Estimator for Unequal Probability Sampling Without Replacement. Journal of Applied Statistics, 31, 305-315.
Binder, David A. (1983). On the variances of asymptotically normal estimators from complex surveys. International Statistical Review, 51, 279-292.
Brewer KRW (2002) Combined Survey Sampling Inference (Weighing Basu's Elephants) [Chapter 9]
Cochran, W. (1977) Sampling Techniques. 3rd edition. Wiley.
Sarndal C-E, Swensson B, Wretman J (1991) Model Assisted Survey Sampling. Springer.
A simple set of finite population corrections will only be exactly correct when each successive stage uses simple or stratified random sampling without replacement. A correction under general unequal probability sampling (e.g., PPS) would require joint inclusion probabilities (or, at least, sampling probabilities for units not included in the sample), information that is not generally available.
The quality of Brewer's approximation is excellent in Berger's simulations, but the accuracy may vary depending on the sampling algorithm used.
library(survey)
data(mu284)
dmu284<-svydesign(id=~id1+id2,fpc=~n1+n2, data=mu284)
svytotal(~y1, dmu284)
#> total SE
#> y1 15080 2274.3
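The between-cluster plus within-cluster decomposition described in the details can be checked by hand on this small data set. This is a sketch under the assumption that mu284 is a two-stage simple random sample in which n1 is the population number of PSUs and n2 the population number of second-stage units in each PSU; the variable names below are introduced only for this illustration.

```r
library(survey)
data(mu284)
dmu284 <- svydesign(id = ~id1 + id2, fpc = ~n1 + n2, data = mu284)

N   <- mu284$n1[1]                 # PSUs in the population
psu <- split(mu284, mu284$id1)     # one data frame per sampled PSU
n   <- length(psu)                 # PSUs in the sample

M     <- sapply(psu, function(d) d$n2[1])                # units per PSU (population)
m     <- sapply(psu, nrow)                               # units per PSU (sample)
t_hat <- sapply(psu, function(d) d$n2[1] * mean(d$y1))   # estimated PSU totals
s2_w  <- sapply(psu, function(d) var(d$y1))              # within-PSU variances

# Between-cluster term plus within-cluster term
v_between <- N^2 * (1 - n / N) * var(t_hat) / n
v_within  <- (N / n) * sum(M^2 * (1 - m / M) * s2_w / m)

se_hand <- sqrt(v_between + v_within)
se_pkg  <- as.numeric(SE(svytotal(~y1, dmu284)))
all.equal(se_hand, se_pkg)
```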
data(api)
# two-stage cluster sample
dclus2<-svydesign(id=~dnum+snum, fpc=~fpc1+fpc2, data=apiclus2)
summary(dclus2)
#> 2 - level Cluster Sampling design
#> With (40, 126) clusters.
#> svydesign(id = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)
#> Probabilities:
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.003669 0.037743 0.052840 0.042390 0.052840 0.052840
#> Population size (PSUs): 757
#> Data variables:
#> [1] "cds" "stype" "name" "sname" "snum" "dname"
#> [7] "dnum" "cname" "cnum" "flag" "pcttest" "api00"
#> [13] "api99" "target" "growth" "sch.wide" "comp.imp" "both"
#> [19] "awards" "meals" "ell" "yr.rnd" "mobility" "acs.k3"
#> [25] "acs.46" "acs.core" "pct.resp" "not.hsg" "hsg" "some.col"
#> [31] "col.grad" "grad.sch" "avg.ed" "full" "emer" "enroll"
#> [37] "api.stu" "pw" "fpc1" "fpc2"
svymean(~api00, dclus2)
#> mean SE
#> api00 670.81 30.099
svytotal(~enroll, dclus2,na.rm=TRUE)
#> total SE
#> enroll 2639273 799638
# bootstrap for multistage sample
mrbclus2<-as.svrepdesign(dclus2, type="mrb", replicates=100)
svytotal(~enroll, mrbclus2, na.rm=TRUE)
#> total SE
#> enroll 2639273 857469
# two-stage `with replacement'
dclus2wr<-svydesign(id=~dnum+snum, weights=~pw, data=apiclus2)
summary(dclus2wr)
#> 2 - level Cluster Sampling design (with replacement)
#> With (40, 126) clusters.
#> svydesign(id = ~dnum + snum, weights = ~pw, data = apiclus2)
#> Probabilities:
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.003669 0.037743 0.052840 0.042390 0.052840 0.052840
#> Data variables:
#> [1] "cds" "stype" "name" "sname" "snum" "dname"
#> [7] "dnum" "cname" "cnum" "flag" "pcttest" "api00"
#> [13] "api99" "target" "growth" "sch.wide" "comp.imp" "both"
#> [19] "awards" "meals" "ell" "yr.rnd" "mobility" "acs.k3"
#> [25] "acs.46" "acs.core" "pct.resp" "not.hsg" "hsg" "some.col"
#> [31] "col.grad" "grad.sch" "avg.ed" "full" "emer" "enroll"
#> [37] "api.stu" "pw" "fpc1" "fpc2"
svymean(~api00, dclus2wr)
#> mean SE
#> api00 670.81 30.712
svytotal(~enroll, dclus2wr,na.rm=TRUE)
#> total SE
#> enroll 2639273 820261