calibrate {survey} | R Documentation |
Calibration (or GREG) estimators generalise post-stratification and
raking by calibrating a sample to the marginal totals of
variables in a linear regression model. This function reweights the
survey design and adds additional information that is used by
svyrecvar
to reduce the estimated standard errors.
calibrate(design,...) ## S3 method for class 'survey.design2': calibrate(design, formula, population, aggregate.stage=NULL, stage=0, variance=NULL, bounds=c(-Inf,Inf), calfun=c("linear","raking","logit"), maxit=50,epsilon=1e-7,verbose=FALSE,force=FALSE,...) ## S3 method for class 'svyrep.design': calibrate(design, formula, population,compress=NA, aggregate.index=NULL, variance=NULL, bounds=c(-Inf,Inf), calfun=c("linear","raking","logit"), maxit=50, epsilon=1e-7, verbose=FALSE,force=FALSE, ...) ## S3 method for class 'twophase': calibrate(design, phase=2,formula, population, calfun=c("linear","raking","logit","rrz"),...) grake(mm,ww,calfun,eta=rep(0,NCOL(mm)),bounds,population,epsilon, verbose,maxit)
design |
survey design object |
formula |
model formula for calibration model |
population |
Vectors of population column totals for the model matrix in the calibration model, or list of such vectors for each cluster. Required except for two-phase designs |
compress |
compress the resulting replicate weights if
TRUE or if NA and weights were previously compressed |
stage |
See Details below |
variance |
Coefficients for variance in calibration model (see Details below) |
aggregate.stage |
An integer. If not NULL , make calibration weights
constant within sampling units at this stage. |
aggregate.index |
A vector or one-sided formula. If not NULL , make calibration weights
constant within levels of this variable |
bounds |
Bounds for the calibration weights, optional
except for calfun="logit" |
... |
options for other methods |
calfun |
Calibration function: see below |
maxit |
Number of iterations |
epsilon |
tolerance in matching population total |
verbose |
print lots of uninteresting information |
force |
Return an answer even if the specified accuracy was not achieved |
phase |
Phase of a two-phase design to calibrate (only
phase=2 currently implemented.) |
mm |
model matrix |
ww |
vector of weights |
eta |
starting values for iteration |
If the population
argument has a names attribute it will be
checked against the names produced by model.matrix(formula)
and
reordered if necessary. This protects against situations where the
(locale-dependent) ordering of factor levels is not what you expected.
The calibrate
function implements linear, bounded linear,
raking, bounded raking, and logit calibration functions. All except
unbounded linear calibration use the Newton-Raphson algorithm
described by Deville et al (1993). This algorithm is exposed for other
uses in the grake
function. Unbounded linear calibration uses
an algorithm that is less sensitive to collinearity. The calibration
function may be specified as a string naming one of the three built-in
functions or as an object of class calfun
, allowing
user-defined functions. See make.calfun
for details.
Calibration with bounds, or on highly collinear data, may fail. If
force=TRUE
the approximately calibrated design object will
still be returned (useful for examining why it failed). A failure in
calibrating a set of replicate weights when the sampling weights were
successfully calibrated will give only a warning, not an error.
For two-phase designs calfun="rrz"
estimates the sampling
probabilities using logistic regression as described by Robins et al
(1994). estWeights
will do the same thing.
Calibration may result in observations within the last-stage sampling
units having unequal weight even though they necessarily are sampled
together. Specifying aggegrate.stage
ensures that the
calibration weight adjustments are constant within sampling units at
the specified stage; if the original sampling weights were equal the
final weights will also be equal. The algorithm is as described by
Vanderhoeft (2001, section III.D). Specifying aggregate.index
does the same thing for replicate weight designs; a warning will be
given if the original weights are not constant within levels of
aggregate.index
.
In a model with two-stage sampling, population totals may be available
for the PSUs actually sampled, but not for the whole population. In
this situation, calibrating within each PSU reduces with second-stage
contribution to variance. This generalizes to multistage sampling.
The stage
argument specifies which stage of sampling the totals
refer to. Stage 0 is full population totals, stage 1 is totals for
PSUs, and so on. The default, stage=NULL
is interpreted as
stage 0 when a single population vector is supplied and stage 1 when a
list is supplied. Calibrating to PSU totals will fail (with a message
about an exactly singular matrix) for PSUs that have fewer
observations than the number of calibration variables.
For unbounded linear calibration only, the variance in the calibration
model may depend on covariates. If variance=NULL
the
calibration model has constant variance. If variance
is not NULL
it specifies a linear combination of the columns of the model matrix
and the calibration variance is proportional to that linear
combination.
The design matrix specified by formula (after any aggregation) must be of full rank, with one exception. If the population total for a column is zero and all the observations are zero the column will be ignored. This allows the use of factors where the population happens to have no observations at some level.
In a two-phase design, population
may be omitted when
phase=2
, to specify calibration to the phase-one sample. If the
two-phase design object was constructed using the more memory-efficient
method="approx"
argument to twophase
, calibration of the first
phase of sampling to the population is not supported.
A survey design object.
Deville J-C, Sarndal C-E, Sautory O (1993) Generalized Raking Procedures in Survey Sampling. JASA 88:1013-1020
Kalton G, Flores-Cervantes I (2003) "Weighting methods" J Official Stat 19(2) 81-97
Sarndal C-E, Swensson B, Wretman J. "Model Assisted Survey Sampling". Springer. 1991.
Rao JNK, Yung W, Hidiroglou MA (2002) Estimating equations for the analysis of survey data using poststratification information. Sankhya 64 Series A Part 2, 364-378.
Robins JM, Rotnitzky A, Zhao LP. (1994) Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89, 846-866.
Vanderhoeft C (2001) Generalized Calibration at Statistics Belgium. Statistics Belgium Working Paper No 3. http://www.statbel.fgov.be/studies/paper03_en.asp
postStratify
, rake
for other ways
to use auxiliary information
twophase
and vignette("epi")
for an example of calibration in two-phase designs
survey/tests/kalton.R
for examples replicating those in Kalton & Flores-Cervantes (2003)
make.calfun
for user-defined calibration distances.
data(api) dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) pop.totals<-c(`(Intercept)`=6194, stypeH=755, stypeM=1018) ## For a single factor variable this is equivalent to ## postStratify (dclus1g<-calibrate(dclus1, ~stype, pop.totals)) svymean(~api00, dclus1g) svytotal(~enroll, dclus1g) svytotal(~stype, dclus1g) ## Make weights constant within school district (dclus1agg<-calibrate(dclus1, ~stype, pop.totals, aggregate=1)) svymean(~api00, dclus1agg) svytotal(~enroll, dclus1agg) svytotal(~stype, dclus1agg) ## Now add sch.wide (dclus1g2 <- calibrate(dclus1, ~stype+sch.wide, c(pop.totals, sch.wideYes=5122))) svymean(~api00, dclus1g2) svytotal(~enroll, dclus1g2) svytotal(~stype, dclus1g2) ## Finally, calibrate on 1999 API and school type (dclus1g3 <- calibrate(dclus1, ~stype+api99, c(pop.totals, api99=3914069))) svymean(~api00, dclus1g3) svytotal(~enroll, dclus1g3) svytotal(~stype, dclus1g3) ## Same syntax with replicate weights rclus1<-as.svrepdesign(dclus1) (rclus1g3 <- calibrate(rclus1, ~stype+api99, c(pop.totals, api99=3914069))) svymean(~api00, rclus1g3) svytotal(~enroll, rclus1g3) svytotal(~stype, rclus1g3) (rclus1agg3 <- calibrate(rclus1, ~stype+api99, c(pop.totals,api99=3914069), aggregate.index=~dnum)) svymean(~api00, rclus1agg3) svytotal(~enroll, rclus1agg3) svytotal(~stype, rclus1agg3) image(rclus1agg3) ### ## Bounded weights range(weights(dclus1g3)/weights(dclus1)) (dclus1g3b <- calibrate(dclus1, ~stype+api99, c(pop.totals, api99=3914069),bounds=c(0.6,1.6))) range(weights(dclus1g3b)/weights(dclus1)) svymean(~api00, dclus1g3b) svytotal(~enroll, dclus1g3b) svytotal(~stype, dclus1g3b) ## generalised raking (dclus1g3c <- calibrate(dclus1, ~stype+api99, c(pop.totals, api99=3914069), calfun="raking")) range(weights(dclus1g3c)/weights(dclus1)) (dclus1g3c <- calibrate(dclus1, ~stype+api99, c(pop.totals, api99=3914069), calfun=cal.raking)) range(weights(dclus1g3c)/weights(dclus1)) (dclus1g3d <- calibrate(dclus1, ~stype+api99, c(pop.totals, api99=3914069), calfun="logit",bounds=c(0.5,2.5))) range(weights(dclus1g3d)/weights(dclus1)) ## Ratio estimators are calibration estimators dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc) rstrat<-as.svrepdesign(dstrat) svytotal(~api.stu,dstrat) common<-svyratio(~api.stu, ~enroll, dstrat, separate=FALSE) predict(common, total=3811472) pop<-3811472 ## equivalent to (common) ratio estimator dstratg1<-calibrate(dstrat,~enroll-1, pop, variance=1) svytotal(~api.stu, dstratg1) rstratg1<-calibrate(rstrat,~enroll-1, pop, variance=1) svytotal(~api.stu, rstratg1)