**Post-stratification** is done with the postStratify function, for survey designs with or without replicate weights. This example of post-stratification is based on one at UCLA Academic Technology Services, analysing data from
*Sampling of Populations* by Levy and Lemeshow.

> dpets <- svydesign(id = ~1, weight = ~weight, data = pets, fpc = ~n)Post-stratification requires data on the population distribution of the stratifying variable. This can be supplied in a data frame or a table, here a data frame is used. The other arguments to

> pspets <- postStratify(dpets, ~type, data.frame(type = c(1, 2), Freq = c(850, 450)))The poststratified survey design object can be used in the same way as any other survey design object.For example, to show the precision gained by post-stratification:

> svytotal(~totexp,dpets) total SE totexp 51643 2888.3 > svytotal(~totexp, pspets) total SE totexp 52150 1512.5

pop.types <- data.frame(stype=c("E","H","M"), Freq=c(4421,755,1018)) pop.schwide <- data.frame(sch.wide=c("No","Yes"), Freq=c(1072,5122))Now apply these to

rclus1r <- rake(rclus1, list(~stype,~sch.wide), list(pop.types, pop.schwide))Raking can also be used when the joint distribution of some variables is known. Suppose we know the joint population distribution of school type and the "comparative improvement" target

pop.imptype<-with(apipop, table(comp.imp, stype))We can then call

dclus1r <- rake(dclus1, list(~comp.imp+stype,~sch.wide), list(pop.imptype, pop.schwide))

This example uses the same `dclus1` survey design as above.
First we calibrate to the school type variable. We supply a formula
`~stype` and must specify population totals for the design
matrix produced by this formula.

pop.totals<-c(`(Intercept)`=6194, stypeH=755, stypeM=1018) dclus1g<-calibrate(dclus1, ~stype, pop.totals)We could also have given the formula as

Calibration to a single factor variable is equivalent to post-stratification, as we see from these examples:

> svymean(~api00, dclus1g) mean SE api00 642.31 23.921 > svytotal(~enroll, dclus1g) total SE enroll 3680893 406293 > svytotal(~stype, dclus1g) total SE stypeE 4421 2.308e-14 stypeH 755 5.985e-15 stypeM 1018 5.616e-14We can calibrate using the two variables

> dclus1g2 <- calibrate(dclus1, ~stype+sch.wide, c(pop.totals, sch.wideYes=5122)) > svymean(~api00, dclus1g2) mean SE api00 641 23.829 > svytotal(~enroll, dclus1g2) total SE enroll 3654414 403074 > svytotal(~stype, dclus1g2) total SE stypeE 4421 4.704e-14 stypeH 755 2.921e-15 stypeM 1018 3.821e-14The results of calibration and raking are similar, but not identical.

Finally, we can look at a calibration example with a continuous variable. Imagine that the 1999 API is known for all schools but the 2000 API is known only for the sample. Since there is a high correlation between 1999 and 2000 API (see the graphics example for a scatterplot), calibrating to 1999 API reduces the uncertainty in mean 2000 API considerably.

> svymean(~api00, dclus1g3) mean SE api00 665.31 3.4418 > svytotal(~enroll, dclus1g3) total SE enroll 3638487 385524 > svytotal(~stype, dclus1g3) total SE stypeE 4421 4.376e-14 stypeH 755 2.369e-14 stypeM 1018 2.140e-14The reduction in uncertainty is less impressive for a subpopulation, as knowing the population total does not give complete information about the subpopulation total:

> svymean(~api00, subset(dclus1g3, comp.imp=="Yes")) mean SE api00 672.05 6.5182

These examples have used a calibration working model with constant variance. Using a model with variance proportional to the covariate it is possible to express the standard ratio estimator of a total as a calibration estimator

> dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, + data = apistrat, fpc = ~fpc) > ratio <- svyratio(~api.stu, ~enroll, dstrat, separate = FALSE) > predict(ratio, total=3811472) $total enroll api.stu 3190038 $se enroll api.stu 29565.98 > dstratg <- calibrate(dstrat, ~enroll-1, pop=3811472, lambda=1) > svytotal(~api.stu, dstratg) total SE api.stu 3190038 29566The parameter

Two refinements of regression calibration are the ability to impose bounds ont he calibration weights and to require that the weights are constant within sampling units.

In the example above, the changes in weights needed for calibration were substantial

> range(weights(dclus1g3)/weights(dclus1)) [1] 0.4185925 1.8332949The original sampling weights were decreased by as much as 60% or increased by over 80%. We can attempt a bounded regression calibration, restricting the calibration weights to be positive and less than 1.5

> (dclus1g3b <- calibrate(dclus1, ~stype+api99, c(pop.totals, api99=3914069), bounds=c(0,1.5))) 1 - level Cluster Sampling design With (15) clusters. calibrate(dclus1, ~stype + api99, c(pop.totals, api99 = 3914069), bounds = c(0, 1.5)) > range(weights(dclus1g3b)/weights(dclus1)) [1] 0.4970466 1.5000000The bounds cannot be specified arbitrarily. Attempting a lower bound of 0.5 instead of 0 fails.

> (dclus1g3b <- calibrate(dclus1, ~stype+api99, c(pop.totals, api99=3914069), bounds=c(0.5,1.5))) Error in calibrate.survey.design2(dclus1, ~stype + api99, c(pop.totals, : Failed to achieve bounds in 11 iterations

Another useful restriction is to make weights constant within sampling units. For example, in a survey that samples households and collects data on all individuals in a household the true probability of being sampled must be identical for all individuals in the household and it is desirable for the sampling weight to also be identical.

In the one-stage cluster sample we have been using, all schools in a random sample of school districts are used, so the weights should be equal within school districts. Specifying `aggregage.stage=1`

requests that weights are constant within sampling units at the first stage of sampling.

> (dclus1agg<-calibrate(dclus1, ~stype, pop.totals, aggregate.stage=1)) 1 - level Cluster Sampling design With (15) clusters. calibrate(dclus1, ~stype, pop.totals, aggregate.stage = 1)The extra restrictions on weights will tend to mean that larger weights are needed to achieve calibration

> range(weights(dclus1agg)/weights(dclus1)) [1] 0.3234162 2.2828519 > range(weights(dclus1g)/weights(dclus1)) [1] 0.907064 1.593304For designs specified by replicate weights the sampling units may be unknown. Rather than specifying

`aggregate.stage`

, specify a
formula or vector of unit identifiers as the
`aggregate.index`

argument.
> (rclus1agg3 <- calibrate(rclus1, ~stype+api99, c(pop.totals,api99=3914069), aggregate.index=~dnum)) Call: calibrate(rclus1, ~stype + api99, c(pop.totals, api99 = 3914069), aggregate.index = ~dnum) Unstratified cluster jacknife (JK1) with 15 replicates.

`calibrate` has an argument `calfun` to specify the
calibration function: linear, raking ratio, logit. An number of
European agencies prefer logit calibration for producing fewer
extreme weights

For example, we can perform logit calibration of the one-stage cluster sample:

> dclus1g3d <- calibrate(dclus1, ~stype + api99, c(pop.totals, + api99 = 3914069), calfun = "logit", bounds = c(0.5, 2.5)) > summary(weights(dclus1g3d)/weights(dclus1)) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.5944 0.7487 0.9473 1.0000 1.1740 1.9360

Thomas Lumley Last modified: Fri Jul 15 20:17:30 PDT 2005