rake.Rd
Raking uses iterative post-stratification to match marginal distributions of a survey sample to known population margins.
rake(design, sample.margins, population.margins, control = list(maxit =
10, epsilon = 1, verbose=FALSE), compress=NULL)
A survey object
list of formulas or data frames describing sample margins, which must not contain missing values
list of tables or data frames describing corresponding population margins
maxit
controls the number of
iterations. Convergence is declared if the maximum change in a table
entry is less than epsilon
. If epsilon<1
it is
taken to be a fraction of the total sampling weight.
If design
has replicate weights, attempt to
compress the new replicate weight matrix? When NULL
, will
attempt to compress if the original weight matrix was compressed
The sample.margins
should be in a format suitable for postStratify
.
Raking (aka iterative proportional fitting) is known to converge for any table without zeros, and for any table with zeros for which there is a joint distribution with the given margins and the same pattern of zeros. The `margins' need not be one-dimensional.
The algorithm works by repeated calls to postStratify
(iterative proportional fitting), which is efficient for large
multiway tables. For small tables calibrate
will be
faster, and also allows raking to population totals for continuous
variables, and raking with bounded weights.
A raked survey design.
calibrate
for other ways to use auxiliary information.
data(api)
dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
rclus1 <- as.svrepdesign(dclus1)
svymean(~api00, rclus1)
#> mean SE
#> api00 644.17 26.329
svytotal(~enroll, rclus1)
#> total SE
#> enroll 3404940 932235
## population marginal totals for each stratum
pop.types <- data.frame(stype=c("E","H","M"), Freq=c(4421,755,1018))
pop.schwide <- data.frame(sch.wide=c("No","Yes"), Freq=c(1072,5122))
rclus1r <- rake(rclus1, list(~stype,~sch.wide), list(pop.types, pop.schwide))
svymean(~api00, rclus1r)
#> mean SE
#> api00 641.23 26.873
svytotal(~enroll, rclus1r)
#> total SE
#> enroll 3647300 463511
## marginal totals correspond to population
xtabs(~stype, apipop)
#> stype
#> E H M
#> 4421 755 1018
svytable(~stype, rclus1r, round=TRUE)
#> stype
#> E H M
#> 4421 755 1018
xtabs(~sch.wide, apipop)
#> sch.wide
#> No Yes
#> 1072 5122
svytable(~sch.wide, rclus1r, round=TRUE)
#> sch.wide
#> No Yes
#> 1072 5122
## joint totals don't correspond
xtabs(~stype+sch.wide, apipop)
#> sch.wide
#> stype No Yes
#> E 472 3949
#> H 334 421
#> M 266 752
svytable(~stype+sch.wide, rclus1r, round=TRUE)
#> sch.wide
#> stype No Yes
#> E 478 3943
#> H 201 554
#> M 393 625
## Do it for a design without replicate weights
dclus1r<-rake(dclus1, list(~stype,~sch.wide), list(pop.types, pop.schwide))
svymean(~api00, dclus1r)
#> mean SE
#> api00 641.23 23.704
svytotal(~enroll, dclus1r)
#> total SE
#> enroll 3647300 400603
## compare to raking with calibrate()
dclus1gr<-calibrate(dclus1, ~stype+sch.wide, pop=c(6194, 755,1018,5122),
calfun="raking")
svymean(~stype+api00, dclus1r)
#> mean SE
#> stypeE 0.71375 0.000
#> stypeH 0.12189 0.000
#> stypeM 0.16436 0.000
#> api00 641.23025 23.704
svymean(~stype+api00, dclus1gr)
#> mean SE
#> stypeE 0.71376 0.000
#> stypeH 0.12189 0.000
#> stypeM 0.16435 0.000
#> api00 641.23032 23.704
## compare to joint post-stratification
## (only possible if joint population table is known)
##
pop.table <- xtabs(~stype+sch.wide,apipop)
rclus1ps <- postStratify(rclus1, ~stype+sch.wide, pop.table)
svytable(~stype+sch.wide, rclus1ps, round=TRUE)
#> sch.wide
#> stype No Yes
#> E 472 3949
#> H 334 421
#> M 266 752
svymean(~api00, rclus1ps)
#> mean SE
#> api00 643.15 27.102
svytotal(~enroll, rclus1ps)
#> total SE
#> enroll 3610219 468086
## Example of raking with partial joint distributions
pop.imp<-data.frame(comp.imp=c("No","Yes"),Freq=c(1712,4482))
dclus1r2<-rake(dclus1, list(~stype+sch.wide, ~comp.imp),
list(pop.table, pop.imp))
svymean(~api00, dclus1r2)
#> mean SE
#> api00 642.62 22.732
## compare to calibrate() syntax with tables
dclus1r2<-calibrate(dclus1, formula=list(~stype+sch.wide, ~comp.imp),
population=list(pop.table, pop.imp),calfun="raking")
svymean(~api00, dclus1r2)
#> mean SE
#> api00 642.61 22.731