library(survey)
data(api)
The apistrat data frame is a stratified independent sample:
dstrat <- svydesign(id=~1, strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
The sample is stratified on stype, with sampling weights pw. The fpc variable contains the population size for the stratum. As the schools are sampled independently, each record in the data frame is a separate PSU. This is indicated by id=~1. Since the sampling weights could have been determined from the population size, an equivalent declaration would be
dstrat <- svydesign(id=~1,strata=~stype, data=apistrat, fpc=~fpc)
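The arithmetic behind deriving weights from the population size can be sketched in base R. This is a minimal illustration on a hypothetical toy data frame (the names toy and n_h and all values are made up for the example, not taken from apistrat): within each stratum, the sampling weight is the stratum population size divided by the number of sampled units in that stratum.

```r
# Hypothetical toy stratified sample: one row per sampled unit,
# with the stratum label and the stratum population size (fpc).
toy <- data.frame(
  stype = c("E", "E", "E", "E", "H", "H", "M", "M"),
  fpc   = c(40, 40, 40, 40, 10, 10, 30, 30)
)

# Number of sampled units in each row's stratum (n_h), aligned with the rows.
n_h <- ave(rep(1, nrow(toy)), toy$stype, FUN = sum)

# Sampling weight = stratum population size / stratum sample size.
toy$pw <- toy$fpc / n_h

# The weights add up to the total population size across strata.
sum(toy$pw)
```

Each sampled unit stands in for pw population units, which is why the weights sum to the combined population size of the strata.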
The apiclus1 data frame is a cluster sample: all schools in a random sample of school districts.
dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
There is no strata argument as the sampling was not stratified. The variable dnum identifies school districts (PSUs) and is specified as the id argument. Again, the weights argument is optional, as the sampling weights can be computed from the population size. To specify sampling with replacement, simply omit the fpc argument:
dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1)
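To see what omitting fpc changes, the two cluster designs can be compared on the same estimate. This is a sketch that assumes the survey package is installed; the object names dclus1_fpc, dclus1_wr, m_fpc and m_wr are chosen here for illustration. Because both designs use the same weights, the point estimates should agree, while the with-replacement design drops the finite population correction and so should not report a smaller standard error.

```r
library(survey)
data(api)

# Same cluster sample, declared with and without the finite population correction.
dclus1_fpc <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
dclus1_wr  <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1)

# Estimated mean of api00 under each declaration.
m_fpc <- svymean(~api00, dclus1_fpc)
m_wr  <- svymean(~api00, dclus1_wr)

# Point estimates, then standard errors, side by side.
c(fpc = as.numeric(coef(m_fpc)), wr = as.numeric(coef(m_wr)))
c(fpc = as.numeric(SE(m_fpc)),   wr = as.numeric(SE(m_wr)))
```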
A design may have both strata and clusters. In that case svydesign assumes that the clusters are numbered uniquely across the entire sample, rather than just within a stratum. This enables some sorts of data errors to be detected. If your clusters are only numbered uniquely within a stratum, use the option nest=TRUE to specify this and disable the checking.
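The numbering issue can be illustrated in base R without any survey functions. This is a sketch on a hypothetical toy data frame (the names toy, stratum, cluster and cluster_unique are invented for the example): cluster ids 1 and 2 are reused in both strata, so an id taken alone appears to span two strata, and combining stratum and cluster gives labels that are unique across the whole sample.

```r
# Hypothetical toy data: cluster ids restart at 1 within each stratum.
toy <- data.frame(
  stratum = c("A", "A", "A", "A", "B", "B", "B", "B"),
  cluster = c(1, 1, 2, 2, 1, 1, 2, 2)
)

# For each cluster id taken alone, count how many strata it appears in.
# A value above 1 means the id is not unique across the sample.
strata_per_cluster <- tapply(toy$stratum, toy$cluster,
                             function(s) length(unique(s)))
strata_per_cluster

# Relabel clusters so that each stratum/cluster pair has its own id.
toy$cluster_unique <- interaction(toy$stratum, toy$cluster, drop = TRUE)
nlevels(toy$cluster_unique)
```

With data numbered like this, either the relabelled id can be passed as the id argument, or the original id can be used together with nest=TRUE.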
The apiclus2 data set contains a two-stage cluster sample. First, school districts were sampled. If there were fewer than five schools in the district, all were taken; otherwise a random sample of five was taken.
dclus2 <- svydesign(id=~dnum+snum, fpc=~fpc1+fpc2, data=apiclus2)
The multistage nature of the sampling is clear in the id and fpc arguments. At the first stage the sampling units are identified by dnum and the population size by fpc1. At the second stage, units within each school district are identified by snum and the number of units within the district by fpc2. When no finite population correction is given, so that sampling is treated as with replacement, only the first stage of the design is needed. The following two declarations are equivalent for treating the two-stage cluster design as if the first stage were with replacement.
dclus2wr <- svydesign(id=~dnum+snum, weights=~pw, data=apiclus2)
dclus2wr2 <- svydesign(id=~dnum, weights=~pw, data=apiclus2)
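One way to check the equivalence of the two with-replacement declarations is to compute the same estimate under each and compare the results. This is a sketch that assumes the survey package is installed; the names d_two_stage, d_one_stage, m_two and m_one are chosen here for illustration, and the choice of api00 as the analysis variable is arbitrary.

```r
library(survey)
data(api)

# Two-stage ids without fpc, and first-stage ids only.
d_two_stage <- svydesign(id = ~dnum + snum, weights = ~pw, data = apiclus2)
d_one_stage <- svydesign(id = ~dnum, weights = ~pw, data = apiclus2)

# Estimated mean of api00 under each declaration.
m_two <- svymean(~api00, d_two_stage)
m_one <- svymean(~api00, d_one_stage)

# Compare point estimates and standard errors; all.equal() is expected
# to report agreement if the declarations are equivalent.
all.equal(as.numeric(coef(m_two)), as.numeric(coef(m_one)))
all.equal(as.numeric(SE(m_two)), as.numeric(SE(m_one)))
```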