> library(survey) > options(survey.lonely.psu="remove") > nhis<-read.dta("nhis.dta")The analysis recommended for packages other than SUDAAN uses pseudo-stratum and pseudo-psu variables that we must create before creating the design object.
> nhis$cstratum<-nhis$psupseud %/% 10 > nhis$cpsu<-nhis$psupseud %% 10 > > dnhis1<-svydesign(id = ~cpsu, strata = ~cstratum, weight = ~wtfa, + data = nhis,nest=TRUE)For the full multistage design object we need to create variables that describe the finite population corrections. Data are not provided on population sizes, so we pretend that the non-certainty PSUs are sampled with replacement. We could also use a small number such as 0.01 rather than 0. Since we are specifying the sampling weights separately this would not have much impact. Of course, if we were computing the sampling weights from the population sizes it would be critical to get them right.
> nhis$selfrep<-ifelse(nhis$psutype =="MSA- self-representing",1,0) > nhis$zero<-rep(0,nrow(nhis)) > dnhis2<-svydesign(id = ~psupseud + segnum, strata = ~stratum + subtype, + weight = ~wtfa, data = nhis, + fpc = ~selfrep+zero) Warning message: FPC implies population larger than ten billion. in: as.fpc(fpc, strata, ids)We can now compute means for a few variables to compare the two sets of estimates. The multistage design gives slightly smaller standard errors. It is also distinctly slower to run (7.5s vs 1.6s, still not bad for a national-scale survey).
> svymean(~height+weight+nacute+ncond, dnhis1, na.rm=TRUE) mean SE height 67.306288 0.0253 weight 161.112444 0.2017 nacute 0.063934 0.0015 ncond 0.604453 0.0083 > > svymean(~height+weight+nacute+ncond, dnhis2, na.rm=TRUE) mean SE height 67.306288 0.0235 weight 161.112444 0.1879 nacute 0.063934 0.0014 ncond 0.604453 0.0069