Compute means, variances, ratios and totals for data from complex surveys.

# S3 method for survey.design
svymean(x, design, na.rm=FALSE,deff=FALSE,influence=FALSE,...) 
# S3 method for survey.design2
svymean(x, design, na.rm=FALSE,deff=FALSE,influence=FALSE,...) 
# S3 method for twophase
svymean(x, design, na.rm=FALSE,deff=FALSE,...) 
# S3 method for svyrep.design
svymean(x, design, na.rm=FALSE, rho=NULL,
  return.replicates=FALSE, deff=FALSE,...) 
# S3 method for survey.design
svyvar(x, design, na.rm=FALSE,...) 
# S3 method for svyrep.design
svyvar(x, design, na.rm=FALSE, rho=NULL,
   return.replicates=FALSE,...,estimate.only=FALSE) 
# S3 method for survey.design
svytotal(x, design, na.rm=FALSE,deff=FALSE,influence=FALSE,...) 
# S3 method for survey.design2
svytotal(x, design, na.rm=FALSE,deff=FALSE,influence=FALSE,...) 
# S3 method for twophase
svytotal(x, design, na.rm=FALSE,deff=FALSE,...) 
# S3 method for svyrep.design
svytotal(x, design, na.rm=FALSE, rho=NULL,
   return.replicates=FALSE, deff=FALSE,...)
# S3 method for svystat
coef(object,...)
# S3 method for svrepstat
coef(object,...)
# S3 method for svystat
vcov(object,...)
# S3 method for svrepstat
vcov(object,...)
# S3 method for svystat
confint(object,  parm, level = 0.95,df =Inf,...)
# S3 method for svrepstat
confint(object,  parm, level = 0.95,df =Inf,...)
cv(object,...)
deff(object, quietly=FALSE,...)
make.formula(names)

Arguments

x

A formula, vector or matrix

design

survey.design or svyrep.design object

na.rm

Should cases with missing values be dropped?

influence

Should a matrix of influence functions be returned (primarily to support svyby)

rho

parameter for Fay's variance estimator in a BRR design

return.replicates

Return the replicate means/totals?

deff

Return the design effect (see below)

object

The result of one of the other survey summary functions

quietly

Don't warn when there is no design effect computed

estimate.only

Don't compute standard errors (useful when svyvar is used to estimate the design effect)

parm

a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.

level

the confidence level required.

df

degrees of freedom for t-distribution in confidence interval, use degf(design) for number of PSUs minus number of strata

...

additional arguments to methods,not currently used

names

vector of character strings

Details

These functions perform weighted estimation, with each observation being weighted by the inverse of its sampling probability. Except for the table functions, these also give precision estimates that incorporate the effects of stratification and clustering.

Factor variables are converted to sets of indicator variables for each category in computing means and totals. Combining this with the interaction function, allows crosstabulations. See ftable.svystat for formatting the output.

With na.rm=TRUE, all cases with missing data are removed. With na.rm=FALSE cases with missing data are not removed and so will produce missing results. When using replicate weights and na.rm=FALSE it may be useful to set options(na.action="na.pass"), otherwise all replicates with any missing results will be discarded.

The svytotal and svreptotal functions estimate a population total. Use predict on svyratio and svyglm, to get ratio or regression estimates of totals.

svyvar estimates the population variance. The object returned includes the full matrix of estimated population variances and covariances, but by default only the diagonal elements are printed. To display the whole matrix use as.matrix(v) or print(v, covariance=TRUE).

The design effect compares the variance of a mean or total to the variance from a study of the same size using simple random sampling without replacement. Note that the design effect will be incorrect if the weights have been rescaled so that they are not reciprocals of sampling probabilities. To obtain an estimate of the design effect comparing to simple random sampling with replacement, which does not have this requirement, use deff="replace". This with-replacement design effect is the square of Kish's "deft".

The design effect for a subset of a design conditions on the size of the subset. That is, it compares the variance of the estimate to the variance of an estimate based on a simple random sample of the same size as the subset, taken from the subpopulation. So, for example, under stratified random sampling the design effect in a subset consisting of a single stratum will be 1.0.

The cv function computes the coefficient of variation of a statistic such as ratio, mean or total. The default method is for any object with methods for SE and coef.

make.formula makes a formula from a vector of names. This is useful because formulas as the best way to specify variables to the survey functions.

Value

Objects of class "svystat" or "svrepstat", which are vectors with a "var" attribute giving the variance and a "statistic" attribute giving the name of the statistic.

These objects have methods for vcov, SE, coef,

confint, svycontrast.

Author

Thomas Lumley

See also

svydesign, as.svrepdesign, svrepdesign for constructing design objects.

degf to extract degrees of freedom from a design.

svyquantile for quantiles

ftable.svystat for more attractive tables

svyciprop for more accurate confidence intervals for proportions near 0 or 1.

svyttest for comparing two means.

svycontrast for linear and nonlinear functions of estimates.

Examples


  data(api)

  ## one-stage cluster sample
  dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

  svymean(~api00, dclus1, deff=TRUE)
#>          mean      SE   DEff
#> api00 644.169  23.542 9.3459
  svymean(~factor(stype),dclus1)
#>                    mean     SE
#> factor(stype)E 0.786885 0.0463
#> factor(stype)H 0.076503 0.0268
#> factor(stype)M 0.136612 0.0296
  svymean(~interaction(stype, comp.imp), dclus1)
#>                                       mean     SE
#> interaction(stype, comp.imp)E.No  0.174863 0.0260
#> interaction(stype, comp.imp)H.No  0.038251 0.0161
#> interaction(stype, comp.imp)M.No  0.060109 0.0246
#> interaction(stype, comp.imp)E.Yes 0.612022 0.0417
#> interaction(stype, comp.imp)H.Yes 0.038251 0.0161
#> interaction(stype, comp.imp)M.Yes 0.076503 0.0217
  svyquantile(~api00, dclus1, c(.25,.5,.75))
#> $api00
#>      quantile ci.2.5 ci.97.5       se
#> 0.25      552    492     627 31.47166
#> 0.5       652    561     714 35.66788
#> 0.75      719    696     777 18.88300
#> 
#> attr(,"hasci")
#> [1] TRUE
#> attr(,"class")
#> [1] "newsvyquantile"
  svytotal(~enroll, dclus1, deff=TRUE)
#>          total      SE   DEff
#> enroll 3404940  932235 31.311
  svyratio(~api.stu, ~enroll, dclus1)
#> Ratio estimator: svyratio.survey.design2(~api.stu, ~enroll, dclus1)
#> Ratios=
#>            enroll
#> api.stu 0.8497087
#> SEs=
#>              enroll
#> api.stu 0.008386297

  v<-svyvar(~api00+api99, dclus1)
  v
#>       variance     SE
#> api00    11183 1386.4
#> api99    12735 1450.1
  print(v, cov=TRUE)
#>             variance     SE
#> api00:api00    11183 1386.4
#> api99:api00    11516 1412.9
#> api00:api99    11516 1412.9
#> api99:api99    12735 1450.1
  as.matrix(v)
#>          api00    api99
#> api00 11182.82 11516.33
#> api99 11516.33 12735.21
#> attr(,"var")
#>         api00   api00   api99   api99
#> api00 1922144 1920707 1920707 1851400
#> api00 1920707 1996169 1996169 2004475
#> api99 1920707 1996169 1996169 2004475
#> api99 1851400 2004475 2004475 2102736
#> attr(,"statistic")
#> [1] "variance"


  # replicate weights - jackknife (this is slower)
  dstrat<-svydesign(id=~1,strata=~stype, weights=~pw,
        data=apistrat, fpc=~fpc)
  jkstrat<-as.svrepdesign(dstrat)

  svymean(~api00, jkstrat)
#>         mean     SE
#> api00 662.29 9.4089
  svymean(~factor(stype),jkstrat)
#>                   mean SE
#> factor(stype)E 0.71376  0
#> factor(stype)H 0.12189  0
#> factor(stype)M 0.16435  0
  svyvar(~api00+api99,jkstrat)
#>       variance     SE
#> api00    15191 1263.8
#> api99    16518 1327.1

  svyquantile(~api00, jkstrat, c(.25,.5,.75))
#> $api00
#>      quantile ci.2.5 ci.97.5       se
#> 0.25      565    535     597 15.71945
#> 0.5       668    642     694 13.18406
#> 0.75      756    726     778 13.18406
#> 
#> attr(,"hasci")
#> [1] TRUE
#> attr(,"class")
#> [1] "newsvyquantile"
  svytotal(~enroll, jkstrat)
#>          total     SE
#> enroll 3687178 114642
  svyratio(~api.stu, ~enroll, jkstrat)
#> Ratio estimator: svyratio.svyrep.design(~api.stu, ~enroll, jkstrat)
#> Ratios=
#>            enroll
#> api.stu 0.8369569
#> SEs=
#>             [,1]
#> [1,] 0.007772509

  # coefficients of variation
  cv(svytotal(~enroll,dstrat))
#>          enroll
#> enroll 0.031092
  cv(svyratio(~api.stu, ~enroll, jkstrat))
#>             enroll
#> api.stu 0.00928663

  # extracting information from the results
  coef(svytotal(~enroll,dstrat))
#>  enroll 
#> 3687178 
  vcov(svymean(~api00+api99,jkstrat))
#>          api00    api99
#> api00 88.52817 91.80068
#> api99 91.80068 99.28025
#> attr(,"means")
#> [1] 662.2874 629.3948
  SE(svymean(~enroll, dstrat))
#>          enroll
#> enroll 18.50851
  confint(svymean(~api00+api00, dclus1))
#>          2.5 %   97.5 %
#> api00 598.0275 690.3113
  confint(svymean(~api00+api00, dclus1), df=degf(dclus1))
#>          2.5 %   97.5 %
#> api00 593.6763 694.6625

  # Design effect
  svymean(~api00, dstrat, deff=TRUE)
#>           mean       SE   DEff
#> api00 662.2874   9.4089 1.2045
  svymean(~api00, dstrat, deff="replace")
#>           mean       SE   DEff
#> api00 662.2874   9.4089 1.1656
  svymean(~api00, jkstrat, deff=TRUE)
#>           mean       SE   DEff
#> api00 662.2874   9.4089 1.2045
  svymean(~api00, jkstrat, deff="replace")
#>           mean       SE   DEff
#> api00 662.2874   9.4089 1.1656
 (a<-svytotal(~enroll, dclus1, deff=TRUE))
#>          total      SE   DEff
#> enroll 3404940  932235 31.311
  deff(a)
#>   enroll 
#> 31.31066 

## weights that are *already* calibrated to population size
sum(weights(dclus1))
#> [1] 6194
nrow(apipop)
#> [1] 6194
cdclus1<- svydesign(id=~dnum, weights=~pw, data=apiclus1,
fpc=~fpc,calibrate.formula=~1)
SE(svymean(~enroll, dclus1))
#>          enroll
#> enroll 45.19137
## not equal to SE(mean)
SE(svytotal(~enroll, dclus1))/nrow(apipop)
#>          enroll
#> enroll 150.5061
## equal to SE(mean)
SE(svytotal(~enroll, cdclus1))/nrow(apipop)
#>          enroll
#> enroll 45.19137