surveysummary.Rd
Compute means, variances, ratios and totals for data from complex surveys.
# S3 method for survey.design
svymean(x, design, na.rm=FALSE,deff=FALSE,influence=FALSE,...)
# S3 method for survey.design2
svymean(x, design, na.rm=FALSE,deff=FALSE,influence=FALSE,...)
# S3 method for twophase
svymean(x, design, na.rm=FALSE,deff=FALSE,...)
# S3 method for svyrep.design
svymean(x, design, na.rm=FALSE, rho=NULL,
return.replicates=FALSE, deff=FALSE,...)
# S3 method for survey.design
svyvar(x, design, na.rm=FALSE,...)
# S3 method for svyrep.design
svyvar(x, design, na.rm=FALSE, rho=NULL,
return.replicates=FALSE,...,estimate.only=FALSE)
# S3 method for survey.design
svytotal(x, design, na.rm=FALSE,deff=FALSE,influence=FALSE,...)
# S3 method for survey.design2
svytotal(x, design, na.rm=FALSE,deff=FALSE,influence=FALSE,...)
# S3 method for twophase
svytotal(x, design, na.rm=FALSE,deff=FALSE,...)
# S3 method for svyrep.design
svytotal(x, design, na.rm=FALSE, rho=NULL,
return.replicates=FALSE, deff=FALSE,...)
# S3 method for svystat
coef(object,...)
# S3 method for svrepstat
coef(object,...)
# S3 method for svystat
vcov(object,...)
# S3 method for svrepstat
vcov(object,...)
# S3 method for svystat
confint(object, parm, level = 0.95,df =Inf,...)
# S3 method for svrepstat
confint(object, parm, level = 0.95,df =Inf,...)
cv(object,...)
deff(object, quietly=FALSE,...)
make.formula(names)
A formula, vector or matrix
survey.design
or svyrep.design
object
Should cases with missing values be dropped?
Should a matrix of influence functions be returned
(primarily to support svyby
)
parameter for Fay's variance estimator in a BRR design
Return the replicate means/totals?
Return the design effect (see below)
The result of one of the other survey summary functions
Don't warn when there is no design effect computed
Don't compute standard errors (useful when
svyvar
is used to estimate the design effect)
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.
the confidence level required.
degrees of freedom for t-distribution in confidence
interval, use degf(design)
for number of PSUs minus number of
strata
additional arguments to methods,not currently used
vector of character strings
These functions perform weighted estimation, with each observation being weighted by the inverse of its sampling probability. Except for the table functions, these also give precision estimates that incorporate the effects of stratification and clustering.
Factor variables are converted to sets of indicator variables for each
category in computing means and totals. Combining this with the
interaction
function, allows crosstabulations. See
ftable.svystat
for formatting the output.
With na.rm=TRUE
, all cases with missing data are removed. With
na.rm=FALSE
cases with missing data are not removed and so will
produce missing results. When using replicate weights and
na.rm=FALSE
it may be useful to set
options(na.action="na.pass")
, otherwise all replicates with any
missing results will be discarded.
The svytotal
and svreptotal
functions estimate a
population total. Use predict
on svyratio
and
svyglm
, to get ratio or regression estimates of totals.
svyvar
estimates the population variance. The object returned
includes the full matrix of estimated population variances and
covariances, but by default only the diagonal elements are printed. To
display the whole matrix use as.matrix(v)
or print(v,
covariance=TRUE)
.
The design effect compares the variance of a mean or total to the
variance from a study of the same size using simple random sampling
without replacement. Note that the design effect will be incorrect if
the weights have been rescaled so that they are not reciprocals of
sampling probabilities. To obtain an estimate of the design effect
comparing to simple random sampling with replacement, which does not
have this requirement, use deff="replace"
. This with-replacement
design effect is the square of Kish's "deft".
The design effect for a subset of a design conditions on the size of the subset. That is, it compares the variance of the estimate to the variance of an estimate based on a simple random sample of the same size as the subset, taken from the subpopulation. So, for example, under stratified random sampling the design effect in a subset consisting of a single stratum will be 1.0.
The cv
function computes the coefficient of variation of a
statistic such as ratio, mean or total. The default method is for any
object with methods for SE
and coef
.
make.formula
makes a formula from a vector of names. This is
useful because formulas as the best way to specify variables to the
survey functions.
Objects of class "svystat"
or "svrepstat"
,
which are vectors with a "var"
attribute giving the variance
and a "statistic"
attribute giving the name of the
statistic.
These objects have methods for vcov
, SE
, coef
,
confint
, svycontrast
.
svydesign
, as.svrepdesign
,
svrepdesign
for constructing design objects.
degf
to extract degrees of freedom from a design.
svyquantile
for quantiles
ftable.svystat
for more attractive tables
svyciprop
for more accurate confidence intervals for
proportions near 0 or 1.
svyttest
for comparing two means.
svycontrast
for linear and nonlinear functions of estimates.
data(api)
## one-stage cluster sample
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
svymean(~api00, dclus1, deff=TRUE)
#> mean SE DEff
#> api00 644.169 23.542 9.3459
svymean(~factor(stype),dclus1)
#> mean SE
#> factor(stype)E 0.786885 0.0463
#> factor(stype)H 0.076503 0.0268
#> factor(stype)M 0.136612 0.0296
svymean(~interaction(stype, comp.imp), dclus1)
#> mean SE
#> interaction(stype, comp.imp)E.No 0.174863 0.0260
#> interaction(stype, comp.imp)H.No 0.038251 0.0161
#> interaction(stype, comp.imp)M.No 0.060109 0.0246
#> interaction(stype, comp.imp)E.Yes 0.612022 0.0417
#> interaction(stype, comp.imp)H.Yes 0.038251 0.0161
#> interaction(stype, comp.imp)M.Yes 0.076503 0.0217
svyquantile(~api00, dclus1, c(.25,.5,.75))
#> $api00
#> quantile ci.2.5 ci.97.5 se
#> 0.25 552 492 627 31.47166
#> 0.5 652 561 714 35.66788
#> 0.75 719 696 777 18.88300
#>
#> attr(,"hasci")
#> [1] TRUE
#> attr(,"class")
#> [1] "newsvyquantile"
svytotal(~enroll, dclus1, deff=TRUE)
#> total SE DEff
#> enroll 3404940 932235 31.311
svyratio(~api.stu, ~enroll, dclus1)
#> Ratio estimator: svyratio.survey.design2(~api.stu, ~enroll, dclus1)
#> Ratios=
#> enroll
#> api.stu 0.8497087
#> SEs=
#> enroll
#> api.stu 0.008386297
v<-svyvar(~api00+api99, dclus1)
v
#> variance SE
#> api00 11183 1386.4
#> api99 12735 1450.1
print(v, cov=TRUE)
#> variance SE
#> api00:api00 11183 1386.4
#> api99:api00 11516 1412.9
#> api00:api99 11516 1412.9
#> api99:api99 12735 1450.1
as.matrix(v)
#> api00 api99
#> api00 11182.82 11516.33
#> api99 11516.33 12735.21
#> attr(,"var")
#> api00 api00 api99 api99
#> api00 1922144 1920707 1920707 1851400
#> api00 1920707 1996169 1996169 2004475
#> api99 1920707 1996169 1996169 2004475
#> api99 1851400 2004475 2004475 2102736
#> attr(,"statistic")
#> [1] "variance"
# replicate weights - jackknife (this is slower)
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw,
data=apistrat, fpc=~fpc)
jkstrat<-as.svrepdesign(dstrat)
svymean(~api00, jkstrat)
#> mean SE
#> api00 662.29 9.4089
svymean(~factor(stype),jkstrat)
#> mean SE
#> factor(stype)E 0.71376 0
#> factor(stype)H 0.12189 0
#> factor(stype)M 0.16435 0
svyvar(~api00+api99,jkstrat)
#> variance SE
#> api00 15191 1263.8
#> api99 16518 1327.1
svyquantile(~api00, jkstrat, c(.25,.5,.75))
#> $api00
#> quantile ci.2.5 ci.97.5 se
#> 0.25 565 535 597 15.71945
#> 0.5 668 642 694 13.18406
#> 0.75 756 726 778 13.18406
#>
#> attr(,"hasci")
#> [1] TRUE
#> attr(,"class")
#> [1] "newsvyquantile"
svytotal(~enroll, jkstrat)
#> total SE
#> enroll 3687178 114642
svyratio(~api.stu, ~enroll, jkstrat)
#> Ratio estimator: svyratio.svyrep.design(~api.stu, ~enroll, jkstrat)
#> Ratios=
#> enroll
#> api.stu 0.8369569
#> SEs=
#> [,1]
#> [1,] 0.007772509
# coefficients of variation
cv(svytotal(~enroll,dstrat))
#> enroll
#> enroll 0.031092
cv(svyratio(~api.stu, ~enroll, jkstrat))
#> enroll
#> api.stu 0.00928663
# extracting information from the results
coef(svytotal(~enroll,dstrat))
#> enroll
#> 3687178
vcov(svymean(~api00+api99,jkstrat))
#> api00 api99
#> api00 88.52817 91.80068
#> api99 91.80068 99.28025
#> attr(,"means")
#> [1] 662.2874 629.3948
SE(svymean(~enroll, dstrat))
#> enroll
#> enroll 18.50851
confint(svymean(~api00+api00, dclus1))
#> 2.5 % 97.5 %
#> api00 598.0275 690.3113
confint(svymean(~api00+api00, dclus1), df=degf(dclus1))
#> 2.5 % 97.5 %
#> api00 593.6763 694.6625
# Design effect
svymean(~api00, dstrat, deff=TRUE)
#> mean SE DEff
#> api00 662.2874 9.4089 1.2045
svymean(~api00, dstrat, deff="replace")
#> mean SE DEff
#> api00 662.2874 9.4089 1.1656
svymean(~api00, jkstrat, deff=TRUE)
#> mean SE DEff
#> api00 662.2874 9.4089 1.2045
svymean(~api00, jkstrat, deff="replace")
#> mean SE DEff
#> api00 662.2874 9.4089 1.1656
(a<-svytotal(~enroll, dclus1, deff=TRUE))
#> total SE DEff
#> enroll 3404940 932235 31.311
deff(a)
#> enroll
#> 31.31066
## weights that are *already* calibrated to population size
sum(weights(dclus1))
#> [1] 6194
nrow(apipop)
#> [1] 6194
cdclus1<- svydesign(id=~dnum, weights=~pw, data=apiclus1,
fpc=~fpc,calibrate.formula=~1)
SE(svymean(~enroll, dclus1))
#> enroll
#> enroll 45.19137
## not equal to SE(mean)
SE(svytotal(~enroll, dclus1))/nrow(apipop)
#> enroll
#> enroll 150.5061
## equal to SE(mean)
SE(svytotal(~enroll, cdclus1))/nrow(apipop)
#> enroll
#> enroll 45.19137