Survey statistics on subsets

Compute survey statistics on subsets of a survey defined by factors.

svyby(formula, by ,design,...)
# S3 method for default
svyby(formula, by, design, FUN, ..., deff=FALSE,keep.var = TRUE,
keep.names = TRUE,verbose=FALSE, vartype=c("se","ci","ci","cv","cvpct","var"),
 drop.empty.groups=TRUE, covmat=FALSE, return.replicates=FALSE,
 na.rm.by=FALSE, na.rm.all=FALSE, stringsAsFactors=TRUE,
multicore=getOption("survey.multicore"))
# S3 method for survey.design2
svyby(formula, by, design, FUN, ..., deff=FALSE,keep.var = TRUE,
keep.names = TRUE,verbose=FALSE, vartype=c("se","ci","ci","cv","cvpct","var"),
 drop.empty.groups=TRUE, covmat=FALSE, influence=covmat, 
 na.rm.by=FALSE, na.rm.all=FALSE, stringsAsFactors=TRUE,
 multicore=getOption("survey.multicore"))

# S3 method for svyby
SE(object,...)
# S3 method for svyby
deff(object,...)
# S3 method for svyby
coef(object,...)
# S3 method for svyby
confint(object,  parm, level = 0.95,df =Inf,...)
unwtd.count(x, design, ...)
svybys(formula,  bys,  design, FUN, ...)

Arguments

formula,x: A formula specifying the variables to pass to FUN (or a matrix, data frame, or vector)
by: A formula specifying factors that define subsets, or a list of factors.
design: A svydesign or svrepdesign object
FUN: A function taking a formula and survey design object as its first two arguments.
...: Other arguments to FUN. NOTE: if any of the names of these are partial matches to formula,by, or design, you must specify the formula,by, or design argument by name, not just by position.
deff: Request a design effect from FUN
keep.var: If FUN returns a svystat object, extract standard errors from it
keep.names: Define row names based on the subsets
verbose: If TRUE, print a label for each subset as it is processed.
vartype: Report variability as one or more of standard error, confidence interval, coefficient of variation, percent coefficient of variation, or variance
drop.empty.groups: If FALSE, report NA for empty groups, if TRUE drop them from the output
na.rm.by: If true, omit groups defined by NA values of the by variables

na.rm.all: If true, check for groups with no non-missing observations for variables defined by formula and treat these groups as empty. Doesn't make much sense without na.rm=TRUE
covmat: If TRUE, compute covariances between estimates for different subsets. Allows svycontrast to be used on output. Requires that FUN supports either return.replicates=TRUE or influence=TRUE
return.replicates: Only for replicate-weight designs. If TRUE, return all the replicates as the "replicates" attribute of the result
influence: Return the influence functions of the result
multicore: Use multicore package to distribute subsets over multiple processors?
stringsAsFactors: Convert any string variables in formula to factors before calling FUN, so that the factor levels will be the same in all groups (See Note below). Potentially slow.
parm: a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.
level: the confidence level required.
df: degrees of freedom for t-distribution in confidence interval, use degf(design) for number of PSUs minus number of strata
object: An object of class "svyby"
bys: one-sided formula with each term specifying a grouping (rather than being combined to give a grouping

Value

An object of class "svyby": a data frame showing the factors and the results of FUN.

For unwtd.count, the unweighted number of non-missing observations in the data matrix specified by x for the design.

Details

The variance type "ci" asks for confidence intervals, which are produced by confint. In some cases additional options to FUN will be needed to produce confidence intervals, for example, svyquantile needs ci=TRUE or keep.var=FALSE.

unwtd.count is designed to be passed to svyby to report the number of non-missing observations in each subset. Observations with exactly zero weight will also be counted as missing, since that's how subsets are implemented for some designs.

Parallel processing with multicore=TRUE is useful only for fairly large problems and on computers with sufficient memory. The multicore package is incompatible with some GUIs, although the Mac Aqua GUI appears to be safe.

The variant svybys creates a separate table for each term in bys rather than creating a joint table.

Note

The function works by making a lot of calls of the form FUN(formula, subset(design, by==i)), where formula is re-evaluated in each subset, so it is unwise to use data-dependent terms in formula. In particular, svyby(~factor(a), ~b, design=d, svymean), will create factor variables whose levels are only those values of a present in each subset. If a is a character variable then svyby(~a, ~b, design=d, svymean) creates factor variables implicitly and so has the same problem. Either use update.survey.design to add variables to the design object instead or specify the levels explicitly in the call to factor. The stringsAsFactors=TRUE option converts all character variables to factors, which can be slow, set it to FALSE if you have predefined factors where necessary.

Note

Asking for a design effect (deff=TRUE) from a function that does not produce one will cause an error or incorrect formatting of the output. The same will occur with keep.var=TRUE if the function does not compute a standard error.

Examples

data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

svyby(~api99, ~stype, dclus1, svymean)
#>   stype    api99       se
#> E     E 607.7917 22.81660
#> H     H 595.7143 41.76400
#> M     M 608.6000 32.56064
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5,ci=TRUE,vartype="ci")
#>   stype api99 ci_l ci_u
#> E     E   615  520  684
#> H     H   593  443  715
#> M     M   615  575  700
## without ci=TRUE svyquantile does not compute standard errors
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5, keep.var=FALSE)
#>   stype   1   2   3        4
#> E     E 615 520 684 38.23224
#> H     H 593 443 715 57.51442
#> M     M 615 575 700 28.39638
svyby(~api99, list(school.type=apiclus1$stype), dclus1, svymean)
#>   school.type    api99       se
#> E           E 607.7917 22.81660
#> H           H 595.7143 41.76400
#> M           M 608.6000 32.56064
svyby(~api99+api00, ~stype, dclus1, svymean, deff=TRUE,vartype="ci")
#>   stype    api99    api00 ci_l.api99 ci_l.api00 ci_u.api99 ci_u.api00
#> E     E 607.7917 648.8681   563.0720   605.0385   652.5114   692.6976
#> H     H 595.7143 618.5714   513.8583   544.0531   677.5702   693.0897
#> M     M 608.6000 631.4400   544.7823   569.4866   672.4177   693.3934
#>   DEff.api99 DEff.api00
#> E   5.895734   6.583674
#> H   2.211866   2.228259
#> M   2.226990   2.163900
svyby(~api99+api00, ~stype+sch.wide, dclus1, svymean, keep.var=FALSE)
#>       stype sch.wide statistic.api99 statistic.api00
#> E.No      E       No        601.6667        596.3333
#> H.No      H       No        662.0000        659.3333
#> M.No      M       No        611.3750        606.3750
#> E.Yes     E      Yes        608.3485        653.6439
#> H.Yes     H      Yes        577.6364        607.4545
#> M.Yes     M      Yes        607.2941        643.2353
## report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, dclus1, unwtd.count, keep.var=FALSE)
#>       stype sch.wide statistic
#> E.No      E       No        12
#> H.No      H       No         3
#> M.No      M       No         8
#> E.Yes     E      Yes       132
#> H.Yes     H      Yes        11
#> M.Yes     M      Yes        17

rclus1<-as.svrepdesign(dclus1)

svyby(~api99, ~stype, rclus1, svymean)
#>   stype    api99       se
#> E     E 607.7917 25.83542
#> H     H 595.7143 50.75106
#> M     M 608.6000 34.82521
svyby(~api99, ~stype, rclus1, svyquantile, quantiles=0.5)
#>   stype api99 se.api99
#> E     E   615 41.96221
#> H     H   593      NaN
#> M     M   615 45.88854
svyby(~api99, list(school.type=apiclus1$stype), rclus1, svymean, vartype="cv")
#>   school.type    api99   cv.api99
#> E           E 607.7917 0.04250704
#> H           H 595.7143 0.08519362
#> M           M 608.6000 0.05722184
svyby(~enroll,~stype, rclus1,svytotal, deff=TRUE)
#>   stype    enroll       se DEff.enroll
#> E     E 2109717.1 631349.4  125.039075
#> H     H  535594.9 226716.6    4.645816
#> M     M  759628.1 213635.5   13.014932
svyby(~api99+api00, ~stype+sch.wide, rclus1, svymean, keep.var=FALSE)
#>       stype sch.wide statistic.api99 statistic.api00
#> E.No      E       No        601.6667        596.3333
#> H.No      H       No        662.0000        659.3333
#> M.No      M       No        611.3750        606.3750
#> E.Yes     E      Yes        608.3485        653.6439
#> H.Yes     H      Yes        577.6364        607.4545
#> M.Yes     M      Yes        607.2941        643.2353
##report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, rclus1, unwtd.count, keep.var=FALSE)
#>       stype sch.wide statistic
#> E.No      E       No        12
#> H.No      H       No         3
#> M.No      M       No         8
#> E.Yes     E      Yes       132
#> H.Yes     H      Yes        11
#> M.Yes     M      Yes        17

## comparing subgroups using covmat=TRUE
mns<-svyby(~api99, ~stype, rclus1, svymean,covmat=TRUE)
vcov(mns)
#>          E         H         M
#> E 667.4691  752.7184  823.3275
#> H 752.7184 2575.6700  920.7676
#> M 823.3275  920.7676 1212.7954
svycontrast(mns, c(E = 1, M = -1))
#>          contrast     SE
#> contrast -0.80833 15.284

str(svyby(~api99, ~stype, rclus1, svymean,return.replicates=TRUE))
#> Classes ‘svyby’ and 'data.frame':	3 obs. of  3 variables:
#>  $ stype: Factor w/ 3 levels "E","H","M": 1 2 3
#>  $ api99: num  608 596 609
#>  $ se   : num  25.8 50.8 34.8
#>  - attr(*, "svyby")=List of 7
#>   ..$ margins  : int 1
#>   ..$ nstats   : num 1
#>   ..$ vars     : int 1
#>   ..$ deffs    : logi FALSE
#>   ..$ statistic: chr "svymean"
#>   ..$ variables: chr "api99"
#>   ..$ vartype  : chr "se"
#>  - attr(*, "replicates")= num [1:15, 1:3] 607 611 609 606 609 ...
#>   ..- attr(*, "scale")= num 0.915
#>   ..- attr(*, "rscales")= num [1:15] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..- attr(*, "mse")= logi FALSE
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:3] "E" "H" "M"
#>  - attr(*, "call")= language svyby.default(~api99, ~stype, rclus1, svymean, return.replicates = TRUE)

tots<-svyby(~enroll, ~stype, dclus1, svytotal,covmat=TRUE)
vcov(tots)
#>              E           H           M
#> E 398602047550 77170909363 74761561157
#> H  77170909363 51400414315 34777311300
#> M  74761561157 34777311300 45640120138
svycontrast(tots, quote(E/H))
#>          nlcon     SE
#> contrast 3.939 1.4319


## comparing subgroups uses the delta method unless replicates are present
meanlogs<-svyby(~log(enroll),~stype,svymean, design=rclus1,covmat=TRUE)
svycontrast(meanlogs, quote(exp(E-H)))
#>            nlcon    SE
#> contrast 0.52377 0.262
meanlogs<-svyby(~log(enroll),~stype,svymean, design=rclus1,covmat=TRUE,return.replicates=TRUE)
svycontrast(meanlogs, quote(exp(E-H)))
#>            nlcon     SE
#> contrast 0.52377 0.2423


## extractor functions
(a<-svyby(~enroll, ~stype, rclus1, svytotal, deff=TRUE, verbose=TRUE, 
  vartype=c("se","cv","cvpct","var")))
#> [1] "E"
#> [1] "H"
#> [1] "M"
#>   stype    enroll       se cv.enroll cv%.enroll          var DEff.enroll
#> E     E 2109717.1 631349.4 0.2992578   29.92578 398602047550  125.039075
#> H     H  535594.9 226716.6 0.4232987   42.32987  51400414315    4.645816
#> M     M  759628.1 213635.5 0.2812369   28.12369  45640120138   13.014932
deff(a)
#> [1] 125.039075   4.645816  13.014932
SE(a)
#> [1] 631349.4 226716.6 213635.5
cv(a)
#>         E         H         M 
#> 0.2992578 0.4232987 0.2812369 
coef(a)
#>         E         H         M 
#> 2109717.1  535594.9  759628.1 
confint(a, df=degf(rclus1))
#>       2.5 %  97.5 %
#> E 755607.37 3463827
#> H  49336.14 1021854
#> M 301425.60 1217831

## ratio estimates
svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio)
#>   stype api.stu/enroll se.api.stu/enroll
#> E     E      0.8532672        0.01253361
#> H     H      0.8300683        0.01472607
#> M     M      0.8536738        0.01114203

ratios<-svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio,covmat=TRUE)
vcov(ratios)
#>               E             H             M
#> E  0.0001570913 -1.178219e-04  1.186882e-04
#> H -0.0001178219  2.168572e-04 -9.293321e-05
#> M  0.0001186882 -9.293321e-05  1.241448e-04

## empty groups
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean)
#>         comp.imp sch.wide    api00       se
#> No.No         No       No 608.0435 28.98769
#> No.Yes        No      Yes 654.0741 32.66871
#> Yes.Yes      Yes      Yes 648.4060 22.47502
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean,drop.empty.groups=FALSE)
#>         comp.imp sch.wide    api00       se
#> No.No         No       No 608.0435 28.98769
#> Yes.No       Yes       No       NA       NA
#> No.Yes        No      Yes 654.0741 32.66871
#> Yes.Yes      Yes      Yes 648.4060 22.47502

## Multiple tables
svybys(~api00,~comp.imp+sch.wide,design=dclus1,svymean)
#> [[1]]
#>     comp.imp   api00       se
#> No        No 632.900 29.41719
#> Yes      Yes 648.406 22.47502
#> 
#> [[2]]
#>     sch.wide    api00       se
#> No        No 608.0435 28.98769
#> Yes      Yes 649.3625 23.42657
#>

Arguments

Value

Details

Note

Note

See also

Examples