Compute quantiles for data from complex surveys. oldsvyquantile is the version of the function from before version 4.1 of the package, available for backwards compatibility. See svyquantile for the current version

# S3 method for survey.design
oldsvyquantile(x, design, quantiles, alpha=0.05,
   ci=FALSE, method = "linear", f = 1,
   interval.type=c("Wald","score","betaWald"), na.rm=FALSE,se=ci,
   ties=c("discrete","rounded"), df=NULL,...)
# S3 method for svyrep.design
oldsvyquantile(x, design, quantiles,
   method ="linear", interval.type=c("probability","quantile"), f = 1,
   return.replicates=FALSE, ties=c("discrete","rounded"),na.rm=FALSE,
   alpha=0.05,df=NULL,...)

Arguments

x

A formula, vector or matrix

design

survey.design or svyrep.design object

quantiles

Quantiles to estimate

method

see approxfun

f

see approxfun

ci

Compute a confidence interval? (relatively slow; needed for svyby)

se

Compute standard errors from the confidence interval length?

alpha

Level for confidence interval

interval.type

See Details below

ties

See Details below

df

Degrees of freedom for a t-distribution. Inf requests a Normal distribution, NULL uses degf. Not relevant for type="betaWald"

return.replicates

Return the replicate means?

na.rm

Remove NAs?

...

arguments for future expansion

Details

The definition of the CDF and thus of the quantiles is ambiguous in the presence of ties. With ties="discrete" the data are treated as genuinely discrete, so the CDF has vertical steps at tied observations. With ties="rounded" all the weights for tied observations are summed and the CDF interpolates linearly between distinct observed values, and so is a continuous function. Combining interval.type="betaWald" and ties="discrete" is (close to) the proposal of Shah and Vaish(2006) used in some versions of SUDAAN.

Interval estimation for quantiles is complicated, because the influence function is not continuous. Linearisation cannot be used directly, and computing the variance of replicates is valid only for some designs (eg BRR, but not jackknife). The interval.type option controls how the intervals are computed.

For survey.design objects the default is interval.type="Wald". A 95% Wald confidence interval is constructed for the proportion below the estimated quantile. The inverse of the estimated CDF is used to map this to a confidence interval for the quantile. This is the method of Woodruff (1952). For "betaWald" the same procedure is used, but the confidence interval for the proportion is computed using the exact binomial cdf with an effective sample size proposed by Korn & Graubard (1998).

If interval.type="score" we use a method described by Binder (1991) and due originally to Francisco and Fuller (1986), which corresponds to inverting a robust score test. At the upper and lower limits of the confidence interval, a test of the null hypothesis that the cumulative distribution function is equal to the target quantile just rejects. This was the default before version 2.9. It is much slower than "Wald", and Dorfman & Valliant (1993) suggest it is not any more accurate.

Standard errors are computed from these confidence intervals by dividing the confidence interval length by 2*qnorm(alpha/2).

For replicate-weight designs, ordinary replication-based standard errors are valid for BRR and Fay's method, and for some bootstrap-based designs, but not for jackknife-based designs. interval.type="quantile" gives these replication-based standard errors. The default, interval.type="probability" computes confidence on the probability scale and then transforms back to quantiles, the equivalent of interval.type="Wald" for survey.design objects (with alpha=0.05).

There is a confint method for svyquantile objects; it simply extracts the pre-computed confidence interval.

Value

returns a list whose first component is the quantiles and second component is the confidence intervals. For replicate weight designs, returns an object of class svyrepstat.

Author

Thomas Lumley

See also

svykm for quantiles of survival curves

svyciprop for confidence intervals on proportions.

References

Binder DA (1991) Use of estimating functions for interval estimation from complex surveys. Proceedings of the ASA Survey Research Methods Section 1991: 34-42

Dorfman A, Valliant R (1993) Quantile variance estimators in complex surveys. Proceedings of the ASA Survey Research Methods Section. 1993: 866-871

Korn EL, Graubard BI. (1998) Confidence Intervals For Proportions With Small Expected Number of Positive Counts Estimated From Survey Data. Survey Methodology 23:193-201.

Francisco CA, Fuller WA (1986) Estimation of the distribution function with a complex survey. Technical Report, Iowa State University.

Shao J, Tu D (1995) The Jackknife and Bootstrap. Springer.

Shah BV, Vaish AK (2006) Confidence Intervals for Quantile Estimation from Complex Survey Data. Proceedings of the Section on Survey Research Methods.

Woodruff RS (1952) Confidence intervals for medians and other position measures. JASA 57, 622-627.

Examples


  data(api)
  ## population
  quantile(apipop$api00,c(.25,.5,.75))
#> 25% 50% 75% 
#> 565 667 761 

  ## one-stage cluster sample
  dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
  oldsvyquantile(~api00, dclus1, c(.25,.5,.75),ci=TRUE)
#> $quantiles
#>         0.25 0.5  0.75
#> api00 551.75 652 717.5
#> 
#> $CIs
#> , , api00
#> 
#>            0.25      0.5     0.75
#> (lower 490.5049 557.1892 691.0000
#> upper) 626.9903 714.0000 772.7397
#> 
#> 
  oldsvyquantile(~api00, dclus1, c(.25,.5,.75),ci=TRUE,interval.type="betaWald")
#> $quantiles
#>         0.25 0.5  0.75
#> api00 551.75 652 717.5
#> 
#> $CIs
#> , , api00
#> 
#>            0.25      0.5     0.75
#> (lower 495.3133 554.2038 684.0202
#> upper) 636.7716 714.7962 761.0759
#> 
#> 
  oldsvyquantile(~api00, dclus1, c(.25,.5,.75),ci=TRUE,df=NULL)
#> $quantiles
#>         0.25 0.5  0.75
#> api00 551.75 652 717.5
#> 
#> $CIs
#> , , api00
#> 
#>            0.25      0.5     0.75
#> (lower 490.5049 557.1892 691.0000
#> upper) 626.9903 714.0000 772.7397
#> 
#> 

  dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
  (qapi<-oldsvyquantile(~api00, dclus1, c(.25,.5,.75),ci=TRUE, interval.type="score"))
#> $quantiles
#>         0.25 0.5  0.75
#> api00 551.75 652 717.5
#> 
#> $CIs
#> , , api00
#> 
#>            0.25      0.5     0.75
#> (lower 512.0003 565.9999 662.0000
#> upper) 638.9998 702.9995 753.9996
#> 
#> 
  SE(qapi)
#>     0.25      0.5     0.75 
#> 29.60654 31.93782 21.44726 

  #stratified sample
  dstrat<-svydesign(id=~1, strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
  oldsvyquantile(~api00, dstrat, c(.25,.5,.75),ci=TRUE)
#> $quantiles
#>           0.25      0.5     0.75
#> api00 562.2056 667.2358 755.1226
#> 
#> $CIs
#> , , api00
#> 
#>            0.25      0.5     0.75
#> (lower 534.0000 636.2683 724.9898
#> upper) 594.4526 681.0000 777.0392
#> 
#> 

  #stratified sample, replicate weights
  # interval="probability" is necessary for jackknife weights
  rstrat<-as.svrepdesign(dstrat)
  oldsvyquantile(~api00, rstrat, c(.25,.5,.75), interval.type="probability")
#> Statistic:
#>          api00
#> q0.25 562.2056
#> q0.5  667.2358
#> q0.75 755.1226
#> SE:
#>          api00
#> q0.25 15.26348
#> q0.5  11.42383
#> q0.75 13.26813


  # BRR method
  data(scd)
  repweights<-2*cbind(c(1,0,1,0,1,0), c(1,0,0,1,0,1), c(0,1,1,0,0,1),
              c(0,1,0,1,1,0))
  scdrep<-svrepdesign(data=scd, type="BRR", repweights=repweights)
#> Warning: No sampling weights provided: equal probability assumed
  oldsvyquantile(~arrests+alive, design=scdrep, quantile=0.5, interval.type="quantile")
#> Statistic:
#>      arrests alive
#> q0.5     185    30
#> SE:
#>       arrests    alive
#> q0.5 15.02706 4.756574
  oldsvyquantile(~arrests+alive, design=scdrep, quantile=0.5, interval.type="quantile",df=NULL)
#> Statistic:
#>      arrests alive
#> q0.5     185    30
#> SE:
#>       arrests    alive
#> q0.5 15.02706 4.756574