svytable {survey} R Documentation

## Contingency tables for survey data

### Description

Contingency tables and chi-squared tests of association for survey data.

### Usage

```
## S3 method for class 'survey.design':
svytable(formula, design, Ntotal = NULL, round = FALSE,...)
## S3 method for class 'svyrep.design':
svytable(formula, design, Ntotal = sum(weights(design, "sampling")), round = FALSE,...)
## S3 method for class 'survey.design':
## S3 method for class 'svyrep.design':
## S3 method for class 'svytable':
summary(object, statistic = c("F",
degf(design, ...)
## S3 method for class 'survey.design2':
degf(design, ...)
## S3 method for class 'svyrep.design':
degf(design, tol=1e-5,...)
```

### Arguments

| Argument | Description |
|---|---|
| `formula` | Model formula specifying margins for the table (using `+` only) |
| `design` | survey object |
| `statistic` | See Details below |
| `Ntotal` | A population total or set of population stratum totals to normalise to. |
| `round` | Should the table entries be rounded to the nearest integer? |
| `na.rm` | Remove missing values |
| `object` | Output from `svytable` |
| `...` | Other arguments for future expansion |
| `tol` | Tolerance for `qr` in computing the matrix rank |

### Details

The `svytable` function computes a weighted crosstabulation. In many cases it is easier to use `svytotal` or `svymean`, which also produce standard errors, design effects, etc.
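As a rough illustration of that trade-off, the sketch below tabulates one factor with `svytable` and then obtains the same weighted counts, together with standard errors, from `svytotal`. It assumes the `api` example data shipped with the survey package and the one-stage cluster design `dclus1` from the Examples section.

```r
library(survey)
data(api)

# One-stage cluster sample of school districts (same design as in the Examples)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Weighted counts only, no standard errors
tab <- svytable(~stype, dclus1)
tab

# The same estimated totals, now with standard errors
tot <- svytotal(~stype, dclus1)
tot
SE(tot)

# Estimated proportions with standard errors and design effects
svymean(~stype, dclus1, deff = TRUE)
```

The point estimates from `svytotal` should agree with the cells of the one-way `svytable` output; only the accompanying uncertainty information differs.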

The frequencies in the table can be normalised to some convenient total such as 100 or 1.0 by specifying the `Ntotal` argument. If the formula has a left-hand side, the mean or sum of that variable is tabulated instead of the frequency.
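A minimal sketch of normalising with a single-number `Ntotal`, again assuming the `api` data and the `dclus1` design from the Examples section. The object names `pct` and `prop` are only illustrative.

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Default: cells are estimated population counts
tbl <- svytable(~sch.wide + stype, dclus1)
sum(tbl)   # estimated population size (sum of the sampling weights)

# Rescale so the cells add up to 100 (percentages) or to 1 (proportions)
pct  <- svytable(~sch.wide + stype, dclus1, Ntotal = 100)
prop <- svytable(~sch.wide + stype, dclus1, Ntotal = 1)

round(pct, 1)
```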

The `Ntotal` argument can be either a single number or a data frame whose first column gives the (first-stage) sampling strata and whose second column gives the population size in each stratum. In this second case the `svytable` command performs 'post-stratification': it tabulates and scales to the population within each stratum and then adds up the strata.

As with other `xtabs` objects, the output of `svytable` can be processed by `ftable` for more attractive display. The `summary` method for `svytable` objects calls `svychisq` for a test of independence.
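The following sketch shows both steps on the `api` example data: a three-way table flattened with `ftable`, and `summary` applied to a two-way table. The choice of `comp.imp` as the third variable is only for illustration.

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Three-way weighted table; printed directly it is shown as a stack of slices
tbl3 <- svytable(~sch.wide + stype + comp.imp, dclus1, round = TRUE)

# ftable() lays the same cells out as one flat two-dimensional display
flat <- ftable(tbl3, row.vars = c(1, 3))
flat

# summary() on a two-way table prints the table and a test of independence
tbl2 <- svytable(~sch.wide + stype, dclus1)
summary(tbl2)
```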

`svychisq` computes first and second-order Rao-Scott corrections to the Pearson chi-squared test, and two Wald-type tests.

The default (`statistic="F"`) is the Rao-Scott second-order correction. The p-values are computed with a Satterthwaite approximation to the distribution. The alternative `statistic="Chisq"` adjusts the Pearson chi-squared statistic by a design effect estimate and then compares it to the chi-squared distribution it would have under simple random sampling.

The `statistic="Wald"` test is that proposed by Koch et al. (1975) and used by the SUDAAN software package. It is a Wald test based on the differences between the observed cell counts and those expected under independence. The adjustment given by `statistic="adjWald"` reduces the statistic when the number of PSUs is small compared to the number of degrees of freedom of the test. Thomas and Rao (1990) compare these tests and find the adjustment beneficial.

`statistic="lincom"` uses the exact asymptotic distribution, which is a linear combination of chi-squared variables (see `pchisqsum`), and `statistic="saddlepoint"` uses a saddlepoint approximation to this distribution.
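To see how the choice of `statistic` plays out on one table, the sketch below runs `svychisq` with four of the options on the `api` cluster sample and collects the p-values. It is restricted to the options that need no extra numerical machinery; the vector name `pvals` is illustrative.

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Rao-Scott second-order (default), Rao-Scott first-order, and the two Wald tests
stats <- c("F", "Chisq", "Wald", "adjWald")

# Each call returns an htest object; keep only the p-value
pvals <- sapply(stats, function(s) {
  res <- svychisq(~sch.wide + stype, dclus1, statistic = s)
  as.numeric(res$p.value)
})
pvals
```

With only a small number of PSUs in this design, the adjusted and unadjusted Wald tests may differ noticeably, which is the situation the `adjWald` correction is meant for.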

For designs using replicate weights the code is essentially the same as for designs with sampling structure, since the necessary variance computations are done by the appropriate methods of `svytotal` and `svymean`. The exception is that the degrees of freedom is computed as one less than the rank of the matrix of replicate weights (by `degf`).
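A short sketch of the replicate-weight case, assuming the `dclus1` design from the Examples section is converted with `as.svrepdesign` using its automatically chosen replicate type. The name `rclus1` is illustrative.

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Convert the design to one based on replicate weights
rclus1 <- as.svrepdesign(dclus1)

# The table and test calls are the same as for the original design
svytable(~sch.wide + stype, rclus1)
rtest <- svychisq(~sch.wide + stype, rclus1)
rtest

# Degrees of freedom: PSUs minus strata for the sampling-structure design,
# rank of the replicate-weight matrix minus one for the replicate design
degf(dclus1)
degf(rclus1)
```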

At the moment, `svychisq` works only for 2-dimensional tables.

### Value

The table commands return an `xtabs` object; `svychisq` returns an `htest` object.

### Note

Rao and Scott (1984) leave open one computational issue. In computing 'generalised design effects' for these tests, should the variance under simple random sampling be estimated using the observed proportions or the predicted proportions under the null hypothesis? `svychisq` uses the observed proportions, following simulations by Sribney (1998) and the choices made in Stata.

### References

Davies RB (1973) "Numerical inversion of a characteristic function" Biometrika 60:415-7.

Koch, GG, Freeman, DH, Freeman, JL (1975) "Strategies in the multivariate analysis of data from complex surveys" International Statistical Review 43:59-78.

Rao, JNK, Scott, AJ (1984) "On Chi-squared Tests For Multiway Contingency Tables with Proportions Estimated From Survey Data" Annals of Statistics 12:46-60.

Sribney WM (1998) "Two-way contingency tables for survey or clustered data" Stata Technical Bulletin 45:33-49.

Thomas, DR, Rao, JNK (1990) "Small-sample comparison of level and power for simple goodness-of-fit statistics under cluster sampling" JASA 82:630-636.

### See Also

`svytotal` and `svymean` report totals and proportions by category for factor variables.

See `svyby` and `ftable.svystat` to construct more complex tables of summary statistics.

See `svyloglin` for loglinear models.

### Examples

```
library(survey)
data(api)
xtabs(~sch.wide+stype, data=apipop)

dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
summary(dclus1)

(tbl <- svytable(~sch.wide+stype, dclus1))
svychisq(~sch.wide+stype, dclus1)
summary(tbl, statistic="Chisq")