model.frame() extracts the variables from a survey design object, and weights() extracts the sampling weights.
> data(fpc)
> withfpc <- svydesign(ids = ~psuid, strata = ~stratid,
+     fpc = ~Nh, variables = ~x, data = fpc, nest = TRUE)
> model.frame(withfpc)
    x
1 2.8
2 4.1
3 6.8
4 6.8
5 9.2
6 3.7
7 6.6
8 4.2
> weights(withfpc)
1 2 3 4 5 6 7 8
3 3 3 3 3 4 4 4
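The weights returned here are simply the reciprocals of the sampling probabilities stored in the design object. A minimal sketch of this relationship, assuming the withfpc design created above:

```r
## The design object stores sampling probabilities in $prob;
## weights() returns their reciprocals (illustrative check,
## not part of the original example).
all.equal(as.vector(weights(withfpc)), 1/withfpc$prob)
```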
Replicate weights for new estimators

Replicate-weight variances can be computed for arbitrary statistics using withReplicates. The statistic can be specified either as a function of two arguments, a weights vector and a data frame, or as an expression in which the sampling weights are represented by .weights. As an example, using rclus1, the replicate-weight version of the dclus1 design used in several earlier examples, there are two ways to compute a ratio.
> withReplicates(rclus1, quote(sum(.weights * api.stu)/sum(.weights * enroll)))
       theta     SE
[1,] 0.84971 0.0095
> withReplicates(rclus1, function(w, data) sum(w * data$api.stu)/sum(w * data$enroll))
       theta     SE
[1,] 0.84971 0.0095
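For this particular statistic the replicate-weight result can be cross-checked against the built-in ratio estimator. A sketch, assuming the rclus1 design from above:

```r
## svyratio computes the same ratio estimate with its own
## variance calculation; the point estimate should agree with
## the withReplicates results (illustrative cross-check).
svyratio(~api.stu, ~enroll, rclus1)
```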
Delta method

svycontrast will compute nonlinear functions of estimates, using symbolic differentiation to calculate the necessary derivatives for the delta method. This can be used to compute new summary statistics of categorical data. For example, svykappa computes marginal and joint proportions
      mean     SE
a   0.27322 0.0303
b   0.72678 0.0303
A   0.12568 0.0204
B   0.87432 0.0204
a.A 0.12568 0.0204
b.A 0.00000 0.0000
a.B 0.14754 0.0171
b.B 0.72678 0.0303

and then constructs an expression
(a.A + b.B - (a * A + b * B))/(1 - (a * A + b * B))
and evaluates it with svycontrast.
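The same delta-method machinery can be used directly: svycontrast accepts a quoted expression written in terms of the names of the estimates. A minimal sketch, assuming the dclus1 design and the api00/api99 variables from earlier examples:

```r
## Ratio of two survey means, with a delta-method standard error
## obtained by symbolic differentiation (illustrative use).
means <- svymean(~api00 + api99, dclus1)
svycontrast(means, quote(api00/api99))
```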
M-estimation

The svymle function allows estimation by maximizing a user-specified additive objective function, with a regression model for any of the parameters of the objective function.
For example, suppose we want a censored lognormal regression model. The objective function is a vectorized function that computes the loglikelihood of each observation. The input x is a two-column matrix whose first column is the observed value and whose second column is the censoring indicator.
lcens <- function(x, mean, sd) {
    ifelse(x[, 2] == 1,
           dnorm(log(x[, 1]), mean, sd, log = TRUE),
           pnorm(log(x[, 1]), mean, sd, log = TRUE, lower.tail = FALSE))
}

Taylor linearization variances also require the derivative of the objective function
gcens <- function(x, mean, sd) {
    dz <- -dnorm(log(x[, 1]), mean, sd)/pnorm(log(x[, 1]), mean, sd,
                                              lower.tail = FALSE)
    dm <- ifelse(x[, 2] == 1,
                 2 * (log(x[, 1]) - mean)/(2 * sd^2),
                 dz * -1/sd)
    ds <- ifelse(x[, 2] == 1,
                 (log(x[, 1]) - mean)^2 * (2 * (2 * sd))/(2 * sd^2)^2 -
                     sqrt(2 * pi)/(sd * sqrt(2 * pi)),
                 dz * -(log(x[, 1]) - mean)/(sd * sd))
    cbind(dm, ds)
}
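Analytic gradients like gcens are easy to get wrong, so it can be worth comparing them against a finite-difference approximation before fitting. A sketch with made-up data (the x values, mean, and sd below are arbitrary, not from the example):

```r
## Compare the analytic mean-derivative from gcens to a central
## finite-difference approximation of lcens (illustrative check).
x <- cbind(c(2, 5, 9), c(1, 0, 1))  # observed values, censoring indicators
eps <- 1e-6
num.dm <- (lcens(x, 1 + eps, 2) - lcens(x, 1 - eps, 2))/(2 * eps)
cbind(analytic = gcens(x, 1, 2)[, 1], numeric = num.dm)
```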
We use a data set from the "survival" package
data(pbc, package = "survival")
biasmodel <- glm(I(trt > 0) ~ age * edema, data = pbc)
pbc$randprob <- fitted(biasmodel)
dpbc <- svydesign(id = ~1, prob = ~randprob, strata = ~edema,
                  data = subset(pbc, trt > 0))
The loglikelihood function has two parameters, the mean and standard deviation of log(x). We specify a regression model for each parameter. The mean depends on three variables

mean = I(cbind(time, status)) ~ bili + protime + alb

and the standard deviation is just a constant

sd = ~1

It is necessary to specify starting values for the optimization. We start the mean model with an intercept of 10 and the standard deviation at 1.

start = list(c(10, 0, 0, 0), c(1))
Putting this all together
> summary(svymle(loglike = lcens, gradient = gcens, design = dpbc,
+     formulas = list(mean = I(cbind(time, status)) ~ bili + protime +
+         alb, sd = ~1), start = list(c(10, 0, 0, 0), c(1))))
Survey-sampled mle:
svymle(loglike = lcens, gradient = gcens, design = dpbc, formulas = list(mean = I(cbind(time,
    status)) ~ bili + protime + alb, sd = ~1), start = list(c(10,
    0, 0, 0), c(1)))
                       Coef         SE p.value
mean.(Intercept)  8.9803185 1.27851656  <0.001
mean.bili        -0.0927236 0.01648777  <0.001
mean.protime     -0.3689851 0.09495636  <0.001
mean.alb          0.9535612 0.17204268  <0.001
sd.(Intercept)    0.9888172 0.07561830  <0.001
Stratified Independent Sampling design
svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(pbc,
    trt > 0))
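As a rough check of the model specification (ignoring the sampling design entirely), a censored lognormal regression with the same covariates can be fitted with survreg from the survival package. A sketch, assuming pbc as prepared above and that status == 1 marks an observed event, as in lcens:

```r
## Unweighted censored lognormal fit for comparison; the status
## coding is an assumption matching x[, 2] == 1 in lcens.
library(survival)
survreg(Surv(time, status == 1) ~ bili + protime + alb,
        data = subset(pbc, trt > 0), dist = "lognormal")
```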