## Programming with survey objects

Extracting information `model.frame()` extracts variables from a survey design, `weights()` extracts the weights.
```> data(fpc)
> withfpc <- svydesign(ids = ~psuid, strata = ~stratid,
+     fpc = ~Nh, variables = ~x, data = fpc, nest = TRUE)
> model.frame(withfpc)
x
1 2.8
2 4.1
3 6.8
4 6.8
5 9.2
6 3.7
7 6.6
8 4.2
> weights(withfpc)
1 2 3 4 5 6 7 8
3 3 3 3 3 4 4 4
>```

Replicate weights for new estimators Replicate weight variances can be computed for arbitrary statistics using `withReplicates`. The statistic can be specified either as a function of two arguments, a data frame and a weights, or as an expression where the sampling weights are represented by `.weights`. As an example, using the `dclus1` design we have used in several earlier examples, we have two ways to compute a ratio.

```> withReplicates(rclus1, quote(sum(.weights*api.stu)/sum(.weights*enroll)))
theta     SE
[1,] 0.84971 0.0095
>  withReplicates(rclus1, function(w,data) sum(w*data\$api.stu)/sum(w*data\$enroll))
theta     SE
[1,] 0.84971 0.0095
```

Delta method: `svycontrast` will compute nonlinear functions of estimates, using symbolic differentiation to calculate the necessary derivatives for the delta method. This can be used to compute new summary statistics of categorical data. For example `svykappa` computes marginal and joint proportions

```       mean     SE
a   0.27322 0.0303
b   0.72678 0.0303
A   0.12568 0.0204
B   0.87432 0.0204
a.A 0.12568 0.0204
b.A 0.00000 0.0000
a.B 0.14754 0.0171
b.B 0.72678 0.0303
```
and then constructs an expression `(a.A + b.B - (a * A + b * B))/(1 - (a * A + b * B))` and evaluates it with `svycontrast`.

M-estimation The `svymle` function allows estimation by maximizing a user-specified additive objective function, with a regression model for any parameters of the objective function. For example, suppose we want a censored lognormal regression model. The objective function is a vectorized function to compute the loglikelihood of each observation. The input `x` is a two-column matrix with first column the observed value and second column the censoring estimator.

```  lcens<-function(x,mean,sd){
ifelse(x[,2]==1,
dnorm(log(x[,1]),mean,sd,log=TRUE),
pnorm(log(x[,1]),mean,sd,log=TRUE,lower.tail=FALSE)
)
}
```
Taylor linearization variances also require the derivative of the objective function
```  gcens<- function(x,mean,sd){
dz<- -dnorm(log(x[,1]),mean,sd)/pnorm(log(x[,1]),mean,sd,lower.tail=FALSE)
dm<-ifelse(x[,2]==1,
2*(log(x[,1]) - mean)/(2*sd^2),
dz*-1/sd)
ds<-ifelse(x[,2]==1,
(log(x[,1])-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi)),
ds<- dz*-(log(x[,1])-mean)/(sd*sd))
cbind(dm,ds)
}
```

We use a data set from the "survival" package

```data(pbc, package="survival")
biasmodel<-glm(I(trt>0)~age*edema,data=pbc)
pbc\$randprob<-fitted(biasmodel)
dpbc<-svydesign(id=~1, prob=~randprob, strata=~edema, data=subset(pbc,trt>0))
```

The loglikelihood function has two parameters: the mean and sd of `log(x)`. We specify a regression model for each parameter. The mean depends on three variables

```mean = I(cbind(time, status)) ~ bili + protime +  alb
```
and the standard deviation is just a constant
```sd = ~1
```
It's necessary to specify starting values for the optimization. We start the mean model with an intercept of 10 and the standard deviation at 1.
```start = list(c(10, 0, 0, 0), c(1))
```

Putting this all together

```> summary(svymle(loglike = lcens, gradient = gcens, design = dpbc,
+     formulas = list(mean = I(cbind(time, status)) ~ bili + protime +
+         alb, sd = ~1), start = list(c(10, 0, 0, 0), c(1))))
Survey-sampled mle:
svymle(loglike = lcens, gradient = gcens, design = dpbc, formulas = list(mean = I(cbind(time,
status)) ~ bili + protime + alb, sd = ~1), start = list(c(10,
0, 0, 0), c(1)))
Coef         SE p.value
mean.(Intercept)  8.9803185 1.27851656  <0.001
mean.bili        -0.0927236 0.01648777  <0.001
mean.protime     -0.3689851 0.09495636  <0.001
mean.alb          0.9535612 0.17204268  <0.001
sd.(Intercept)    0.9888172 0.07561830  <0.001
Stratified Independent Sampling design
svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(pbc,
trt > 0))
```