Maximises a user-specified likelihood parametrised by multiple linear predictors to data from a complex sample survey and computes the sandwich variance estimator of the coefficients. Note that this function maximises an estimated population likelihood, it is not the sample MLE.

svymle(loglike, gradient = NULL, design, formulas, start = NULL, control
= list(), na.action="na.fail", method=NULL, lower=NULL,upper=NULL,influence=FALSE,...)
# S3 method for svymle
summary(object, stderr=c("robust", "model"),...)

Arguments

loglike

vectorised loglikelihood function

gradient

Derivative of loglike. Required for variance computation and helpful for fitting

design

a survey.design object

formulas

A list of formulas specifying the variable and linear predictors: see Details below

start

Starting values for parameters

control

control options for the optimiser: see the help page for the optimiser you are using.

lower,upper

Parameter bounds for bobyqa

influence

Return the influence functions (primarily for svyby)

na.action

Handling of NAs

method

"nlm" to use nlm,"uobyqa" or "bobyqa" to use those optimisers from the minqa package; otherwise passed to optim

...

Arguments to loglike and gradient that are not to be optimised over.

object

svymle object

stderr

Choice of standard error estimator. The default is a standard sandwich estimator. See Details below.

Details

Optimization is done by nlm by default or if method=="nlm". Otherwise optim is used and method specifies the method and control specifies control parameters.

The design object contains all the data and design information from the survey, so all the formulas refer to variables in this object. The formulas argument needs to specify the response variable and a linear predictor for each freely varying argument of loglike.

Consider for example the dnorm function, with arguments x, mean, sd and log, and suppose we want to estimate the mean of y as a linear function of a variable z, and to estimate a constant standard deviation. The log argument must be fixed at FALSE to get the loglikelihood. A formulas argument would be list(~y, mean=~z, sd=~1). Note that the data variable y must be the first argument to dnorm and the first formula and that all the other formulas are labelled. It is also permitted to have the data variable as the left-hand side of one of the formulas: eg list( mean=y~z, sd=~1).

The two optimisers from the minqa package do not use any derivatives to be specified for optimisation, but they do assume that the function is smooth enough for a quadratic approximation, ie, that two derivatives exist.

The usual variance estimator for MLEs in a survey sample is a `sandwich' variance that requires the score vector and the information matrix. It requires only sampling assumptions to be valid (though some model assumptions are required for it to be useful). This is the stderr="robust" option, which is available only when the gradient argument was specified.

If the model is correctly specified and the sampling is at random conditional on variables in the model then standard errors based on just the information matrix will be approximately valid. In particular, for independent sampling where weights and strata depend on variables in the model the stderr="model" should work fairly well.

Value

An object of class svymle

Author

Thomas Lumley

See also

Examples


 data(api)

 dstrat<-svydesign(id=~1, strata=~stype, weight=~pw, fpc=~fpc, data=apistrat)

 ## fit with glm
 m0 <- svyglm(api00~api99+ell,family="gaussian",design=dstrat)
 ## fit as mle (without gradient)
 m1 <- svymle(loglike=dnorm,gradient=NULL, design=dstrat, 
    formulas=list(mean=api00~api99+ell, sd=~1),
    start=list(c(80,1,0),c(20)), log=TRUE)
 ## with gradient
 gr<- function(x,mean,sd,log){
   dm<-2*(x - mean)/(2*sd^2)
   ds<-(x-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi))
         cbind(dm,ds)
      }
 m2 <- svymle(loglike=dnorm,gradient=gr, design=dstrat, 
    formulas=list(mean=api00~api99+ell, sd=~1),
    start=list(c(80,1,0),c(20)), log=TRUE, method="BFGS")
#> Warning: NaNs produced

 summary(m0)
#> 
#> Call:
#> svyglm(formula = api00 ~ api99 + ell, design = dstrat, family = "gaussian")
#> 
#> Survey design:
#> svydesign(id = ~1, strata = ~stype, weight = ~pw, fpc = ~fpc, 
#>     data = apistrat)
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 79.89525   14.61154   5.468 1.38e-07 ***
#> api99        0.92798    0.01879  49.395  < 2e-16 ***
#> ell         -0.07312    0.13966  -0.524    0.601    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for gaussian family taken to be 721.8345)
#> 
#> Number of Fisher Scoring iterations: 2
#> 
 summary(m1,stderr="model")
#> Survey-sampled mle: 
#> svymle(loglike = dnorm, gradient = NULL, design = dstrat, formulas = list(mean = api00 ~ 
#>     api99 + ell, sd = ~1), start = list(c(80, 1, 0), c(20)), 
#>     log = TRUE)
#>                         Coef          SE p.value
#> mean.(Intercept) 79.86241974 14.29745044  <0.001
#> mean.api99        0.92800541  0.01949138  <0.001
#> mean.ell         -0.07270673  0.11779611   0.537
#> sd.(Intercept)   26.79938270  1.33994314  <0.001
#> Stratified Independent Sampling design
#> svydesign(id = ~1, strata = ~stype, weight = ~pw, fpc = ~fpc, 
#>     data = apistrat)
 summary(m2)
#> Survey-sampled mle: 
#> svymle(loglike = dnorm, gradient = gr, design = dstrat, formulas = list(mean = api00 ~ 
#>     api99 + ell, sd = ~1), start = list(c(80, 1, 0), c(20)), 
#>     method = "BFGS", log = TRUE)
#>                         Coef          SE p.value
#> mean.(Intercept) 79.98826217 14.61363749  <0.001
#> mean.api99        0.92785231  0.01878935  <0.001
#> mean.ell         -0.07369146  0.13966667   0.598
#> sd.(Intercept)   26.79911727  1.73962068  <0.001
#> Stratified Independent Sampling design
#> svydesign(id = ~1, strata = ~stype, weight = ~pw, fpc = ~fpc, 
#>     data = apistrat)

 ## Using offsets
 m3 <- svymle(loglike=dnorm,gradient=gr, design=dstrat, 
    formulas=list(mean=api00~api99+offset(ell)+ell, sd=~1),
    start=list(c(80,1,0),c(20)), log=TRUE, method="BFGS")
#> Warning: NaNs produced
#> Warning: NaNs produced


## demonstrating multiple linear predictors

 m3 <- svymle(loglike=dnorm,gradient=gr, design=dstrat, 
    formulas=list(mean=api00~api99+offset(ell)+ell, sd=~stype),
    start=list(c(80,1,0),c(20,0,0)), log=TRUE, method="BFGS")
#> Warning: NaNs produced
#> Warning: NaNs produced



 ## More complicated censored lognormal data example
 ## showing that the response variable can be multivariate

 data(pbc, package="survival")
 pbc$randomized <- with(pbc, !is.na(trt) & trt>0)
 biasmodel<-glm(randomized~age*edema,data=pbc)
 pbc$randprob<-fitted(biasmodel)
 dpbc<-svydesign(id=~1, prob=~randprob, strata=~edema,
    data=subset(pbc,randomized))


## censored logNormal likelihood
 lcens<-function(x,mean,sd){
    ifelse(x[,2]==1,
           dnorm(log(x[,1]),mean,sd,log=TRUE),
           pnorm(log(x[,1]),mean,sd,log=TRUE,lower.tail=FALSE)
           )
 }

 gcens<- function(x,mean,sd){

        dz<- -dnorm(log(x[,1]),mean,sd)/pnorm(log(x[,1]),mean,sd,lower.tail=FALSE)

        dm<-ifelse(x[,2]==1,
                   2*(log(x[,1]) - mean)/(2*sd^2),
                   dz*-1/sd)
        ds<-ifelse(x[,2]==1,
                   (log(x[,1])-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi)),
                   ds<- dz*-(log(x[,1])-mean)/(sd*sd))
        cbind(dm,ds)      
 }

m<-svymle(loglike=lcens, gradient=gcens, design=dpbc, method="newuoa",
      formulas=list(mean=I(cbind(time,status>0))~bili+protime+albumin,
                    sd=~1),
         start=list(c(10,0,0,0),c(1)))

summary(m)
#> Survey-sampled mle: 
#> svymle(loglike = lcens, gradient = gcens, design = dpbc, formulas = list(mean = I(cbind(time, 
#>     status > 0)) ~ bili + protime + albumin, sd = ~1), start = list(c(10, 
#>     0, 0, 0), c(1)), method = "newuoa")
#>                        Coef         SE p.value
#> mean.(Intercept)  8.2946443 1.17233306  <0.001
#> mean.bili        -0.0935106 0.01653163  <0.001
#> mean.protime     -0.2927046 0.08862664  <0.001
#> mean.albumin      0.8764400 0.15764369  <0.001
#> sd.(Intercept)    0.9440687 0.06612149  <0.001
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(pbc, 
#>     randomized))

## the same model, but now specifying the lower bound of zero on the
## log standard deviation

mbox<-svymle(loglike=lcens, gradient=gcens, design=dpbc, method="bobyqa",
      formulas=list(mean=I(cbind(time,status>0))~bili+protime+albumin, sd=~1),
      lower=list(c(-Inf,-Inf,-Inf,-Inf),0), upper=Inf,
      start=list(c(10,0,0,0),c(1)))


## The censored lognormal model is now available in svysurvreg()

summary(svysurvreg(Surv(time,status>0)~bili+protime+albumin,
        design=dpbc,dist="lognormal"))
#> 
#> Call:
#> svysurvreg(formula = Surv(time, status > 0) ~ bili + protime + 
#>     albumin, design = dpbc, dist = "lognormal")
#>               Value Std. Error     z       p
#> (Intercept)  8.2946     1.1317  7.33 2.3e-13
#> bili        -0.0935     0.0164 -5.71 1.1e-08
#> protime     -0.2927     0.0857 -3.42 0.00063
#> albumin      0.8764     0.1555  5.64 1.7e-08
#> Log(scale)  -0.0576     0.0697 -0.83 0.40869
#> 
#> Scale= 0.944 
#> 
#> Log Normal distribution
#> Loglik(model)= NA   Loglik(intercept only)= NA
#> 	Chisq=  on 3 degrees of freedom, p=  
#> Number of Newton-Raphson Iterations: 5 
#> n= 312 
#> 

## compare svymle scale value after log transformation
svycontrast(m, quote(log(`sd.(Intercept)`)))
#>              nlcon   SE
#> contrast -0.057556 0.07