svymle.Rd
Maximises a user-specified likelihood parametrised by multiple linear predictors to data from a complex sample survey and computes the sandwich variance estimator of the coefficients. Note that this function maximises an estimated population likelihood, it is not the sample MLE.
vectorised loglikelihood function
Derivative of loglike
. Required for variance computation and helpful for fitting
a survey.design
object
A list of formulas specifying the variable and linear predictors: see Details below
Starting values for parameters
control options for the optimiser: see the help page for the optimiser you are using.
Parameter bounds for bobyqa
Return the influence functions (primarily for svyby)
Handling of NA
s
"nlm"
to use nlm
,"uobyqa"
or
"bobyqa"
to use those optimisers from the minqa
package; otherwise passed to optim
Arguments to loglike
and gradient
that are
not to be optimised over.
svymle
object
Choice of standard error estimator. The default is a standard sandwich estimator. See Details below.
Optimization is done by nlm
by default or if
method=="nlm"
. Otherwise optim
is used and method
specifies the method and control
specifies control parameters.
The design
object contains all the data and design information
from the survey, so all the formulas refer to variables in this object.
The formulas
argument needs to specify the response variable and
a linear predictor for each freely varying argument of loglike
.
Consider for example the dnorm
function, with arguments
x
, mean
, sd
and log
, and suppose we want to
estimate the mean of y
as a linear function of a variable
z
, and to estimate a constant standard deviation. The log
argument must be fixed at FALSE
to get the loglikelihood. A
formulas
argument would be list(~y, mean=~z, sd=~1)
. Note
that the data variable y
must be the first argument to
dnorm
and the first formula and that all the other formulas are
labelled. It is also permitted to have the data variable as the
left-hand side of one of the formulas: eg list( mean=y~z, sd=~1)
.
The two optimisers from the minqa
package do not use any
derivatives to be specified for optimisation, but they do assume
that the function is smooth enough for a quadratic approximation, ie,
that two derivatives exist.
The usual variance estimator for MLEs in a survey sample is a `sandwich'
variance that requires the score vector and the information matrix. It
requires only sampling assumptions to be valid (though some model
assumptions are required for it to be useful). This is the
stderr="robust"
option, which is available only when the gradient
argument was specified.
If the model is correctly specified and the sampling is at random
conditional on variables in the model then standard errors based on just
the information matrix will be approximately valid. In particular, for
independent sampling where weights and strata depend on variables in the
model the stderr="model"
should work fairly well.
An object of class svymle
data(api)
dstrat<-svydesign(id=~1, strata=~stype, weight=~pw, fpc=~fpc, data=apistrat)
## fit with glm
m0 <- svyglm(api00~api99+ell,family="gaussian",design=dstrat)
## fit as mle (without gradient)
m1 <- svymle(loglike=dnorm,gradient=NULL, design=dstrat,
formulas=list(mean=api00~api99+ell, sd=~1),
start=list(c(80,1,0),c(20)), log=TRUE)
## with gradient
gr<- function(x,mean,sd,log){
dm<-2*(x - mean)/(2*sd^2)
ds<-(x-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi))
cbind(dm,ds)
}
m2 <- svymle(loglike=dnorm,gradient=gr, design=dstrat,
formulas=list(mean=api00~api99+ell, sd=~1),
start=list(c(80,1,0),c(20)), log=TRUE, method="BFGS")
#> Warning: NaNs produced
summary(m0)
#>
#> Call:
#> svyglm(formula = api00 ~ api99 + ell, design = dstrat, family = "gaussian")
#>
#> Survey design:
#> svydesign(id = ~1, strata = ~stype, weight = ~pw, fpc = ~fpc,
#> data = apistrat)
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 79.89525 14.61154 5.468 1.38e-07 ***
#> api99 0.92798 0.01879 49.395 < 2e-16 ***
#> ell -0.07312 0.13966 -0.524 0.601
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for gaussian family taken to be 721.8345)
#>
#> Number of Fisher Scoring iterations: 2
#>
summary(m1,stderr="model")
#> Survey-sampled mle:
#> svymle(loglike = dnorm, gradient = NULL, design = dstrat, formulas = list(mean = api00 ~
#> api99 + ell, sd = ~1), start = list(c(80, 1, 0), c(20)),
#> log = TRUE)
#> Coef SE p.value
#> mean.(Intercept) 79.86241974 14.29745044 <0.001
#> mean.api99 0.92800541 0.01949138 <0.001
#> mean.ell -0.07270673 0.11779611 0.537
#> sd.(Intercept) 26.79938270 1.33994314 <0.001
#> Stratified Independent Sampling design
#> svydesign(id = ~1, strata = ~stype, weight = ~pw, fpc = ~fpc,
#> data = apistrat)
summary(m2)
#> Survey-sampled mle:
#> svymle(loglike = dnorm, gradient = gr, design = dstrat, formulas = list(mean = api00 ~
#> api99 + ell, sd = ~1), start = list(c(80, 1, 0), c(20)),
#> method = "BFGS", log = TRUE)
#> Coef SE p.value
#> mean.(Intercept) 79.98826217 14.61363749 <0.001
#> mean.api99 0.92785231 0.01878935 <0.001
#> mean.ell -0.07369146 0.13966667 0.598
#> sd.(Intercept) 26.79911727 1.73962068 <0.001
#> Stratified Independent Sampling design
#> svydesign(id = ~1, strata = ~stype, weight = ~pw, fpc = ~fpc,
#> data = apistrat)
## Using offsets
m3 <- svymle(loglike=dnorm,gradient=gr, design=dstrat,
formulas=list(mean=api00~api99+offset(ell)+ell, sd=~1),
start=list(c(80,1,0),c(20)), log=TRUE, method="BFGS")
#> Warning: NaNs produced
#> Warning: NaNs produced
## demonstrating multiple linear predictors
m3 <- svymle(loglike=dnorm,gradient=gr, design=dstrat,
formulas=list(mean=api00~api99+offset(ell)+ell, sd=~stype),
start=list(c(80,1,0),c(20,0,0)), log=TRUE, method="BFGS")
#> Warning: NaNs produced
#> Warning: NaNs produced
## More complicated censored lognormal data example
## showing that the response variable can be multivariate
data(pbc, package="survival")
pbc$randomized <- with(pbc, !is.na(trt) & trt>0)
biasmodel<-glm(randomized~age*edema,data=pbc)
pbc$randprob<-fitted(biasmodel)
dpbc<-svydesign(id=~1, prob=~randprob, strata=~edema,
data=subset(pbc,randomized))
## censored logNormal likelihood
lcens<-function(x,mean,sd){
ifelse(x[,2]==1,
dnorm(log(x[,1]),mean,sd,log=TRUE),
pnorm(log(x[,1]),mean,sd,log=TRUE,lower.tail=FALSE)
)
}
gcens<- function(x,mean,sd){
dz<- -dnorm(log(x[,1]),mean,sd)/pnorm(log(x[,1]),mean,sd,lower.tail=FALSE)
dm<-ifelse(x[,2]==1,
2*(log(x[,1]) - mean)/(2*sd^2),
dz*-1/sd)
ds<-ifelse(x[,2]==1,
(log(x[,1])-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi)),
ds<- dz*-(log(x[,1])-mean)/(sd*sd))
cbind(dm,ds)
}
m<-svymle(loglike=lcens, gradient=gcens, design=dpbc, method="newuoa",
formulas=list(mean=I(cbind(time,status>0))~bili+protime+albumin,
sd=~1),
start=list(c(10,0,0,0),c(1)))
summary(m)
#> Survey-sampled mle:
#> svymle(loglike = lcens, gradient = gcens, design = dpbc, formulas = list(mean = I(cbind(time,
#> status > 0)) ~ bili + protime + albumin, sd = ~1), start = list(c(10,
#> 0, 0, 0), c(1)), method = "newuoa")
#> Coef SE p.value
#> mean.(Intercept) 8.2946443 1.17233306 <0.001
#> mean.bili -0.0935106 0.01653163 <0.001
#> mean.protime -0.2927046 0.08862664 <0.001
#> mean.albumin 0.8764400 0.15764369 <0.001
#> sd.(Intercept) 0.9440687 0.06612149 <0.001
#> Stratified Independent Sampling design (with replacement)
#> svydesign(id = ~1, prob = ~randprob, strata = ~edema, data = subset(pbc,
#> randomized))
## the same model, but now specifying the lower bound of zero on the
## log standard deviation
mbox<-svymle(loglike=lcens, gradient=gcens, design=dpbc, method="bobyqa",
formulas=list(mean=I(cbind(time,status>0))~bili+protime+albumin, sd=~1),
lower=list(c(-Inf,-Inf,-Inf,-Inf),0), upper=Inf,
start=list(c(10,0,0,0),c(1)))
## The censored lognormal model is now available in svysurvreg()
summary(svysurvreg(Surv(time,status>0)~bili+protime+albumin,
design=dpbc,dist="lognormal"))
#>
#> Call:
#> svysurvreg(formula = Surv(time, status > 0) ~ bili + protime +
#> albumin, design = dpbc, dist = "lognormal")
#> Value Std. Error z p
#> (Intercept) 8.2946 1.1317 7.33 2.3e-13
#> bili -0.0935 0.0164 -5.71 1.1e-08
#> protime -0.2927 0.0857 -3.42 0.00063
#> albumin 0.8764 0.1555 5.64 1.7e-08
#> Log(scale) -0.0576 0.0697 -0.83 0.40869
#>
#> Scale= 0.944
#>
#> Log Normal distribution
#> Loglik(model)= NA Loglik(intercept only)= NA
#> Chisq= on 3 degrees of freedom, p=
#> Number of Newton-Raphson Iterations: 5
#> n= 312
#>
## compare svymle scale value after log transformation
svycontrast(m, quote(log(`sd.(Intercept)`)))
#> nlcon SE
#> contrast -0.057556 0.07