Lonely PSUs

Variance estimation in sample surveys is based on the variability between primary sampling units (PSUs) within each stratum. If only one PSU is sampled from a particular stratum, the variance for that stratum can't be computed: there is no unbiased estimator, and the standard estimator gives 0/0.
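
As a minimal sketch of why a single PSU gives 0/0, the following base R function computes one stratum's contribution to the variance of an estimated total. It assumes the usual with-replacement formula, n/(n-1) times the sum of squared deviations of the weighted PSU totals; the name stratum_var and the numbers are hypothetical and not taken from the survey package.

```r
# Contribution of one stratum to the variance of an estimated total.
# z holds the weighted PSU totals for that stratum (illustrative values).
# Assumes the with-replacement formula n/(n-1) * sum((z - mean(z))^2).
stratum_var <- function(z) {
  n <- length(z)
  n * sum((z - mean(z))^2) / (n - 1)
}

v_three <- stratum_var(c(10, 14, 12))  # three PSUs: 3 * 8 / 2 = 12
v_one   <- stratum_var(12)             # one PSU: 1 * 0 / 0 = NaN

print(v_three)
print(v_one)
```

With one PSU both the sum of squares and n - 1 are zero, so the result is NaN rather than a usable variance.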

Certainty PSUs

One exception is "certainty" PSUs in sampling without replacement, where the population has only one PSU in the stratum. With 100% sampling, there is no contribution to the variance from the first stage of sampling in this stratum. The easiest way to tell R that you have certainty PSUs is to use the fpc argument to svydesign. For example, these data are from p. 60 of Lehtonen & Pahkinen, Practical Methods for Design and Analysis of Complex Surveys:
 > unemp <- read.table(textConnection("id str clu wt hou85 ue91 lab91
 + 1 2 1 1 26881 4123 33786
 + 3 1 10 1.004 9230 1623 13727
 + 4 1 4 1.893 4896 760 5919
 + 5 1 7 2.173 4264 767 5823
 + 6 1 32 2.971 3119 568 4011
 + 7 1 26 4.762 1946 331 2543
 + 8 1 18 6.335 1463 187 1448
 + 9 1 13 13.730 675 129 927"),
 +     header = TRUE)
The first observation is sampled with certainty from a population stratum of size 1; the remaining observations are sampled from a population stratum of size 31.
 > dunemp <- svydesign(id = ~clu, strata = ~str, weights = ~wt, data = unemp, 
 +     fpc = c(1, rep(31, 7)))
 > svymean(~ue91, dunemp)
        mean     SE
 ue91 445.18 132.39
 > svytotal(~ue91, dunemp)
      total     SE
 ue91 15077 458.53
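
Before choosing between fpc and one of the adjustments, it can help to see which strata actually contain a single sampled PSU. The following is a base R sketch, not a survey package feature; it re-enters the str and clu columns from the example above, and the names psu_per_stratum and lonely are hypothetical.

```r
# Stratum and cluster columns from the example data above.
unemp <- data.frame(
  str = c(2, 1, 1, 1, 1, 1, 1, 1),
  clu = c(1, 10, 4, 7, 32, 26, 18, 13)
)

# Count the distinct PSUs sampled in each stratum.
psu_per_stratum <- tapply(unemp$clu, unemp$str, function(x) length(unique(x)))

# Strata with exactly one sampled PSU are the lonely ones.
lonely <- names(psu_per_stratum)[psu_per_stratum == 1]

print(psu_per_stratum)
print(lonely)
```

Here stratum 2 is the only single-PSU stratum, which matches the certainty PSU described above.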

Other lonely PSUs

More generally, some sort of ad hoc adjustment is needed. The best adjustment is probably to combine the single-PSU stratum with another well-chosen stratum, but some fully automatic adjustments are also available. The form of the adjustment is controlled by the global option survey.lonely.psu. The default is options(survey.lonely.psu = "fail"), which makes it an error to have a stratum with a single, non-certainty PSU. The other settings are:

options(survey.lonely.psu = "remove") or options(survey.lonely.psu = "certainty"): a single-PSU stratum makes no contribution to the variance (for multistage sampling it makes no contribution at that level of sampling). This is an alternative to specifying fpc, and might be useful for compatibility with other software.

options(survey.lonely.psu = "adjust"): the data for the single-PSU stratum are centered at the sample grand mean rather than the stratum mean. This is conservative.

options(survey.lonely.psu = "average"): the stratum contribution to the variance is taken to be the average of all the strata with more than one PSU. This might be appropriate if the lonely PSUs were due to data missing at random rather than to design deficiencies.
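
To see how the choice of adjustment changes the result, the following sketch refits the example data without fpc under three settings of the survey.lonely.psu option and collects the standard error of the mean. It assumes the survey package is installed; the option values "remove", "adjust" and "average" are the ones described in this section, and the name se_by_option is hypothetical. No output is shown because the numbers depend on the package version.

```r
library(survey)

# Columns from the example data above; stratum 2 has a single PSU.
unemp <- data.frame(
  str  = c(2, 1, 1, 1, 1, 1, 1, 1),
  clu  = c(1, 10, 4, 7, 32, 26, 18, 13),
  wt   = c(1, 1.004, 1.893, 2.173, 2.971, 4.762, 6.335, 13.730),
  ue91 = c(4123, 1623, 760, 767, 568, 331, 187, 129)
)

# Standard error of the mean of ue91 under each adjustment.
se_by_option <- sapply(c("remove", "adjust", "average"), function(opt) {
  old <- options(survey.lonely.psu = opt)  # set the global option
  on.exit(options(old))                    # restore it afterwards
  d <- svydesign(id = ~clu, strata = ~str, weights = ~wt, data = unemp)
  as.numeric(SE(svymean(~ue91, d)))
})

print(se_by_option)
```

Because the option is global, restoring the previous value after each fit (as on.exit does here) avoids silently changing later analyses.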

Difficulties in estimating variances also arise when only one PSU in a stratum has observations in a particular domain or subpopulation. R gives a warning rather than an error when this occurs, and can optionally apply the "adjust" and "average" corrections. To apply the corrections, set options(survey.adjust.domain.lonely = TRUE) and set the survey.lonely.psu option to the adjustment method you want to use.
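
The domain case can be sketched with a small made-up data set in which every stratum has three sampled PSUs, but domain "a" appears in only one PSU of stratum 2. The data frame toy and its columns are hypothetical, and the example assumes the survey package is installed. A warning about the single-PSU stratum may still be printed, depending on the package version.

```r
library(survey)

# Hypothetical data: two strata, three PSUs each.
# Domain "a" covers all of stratum 1 but only one PSU in stratum 2.
toy <- data.frame(
  str = c(1, 1, 1, 2, 2, 2),
  clu = 1:6,
  wt  = rep(2, 6),
  y   = c(3, 5, 4, 7, 6, 8),
  dom = c("a", "a", "a", "a", "b", "b")
)

d <- svydesign(id = ~clu, strata = ~str, weights = ~wt, data = toy)

# Turn on the domain correction and choose the adjustment method.
old <- options(survey.adjust.domain.lonely = TRUE,
               survey.lonely.psu = "adjust")
est <- svymean(~y, subset(d, dom == "a"))
options(old)  # restore the previous settings

print(est)
```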

Replicate weights

The same problems occur for replicate-weight estimation. The main difference is that the survey.lonely.psu option is used by as.svrepdesign in constructing replicate weights, rather than being consulted as each analysis is done.
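
For replicate weights, the practical consequence is that the option has to be set before the conversion. The following is a sketch under assumptions: it uses the example data without fpc, assumes the survey package is installed, and assumes that the "remove" setting is accepted by as.svrepdesign for jackknife (type = "JKn") weights. The names d and rd are hypothetical.

```r
library(survey)

# Columns from the example data above; stratum 2 has a single PSU.
unemp <- data.frame(
  str  = c(2, 1, 1, 1, 1, 1, 1, 1),
  clu  = c(1, 10, 4, 7, 32, 26, 18, 13),
  wt   = c(1, 1.004, 1.893, 2.173, 2.971, 4.762, 6.335, 13.730),
  ue91 = c(4123, 1623, 760, 767, 568, 331, 187, 129)
)

# Set the option first: it is read when the replicate weights are built.
old <- options(survey.lonely.psu = "remove")
d   <- svydesign(id = ~clu, strata = ~str, weights = ~wt, data = unemp)
rd  <- as.svrepdesign(d, type = "JKn")
options(old)  # changing the option now does not alter the weights in rd

print(svymean(~ue91, rd))
```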
Thomas Lumley
Last modified: Tue Apr 11 10:48:00 PDT 2006