
Survey analysis in R

This is the homepage for the "survey" package, which provides facilities in R for analyzing data from complex surveys. The current version is 3.29. A much earlier version (2.2) was published in the Journal of Statistical Software.

An experimental package for very large surveys, such as the American Community Survey, is also available.

A port of a much older version of the survey package (version 3.6-8) to S-PLUS 8.0 is available from CSAN (thanks to Patrick Aboyoun at Insightful).

Features:

The NEWS file gives a history of features and bug fixes.

Comparison shopping:
Alan Zaslavsky keeps a comprehensive list of survey analysis software for the ASA Section on Survey Research Methods.

User-generated ratings and reviews of this package (and others) are available at crantastic.


Using the survey package:


Technical notes and comparisons with other software

Some examples (in PDF) translated from Stata and SUDAAN examples at UCLA Academic Technology Services.

Notes on the sparse matrix algorithms used in version 3.15 for two-phase designs (and perhaps more widely in future versions).

Notes on standard errors for survival curves.

A 2009 CDC report compared five other survey analysis packages in the context of the Youth Risk Behavior Survey. I have written an extension that does the same feature comparisons and results comparisons with R and the survey package. Some of this is copied from the CDC report (which I believe is in the public domain), but the CDC authors are (of course) not responsible for any of the conclusions or results.

Anthony Damico has R scripts for downloading and analysing major US government surveys at GitHub. He reported on comparisons of the survey package with SAS, Stata, and SUDAAN in The R Journal 1(2), 37-45.


Tutorials

I have a course at statistics.com, which will be repeated as demand permits.

Here are slides from a Continuing Education course at JSM 2012.

I gave a workshop on two-phase designs at the 3rd North American Congress of Epidemiology, in Montreal, June 21, 2011.

I gave a two-day course for the Washington (DC) Statistical Society, March 23-24, 2010. The first day was on R, the second day on the survey package.

Norman Breslow and I gave a course at STATISTICALPS 2009, at the beginning of September in the Italian Alps. The course included an introduction to the survey package, but focused on two-phase designs in epidemiology. We will have some code and data up soon.

Slides from a short tutorial at the US Census Bureau, August 10.

I gave a tutorial at useR 2009, on the afternoon of July 7, 2009.

A brief 1.5-hour introduction to R, including a bit on the survey package, at the AAPOR conference, Friday May 15, 10:30am.

There was a one-day course at the University of Copenhagen Center for Health and Society on April 3, 2009. Slides are available at that link.

Tobias Verbeke has packaged data sets and exercises from Sharon Lohr's Sampling: Design and Analysis for use with the survey package.

I gave a short course for the Washington Statistical Society on March 15-16, 2007. The first day was on R and the slides are a selection from these. The second day was on the survey package, slides here.

Norman Breslow and I gave a short course on complex survey designs for epidemiology at the 2008 WNAR (Biometric Society) meeting, UC Davis, June 22, 2008. My sessions were an overview of the survey package and an introduction to calibration. Norm's data sets and code are also online.

There is an article on version 3.6-12 of the package in the January 2008 issue of Survey Statistician (note: large PDF file).


I have written a book on survey analysis, based around the survey package. The book is called Complex Surveys: A Guide to Analysis Using R. It has just been published by John Wiley & Sons. It already has a web site.


Help pages:

The PEAS project at Napier University has Practical Exemplars for the Analysis of Surveys using R (as well as other packages).
Thomas Lumley