In a dataset, survey weights are stored in a variable that indicates how much each respondent should “count.” For example, a respondent with a weight of 1 counts as a single respondent, a respondent with a weight of .5 counts as half a respondent, and a respondent with a weight of 2 counts as two respondents. In R, the survey package handles weighted statistical analyses.

The 2010 UMass Poll Massachusetts Exit Poll (http://people.umass.edu/schaffne/mass_exitpoll.dta) includes a variable called “weight.” Respondents in the exit poll were weighted to account for the stratified nature of the sampling procedure: because towns were stratified by geography and other factors, not every voter had an equal chance of being sampled, and the weight variable adjusts for this.

First, just once at the beginning of your session, you need to create a survey design object that tells R your dataset includes a weight variable. This is done with the svydesign function:

library(foreign)   # provides read.dta() for Stata .dta files
library(survey)    # provides svydesign() and the svy* analysis functions
dat <- read.dta("mass_exitpoll.dta")
# One cluster (ids = ~1), with the "weight" column supplying the survey weights
svy.dat <- svydesign(ids = ~1, data = dat, weights = ~weight)
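To make the idea of a respondent “counting” as more or less than one person concrete, here is a minimal base-R sketch. The five respondents, the candidate labels A/B/C, and the weight values are hypothetical and not taken from the exit poll. A weighted proportion is simply the sum of the weights in a category divided by the sum of all weights, which should be the same quantity that prop.table(svytable(...)) reports for a design built with weights = ~weight.

```r
# Hypothetical toy data (NOT from the UMass exit poll): five respondents,
# the candidate each voted for, and each respondent's survey weight.
toy <- data.frame(
  vote   = c("A", "B", "B", "C", "A"),
  weight = c(1, 0.5, 2, 1, 0.5)
)

# Unweighted: every respondent counts exactly once.
unweighted <- prop.table(table(toy$vote))

# Weighted: each respondent adds their weight to their category's total,
# e.g. the respondent with weight 2 counts as two voters for B.
weighted_counts <- xtabs(weight ~ vote, data = toy)
weighted <- prop.table(weighted_counts)

print(unweighted)  # A 0.4, B 0.4, C 0.2
print(weighted)    # A 0.3, B 0.5, C 0.2
```

Candidate B gains ground once the weights are applied because one of B's voters carries a weight of 2, while both of A's voters are weighted at or below 1.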
The data argument names the data frame, and the weights argument names the variable that contains the weights. The ids argument specifies sampling cluster membership; since we aren’t accounting for cluster sampling here, we set ids = ~1 to indicate that all respondents come from the same cluster (or primary sampling unit, PSU). Once you have done this, R knows which variable to use to weight the data (and how to apply those weights). However, you still have to ask R to weight the data whenever you run an analysis.

For example, let’s say we wanted to look at the vote for Governor. If we simply tabulated the variable as proportions, we would get the following:

prop.table(table(dat$qc_gov))   # unweighted proportions

However, this tabulation does not weight the data to adjust for the sampling approach. To do that, you need to use survey’s analytical functions, which usually follow the same pattern: svy in front of the name of the default function, a tilde (~) before the variable or formula, and the survey design object passed as the design argument (see the package documentation for the full list of survey functions). So now you can run the same tabulation with the weights applied:

prop.table(svytable(~qc_gov, design = svy.dat))   # weighted proportions

Next post will go over how to create the weights yourself using post-stratification raking! Fancy!

This post is CHAPTER 5.1 of an R packet I am creating for a Survey Methods graduate seminar taught by Brian Schaffner at the University of Massachusetts Amherst. The seminar is usually taught only in Stata, so I translated all the exercises, assignments, and examples used in the course for R users. Other chapters include: Logit/Probit Models, Ordinal Logit/Probit Models, Multinomial Logit Models, Count Models, Creating Post-Stratification Weights, Item Scaling, Matching/Balancing.
