Using Weights in R

posted Mar 22, 2016, 5:16 PM by Mia Costa   [ updated Oct 21, 2017, 4:02 PM ]
In a dataset, survey weights are captured by a variable that indicates how much each respondent should “count.” For example, a respondent who has a 1 for a weight variable would count as a single respondent; a respondent who receives a .5 for a weight variable should count as half a respondent; and a respondent who has a 2 would count as two respondents. Use the survey package for weighting statistical analyses.
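To make the idea of "counting" concrete, here is a minimal base R sketch. The data frame toy and its columns vote and weight are made up for illustration and are not part of the exit poll. It compares a raw head count with a count in which each respondent contributes their weight.

```r
# Made-up data: three respondents with the weights 1, 0.5, and 2.
toy <- data.frame(
  vote   = c("A", "A", "B"),
  weight = c(1, 0.5, 2)
)

# Raw count: every respondent counts exactly once.
raw_n <- table(toy$vote)

# Weighted count: each respondent contributes their weight instead of 1.
weighted_n <- tapply(toy$weight, toy$vote, sum)

print(raw_n)       # A: 2,   B: 1
print(weighted_n)  # A: 1.5, B: 2
```

Two respondents chose A, but together they count as only 1.5 respondents, while the single B respondent counts as two.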

The 2010 UMass Poll Massachusetts Exit Poll (http://people.umass.edu/schaffne/mass_exitpoll.dta) includes a variable called “weight.” Respondents in the exit poll were weighted to account for the stratified nature of the sampling procedure. Because towns were stratified by geography and other factors, not every voter had an equal chance of being sampled. The weight variable adjusts for this fact.

First, once at the beginning of your session, you need to create a survey design object to tell R that your dataset includes a weight variable. This is done with the svydesign function from the survey package (read.dta, used to load the Stata file, comes from the foreign package):

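library(foreign)  # provides read.dta for Stata files
library(survey)   # provides svydesign and the svy* analysis functions

dat <- read.dta("mass_exitpoll.dta")

svy.dat <- svydesign(ids = ~1, data = dat, weights = ~weight)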

The arguments data and weights specify the dataset and the variable that contains the weights. The ids argument specifies sampling cluster membership. Since we aren’t accounting for cluster sampling here, we set ids = ~1, which tells svydesign that there are no clusters, so each respondent is treated as its own primary sampling unit (PSU). Once you have done this, R knows which variable to use to weight the data (and how to apply those weights). However, you still have to ask R to weight the data whenever you run an analysis.
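The point that the design object does nothing until you ask for a weighted analysis can be seen in a small sketch. The data frame toy and its columns support and weight are hypothetical stand-ins, not exit poll variables; svymean is the survey package's weighted counterpart to mean.

```r
library(survey)

# Made-up data: a 0/1 outcome and a weight for each of four respondents.
toy <- data.frame(
  support = c(1, 1, 0, 0),
  weight  = c(0.5, 0.5, 2, 1)
)

# ids = ~1 declares no clusters; weights = ~weight names the weight column.
toy_design <- svydesign(ids = ~1, data = toy, weights = ~weight)

# A base R function knows nothing about the design and ignores the weights.
unweighted_mean <- mean(toy$support)

# The svy* counterpart takes the design object and applies the weights.
weighted_mean <- coef(svymean(~support, design = toy_design))

print(unweighted_mean)  # 0.5
print(weighted_mean)    # 0.25
```

The weighted mean is (0.5 + 0.5) / (0.5 + 0.5 + 2 + 1) = 0.25, because the two supporters carry less weight than the two non-supporters.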

For example, let’s say we wanted to look at the vote for Governor. If we just tabulated the variable by proportion, we would get the following: 
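An unweighted tabulation by proportion can be sketched in base R with table and prop.table. The data frame toy_dat below is made up so the snippet runs on its own, and the candidate labels A, B, and C are placeholders. On the exit poll data, the analogous call would presumably be prop.table(table(dat$qc_gov)).

```r
# Made-up stand-in for the exit poll: five respondents, three candidates.
toy_dat <- data.frame(
  qc_gov = c("A", "A", "A", "B", "C")
)

# table() counts respondents per category; prop.table() converts the
# counts to proportions. No weights are involved at any point.
unweighted <- prop.table(table(toy_dat$qc_gov))

print(unweighted)  # A: 0.6, B: 0.2, C: 0.2
```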


However, the above tabulation does not weight the data to adjust for the sampling approach. To do this, you need to use survey’s analytical functions. Their names are usually just svy prefixed to the name of the default function (for example, svytable for table), and they take a tilde ~ before the variable or formula along with the survey design object (see the package description for the full list of survey functions). So, now you can do the same tabulation but apply the weights to it:

prop.table(svytable(~qc_gov, design = svy.dat))
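To see what svytable does with the weights, here is a small sketch on made-up data. The data frame toy_dat, its weights, and the candidate labels are all hypothetical. For a simple design like this one, the weighted proportions should match summing the weights by hand within each category.

```r
library(survey)

# Made-up stand-in for the exit poll, with a weight for each respondent.
toy_dat <- data.frame(
  qc_gov = c("A", "A", "A", "B", "C"),
  weight = c(0.5, 0.5, 0.5, 2, 1.5)
)
toy_svy <- svydesign(ids = ~1, data = toy_dat, weights = ~weight)

# svytable() sums the weights within each category of qc_gov.
weighted <- prop.table(svytable(~qc_gov, design = toy_svy))

# The same proportions computed by hand in base R.
by_hand <- tapply(toy_dat$weight, toy_dat$qc_gov, sum) / sum(toy_dat$weight)

print(weighted)  # A: 0.3, B: 0.4, C: 0.3
print(by_hand)   # same values
```

Unweighted, candidate A would have 3 of 5 respondents (0.6). With the weights applied, A drops to 0.3 because those three respondents each count as only half a respondent.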

Notice that applying the weights had a pretty big effect on the results. With the sampling weight accounted for, Deval Patrick receives 52% of the vote, much closer to what he actually received in the election.

The next post will go over how to create the weights yourself using post-stratification raking!