Date: Tue, 25 Oct 2005 21:03:56 -0400
Reply-To: Iris Hui <iris_hui@BERKELEY.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Iris Hui <iris_hui@BERKELEY.EDU>
Subject: Re: Sample Weights
I really want to thank you for your thoughtful response. You always make my
life complicated---in a good way. :-)
>>My first question is: in order to create one weight variable for the
>>cumulative file, is it acceptable to put these different weights into one
>>column and just call that 'final weight'? If not, is there a better way to
>My answer is 'maybe'. It really depends on a *lot* of questions. For
>What is going to be done with the data?
>What analyses do people want to perform with this 'stack' of surveys?
--I think the cumulative file will be used to look at how things changed
over time. Say, the relationship between income and the likelihood to vote
Rep, or the relationship between union membership and the likelihood to vote
Dem, and whether the relationship strengthens or dampens over time.
As far as statistical analyses are concerned (like what kind of time-series
models will be used to address the temporal effect), I don't know yet.
>How were the surveys taken?
>Which 'weights' are real survey sample weights, and which 'weights' are
>about reasonable weights, and which 'weights' are actually _post_hoc_
>computed from raking?
>Do some surveys have strata or clusters, and just how different are the
>How reliable are the surveys, and how 'valid' are the results, and who took
>polls each time?
>What is the target population for each sample, and what is the actual
>population for each sample?
>Are any of the surveys 'snowball' samples or 'convenience' samples?
>Are the survey responses: exactly the same, almost the same, sort of the
>same, sort of different, or totally unrelated?
--These I know better. All the surveys were done using stratified random
sampling. They are representative samples of adults over 18. In terms of
reliability and who took these polls, sorry, I can't reveal too much here.
But say, the sampling for these surveys observed the highest standard in
the field. Highly reliable, scientific and well respected.
>Okay, I have a few more concerns, but these are a starter set. Show these
>to the people who asked you to slap the survey together, and show them the
>If the target populations are not the same, there is no reason to think the
>weights should match up, or that the results should be comparable. You'll
>never be able to tell whether any differences are due to the differences in
>year, or the differences in the populations interviewed, or the differences
>in several other things.
>If any of the surveys are snowball samples, convenience samples, or samples
>that had to be raked afterward, then trying to relate them to the results
>of properly-drawn samples is just a huge, expensive waste of time. Any
>you find is likely to be a consequence of the biases introduced by the bad
>sampling process, and this cannot be separated, no matter how many times
>people wave magic wands over the data murmuring mystical phrases like
>'raking' and 'Heckman two-stage model' and such.
>Biases due to the organizations paying for and organizing the samples cannot
>be ignored either. This can be a major headache in any analysis.
>If the response is not *precisely* the same (and the questions used to get
>that response are not *precisely* the same), then you may end up with
>questionnaire biases instead of real effects.
--target population remains stable over time and all surveys are done with
great precision. hm...can I sleep better at night?
>After all this, we *still* don't have the weights. Longitudinal analyses may
>(repeat, may) need the weights scaled as is. Different types of evaluations
>may need the weights scaled so that cross-survey totals end up with the
>correct population numbers. So the weight issue cannot be settled until
>you actually know what analysis is needed.
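A minimal sketch of the rescaling idea in the paragraph above, assuming a stacked file with one row per respondent. The names `rescale_weights`, `stacked`, `poll_a`, and `poll_b` are hypothetical, and the target totals are made up for illustration. Which target is appropriate (sample size, population count, or the weights left as is) still depends on the analysis, as noted above.

```python
from collections import defaultdict

def rescale_weights(rows, targets):
    """Rescale weights within each survey so they sum to that survey's target.

    rows:    list of (survey_id, weight) pairs, one per respondent
    targets: dict mapping survey_id to the desired weight total
             (for example a population count, or the sample size)
    """
    totals = defaultdict(float)
    for survey_id, weight in rows:
        totals[survey_id] += weight
    # Multiply every weight by target / current total for its own survey.
    return [(sid, w * targets[sid] / totals[sid]) for sid, w in rows]

# Hypothetical stacked file: two surveys whose weights are on different scales.
stacked = [("poll_a", 0.5), ("poll_a", 1.5),
           ("poll_b", 200.0), ("poll_b", 600.0)]

# Assumed choice: make each survey's weights sum to its sample size.
sample_sizes = {"poll_a": 2, "poll_b": 2}
relative = rescale_weights(stacked, sample_sizes)

# Alternative assumed choice: make each survey's weights sum to a population count.
population = {"poll_a": 1000, "poll_b": 1000}
expanded = rescale_weights(stacked, population)
```

After rescaling, the two surveys are on the same scale, so neither dominates the stacked file simply because its original weights were larger.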
>>My second question is: more surveys were done during the election years
>>(usually 5-6 polls per year) and fewer were done during off-years (3-4 per
>>year). If I put them into one cumulative file and do over-time analyses,
>>am I giving more weight to polls done in election years? Do I need to create
>>another weight variable in the cumulative file to adjust for the uneven
>>number of surveys done over time?
>Is that vague enough? I think that you're going to have to deal with a lot
>of temporal issues here. How exactly are you going to analyze the data
>'over time'? If you're treating some response as a repeated measures problem,
>then you may instead want to treat the intervals between surveys as
>non-constant. It really depends on what the heck is going to be done with
>the data, and how the survey information is going to be treated. Make
>someone come up with some concrete decisions on how to analyze these
>I would consider NOT analyzing them like this (at least not yet) and
>considering them as a sequence of point estimates with known (computable)
>standard errors. Maybe meta-analysis would be a decent way of approaching
>these results first. I'd want a lot of exploratory data analysis first
>before I decided what would be a reasonable approach. You can't just treat these
>as panel data from a single longitudinal study.
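A rough sketch of the "sequence of point estimates with standard errors" idea above. It assumes a 0/1 response, approximates each poll's standard error with Kish's effective sample size (which ignores strata and clusters, so a design-based SE from proper survey software would be preferable), and pools polls within a year by inverse-variance weighting, which assumes the polls in that year estimate the same quantity. The function names and all numbers are hypothetical.

```python
import math

def weighted_proportion(values, weights):
    """Weighted proportion of 0/1 values with an approximate standard error.

    The SE uses Kish's effective sample size, (sum w)^2 / sum(w^2). It ignores
    strata and clusters, so it is only a stand-in for a design-based SE.
    """
    total = sum(weights)
    p = sum(v * w for v, w in zip(values, weights)) / total
    n_eff = total ** 2 / sum(w * w for w in weights)
    return p, math.sqrt(p * (1 - p) / n_eff)

def pool_fixed_effect(estimates):
    """Inverse-variance (fixed-effect) pooling of (estimate, se) pairs."""
    inv_var = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(est * iv for (est, _), iv in zip(estimates, inv_var)) / sum(inv_var)
    return pooled, math.sqrt(1.0 / sum(inv_var))

# One estimate from raw responses (made-up data, equal weights).
p_hat, se_hat = weighted_proportion([1, 0, 1, 0], [1.0, 1.0, 1.0, 1.0])

# Made-up per-poll (estimate, se) pairs: two polls in one year, one in the next.
polls_by_year = {
    1992: [(0.40, 0.02), (0.44, 0.02)],
    1993: [(0.38, 0.03)],
}

# One pooled estimate per year, however many polls that year had.
yearly = {year: pool_fixed_effect(ests) for year, ests in polls_by_year.items()}
```

Pooling within year first means a year with six polls yields one (more precise) estimate rather than six times the influence, which speaks to the uneven-number-of-polls question without adding another weight variable.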
--hm...say I want to look at the relationship between income and likelihood
to register as Rep over time. How should I approach this question? OLS
can't be used. At the same time, I need to control for a bunch of
confounding factors, such as education, sex, age, marital status etc, in
order to isolate the effect of income on one's likelihood to register as Rep.
>>Any advice would be highly appreciated! Thanks!
>Make the people who said "put some surveys done over the last 40 years
>do some work here! This is a non-trivial problem, and just cleaning up the
>and preparing the data will take time and care. The analysis of these data
>will be problematic, to say the least. If they think you can slap stuff
>week so they can use PROC REG to look for trends, then show them this note.
>If they don't get the hint, have this note bronzed, and then hit them over
>the head with it. :-)
--haha. I wish. But I need a job to pay bills. :(