```
Date: Wed, 18 Oct 2006 23:33:12 -0700
Reply-To: David L Cassell
Sender: "SAS(r) Discussion"
From: David L Cassell
Subject: Re: Survey data analysis
Comments: To: stringplayer_2@YAHOO.COM
In-Reply-To: <20061012235310.76348.qmail@web32205.mail.mud.yahoo.com>
Content-Type: text/plain; format=flowed

stringplayer_2@YAHOO.COM wrote:
>
>Hi all (and David in particular),
>
>Just when I think I have survived the summer from h***, I find
>myself stuck with an analysis problem where I simply don't know
>what I need to do. Following is a description of the data that
>I have.

Dale, it's good to hear from you again! I was getting kind of
worried, since you weren't answering private emails to your
stringplayer address. Well, at least you weren't answering mine.
:-) :-)

>Students in essentially all colleges and universities in a
>particular region of the country were surveyed about smoking
>behaviors before and after an intervention that took place in a
>randomly selected set of half of the schools. Within each school,
>surveys were administered to freshmen, sophomores, juniors, and
>seniors with different probabilities of selection. The probability
>of selection differed between baseline and follow-up.
>
>School size determined how many students received the survey.
>At most, 750 freshmen were selected at random to receive the
>survey in the baseline period. If there were fewer than 750
>freshmen in a particular school, then all freshmen received the
>survey. If there were more than 750 freshmen, then a random
>sample from the registrar's list was selected to receive the
>survey. For sophomores, juniors, and seniors, the maximum number
>to receive the survey in each class was 200. Just as with the
>freshmen, if there were fewer than 200 in a class, all in that
>class received the survey.
>
>Baseline freshmen who responded to the survey were sent the
>follow-up survey, which was administered two years later following
>an intervention in some schools. In addition, a random sample
>of up to 200 from each class received the final survey. Again,
>if the class size was less than 200, then all in the class
>received the survey.
>
>The survey administration rates are very high, especially in
>smaller schools. Of course, the number of surveys returned is
>another matter. The number of surveys returned ranges from about
>25% to 50% of the surveys administered.
>
>This study then combines elements of a complex sampling design
>with an experimental design. I believe that the sample design
>takes precedence over the experimental design. However, when it
>comes to analysis of data from a survey sample, I am a pure
>novice. With that in mind, I have a few questions.
>
> 1) What exactly is the design here? We have clusters (schools)
>    and within each school at each survey time point we select
>    students in a particular class without replacement.

Okay, I agree here. I doubt the survey instrument went to every
possible school, so there may be cluster issues at your first
stage, and there may be non-response issues at stage 1 also. (Are
the schools not surveyed different in meaningful ways from those
that participated? That may affect your definition of your target
population.)

Let's just start with the first time point only. We have to do a
couple of things:

[1] find out if all schools were surveyed;
[2] decide, based on the subject-matter experts' opinions, whether
    to treat this as a random sampling process, or as a division
    into a sampled/target population and an unsampled population;
[3] based on #2, decide how to adjust weights.

Now we move on to stage 2, where we sample students within
schools. You have what sounds like a stratified sample in each
school, even if the sample turns out to be a census some of the
time.

> 2) How do I construct the weights when there is survey
>    nonresponse? If there were no nonresponse, then weights would
>    be calculated as the number of students in the
>    school/class/time point combination divided by the number of
>    surveys administered in the same combination, right?

Right. (Assuming equal selection probabilities.)

>    But when we have nonresponse, my understanding is that the
>    survey weights take a more complex form. We don't just use
>    the number of returned surveys for a given school/class/time
>    combination as the denominator when computing the survey
>    weights. Is that correct?

This depends on how your Principal Investigator wants to treat the
non-response. My personal view is that in cases like this we have
to assume that the sampled population may be substantively
different from the non-response group. In that case, I typically
try to get samples from a small but random subset of the
non-responders, using whatever means are available (although
sending them to Abu Ghraib is usually considered a last resort :-)
so that we can make some non-response bias adjustments.

I'm usually way too far down the line to get that. ("Are you nuts?
We did that survey four years ago, so it's way too late for that.
What? So it took a while to get the data put together and
cleaned...") At that point, I usually advocate treating the
non-response group as a separate unsampled part of the overall
population. We estimate the size of that sub-population, we caveat
the reports as not having access to that portion of the
population, and we do estimates on the sampled sub-population
only.

So, if you go this route, you leave the weights alone because you
are shrinking the 'target' population. The sum of the weights of
the responders is now your estimate of the size of the target
population, and the sum of the weights of the non-responders gives
you the size of the excluded sub-population.

The alternative is to pretend that the non-responders are exactly
like the responders, but just had a bad hair day or something and
couldn't come out of the bathroom to fill out the survey. In that
case, you end up having to adjust the survey weights upward.
Typically, it's done in a group-by-group fashion, where 'group' is
deliberately super-vague here because I usually try to aggregate
to the highest level that is reasonable (as decided by the
subject-matter experts and/or the survey design).

> 3) Based on the design that is specified for 1), what statements
>    (and options?) are required for the SURVEYLOGISTIC procedure?

Well, the key point you need to have is that population totals,
CLUSTER variables, STRATA variables, etc. need to be based on
stage 1 of the sample. The remaining variability is handled under
the hood. But the weights have to be computed for each stage of
the sample, adjusted for each stage, and then multiplied across
stages to get a weight that scales from student up to the region.

If you're going to be at PNWSUG in 12 days, we can talk about this
in more detail.

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
```
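The base-weight calculation that Dale proposes in question 2 (and that Cassell confirms, assuming equal selection probabilities within a cell) can be sketched as a short DATA step. All dataset and variable names here (`cell_counts`, `n_students`, `n_admin`) are hypothetical, assuming one input row per school/class/time-point cell:

```sas
/* Hypothetical input: one row per school/class/time-point cell,
   with n_students = class size in the frame and
   n_admin = number of surveys administered in that cell. */
data base_weights;
  set cell_counts;
  /* Equal selection probability within a cell, so the base
     weight is class size over surveys administered.  In the
     census cells (n_admin = n_students) this is simply 1. */
  w_base = n_students / n_admin;
run;
```

Each student in a cell then carries that cell's `w_base`, so the weights sum back to the frame count for the cell.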
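Under the "separate unsampled sub-population" route Cassell recommends, the sums of the unadjusted base weights by response status estimate the two sub-population sizes he describes. A minimal sketch, again with hypothetical names (`survey` at the student level, `responded` coded 0/1, `w_base` carried over from the cell-level weights):

```sas
/* Sum of w_base where responded = 1 estimates the size of the
   (shrunken) target population; the sum where responded = 0
   estimates the size of the unsampled sub-population that the
   report must caveat. */
proc means data=survey sum;
  class responded;
  var w_base;
run;
```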
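The alternative route, adjusting weights upward group by group, in its simplest ratio form multiplies each responder's base weight by (surveys administered / surveys returned) within the adjustment group. A hedged sketch using PROC SQL, where the choice of `school, class_year` as the adjustment group is purely illustrative (Cassell's point is that the grouping should be decided by the subject-matter experts and the design, not by default):

```sas
proc sql;
  create table adj_weights as
  select s.*,
         /* ratio adjustment: inflate responder weights so the
            group's weight total matches what all administered
            surveys would have carried */
         s.w_base * (g.n_admin / g.n_returned) as w_adj
  from survey as s
       join (select school, class_year,
                    count(*)       as n_admin,
                    sum(responded) as n_returned
             from survey
             group by school, class_year) as g
       on s.school = g.school and s.class_year = g.class_year
  where s.responded = 1;
quit;
```

Note this bakes in the assumption that non-responders look like responders within each group, which is exactly the assumption the "bad hair day" paragraph is poking at.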
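Finally, for question 3, a sketch of combining the stage weights and calling SURVEYLOGISTIC. The variable names, the stage-1 weight `w_school`, and the single CLUSTER statement are assumptions for illustration, not the poster's actual design; add a STRATA statement only if stage 1 really was stratified, and note that (per the advice above) CLUSTER and STRATA describe stage 1 only:

```sas
/* Multiply the per-stage weights: school-level (stage 1) times
   student-within-school (stage 2, after any non-response
   adjustment) to get a weight that scales a student up to the
   region. */
data final;
  set adj_weights;
  w_final = w_school * w_adj;
run;

proc surveylogistic data=final;
  cluster school;                      /* stage-1 PSU = school  */
  weight  w_final;
  class   intervention class_year / param=ref;
  model   smoker(event='1') = intervention class_year;
run;
```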
