Date: Wed, 1 Apr 2009 09:13:45 -0500
Reply-To: Joe Matise <snoopy369@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Joe Matise <snoopy369@GMAIL.COM>
Subject: Re: Any ideas for smart way of including text responses into
existing data set -- re: recoded variables?
Content-Type: text/plain; charset=ISO-8859-1
The point is not to replace your survey with free text responses entirely.
That's lazy, bad research. However, including free text responses in
surveys can (sometimes) be very useful in discovering things you didn't know
to ask. Again, I'm not talking about well-controlled scientific studies;
although I am not a scientist, I certainly could see the problem there. But
in a fact-finding study (such as a customer service satisfaction study, or
as Mary noted, a fact-finding medical research study) they definitely have
their place.
You also typically find that a lot of people who answer "other" to a question
really mean one of your intended responses, and you can recode them back to
those responses. You could just eliminate "other", but particularly in
situations where other is a valid response ("Was your recent doctor's visit
a) to a Primary Care doctor, b) to an Internal Medicine specialist, c) to a
Dermatologist, d) to an OB/GYN, e) to the Emergency Room, or f) Other
Specialist"), where you don't want to list every potential kind of doctor,
you'll find responses of f) Other Specialist where, if you include a free
text field, they list "Gastroenterologist" (which you might consider Internal
Medicine), "Family Doctor" (primary care), etc.; clearly you want to recode
those back to the original data and not lose valid responses, especially if
you are working with a small sample size.
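To make that concrete, here's a minimal sketch of that kind of back-coding in a SAS data step. The data set and variable names (visits, visit_type, other_text) are invented for illustration, and in practice you'd review the free-text responses by hand before writing matching rules like these:

```sas
/* Hypothetical back-coding of "Other Specialist" free text into the
   original answer categories; all names are made up for this example. */
data visits_recoded;
   set visits;                    /* assumed to have visit_type ('A'-'F')
                                     and a free-text other_text variable  */
   length other_up $200;
   other_up = upcase(strip(other_text));
   if visit_type = 'F' then do;   /* f) Other Specialist */
      if index(other_up, 'GASTRO') > 0 then
         visit_type = 'B';        /* treat as Internal Medicine */
      else if index(other_up, 'FAMILY') > 0 then
         visit_type = 'A';        /* treat as Primary Care */
      /* anything unmatched stays 'F' for manual review */
   end;
run;
```

Leaving unmatched responses as "Other" (rather than deleting them) keeps those valid responses available for a second pass of hand-coding.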
Also, you have "unaided" answers. For example, imagine this study:
"Please describe any symptoms you are feeling right now related to PTSD."
and then follow the question up with
"Please check which of the following PTSD symptoms you are feeling right now:"
( ) Anxiety ( ) Sleeplessness ( ) Depression ( ) Suicidal Thoughts (...
If you'd put the second question solely in the survey, I guarantee you'd
find a different result than if you put them both in. This is standard in
market research, where the goal is to find which brands (say) a consumer can
mention off the top of their head, and then list the brands of interest; not
only to find out brands that we might not have included in the survey (say,
some local brand that we weren't aware of, or a small brand that is
performing better than expected), but also because knowing what people think
of off the top of their head is useful. If 80% of people recognize your
brand name, but only 5% list it off the top of their head when asked, you're
probably not doing as well as if 60% recognize it and 40% list it off the
top of their head. That would be the difference between Chick-fil-A and
In-N-Out Burger, I'd suspect [one is a national brand with low awareness but
high recognition due to an effective advertising campaign, while the other
is a brand with only super-regional presence but high awareness in that area
- and no, I'm not basing this off any real survey.]
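If you've already coded the free-text mentions into a 0/1 flag, comparing unaided and aided awareness is just a couple of frequency tables. The data set and flag names here (awareness, unaided_mention, aided_recall) are invented for the sake of the sketch:

```sas
/* Hypothetical 0/1 flags: unaided_mention = brand named in the free-text
   question, aided_recall = brand checked from the prompted list. */
proc freq data=awareness;
   tables unaided_mention aided_recall / nocum;
   title "Unaided vs. aided brand awareness";
run;
```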
Anyhow, I think to some extent this comes down to the type of research
you're doing, and thus the differences in opinion ;) I certainly wouldn't
suggest my fiancée (an immunologist) do research with free-text questions,
were she to do any sort of human survey research, but market research and
some less controlled health research certainly make good use of free-text
questions, and find more value than you'd imagine. :)
On Wed, Apr 1, 2009 at 8:47 AM, Kevin Viel <email@example.com> wrote:
> On Tue, 31 Mar 2009 16:50:45 -0500, Mary <mlhoward@AVALON.NET> wrote:
> >I agree that free-text responses have their place, even in medicine.
> >I once worked on a Persian Gulf War Syndrome Study (the
> >first Iraq War); there were thousands of exact multiple choice
> >questions, but then there was also a transcript of a 15 minute
> >interview given with each veteran (both those in the Gulf War
> >and those in other war situations like Bosnia, and also those
> >with "Gulf War Syndrome" and those without).
> >Several years later, there was an anthrax scare in the media
> >and in Congress, and the question came up at the time: would
> >giving mass anthrax vaccinations be safe? This
> >was never one of our study questions, but then it took on
> >great importance and the free text questions gave us valuable
> >information at that point.
> >Variables with categories can be coded from free-text questions,
> >but you can't always think of everything you will do in the first
> >place, and free-text questions play a valuable role in developing
> >the next study, adding more variables to capture something mentioned
> >in the free text question.
> I still disagree, strongly.
> What is the budget for the NIH? "The NIH invests over $29 billion
> annually in medical research for the American people."
> For a nation of 300+ million with many top universities, that is a paltry sum.
> When something "comes up", it should be worth the effort to pursue it.
> This includes a new, appropriate survey and adequate sample size.
> Sure there are first line of investigations. I am currently employed in
> such an effort.
> How do you deal with standardizing the responses? What if someone chooses
> not to write or fails to recall? How will you know? What is the
> hypothesis and how does the study/question design serve it?
> "I took an aspirin"
> "I took aspirin b.i.d."
> Great. 400 mg? 88 mg? Half of 88 mg?
> What if veterans who failed to answer had mental disorders, such as
> depression, more often than those who answered a paragraph or more? What
> defines a missing response? How can you validate it (against other survey
> data)?
> We spend $100 million to send the Cassini satellite to Saturn. Do we have
> a study of human health outcomes that even comes close to that amount?
> Sure, the HapMap might have cost $50+ million; it is not quite a study,
> and its utility is very dubious. The Mayo Clinic's database of the people in the
> county in which it is located must run near that. DeCode?
> We are finding humans and their diseases are mostly complex. Sorry if the
> studies to learn about them also require immense planning, time, and money.
> Let me just illustrate: stipulate that the lifetime prevalence of major
> depressive disorder is 13% in the general, non-institutionalized
> population of the US. The affected patient may suffer debilitation. Not
> one "hard" measure has yet been found that might be used to discriminate
> cases. Read the DSM-IV and you will find a list that I find mostly
> subjective and certainly not easy to validate.