Date: Wed, 23 Oct 2002 16:11:45 -0400
Reply-To: Quentin McMullen <QuentinMcMullen@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Quentin McMullen <QuentinMcMullen@WESTAT.COM>
Subject: Re: data dictionary in SAS?
Content-Type: text/plain; charset="iso-8859-1"
William Kossack [mailto:kossackw@NJC.ORG] asks:
> I'm working on several large studies with large datasets and I've been
> asked to set up data dictionaries for them.
>
> In SAS?
>
> Where do I start? Please give me a clue?
I think you need to start by thinking about what information you want to
have in the dictionary, what organization you want it to have, and what
format(s) you want (ASCII? PDF? HTML?).
These questions take a lot of thought, and it's nice to have them answered
before starting to code. When I think data dictionary, I think descriptive
information on each variable (what could that mean? perhaps frequencies,
perhaps mean/sd/min/max, perhaps UNIVARIATE... ). But I think in terms of
a data dictionary that I would give to an analyst. If I were making data
dictionaries for programmers, I would want to include information on the type
of each variable, length, format, etc. In any case, a good data dictionary
is a great asset to a project/division/company. It's worth spending time
thinking about who the users of the data dictionary will be, and what their
needs are.
Once you have designed the layout and content for the dictionary, coding
should flow from that. I'd think in terms of FREQ, MEANS, and UNIVARIATE to
get summary statistics for each variable, plus plenty of use of the
dictionary tables (esp. dictionary.tables and dictionary.columns). ODS then
gives you plenty of options in terms of file formats.
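
As a rough illustration of that combination, here is a minimal sketch. It
assumes SASHELP.CLASS as a stand-in for a study dataset, and the output file
name is made up; a real dictionary would need more thought about layout.

```sas
/* Variable-level metadata from the dictionary tables.             */
/* SASHELP.CLASS is only a stand-in for a real study dataset.      */
/* LIBNAME and MEMNAME are stored in upper case.                   */
proc sql;
  create table work.varinfo as
  select varnum, name, type, length, format, label
  from dictionary.columns
  where libname = 'SASHELP' and memname = 'CLASS'
  order by varnum;
quit;

/* Route the listing and the summary statistics to one HTML file.  */
ods html file='dictionary.html';   /* hypothetical output file name */

title 'Variable attributes';
proc print data=work.varinfo noobs;
run;

title 'Numeric variables';
proc means data=sashelp.class n nmiss mean std min max;
run;

title 'Character variables';
proc freq data=sashelp.class;
  tables _character_ / missing;
run;

ods html close;
title;
```

Swapping the ODS HTML statements for another ODS destination changes the
file format without touching the reporting steps.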
There are lots of fun challenges, for example, assuming you want to give
FREQs on character variables, what do you want to do when a character
variable has 200 different levels? What's a good way to summarize a SAS
date variable (probably don't want to summarize BirthDate as average number
of days since 1/1/1960)?
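
One possible way to handle both cases, sketched under assumptions: the cutoff
of 10 levels is arbitrary, Name in SASHELP.CLASS stands in for a
high-cardinality character variable, and WORK.DEMO is a made-up dataset with
a date variable.

```sas
/* Count the distinct levels before deciding how to report them.   */
proc sql noprint;
  select count(distinct name) into :nlevels
  from sashelp.class;
quit;
%let nlevels = &nlevels;   /* strips the leading blanks from INTO   */
%let maxlevels = 10;       /* hypothetical cutoff, tune per project */

/* Save the frequencies, then print only the most common levels.   */
proc freq data=sashelp.class noprint;
  tables name / missing out=work.namefreq;
run;

proc sort data=work.namefreq;
  by descending count;
run;

title "Name: &nlevels levels, &maxlevels most frequent shown";
proc print data=work.namefreq(obs=&maxlevels) noobs;
run;

/* Made-up data with a SAS date variable.                          */
data work.demo;
  input birthdate date9.;
  format birthdate date9.;
  datalines;
15MAR1961
02JUL1975
30NOV1975
;
run;

/* Report the range as dates rather than as day counts.            */
title 'BirthDate: range';
proc sql;
  select min(birthdate) format=date9. as earliest,
         max(birthdate) format=date9. as latest,
         nmiss(birthdate)             as n_missing
  from work.demo;
quit;

/* PROC FREQ groups by formatted value, so YEAR4. gives a          */
/* distribution by year.                                           */
title 'BirthDate: distribution by year';
proc freq data=work.demo;
  tables birthdate / missing;
  format birthdate year4.;
run;
title;
```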
Were I to start working on a %MakeDictionary macro, I think I would approach
it as a structured-programming exercise. There are lots of little tasks,
and each one would get its own little macro (%FindVarType, %DescribeNumVar,
%DescribeCharVar, %DescribeDateVar... ) [My thanks to Ed Heaton for
introducing me to such organization.] Without that, I'd fear the entire
macro would get out of hand.
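
To make that organization concrete, here is a skeleton of how the pieces
might fit together. The macro names are the ones above, but the bodies and
the DATA= and VAR= parameters are only my guess at a starting point, not a
finished design. %DescribeDateVar is left out; it could be added by checking
the variable's format.

```sas
/* Returns N or C for one variable, using the data set functions.  */
%macro FindVarType(data=, var=);
  %local dsid varnum type rc;
  %let dsid = %sysfunc(open(&data));
  %if &dsid %then %do;
    %let varnum = %sysfunc(varnum(&dsid, &var));
    %if &varnum > 0 %then %let type = %sysfunc(vartype(&dsid, &varnum));
    %let rc = %sysfunc(close(&dsid));
  %end;
  &type
%mend FindVarType;

/* One small macro per kind of description.                        */
%macro DescribeNumVar(data=, var=);
  title "&var (numeric)";
  proc means data=&data n nmiss mean std min max;
    var &var;
  run;
%mend DescribeNumVar;

%macro DescribeCharVar(data=, var=);
  title "&var (character)";
  proc freq data=&data;
    tables &var / missing;
  run;
%mend DescribeCharVar;

/* Driver: collect the variable names, then dispatch on type.      */
%macro MakeDictionary(data=);
  %local dsid nvars i var varlist rc;
  %let dsid = %sysfunc(open(&data));
  %if &dsid = 0 %then %put ERROR: could not open &data..;
  %else %do;
    %let nvars = %sysfunc(attrn(&dsid, nvars));
    %do i = 1 %to &nvars;
      %let varlist = &varlist %sysfunc(varname(&dsid, &i));
    %end;
    %let rc = %sysfunc(close(&dsid));

    %do i = 1 %to &nvars;
      %let var = %scan(&varlist, &i, %str( ));
      %if %FindVarType(data=&data, var=&var) = N %then %do;
        %DescribeNumVar(data=&data, var=&var)
      %end;
      %else %do;
        %DescribeCharVar(data=&data, var=&var)
      %end;
    %end;
    title;
  %end;
%mend MakeDictionary;

%MakeDictionary(data=sashelp.class)
```

Each helper can be tested on its own before the driver is written, which is
most of the appeal of splitting things up this way.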
I hope this hasn't been so general as to be unhelpful. I'll look forward to
seeing other people's thoughts, and perhaps references to existing macros?
Kind Regards,
--Quentin