Date: Tue, 13 May 2003 16:39:06 -0400
Reply-To: Richard Ristow <firstname.lastname@example.org>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <email@example.com>
Subject: Re: Altering data files while importing them
Content-Type: text/plain; charset="us-ascii"; format=flowed
At 09:39 AM 5/13/2003 +0200, L Johansson wrote:
>I wonder if it is possible to write a script that will take care of
>the entire process of opening a new data file and formatting it. I.e.,
>what I'd like to achieve is for SPSS to 'automatically' give variables
>with certain names a pre-defined type and length. For instance, the
>variable 'Sex' should always be imported as string accepting six
>characters.
>However, the data files in tab separated ASCII format that I have to
>import are far from identical. The variables they contain vary from
>case to case [i.e., it seems, from file to file, not "case to case" in
>the SPSS sense]. [I want a script to] define and alter variables that
>exist in each individual file and ignore instructions that concern
>variables that are non-existent.
>I hope what I've written makes sense.
I've done things like this myself, but for import from ACCESS rather
than from tab-delimited files.
What you're trying to alter is the *DICTIONARY* information in the data
-- the properties of the variables, like length and format -- rather
than the DATA. (For the latter, you'd add transformation commands after
the import, as I'm sure you know.)
I'm not a "scripter" myself, so I'm not sure what tools scripting
offers; but there are other ways. In general, you
a.) Read the "dictionary" information you have into an SPSS file, or
some other database system. In this file, each variable in your data
becomes an SPSS 'case', and the attributes -- name, variable label,
format, etc. -- become SPSS variables.
In your case, it sounds like the variable names are in the files, maybe
as the first rows, and can be read in.
b.) Compute any attributes you can, that you don't have from the file.
In your case, for example, if the variable is named "SEX", you can
assign it format A6.
c.) From this file, write SPSS code to declare the variables. In
general, each variable should have
. A NUMERIC or STRING statement, giving its type and format; this also
defines the length of string variables
. A VARIABLE LABEL statement, if meaningful
. Possibly, VARIABLE WIDTH and VARIABLE ALIGNMENT statements.
For example, my own code writes a lot of variable definitions like this:
VARIABLE LABELS COUNTY "FN - COUNTY CODE".
- FORMATS COUNTY (A2).
- VARIABLE WIDTH COUNTY (5).
- VARIABLE ALIGNMENT COUNTY (RIGHT).
(Note that there's NOT a "STRING" statement. In my case, the data type
is set by the ODBC code that reads the variables from ACCESS.)
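Steps (b) and (c) can be sketched outside SPSS as well. The following is a minimal illustration in Python, not SPSS syntax and not the code I actually use: it takes the header row of a tab-delimited file, looks each name up in a rule table, and writes declarations in the same shape as the COUNTY example above. The rule table `RULES`, the function name `declarations`, and the label text for SEX are all hypothetical. Rules for variables that are not in the file are simply never reached, which is the "ignore instructions for non-existent variables" behaviour asked for.

```python
# Hypothetical rule table: one entry per variable name we know how to handle.
# COUNTY mirrors the example above; the SEX label is invented for illustration.
RULES = {
    "SEX": {"format": "A6", "label": "Sex of respondent"},
    "COUNTY": {"format": "A2", "label": "FN - COUNTY CODE",
               "width": 5, "align": "RIGHT"},
}


def declarations(header_line, rules=RULES):
    """Return SPSS syntax lines for header variables that have a rule."""
    names = [n.strip().upper() for n in header_line.rstrip("\r\n").split("\t")]
    lines = []
    for name in names:
        rule = rules.get(name)
        if rule is None:
            continue  # no rule for this variable: leave it as imported
        if "label" in rule:
            lines.append(f'VARIABLE LABELS {name} "{rule["label"]}".')
        if "format" in rule:
            lines.append(f'FORMATS {name} ({rule["format"]}).')
        if "width" in rule:
            lines.append(f'VARIABLE WIDTH {name} ({rule["width"]}).')
        if "align" in rule:
            lines.append(f'VARIABLE ALIGNMENT {name} ({rule["align"]}).')
    return lines


if __name__ == "__main__":
    # The file has COUNTY but not SEX, so only COUNTY is declared.
    print("\n".join(declarations("ID\tCounty\tAge\n")))
```

The printed lines would be saved to a syntax file and run after the import. Names are upper-cased before the lookup so that 'Sex' and 'SEX' match the same rule.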
It's not child's play, but you can write SPSS syntax that will read the
dictionary, write the SPSS code for it, and write the DATA LIST command
to read the data, and run that against each file you have. I'm not sure
there's anything in scripting that would make it easier.
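For the DATA LIST part, here is a rough sketch of the same idea, again in Python rather than SPSS syntax. It assumes the first row of each file holds the variable names, that unknown variables can be read with a default numeric format, and that your SPSS version accepts the LIST (TAB) and SKIP keywords on DATA LIST; check the Syntax Reference before relying on it. The names `KNOWN_FORMATS`, `DEFAULT_FORMAT` and `data_list_command`, and the file name in the example, are hypothetical.

```python
# Hypothetical lookup of pre-defined formats by variable name.
KNOWN_FORMATS = {"SEX": "A6", "COUNTY": "A2"}
DEFAULT_FORMAT = "F8.2"  # assumption: anything unlisted is read as numeric


def data_list_command(header_line, path,
                      formats=KNOWN_FORMATS, default=DEFAULT_FORMAT):
    """Build a DATA LIST command from a tab-delimited header row."""
    names = [n.strip().upper() for n in header_line.rstrip("\r\n").split("\t")]
    specs = [f"{name} ({formats.get(name, default)})" for name in names]
    # SKIP=1 is meant to skip the header row when the data are read.
    return (f"DATA LIST FILE='{path}' LIST (TAB) SKIP=1\n  / "
            + "\n    ".join(specs) + ".")


if __name__ == "__main__":
    print(data_list_command("Sex\tAge\tCounty\n", "survey01.txt"))
```

In practice the header line would come from reading the first line of each data file, and the resulting command would be written to a syntax file and run against that file. Because the format is given in DATA LIST itself, string variables such as SEX get their length at read time and no separate STRING statement is needed.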