Date: Sun, 3 Jul 2005 22:49:22 -0400
Reply-To: Richard Ristow <email@example.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <firstname.lastname@example.org>
Subject: Re: SPSS performance issue
At 09:32 AM 7/3/2005, Timothy Hennigar wrote:
>I used to make an effort to be parsimonious with my [SPSS] code:
>naming many variables with one variable label statement, adding all my
>labels with one value label statement, doing one recode over all my
>variables again with one recode command, creating variables using
>vector, or in do repeat loops, etc etc ...
>[I have been] developing a program to generate my code for me. [It]
>generates several files of code - it creates the files first, adds
>cleaning code, checking code, variable creating code, labelling code,
>ctables construction, etc and ultimately saves a final data file.
>But I find it vastly simpler in writing this program to have the
>computer generate code - let's say - on a variable by variable
>approach. My thinking has been that I am not so concerned about how
>pretty my files look [...] Each variable gets its OWN unique variable
>label statement, value label statement, etc ...
"Pretty" can be argued both ways. I use SPSS code-generating code
myself, and have approached it your way, for about the same reasons. A
snippet of the generated code looks like this:
STRING ACC_TBL (A8).
- COMPUTE ACC_TBL = SUBSTR(LTRIM(ABRV_TBL),1,8).
- VARIABLE LABELS ACC_TBL 'Access table that data comes from'.
- VARIABLE WIDTH ACC_TBL (6).
- VARIABLE ALIGNMENT ACC_TBL (LEFT).
VARIABLE LABELS REGION "REGION code".
- FORMATS REGION (A2).
- VARIABLE WIDTH REGION (5).
- VARIABLE ALIGNMENT REGION (LEFT).
VARIABLE LABELS SUB_CODE "SUB_CODE, within REGION".
- FORMATS SUB_CODE (A1).
- VARIABLE WIDTH SUB_CODE (5).
- VARIABLE ALIGNMENT SUB_CODE (LEFT).
VARIABLE LABELS CLIENT# "Client number, within SUB_CODE".
- FORMATS CLIENT# (F6).
- VARIABLE WIDTH CLIENT# (5).
- VARIABLE ALIGNMENT CLIENT# (RIGHT).
This is all variable-definition syntax. It would be much more compact
written the way you used to. It might or might not be more readable;
having each variable defined completely in one place helps readability.
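For comparison, here is a sketch of the same definitions written the old, compact way, with one command per attribute covering all four variables. It assumes the variables exist just as in the snippet above; only the grouping of the commands has changed.

```spss
* Same definitions as above, grouped by attribute instead of by variable.
STRING ACC_TBL (A8).
COMPUTE ACC_TBL = SUBSTR(LTRIM(ABRV_TBL),1,8).
VARIABLE LABELS
   ACC_TBL  'Access table that data comes from'
  /REGION   "REGION code"
  /SUB_CODE "SUB_CODE, within REGION"
  /CLIENT#  "Client number, within SUB_CODE".
FORMATS REGION (A2) SUB_CODE (A1) CLIENT# (F6).
VARIABLE WIDTH ACC_TBL (6) REGION SUB_CODE CLIENT# (5).
VARIABLE ALIGNMENT ACC_TBL REGION SUB_CODE (LEFT) CLIENT# (RIGHT).
```

Shorter, but to see everything about CLIENT# you now have to read five commands.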
>Will performance - running time - suffer appreciably because of the
>way I now have my code - will performance suffer more the larger the
>file (or data file) size, etc ... if so, how much of a penalty are we
There's a theoretical performance penalty: the parser has much more
syntax to process. But I doubt you'll ever notice it; the parser is
fast. And that cost won't depend on the number of cases in the file,
since the syntax is parsed once, however many cases there are.
>AND I have quite a few more 'execute' statements along the way ...
Now, THAT's another matter. Each EXECUTE forces a complete pass through
the data, so it costs a fairly large amount, roughly proportional to the
file size (the number of variables times the number of cases).
For really small files it won't matter, especially if they're small
enough to cache in RAM. For big files, it can matter a lot, no matter
what machine you have.
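Usually, careful design can eliminate most EXECUTEs: transformations
are held until a command that reads the data (a procedure such as
CTABLES, or SAVE) runs them anyway. For your case, we'd have to know a
lot more about your logic, and particularly why the EXECUTEs got there
in the first place.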
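To illustrate, here is a minimal sketch using hypothetical variables (SCORE1, SCORE2, SCORE3, TOTAL, HIGH), not anything from your job. Version A reads the data three times (once per EXECUTE, once for FREQUENCIES); version B reads it once, because the pending COMPUTEs run during the pass FREQUENCIES makes anyway. Run one version or the other, not both.

```spss
* Version A: each EXECUTE forces its own pass through the data.
COMPUTE TOTAL = SUM(SCORE1, SCORE2, SCORE3).
EXECUTE.
COMPUTE HIGH = (TOTAL > 10).
EXECUTE.
FREQUENCIES VARIABLES=HIGH.

* Version B: same results, one data pass.
* Both COMPUTEs are held and run as FREQUENCIES reads the cases.
COMPUTE TOTAL = SUM(SCORE1, SCORE2, SCORE3).
COMPUTE HIGH = (TOTAL > 10).
FREQUENCIES VARIABLES=HIGH.
```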