Date: Thu, 10 Jul 2003 11:33:47 -0400
Reply-To: Mike Rhoads <RHOADSM1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mike Rhoads <RHOADSM1@WESTAT.COM>
Subject: Re: SAS not recognizing spaces in large variable
Debbie,
In quick answer to your "space loss" problem -- the $ informat is
deliberately designed to discard leading spaces. Since you want to keep
them, use the $CHAR informat instead, which will retain the leading spaces:
INPUT ... @2887 bigvar $CHAR4630. ... ;
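If you want to convince yourself, here is a minimal sketch using made-up data (the 10-column width and the variable names trimmed and kept are just for illustration, not part of your layout). It reads the same columns twice, once with each informat, and uses VERIFY to report where the first non-blank character lands:

```sas
/* Toy example: hypothetical 10-column field, not the real file layout. */
data _null_;
   infile datalines truncover;
   input @1 trimmed $10.        /* $w. informat: leading blanks dropped */
         @1 kept    $char10.;   /* $CHARw. informat: leading blanks kept */
   /* VERIFY returns the position of the first character that is not a blank */
   pos_trimmed = verify(trimmed, ' ');
   pos_kept    = verify(kept, ' ');
   put pos_trimmed= pos_kept=;
   datalines;
   A  B
;
```

The two positions written to the log should differ, since only the $CHAR version still has its leading blanks.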
On output the distinction matters much less -- as far as I know, the $w. and
$CHARw. formats are documented as identical, and neither trims leading blanks
in formatted output. Using $CHAR4630. rather than $4630. in your PUT statement
does no harm, though, and keeps the INPUT and PUT sides consistent.
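A rough round-trip sketch, again with toy values: the TEMP fileref, the 20-column width, and column 11 are my assumptions standing in for your real file and the @2887 position. I have used LRECL= on both the FILE and INFILE statements so that long records have room; adjust to match your setup.

```sas
/* Hypothetical scratch file and toy widths; substitute the real layout. */
filename demo temp;

/* Write a value with leading blanks at a fixed column position. */
data _null_;
   file demo lrecl=8000;
   length bigvar $ 20;
   bigvar = '   A  B';
   put @1 'X' @11 bigvar $char20.;
run;

/* Read it back with $CHARw. so the leading blanks survive. */
data check;
   infile demo lrecl=8000 truncover;
   input @11 bigvar $char20.;
   first_nonblank = verify(bigvar, ' ');  /* position of first non-blank */
   put first_nonblank=;
run;
```

If the blanks made it through both steps, first_nonblank in the log will be greater than 1.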
HTH!
Mike Rhoads
Westat
RhoadsM1@Westat.com
-----Original Message-----
From: Debbie Cooper [mailto:debbie.cooper@STEVENSONCOMPANY.COM]
Sent: Thursday, July 10, 2003 11:20 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: SAS not recognizing spaces in large variable
I read data from File A with column input. It contains (in addition to
others) fields id and product number. It also has multiple records per id.
Later on I receive File B. It has a field ID (as well as others) and has
one record per ID. I merge the two files to get File C, which has multiple
records per ID: each record has a different product number, but the
information that came from File B is duplicated on every record. Now I need
to take the new File C and split it out into 200 SAS datasets based on
product number. I have a couple of problems going into
this. The first is that File B contains about 500 fields that will be
common to all 200 datasets. So I need these fields in the 200 datasets.
This is so we don't have to change our existing reporting structure. The
second is that in the middle of File B is a block of variables about 4000
characters in length that are specific features to a product. So for
example, product 1 will always find its features in columns 2887-2955,
product 2 will have its features in columns 2956-3018 and so on. Previously
we received this data in one file with multiple records per respondent and
split it out using a Perl script. The problem was that we then read the
split files into SAS input programs and we had to keep cross-referencing
column positions. So if feature 1 for product 1 was on the file layout
given to me as being in column 2887, I had to know that in my product
specific SAS input program, I read in the first 500 variables that are
common to all products and feature 1 should start in, for example, column
501. This was a real mess anytime features were added. I'd like to keep
the same column positions that are in the file layout. So what I've done is
this (in a simplistic example) for FILE B.
FILENAME FILEB ("c:\a-detrep\smallinputfile.txt");
data detail;
   infile FILEB LRECL=8000 PAD ;
   INPUT
      @2    id        7.
      @37   firstvar  1.
      @38   secondvar 4.
      .
      .
      .
      @2886 xvar      1.
      @2887 bigvar    $4630.
      @7260 nextvar   1.
      .
      .
      .
I then merge this file with FILE A to get FILE C and then do this:
FILENAME PRODFILE "c:\data\test.txt" ;
data _null_;
   file PRODFILE ls = 8000;
   set detailaftermerge; /* File C */
   put @1 id 7. @8 productnum 3. @2887 bigvar $4360.;
run;
The idea is that I would then use INPUT statements in each product
program to pull from test.txt, and then merge the resulting dataset with
detail to get the common fields, dropping bigvar. I know this seems like
a really inefficient approach, but there are two reasons for it. One is
that the SAS input programs for the 200 products are already written, since
this was done years ago. The thought is that all I would have to do is
strip out the common fields in each input program and then put the correct
column positions in. The second is that we don't want to do any more
cross-referencing of column positions. So if feature 1 of product 1 is on the
layout as being in column 2887, we want to keep it at column position 2887.
My basic problem is that bigvar loses spaces when I read it in. The SAS
input is not recognizing spaces within that variable and "shifts" values
within the variable. I'm open to any suggestions as to how to do this more
efficiently.
Thanks,
Debbie