LISTSERV at the University of Georgia
Date:         Thu, 4 Dec 2003 11:28:31 -0500
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: Debugging SAS with many variables (newbie)
Comments: To: Carl Kyonka <Carl.Kyonka@ENBRIDGE.COM>
Content-Type: text/plain

Carl: Been there, done that. What a pain! Two comments:

First, more than thirty variables per dataset (table) virtually guarantees bad database design. You may want to start restructuring/reshaping your data into linked datasets. The SAS INPUT statement supports restructuring: column pointers (@n) let each DATA step read only the fields it needs, directly from their positions in the file layout. For example,

   INFILE DS1;
   INPUT @1 key1 $char10. @44 key2 $char7. @296 date mmddyy10. @306 value 8.;

   INFILE DS2;
   INPUT @44 key2 $char7. @222 outcome 8.;

If you separate your data into sets of directly related variables, you'll find it easier to find data type errors (the obvious problems). You will also find less obvious problems in data integrity. Putting sufficient keys in each dataset makes combining data a SAS MERGE or JOIN problem that SAS handles very precisely and efficiently.
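
A minimal sketch of the MERGE step described above, not taken from Carl's actual file. The dataset names part1, part2, and combined and the in-line sample records are hypothetical; key2, value, and outcome follow the INPUT example. The IN= flags also surface one kind of integrity problem: keys present in one dataset but missing from the other.

```sas
/* Hypothetical sample data standing in for the two linked datasets */
data part1;
  input @1 key2 $char7. @9 value 8.;
  datalines;
K000001 12
K000002 34
;
run;

data part2;
  input @1 key2 $char7. @9 outcome 8.;
  datalines;
K000002 1
K000003 0
;
run;

/* A DATA step MERGE with BY requires both inputs sorted by the key */
proc sort data=part1; by key2; run;
proc sort data=part2; by key2; run;

data combined;
  merge part1 (in=in1) part2 (in=in2);
  by key2;
  /* Write unmatched keys to the log before dropping them */
  if in1 and not in2 then put 'No outcome record for ' key2=;
  if in2 and not in1 then put 'No value record for ' key2=;
  if in1 and in2;   /* keep only keys found in both datasets */
run;
```

The same match could be written as a PROC SQL inner join on key2; the DATA step form is shown only because it makes the unmatched keys easy to report.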

Repeating groups of variables and comments tend to multiply the number of variables in a dataset and make data difficult to normalize. Try writing the INPUT specifications for the first group with a suffix of 1 or 0 in the variable names, test that, and then copy the specifications for the first group and change the suffix to 2, 3, ... A SAS macroprogram will generate specifications for repeating groups automatically, but may prove to be too complicated for infrequent use. Use PROC TRANSPOSE to normalize repeating groups of variables.
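
As a rough illustration of the suffix approach and the PROC TRANSPOSE step, the sketch below reads three repeating fields and turns them into one row per value. The variable names (id, score1-score3), the column positions, and the sample records are all assumptions made up for this example.

```sas
/* Repeating group read with numbered suffixes (hypothetical layout) */
data wide;
  input @1 id $char4. @6 score1 3. @10 score2 3. @14 score3 3.;
  datalines;
A001  10  20  30
B002  40  50  60
;
run;

/* BY processing expects the data sorted by id */
proc sort data=wide; by id; run;

/* One output row per id and repeating variable:             */
/* source holds the original variable name, score its value. */
proc transpose data=wide
               out=long (rename=(col1=score))
               name=source;
  by id;
  var score1-score3;
run;
```

Once the group is in long form, the suffix becomes data (the source column) rather than part of a variable name, which is what makes the table easier to normalize.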

Second, when I receive a really large and messy system file, I use DBMSCOPY to view the data and specify field positions, lengths, types, etc. SAS now offers DBMSCOPY as a separate product. If you have to capture data from input files, DBMSCOPY helps speed up the process.

Sig

-----Original Message-----
From: Carl Kyonka [mailto:Carl.Kyonka@ENBRIDGE.COM]
Sent: Thursday, December 04, 2003 10:39 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Debugging SAS with many variables (newbie)

I have a file with mixed numeric and alphabetic variables. I am slogging my way through writing an INPUT statement to read these records. I get errors with a somewhat helpful dump of the input record and the variables as they stand when an error occurred. My difficulty is that there are lots of variables and finding the ones I want is awkward. I can use the find command to locate each variable in the sequence they are listed in the INPUT statement, but that is time-consuming. This may be well documented, but I do not know what to call this record dump so I have not found it in on-line doc.

What I would like is an option to sort the variables being dumped either alphabetically or in the INPUT sequence. Or a tactic to help debugging the INPUT statement.

Thanks for the time,
Carl Kyonka
