Date: Fri, 22 Nov 2002 10:10:58 -0500
Reply-To: Larry Bertolini <bertolini.1@OSU.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Larry Bertolini <bertolini.1@OSU.EDU>
Organization: Ohio State University
Subject: Re: Parsing a comma delimited column
Content-Type: text/plain; charset=us-ascii
One caveat to keep in mind, when working with the SCAN function...
If you don't explicitly specify a length for character variables, SAS
assigns a length, based on its "best guess". For example, in the
following data step, variable "foo" is assigned a length of 6 bytes:
foo = substr("abcdefghijklmnop",1,6);
In my experience, variables created by the SCAN function tend
to be assigned a length of 200 bytes.
So you might want to add:
length varA $6 varB $4 varC $2;
> You could use the scan function to seperate the variables. Scan returns
> character variables so I included some lines to convert varB to numeric
> because you refered to '_' as being missing value.
> Hope this gets you started.
> 1 data test;
> 2 length var1 $ 16;
> 3 var1 = "abcdab,1984,ca";
> 4 output;
> 5 var1 = "abcdab,_,ca";
> 6 output;
> 7 run;
> 9 data new(drop=var1);
> 10 set test;
> 11 varA = scan(var1,1,',');
> 12 varB = scan(var1,2,',');
> 13 varC = scan(var1,3,',');
> 14 if varB = '_'
> 15 then varBn = .;
> 16 else varBn = input(varB,$4.);
> 17 run;
> 19 proc print data=new;
> 20 run;
> Jellinghaus To: SAS-L@LISTSERV.UGA.EDU
> <michelle@EMODE.C cc:
> OM> Subject: Parsing a comma delimited column
> Sent by: "SAS(r)
> 22/11/2002 09:44
> Please respond to
> Hi folks,
> I have a datafile, pulled from an oracle database with an SQL query, that
> has a few columns which contain comma delimited value strings of varying
> length. I would like to read the values within these columns into SAS as a
> series of individual variables/columns by splitting the string on the
> Here's a sample:
> VAR1 is one of many variable of its kind within a single data set.
> VAR1 has values like "abcdab,1984,ca"
> The objective is to parse this string (housed within the existing VAR1
> column) into 3 separate variables:
> varA: "abcdab"
> varB: "1984"
> varC: "ca"
> If a value is missing, it is represented by a single underscore. For
> if VAR1 = "abcdab,_,ca", then
> varA = "abcdab"
> varB = "_"
> varC = "ca"
> thereby shortening the string from a max possible of 14 characters to
> characters due to a missing value (varB) which could be up to 4 characters
> in length.
> Does anyone have any suggestions?
> Thank you,