Date: Fri, 8 Dec 2006 09:16:54 -0500
Reply-To: "data _null_;" <datanull@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "data _null_;" <datanull@GMAIL.COM>
Subject: Re: finding a string
In-Reply-To: <1165574706.266949.219190@16g2000cwy.googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
While I like the ideas already proposed to fish for the date, SAS does
provide ways to actually locate strings of this type. I used the old
RX functions because I'm using V8.2 and have the documentation handy.
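
data work.parts;
   infile cards eof=eof;
   /* Pattern: 1-2 digits, '-' or '/', 1-2 digits, '-' or '/', then 2 or 4 digits */
   rx = rxparse("$d[$d] $'-/' $d[$d] $'-/' $d$d[$d$d]");
   do while(1);
      input CODE:$4. DESC&$40.;
      s=0; l=0;
      /* s = start position of the match (0 if none), l = its length */
      call rxsubstr(rx,desc,s,l);
      /* Reset DATE when nothing matches; inside this explicit loop the
         value from the previous record would otherwise carry over */
      if s then date = input(substr(desc,s,l),ddmmyy10.);
      else date = .;
      output;
   end;
   return;
eof:
   /* Reached at end of input: release the compiled pattern and end the step */
   call rxfree(rx);
   stop;
   format date ddmmyy10.;
cards;
A100 AAAAAA0 PRODUCTION - STOP 4/06/2006
A101 AAAAAA1 NO LONGER PRODUCED 04/7/06
A102 AAAAAA2 STOP - 4-7-06
A103 AAAAAA3 04-7-06 PRODUCTION STOP
;;;;
run;

proc print;
run;
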
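On SAS 9 or later, the same search could presumably be written with the
Perl-regular-expression (PRX) functions instead of the RX ones. What
follows is only a minimal sketch under that assumption: PRXPARSE and
CALL PRXSUBSTR are not available in V8.2, the regular expression is my
translation of the RX pattern above, and the dataset name
work.parts_prx is a placeholder.

```sas
data work.parts_prx;
   /* Compile the pattern once and keep its id across iterations */
   retain rx;
   if _n_ = 1 then
      rx = prxparse('/\d{1,2}[-\/]\d{1,2}[-\/]\d{2}(\d{2})?/');
   input CODE :$4. DESC &$40.;
   /* s = start of the first match (0 if none), l = match length */
   call prxsubstr(rx, desc, s, l);
   /* Day-first reading is an assumption carried over from ddmmyy10. above */
   if s then date = input(substr(desc, s, l), ddmmyy10.);
   format date ddmmyy10.;
   drop rx s l;
cards;
A100 AAAAAA0 PRODUCTION - STOP 4/06/2006
A101 AAAAAA1 NO LONGER PRODUCED 04/7/06
A102 AAAAAA2 STOP - 4-7-06
A103 AAAAAA3 04-7-06 PRODUCTION STOP
;;;;
run;
```

Because this version lets the data step iterate normally, DATE is reset
to missing for each record, so no explicit loop or EOF label is needed.
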
On 12/8/06, alves <alves.paulo@gmail.com> wrote:
> Hi everyone.
>
> Quick question.
>
> I have a dataset of approx. 10.000.000 observations that has 2
> variables, CODE and DESCRIPTION. I do not know why, but the person who
> created this dataset, instead of creating a new variable for the extra
> information, just added it to the DESCRIPTION. I have a list of the
> most common strings that were added and I managed to filter them out. My
> problem is with dates. An example:
>
> CODE DESC
> A100 AAAAAA0 PRODUCTION - STOP 4/06/2006
> A101 AAAAAA1 NO LONGER PRODUCED 04/7/06
> A102 AAAAAA2 STOP - 4-7-06
> A103 AAAAAA3 04-7-06 PRODUCTION STOP
>
> This is a simplified version, AAAAAA can be any string (a product
> name).
>
> The date can basically appear in any format and anywhere!! After
> anything... The only reference point I have is the "/" or "-" ...
>
> So after looking at this mess, I want to do two things. 1) Clean it; 2)
> Kill the person who did it, after a long torture!!!!
>
> Thanks in advance.
>