Date: Mon, 19 Jan 2009 10:58:27 -0600
Reply-To: Joe Matise <snoopy369@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Joe Matise <snoopy369@GMAIL.COM>
Subject: Re: Trim data by x characters
In-Reply-To: <c2192a610901190832r7a960017g2927711c9554f157@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Indeed, the descriptions of the code values would be immensely helpful in
designing a solution.
For the particular example listed below, I'd use PRXCHANGE. Let's see if
I can construct the regular expression for this:
newvar2=prxchange("s/(code\d+)(\w+)/$2/",1,var2);
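would match any value containing "code" followed by one or more digits.
(Strictly, the pattern isn't anchored, so it matches "code" anywhere in the
value; put a ^ in front, as in s/^(code\d+)(\w+)/$2/, if it must be at the
start.) If plain "code" with no digits also needs to match, replace the + with *: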
newvar2=prxchange("s/(code\d*)(\w+)/$2/",1,var2);
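A minimal sketch of the anchored form, assuming the prefix must sit at the
very start of VAR2 (the data set name ANCHORED and the value barcode7label
are made up for illustration). With ^ the second value comes back unchanged;
the unanchored pattern would instead strip the embedded "code7" and return
barlabel.

```sas
/* Sketch: ^ anchors the match to the start of VAR2, so "code"     */
/* appearing later in a value is left alone. Data are hypothetical. */
data anchored;
  length var2 newvar2 $50;
  input var2 $;
  /* group 1 = the description after "code" + optional digits */
  newvar2 = prxchange("s/^code\d*(\w+)/$1/", 1, var2);
  datalines;
code12thedescription
barcode7label
;;;;
run;
proc print data=anchored; run;
```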
So this would handle all three example values:
data test;
format var1 $2.;
format var2 newvar2 $50.;
input var1 $ var2 $;
newvar2=prxchange("s/(code\d*)(\w+)/$2/",1,var2);
datalines;
AA code12thedescription
BB code4thedescription
CC codethedescriptiondone
;;;;
run;
proc print; run;
Any other definition of the code would, of course, require replacing the
first parenthesized group with a different expression.
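If you'd rather keep the code as well as the description, the two groups can
be read out separately with PRXPARSE, PRXMATCH and PRXPOSN. This is only a
sketch, assuming the same "code plus optional digits" definition; RE, CODE and
DESC are hypothetical names, and CODE is sized at 8 on the assumption that the
prefix is never longer than 8 characters.

```sas
/* Sketch: split VAR2 into the prefix (group 1) and description (group 2). */
/* RE, CODE and DESC are hypothetical names.                               */
data split;
  length var1 $2 var2 $50 code $8 desc $50;
  retain re;
  if _n_ = 1 then re = prxparse("/^(code\d*)(\w+)/");  /* compile once */
  input var1 $ var2 $;
  if prxmatch(re, var2) then do;
    code = prxposn(re, 1, var2);   /* text matched by group 1 */
    desc = prxposn(re, 2, var2);   /* text matched by group 2 */
  end;
  drop re;
  datalines;
AA code12thedescription
BB code4thedescription
CC codethedescriptiondone
;;;;
run;
proc print data=split; run;
```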
If there is a delimiter, it's easier:
data test;
format var1 $2.;
format var2 newvar2 $50.;
input var1 $ var2 $;
newvar2=prxchange("s/(\w+)_(\w+)/$2/",1,var2);
datalines;
AA code12_thedescription
BB code4_thedescription
CC code_thedescriptiondone
;;;;
run;
proc print; run;
where _ is the delimiter. \w matches a single word character (a letter,
digit, or underscore), so (\w+) could absorb underscores too; the literal _
between the two groups is what forces the split at an underscore. Because
(\w+) is greedy, the split happens at the LAST underscore, so if legitimate
descriptions contain underscores, that will cause a problem. In that case,
use [a-zA-Z0-9] instead of \w for the first group, so that it cannot run
past the first underscore:
newvar2=prxchange("s/([a-zA-Z0-9]+)_(\w+)/$2/",1,var2);
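To see the difference, here is a sketch run on a made-up value whose
description contains an underscore, with a non-regex INDEX/SUBSTR equivalent
alongside for comparison (the data set and variable names are hypothetical).
For the first row, GREEDY should come back as "description", while FIRST_US
and VIA_SUBSTR should both be "the_description".

```sas
/* Sketch: greedy \w+ vs. [a-zA-Z0-9]+ on a description containing "_". */
/* Data set, variable names and values are hypothetical.                */
data compare;
  length var2 greedy first_us via_substr $50;
  input var2 $;
  /* \w also matches _, so group 1 runs up to the LAST underscore */
  greedy   = prxchange("s/(\w+)_(\w+)/$2/", 1, var2);
  /* group 1 cannot contain _, so the split is at the FIRST underscore */
  first_us = prxchange("s/([a-zA-Z0-9]+)_(\w+)/$2/", 1, var2);
  /* non-regex equivalent: everything after the first underscore;  */
  /* values with no underscore are kept unchanged                   */
  if index(var2, '_') > 0 then via_substr = substr(var2, index(var2, '_') + 1);
  else via_substr = var2;
  datalines;
code12_the_description
code4_thedescription
;;;;
run;
proc print data=compare; run;
```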
-Joe
On Mon, Jan 19, 2009 at 10:32 AM, SAS_learner <proccontents@gmail.com> wrote:
> Hello jaheuk,
>
> How would your solution work if there is no delimiter in the search
> string? (Just thinking of the possibilities.) Say your example Var2 values
> are:
>
> Var1 Var2
> AA code12thedescription
> BB cod4thedescription
> CC codethedescriptiondone
>
> I think if the original poster tells us what the description strings look
> like, we can write a better code snippet.
>
> thanks
> SL
>
> On Mon, Jan 19, 2009 at 4:56 AM, jaheuk <hejacobs@gmail.com> wrote:
>
> > DATA STEP solutions:
> > ---------------------------------
> >
> > DATA one;
> > input var1 $ var2 $ 30. ;
> > cards;
> > AA
> > ;
> > RUN;
> >
> > DATA two;
> > SET one;
> > var3 = substr(var2,1,index(var2,'_')-1);
> > RUN;
> > *******************************************************;
> >
> > DATA one;
> > input var1 $ var2 $ 30. ;
> > cards;
> > AA 12_thedescription
> > BB 456987_thedescription
> > ;
> > RUN;
> >
> > DATA two;
> > SET one;
> > var3 = input(substr(var2,1,index(var2,'_')-1),best.);
> > RUN;
> >
> > Regards,
> > H.
> >
> > On 19 jan, 09:48, mikeymay <mikey...@gmail.com> wrote:
> > > I have a dataset that holds a variable with a prefix code prior to the
> > > description.
> > >
> > > I want to trim the variable by the number of characters the prefix
> > > code has but the prefix code can be 4, 5, 6, 7 or 8 characters long.
> > >
> > > Is there a way of trimming the data using a procedure?
> > >
> > > Thanks
> >
>