LISTSERV at the University of Georgia
Date:         Mon, 19 Jan 2009 10:58:27 -0600
Reply-To:     Joe Matise <snoopy369@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Joe Matise <snoopy369@GMAIL.COM>
Subject:      Re: Trim data by x characters
Comments: cc: mikeymay@gmail.com
In-Reply-To:  <c2192a610901190832r7a960017g2927711c9554f157@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Indeed, the descriptions of the code values would be immensely helpful in designing a solution.

For the particular example listed below, I'd use a PRXCHANGE. Let's see if I can construct the prx for this:

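newvar2=prxchange("s/(code\d+)(\w+)/$2/",1,var2);

would match any value that has "code" followed by one or more digits and then the description, and keep only the description. If it needed to allow just "code" with no digits, you would replace + with * :

newvar2=prxchange("s/(code\d*)(\w+)/$2/",1,var2);

Note that the pattern is not anchored, so put a ^ in front of it if the code must sit at the very start of the value.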

So, this would match it:

data test;
format var1 $2.;
format var2 newvar2 $50.;
input var1 $ var2 $;
newvar2=prxchange("s/(code\d*)(\w+)/$2/",1,var2);
datalines;
AA code12thedescription
BB code4thedescription
CC codethedescriptiondone
;;;;
run;
proc print; run;

Any other definition of code would necessitate replacing the first expression inside the ( ) with something else, of course.
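As a sketch of what that replacement might look like, suppose (purely as an assumption, since the original poster has not described the codes) that the prefix is 4 to 8 digits at the start of the value. The dataset name test_digits and the sample rows are made up for illustration:

```sas
data test_digits;
  length var1 $2 var2 newvar2 $50;
  input var1 $ var2 $;
  /* ASSUMPTION: the prefix code is 4 to 8 digits.            */
  /* ^ anchors the match to the start of the value, and       */
  /* \d{4,8} replaces the (code\d*) part of the earlier regex. */
  newvar2 = prxchange("s/^(\d{4,8})(\w+)/$2/", 1, var2);
datalines;
AA 1234thedescription
BB 12345678thedescription
;
run;

proc print data=test_digits;
run;
```

Because \d{4,8} is greedy, a description that itself begins with a digit would lose that digit to the prefix, so this only works if the descriptions never start with a number.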

If there is a delimiter, it's easier:

data test;
format var1 $2.;
format var2 newvar2 $50.;
input var1 $ var2 $;
newvar2=prxchange("s/(\w+)_(\w+)/$2/",1,var2);
datalines;
AA code12_thedescription
BB code4_thedescription
CC code_thedescriptiondone
;;;;
run;
proc print; run;

where _ is the delimiter. (\w+) means match one or more word characters (letters, digits, and underscore), but we force it to find an underscore separately. Note that because \w+ is greedy and also matches underscores, the split happens at the LAST underscore only. If there are actual underscores in legitimate description values, that will cause a problem, and you'd need to use [a-zA-Z0-9] instead of \w for the first part so that the split happens at the first underscore:

newvar2=prxchange("s/([a-zA-Z0-9]+)_(\w+)/$2/",1,var2);
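To see the difference side by side, here is a small sketch that runs both patterns against a description containing an underscore. The dataset name, the variable names last_split and first_split, and the sample values are only illustrative:

```sas
data test_first_vs_last;
  length var1 $2 var2 last_split first_split $50;
  input var1 $ var2 $;
  /* Greedy \w+ also consumes underscores, so the split       */
  /* lands on the LAST underscore in the value.               */
  last_split  = prxchange("s/(\w+)_(\w+)/$2/", 1, var2);
  /* [a-zA-Z0-9]+ cannot cross an underscore, so the split    */
  /* lands on the FIRST underscore in the value.              */
  first_split = prxchange("s/([a-zA-Z0-9]+)_(\w+)/$2/", 1, var2);
datalines;
AA code12_the_description
BB code4_thedescription
;
run;

proc print data=test_first_vs_last;
run;
```

For the AA row, last_split should come out as "description" while first_split keeps "the_description"; for the BB row, which has only one underscore, both should give "thedescription".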

-Joe

On Mon, Jan 19, 2009 at 10:32 AM, SAS_learner <proccontents@gmail.com> wrote:

> Hello jaheuk,
>
> How would your solution would work if there no delimter in the search
> string
> (Just thinking of the possibilites) , Say like your example Var2 values are
>
> Var1 Var2
> AA code12thedescription
> BB cod4thedescription
> CC codethedescriptiondone
>
> I think If Origonal Poster sends us How the description strings are then we
> can write better code snipnet .
>
> thanks
> SL
>
> On Mon, Jan 19, 2009 at 4:56 AM, jaheuk <hejacobs@gmail.com> wrote:
>
> > DATA STEP solutions:
> > ---------------------------------
> >
> > DATA one;
> > input var1 $ var2 $ 30. ;
> > cards;
> > AA
> > ;
> > RUN;
> >
> > DATA two;
> > SET one;
> > var3 = substr(var2,1,index(var2,'_')-1);
> > RUN;
> > *******************************************************;
> >
> > DATA one;
> > input var1 $ var2 $ 30. ;
> > cards;
> > AA 12_thedescription
> > BB 456987_thedescription
> > ;
> > RUN;
> >
> > DATA two;
> > SET one;
> > var3 = input(substr(var2,1,index(var2,'_')-1),best.);
> > RUN;
> >
> > Regards,
> > H.
> >
> > On 19 jan, 09:48, mikeymay <mikey...@gmail.com> wrote:
> > > I have a dataset that holds a variable with a prefix code prior to the
> > > description.
> > >
> > > I want to trim the variable by the number of characters the prefix
> > > code has but the prefix code can be 4, 5, 6, 7 or 8 characters long.
> > >
> > > Is there a way of trimming the data using a procedure?
> > >
> > > Thanks
> > >

