LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2011, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Sat, 12 Mar 2011 10:19:15 -0500
Reply-To:   Ben Herman <baherman@GMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Ben Herman <baherman@GMAIL.COM>
Subject:   Re: Checking for whitespace

Sterling, This became a problem for me last year when people started copy-pasting items from the web into our main database. Basically, non-breaking spaces (&NBSP.) printer control characters, bell, and other special HTML and extended character set characters pollute the data. Sometimes these show up in my data as '0D'x (carriage return) sometimes as 'FF'x and sometimes ASCII control characters depending on the character and how the input system processes the data.

I now use prxchange to filter the data to just include the printing alpha- numeric character set

%macro strip_nonprinting(var); &var=prxchange("s/[^\x20-\x7E]//',-1, &var.);

/* this macro can be extended to loop over all variables in a dataset, or a subset of the variables -- or the looping can be done in the data step as presented here */

/*argueably this example would be more clear if the macro were just prxchange("s/[^\x20-\x7E]//',-1, &var.); allowing the actual assignment to occur in the datastep so that the reassignment of the variable is not obscured. choose what works for you */ %mend strip_nonprinting;

%let dataset=<Yourdata>; Data &dataset_stripped; Set &dataset; array chars {*} _CHARACTER_;

do _i=1 to dim(chars); drop _i; %strip_nonprinting(chars(_i)); end; run;

/* note I am sending this from my laptop typing from memory, I have not run the above code to verify that it works without modification. */

Cheers, Ben

On Wed, 9 Mar 2011 11:57:59 -0800, Sterling Paramore <gnilrets@GMAIL.COM> wrote:

>Thank you data _null_! It turns out that I had a carriage return at the end >of the variable. > >The strangest and most infuriating thing about this is that the source of >the data was from an EG import of a spreadsheet and things have been working >just fine for the past year. Then we upgraded to EG 4.3 and suddenly, for >just this one spreadsheet, it started putting carriage returns at the end of >the strings! Furthermore, if I recreate the import task in EG 4.3 (rather >than using the one that was originally created in a 4.2 project), the >carriage returns disappear! > >Looks like I get to go back and recreate the import tasks for my 20+ >spreadsheets in case this type of garbage is happening somewhere unnoticed. > (BTW, I would love to abandon the need to use EG import tasks, but I still >can't figure out a good way to read spreadsheets stored on a sharepoint or >windows network with a UNIX SAS server). > >Anyway, thanks for the help. > > >On Wed, Mar 9, 2011 at 11:25 AM, Robin R High <rhigh@unmc.edu> wrote: > >> Sterling, >> >> I've had this situation numerous times when there are extra spaces in the >> variable's value that don't appear in print, so can make the test by >> adding a function to eliminate spaces around the variable name: >> >> data test; >> set have; >> isequal = (STRIP(ClaimLine_MajSvc) = "Phys"); >> put ClaimLine_MajSvc isequal; >> run; >> >> Robin High >> UNMC >> >> >> >> >> >> >> From: >> Sterling Paramore <gnilrets@GMAIL.COM> >> To: >> SAS-L@LISTSERV.UGA.EDU >> Date: >> 03/09/2011 12:59 PM >> Subject: >> Checking for whitespace >> Sent by: >> "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> >> >> >> >> Dear SAS-L, >> >> I'm having some puzzling results occur when doing some character >> comparisons >> and I suspect the problem is with strange whitespace characters showing >> up. >> For example: >> >> data test; >> set have; >> >> isequal = (ClaimLine_MajSvc = "Phys"); >> put ClaimLine_MajSvc isequal; >> >> run; >> >> In log I see... >> >> >> Phys 0 >> Phys 0 >> OP 0 >> Phys 0 >> Phys 0 >> >> >> So I know that ClaimLine_MajSvc is Phys, but the evaluation is showing up >> false! >> >> My question is, what's the best way to find out which whitespace character >> is messing this up? >> >> Thanks, >> Sterling >> >> >> >>


Back to: Top of message | Previous page | Main SAS-L page