|
Sterling,
This became a problem for me last year when people started copy-pasting
items from the web into our main database. Basically, non-breaking spaces
(&NBSP.) printer control characters, bell, and other special HTML and
extended character set characters pollute the data. Sometimes these show
up in my data as '0D'x (carriage return) sometimes as 'FF'x and sometimes
ASCII control characters depending on the character and how the input
system processes the data.
I now use prxchange to filter the data to just include the printing alpha-
numeric character set
%macro strip_nonprinting(var);
&var=prxchange("s/[^\x20-\x7E]//',-1, &var.);
/* this macro can be extended to loop over all variables in a dataset, or
a subset of the variables -- or the looping can be done in the data step
as presented here */
/*argueably this example would be more clear if the macro were just
prxchange("s/[^\x20-\x7E]//',-1, &var.);
allowing the actual assignment to occur in the datastep so that the
reassignment of the variable is not obscured. choose what works for you */
%mend strip_nonprinting;
%let dataset=<Yourdata>;
Data &dataset_stripped; Set &dataset;
array chars {*} _CHARACTER_;
do _i=1 to dim(chars); drop _i;
%strip_nonprinting(chars(_i));
end;
run;
/* note I am sending this from my laptop typing from memory, I have not
run the above code to verify that it works without modification. */
Cheers,
Ben
On Wed, 9 Mar 2011 11:57:59 -0800, Sterling Paramore <gnilrets@GMAIL.COM>
wrote:
>Thank you data _null_! It turns out that I had a carriage return at the
end
>of the variable.
>
>The strangest and most infuriating thing about this is that the source of
>the data was from an EG import of a spreadsheet and things have been
working
>just fine for the past year. Then we upgraded to EG 4.3 and suddenly, for
>just this one spreadsheet, it started putting carriage returns at the end
of
>the strings! Furthermore, if I recreate the import task in EG 4.3 (rather
>than using the one that was originally created in a 4.2 project), the
>carriage returns disappear!
>
>Looks like I get to go back and recreate the import tasks for my 20+
>spreadsheets in case this type of garbage is happening somewhere
unnoticed.
> (BTW, I would love to abandon the need to use EG import tasks, but I
still
>can't figure out a good way to read spreadsheets stored on a sharepoint or
>windows network with a UNIX SAS server).
>
>Anyway, thanks for the help.
>
>
>On Wed, Mar 9, 2011 at 11:25 AM, Robin R High <rhigh@unmc.edu> wrote:
>
>> Sterling,
>>
>> I've had this situation numerous times when there are extra spaces in
the
>> variable's value that don't appear in print, so can make the test by
>> adding a function to eliminate spaces around the variable name:
>>
>> data test;
>> set have;
>> isequal = (STRIP(ClaimLine_MajSvc) = "Phys");
>> put ClaimLine_MajSvc isequal;
>> run;
>>
>> Robin High
>> UNMC
>>
>>
>>
>>
>>
>>
>> From:
>> Sterling Paramore <gnilrets@GMAIL.COM>
>> To:
>> SAS-L@LISTSERV.UGA.EDU
>> Date:
>> 03/09/2011 12:59 PM
>> Subject:
>> Checking for whitespace
>> Sent by:
>> "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
>>
>>
>>
>> Dear SAS-L,
>>
>> I'm having some puzzling results occur when doing some character
>> comparisons
>> and I suspect the problem is with strange whitespace characters showing
>> up.
>> For example:
>>
>> data test;
>> set have;
>>
>> isequal = (ClaimLine_MajSvc = "Phys");
>> put ClaimLine_MajSvc isequal;
>>
>> run;
>>
>> In log I see...
>>
>>
>> Phys 0
>> Phys 0
>> OP 0
>> Phys 0
>> Phys 0
>>
>>
>> So I know that ClaimLine_MajSvc is Phys, but the evaluation is showing
up
>> false!
>>
>> My question is, what's the best way to find out which whitespace
character
>> is messing this up?
>>
>> Thanks,
>> Sterling
>>
>>
>>
>>
|