Date: Fri, 13 Jun 2003 16:07:07 GMT
Reply-To: MyraAO <vze2ptxm@VERIZON.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: MyraAO <vze2ptxm@VERIZON.NET>
Subject: Re: Help me out for getting rid of weird characters
"Art" <artshen@hotmail.com> wrote in message
news:f5738cf8.0306111129.3b41a652@posting.google.com...
> Hi,
> I have a data set with all kinds of variables. What I need to do is
> to check character variables to make sure they contain only English
> characters and commonly used characters, like quotes, underscores,
> etc. All other characters, e.g. German or French characters, should be
> found, and a report should be generated with their counts. I
> know some software can do this job. However, if this can be done in SAS
> in an efficient way, that would be great. Does anyone have any idea?
>
> Thanks
>
> Art
I wrote a macro (see below) at my last job, with a little help from one
of the masters, Paul Dorfman (much MORE than a little help, actually).
It reads in a raw file, counts certain ASCII characters, and replaces
each of them with a space. Preceding the macro is the code used to call
it.
In the statement DO R = 0 TO 31,127, the numbers are the ASCII codes of
the characters that I want replaced with a blank. You can use the codes
for any ASCII characters (see: http://www.asciitable.com/). In your
case, you might add 128-151 to get rid of "foreign" letters; NPSTR is
declared as $33, so its length has to grow to hold the extra codes.
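
As a rough sketch of that change (not part of the original macro), the DO list and the length of NPSTR could be extended together. The 128-151 range is simply the one suggested above; which letters those codes stand for depends on the session encoding, which is assumed here to be single-byte.

```sas
/* Sketch only: run as a stand-alone step to check the arithmetic.      */
/* 0-31 is 32 codes, 127 is 1 code, 128-151 is 24 codes: 57 in total.   */
data _null_;
   length npstr $57;                 /* was $33 for 0-31 and 127 only   */
   do r = 0 to 31, 127, 128 to 151;
      p + 1;                         /* sum statement: p starts at 0    */
      substr(npstr, p, 1) = byte(r); /* store one character per code    */
   end;
   put p=;                           /* number of characters stored     */
run;
```

Inside %NONPRINT, the LENGTH statement and the DO list would be the only two lines that need the same change.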
Good luck.
Myra
===

%let database = ;   /* The name of the database being created.                          */
%let lrecl    = ;   /* The record length of the input file.                             */
%let infile   = ;   /* The name of the input file to be used in the filename statement. */

/****************************************************************************************/
/* For delimited input files, uncomment the following 2 macro variables. Make sure the  */
/* correct delimiter is provided.                                                       */
/****************************************************************************************/
/* %let delim = dsd dlm='~'; */
/* %let rfmt  = ;            */

/****************************************************************************************/
/* For fixed file formats, uncomment the following 2 macro variables.                   */
/****************************************************************************************/
/* %let delim = ;            */
/* %let rfmt  = recfm=f;     */

filename input (
"&infile"
);

/****************************************************************************************/
/* The variable list inputs are included in the following macro, whether the datasets   */
/* are fixed field or delimited. Your list should be appropriate for the layout.        */
/****************************************************************************************/
%macro varlist;
%mend varlist;

data &database;
   %nonprint(rfmt=&rfmt,lrecl=&lrecl,delim=&delim);
/****************************************************************************************/
/* Any additional code comes after the NONPRINT macro and before the STOP statement.    */
/* Code should come in this order by type of code:                                      */
/*    length, format, or attribute statements;                                          */
/*    additional code to manipulate the data;                                           */
/*    drop or keep statements;                                                          */
/*    label statements.                                                                 */
/****************************************************************************************/
   stop;
run;
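
For illustration, here is how the template above might look once filled in for a tilde-delimited file. Every value and variable name in it (the data set, the file name, the record length, the three fields) is made up, and it assumes the %NONPRINT definition has already been submitted, since a macro has to be compiled before it is called. The PROC MEANS step at the end is one possible way to get the requested counts from the __NONPRINTABLE_FLAG variable that the macro leaves on each row.

```sas
/* All names and values below are hypothetical, for illustration only. */
%let database = work.customers;
%let lrecl    = 200;
%let infile   = customers.txt;
%let delim    = dsd dlm='~';
%let rfmt     = ;

filename input ("&infile");

/* Modified list input, suitable for a delimited layout. */
%macro varlist;
   cust_id   : $10.
   cust_name : $40.
   city      : $30.
%mend varlist;

data &database;
   %nonprint(rfmt=&rfmt,lrecl=&lrecl,delim=&delim);
   stop;
run;

/* Rows read, total characters replaced, and the worst single row. */
proc means data=&database n sum max;
   var __nonprintable_flag;
run;
```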

=====

/****************************************************************************************/
/* PROGRAM:       nonprint.sas                                                          */
/* AUTHOR:        Myra A Oltsik                                                         */
/* ORIGINAL DATE: 02/21/02                                                              */
/* PURPOSE:       A macro that reads in files and also checks for non-printable         */
/*                characters. Those characters are counted into a variable and then     */
/*                changed to a space.                                                   */
/* LAST CHANGE:                                                                         */
/****************************************************************************************/

/****************************************************************************************/
/* PARAMETERS:                                                                          */
/* The macro is passed 3 parameters:                                                    */
/*   rfmt  -- a field which notes if the input file is fixed format                     */
/*   lrecl -- a field which gives the record length                                     */
/*   delim -- a field which indicates that the input file is delimited, and has the     */
/*            delimiter in it                                                           */
/* The macro also references a macro included in the called program which lists all the */
/* variables to be read in, their positions (if fixed field) and their informats. This  */
/* macro is %VARLIST. The macro also drops those fields only needed to check for the    */
/* non-printable characters.                                                            */
/****************************************************************************************/

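%macro nonprint(rfmt=,lrecl=,delim=);
   /* Build NPSTR, the list of characters to replace: ASCII 0-31 and 127. */
   length npstr $33.;
   do r = 0 to 31,127;
      p ++ 1;
      substr (npstr, p) = byte(r);
   end;
   /* Read the whole input file within one execution of the DATA step. */
   do until (eof);
      infile input &rfmt lrecl=&lrecl &delim truncover ignoredoseof end=eof;
      input
         %varlist
      ;
      array cc _character_;
      __nonprintable_flag = 0;
      /* Element 1 of CC is NPSTR itself, so start at 2. CC is indexed by _I_. */
      do _i_ = 2 to hbound(cc);
         /* Count one per character found; leave when INDEXC returns 0. */
         do __nonprintable_flag = __nonprintable_flag by +1 until ( p = 0 );
            p = indexc (cc, npstr);
            if p then substr (cc, p, 1) = ' ';
         end;
      end;
      output;
   end;
   drop
      r
      p
      npstr
   ;
%mend nonprint;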
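
To see how the inner DO loop counts and replaces in a single pass, here is a small stand-alone sketch of the same idiom applied to one string. It is only an illustration: the names TXT, N and K and the test value are invented, and the only characters planted in the string are a tab (ASCII 9) and a DEL (ASCII 127).

```sas
data _null_;
   length npstr $33 txt $20;
   /* Same search string as in %NONPRINT: ASCII 0-31 plus 127. */
   do r = 0 to 31, 127;
      k + 1;
      substr(npstr, k, 1) = byte(r);
   end;
   /* Hypothetical test value containing two unwanted characters. */
   txt = 'ab' || byte(9) || 'cd' || byte(127) || 'ef';
   /* UNTIL is checked at the bottom of each pass, before N is     */
   /* incremented, so N ends up as the number of replacements.     */
   do n = 0 by 1 until (p = 0);
      p = indexc(txt, npstr);
      if p then substr(txt, p, 1) = ' ';
   end;
   put n= txt=;
run;
```

With the test value above, the log should show n=2 and the string with both characters blanked out. In the macro, the loop starts from the current value of __NONPRINTABLE_FLAG instead of 0, which is how the count carries over from one character variable to the next within a record.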