LISTSERV at the University of Georgia
Date:         Fri, 23 Jul 2004 17:55:48 +0200
Reply-To:     "Groeneveld, Jim" <jim.groeneveld@VITATRON.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Groeneveld, Jim" <jim.groeneveld@VITATRON.COM>
Subject:      Re: Binary Data in Raw Data Inputs
Comments: To: "Terjeson, Mark" <TERJEM@DSHS.WA.GOV>
Content-Type: text/plain; charset="iso-8859-1"

Hi Mark,

Characters with byte values 128-255 are not nonprintable. They print perfectly well, but they appear as different characters in different character sets. See http://www.asciitable.com/ for the "extended" ASCII characters in the PC-8 character set.
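To illustrate the point, here is a small Python sketch (Python only for illustration; the thread itself is about SAS) that decodes the same byte under two character sets. It assumes code page 437 as a stand-in for PC-8 and ISO-8859-1 (Latin-1) as a typical Western character set.

```python
# Decode one "extended" byte (0xE9) under two character sets.
# Assumption: code page 437 stands in for the PC-8 character set.
raw = bytes([0xE9])

pc8_char = raw.decode("cp437")       # Greek capital theta in code page 437
latin1_char = raw.decode("latin-1")  # e with acute accent in ISO-8859-1

print(f"byte 0x{raw[0]:02X}: cp437={pc8_char!r} latin-1={latin1_char!r}")
```

The byte is the same in both cases; only the character set used to display it changes.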

Regards - Jim. -- . . . . . . . . . . . . . . . .

Jim Groeneveld, MSc. Biostatistician Science Team Vitatron B.V. Meander 1051 6825 MJ Arnhem Tel: +31/0 26 376 7365 Fax: +31/0 26 376 7305 Jim.Groeneveld@Vitatron.com www.vitatron.com

My statistics are quite predictable, but my computer may be quite unpredictable.

[common disclaimer]

-----Original Message----- From: Terjeson, Mark [mailto:TERJEM@DSHS.WA.GOV] Sent: Friday, July 23, 2004 17:29 To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Binary Data in Raw Data Inputs

PS: it helps to add a FILENAME statement, as in the code below.

Also, the byte values break down as follows:
- 0-31: control characters
- 32: space
- 32-126: printable characters
- 127: delete/rubout
- 128-255: other nonprintable characters (usually used for PC symbols)
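A minimal Python sketch (illustration only; the thread itself uses SAS) that classifies a byte value by the ranges listed above. The function name classify_byte is hypothetical. It labels 128-255 simply as "extended", since whether those bytes are "printable" depends on the character set in use.

```python
def classify_byte(value):
    """Classify a byte value using the ranges listed above."""
    if not 0 <= value <= 255:
        raise ValueError(f"not a byte value: {value}")
    if value <= 31:
        return "control"
    if value == 32:
        return "space"      # space also counts as printable
    if value <= 126:
        return "printable"
    if value == 127:
        return "delete"
    return "extended"       # 128-255: meaning depends on the character set


for v in (9, 26, 32, 65, 127, 233):
    print(v, classify_byte(v))
```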

-----Original Message----- From: Terjeson, Mark [mailto:TERJEM@DSHS.WA.GOV] Sent: Friday, July 23, 2004 8:18 AM To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Binary Data in Raw Data Inputs

Hi,

As everyone has mentioned, the hex 1A (decimal 26, ^Z) character can be handled in a couple of the ways already described. A hex viewer sure makes it easy to see those nonprintable character values and where they are. One example is UltraEdit: just press Ctrl-H and it toggles your text into a hex representation, which is really handy.
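If no hex viewer is at hand, the same check is easy in a general-purpose language. Below is a hedged Python sketch: the helper names find_ctrl_z and hex_line and the sample log bytes are hypothetical, made up for illustration. It lists where every hex 1A byte sits and renders bytes two hex digits at a time, the way a hex viewer would.

```python
def find_ctrl_z(data):
    """Return the 0-based offsets of every Ctrl-Z (hex 1A, decimal 26) byte."""
    return [offset for offset, value in enumerate(data) if value == 0x1A]


def hex_line(data):
    """Render bytes the way a hex viewer would: two hex digits per byte."""
    return " ".join(f"{value:02X}" for value in data)


# Hypothetical sample: one log line with a stray Ctrl-Z in it.
sample = b"GET /index.htm\x1a\r\n200 OK\r\n"
offsets = find_ctrl_z(sample)
print(offsets)
print(hex_line(sample[:16]))
```

For a real file, read it in binary mode (open(path, "rb").read()) and pass the bytes in.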

If you are stuck without an editor that shows the byte values in decimal or hex, you can always do it in SAS!

You can tweak programs such as these to stream through a file and remove bad characters, or change certain characters, or even add characters, etc.
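As a sketch of the "remove bad characters" idea, here is a Python version (not SAS, and not one of the programs in this thread) that streams through a file in chunks and drops unwanted bytes. The assumption about what counts as "bad" is mine: any byte outside printable ASCII (32-126) other than tab, LF, and CR. The function name strip_bad_bytes is hypothetical.

```python
import io


def strip_bad_bytes(src, dst, chunk_size=64 * 1024):
    """Copy src to dst, dropping bytes that are neither printable ASCII
    (32-126) nor tab, LF, or CR. Returns the number of bytes dropped."""
    keep = set(range(32, 127)) | {9, 10, 13}
    bad = bytes(b for b in range(256) if b not in keep)
    dropped = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        cleaned = chunk.translate(None, bad)  # delete every byte listed in bad
        dropped += len(chunk) - len(cleaned)
        dst.write(cleaned)
    return dropped


src = io.BytesIO(b"ok line\r\nbad\x1a\xe9 line\r\n")
dst = io.BytesIO()
removed = strip_bad_bytes(src, dst)
print(removed, dst.getvalue())
```

Because it works chunk by chunk, the same function handles large files opened with open(..., "rb") and open(..., "wb").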

Some folks have heard the terms printable and nonprintable characters, but what are they? A byte can hold the decimal values 0-255, and each of these 256 values has been assigned a letter, number, symbol, or control-code meaning. The same value can also be written in different ways: the value 15 is 15 in decimal, 0F in hexadecimal (hex), or 17 in octal, or it can have the meaning Ctrl-O, and so on.
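A short Python sketch (illustration only) that prints one byte value in the notations just mentioned. The helper name representations is hypothetical. The Ctrl-letter notation uses the usual convention that control code n is written as Ctrl plus the character whose code is n + 64.

```python
def representations(value):
    """Show one byte value in several notations."""
    reps = {
        "decimal": f"{value:d}",
        "hex": f"{value:02X}",
        "octal": f"{value:o}",
    }
    if value < 32:
        # Control codes 0-31 are conventionally written as Ctrl plus the
        # character whose code is value + 64 (15 -> Ctrl-O, 26 -> Ctrl-Z).
        reps["control"] = "Ctrl-" + chr(value + 64)
    return reps


print(representations(15))
print(representations(26))
```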

If you want to learn more about what a byte is, or what different number bases are all about, see: http://listserv.uga.edu/cgi-bin/wa?A2=ind0107C&L=sas-l&P=R39591

There are two DATA steps here as examples: one creates a text file with the full spectrum of printable and nonprintable characters, and the other creates a text file with only printable characters (both write to the same file, so the second replaces the first). There are also a couple of DATA steps that read the text file in one byte at a time, so you can send the bytes to a file or to the log as characters, decimal values, or hex values to investigate them yourself. These samples are for small files, but you can expand on them to handle large files as well. If you are looking for certain characters, you can then write additional logic to find them, change them, delete them, etc.

filename flatfile 'C:\temp\flatfile.txt';

* create sample data: printable and nonprintable characters ;
data _null_;
  length c $128;
  file flatfile;
  c = collate(0,25);    put @1   c $26.  @;
  * on PCs 1A (decimal 26) is an EOF marker, so skip it ;
  c = collate(27,127);  put @27  c $101. @;
  c = collate(128,255); put @128 c $128.;
run;
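For comparison, a hedged Python sketch of the same "full spectrum" record: bytes 0-25, then 27-255, skipping hex 1A. The helper name and the temporary file location are hypothetical. The CRLF terminator mimics how a PC text file ends a line; note that the record itself also contains bytes 10 (LF) and 13 (CR), so a line-oriented reader will split it.

```python
import os
import tempfile


def full_spectrum_record():
    """Bytes 0-25 and 27-255, skipping 26 (hex 1A, the DOS EOF marker)."""
    return bytes(range(0, 26)) + bytes(range(27, 256))


record = full_spectrum_record()
path = os.path.join(tempfile.mkdtemp(), "flatfile.txt")
with open(path, "wb") as fh:
    fh.write(record + b"\r\n")  # CRLF line ending, as on a PC

print(len(record), 0x1A in record, os.path.getsize(path))
```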

* create sample data: printable characters only ;
data _null_;
  file flatfile;
  put 'hello';
  put 'goodbye';
run;

* read file one byte at a time ;
data pchar;
  length pchar $1;
  infile flatfile lrecl=1000;
  input pchar $char1. @@;  * $CHAR1. keeps blanks and periods as-is ;
run;

* read file one byte at a time ;
data _null_;
  length c $1;
  fnrc = filename('foo', 'c:\temp\flatfile.txt');
  fid = fopen('foo');
  do while (fread(fid) eq 0);    * load the next record ;
    recnum + 1;
    do i = 1 to frlen(fid);
      fgrc = fget(fid, c, 1);    * copy one byte into c ;
      put 'just read byte ' i 'of record ' recnum 'and now ' c= $hex2. 'hex ' c=;
    end;
  end;
  rc = fclose(fid);
run;
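A rough Python analogue of the FREAD/FGET step above, offered only as a sketch: it treats LF (with an optional preceding CR) as the record terminator, which is an assumption about the file layout, and it shows nonprintable bytes as "." instead of the raw character. The generator name byte_report is hypothetical.

```python
def byte_report(data):
    """Yield (record number, byte position, hex, char) for every byte,
    treating LF (with an optional preceding CR) as the record terminator."""
    records = data.split(b"\n")
    if records and records[-1] == b"":
        records.pop()  # a trailing terminator does not start a new record
    for recnum, record in enumerate(records, start=1):
        if record.endswith(b"\r"):
            record = record[:-1]
        for i, value in enumerate(record, start=1):
            char = chr(value) if 32 <= value <= 126 else "."
            yield recnum, i, f"{value:02X}", char


for recnum, i, hx, ch in byte_report(b"hi\r\nA\x1a\r\n"):
    print(f"just read byte {i} of record {recnum} and now c={hx} hex c={ch}")
```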

Hope this is helpful, Mark Terjeson Reporting, Analysis, and Procurement Section Information Services Division Department of Social and Health Services State of Washington mailto:terjem@dshs.wa.gov

-----Original Message----- From: Groeneveld, Jim [mailto:jim.groeneveld@VITATRON.COM] Sent: Friday, July 23, 2004 1:05 AM To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Binary Data in Raw Data Inputs

Hi Paul [C],

Well, actually just the hex 1A (byte(26)) would suffice. But it might be worthwhile to know which control character(s) Matt actually has in his data. Matt, could you search for some of them with a hex lister?
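One way to answer "which control characters are in the data" is to tally them. Here is a minimal Python sketch (not from the thread; the function name is hypothetical) that counts control bytes (0-31 and 127) while ignoring the CR and LF that normally end each line.

```python
from collections import Counter


def control_char_counts(data):
    """Count control bytes (0-31 and 127), ignoring CR and LF line endings."""
    return Counter(b for b in data if (b < 32 or b == 127) and b not in (10, 13))


counts = control_char_counts(b"a\x1ab\x00\x1a\r\n")
for value, n in sorted(counts.items()):
    print(f"hex {value:02X} (decimal {value}): {n} time(s)")
```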

Regards - Jim.

-----Original Message----- From: Choate, Paul@DDS [mailto:pchoate@DDS.CA.GOV] Sent: Friday, July 23, 2004 00:23 To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Binary Data in Raw Data Inputs

Matt -

There may be a DOS end-of-file mark in your data. SAS reads hex 1A 0D as a DOS end-of-file mark. This is documented in a SAS note at support.sas.com.

options IgnoreDOSEOF;

SN-003632: When reading a binary file as text, the SAS System stops reading the input file after encountering a Ctrl + Z character

If the SAS System encounters a Ctrl + Z or Hex 1a character when reading a binary file as text, input stops as the character is treated as an end of file character. There is a new option for Version 8.2, IgnoreDOSEOF, which will allow these characters to be read.
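To make the symptom concrete, here is a toy Python model (a hypothetical illustration, not how SAS is implemented) of a text reader that treats the first hex 1A byte as end of file, with a flag that plays the role of IgnoreDOSEOF. The function name read_records and its ignore_dos_eof parameter are made up for this sketch; it also skips empty lines for simplicity.

```python
def read_records(data, ignore_dos_eof=False):
    """Toy model of a text reader that treats Ctrl-Z (hex 1A) as end of file.
    With ignore_dos_eof=True the byte is kept as ordinary data."""
    if not ignore_dos_eof:
        eof = data.find(b"\x1a")
        if eof != -1:
            data = data[:eof]  # everything after the first Ctrl-Z is lost
    # split on LF, skip empty pieces, and drop any trailing CR
    return [line.rstrip(b"\r") for line in data.split(b"\n") if line]


log = b"rec1\r\nrec\x1a2\r\nrec3\r\n"
print(read_records(log))
print(read_records(log, ignore_dos_eof=True))
```

Without the flag, the record containing the Ctrl-Z is truncated and every record after it disappears, which matches the behavior Matt describes.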

Before I knew what was going on, I got around it with a hex editor, or by reading through the file with SAS one byte at a time and fixing it.

Paul Choate DDS Data Extraction (916) 654-2160

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Matt Pettis Sent: Thursday, July 22, 2004 2:10 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Binary Data in Raw Data Inputs

Hi,

I am trying to read in data from IIS weblogs that *should* be plain ASCII data. However, I occasionally get fields that contain non-ASCII characters. I confirmed this by viewing the raw log in an editor and seeing non-displayable characters (shown as boxes). I believe these characters are causing my DATA step to stop and not process further lines. These lines are rare, so I do not care if I lose that record, but I do care that I lose all of the records after it. Does anybody have any ideas on how to handle lines like these so that the DATA step can continue past them?

Thanks in advance for any ideas, Matt Pettis
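The approach Matt asks for (drop the rare bad record, keep everything after it) can be sketched in Python as below. This is an illustration of the idea only, not a SAS solution; the function name, the sample log lines, and the rule that a "clean" line contains only printable ASCII or tabs are all assumptions. In SAS the reader must first be prevented from stopping at hex 1A (see IgnoreDOSEOF above) before such a filter can help.

```python
def split_clean_records(raw):
    """Split raw log bytes into lines; keep lines made only of printable
    ASCII (32-126) or tabs, and report the line numbers that were skipped.
    Returns (kept_lines, skipped_line_numbers)."""
    kept, skipped = [], []
    for lineno, line in enumerate(raw.splitlines(), start=1):
        if all(32 <= b <= 126 or b == 9 for b in line):
            kept.append(line.decode("ascii"))
        else:
            skipped.append(lineno)
    return kept, skipped


# Hypothetical log with one bad line in the middle.
raw = (b"2004-07-22 GET /a 200\r\n"
       b"2004-07-22 GET /\xe9\x1a 200\r\n"
       b"2004-07-22 GET /b 200\r\n")
kept, skipped = split_clean_records(raw)
print(kept)
print("skipped lines:", skipped)
```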

