Date: Tue, 2 Oct 2001 12:21:05 +0100
Reply-To: Peter Crawford <peter.crawford@DB.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Crawford <peter.crawford@DB.COM>
Subject: Re: Antwort: RE: Reading a CSV File
Content-type: text/plain; charset=iso-8859-1
Hi Zhaoping
I remember.
Without seeing the data, I can only guess at why this the problem
appears now, for the first time.
Perhaps there was some byte following the last '0A'x.
I don't think it could be a '0D'x (following the last '0A'x) without disturbing
the reading of all rows.
You might avoid this and any other empty observations, by testing that
some "key feature" value is non-empty before output..
I get this kind of problem when (dde) reading from a spreadsheet range
with a row which is "empty" but "present"
You can use the sas display manager to examine the file with command
fslist 'your csv file' ; hex; bottom
You should see the character and hex representation of the last few lines
as seen by FSLIST - I assume this should be interpreted the same as an
infile statement would interpret the data. It might help clarify things.
Others on sas-l may know better the cause of such a problem.
good luck
Peter Crawford
Datum: 01/10/2001 19:26
An: Peter Crawford/Zentrale/DeuBaExt@Zentrale
Betreff: RE: Antwort: RE: Reading a CSV File
Nachrichtentext:
Hi, Peter. Hope you still remember me.
The dim='2C0D' has been working very nicely.
In a recent CSV file reading, however, I got another
subtle problem--the SAS dataset read from the CSV
file had one more observation than in the CSV file.
It was the last observation, and all variables exhibited
missing status.
Why didn't the system stop at the end of the original
file, or how was this shadowy observation created ?
Thanks.
Zhaoping
-----Original Message-----
From: Peter Crawford [mailto:peter.crawford@db.com]
Sent: Tuesday, August 28, 2001 12:44 PM
To: Wang, Zhaoping (USPC Marketing & Investments Division)
Subject: Antwort: RE: Reading a CSV File
dsd is still necessary,
how else is ,, to imply an empty cell ?
DSD also interprets
"lastname, firstname"
as one character cell having value
lastname, firstname
i.e. it allows data cells to contain a delimiter character as long as
the cell is quoted with double quote marks(")
The explicit dlm='2c0d'x overrides the implied dlm=',' that comes with
option DSD.
Don't consider it a trick - it is only using standard facilities.
With dlm='2c0d'x this method considers '0d'x and '2c'x to be delimiters
anywhere they occur, unless masked by quote marks(")
DLM='0C0D'x
defines two alternative delimiters
This ensures that
DSD DLM='09'x
accepts commas in columns as data, not delimiters
for further explanation read up the on-line doc and (if yu can find it) SAS
technical report P222 Changes & Enhancements to the SAS System v6.07
good luck
(you know how the earn an obligation ! )
Datum: 28/08/2001 15:59
An: Peter Crawford/Zentrale/DeuBaExt@Zentrale
Betreff: RE: Reading a CSV File
Nachrichtentext:
The only question I now have is this: Does this trick "replace"
DSD (comma delimiter as default) OR they are alternatives
so that DSD takes care of comma and the trick takes care
of "comma plus unprintable" at the end of a line ?
I think you're at least one of the very few best programmers
in the world.
Thanks again and regards.
Zhaoping
-----Original Message-----
From: Peter Crawford [mailto:peter.crawford@db.com]
Sent: Tuesday, August 28, 2001 11:12 AM
To: Wang, Zhaoping (USPC Marketing & Investments Division)
Cc: sas-l@listserv.uga.edu
Subject: RE: Reading a CSV File
here is my reasoning
As you are receiving "comma separated values" that is, .Csv data, then
probably you already use infile option DSD, to support consecutive comma
and to avoid using spaces as delimiters.
Add infile option DLM='2c0d'x. As in..
Infile <your data file> DSD DLM='2c0d'x ...any.other.options... ;
This replaces the default delimiter of comma, with comma ('2c'x) *and*
'0D'x = a carriage return character which is treated as data on UNIX,
but as part of the "end-of-record" definition on DOS and windows
platforms.
Having more than one delimiter character is perfectly legal. It doesn't
require both. They are alternatives. Finding a "delimiter" at the end of
the line will add no problems to the infile processing.
This method avoids having an extra pass through your data to correct for
this extra character.
Does that explain how 2C and 0D work together ?
This log snip shows how I determine the hex value of the character (,)comma
1 %put %sysfunc( putc( %str(,,,), $hex6 ));
2C2C2C
That needed the %str() around the comma characters, to avoid
the ",,," being used as parameter separators
Regards
Peter Crawford
Datum: 28/08/2001 14:40
An: Peter Crawford/Zentrale/DeuBaExt@Zentrale
Kopie: "'Aldi Kraja'" <aldi@wubios.wustl.edu>
"'Ian Whitlock'" <WHITLOI1@WESTAT.com>
Betreff: RE: Antwort: Reading a CSV File
Nachrichtentext:
Wonderful ! It takes care not only of '2C'x (comma) but
also '0D' (the unprintable character).
I also received responses from Aldi Kraja and Ian Whitlock.
Aldi offered a Unix script solution which takes out the
unprintable character at the end of each line in the file.
Ian provided an enlightening explanation so I can understand
what hex is and how it works. But you gave me a magic
bullet.
But I'm still wondering how and why '2C0D'x is able to
take care both '2C'x and '0D'x ?
Thank you all for your help !
Zhaoping
-----Original Message-----
From: Peter Crawford [mailto:peter.crawford@db.com]
Sent: Tuesday, August 28, 2001 7:43 AM
To: Wang, Zhaoping (USPC Marketing & Investments Division)
Subject: Antwort: Reading a CSV File
did you get the easy solution to this ?
INFILE <your file> DSD DLM=''2C0D'x ...any.other.options...;
Treating that '0D'x as a delimiter effectively ignores it !
Regards
Peter Crawford
Datum: 24/08/2001 20:57
An: SAS-L@LISTSERV.UGA.EDU
Antwort an: "Wang, Zhaoping (USPC Marketing & Investments Division)"
<ZWang@NA2.US.ML.COM>
Betreff: Reading a CSV File
Nachrichtentext:
This is surely not the first time I've ever read a CSV
file, but the problem seems strange.
The original CSV file looks like this:
RUTH H. WATKINS,0,0
I tried to input 3 variables: name ($ 30),
fee (numeric), and discount (numeric).
Here's the error message in the log file:
Invalid data for discount in line 1 .......
------|------|-------|-------|------- (ruler)
RUTH H. WATKINS,0,0. (this period cannot be
"seen" in the CSV file)
ZONE 2545444523230
NUMR 0714B9E3C0C0D
name=RUTH H. WATKINS fee=0 discount=.
_error_=1 _n_=1
Why can the first 0 be read but the second cannot ?
What does "ZONE" and "NUMR" mean ?
Thanks for your help.
Zhaoping
--
Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte
Weitergabe dieser Mail ist nicht gestattet.
This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.
--
Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte
Weitergabe dieser Mail ist nicht gestattet.
This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.
--
Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte
Weitergabe dieser Mail ist nicht gestattet.
This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.
--
Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet.
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.