Date: Wed, 21 Jun 2006 12:31:12 -0500
Reply-To: Rob Rohrbough <Rob@ROHRBOUGH-SYSTEMS.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Rob Rohrbough <Rob@ROHRBOUGH-SYSTEMS.COM>
Subject: Re: Selective Unquoting
Content-Type: text/plain; charset="us-ascii"
To: Mike Rhoads, Toby Dunn, Gerhard Hellriegel
Gentlemen,
Thank you all for replying to my post. It was late last night, and I was
frustrated. I went with Mike's solution since I was not interested in
eliminating the "undoubled" embedded quotes altogether. Both Toby's and
Gerhard's solutions involved the use of the COMPRESS function, which removes
all occurrences of the selected character.
I wanted only to remove the "quoting" occurrences - only the first and last.
While not needed for this solution, I am inclined to request an enhancement
to the COMPBL function (perhaps renaming it) to eliminate extra consecutive
occurrences of a specified character, not just blanks. There have been
times in the past when I have wished such a function were available.
Thanks again for your posts. The final solution appears below.
Rob
Rob Rohrbough
Omaha, NE, USA
data test;
infile datalines dsd truncover;
length Code $5;
length Short $40;
format AddDateQ mmddyy10.;
input Code : Short : AddDateQ : mmddyy.;
if substr(Short, 1, 1) = '"' and substr(Short, length(Short), 1) = '"'
then
Short = substr(Short, 2, length(Short)-2);
datalines;
"DUMM1",""LEADING AND" EMBEDDED QUOTES",04/01/2002
"DUMM2","trail quote"",04/01/2002
"DUMM3","NO QUOTE",04/01/2002
"DUMM4","EMBEDDED" QUOTE ONLY",04/01/2002
;
> -----Original Message-----
> From: Mike Rhoads [mailto:RHOADSM1@WESTAT.com]
> Sent: Wednesday, June 21, 2006 8:09 AM
> To: Rob Rohrbough; SAS-L@listserv.uga.edu
> Subject: RE: Selective Unquoting
>
>
> Rob,
>
> Since your data are "irregular", I would just remove the quotes
> manually, rather than trying to use DEQUOTE, since I assume the latter
> is designed for "properly" quoted strings.
>
> Also, I wasn't sure why you subtracted one from the Length in your
> conditional.
>
> How about:
>
> if substr(Short, 1, 1) = '"' and substr(Short, length(Short), 1) = '"'
> then
> Short = Substr(Short,2,length(Short)-2);
>
> This, of course, will not reduce any paired quotes that have been
> properly embedded, if you have any of those in your data.
>
> Mike Rhoads
> Westat
> RhoadsM1@Westat.com
>
> -----Original Message-----
> From: owner-sas-l@listserv.uga.edu
> [mailto:owner-sas-l@listserv.uga.edu]
> On Behalf Of Rob Rohrbough
> Sent: Tuesday, June 20, 2006 11:02 PM
> To: SAS-L@listserv.uga.edu
> Subject: Selective Unquoting
>
>
> I am trying to import a comma-delimited dataset over which I have
> limited
> control. The data contain some embedded quotes that have not been
> "doubled"
> for proper reading with the DSD option on the INFILE statement. I am
> trying
> to read them anyway - of course with some problems. A simple
> test case
> is
> as follows:
>
> data test;
> infile datalines dsd truncover;
> length Code $5;
> length Short $40;
> format AddDateQ mmddyy10.;
> input Code : Short : AddDateQ : mmddyy.;
> if substr(Short, 1, 1) = '"' and substr(Short, length(Short)-1, 1) =
> '"'
> then
> Short = dequote(Short);
> datalines;
> "DUMM1",""LEADING AND" EMBEDDED QUOTES",04/01/2002
> "DUMM2","trail quote"",04/01/2002
> "DUMM3","NO QUOTE",04/01/2002
> "DUMM4","EMBEDDED" QUOTE ONLY",04/01/2002
> ;
>
> This example shows that my little after-the-fact unquoting
> code handles
> the
> trailing quote, but not the record containing both in initial
> and middle
> "undoubled" embedded quote nor just the embedded quote.
>
> I can't understand why the unquoting code doesn't catch the first and
> last
> datalines. If I examine the resulting "TEST" dataset, the quotes all
> are
> there, and my desk tracing seems to indicate that the unquoting should
> work.
>
> Ideas? TIA,
>
> Rob
>
>
> Rob Rohrbough
> Omaha, NE, USA
>
|