Date: Fri, 24 Sep 2010 18:41:43 -0400
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: Regular expression question on fixing csv file
Content-Type: text/plain; charset="us-ascii"
How does your client generate the csv file?
I am working on a little article and presentation on using MS Excel workbooks as SAS data sources. Excel doesn't control for many types of errors. (Send me any examples you may have found of Excel shortcomings from the viewpoint of a SAS programmer, if you will, and I'll include them.) One feature that Excel 2007 includes, though, guards against mixing up commas used as separators and those used as text. For example, a csv file exported from Excel has this correct syntax:
While you may have patterns in data that you have seen, using the same character as text and metacharacter will eventually BYOYA. Logically you should have nine commas between what appear to be ID numbers and perhaps three commas between the ID and another number. But how do you know that each row will have the same sequence of data types as other rows?
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Ming Chen
Sent: Friday, September 24, 2010 4:01 PM
Subject: Regular expression question on fixing csv file
I am trying to import a client csv format data file with comma within some
columns. Since the client won't fix their crap and the boss thinks we can
fix it by at least manually removing the comma in the txt field.
By checking the data, I am sure that regular expression can come to rescue
me since there are some obvious patterns in the data. However, I am just
know a little bit the regular expression stuff and I really need some help
Here are some sample data:
columns: code1, code1_desc, code2, code2_desc, code3,code3_desc,
111111,VANITY TOP OVAL , 31X22",4456,MARBLE FACTORY,
The,30,Plumbing,3002,Finish Plumbing,060,Bathroom Furniture
22222,MIRROR 251/2X21/4X33, NLCM-2433,9250,FOREMOST ,GROUPS
INC,30,Plumbing,3002,Finish Plumbing,060,Bathroom Furniture
3333,ACID ,BRUSH 1/2" 3/PK,1120,OATEY CANADA SUPPLY CHAIN
444,FLARED MALE AD,. 1/4X1/4 48-4B,4056,FAIRVIEW FITTINGS & MFG,
I am sure that I can fix the csv file just by removing the comma in the 5
code_desc columns and there are only number with comma before and after the