LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (September 2010, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 24 Sep 2010 18:41:43 -0400
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: Regular expression question on fixing csv file
Comments: To: Ming Chen <chenming@GMAIL.COM>
In-Reply-To:  <>
Content-Type: text/plain; charset="us-ascii"

Ming: How does your client generate the csv file?

I am working on a little article and presentation on using MS Excel workbooks as SAS data sources. Excel doesn't control for many types of errors. (Send me any examples you may have found of Excel shortcomings from the viewpoint of a SAS programmer, if you will, and I'll include them.) One feature that Excel 2007 includes, though, guards against mixing up commas used as separators and those used as text. For example, a csv file exported from Excel has this correct syntax:

",",,",", ,",",,","

While you may have patterns in data that you have seen, using the same character as text and metacharacter will eventually BYOYA. Logically you should have nine commas between what appear to be ID numbers and perhaps three commas between the ID and another number. But how do you know that each row will have the same sequence of data types as other rows? S

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Ming Chen Sent: Friday, September 24, 2010 4:01 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Regular expression question on fixing csv file

Hi All,

I am trying to import a client csv format data file with comma within some columns. Since the client won't fix their crap and the boss thinks we can fix it by at least manually removing the comma in the txt field.

By checking the data, I am sure that regular expression can come to rescue me since there are some obvious patterns in the data. However, I am just know a little bit the regular expression stuff and I really need some help here.

Here are some sample data: columns: code1, code1_desc, code2, code2_desc, code3,code3_desc, code4,code4_desc,code5,code5_desc

111111,VANITY TOP OVAL , 31X22",4456,MARBLE FACTORY, The,30,Plumbing,3002,Finish Plumbing,060,Bathroom Furniture 22222,MIRROR 251/2X21/4X33, NLCM-2433,9250,FOREMOST ,GROUPS INC,30,Plumbing,3002,Finish Plumbing,060,Bathroom Furniture 3333,ACID ,BRUSH 1/2" 3/PK,1120,OATEY CANADA SUPPLY CHAIN SERV.CO.,30,Plumbing,3004,Rough Plumbing 444,FLARED MALE AD,. 1/4X1/4 48-4B,4056,FAIRVIEW FITTINGS & MFG, LTD,30,,3004,Rough Plumbing

I am sure that I can fix the csv file just by removing the comma in the 5 code_desc columns and there are only number with comma before and after the desc part.



Back to: Top of message | Previous page | Main SAS-L page