LISTSERV at the University of Georgia
Date:         Tue, 6 Mar 2012 00:02:58 -0500
Reply-To:     Nat Wooding <nathani@VERIZON.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Nat Wooding <nathani@VERIZON.NET>
Subject:      Re: Parsing data from multiple text files
In-Reply-To:  <201203060430.q262Bnj7000809@waikiki.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"

Navi

Are these files in a single folder on a Windows system? If not, what is the OS? Also, if they are in a single folder, are there other txt files in that folder?

Nat Wooding

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Navi R
Sent: Monday, March 05, 2012 11:30 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Parsing data from multiple text files

Hi,

I have more than 1000 text files in the format below that I would like to parse.

*****Text file data starts here*****
Header text on this line

2012/123456 Other dummy text Dummy Text on this line Status: Under construction Property Name: ABC 1234 Owner Name: Joe Smith Group: State Housing Contractor: Type: Residential Address1: 123 MAIN ST Address2: City/State/Zip: CHICAGO, IL 60001 Contact: Mr. XYZ Phone: 123-456-7890 E-Mail: Area (Sq Ft): $4,400.00 Cost: $250,000 Ownership: Owned Document Pages: 4 Start Date: 03/16/2012 Number of Floors: 1 Material Cost: $100,000 Other Description: Other description of the property. Additional Notes: Additional notes goes here.
*****Text file data ends here*****

For each six-digit unique property ID following the string "2012/" located on line 3, I would like to extract data for the following fields: Status, Property Name, Owner Name, Group, Contractor, Type, Address1, Address2, City/State/Zip, Contact, Phone, E-Mail, Area (Sq Ft), Cost, Ownership, Document Pages, Start Date, Number of Floors, Material Cost, Other Description and Additional Notes.

As you can see from the sample data, each field name is followed by a colon and then the data for that field. In some cases, multiple fields may be located on one line, or a field name may be on one line and its data on the next (for example, the data for "Additional Notes"). I would like to extract the data from multiple text files of this format into a single SAS dataset, with the property ID in the first column and the values of the above fields for each property ID going across the row.
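To make the layout described above concrete, here is a rough sketch of one way to approach it. It is written in Python rather than SAS, purely as an illustration, and it rests on several assumptions that are not stated in the post: the files have a .txt extension and sit in one folder, the labels appear exactly as in the field list, and a label followed by a colon never occurs inside a field's value. The names parse_record, write_csv, FIELDS and the folder argument are hypothetical. The idea is to collapse each file to a single string, so that it no longer matters whether a value shares a line with other fields or sits on the next line, and then to cut that string at every known label.

```python
import csv
import re
from pathlib import Path

# Field labels taken from the post; the spelling is assumed to match the files.
FIELDS = [
    "Status", "Property Name", "Owner Name", "Group", "Contractor", "Type",
    "Address1", "Address2", "City/State/Zip", "Contact", "Phone", "E-Mail",
    "Area (Sq Ft)", "Cost", "Ownership", "Document Pages", "Start Date",
    "Number of Floors", "Material Cost", "Other Description",
    "Additional Notes",
]
CANONICAL = {name.lower(): name for name in FIELDS}

# Any known label followed by a colon; longer labels are listed first.
LABEL_RE = re.compile(
    r"(?<!\w)("
    + "|".join(re.escape(f) for f in sorted(FIELDS, key=len, reverse=True))
    + r")\s*:",
    re.IGNORECASE,
)
# Six-digit property ID following "2012/".
ID_RE = re.compile(r"\b2012/(\d{6})\b")


def parse_record(text):
    """Return one dict per file: the property ID plus every listed field."""
    flat = " ".join(text.split())  # join all lines, collapse whitespace
    id_match = ID_RE.search(flat)
    record = {"Property ID": id_match.group(1) if id_match else ""}
    record.update({name: "" for name in FIELDS})

    hits = list(LABEL_RE.finditer(flat))
    seen = set()
    for i, hit in enumerate(hits):
        name = CANONICAL[hit.group(1).lower()]
        if name in seen:  # keep only the first occurrence of a label
            continue
        seen.add(name)
        # The value runs from the end of this label to the start of the next.
        end = hits[i + 1].start() if i + 1 < len(hits) else len(flat)
        record[name] = flat[hit.end():end].strip()
    return record


def write_csv(folder, out_path):
    """Parse every .txt file in `folder` (hypothetical path) into one CSV."""
    columns = ["Property ID"] + FIELDS
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        for path in sorted(Path(folder).glob("*.txt")):
            writer.writerow(parse_record(path.read_text(errors="replace")))
```

Each file becomes one row with the property ID first and the fields going across, and the resulting CSV could then be read into a SAS dataset. A field with no data, such as Contractor in the sample, comes out as an empty string. If the files really do contain the "data ends here" marker line, it would end up inside the Additional Notes value and would need to be stripped.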

Any suggestions on parsing this data are appreciated.

