Date: Tue, 6 Mar 2012 00:02:58 -0500
Reply-To: Nat Wooding <nathani@VERIZON.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Nat Wooding <nathani@VERIZON.NET>
Subject: Re: Parsing data from multiple text files
In-Reply-To: <201203060430.q262Bnj7000809@waikiki.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"
Navi
Are these files in a single folder on a Windows system? If not, what is
the OS? Also, if they are in a single folder, are there other txt files in
that folder?
Nat Wooding
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Navi R
Sent: Monday, March 05, 2012 11:30 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Parsing data from multiple text files
Hi,
I have more than 1000 text files in the format below that I would like to
parse.
*****Text file data starts here*****
Header text on this line
2012/123456 Other dummy text
Dummy Text on this line
Status: Under construction
Property Name: ABC 1234
Owner Name: Joe Smith
Group: State Housing
Contractor:
Type: Residential
Address1: 123 MAIN ST
Address2:
City/State/Zip: CHICAGO, IL 60001
Contact: Mr. XYZ Phone: 123-456-7890
E-Mail:
Area (Sq Ft): $4,400.00 Cost: $250,000 Ownership: Owned Document Pages: 4
Start Date: 03/16/2012 Number of Floors: 1 Material Cost: $100,000 Other
Description:
Other description of the property.
Additional Notes:
Additional notes goes here.
*****Text file data ends here*****
For each six-digit unique property ID following the string "2012/" located
on line 3, I would like to extract data for the following fields:
Status, Property Name, Owner Name, Group, Contractor, Type, Address1,
Address2, City/State/Zip, Contact, Phone, E-Mail, Area (Sq Ft), Cost,
Ownership, Document Pages, Start Date, Number of Floors, Material Cost,
Other Description and Additional Notes.
As you can see from the sample data, each field name is followed by a colon
and then the data for that field. In some cases, multiple fields may be
located on one line, or a field name may be located on one line and its
related data on the next line (for example, data for "Additional Notes"). I
would like to extract data from multiple text files of this format into a
single SAS dataset, with the property ID in the first column and the data for
the above field names for each property ID going across the row.
Any suggestions on parsing this data are appreciated.
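The parsing logic described above can be sketched as follows. This is only an illustration, written in Python rather than SAS, and it rests on several assumptions that are not stated in the post: each file holds exactly one property, the field labels are spelled exactly as in the sample, and no field value itself contains one of the labels followed by a colon. The function names (parse_property, parse_folder, write_csv) and the idea of writing a CSV file for later import into SAS (for example with PROC IMPORT) are hypothetical choices made for the sketch.

```python
import csv
import re
from pathlib import Path

# Field labels as they appear in the sample file (assumed to be exact).
FIELDS = [
    "Status", "Property Name", "Owner Name", "Group", "Contractor", "Type",
    "Address1", "Address2", "City/State/Zip", "Contact", "Phone", "E-Mail",
    "Area (Sq Ft)", "Cost", "Ownership", "Document Pages", "Start Date",
    "Number of Floors", "Material Cost", "Other Description",
    "Additional Notes",
]


def _label_regex(label):
    # Allow any whitespace (including a line break) between the words of a label.
    return r"\s+".join(re.escape(word) for word in label.split())


# Longer labels come first so "Material Cost" is tried before "Cost".
LABEL_RE = re.compile(
    r"\b("
    + "|".join(_label_regex(f) for f in sorted(FIELDS, key=len, reverse=True))
    + r"):"
)
ID_RE = re.compile(r"\b2012/(\d{6})\b")


def parse_property(text):
    """Return one row (a dict) for the text of a single property file."""
    record = {"Property ID": None}
    record.update({field: "" for field in FIELDS})

    id_match = ID_RE.search(text)
    if id_match:
        record["Property ID"] = id_match.group(1)

    # A field's value runs from the end of its label to the start of the next label.
    matches = list(LABEL_RE.finditer(text))
    for i, match in enumerate(matches):
        label = " ".join(match.group(1).split())
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        value = " ".join(text[match.end():end].split())
        if not record[label]:  # keep the first non-empty value for a label
            record[label] = value
    return record


def parse_folder(folder, pattern="*.txt"):
    """Parse every file in `folder` whose name matches `pattern`."""
    paths = sorted(Path(folder).glob(pattern))
    return [parse_property(p.read_text(errors="replace")) for p in paths]


def write_csv(records, out_path):
    """Write one row per property, with the property ID in the first column."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["Property ID"] + FIELDS)
        writer.writeheader()
        writer.writerows(records)
```

Splitting on the known labels, rather than on line breaks, is what lets the sketch handle both several fields on one line and a value that sits on the line after its label. If the folder also holds unrelated txt files, the glob pattern would need to be narrowed.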