Date: Wed, 11 Dec 2002 11:22:44 -0500
Reply-To: Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject: Re: Parsing Text File into separate cols.
Content-Type: text/plain; charset="iso-8859-1"
Rashida,
You present an interesting problem. I suspect that the fact that the line
"Providers:" does not give a provider, but has a provider on the following
line, is an indication of incomplete information about the organization of
the file.
I will assume "Providers:" has at most one provider following. The same
question arises about "Specialty(ies):" - what does the situation look like
when there is more than one? I assume whatever is there is only on one line.
I did add a second provider in the first case to see how the program would
handle it.
When faced with a messy reading problem it is often best to simplify by
reducing the data to a more manageable form and then obtaining the final
data set. In this case, one problem is identifying a logical record. I
assumed every logical record begins with "Group Name:" and that line is
always present.
The next problem is that quotes are used only some of the time. The DSD
option can handle both situations, so I turned it into a DSD problem with a
delimiter "FF"X which presumably is never in the file. (Hey, Michael! Is
this a sleazy trick?)
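As a minimal sketch of the delimiter idea in isolation (the data set name
demo and the two sample lines are chosen only for illustration): with DSD
and a delimiter assumed never to occur, list input treats each line as a
single field, and DSD strips the enclosing double quotes where they exist.

```sas
data demo ;
   length line $ 100 ;
   /* assumption: "ff"x never occurs in the data, so a line is one field */
   infile cards dsd dlm="ff"x ;
   /* the colon modifier keeps this list input, so DSD handles the quotes; */
   /* blanks and commas inside the line are not treated as delimiters      */
   input line :$char100. ;
   /* write the value to the log to check how each line was read */
   put line= ;
cards ;
"Group Name: David G. Parker, DDS, PA"
Office Status: Accepting New Patients
;
```

If the assumption holds, both values should appear in the log without
enclosing quotes, whether or not the input line was quoted.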
Hopefully this is enough to understand the logic of the program. If not,
just ask questions. After you look more closely at the provider/specialty
problem you may find the program easy to adapt to the situation. If not,
ask more questions. Here is the program.
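data w ;
   retain seq ;
   length line $ 100 ;
   /* "ff"x is assumed never to occur, so DSD reads each line as one field */
   infile cards dsd dlm="ff"x ;
   input line :$char100. ;
   /* a bare "Providers:" line: take the provider from the following line */
   if line = "Providers:" then
   do ;
      input line :$char100. ;
      line = "Providers: " || line ;
   end ;
   /* each "Group Name:" line starts a new logical record */
   if upcase(line) =: "GROUP NAME:" then seq + 1 ;
cards ;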
"Group Name: David G. Parker, DDS, PA"
"Address/Phone: 227 North Knights Avenue, Brandon, FL 33510 (813) 685-5611"
Office Status: Accepting New Patients
Providers:
"Parker, David G., DDS"
"Parker's Brother"
Primary Office #: 112716
Specialty(ies): General Practice - Dental
Group Name: Abdoney Periodontics and Implant Surgery
"Address/Phone: 413 West Robertson Street Suite B, Brandon, FL 33511 (813) 684-5554"
Office Status: Accepting New Patients
Providers:
"Abdoney, Mark Allen, DMD"
Specialty(ies): Periodontics
;
data q ( keep = gpname addr primoffice officestat providers spec prob ) ;
   length test $ 20
          rest gpname addr primoffice officestat
          providers spec prob $ 100
   ;
   /* read every line of one logical record, then output a single row */
   do until ( last.seq ) ;
      set w ;
      by seq ;
      /* split the line at the first colon into a label and a value */
      x = index ( line , ":" ) ;
      if x > 0 then
      do ;
         test = substr ( line , 1 , x ) ;
         rest = substr ( line , x + 2 ) ;
      end ;
      else
      do ;
         test = "problem" ;
         rest = line ;
      end ;
      select ( upcase(test) ) ;
         when ( "GROUP NAME:" )       gpname = rest ;
         when ( "ADDRESS/PHONE:" )    addr = rest ;
         when ( "PRIMARY OFFICE #:" ) primoffice = rest ;
         when ( "OFFICE STATUS:" )    officestat = rest ;
         when ( "PROVIDERS:" )        providers = rest ;
         when ( "SPECIALTY(IES):" )   spec = rest ;
         /* anything unrecognized, such as a second provider line */
         otherwise prob = line ;
      end ;
   end ;
run ;
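The program above leaves the whole address in ADDR. One possible way to
split it into the requested columns is sketched below. It is only a sketch
under assumptions that go beyond the sample data: every address has the
form "street, city, ST zip (area) number", the street part contains no
comma, and the phone number starts at the first "(". The names q2, street,
city, state, zip, phone, tail and p are made up for illustration.

```sas
data q2 ;
   set q ;
   length street city $ 60 state $ 2 zip $ 10 phone $ 20 tail $ 100 ;
   /* assumption: the phone number starts at the first "(" */
   p = index ( addr , "(" ) ;
   if p > 1 then
   do ;
      phone = substr ( addr , p ) ;
      tail  = substr ( addr , 1 , p - 1 ) ;
   end ;
   else tail = addr ;
   /* assumption: exactly two commas, separating street, city, state/zip */
   street = scan ( tail , 1 , "," ) ;
   city   = left ( scan ( tail , 2 , "," ) ) ;
   /* the third piece is assumed to hold state and zip, blank separated */
   state  = scan ( scan ( tail , 3 , "," ) , 1 , " " ) ;
   zip    = scan ( scan ( tail , 3 , "," ) , 2 , " " ) ;
   drop p tail ;
run ;
```

Any address that breaks these assumptions (a comma inside the street, a
missing phone number) would need to be checked by hand or given its own
rule.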
IanWhitlock@westat.com
-----Original Message-----
From: Rashida Patwa [mailto:rashida.patwa@HIGHMARK.COM]
Sent: Wednesday, December 11, 2002 10:18 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Parsing Text File into separate cols.
Hi, I need some help parsing this text file into separate columns. I have
shown 2 records; the rest of the records follow the same pattern. I
have colored text blue for record 1 and colored green for record 2.
I need this information in columns,
e.g.: group name, street addr, city, state, zip, phone, doc name, doc #,
specialty
This text file has variable-length lines. How can I do this? The file has
over 1000 docs with 8-9 lines per doc.
Any help would be appreciated.
Thanks.
"Group Name: David G. Parker, DDS, PA"
"Address/Phone: 227 North Knights Avenue, Brandon, FL 33510 (813) 685-5611"
Office Status: Accepting New Patients
Providers:
"Parker, David G., DDS"
Primary Office #: 112716
Specialty(ies): General Practice - Dental
Group Name: Abdoney Periodontics and Implant Surgery
"Address/Phone: 413 West Robertson Street Suite B, Brandon, FL 33511 (813) 684-5554"
Office Status: Accepting New Patients
Providers:
"Abdoney, Mark Allen, DMD"
Specialty(ies): Periodontics
Rashida Patwa