LISTSERV at the University of Georgia
Date:         Wed, 11 Dec 2002 11:22:44 -0500
Reply-To:     Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject:      Re: Parsing Text File into separate cols.
Comments: To: "rashida.patwa@HIGHMARK.COM" <rashida.patwa@HIGHMARK.COM>
Comments: cc: "michael@BASSETTCONSULTING.COM" <michael@BASSETTCONSULTING.COM>
Content-Type: text/plain; charset="iso-8859-1"

Rashida,

You present an interesting problem. I suspect that the fact that the line "Providers:" does not give a provider, but has a provider on the following line, is an indication of incomplete information about the organization of the file.

I will assume "Providers:" has at most one provider following. The same question arises about "Specialty(ies):" - what does the situation look like when there is more than one? I assume that whatever is there is on only one line. I did add a second provider in the first case to see how the program would handle it.

When faced with a messy reading problem it is often best to simplify by reducing the data to a more manageable form and then obtaining the final data set. In this case, one problem is identifying a logical record. I assumed every logical record begins with "Group NAME:" and that line is always present.

The next problem is that quotes are used only sometimes. The DSD option can handle both situations, so I turned it into a DSD problem with a delimiter "FF"X which presumably is never in the file. (Hey, Michael! Is this a sleazy trick?)
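To see the DSD trick in isolation, here is a minimal sketch. The data set name DEMO and the two sample lines are made up for illustration, and it assumes, as above, that "FF"x never occurs in the data. Since the delimiter is never found, each line is read as a single field; the intent is that DSD removes the surrounding quotes where they are present and leaves embedded commas alone, so both lines arrive in LINE in the same form.

```sas
data demo ;
   length line $ 100 ;
   /* "FF"x is assumed absent from the data, so each line is one field */
   infile cards dsd dlm = "ff"x ;
   input line :$char100. ;
   /* write the value to the log to check how quotes were handled */
   put line = ;
cards ;
"Quoted line, with commas"
Unquoted line, with commas
;
```

Checking the log for LINE= on both records is a quick way to confirm the behavior on your own system before running the full program.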

Hopefully this is enough to understand the logic of the program. If not, just ask questions. After you look more closely at the provider/specialty problem you may find the program easy to adapt to the situation. If not, ask more questions. Here is the program.

data w ;
   retain seq ;
   length line $ 100 ;
   /* "FF"x should never occur, so DSD reads each line as one field */
   infile cards dsd dlm="ff"x ;
   input line :$char100. ;
   /* a bare "Providers:" line takes its value from the next line */
   if line = "Providers:" then
   do ;
      input line :$char100. ;
      line = "Providers: " || line ;
   end ;
   /* every "Group Name:" line starts a new logical record */
   if upcase(line) =: "GROUP NAME:" then seq + 1 ;
cards ;
"Group Name: David G. Parker, DDS, PA"
"Address/Phone: 227 North Knights Avenue, Brandon, FL 33510 (813) 685-5611"
Office Status: Accepting New Patients
Providers:
"Parker, David G., DDS"
"Parker's Brother"
Primary Office #: 112716
Specialty(ies): General Practice - Dental
Group Name: Abdoney Periodontics and Implant Surgery
"Address/Phone: 413 West Robertson Street Suite B, Brandon, FL 33511 (813) 684-5554"
Office Status: Accepting New Patients
Providers:
"Abdoney, Mark Allen, DMD"
Specialty(ies): Periodontics
;

data q ( keep = gpname addr primoffice officestat providers spec prob ) ;
   length test $ 20
          rest gpname addr primoffice officestat providers spec prob $ 100 ;
   /* read all lines of one logical record, then output one observation */
   do until ( last.seq ) ;
      set w ;
      by seq ;
      /* split the line at the first colon into keyword and value */
      x = index ( line , ":" ) ;
      if x > 0 then
      do ;
         test = substr ( line , 1 , x ) ;
         rest = substr ( line , x + 2 ) ;
      end ;
      else
      do ;
         test = "problem" ;
         rest = line ;
      end ;
      select ( upcase(test) ) ;
         when ( "GROUP NAME:" )       gpname = rest ;
         when ( "ADDRESS/PHONE:" )    addr = rest ;
         when ( "PRIMARY OFFICE #:" ) primoffice = rest ;
         when ( "OFFICE STATUS:" )    officestat = rest ;
         when ( "PROVIDERS:" )        providers = rest ;
         when ( "SPECIALTY(IES):" )   spec = rest ;
         /* anything unrecognized is saved in PROB for inspection */
         otherwise prob = line ;
      end ;
   end ;
run ;
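The program above leaves the address and phone together in ADDR. If separate columns are wanted, one possible follow-up step is sketched below. It is only a sketch under assumptions taken from the two sample records: the layout is always "street, city, ST zip (aaa) ppp-pppp", and the street part contains no comma. The names ADDR_PARTS, STREET, CITY, STATE, ZIP, PHONE, TAIL and P are made up for illustration.

```sas
data addr_parts ( drop = tail p ) ;
   set q ;
   length street $ 60 city $ 30 state $ 2 zip $ 10 phone $ 14 tail $ 40 ;
   /* assumes the street holds no comma, so it is the first comma-delimited piece */
   street = scan ( addr , 1 , "," ) ;
   /* count from the right: last piece is "ST zip (aaa) ppp-pppp", city precedes it */
   city   = left ( scan ( addr , -2 , "," ) ) ;
   tail   = left ( scan ( addr , -1 , "," ) ) ;
   state  = scan ( tail , 1 , " " ) ;
   zip    = scan ( tail , 2 , " " ) ;
   /* phone is assumed to start at the opening parenthesis */
   p = index ( tail , "(" ) ;
   if p > 0 then phone = substr ( tail , p ) ;
run ;
```

Records that do not fit the assumed layout will come out with wrong or blank pieces, so it would be worth printing a sample and checking before trusting the result on all 1000 doctors.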

IanWhitlock@westat.com

-----Original Message-----
From: Rashida Patwa [mailto:rashida.patwa@HIGHMARK.COM]
Sent: Wednesday, December 11, 2002 10:18 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Parsing Text File into separate cols.

Hi, I need some help to parse this text file into separate cols. I have shown 2 records and the rest of the records are in the same pattern. I have colored text blue for record 1 and colored green for record 2. I need this info in cols.

eg: group name street addr city state zip phone doc name doc # Specialty

This text file is a variable length. How can I do this? The file has over 1000 docs with 8-9 lines per doc. Any help would be appreciated.

Thanks.

"Group Name: David G. Parker, DDS, PA"
"Address/Phone: 227 North Knights Avenue, Brandon, FL 33510 (813) 685-5611"
Office Status: Accepting New Patients
Providers:
"Parker, David G., DDS"
Primary Office #: 112716
Specialty(ies): General Practice - Dental
Group Name: Abdoney Periodontics and Implant Surgery
"Address/Phone: 413 West Robertson Street Suite B, Brandon, FL 33511 (813) 684-5554"
Office Status: Accepting New Patients
Providers:
"Abdoney, Mark Allen, DMD"
Specialty(ies): Periodontics

Rashida Patwa

