Date: Mon, 11 May 2009 21:03:31 +0100
Reply-To: karma <dorjetarap@GOOGLEMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: karma <dorjetarap@GOOGLEMAIL.COM>
Subject: Re: Report mining techniques question.
In-Reply-To: <200905111842.n4BAmtFW005047@malibu.cc.uga.edu>
Content-Type: text/plain; charset=ISO-8859-1
This is a little crude, but it works for this case. I had to cheat
slightly and make sure there are at least 2 spaces between the address
and the dob on each line, because the address contains single spaces
that we want to keep. The & modifier on Address reads the value up to
the first run of two blanks.
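filename test "c:\documents and settings\kdt\desktop\test.txt" ;

data want (drop=_:) ;
  infile test missover ;
  input _start $ ;                  /* first word of the current line */
  if _start eq "Page" then do ;     /* a "Page ..." line starts a county block */
    place = scan(_infile_,5) ;      /* 5th word of the Page line, e.g. Albany */
    input / ;                       /* skip the column-header line and the blank line below it */
    do _n_=1 to 5 ;                 /* 2 records, 1 blank line, 2 records */
      /* SSN is read as numeric, so leading zeros are dropped; SSN :$9. would keep them */
      input (County LName FName)($) SSN CIN $8. Address & :$13. dob ;
      if _n_ ne 3 then output ;     /* the 3rd line read is the blank separator */
    end ;
  end ;
run ;

proc print ; run ;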
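
To see the two-space trick in isolation, here is a minimal sketch that
uses in-stream data instead of the report file. The data set name demo
and the two sample lines are made up for illustration. Address keeps its
single embedded blanks because of the & modifier, and the value ends at
the first run of two blanks.

```sas
data demo ;
  infile datalines missover ;
  /* & : Address may hold single blanks; two blanks in a row end the value */
  input CIN $ Address & $13. dob ;
  datalines ;
M639472 1 Main St.  19650103
M937284 5 Shorter St.  19860328
;
run ;

proc print data=demo ; run ;
```

With only one space before the date, the date would be read as part of
Address and dob would come back missing, which is why the extra space
is needed.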
output:

Obs  place     County  LName  FName          SSN  CIN      Address             dob
  1  Albany    01      Davis  Richard   98120987  M639472  1 Main St.     19650103
  2  Albany    01      Davis  Bill      98129987  M639518  1 Mane St      19650103
  3  Albany    01      Spade  Sam      157829049  M937272  5 Short St.    19860328
  4  Albany    41      Spade  Ace      157829149  M937284  5 Shorter St.  19860328
  5  Allegany  03      Time   Justin   848409393  M848383  3 Times Pl.    19760228
  6  Allegany  03      Time   Justin   848409393  M848283  3 Times Place  19790228
  7  Allegany  03      Smith  Bill      85737827  M772723  4 Forty Ave    19840704
  8  Allegany  03      Smyth  William   85737827  M772721  4 Fort Ave     19840704
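
The original question also asked for each two-record pair to land in one
observation under different variable names. The step above does not do
that, so here is a rough sketch of one way it might be done, under a few
assumptions: the data set name pairs and the suffixed names (County1 ...
dob2) are hypothetical, SSN is read as character to keep its leading
zeros, and the in-stream lines copy part of the mock report with two
spaces added before each date.

```sas
data pairs (drop=_:) ;
  infile datalines missover ;
  input _start $ ;
  if _start eq "Page" then do ;
    place = scan(_infile_,5) ;
    input / ;                     /* skip the header line and the blank line */
    do _n_ = 1 to 2 ;             /* two pairs per county block */
      /* first line of the pair, then / moves on to the second line */
      input County1 $ LName1 $ FName1 $ SSN1 :$9. CIN1 $ Address1 & $13. dob1
          / County2 $ LName2 $ FName2 $ SSN2 :$9. CIN2 $ Address2 & $13. dob2 ;
      output ;
      if _n_ = 1 then input ;     /* skip the blank line between the pairs */
    end ;
  end ;
  datalines ;
Page 1 for Albany Albany
County LName FName SSN CIN Address DOB

01 Davis Richard 098120987 M639472 1 Main St.  19650103
01 Davis Bill 098129987 M639518 1 Mane St  19650103

01 Spade Sam 157829049 M937272 5 Short St.  19860328
41 Spade Ace 157829149 M937284 5 Shorter St.  19860328
;
run ;

proc print data=pairs ; run ;
```

This relies on every block holding exactly two pairs separated by one
blank line, the same layout assumption the first step makes.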
2009/5/11 Stephen Dybas <skd02@health.state.ny.us>:
> Hello SAS-Ls,
>
> I hope everyone had a nice weekend, especially mothers!
>
> I am including a small mock up report to show what I am trying to
> accomplish. Thanks to everyone that has already replied with some leads on
> using the _infile_ automatic variable.
>
> A couple of other things that I need help with include saving the county
> name for inclusion as part of an output statement along with the data that
> follows in the report. I would like to read in the two record pairs that
> follow the county, as #1 and #2 perhaps, as they really belong together in
> one observation with different variable names.
>
> I am hoping someone can describe an approach to tackling this input
> problem.
>
> This is the beginning of the report
> that I have to discard because it is just
> a bunch a title statements that do not contain
> and usable data. The data lines appear below
> All the data is fictional
> The actual data will start next
>
> Page 1 for Albany Albany
> County LName FName SSN CIN Address DOB
>
> 01 Davis Richard 098120987 M639472 1 Main St. 19650103
> 01 Davis Bill 098129987 M639518 1 Mane St 19650103
>
> 01 Spade Sam 157829049 M937272 5 Short St. 19860328
> 41 Spade Ace 157829149 M937284 5 Shorter St. 19860328
>
> Page 1 for Allegany Allegany
> County LName FName SSN CIN Address DOB
>
> 03 Time Justin 848409393 M848383 3 Times Pl. 19760228
> 03 Time Justin 848409393 M848283 3 Times Place 19790228
>
> 03 Smith Bill 085737827 M772723 4 Forty Ave 19840704
> 03 Smyth William 085737827 M772721 4 Fort Ave 19840704
>
> End of the report
> I need to mine the county data and what
> appears to look like the records of the report
>
> I am getting back into SAS so my skills are a little rusty. Before this, I
> always worked with flat input files, never parsing report files.
>