Date: Thu, 17 Aug 2006 18:25:52 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: How to read this text file into SAS dataset.Any suggestion
will be appreciated.
On Wed, 16 Aug 2006 20:44:42 -0700, Jun <junliu2004@GMAIL.COM> wrote:
>Dear All:
>
>I have several text files. All these files contains titles and a table.
>I just want to read in the numbers appeared in the table (not include
>the number appeared in title and in the table row name and column name)
>into a SAS dataset. In the SAS dataset, I want to have three columns,
>the first column indicates at which column a particular number appeared
>in the old table, the second column indicates at which row a
>particular number appeared in the old table, the third column is the
>number. The following is one sample of the text file. Any suggestion
>will be appreciated.
>
> title
>
>---------------------------------------------------------------------------
>
> column1 ----column2--- ---column3-----
> n % n %
>
>----------------------------------------------------------------------------
> AAAA 22 10.2 100 100
> BBBB 33 12.2 1200 23
> CCCC 55 8.9 42 0.3
>
>---------------------------------------------------------------------------
>
>
>The sas dataset I want to create for the above example is as follows:
>
>1 1 22
>1 2 33
>1 3 55
>2 1 10.2
>2 2 12.2
>2 3 8.9
>3 1 100
>3 2 1200
>3 3 42
>4 1 100
>4 2 23
>4 3 0.3
>
>
>Thanks,
>Jun
In a problem like this, a solutions are based on finding dependable
patterns, and on knowing how much generalization is needed.
Try this:
data extract;
infile cards missover;
input @;
if rownum or (lag(_infile_)=:'-' and not missing(_infile_) );
rownum + 1;
if missing(_infile_) then stop;
input stub :& $20. @; drop stub;
colnum = 0;
do until (missing(number) );
colnum + 1;
input number @;
if not missing(number) then output;
end;
cards;
title
-------------------------------------------------------
column1 ----column2--- ---column3-----
n % n %
-------------------------------------------------------
AAAA 22 10.2 100 100
BB BB 33 12.2 1200 23
CCCC 55 8.9 42 0.3
-------------------------------------------------------
;
The scanning begins where there is a non-blank line immediately below a line
which begins with a hyphen. It stops at the first subsequent blank line.
The code assumes that the row labels will not have multiple embedded blanks.
It does not make any assumption about the number of numeric columns.
Result:
Obs rownum colnum number
1 1 1 22.0
2 1 2 10.2
3 1 3 100.0
4 1 4 100.0
5 2 1 33.0
6 2 2 12.2
7 2 3 1200.0
8 2 4 23.0
9 3 1 55.0
10 3 2 8.9
11 3 3 42.0
12 3 4 0.3