| Date: | Sat, 29 May 2010 17:12:22 -0400 |
| Reply-To: | Anna Supady <statistics2020@GMAIL.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Anna Supady <statistics2020@GMAIL.COM> |
| Subject: | Re: 1 GB data sets=very messy!!! |
|
| In-Reply-To: | <15BB261E-747F-482D-81E6-8C356C9BBA41@gmail.com> |
| Content-Type: | text/plain; charset=ISO-8859-1 |
|---|
Hi Tom,
Thanks for your email. Here is more info:
I now see that I am reading only one row of the data, and that each variable
gets a single character. The dlm='09'x option sets the delimiter to the tab
character.
How can I get it to read the actual number of characters in each variable, not
just one?
I am getting somewhere, but I am not sure how to get it to work correctly.
Thanks a lot for your help,
Ania
OUTPUT:
messy data 10:18 Saturday, May 29, 2010 7975
O x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
b x x x x x x x x x 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3
3 3 4 4
s 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
8 9 0 1
1 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0
. 0 . 5
O x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
x x x x
b 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7
7 8 8 8
s 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
9 0 1 2
1 1 0 6 2 . 2 1 7 6 2 4 . 0 . 0 . 0 . 0 . 0 . 2 5 9 2 0 0 6 . 0 . 0 . 0 . 0
. 0 . 0
x x x x x x x x x x x x x x x x x x x x x x x x
O x x x x x x x x x x x x x x x x x 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1
b 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
2 2 2 2
s 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
0 1 2 3
1 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 6 7 2 1 8 . 0 . 0 . 6 2
9 9 0 6
x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
x x x
O 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1
b 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 6
6 6 6 6
s 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
1 2 3 4
1 0 . 0 . . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 .
0 . 0 .
x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
x x x
O 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
2 2 2 2
b 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 0 0
0 0 0 0
s 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
2 3 4 5
1 8 . 0 . 0 . 0 . 0 . 0 . 0 . 2 0 . 0 . 0 . 0 . 0 . 1 5 6 . 0 . 0 . 2 . 2 8
. 0 . 0
x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
x
O 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2
b 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4
4 4
s 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
3 4
1 . 0 . 0 . 0 . 1 0 3 . 4 2 8 4 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0
. 0
and the LOG:
279 data one;
280
infile 'f:\Stark\first.txt';
281 array x{244};
282 input x(*) 1.;
NOTE: The infile 'f:\Stark\first.txt' is:
Filename=f:\Stark\first.txt,
RECFM=V,LRECL=256,File Size (bytes)=3074,
Last Modified=29May2010:11:44:22,
Create Time=29May2010:11:44:21
NOTE: Invalid data for x2 in line 1 2-2.
NOTE: Invalid data for x4 in line 1 4-4.
NOTE: Invalid data for x6 in line 1 6-6.
NOTE: Invalid data for x8 in line 1 8-8.
NOTE: Invalid data for x243 in line 1 243-243.
RULE:
----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
1 CHAR
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.51062.217624.0.0.0.0.0.2592006.0.0.0.0.0.0.0.0.0
ZONE
3030303030303030303030303030303030303030333330333333030303030303333333030303030303030303
NUMR
0909090909090909090909090909090909090909510629217624909090909092592006909090909090909090
89
.0.0.0.0.0.0.0.0.0.67218.0.0.6299060.0..0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.0.0.0.0.0.
ZONE
0303030303030303030333330303033333330300303030303030303030303030303030303030303030303030
NUMR
9090909090909090909672189090962990609099090909090909090909090909090909090909890909090909
177
0.20.0.0.0.0.156.0.0.2.28.0.0.0.0.0.103.4284.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
ZONE
30330303030303330303030330303030303033323333030303030303030303030303030303030303
NUMR
092090909090915690909292890909090909103E4284909090909090909090909090909090909090
x1=0 x2=. x3=0 x4=. x5=0 x6=. x7=0 x8=. x9=0 x10=. x11=0 x12=. x13=0 x14=.
x15=0 x16=. x17=0 x18=.
x19=0 x20=. x21=0 x22=. x23=0 x24=. x25=0 x26=. x27=0 x28=. x29=0 x30=.
x31=0 x32=. x33=0 x34=.
x35=0 x36=. x37=0 x38=. x39=0 x40=. x41=5 x42=1 x43=0 x44=6 x45=2 x46=.
x47=2 x48=1 x49=7 x50=6
x51=2 x52=4 x53=. x54=0 x55=. x56=0 x57=. x58=0 x59=. x60=0 x61=. x62=0
x63=. x64=2 x65=5 x66=9
x67=2 x68=0 x69=0 x70=6 x71=. x72=0 x73=. x74=0 x75=. x76=0 x77=. x78=0
x79=. x80=0 x81=. x82=0
x83=. x84=0 x85=. x86=0 x87=. x88=0 x89=. x90=0 x91=. x92=0 x93=. x94=0
x95=. x96=0 x97=. x98=0
x99=. x100=0 x101=. x102=0 x103=. x104=0 x105=. x106=0 x107=. x108=6 x109=7
x110=2 x111=1 x112=8
x113=. x114=0 x115=. x116=0 x117=. x118=6 x119=2 x120=9 x121=9 x122=0 x123=6
x124=0 x125=. x126=0
x127=. x128=. x129=0 x130=. x131=0 x132=. x133=0 x134=. x135=0 x136=. x137=0
x138=. x139=0 x140=.
x141=0 x142=. x143=0 x144=. x145=0 x146=. x147=0 x148=. x149=0 x150=. x151=0
x152=. x153=0 x154=.
x155=0 x156=. x157=0 x158=. x159=0 x160=. x161=0 x162=. x163=0 x164=. x165=8
x166=. x167=0 x168=.
x169=0 x170=. x171=0 x172=. x173=0 x174=. x175=0 x176=. x177=0 x178=. x179=2
x180=0 x181=. x182=0
x183=. x184=0 x185=. x186=0 x187=. x188=0 x189=. x190=1 x191=5 x192=6 x193=.
x194=0 x195=. x196=0
x197=. x198=2 x199=. x200=2 x201=8 x202=. x203=0 x204=. x205=0 x206=. x207=0
x208=. x209=0 x210=.
x211=0 x212=. x213=1 x214=0 x215=3 x216=. x217=4 x218=2 x219=8 x220=4 x221=.
x222=0 x223=. x224=0
x225=. x226=0 x227=. x228=0 x229=. x230=0 x231=. x232=0 x233=. x234=0 x235=.
x236=0 x237=. x238=0
x239=. x240=0 x241=. x242=0 x243=. x244=0 _ERROR_=1 _N_=1
NOTE: 1 record was read from the infile 'f:\Stark\first.txt'.
The minimum record length was 256.
The maximum record length was 256.
One or more lines were truncated.
NOTE: The data set WORK.ONE has 1 observations and 244 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds
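One possible reading of the log above: an informat such as 1. makes INPUT take exactly one column per variable, so every tab (hex 09 in the ZONE/NUMR lines) lands in an even-numbered variable and is flagged as invalid data. A minimal sketch of the difference between formatted and list input follows. The record and the variable names a, b, c are made up for illustration, and a comma stands in for the tab so the example survives copy and paste.

```sas
/* Formatted input: each variable takes exactly one column,
   so the delimiter itself is read as data. */
data formatted_demo;
    infile datalines truncover;
    input a 1. b 1. c 1.;
    datalines;
5,12,7
;
run;

/* List input: each variable takes one delimited field,
   whatever its width. */
data list_demo;
    infile datalines dlm=',' truncover;
    input a b c;
    datalines;
5,12,7
;
run;
```

In the first step b receives the comma and becomes missing, which mirrors the "Invalid data for x2" notes. In the second step the three fields are read whole.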
data one;
infile 'f:\Stark\first.txt' dlm='09'x truncover;
array x{244};
input x(*) 5.;
proc iml;
use one;
read all into x;
proc print data= one;
run;
quit;
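A hedged sketch of how the step above might be changed to read whole tab-delimited fields. It assumes the file really is tab-delimited and that 244 is the right number of fields, both taken from the code and log in this message rather than verified. It drops the 5. informat so that list input is used, adds DSD so that two adjacent tabs (visible in the hex dump) count as a missing value, and raises LRECL because the default of 256 truncated the record. The value 32767 is an assumed upper bound, not a measured record length.

```sas
data one;
    /* dlm='09'x : fields are separated by tabs
       dsd       : two tabs in a row mean a missing value
       lrecl=    : assumed bound; the default of 256 cut the line short */
    infile 'f:\Stark\first.txt' dlm='09'x dsd truncover lrecl=32767;
    array x{244};   /* 244 comes from the original code */
    input x{*};     /* list input: no informat width */
run;

proc print data=one;
run;
```

PROC IML is left out here because it is not needed to print the data set.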
283 proc iml;
NOTE: IML Ready
284 use one;
285 read all into x;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
286 proc print data= one;
287 run;
NOTE: There were 1 observations read from the data set WORK.ONE.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
288 quit;
or this one:
379 data one;
380 infile 'f:\Stark\first.txt' dlm='09'x truncover;
381 array x{244};
382 input x(*) 1.;
NOTE: The infile 'f:\Stark\first.txt' is:
Filename=f:\Stark\first.txt,
RECFM=V,LRECL=256,File Size (bytes)=3074,
Last Modified=29May2010:11:44:22,
Create Time=29May2010:11:44:21
NOTE: Invalid data for x2 in line 1 2-2.
NOTE: Invalid data for x4 in line 1 4-4.
NOTE: Invalid data for x6 in line 1 6-6.
NOTE: Invalid data for x241 in line 1 241-241.
NOTE: Invalid data for x243 in line 1 243-243.
RULE:
----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
1 CHAR
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.51062.217624.0.0.0.0.0.2592006.0.0.0.0.0.0.0.0.0
ZONE
3030303030303030303030303030303030303030333330333333030303030303333333030303030303030303
NUMR
0909090909090909090909090909090909090909510629217624909090909092592006909090909090909090
89
.0.0.0.0.0.0.0.0.0.67218.0.0.6299060.0..0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.0.0.0.0.0.
ZONE
0303030303030303030333330303033333330300303030303030303030303030303030303030303030303030
NUMR
9090909090909090909672189090962990609099090909090909090909090909090909090909890909090909
177
0.20.0.0.0.0.156.0.0.2.28.0.0.0.0.0.103.4284.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
ZONE
30330303030303330303030330303030303033323333030303030303030303030303030303030303
NUMR
092090909090915690909292890909090909103E4284909090909090909090909090909090909090
x1=0 x2=. x3=0 x4=. x5=0 x6=. x7=0 x8=. x9=0 x10=. x11=0 x12=. x13=0 x14=.
x15=0 x16=. x17=0 x18=.
x19=0 x20=. x21=0 x22=. x23=0 x24=. x25=0 x26=. x27=0 x28=. x29=0 x30=.
x31=0 x32=. x33=0 x34=.
x35=0 x36=. x37=0 x38=. x39=0 x40=. x41=5 x42=1 x43=0 x44=6 x45=2 x46=.
x47=2 x48=1 x49=7 x50=6
x51=2 x52=4 x53=. x54=0 x55=. x56=0 x57=. x58=0 x59=. x60=0 x61=. x62=0
x63=. x64=2 x65=5 x66=9
x67=2 x68=0 x69=0 x70=6 x71=. x72=0 x73=. x74=0 x75=. x76=0 x77=. x78=0
x79=. x80=0 x81=. x82=0
x83=. x84=0 x85=. x86=0 x87=. x88=0 x89=. x90=0 x91=. x92=0 x93=. x94=0
x95=. x96=0 x97=. x98=0
x99=. x100=0 x101=. x102=0 x103=. x104=0 x105=. x106=0 x107=. x108=6 x109=7
x110=2 x111=1 x112=8
x113=. x114=0 x115=. x116=0 x117=. x118=6 x119=2 x120=9 x121=9 x122=0 x123=6
x124=0 x125=. x126=0
x127=. x128=. x129=0 x130=. x131=0 x132=. x133=0 x134=. x135=0 x136=. x137=0
x138=. x139=0 x140=.
x141=0 x142=. x143=0 x144=. x145=0 x146=. x147=0 x148=. x149=0 x150=. x151=0
x152=. x153=0 x154=.
x155=0 x156=. x157=0 x158=. x159=0 x160=. x161=0 x162=. x163=0 x164=. x165=8
x166=. x167=0 x168=.
x169=0 x170=. x171=0 x172=. x173=0 x174=. x175=0 x176=. x177=0 x178=. x179=2
x180=0 x181=. x182=0
x183=. x184=0 x185=. x186=0 x187=. x188=0 x189=. x190=1 x191=5 x192=6 x193=.
x194=0 x195=. x196=0
x197=. x198=2 x199=. x200=2 x201=8 x202=. x203=0 x204=. x205=0 x206=. x207=0
x208=. x209=0 x210=.
x211=0 x212=. x213=1 x214=0 x215=3 x216=. x217=4 x218=2 x219=8 x220=4 x221=.
x222=0 x223=. x224=0
x225=. x226=0 x227=. x228=0 x229=. x230=0 x231=. x232=0 x233=. x234=0 x235=.
x236=0 x237=. x238=0
x239=. x240=0 x241=. x242=0 x243=. x244=0 _ERROR_=1 _N_=1
NOTE: 1 record was read from the infile 'f:\Stark\first.txt'.
The minimum record length was 256.
The maximum record length was 256.
One or more lines were truncated.
NOTE: The data set WORK.ONE has 1 observations and 244 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds
383 proc iml;
NOTE: IML Ready
384 use one;
385 read all into x;
NOTE: Exiting IML.
NOTE: PROCEDURE IML used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
386 proc print data= one;
387 run;
NOTE: There were 1 observations read from the data set WORK.ONE.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
388 quit;
and its output:
messy data 10:18 Saturday, May 29, 2010 7980
O x x x x x x x x x x x x x x x x x x x x x x x x x x x x
b x x x x x x x x x 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3
s 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
1 . . . . . . . . 51062 2176 . . . 92006 . . . . . . . . . . . . . . . . . .
. . . . .
O x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
x x x
b 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7
7 7 7
s 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
5 6 7
1 . . . . . . 0.4284 . . . . . . . 0 . . . . . . . . . . . . . . . . . . . .
. . . . .
x x x x x x x x x x x x x x x x x x x x
O x x x x x x x x x x x x x x x x x x x x x x 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1
b 7 7 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
1 1 1 1 1
s 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
5 6 7 8 9
1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . .
x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
x x x x
O 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1
b 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5
5 5 5 6 6
s 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
7 8 9 0 1
1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . .
x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
x x x x
O 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 2 2 2 2
b 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9
9 0 0 0 0
s 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
9 0 1 2 3
1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . .
x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
x x x
O 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2
b 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4
4 4 4 4
s 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
1 2 3 4
1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . .
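Both logs report a truncated record, and the array size of 244 looks like a guess. One way to check how many tab-delimited fields the first record really holds is sketched below. The variable name nfields is hypothetical, and lrecl=32767 is again an assumed bound on the record length.

```sas
data _null_;
    infile 'f:\Stark\first.txt' lrecl=32767 truncover obs=1;
    input;                                    /* load the record into _infile_ */
    /* the 'm' modifier counts empty fields between adjacent tabs */
    nfields = countw(_infile_, '09'x, 'm');
    put nfields=;
run;
```

The number written to the log could then be used as the array dimension.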
On Tue, May 18, 2010 at 11:07 PM, Tom Robinson <barefootguru@gmail.com> wrote:
> SAS handles large data sets fine and is the most adept language I've seen
> for processing messy files.
>
> Can you post a sample of what you're trying to read and the code you've
> written to read the data in?
>
> Cheers
>
>
> On 2010-05-19, at 14:12, Anna Supady wrote:
>
> > Hi guys,
> >
> > I am trying to learn how to handle large data sets, like 1 GB. I have one
> > project that I am working on right now from the website:
> > www.kddcup-orange.com
> > I am new to it. We tried to read data into SAS and it just doesn't read any
> > variable. Any suggestions how to read messy data? Any maybe simpler examples
> > helpful.
> >
> > Thanks a lot,
> >
> > Ania,
>
|