LISTSERV at the University of Georgia
Date:   Sat, 29 May 2010 17:12:22 -0400
Reply-To:   Anna Supady <statistics2020@GMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Anna Supady <statistics2020@GMAIL.COM>
Subject:   Re: 1 GB data sets=very messy!!!
Comments:   To: Tom Robinson <barefootguru@gmail.com>
In-Reply-To:   <15BB261E-747F-482D-81E6-8C356C9BBA41@gmail.com>
Content-Type:   text/plain; charset=ISO-8859-1

Hi Tom,

Thanks for your email. Here is more information:

So now I see that I managed to read only one row of the data, and it reads only one character into each variable. The dlm='09'x option means the values are separated by tabs.

How can I get it to read the actual number of characters in each variable, not just one?

I am getting somewhere, but I am not sure how to get it to work correctly.
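A minimal sketch of one way to approach the question above, under stated assumptions: dlm='09'x tells SAS that the fields are separated by tab characters, but a width informat such as 1. or 5. (formatted input) reads a fixed number of columns and ignores the delimiter. List input (no informat width after x(*)) combined with the DSD option reads each tab-separated field whole, and DSD makes two tabs in a row count as a missing value. The record below is made up to mimic the start of first.txt; the fileref demo and the data set one are illustrative names only.

```sas
/* Sketch only: "demo" is a hypothetical temporary file holding one       */
/* made-up record, not the real first.txt.                                */
filename demo temp;

/* Write a test record: 0<tab>0<tab><tab>51062<tab>103.4284               */
/* (two tabs in a row = an empty, i.e. missing, field).                   */
data _null_;
  file demo;
  length line $200;
  line = '0' || '09'x || '0' || '09'x || '09'x || '51062' || '09'x || '103.4284';
  len = length(line);
  put line $varying200. len;   /* write exactly len bytes, tabs included */
run;

/* Read it back.                                                          */
/* DLM='09'x : fields are separated by tabs.                              */
/* DSD       : consecutive tabs mark a missing value.                     */
/* No informat after x(*): list input reads each field up to the tab.     */
data one;
  infile demo dlm='09'x dsd truncover lrecl=32767;
  array x{5};
  input x(*);
run;

proc print data=one;
run;
```

Here x3 should come out missing and x4 should be read as the whole number 51062 rather than one digit. For the real file, point INFILE at 'f:\Stark\first.txt' and size the array to the number of fields per record.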

Thanks a lot for your help,

Ania

OUTPUT:

messy data 10:18 Saturday, May 29, 2010 7975

O                   x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
b x x x x x x x x x 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4
s 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 5

O x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
b 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 8 8 8
s 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
1 1 0 6 2 . 2 1 7 6 2 4 . 0 . 0 . 0 . 0 . 0 . 2 5 9 2 0 0 6 . 0 . 0 . 0 . 0 . 0 . 0

                                    x x x x x x x x x x x x x x x x x x x x x x x x
O x x x x x x x x x x x x x x x x x 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
b 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2
s 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
1 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 6 7 2 1 8 . 0 . 0 . 6 2 9 9 0 6

  x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
O 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
b 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6
s 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
1 0 . 0 . . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 .

  x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
O 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
b 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 0 0 0 0 0 0
s 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
1 8 . 0 . 0 . 0 . 0 . 0 . 0 . 2 0 . 0 . 0 . 0 . 0 . 1 5 6 . 0 . 0 . 2 . 2 8 . 0 . 0

  x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
O 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
b 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4
s 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
1 . 0 . 0 . 0 . 1 0 3 . 4 2 8 4 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0 . 0

and the LOG:

279 data one;

280 infile 'f:\Stark\first.txt';

281 array x{244};

282 input x(*) 1.;

NOTE: The infile 'f:\Stark\first.txt' is:

Filename=f:\Stark\first.txt,

RECFM=V,LRECL=256,File Size (bytes)=3074,

Last Modified=29May2010:11:44:22,

Create Time=29May2010:11:44:21

NOTE: Invalid data for x2 in line 1 2-2.

NOTE: Invalid data for x4 in line 1 4-4.

NOTE: Invalid data for x6 in line 1 6-6.

NOTE: Invalid data for x8 in line 1 8-8.

NOTE: Invalid data for x243 in line 1 243-243.

RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---

1 CHAR 0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.51062.217624.0.0.0.0.0.2592006.0.0.0.0.0.0.0.0.0

ZONE 3030303030303030303030303030303030303030333330333333030303030303333333030303030303030303

NUMR 0909090909090909090909090909090909090909510629217624909090909092592006909090909090909090

89 .0.0.0.0.0.0.0.0.0.67218.0.0.6299060.0..0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.0.0.0.0.0.

ZONE 0303030303030303030333330303033333330300303030303030303030303030303030303030303030303030

NUMR 9090909090909090909672189090962990609099090909090909090909090909090909090909890909090909

177 0.20.0.0.0.0.156.0.0.2.28.0.0.0.0.0.103.4284.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0

ZONE 30330303030303330303030330303030303033323333030303030303030303030303030303030303

NUMR 092090909090915690909292890909090909103E4284909090909090909090909090909090909090

x1=0 x2=. x3=0 x4=. x5=0 x6=. x7=0 x8=. x9=0 x10=. x11=0 x12=. x13=0 x14=. x15=0 x16=. x17=0 x18=.

x19=0 x20=. x21=0 x22=. x23=0 x24=. x25=0 x26=. x27=0 x28=. x29=0 x30=. x31=0 x32=. x33=0 x34=.

x35=0 x36=. x37=0 x38=. x39=0 x40=. x41=5 x42=1 x43=0 x44=6 x45=2 x46=. x47=2 x48=1 x49=7 x50=6

x51=2 x52=4 x53=. x54=0 x55=. x56=0 x57=. x58=0 x59=. x60=0 x61=. x62=0 x63=. x64=2 x65=5 x66=9

x67=2 x68=0 x69=0 x70=6 x71=. x72=0 x73=. x74=0 x75=. x76=0 x77=. x78=0 x79=. x80=0 x81=. x82=0

x83=. x84=0 x85=. x86=0 x87=. x88=0 x89=. x90=0 x91=. x92=0 x93=. x94=0 x95=. x96=0 x97=. x98=0

x99=. x100=0 x101=. x102=0 x103=. x104=0 x105=. x106=0 x107=. x108=6 x109=7 x110=2 x111=1 x112=8

x113=. x114=0 x115=. x116=0 x117=. x118=6 x119=2 x120=9 x121=9 x122=0 x123=6 x124=0 x125=. x126=0

x127=. x128=. x129=0 x130=. x131=0 x132=. x133=0 x134=. x135=0 x136=. x137=0 x138=. x139=0 x140=.

x141=0 x142=. x143=0 x144=. x145=0 x146=. x147=0 x148=. x149=0 x150=. x151=0 x152=. x153=0 x154=.

x155=0 x156=. x157=0 x158=. x159=0 x160=. x161=0 x162=. x163=0 x164=. x165=8 x166=. x167=0 x168=.

x169=0 x170=. x171=0 x172=. x173=0 x174=. x175=0 x176=. x177=0 x178=. x179=2 x180=0 x181=. x182=0

x183=. x184=0 x185=. x186=0 x187=. x188=0 x189=. x190=1 x191=5 x192=6 x193=. x194=0 x195=. x196=0

x197=. x198=2 x199=. x200=2 x201=8 x202=. x203=0 x204=. x205=0 x206=. x207=0 x208=. x209=0 x210=.

x211=0 x212=. x213=1 x214=0 x215=3 x216=. x217=4 x218=2 x219=8 x220=4 x221=. x222=0 x223=. x224=0

x225=. x226=0 x227=. x228=0 x229=. x230=0 x231=. x232=0 x233=. x234=0 x235=. x236=0 x237=. x238=0

x239=. x240=0 x241=. x242=0 x243=. x244=0 _ERROR_=1 _N_=1

NOTE: 1 record was read from the infile 'f:\Stark\first.txt'.

The minimum record length was 256.

The maximum record length was 256.

One or more lines were truncated.

NOTE: The data set WORK.ONE has 1 observations and 244 variables.

NOTE: DATA statement used (Total process time):

real time 0.03 seconds

cpu time 0.01 seconds

data one;
  infile 'f:\Stark\first.txt' dlm='09'x truncover;
  array x{244};
  input x(*) 5.;
proc iml;
  use one;
  read all into x;
proc print data= one;
run;
quit;

283 proc iml;

NOTE: IML Ready

284 use one;

285 read all into x;

NOTE: Exiting IML.

NOTE: PROCEDURE IML used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

286 proc print data= one;

287 run;

NOTE: There were 1 observations read from the data set WORK.ONE.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

288 quit;
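The log above reports LRECL=256, a minimum and maximum record length of exactly 256, and "One or more lines were truncated", which suggests the records are longer than 256 bytes and everything past column 256 is being dropped. A sketch, assuming that is the cause: first measure the longest record with the INFILE LENGTH= option, then read the fields with LRECL= set at least that high. The temporary file holds one made-up record of 244 tab-separated zeros (487 bytes); the names wide, lenchk and two are illustrative. The value 32767 is only an example, and the largest LRECL= allowed depends on the SAS version and platform.

```sas
/* Sketch only: "wide", "lenchk" and "two" are hypothetical names.        */
/* Build a made-up record of 244 tab-separated zeros (487 bytes), i.e.    */
/* longer than the LRECL=256 shown in the log.                            */
filename wide temp;

data _null_;
  file wide lrecl=32767;
  length line $1000;
  line = '0';
  do i = 2 to 244;
    line = trim(line) || '09'x || '0';   /* append <tab>0 */
  end;
  len = length(line);
  put line $varying1000. len;
run;

/* Step 1: measure the records. LENGTH= holds the length of each record   */
/* read; keep LRECL= generous here too, or the lengths may be capped.     */
data lenchk;
  infile wide lrecl=32767 length=reclen end=last;
  input;                                 /* read the record, no variables */
  retain maxlen 0;
  maxlen = max(maxlen, reclen);
  if last then do;
    nrec = _n_;
    put 'Longest record: ' maxlen 'bytes; records read: ' nrec;
    output;
  end;
  keep maxlen nrec;
run;

/* Step 2: read the fields with LRECL= at least as large as maxlen.       */
data two;
  infile wide dlm='09'x dsd truncover lrecl=32767;
  array x{244};
  input x(*);
run;
```

With the real file, replace wide with 'f:\Stark\first.txt'. If maxlen comes back equal to the LRECL= used in step 1, the records may be longer still, and step 1 is worth rerunning with a larger value.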

or this one:

379 data one;

380 infile 'f:\Stark\first.txt' dlm='09'x truncover;

381 array x{244};

382 input x(*) 1.;

NOTE: The infile 'f:\Stark\first.txt' is:

Filename=f:\Stark\first.txt,

RECFM=V,LRECL=256,File Size (bytes)=3074,

Last Modified=29May2010:11:44:22,

Create Time=29May2010:11:44:21

NOTE: Invalid data for x2 in line 1 2-2.

NOTE: Invalid data for x4 in line 1 4-4.

NOTE: Invalid data for x6 in line 1 6-6.

NOTE: Invalid data for x241 in line 1 241-241.

NOTE: Invalid data for x243 in line 1 243-243.

RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---

1 CHAR 0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.51062.217624.0.0.0.0.0.2592006.0.0.0.0.0.0.0.0.0

ZONE 3030303030303030303030303030303030303030333330333333030303030303333333030303030303030303

NUMR 0909090909090909090909090909090909090909510629217624909090909092592006909090909090909090

89 .0.0.0.0.0.0.0.0.0.67218.0.0.6299060.0..0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.0.0.0.0.0.

ZONE 0303030303030303030333330303033333330300303030303030303030303030303030303030303030303030

NUMR 9090909090909090909672189090962990609099090909090909090909090909090909090909890909090909

177 0.20.0.0.0.0.156.0.0.2.28.0.0.0.0.0.103.4284.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0

ZONE 30330303030303330303030330303030303033323333030303030303030303030303030303030303

NUMR 092090909090915690909292890909090909103E4284909090909090909090909090909090909090

x1=0 x2=. x3=0 x4=. x5=0 x6=. x7=0 x8=. x9=0 x10=. x11=0 x12=. x13=0 x14=. x15=0 x16=. x17=0 x18=.

x19=0 x20=. x21=0 x22=. x23=0 x24=. x25=0 x26=. x27=0 x28=. x29=0 x30=. x31=0 x32=. x33=0 x34=.

x35=0 x36=. x37=0 x38=. x39=0 x40=. x41=5 x42=1 x43=0 x44=6 x45=2 x46=. x47=2 x48=1 x49=7 x50=6

x51=2 x52=4 x53=. x54=0 x55=. x56=0 x57=. x58=0 x59=. x60=0 x61=. x62=0 x63=. x64=2 x65=5 x66=9

x67=2 x68=0 x69=0 x70=6 x71=. x72=0 x73=. x74=0 x75=. x76=0 x77=. x78=0 x79=. x80=0 x81=. x82=0

x83=. x84=0 x85=. x86=0 x87=. x88=0 x89=. x90=0 x91=. x92=0 x93=. x94=0 x95=. x96=0 x97=. x98=0

x99=. x100=0 x101=. x102=0 x103=. x104=0 x105=. x106=0 x107=. x108=6 x109=7 x110=2 x111=1 x112=8

x113=. x114=0 x115=. x116=0 x117=. x118=6 x119=2 x120=9 x121=9 x122=0 x123=6 x124=0 x125=. x126=0

x127=. x128=. x129=0 x130=. x131=0 x132=. x133=0 x134=. x135=0 x136=. x137=0 x138=. x139=0 x140=.

x141=0 x142=. x143=0 x144=. x145=0 x146=. x147=0 x148=. x149=0 x150=. x151=0 x152=. x153=0 x154=.

x155=0 x156=. x157=0 x158=. x159=0 x160=. x161=0 x162=. x163=0 x164=. x165=8 x166=. x167=0 x168=.

x169=0 x170=. x171=0 x172=. x173=0 x174=. x175=0 x176=. x177=0 x178=. x179=2 x180=0 x181=. x182=0

x183=. x184=0 x185=. x186=0 x187=. x188=0 x189=. x190=1 x191=5 x192=6 x193=. x194=0 x195=. x196=0

x197=. x198=2 x199=. x200=2 x201=8 x202=. x203=0 x204=. x205=0 x206=. x207=0 x208=. x209=0 x210=.

x211=0 x212=. x213=1 x214=0 x215=3 x216=. x217=4 x218=2 x219=8 x220=4 x221=. x222=0 x223=. x224=0

x225=. x226=0 x227=. x228=0 x229=. x230=0 x231=. x232=0 x233=. x234=0 x235=. x236=0 x237=. x238=0

x239=. x240=0 x241=. x242=0 x243=. x244=0 _ERROR_=1 _N_=1

NOTE: 1 record was read from the infile 'f:\Stark\first.txt'.

The minimum record length was 256.

The maximum record length was 256.

One or more lines were truncated.

NOTE: The data set WORK.ONE has 1 observations and 244 variables.

NOTE: DATA statement used (Total process time):

real time 0.03 seconds

cpu time 0.01 seconds

383 proc iml;

NOTE: IML Ready

384 use one;

385 read all into x;

NOTE: Exiting IML.

NOTE: PROCEDURE IML used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

386 proc print data= one;

387 run;

NOTE: There were 1 observations read from the data set WORK.ONE.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

388 quit;

and its output:

messy data 10:18 Saturday, May 29, 2010 7980

Obs 1 ("." = missing):
x1-x8 = .      x9 = 51062     x10 = 2176    x11-x13 = .    x14 = 92006
x15-x43 = .    x44 = 0.4284   x45-x51 = .   x52 = 0        x53-x244 = .

On Tue, May 18, 2010 at 11:07 PM, Tom Robinson <barefootguru@gmail.com> wrote:

> SAS handles large data sets fine and is the most adept language I've seen
> for processing messy files.
>
> Can you post a sample of what you're trying to read and the code you've
> written to read the data in?
>
> Cheers
>
>
> On 2010-05-19, at 14:12, Anna Supady wrote:
>
> > Hi guys,
> >
> > I am trying to learn how to handle large data sets, around 1 GB. I have one
> > project that I am working on right now from the website:
> > www.kddcup-orange.com
> > I am new to it. We tried to read the data into SAS, and it just doesn't read
> > any variables. Any suggestions on how to read messy data? Some simpler
> > examples would also be helpful.
> >
> > Thanks a lot,
> >
> > Ania
>
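On the original request for simpler examples of reading messy data, one common pattern (a sketch, not the only way) is to read every field as text first and then convert it with the INPUT function and the ?? modifier, which turns unreadable values into missing values without filling the log with "Invalid data" notes or setting _ERROR_. The in-stream records, the data set name messy and the variables c1-c4 and x1-x4 are made up; the data are comma-delimited only to keep the example self-contained. For a tab-delimited file, use an INFILE statement with dlm='09'x dsd instead.

```sas
/* Sketch only: made-up records; "messy", c1-c4 and x1-x4 are            */
/* hypothetical names.                                                    */
data messy;
  infile datalines dlm=',' dsd truncover;
  array c{4} $ 40 c1-c4;     /* raw text of each field        */
  array x{4}      x1-x4;     /* numeric version of each field */
  input c(*);                /* list input: one field per variable */
  do i = 1 to 4;
    /* ?? : junk such as "abc" becomes missing, with no NOTE in the log */
    x{i} = input(c{i}, ?? best32.);
  end;
  drop i;
datalines;
1,2.5,,abc
7,,x9,103.4284
;
run;

proc print data=messy;
run;
```

Keeping the raw text in c1-c4 makes it easy to list the values that failed to convert later, for example rows where c4 is not blank but x4 is missing.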


