LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2011, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Sat, 8 Jan 2011 10:52:07 -0500
Reply-To:     "Kirby, Ted" <ted.kirby@LEWIN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Kirby, Ted" <ted.kirby@LEWIN.COM>
Subject:      Re: Maximum Size for Character Variables?
In-Reply-To:  A<AANLkTiny6tOvP+4HQzfDmbasQBHL_pjmKMRy_PqaP311@mail.gmail.com>
Content-Type: text/plain; charset="us-ascii"

Joe,

When I run your code as given, the problem is compounded. When sing the FILEREF "afile" PROC EXPORT defines the LRECL as 256, which of course gives the usual line truncation note. Under this, the variable X (called "VAR3" by PROC EXPORT) has a length of 500 despite the fact that this length is larger than the LRECL:

239 240 proc import file=afile dbms=csv out=readin_import replace; 240! guessingrows=1050; 241 getnames=no; 242 run;

243 /********************************************************************** 244 * PRODUCT: SAS 245 * VERSION: 9.1 246 * CREATOR: External File Interface 247 * DATE: 08JAN11 248 * DESC: Generated SAS Datastep Code 249 * TEMPLATE SOURCE: (None Specified.) 250 ***********************************************************************/ 251 data READIN_IMPORT ; 252 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ 253 infile AFILE delimiter = ',' MISSOVER DSD ; 254 informat VAR1 $4. ; 255 informat VAR2 $5. ; 256 informat VAR3 $500. ; 257 informat VAR4 $7. ; 258 informat VAR5 $3. ; 259 format VAR1 $4. ; 260 format VAR2 $5. ; 261 format VAR3 $500. ; 262 format VAR4 $7. ; 263 format VAR5 $3. ; 264 input 265 VAR1 $ 266 VAR2 $ 267 VAR3 $ 268 VAR4 $ 269 VAR5 $ 270 ; 271 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */ 272 run;

NOTE: The infile AFILE is: File Name=C:\Temp\_TD4228\#LN00009, RECFM=V,LRECL=256

NOTE: 1200 records were read from the infile AFILE. The minimum record length was 31. The maximum record length was 256. One or more lines were truncated. NOTE: The data set WORK.READIN_IMPORT has 1200 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds

1200 rows created in READIN_IMPORT from AFILE.

NOTE: .READIN_IMPORT was successfully created. NOTE: PROCEDURE IMPORT used (Total process time): real time 1.71 seconds cpu time 1.70 seconds

Now, when I use the actual file name of the "afile" FILEREF (copied from the log when "afile" was created). The LRECL now is 32767, but X (VAR3) still has a length of 500:

273 PROC IMPORT OUT= WORK.TESTREADIN 274 DATAFILE= "C:\Temp\_TD4228\#LN00009" 275 DBMS=CSV REPLACE; 276 guessingrows=1050; 277 GETNAMES=NO; 278 DATAROW=1; 279 RUN;

280 /********************************************************************** 281 * PRODUCT: SAS 282 * VERSION: 9.1 283 * CREATOR: External File Interface 284 * DATE: 08JAN11 285 * DESC: Generated SAS Datastep Code 286 * TEMPLATE SOURCE: (None Specified.) 287 ***********************************************************************/ 288 data WORK.TESTREADIN ; 289 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ 290 infile 'C:\Temp\_TD4228\#LN00009' delimiter = ',' MISSOVER DSD lrecl=32767 ; 291 informat VAR1 $4. ; 292 informat VAR2 $5. ; 293 informat VAR3 $500. ; 294 informat VAR4 $7. ; 295 informat VAR5 $3. ; 296 format VAR1 $4. ; 297 format VAR2 $5. ; 298 format VAR3 $500. ; 299 format VAR4 $7. ; 300 format VAR5 $3. ; 301 input 302 VAR1 $ 303 VAR2 $ 304 VAR3 $ 305 VAR4 $ 306 VAR5 $ 307 ; 308 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */ 309 run;

NOTE: The infile 'C:\Temp\_TD4228\#LN00009' is: File Name=C:\Temp\_TD4228\#LN00009, RECFM=V,LRECL=32767

NOTE: 1200 records were read from the infile 'C:\Temp\_TD4228\#LN00009'. The minimum record length was 31. The maximum record length was 1230. NOTE: The data set WORK.TESTREADIN has 1200 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds

What goes on here?

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Joe Matise Sent: Saturday, January 08, 2011 12:58 AM To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Maximum Size for Character Variables?

Art, I *think* that windows registry fix applies to Excel (XLS, specifically, the XLSX conversion is in another location) only. CSV seems to be an SCL program entirely, as far as I can tell (though the actual mechanics is well beyond my expertise).

I don't find any unexpected performance, however, locally. I have a basically fresh 9.2 TSM2 install on the PC I tested this on (no settings changed from defaults whatsoever), I haven't fiddled with registries on this windows machine (again, fairly fresh install); only caveat is it's on Win7 64, so who knows what that does. But this test program behaves as expected, I think; I have a single variable that has a size > 1024 characters, way below the normal guessingrows value (omitting guessingrows ends with a size of $33), being read in as a much larger field when I increase guessingrows sufficiently to meet the need.

filename afile temp; data null; length x $1300; file afile dlm=',' lrecl=5000; do _n_ = 1 to 1200; x="abc"||repeat('0',_n_)||'cba'; a="test"; b="hello"; y="goodbye"; z="end";

put a $ b $ x $ y $ z $ ; end; run;

data readin; infile afile lrecl=5000 dlm=','; length x $1300; input a $ b $ x $ y $ z $ ; run;

proc import file=afile dbms=csv out=readin_import replace; guessingrows=1050; getnames=no; run;

Not only does the PROC IMPORT code get you the full length, but it goes above and beyond and gets another couple hundred columns beyond what row 1050 needs.

I would be curious if you have any issues with the encoding of the file (ANSI vs UTF-8, for example)?

-Joe

On Fri, Jan 7, 2011 at 7:09 PM, Arthur Tabachneck <art297@rogers.com> wrote:

> Ted, > > You probably have to change Window's guessing rows setting. Take a look at > the following post from data_null: http://xrl.us/bfa8qi > > Art > ------ > On Fri, 7 Jan 2011 20:02:17 -0500, Kirby, Ted <ted.kirby@LEWIN.COM> wrote: > > >I have a rather messy CSV file (messy in the sense that the variable > >values are all over the place). I was surprised at how well PROC IMPORT > >did in creating a SAS dataset, especially if I set the GUESSINGROWS > >option high enough so that the variables have the correct format > >(character or numeric) and size. My question is: > > > > > > > >Is there some sort of SAS option that restricts the size of character > >variables that will be read with PROC IMPORT? I have scanned the list > >of SAS Options in the SAS documentation on my PC, but did not any option > >name that seemed to deal with variable lengths (other than the MVARSIZE > >option that deals with the size of macro variables). > > > > > > > >The reason for my question is that one instance of the character > >variable "overall" contains 1,026 characters. This occurs on the > >1,323rd row of the CSV file. When I use PROC IMPORT to read the data > >and set GUESSINGROWS to 1,400 (to be sure SAS looks at the row with the > >longest instance of the "overall" variable), PROC IMPORT assigns a > >length of 500 to the "overall" variable, not the correct length of > >1,026. > > > > > > > >This question is somewhat academic because I have also read this file > >with an INPUT statement and the colon format modifier as in: > > > > > > > >data client; > > > >infile "c:\clientfile.csv" dsd missover lrecl = 20000 firstobs = 2; > > > >input > > > >ClientID :4. > > > >fname :$22. > > > > > > > ><lots of other variables> > > > > > > > >Overall :$1026. > > > > > > > ><lots more variables> > > > > > > > >; > > > >run; > > > > > > > >but I just wanted to understand what was happening. > > > > > > > > > > > > > >************* IMPORTANT - PLEASE READ ******************** > > > >This e-mail, including attachments, may include confidential and/or > proprietary information, > >and may be used only by the person or entity to which it is addressed. If > the reader of this > >e-mail is not the intended recipient or his or her authorized agent, the > reader is hereby > >notified that any dissemination, distribution or copying of this e-mail is > prohibited. If you > >have received this e-mail in error, please notify the sender by replying > to > this message > >and delete this e-mail immediately. > > > ************* IMPORTANT - PLEASE READ ********************

This e-mail, including attachments, may include confidential and/or proprietary information, and may be used only by the person or entity to which it is addressed. If the reader of this e-mail is not the intended recipient or his or her authorized agent, the reader is hereby notified that any dissemination, distribution or copying of this e-mail is prohibited. If you have received this e-mail in error, please notify the sender by replying to this message and delete this e-mail immediately.


Back to: Top of message | Previous page | Main SAS-L page