LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2011, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 4 Jan 2011 13:12:16 -0800
Reply-To:     "Nordlund, Dan (DSHS/RDA)" <NordlDJ@DSHS.WA.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Nordlund, Dan (DSHS/RDA)" <NordlDJ@DSHS.WA.GOV>
Subject:      Re: Is it sorted? Are there performance benefits?
In-Reply-To:  <AANLkTin79h38p1acj+njT_SXBsfEm9p=-yLxvizBvdKL@mail.gmail.com>
Content-Type: text/plain; charset=utf-8

Sterling, Joe, et al.,

Have I missed something in this thread? It wouldn't be the first time. :-) But, in the log below, I see a "short-circuiting" of file reading when the SORTEDBY= option is specified.

113 data have; 114 do i=1 to 10000000; 115 do j=1 to 10; 116 output; 117 end; 118 end; 119 run;

NOTE: The data set WORK.HAVE has 100000000 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 6.78 seconds cpu time 5.74 seconds

120 121 data want(where=(i=1)) ; 122 set have; 123 run;

NOTE: There were 100000000 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 10 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 9.20 seconds cpu time 9.20 seconds

124 data want ; 125 set have(where=(i=1) sortedby=i); 126 run;

NOTE: There were 10 observations read from the data set WORK.HAVE. WHERE i=1; NOTE: The data set WORK.WANT has 10 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 3.68 seconds cpu time 3.66 seconds

127 128 129 data want(where=(i=1)) ; 130 set have; 131 by i; 132 run;

NOTE: There were 100000000 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 10 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 13.27 seconds cpu time 13.27 seconds

133 data want ; 134 set have(where=(i=1) sortedby=i); 135 by i; 136 run;

NOTE: There were 10 observations read from the data set WORK.HAVE. WHERE i=1; NOTE: The data set WORK.WANT has 10 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 3.74 seconds cpu time 3.74 seconds

So what am I missing in this discussion?

Dan

Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204

> -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of > Joe Matise > Sent: Tuesday, January 04, 2011 12:43 PM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Re: Is it sorted? Are there performance benefits? > > Do you need the index, or can you just remove it (in PROC DATASETS)? > > -Joe > > On Tue, Jan 4, 2011 at 2:24 PM, Sterling Paramore > <gnilrets@gmail.com>wrote: > > > Good point Mark, I neglected to test that. So it looks like we're > stuck > > with the PROC SORT PRESORTED method discussed earlier. I'm having a > > problem > > with this, however, because the dataset comes indexed and PROC SORT > wont > > modify the indexed dataset even though I'm only asking it to validate > the > > sort order. > > > > > > On Tue, Jan 4, 2011 at 11:53 AM, Keintz, H. Mark > > <mkeintz@wharton.upenn.edu>wrote: > > > > > I also thought assigning the SORTEDBY attribute would generate the > needed > > > metadata without the burden of needlessly running proc sort. So I > tried > > it. > > > > > > But it did not improve data retrieval. Apparently the fact that > the > > > VALIDATED=NO attribute remained was sufficient to prevent the data > engine > > > from stopping once J>1. > > > > > > I would appreciate examples of performance improvements or other > > efficacies > > > (beyond documenting the sort order) from setting SORTEDBY in the > absence > > of > > > validation by PROC SORT. > > > > > > Regards, > > > Mark > > > > > > > > > > > > > -----Original Message----- > > > > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf > Of > > > > Mike Zdeb > > > > Sent: Tuesday, January 04, 2011 2:22 PM > > > > To: SAS-L@LISTSERV.UGA.EDU > > > > Subject: Re: Is it sorted? Are there performance benefits? > > > > > > > > hi (again, forgot to add) ... and you could also add the > attribute as > > > > you make the data set ... > > > > > > > > data have; > > > > input x @@; > > > > datalines; > > > > 1 3 2 > > > > ; > > > > run; > > > > > > > > proc sort data = have; > > > > by x; > > > > run; > > > > > > > > data have_2 (sortedby=x); > > > > set have; > > > > by x; > > > > y = x + 10; > > > > run; > > > > > > > > -- > > > > Mike Zdeb > > > > U@Albany School of Public Health > > > > One University Place (Room 119) > > > > Rensselaer, New York 12144-3456 > > > > P/518-402-6479 F/630-604-1475 > > > > > > > > > hi ... if you are sure that a data set is sorted, you can add > that > > > > attribute ... > > > > > > > > > > proc datasets lib=work nolist; > > > > > modify have_2 (sortedby=x); > > > > > quit; > > > > > > > > > > proc contents data = have_2; > > > > > run; > > > > > > > > > > you'll see ... > > > > > > > > > > Sort Information > > > > > Sortedby x > > > > > Validated NO > > > > > Character Set ANSI > > > > > > > > > > and if you try to sort HAVE_2 ... > > > > > > > > > > 198 > > > > > 199 proc sort data=have_2; > > > > > 200 by x; > > > > > 201 run; > > > > > > > > > > NOTE: Input data set is already sorted, no sorting done. > > > > > NOTE: PROCEDURE SORT used (Total process time): > > > > > real time 0.00 seconds > > > > > cpu time 0.00 seconds > > > > > > > > > > > > > > > you can change that attribute even if the data set is not > sorted ... > > > > > > > > > > > > > > > data have; > > > > > input x : $1. @@; > > > > > datalines; > > > > > 1 3 2 > > > > > ; > > > > > run; > > > > > > > > > > proc datasets lib=work nolist; > > > > > modify have (sortedby=x); > > > > > quit; > > > > > > > > > > proc contents data=have; > > > > > run; > > > > > > > > > > proc sort data=have; > > > > > by x; > > > > > run; > > > > > > > > > > and still get ... > > > > > > > > > > Sort Information > > > > > Sortedby x > > > > > Validated NO > > > > > Character Set ANSI > > > > > > > > > > and ... > > > > > > > > > > 262 proc sort data=have; > > > > > 263 by x; > > > > > 264 run; > > > > > > > > > > NOTE: Input data set is already sorted, no sorting done. > > > > > NOTE: PROCEDURE SORT used (Total process time): > > > > > real time 0.00 seconds > > > > > cpu time 0.00 seconds > > > > > > > > > > > > > > > > > > > > -- > > > > > Mike Zdeb > > > > > U@Albany School of Public Health > > > > > One University Place (Room 119) > > > > > Rensselaer, New York 12144-3456 > > > > > P/518-402-6479 F/630-604-1475 > > > > > > > > > >> Dear SAS-L, > > > > >> > > > > >> I have a dataset that is sorted and proc contents reports it > as > > > > sorted. I > > > > >> then read that dataset using a by statement and make a new > column. > > > > Suddenly > > > > >> proc contents reports it as not sorted. Of course, it is > still > > > > sorted, but > > > > >> I was wondering if anyone knew if there were performance > problems > > > > that would > > > > >> occur since proc contents things it's not sorted. > > > > >> > > > > >> Here's an example to show what's happening: > > > > >> > > > > >> data have; > > > > >> input x $; > > > > >> datalines; > > > > >> 1 > > > > >> 3 > > > > >> 2 > > > > >> ; > > > > >> run; > > > > >> > > > > >> proc sort data = have; > > > > >> by x; > > > > >> run; > > > > >> > > > > >> data have_2; > > > > >> set have; > > > > >> by x; > > > > >> y = x + 10; > > > > >> run; > > > > >> > > > > >> proc contents data = have; > > > > >> run; > > > > >> proc contents data = have_2; > > > > >> run; > > > > >> > > > > >> Thanks, > > > > >> Sterling > > > > >> > > > > > > > > > >


Back to: Top of message | Previous page | Main SAS-L page