Date: Tue, 4 Jan 2011 14:42:59 -0600
Reply-To: Joe Matise <snoopy369@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Joe Matise <snoopy369@GMAIL.COM>
Subject: Re: Is it sorted? Are there performance benefits?
In-Reply-To: <AANLkTikMr_3RYuxQL-WvR2svQp3K1FgZfr-74uwa88UW@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Do you need the index, or can you just remove it (in PROC DATASETS)?
-Joe
On Tue, Jan 4, 2011 at 2:24 PM, Sterling Paramore <gnilrets@gmail.com>wrote:
> Good point Mark, I neglected to test that. So it looks like we're stuck
> with the PROC SORT PRESORTED method discussed earlier. I'm having a
> problem
> with this, however, because the dataset comes indexed and PROC SORT wont
> modify the indexed dataset even though I'm only asking it to validate the
> sort order.
>
>
> On Tue, Jan 4, 2011 at 11:53 AM, Keintz, H. Mark
> <mkeintz@wharton.upenn.edu>wrote:
>
> > I also thought assigning the SORTEDBY attribute would generate the needed
> > metadata without the burden of needlessly running proc sort. So I tried
> it.
> >
> > But it did not improve data retrieval. Apparently the fact that the
> > VALIDATED=NO attribute remained was sufficient to prevent the data engine
> > from stopping once J>1.
> >
> > I would appreciate examples of performance improvements or other
> efficacies
> > (beyond documenting the sort order) from setting SORTEDBY in the absence
> of
> > validation by PROC SORT.
> >
> > Regards,
> > Mark
> >
> >
> >
> > > -----Original Message-----
> > > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > > Mike Zdeb
> > > Sent: Tuesday, January 04, 2011 2:22 PM
> > > To: SAS-L@LISTSERV.UGA.EDU
> > > Subject: Re: Is it sorted? Are there performance benefits?
> > >
> > > hi (again, forgot to add) ... and you could also add the attribute as
> > > you make the data set ...
> > >
> > > data have;
> > > input x @@;
> > > datalines;
> > > 1 3 2
> > > ;
> > > run;
> > >
> > > proc sort data = have;
> > > by x;
> > > run;
> > >
> > > data have_2 (sortedby=x);
> > > set have;
> > > by x;
> > > y = x + 10;
> > > run;
> > >
> > > --
> > > Mike Zdeb
> > > U@Albany School of Public Health
> > > One University Place (Room 119)
> > > Rensselaer, New York 12144-3456
> > > P/518-402-6479 F/630-604-1475
> > >
> > > > hi ... if you are sure that a data set is sorted, you can add that
> > > attribute ...
> > > >
> > > > proc datasets lib=work nolist;
> > > > modify have_2 (sortedby=x);
> > > > quit;
> > > >
> > > > proc contents data = have_2;
> > > > run;
> > > >
> > > > you'll see ...
> > > >
> > > > Sort Information
> > > > Sortedby x
> > > > Validated NO
> > > > Character Set ANSI
> > > >
> > > > and if you try to sort HAVE_2 ...
> > > >
> > > > 198
> > > > 199 proc sort data=have_2;
> > > > 200 by x;
> > > > 201 run;
> > > >
> > > > NOTE: Input data set is already sorted, no sorting done.
> > > > NOTE: PROCEDURE SORT used (Total process time):
> > > > real time 0.00 seconds
> > > > cpu time 0.00 seconds
> > > >
> > > >
> > > > you can change that attribute even if the data set is not sorted ...
> > > >
> > > >
> > > > data have;
> > > > input x : $1. @@;
> > > > datalines;
> > > > 1 3 2
> > > > ;
> > > > run;
> > > >
> > > > proc datasets lib=work nolist;
> > > > modify have (sortedby=x);
> > > > quit;
> > > >
> > > > proc contents data=have;
> > > > run;
> > > >
> > > > proc sort data=have;
> > > > by x;
> > > > run;
> > > >
> > > > and still get ...
> > > >
> > > > Sort Information
> > > > Sortedby x
> > > > Validated NO
> > > > Character Set ANSI
> > > >
> > > > and ...
> > > >
> > > > 262 proc sort data=have;
> > > > 263 by x;
> > > > 264 run;
> > > >
> > > > NOTE: Input data set is already sorted, no sorting done.
> > > > NOTE: PROCEDURE SORT used (Total process time):
> > > > real time 0.00 seconds
> > > > cpu time 0.00 seconds
> > > >
> > > >
> > > >
> > > > --
> > > > Mike Zdeb
> > > > U@Albany School of Public Health
> > > > One University Place (Room 119)
> > > > Rensselaer, New York 12144-3456
> > > > P/518-402-6479 F/630-604-1475
> > > >
> > > >> Dear SAS-L,
> > > >>
> > > >> I have a dataset that is sorted and proc contents reports it as
> > > sorted. I
> > > >> then read that dataset using a by statement and make a new column.
> > > Suddenly
> > > >> proc contents reports it as not sorted. Of course, it is still
> > > sorted, but
> > > >> I was wondering if anyone knew if there were performance problems
> > > that would
> > > >> occur since proc contents things it's not sorted.
> > > >>
> > > >> Here's an example to show what's happening:
> > > >>
> > > >> data have;
> > > >> input x $;
> > > >> datalines;
> > > >> 1
> > > >> 3
> > > >> 2
> > > >> ;
> > > >> run;
> > > >>
> > > >> proc sort data = have;
> > > >> by x;
> > > >> run;
> > > >>
> > > >> data have_2;
> > > >> set have;
> > > >> by x;
> > > >> y = x + 10;
> > > >> run;
> > > >>
> > > >> proc contents data = have;
> > > >> run;
> > > >> proc contents data = have_2;
> > > >> run;
> > > >>
> > > >> Thanks,
> > > >> Sterling
> > > >>
> > > >
> >
>
|