Date: Fri, 22 Oct 2010 11:07:48 -0500
Reply-To: Joe Matise <snoopy369@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Joe Matise <snoopy369@GMAIL.COM>
Subject: Re: When does PROC SQL consistently leave incoming data order
unchanged?
In-Reply-To: <E0B423A8C0D1E74B8905B2C5CB38C1AF0D4D5C@GENO3.wharton.upenn.edu>
Content-Type: text/plain; charset=ISO-8859-1
What you found is consistent with what I've generally observed. The warning
that it cannot be guaranteed is more of a 'if it really matters, use order
by, or you might not realize it's reordering things' warning.
-Joe
On Fri, Oct 22, 2010 at 10:40 AM, Keintz, H. Mark <mkeintz@wharton.upenn.edu
> wrote:
> Folks:
>
> The SAS 9.2 SQL Procedure User's Guide (page 24) states:
>
> "Unless an ORDER BY clause is included in the SELECT statement,
> then a particular order to the output rows, such as the order
> in which the rows are encountered in the queried table, cannot
> be guaranteed, even if an index is present."
>
> Yet we have found that SAS DOES CONSISTENTLY PRESERVE QUERIED
> TABLE ORDER for this very simple SELECT example, under certain
> conditions. We generated dataset WANT as:
>
> PROC SQL noprint;
> create table WANT as
> select * from HAVE
> where qvar in ( list of values ) ;
> quit;
>
> Then we compared WANT to dataset CONTROL, defined as:
>
> data CONTROL;
> set HAVE;
> if qvar in ( list of values ) ;
> run;
>
> (Yes, we used IF intentionally). The conditions in which we
> found WANT to always match CONTROL (i.e. no data reordering)
> were:
>
> 1. HAVE is not indexed or SAS chooses not to use an index.
>
> After dozens of tests on several datasets with (in the
> aggregate) over 2 billion records, we have not produced a
> single instance of re-ordering.
>
> 2. SAS does use an index, (i.e. HAVE is indexed by QVAR), but
> HAVE is also sorted by QVAR. I.e. the index variable is
> also the primary sort key.
>
> My question is:
>
> Have any of you found PROC SQL to re-order records in either
> condition 1 or 2 above?
>
> I should add that
> (a) we are using a LINUX operating system,
> (b) the THREADS option is enabled,
> (c) CPUCOUNT is 4,
> (d) the queried tables are SAS version 9 datasets, and
> (e) our data is not stored using the SPDE engine (although
> it is on a RAID array).
>
>
> BTW, it is possible that the ORDER BY will reorder the data,
> even if it specifies the same sort keys as the queried table.
> Specifically, if HAVE is sorted by QVAR X Y, but has records
> with duplicate keys, then using ORDER BY QVAR,X,Y can re-order
> the tied records (it did so in 20 out of 20 tests we performed).
>
> Regards,
> Mark
>
|