LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2006, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 5 Dec 2006 15:03:58 -0500
Reply-To:     Venky Chakravarthy <swovcc@HOTMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Venky Chakravarthy <swovcc@HOTMAIL.COM>
Subject:      Re: A SAS patch changed the way PROC SQL worked and changed the
              way I thought PROC SQL worked.
Comments: To: Roy Pardee <pardee.r@GHC.ORG>

Now try:

proc sql feedback _method; create table testout as select sum(s) as l , s , s as y from test group by s ; quit ;

and

proc sql feedback _method; create table testout as select sum(s) as l , s as x , s as y from test group by s ; quit ;

When I ran this in 8.2 and 9.1.3, the above two misbehaved consistently. Why would assigning an alias or not make the difference in summarizing to a 1 record or a 10 record dataset? Why would S still get reported as TEST.Y as a result of the FEEDBACK option. It appears that the aliased name always triumphs even if preceded by the original, as is, in the SELECT.

Now, you SQL folks "Talk amongst yourselves" :-) and offer guidance to others.

Venky Chakravarthy

On Tue, 5 Dec 2006 11:25:01 -0800, Pardee, Roy <pardee.r@GHC.ORG> wrote:

>You're being pretty charitable here, I think. This looks like a >straight-up bug to me. DISTINCT might have the desired effect, but I'd >call it a workaround rather than a matter of making intent clear. > >I've said it before & I'll say it again: I wish proc sql had a >'noremerge' option or some such that I could use to turn off the remerge >behavior. I *never* do it on purpose, b/c it's not sql that would run >anyplace but sas, and having it as a possibility does occasionally >confuse sas... > >-----Original Message----- >From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of >Sigurd Hermansen >Sent: Tuesday, December 05, 2006 10:46 AM >To: SAS-L@LISTSERV.UGA.EDU >Subject: Re: A SAS patch changed the way PROC SQL worked and changed the >way I thought PROC SQL worked. > >Guido: >I don't have a sensible explanation of how SAS SQL decides to collapse >rows in a GROUP BY query, and why it does some times and not others. >Calculating or renaming of columns in a GROUP BY clause tends to inhibit >collapsing of rows in summaries. In the case you present, the alias for >s in the select statement would be difficult for the SQL compiler to >relate back to the s in the GROUP BY clause. > >Make your intent more obvious to the SQL compiler with the DISTINCT >keyword following SELECT. In a query language that conforms to the >relational model, the duplicated rows should not appear at all in a >relation. Perhaps SAS is trying to prevent misleading results of summary >queries that involve the sometimes useful but peculiar REMERGE's. Most >other flavours of SQL avoid these problems by refusing to process a >query that has a different SELECT list (except for summary functions) >than GROUP BY list. > >Though certainly legal in a SQL query, copying of column values and >renaming would be more appropriate in a data display query than in a >query that creates a relational table. The copying and renaming confuse >the SQL compiler and add no information to the database. >Sig > >-----Original Message----- >From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] >On Behalf Of Guido T >Sent: Tuesday, December 05, 2006 3:49 AM >To: sas-l@listserv.uga.edu >Subject: A SAS patch changed the way PROC SQL worked and changed the way >I thought PROC SQL worked. > > >Hi SAS-L, > >I had some PROC SQL code that I used to count the number of elements in >a group to be later used to create a format. Here's a simplified >version of my code :- > >proc sql; > create table testout as > select > count(*) as l > , s as x > , s as y > from test > group by s > ; >quit; > >The idea was to get the count of number of elements in the group S into >L and populated X and Y with the group S. The way I thought PROC SQL >would work was to group the data by S and then populate the output >dataset. This worked for a couple of years in a macro that wasn't >changed. > >When I recently had to re-run the code on some old data I got a lot of >errors from PROC FORMAT about overlapping ranges etc. That data hadn't >changed, my code hadn't changed, the macro hadn't changed, so what had >happened? > >When I ran the code on my PC (using an "old" version of SAS 8.2) >everything was OK, but when I ran it on the production Unix server the >code crashed. > >Log from my PC: > >1 data test; >2 s = 1; >3 do p =1 to 10; >4 output; >5 end; >6 run; > >NOTE: The data set WORK.TEST has 10 observations and 2 variables. >NOTE: DATA statement used: > real time 0.45 seconds > cpu time 0.04 seconds >7 >8 proc sql feedback _method; >9 create table testout as >10 select >11 count(*) as l >12 , s as x >13 , s as y >14 from test >15 group by s >16 ; >NOTE: Statement transforms to: > > select COUNT(*) as l, TEST.s as x, TEST.s as y > from WORK.TEST > group by TEST.s; > >NOTE: SQL execution methods chosen are: > > sqxcrta > sqxsumg > sqxsort > sqxsrc( WORK.TEST ) >NOTE: Table WORK.TESTOUT created, with 1 rows and 3 columns. > >17 quit; >NOTE: PROCEDURE SQL used: > real time 0.26 seconds > cpu time 0.09 seconds > >One record is output to TESTOUT (as I expected) and the transformed code >looks like what I would have expected as well, just making the >references more explicit. > >When the same code is run on Unix (Still SAS 8.2, but with more recent >hotfixes) the following is in the log : > >8 proc sql feedback _method; >9 create table testout as >10 select >11 count(*) as l >12 , s as x >13 , s as y >14 from test >15 group by s >16 ; >NOTE: Statement transforms to: > > select COUNT(*) as l, TEST.S as x, TEST.S as y > from WORK.TEST > group by x; > >NOTE: The query requires remerging summary statistics back with the >original data. > >NOTE: SQL execution methods chosen are: > > sqxcrta > sqxsumg > sqxsort > sqxsrc( WORK.TEST ) >NOTE: Table WORK.TESTOUT created, with 10 rows and 3 columns. > >TEN records in TESTOUT and the transformed code is now being grouped by >X instead of TEST.S. I didn't say group by X, I didn't think I >could group by a plain X, because it wasn't in the input dataset. So >grouping by X also gives the new NOTE in the log " NOTE: The query >requires remerging summary statistics back with the original data." >which would reasonable if I was grouping by X, but I wasn't. > >I tried SAS 9.1.3 SP4 on Windows XP and got slightly different results. > >23 proc sql feedback; >24 create table testout as >25 select >26 count(*) as l >27 , s as x >28 , s as y >29 from test >30 group by s >31 ; >NOTE: Statement transforms to: > > select COUNT(*) as l, TEST.s as x, TEST.s as y > from WORK.TEST > group by TEST.x; > >NOTE: The query requires remerging summary statistics back with the >original data. >NOTE: Table WORK.TESTOUT created, with 10 rows and 3 columns. > >Still TEN records, but now it is grouping by TEST.X. There isn't a >TEST.X, Only TEST.S and TEST.P. > >There are lots of ways to get PROC SQL to return the desired result, but >some hotfix (I don't know which one) changed the way PROC SQL worked and >also changed my view on how PROC SQL works. > >++ Guido


Back to: Top of message | Previous page | Main SAS-L page