Date: Thu, 2 May 2002 16:13:12 -0700
Reply-To: Sigurd Wilson Hermansen <hermans1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Wilson Hermansen <hermans1@WESTAT.COM>
Subject: Re: An SQL question.
Content-Type: text/plain; charset=ISO-8859-1
Others have pointed you in the right direction. Let me continue a bit
Each distinct value of familyid becomes a group. As a first step, the
program counts the number of rows containing workhrs *values* within
each group. Then it expands each row of that result to create a
different row for each instance of a workhrs value.
If you have
the SQL compiler first generates
in the background, and then '..remerges summary statistics back to the
original data ..' to produce this result:
familyid workhrs nworkhrs
1 3 2
1 2 2
2 10 2
2 1 2
This procedure often produces mistaken solutions, so the note serves
as a warning. In this case you probably want to sum rather than count
workhrs for each familyid. In that case, you should select familyid,
sum(workhrs) as nworkhrs. That removes the dilemma of what to do with
multiple values of workhrs per group.
If you really do need a workhr value for each group, you would
probably want to limit it to one. In earlier posts I have illustrated
how to select the minimum or maximum values (assuming no multiple
minima or maxima). It gets more complicated if you do have multiples.
firstname.lastname@example.org (Neal K Nair) wrote in message news:<vines.SGH8+LOJowA@c11wdc-0020b.acf.dhhs.gov>...
> Hi SAS-Lers
> I have an SQL program as follows:
> proc sql;
> crate table three as
> select familyid, workhrs,count(*) as nworkhrs
> from two
> group by familyid;
> [data set named 'two' contains the variables 'familyid' and 'workhrs', sorted
> by familyid and by descending order of workhrs]
> I get a NOTE in the SAS log: "The querry requires remerging summary
> statistics back with the original data".
> I do not understand the NOTE. Could someone explain it to me, please?