Date: Tue, 12 Feb 2008 14:42:35 -0500
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: SQL logic help
In-Reply-To: <5860691a-9dd8-4b99-b698-6083fc7df571@e6g2000prf.googlegroups.com>
Content-Type: text/plain; charset="us-ascii"
Reeza:
Kenneth correctly points out that SQL does not have the logical
equivalent of first.x. The first obs implies a first in a physical
sequence in a dataset. Since appropriate physical sequencing of a
dataset requires sorting, which may alter a physical sequence required
for other purposes, logic programming does not rely on (or recognize) a
physical sequence of obs in a dataset.
In SQL a database programmer relies on groups of distinct attribute
values (sets) instead of physical sequence. In a group that has distinct
key values (activity_date?), the MIN() summary function evaluates as
True when its argument becomes the earliest date of the activity_date
attribute. As a summary function, MIN() can only appear in a HAVING
clause of a GROUP BY clause.
Now if a group contains more than one instance of an activity_date and
they happen to have the earliest date value in that group, all satisfy
the condition MIN(activity_date). A SQL program will then yield for that
group more than one tuple and, possibly, more than one activity_ID. In
contrast the first.group condition arbitrarily selects the first of the
obs in the physical sequence of the dataset. Depending on specifications
of the sorting procedure, that sequence could be the order of obs in the
source dataset, or something else, and may not be reproducible unless
the programmer preserves the physical ordering of the source dataset and
sort specifications.
If one precludes duplicated activity_date values in any group, the SQL
solution
... select distinct group,activity_id,MIN(activity_date) from have
where activity=2
group by group,activity_ID
;
More than one tuple with the earliest activity_date in any group limited
to activity=2 will result in initial selection of multiple tuples. If
each has the same value of activity_id, the DISTINCT modifier will limit
the result to a single tuple.
Depending on what relations exist among the attributes activity, group,
activity_ID, and activity_date, other SQL programs could give you
different results. The first.x method doesn't require much in the way of
preconditions and key definitions, but it may only work for a particular
configuration of data and may be difficult to reproduce. The SAS Data
step solutions differ from SQL solutions in both theoretical and
practical ways. Over a number of years the logic of sets and relations
that SQL implements has become easier for me to grasp than the somewhat
more procedural logic of SAS Data steps (or equivalent scripting
languages such as PL/SQL and Transact/SQL). You'll have to decide what
course to take for yourself.
S
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of Reeza
Sent: Tuesday, February 12, 2008 12:42 PM
To: sas-l@uga.edu
Subject: SQL logic help
Hi All,
I know this isn't quite the right group, but I also know there's a lot
of SQL programmers here.
I have a simple set of logic that I can't quite get to work.
I have a list of activities, the activity date, the group and the
activity ID.
What I need to do is select by group, the last activity, the date and
corresponding activitity ID for a specific activity.
In SAS my code would be something like this
proc sort data=have;
by group activity_date;
data need;
set have;
where activity=2;
by group;
if first.group;
keep group activity_date activity_ID;
run;
I just need some help translating this to SQL, please :)
TIA,
Reeza