**Date:** Thu, 22 May 1997 10:30:03 PDT
**Reply-To:** bill@HOPI.CVG.BAXTER.COM
**Sender:** "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
**From:** William Anderson <bill@HOPI.CVG.BAXTER.COM>
**Subject:** Re: PROC SQL Equivalent of First.Variable
**In-Reply-To:** <c=US%a=_%p=BLS%l=OCWCMAIL-970522153138Z-16472@ocwcmail.PSB.BLS.GOV>; from "Toland_G" at May 22,
97 11:31 am
It is an axiom of relational database theory that the physical order of observations does not matter. (Check the latest edition of Date, or your alternate favorite book.) It follows that a relational language should not even allow you to have a first. operator. Thus one should not be surprised if SQL doesn't have such an operator; the surprise would be if the converse were true. In this respect PROC MERGE is not relational; for me this is sufficient reason to avoid PROC MERGE.

Of course in the real world we can pick and choose what theoretical model we like, but at least the standard axiomatic situation should lead one to question whether or not the desired query is even meaningful.

On the other hand, if the desire is not to pick the first record of each "x" group, but rather the one that corresponds to the smallest y, then the query is within the bounds of relational theory. SQL provides at least two methods of doing it.

You can select the smallest value of y, and then select the corresponding record. You can use views or subqueries, or a combination. I prefer the view approach because I find it easier to generate personal confidence in complicated queries (i.e. debugging is easier.)

For example

PROC SQL;
Create View Smally as select min(y) as miny from data2;
Create View Small2 as select data2.* from data2, smally where data2.y = smally.miny;
Select data1.x, data1.status, small2.y from data1, small2 where data1.x = small2.x;

***************************************************************
* William N. Anderson, PhD *
* Manager, Biostatistics Phone 714-250-2202 *
* Baxter Healthcare Fax 714-250-5080 *
* Cardiovascular Group EMail andersb@cvg.baxter.com *
* 17221 Red Hill Avenue *
* Irvine CA 92714 *
***************************************************************

>
> I have the following two data sets:
>
> DATA1
> --------
> x = 1 status = "C"
>
> DATA2
> --------
> x = 1 y = 1.000
> x = 1 y = 2.000
>
> I would like to produce DATA3 as
> x = 1 status = "C" y = 1.000
>
> Basically, I want to merge the first record of each "x" group in DATA2
> onto DATA1 using PROC SQL.
>
> If I use the code:
>
> proc sql;
> select a.x, a.status, b.y
> from data1 a, ( select distinct x, y from data2 ) b
> where a.x = b.x;
> quit;
>
> then my results are
>
> DATA3
> x = 1 status = "C" y = 1
> x = 1 status = "C" y = 2
>
> How do I just take the first record of each x value in DATA2?
> Gregory J Toland
> Bureau of Labor Statistics
> Office of Compensation & Working Conditions
> Statistical Methods Group
> 202-606-6133
>
> GO REDSOX CELTICS BRUINS PATRIOTS!!!
>