| Date: | Wed, 29 Jul 1998 17:56:12 -0400 |
| Reply-To: | HERMANS1 <HERMANS1@WESTAT.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU> |
| From: | HERMANS1 <HERMANS1@WESTAT.COM> |
| Subject: | Re[2]: Data reshaping/rearranging question: job for SQL? |
|
| Content-Type: | text/plain; charset=US-ASCII |
While we are waiting for a response from the SQL expert, let me
suggest that you have probably reduced the solution to an optimal
form. Michael has asked in essence for a Cartesian product of the two
subsets _NAME_ ^ _TYPE_="OBS" and _NAME_ ^ _TYPE_="VAR". Your program
initially defines those sets as sources for a Cartesian product
(evidenced by the absence of a WHERE or ON restriction on the main
SELECT statement). The SAS log message merely tells you that a
Cartesian product cannot be optimized. That seems obvious in this
case.
While you would like to avoid creating a Cartesian product first and then
subsetting it, in this case you actually want the Cartesian product as a final
result. You have already partitioned the table in the OBS and VAR subsets
before joining them into a Cartesian product. That should prove sufficient to
optimize a query that links each element of two sets. Although I may in the
past have described SAS SQL in such glowing terms that it sounds like magic, it
simply provides a convenient way to express ordinary set logic.
To make it easier to extend your SQL solution, I would suggest that you replace
the .. starting (WHERE=(_TYPE_= ...) with ..(SELECT DISTINCT * FROM
starting)... in-line views to guard against duplicate rows of values that could
play havoc with a Cartesian product. With that minor exception aside, it looks
like model SQL code to me. Sig
______________________________ Thread _________________________________
Subject: Re: Data reshaping/rearranging question: job for SQL?
Author: Peter Crawford <Peter@CRAWFORDSOFTWARE.DEMON.CO.UK> at Internet-E-Mail
Date: 7/28/98 9:45 PM
In article <6pl911$6qj$1@sunburst.ccs.yorku.ca>, Michael Friendly
<friendly@hotspur.psych.yorku.ca> writes
>I have a data set which looks like this:
>
> _TYPE_ _NAME_ DIM1 DIM2
>
> OBS High 0.18093 0.019248
> OBS Med 0.18500 0.011625
> OBS Low -0.28769 0.061994
> VAR WELL 0.25954 -0.012102
> VAR MODERATE -0.01421 0.069901
> VAR IMPAIRED -0.23739 -0.018897
>
>I need to rearrange so that all pairs of OBS-VARs become the
>observations of the new data set, with the values of DIM1 as X1 and Y1.
>That is, the M + N input observations become M * N output observations,
>with X1, Y1 as the value of DIM1 for each observation
>
> OBS VAR X1 Y1
>
> High WELL 0.18093 0.25954
> High MODERATE 0.18093 -0.01421
> High IMPAIRED 0.18093 -0.23739
> Med WELL 0.18500 0.25954
> Med MODERATE 0.18500 -0.01421
> Med IMPAIRED 0.18500 -0.23739
> Low WELL -0.28769 0.25954
> Low MODERATE -0.28769 -0.01421
> Low IMPAIRED -0.28769 -0.23739
>
>Seems like there's an easier way to do this with SQL than with data
>step manipulations. Can anyone show me how?
>
>--
>Michael Friendly Internet: friendly@hotspur.psych.yorku.ca (NeXTmail OK)
>Psychology Dept
>York University Voice: 416 736-5115 Fax: 416 736-5814
>4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
>Toronto, ONT M3J 1P3 CANADA
As I prefer to avoid cartesian products,
this is a challenge to my rusty sql .... so I thought I'ld push my $.02
create table joint as
select a._name_ as obs
,b._name_ as var
,a.dim1 as x1
,b.dim1 as y1
,a.dim2 as x2
,b.dim2 as y2
from starting( where=( _type_ ='OBS' )) as a
,starting( where=( _type_ ='VAR' )) as b
;
SHOCK...
it worked
+FSVIEW: WORK.JOINT (B)---------------------------------------------+
| OBS OBS VAR X1 Y1 X2 Y2 |
| |
| 1 High WELL 0.18093 0.25954 0.019248 -0.012102 |
| 2 High MODERATE 0.18093 -0.01421 0.019248 0.069901 |
| 3 High IMPAIRED 0.18093 -0.23739 0.019248 -0.018897 |
| 4 Med WELL 0.185 0.25954 0.011625 -0.012102 |
| 5 Med MODERATE 0.185 -0.01421 0.011625 0.069901 |
| 6 Med IMPAIRED 0.185 -0.23739 0.011625 -0.018897 |
| 7 Low WELL -0.28769 0.25954 0.061994 -0.012102 |
| 8 Low MODERATE -0.28769 -0.01421 0.061994 0.069901 |
| 9 Low IMPAIRED -0.28769 -0.23739 0.061994 -0.018897 |
| |
+--------------------------------------------------------------------+
the log...........
40 data starting;
41 input _type_ $ _name_ $ dim1 dim2;
42 /*
43 length _TYPE_ $3 _NAME_ DIM1 DIM2
44 */
45 list;cards;
RULE:----+----1----+----2----+----3----+----4----+----5----+----6----+-
46 OBS High 0.18093 0.019248
47 OBS Med 0.18500 0.011625
48 OBS Low -0.28769 0.061994
49 VAR WELL 0.25954 -0.012102
50 VAR MODERATE -0.01421 0.069901
51 VAR IMPAIRED -0.23739 -0.018897
NOTE: The data set WORK.STARTING has 6 observations and 4 variables.
NOTE: The DATA statement used 0.16 seconds.
52 ;
53 /* target output
54 OBS VAR X1 Y1
55 High WELL 0.18093 0.25954
56 High MODERATE 0.18093 -0.01421
57 High IMPAIRED 0.18093 -0.23739
58 Med WELL 0.18500 0.25954
59 Med MODERATE 0.18500 -0.01421
60 Med IMPAIRED 0.18500 -0.23739
61 Low WELL -0.28769 0.25954
62 Low MODERATE -0.28769 -0.01421
63 Low IMPAIRED -0.28769 -0.23739
64 ;
65
66 Feeling just a bit hesitant here, so this shouldn't be exposed
67 */
68 proc sql;
69 create table joint as
70 select a._name_ as obs
71 ,b._name_ as var
72 ,a.dim1 as x1
73 ,b.dim1 as y1
74 ,a.dim2 as x2
75 ,b.dim2 as y2
76 from starting( where=( _type_ ='OBS' )) as a
77 ,starting( where=( _type_ ='VAR' )) as b
78 ;
NOTE: The execution of this query involves performing one or more
Cartesian product joins that can not be optimized.
NOTE: Table WORK.JOINT created, with 9 rows and 6 columns.
Now I've exposed my sql weakness,
where is the sql specialist who produces simple sql
that can be optimized ?
(I'm not sure a datastep would be much more complex than that sql )
--
Peter Crawford
|