Date: Tue, 19 Dec 2000 16:22:58 -0500
Reply-To: "Bross, Dean S" <dean.bross@MAIL.VA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Bross, Dean S" <dean.bross@MAIL.VA.GOV>
Subject: Re: Optimum SUM fitting
Content-Type: text/plain; charset="iso-8859-1"
I took several days off, and haven't run SAS during this
entire time, so that led me to work through the problem
you brought up.
I suggest you approach this problem by first building a
file with 470 records containing the final answer to
the combination problem for each value of SUM 1-470.
Then combine this file with the big file of 1 million
sum records, which you can do by using SQL or by sorting
and merging.
To create the answer file, I have started by reading in the
60 terms, and giving each one a serial number. I'm assuming
you can't use a term over again in a combination unless
there is another term with the same value.
1 data a ;
2 input t @@ ;
3 s=_n_ ;
4 cards ;
NOTE: SAS went to a new line when INPUT statement reached past the end of a
line.
NOTE: The data set WORK.A has 60 observations and 2 variables.
NOTE: DATA statement used:
real time 1.42 seconds
11 proc sort data=a ; by t s ;
12 run ;
NOTE: There were 60 observations read from the dataset WORK.A.
NOTE: The data set WORK.A has 60 observations and 2 variables.
NOTE: PROCEDURE SORT used:
real time 0.27 seconds
Then create a file of all SUMS you can reach using only one
term. You can select arbitrarily when there are multiple terms
with the same value. Also keep track of the term number that
has been used.
13 data a1 ; set a ; by t ;
14 if first.t ;
15 p1=t ;
16 fsum=t ;
17 u1=s ;
18 keep p1 fsum u1 ;
NOTE: There were 60 observations read from the dataset WORK.A.
NOTE: The data set WORK.A1 has 44 observations and 3 variables.
NOTE: DATA statement used:
real time 0.17 seconds
In this case there are 44 unique values of FSUM that can be
contructed using combinations of just one term. Now construct all
FSUM values that can be reached using combinations of 2 terms, but
not with only a single term. You can generate this list in SQL,
and assume that the second term selected is less than or equal to
the first term and arbitrarily select one combination for any
FSUM value if there are more than one possible combinations.
19 proc sql ;
20 create table x as
21 select p1,t as p2,p1+t as fsum,u1,s as u2
22 from a1,a
23 where p1 ge t and u1 ne s and
24 p1+t not in (select fsum from a1)
25 order by fsum ;
NOTE: The execution of this query involves performing one or more Cartesian
product joins that
can not be optimized.
NOTE: Table WORK.X created, with 913 rows and 5 columns.
NOTE: PROCEDURE SQL used:
real time 1.70 seconds
26 data a2 ; set x ; by fsum ;
27 if first.fsum ;
28 run ;
NOTE: There were 913 observations read from the dataset WORK.X.
NOTE: The data set WORK.A2 has 107 observations and 5 variables.
NOTE: DATA statement used:
real time 0.11 seconds
There were 107 unique values that could only be reached by 2
way combinations.
Now form all combinations of 3 terms that reach a new FSUM value
that could not be reached by fewer terms. You can do this by
starting with all 2 term combinations with a cartesian product
with all single terms. You can also assume that the third term
is less than or equal to the second term, and you have to
check that you haven't used a term twice in a 3 way combination.
Again, select arbitrarily if there is more than one 3 way
combination that can reach a new FSUM value.
29 proc sql ;
30 create table x as
31 select p1,p2,t as p3,p1+p2+t as fsum,u1,u2,s as u3
32 from a2,a
33 where p2 ge t and u1 ne s and u2 ne s and
34 p1+p2+t not in
35 (select fsum from a1 union select fsum from a2)
36 order by fsum ;
NOTE: The execution of this query involves performing one or more Cartesian
product joins that
can not be optimized.
NOTE: Table WORK.X created, with 1160 rows and 7 columns.
NOTE: PROCEDURE SQL used:
real time 0.76 seconds
37 data a3 ; set x ; by fsum ;
38 if first.fsum ;
39 run ;
NOTE: There were 1160 observations read from the dataset WORK.X.
NOTE: The data set WORK.A3 has 88 observations and 7 variables.
NOTE: DATA statement used:
real time 0.05 seconds
There were 88 unique FSUM values that need a 3 way combination.
Now keep doing this process to form all 4 way combinations by
crossing the 3 way combinations with the list of terms, then
5 way combinations can be built from 4 way combinations and the
list of terms then 6 way combinations building on the 5 way
combinations. At each step, only new values of FSUM can be added,
and check that you have not used a term more than once. Also
select arbitrarily if there is more than one way to reach a new FSUM.
40 proc sql ;
41 create table x as
42 select p1,p2,p3,t as p4,p1+p2+p3+t as fsum,u1,u2,u3,s as u4
43 from a3,a
44 where p3 ge t and u1 ne s and u2 ne s and u3 ne s and
45 p1+p2+p3+t not in
46 (select fsum from a1 union select fsum from a2 union
47 select fsum from a3)
48 order by fsum ;
NOTE: The execution of this query involves performing one or more Cartesian
product joins that
can not be optimized.
NOTE: Table WORK.X created, with 1062 rows and 9 columns.
NOTE: PROCEDURE SQL used:
real time 0.86 seconds
49 data a4 ; set x ; by fsum ;
50 if first.fsum ;
51 run ;
NOTE: There were 1062 observations read from the dataset WORK.X.
NOTE: The data set WORK.A4 has 78 observations and 9 variables.
NOTE: DATA statement used:
real time 0.11 seconds
52 proc sql ;
53 create table x as
54 select p1,p2,p3,p4,t as p5,p1+p2+p3+p4+t as fsum,u1,u2,u3,u4,s as u5
55 from a4,a
56 where p4 ge t and u1 ne s and u2 ne s and u3 ne s and u4 ne s and
57 p1+p2+p3+p4+t not in
58 (select fsum from a1 union select fsum from a2 union
59 select fsum from a3 union select fsum from a4)
60 order by fsum ;
NOTE: The execution of this query involves performing one or more Cartesian
product joins that
can not be optimized.
NOTE: Table WORK.X created, with 988 rows and 11 columns.
NOTE: PROCEDURE SQL used:
real time 0.87 seconds
61 data a5 ; set x ; by fsum ;
62 if first.fsum ;
63 run ;
NOTE: There were 988 observations read from the dataset WORK.X.
NOTE: The data set WORK.A5 has 71 observations and 11 variables.
NOTE: DATA statement used:
real time 0.11 seconds
64 proc sql ;
65 create table x as
66 select p1,p2,p3,p4,p5,t as p6,p1+p2+p3+p4+p5+t as fsum,
67 u1,u2,u3,u4,u5,s as u6
68 from a5,a
69 where p5 ge t and u1 ne s and u2 ne s and u3 ne s and u4 ne s and
70 u5 ne s and p1+p2+p3+p4+p5+t not in
71 (select fsum from a1 union select fsum from a2 union
72 select fsum from a3 union select fsum from a4 union
73 select fsum from a5)
74 order by fsum ;
NOTE: The execution of this query involves performing one or more Cartesian
product joins that
can not be optimized.
NOTE: Table WORK.X created, with 918 rows and 13 columns.
NOTE: PROCEDURE SQL used:
real time 0.88 seconds
75 data a6 ; set x ; by fsum ;
76 if first.fsum ;
77 run ;
NOTE: There were 918 observations read from the dataset WORK.X.
NOTE: The data set WORK.A6 has 65 observations and 13 variables.
NOTE: DATA statement used:
real time 0.10 seconds
So there were 78 71 and 65 new FSUM values that require 4 way
5 way and 6 way combinations.
Now put them together and see what you get.
78 data aa ; set a1 a2 a3 a4 a5 a6 ; by fsum ;
79 sum=fsum ;
80 keep p1-p6 u1-u6 fsum sum ;
81 run ;
NOTE: There were 44 observations read from the dataset WORK.A1.
NOTE: There were 107 observations read from the dataset WORK.A2.
NOTE: There were 88 observations read from the dataset WORK.A3.
NOTE: There were 78 observations read from the dataset WORK.A4.
NOTE: There were 71 observations read from the dataset WORK.A5.
NOTE: There were 65 observations read from the dataset WORK.A6.
NOTE: The data set WORK.AA has 453 observations and 14 variables.
NOTE: DATA statement used:
real time 0.26 seconds
82 proc print data=aa ;
83 var fsum p1-p6 u1-u6 ;
84 run ;
NOTE: There were 453 observations read from the dataset WORK.AA.
NOTE: PROCEDURE PRINT used:
real time 0.75 seconds
Altogether, 453 values of FSUM can be reached with up to 6 way
combinations of these terms. Now to fill in the missing FSUM
numbers, I used a merge with a data set containing only the
numbers from 1-470.
85 data x ;
86 do sum=1 to 470 ;
87 output ;
88 end ;
89 run ;
NOTE: The data set WORK.X has 470 observations and 1 variables.
NOTE: DATA statement used:
real time 0.04 seconds
90 data final ; merge x aa (in=yy) ; by sum ;
91 retain a1-a6 b1-b6 f ;
92 if yy then do ;
93 a1=p1 ; a2=p2 ; a3=p3 ; a4=p4 ; a5=p5 ; a6=p6 ;
94 b1=u1 ; b2=u2 ; b3=u3 ; b4=u4 ; b5=u5 ; b6=u6 ;
95 f=fsum ;
96 fsdiff=sum-fsum ;
97 end ;
98 else do ;
99 p1=a1 ; p2=a2 ; p3=a3 ; p4=a4 ; p5=a5 ; p6=a6 ;
100 u1=b1 ; u2=b2 ; u3=b3 ; u4=b4 ; u5=b5 ; u6=b6 ;
101 fsum=f ;
102 fsdiff=sum-fsum ;
103 end ;
104 drop a1-a6 b1-b6 f ;
105 run ;
NOTE: There were 470 observations read from the dataset WORK.X.
NOTE: There were 453 observations read from the dataset WORK.AA.
NOTE: The data set WORK.FINAL has 470 observations and 15 variables.
NOTE: DATA statement used:
real time 0.17 seconds
It is this FINAL data set that you can use to put the combinations
together with the big data set.
Dean Bross
Department of Veterans Affairs
dean.bross@mail.va.gov
-----Original Message-----
From: Alain Marais [mailto:alainmarais@HOTMAIL.COM]
Sent: Sunday, December 17, 2000 7:30 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Optimum SUM fitting
Dear sas-l,
I'm new to the group so would you please bear with me. I'll really
appreciate an advice on one optimization problem. I have a small list of 60
integer "terms":
20 08 06 19 60 16 01 02 17 25
02 20 06 06 14 05 15 10 13 67
22 18 17 30 12 29 31 32 21 43
18 27 77 07 37 42 41 27 03 29
22 09 35 22 51 71 97 64 88 03
04 11 18 20 12 23 15 13 26 53
And I also have a big SAS data set SUMS (1 million observations). It has 1
numeric variable SUM that can vary from 1 to 470. For each SUM on this file
I have to find up to 6 terms f1-f6 that add up to SUM. The optimization rule
is: the fewer terms are used to form the SUM, the better. If not all 6 terms
are used the rest of them shall remain missing. If SUM can't be fit even by
all 6 terms the term combination adding up to largest possible FSUM < SUM
shall be used and the difference put in a variable FSDIFF. For example, if
SUM on file had values 7, 34, 308, 445, 467 the output could be like this:
f1 f2 f3 f4 f5 f6 fsum sum fsdiff
--------------------------------------------------------------
7 . . . . . 7 7 0
32 2 . . . . 34 34 0
97 88 77 43 3 . 308 308 0
97 88 77 71 60 51 444 445 1
97 88 77 71 67 64 464 467 3
(I've done them "by hand" so am not sure all terms are optimal.) I've
thought to create a data set with all (6,60) combinations and index them for
lookup but quickly understood it was not feasible because of their big
number. Has anyone done anything like it? Does anybody knwo an algorithm
with which it could be done in reasonable machine time?
Thank you very much indeed.
Alain Marais
_________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com.