```Date: Tue, 19 Dec 2000 16:22:58 -0500 Reply-To: "Bross, Dean S" Sender: "SAS(r) Discussion" From: "Bross, Dean S" Subject: Re: Optimum SUM fitting Comments: To: Alain Marais Content-Type: text/plain; charset="iso-8859-1" I took several days off, and haven't run SAS during this entire time, so that led me to work through the problem you brought up. I suggest you approach this problem by first building a file with 470 records containing the final answer to the combination problem for each value of SUM 1-470. Then combine this file with the big file of 1 million sum records, which you can do by using SQL or by sorting and merging. To create the answer file, I have started by reading in the 60 terms, and giving each one a serial number. I'm assuming you can't use a term over again in a combination unless there is another term with the same value. 1 data a ; 2 input t @@ ; 3 s=_n_ ; 4 cards ; NOTE: SAS went to a new line when INPUT statement reached past the end of a line. NOTE: The data set WORK.A has 60 observations and 2 variables. NOTE: DATA statement used: real time 1.42 seconds 11 proc sort data=a ; by t s ; 12 run ; NOTE: There were 60 observations read from the dataset WORK.A. NOTE: The data set WORK.A has 60 observations and 2 variables. NOTE: PROCEDURE SORT used: real time 0.27 seconds Then create a file of all SUMS you can reach using only one term. You can select arbitrarily when there are multiple terms with the same value. Also keep track of the term number that has been used. 13 data a1 ; set a ; by t ; 14 if first.t ; 15 p1=t ; 16 fsum=t ; 17 u1=s ; 18 keep p1 fsum u1 ; NOTE: There were 60 observations read from the dataset WORK.A. NOTE: The data set WORK.A1 has 44 observations and 3 variables. NOTE: DATA statement used: real time 0.17 seconds In this case there are 44 unique values of FSUM that can be contructed using combinations of just one term. Now construct all FSUM values that can be reached using combinations of 2 terms, but not with only a single term. You can generate this list in SQL, and assume that the second term selected is less than or equal to the first term and arbitrarily select one combination for any FSUM value if there are more than one possible combinations. 19 proc sql ; 20 create table x as 21 select p1,t as p2,p1+t as fsum,u1,s as u2 22 from a1,a 23 where p1 ge t and u1 ne s and 24 p1+t not in (select fsum from a1) 25 order by fsum ; NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized. NOTE: Table WORK.X created, with 913 rows and 5 columns. NOTE: PROCEDURE SQL used: real time 1.70 seconds 26 data a2 ; set x ; by fsum ; 27 if first.fsum ; 28 run ; NOTE: There were 913 observations read from the dataset WORK.X. NOTE: The data set WORK.A2 has 107 observations and 5 variables. NOTE: DATA statement used: real time 0.11 seconds There were 107 unique values that could only be reached by 2 way combinations. Now form all combinations of 3 terms that reach a new FSUM value that could not be reached by fewer terms. You can do this by starting with all 2 term combinations with a cartesian product with all single terms. You can also assume that the third term is less than or equal to the second term, and you have to check that you haven't used a term twice in a 3 way combination. Again, select arbitrarily if there is more than one 3 way combination that can reach a new FSUM value. 29 proc sql ; 30 create table x as 31 select p1,p2,t as p3,p1+p2+t as fsum,u1,u2,s as u3 32 from a2,a 33 where p2 ge t and u1 ne s and u2 ne s and 34 p1+p2+t not in 35 (select fsum from a1 union select fsum from a2) 36 order by fsum ; NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized. NOTE: Table WORK.X created, with 1160 rows and 7 columns. NOTE: PROCEDURE SQL used: real time 0.76 seconds 37 data a3 ; set x ; by fsum ; 38 if first.fsum ; 39 run ; NOTE: There were 1160 observations read from the dataset WORK.X. NOTE: The data set WORK.A3 has 88 observations and 7 variables. NOTE: DATA statement used: real time 0.05 seconds There were 88 unique FSUM values that need a 3 way combination. Now keep doing this process to form all 4 way combinations by crossing the 3 way combinations with the list of terms, then 5 way combinations can be built from 4 way combinations and the list of terms then 6 way combinations building on the 5 way combinations. At each step, only new values of FSUM can be added, and check that you have not used a term more than once. Also select arbitrarily if there is more than one way to reach a new FSUM. 40 proc sql ; 41 create table x as 42 select p1,p2,p3,t as p4,p1+p2+p3+t as fsum,u1,u2,u3,s as u4 43 from a3,a 44 where p3 ge t and u1 ne s and u2 ne s and u3 ne s and 45 p1+p2+p3+t not in 46 (select fsum from a1 union select fsum from a2 union 47 select fsum from a3) 48 order by fsum ; NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized. NOTE: Table WORK.X created, with 1062 rows and 9 columns. NOTE: PROCEDURE SQL used: real time 0.86 seconds 49 data a4 ; set x ; by fsum ; 50 if first.fsum ; 51 run ; NOTE: There were 1062 observations read from the dataset WORK.X. NOTE: The data set WORK.A4 has 78 observations and 9 variables. NOTE: DATA statement used: real time 0.11 seconds 52 proc sql ; 53 create table x as 54 select p1,p2,p3,p4,t as p5,p1+p2+p3+p4+t as fsum,u1,u2,u3,u4,s as u5 55 from a4,a 56 where p4 ge t and u1 ne s and u2 ne s and u3 ne s and u4 ne s and 57 p1+p2+p3+p4+t not in 58 (select fsum from a1 union select fsum from a2 union 59 select fsum from a3 union select fsum from a4) 60 order by fsum ; NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized. NOTE: Table WORK.X created, with 988 rows and 11 columns. NOTE: PROCEDURE SQL used: real time 0.87 seconds 61 data a5 ; set x ; by fsum ; 62 if first.fsum ; 63 run ; NOTE: There were 988 observations read from the dataset WORK.X. NOTE: The data set WORK.A5 has 71 observations and 11 variables. NOTE: DATA statement used: real time 0.11 seconds 64 proc sql ; 65 create table x as 66 select p1,p2,p3,p4,p5,t as p6,p1+p2+p3+p4+p5+t as fsum, 67 u1,u2,u3,u4,u5,s as u6 68 from a5,a 69 where p5 ge t and u1 ne s and u2 ne s and u3 ne s and u4 ne s and 70 u5 ne s and p1+p2+p3+p4+p5+t not in 71 (select fsum from a1 union select fsum from a2 union 72 select fsum from a3 union select fsum from a4 union 73 select fsum from a5) 74 order by fsum ; NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized. NOTE: Table WORK.X created, with 918 rows and 13 columns. NOTE: PROCEDURE SQL used: real time 0.88 seconds 75 data a6 ; set x ; by fsum ; 76 if first.fsum ; 77 run ; NOTE: There were 918 observations read from the dataset WORK.X. NOTE: The data set WORK.A6 has 65 observations and 13 variables. NOTE: DATA statement used: real time 0.10 seconds So there were 78 71 and 65 new FSUM values that require 4 way 5 way and 6 way combinations. Now put them together and see what you get. 78 data aa ; set a1 a2 a3 a4 a5 a6 ; by fsum ; 79 sum=fsum ; 80 keep p1-p6 u1-u6 fsum sum ; 81 run ; NOTE: There were 44 observations read from the dataset WORK.A1. NOTE: There were 107 observations read from the dataset WORK.A2. NOTE: There were 88 observations read from the dataset WORK.A3. NOTE: There were 78 observations read from the dataset WORK.A4. NOTE: There were 71 observations read from the dataset WORK.A5. NOTE: There were 65 observations read from the dataset WORK.A6. NOTE: The data set WORK.AA has 453 observations and 14 variables. NOTE: DATA statement used: real time 0.26 seconds 82 proc print data=aa ; 83 var fsum p1-p6 u1-u6 ; 84 run ; NOTE: There were 453 observations read from the dataset WORK.AA. NOTE: PROCEDURE PRINT used: real time 0.75 seconds Altogether, 453 values of FSUM can be reached with up to 6 way combinations of these terms. Now to fill in the missing FSUM numbers, I used a merge with a data set containing only the numbers from 1-470. 85 data x ; 86 do sum=1 to 470 ; 87 output ; 88 end ; 89 run ; NOTE: The data set WORK.X has 470 observations and 1 variables. NOTE: DATA statement used: real time 0.04 seconds 90 data final ; merge x aa (in=yy) ; by sum ; 91 retain a1-a6 b1-b6 f ; 92 if yy then do ; 93 a1=p1 ; a2=p2 ; a3=p3 ; a4=p4 ; a5=p5 ; a6=p6 ; 94 b1=u1 ; b2=u2 ; b3=u3 ; b4=u4 ; b5=u5 ; b6=u6 ; 95 f=fsum ; 96 fsdiff=sum-fsum ; 97 end ; 98 else do ; 99 p1=a1 ; p2=a2 ; p3=a3 ; p4=a4 ; p5=a5 ; p6=a6 ; 100 u1=b1 ; u2=b2 ; u3=b3 ; u4=b4 ; u5=b5 ; u6=b6 ; 101 fsum=f ; 102 fsdiff=sum-fsum ; 103 end ; 104 drop a1-a6 b1-b6 f ; 105 run ; NOTE: There were 470 observations read from the dataset WORK.X. NOTE: There were 453 observations read from the dataset WORK.AA. NOTE: The data set WORK.FINAL has 470 observations and 15 variables. NOTE: DATA statement used: real time 0.17 seconds It is this FINAL data set that you can use to put the combinations together with the big data set. Dean Bross Department of Veterans Affairs dean.bross@mail.va.gov -----Original Message----- From: Alain Marais [mailto:alainmarais@HOTMAIL.COM] Sent: Sunday, December 17, 2000 7:30 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Optimum SUM fitting Dear sas-l, I'm new to the group so would you please bear with me. I'll really appreciate an advice on one optimization problem. I have a small list of 60 integer "terms": 20 08 06 19 60 16 01 02 17 25 02 20 06 06 14 05 15 10 13 67 22 18 17 30 12 29 31 32 21 43 18 27 77 07 37 42 41 27 03 29 22 09 35 22 51 71 97 64 88 03 04 11 18 20 12 23 15 13 26 53 And I also have a big SAS data set SUMS (1 million observations). It has 1 numeric variable SUM that can vary from 1 to 470. For each SUM on this file I have to find up to 6 terms f1-f6 that add up to SUM. The optimization rule is: the fewer terms are used to form the SUM, the better. If not all 6 terms are used the rest of them shall remain missing. If SUM can't be fit even by all 6 terms the term combination adding up to largest possible FSUM < SUM shall be used and the difference put in a variable FSDIFF. For example, if SUM on file had values 7, 34, 308, 445, 467 the output could be like this: f1 f2 f3 f4 f5 f6 fsum sum fsdiff -------------------------------------------------------------- 7 . . . . . 7 7 0 32 2 . . . . 34 34 0 97 88 77 43 3 . 308 308 0 97 88 77 71 60 51 444 445 1 97 88 77 71 67 64 464 467 3 (I've done them "by hand" so am not sure all terms are optimal.) I've thought to create a data set with all (6,60) combinations and index them for lookup but quickly understood it was not feasible because of their big number. Has anyone done anything like it? Does anybody knwo an algorithm with which it could be done in reasonable machine time? Thank you very much indeed. Alain Marais _________________________________________________________________________ Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com. ```

Back to: Top of message | Previous page | Main SAS-L page