Date: Sat, 16 Sep 2006 13:36:21 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: Dynamic Looping Problem in SAS
Indeed, it would appear to be an operations research (OR) problem. Typically
that means that one can design a simple algorithm which will work in theory
but takes practically forever to run.
Is SAS/OR licensed?
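If the simpler order-based rule discussed in the quoted messages below is acceptable (keep the earliest point, drop any later point that is too close to one already kept), a single DATA step may be enough. What follows is only a sketch under stated assumptions, not a tested solution for the full data: it assumes the ORIGINAL data set and the made-up squared-distance formula from the example below, and the names KLAT, KLONG, NKEPT, TOOCLOSE and the array bound of 100000 are placeholders of mine. It does not find the largest possible set; it only reproduces the keep-in-order logic.

```sas
/* Sketch: greedy, order-based thinning in one pass.                */
/* ASSUMPTIONS: data set ORIGINAL (id, lat, long) is already sorted */
/* with the most desired ids first; the distance is the made-up     */
/* squared formula from the example; 100000 is a placeholder bound  */
/* that must be at least the number of observations.                */
data final (keep=id lat long);
   array klat[100000]  _temporary_;   /* coordinates of points kept so far */
   array klong[100000] _temporary_;
   set original;

   /* Compare the current point with every point already kept. */
   tooclose = 0;
   do k = 1 to nkept while (not tooclose);
      if (lat - klat[k])**2 + (long - klong[k])**2 < 10 then tooclose = 1;
   end;

   /* Keep the point only if no kept point is within the threshold. */
   if not tooclose then do;
      nkept + 1;               /* sum statement: NKEPT is retained, starts at 0 */
      klat[nkept]  = lat;
      klong[nkept] = long;
      output;
   end;
run;
```

On the seven-observation example this keeps ids 1, 3, 4, 5, and 6, which matches the FINAL data set. The result depends entirely on the sort order, and the number of comparisons can grow with the square of the number of observations, so this is no substitute for an OR formulation if the largest set is truly required.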
On Fri, 15 Sep 2006 08:06:14 -0700, David Neal <afdbn@UAA.ALASKA.EDU> wrote:
>Sak,
>
>This gets a lot more complicated than you would think. Here is a
>simplified example. If you have 3 points with the distance between #1
>and #2 equal to 1 mile and the distance from #1 to #3 and from #2 to #3
>equal to 11 miles, do you delete #1 or #2 or both? This problem can
>become quite complicated and you may end up with several possible
>solutions. Since you did indicate that you had moved the
>more "desired" points to the front of the list, you might want to add a
>decision rule to the program that selects points based on their
>order. So, following that rule, you would keep #1 and delete #2. But
>what if #4 was 9 miles from #1 and 11 miles from #2 and more than 10
>miles from #3? If you delete #2, you would then end up deleting #4,
>resulting in two deletions. However, if you deleted #1 in the
>beginning, you would only lose one datapoint. If your original data is
>fairly sparse, this might not be an issue but my guess is that you
>aren't that lucky and you have to deal with fairly clustered data.
>
>David Neal
>
>
>
>----- Original Message -----
>From: sak071 <samuelkleiner@GMAIL.COM>
>Date: Friday, September 15, 2006 7:38 am
>Subject: Re: Dynamic Looping Problem in SAS
>
>> David,
>> Thanks for your reply. Yes, I do want a dataset in which each id is at
>> least 10 miles from any other. In reality, what I have done is first
>> sorted the ids so that the first ids in the dataset are the ones that
>> I most want to keep; however, the ideal methodology would give me the
>> largest set of ids such that no id is less than 10 miles from any
>> other id. Because of my less-than-stellar programming skills, I'm not sure
>> how to do this. If you know of a way, any input would be greatly
>> appreciated. Thanks!
>>
>> -sak
>>
>>
>> David Neal wrote:
>> > Sak,
>> >
>> > I think you need to clarify your request a bit. For example, do you
>> > want a dataset in which each id is at least 10 miles from any other? If
>> > so, your example doesn't quite work. Your comment in the code seems to
>> > indicate that this is your goal but the logic in your code would seem to
>> > indicate that you are looking for pairs of sequential ids that are 10
>> > miles apart. Is that the case?
>> >
>> > David Neal
>> >
>> > sak071 wrote:
>> > > Hi everyone,
>> > > I'm trying to do something in SAS that seems like it should be fairly
>> > > straightforward but I simply can't figure out how to accomplish what I
>> > > need. Below is an entire file which takes a person through step by step
>> > > how to do what I need for an example 7-observation dataset but the real
>> > > dataset that I need to do this on is over 20,000 obs. Help!
>> > >
>> > > Thanks.
>> > >
>> > > -sak
>> > >
>> > > /*BEGINNING OF SAS FILE*/
>> > >
>> > > /*FIRST, I WILL STATE THAT MY OBJECTIVE IS TO GET THE SET OF ID NUMBERS
>> > > THAT ARE AT LEAST 10 MILES APART FROM ANY OTHER ID NUMBER. I WILL WALK
>> > > THROUGH THE PATTERN IN A 7-OBSERVATION EXAMPLE BUT KEEP IN MIND THAT MY
>> > > ACTUAL DATA IS MORE THAN 20,000 OBSERVATIONS. */
>> > >
>> > >
>> > > /*THIS CONTAINS THE ORIGINAL DATASET*/
>> > >
>> > > DATA ORIGINAL;
>> > > INPUT id lat long;
>> > > CARDS;
>> > > 1 5 4
>> > > 2 7 2
>> > > 3 8 6
>> > > 4 20 7
>> > > 5 1 8
>> > > 6 3 1
>> > > 7 1 9
>> > > ;
>> > >
>> > > /*THE FINAL DATASET NEEDS TO LOOK LIKE THIS*/
>> > > DATA FINAL;
>> > > INPUT id lat long;
>> > > CARDS;
>> > > 1 5 4
>> > > 3 8 6
>> > > 4 20 7
>> > > 5 1 8
>> > > 6 3 1
>> > > ;
>> > >
>> > > /*THE LOGIC TO GET FROM THE ORIGINAL TO THE FINAL DATASET IS AS FOLLOWS:
>> > >
>> > > 1) WE START WITH OBSERVATION 1. USING THE (MADE-UP) FORMULA
>> > > DISTANCE=(LAT_J-LAT_I)^2+(LONG_J-LONG_I)^2
>> > > WE GET THE FOLLOWING DATASET CALLED ITERATION1 */
>> > >
>> > > DATA ITERATION1;
>> > > SET ORIGINAL;
>> > > IF ID NE 1 THEN DO;
>> > > DISTANCE=(LAT-5)**2 + (LONG-4)**2;
>> > > /*NOTE THAT THESE ARE THE LAT AND LONG FOR ID #1*/
>> > > END;
>> > > RUN;
>> > >
>> > > /*NOTE THAT SINCE ID #2 IS 8 MILES FROM ID #1, I WANT TO DELETE ID #2
>> > > TO COMPLETE THE ITERATION*/
>> > >
>> > > DATA ITERATION1;
>> > > SET ITERATION1;
>> > > IF ID NE 1 THEN DO;
>> > > IF DISTANCE<10 THEN DELETE;
>> > > END;
>> > > RUN;
>> > >
>> > > /*ID #1 IS NOW TAKEN CARE OF SO WE TURN OUR ATTENTION TO ID #3 USING
>> > > THE DISTANCE FORMULA.
>> > > WE GET THE DATASET CALLED ITERATION2*/
>> > >
>> > > DATA ITERATION2 (DROP=DISTANCE);
>> > > SET ITERATION1;
>> > > RUN;
>> > >
>> > > DATA ITERATION2;
>> > > SET ITERATION2;
>> > > IF ID NE 1 AND ID NE 3 THEN DO;
>> > > DISTANCE=(LAT-8)**2 + (LONG-6)**2;
>> > > /*NOTE THAT THESE ARE THE LAT AND LONG FOR ID #3*/
>> > > END;
>> > > RUN;
>> > >
>> > > /*AGAIN, WE DELETE ALL ID NUMBERS THAT ARE LESS THAN 10 MILES FROM ID
>> > > #3 (THERE ARE NONE IN THIS CASE)*/
>> > > DATA ITERATION2;
>> > > SET ITERATION2;
>> > > IF ID NE 1 AND ID NE 3 THEN DO;
>> > > IF DISTANCE<10 THEN DELETE;
>> > > END;
>> > > RUN;
>> > >
>> > >
>> > > /*ID #3 IS NOW TAKEN CARE OF SO WE TURN OUR ATTENTION TO ID #4 USING
>> > > THE DISTANCE FORMULA.
>> > > WE GET THE DATASET CALLED ITERATION3*/
>> > >
>> > > DATA ITERATION3 (DROP=DISTANCE);
>> > > SET ITERATION2;
>> > > RUN;
>> > >
>> > > DATA ITERATION3;
>> > > SET ITERATION3;
>> > > IF ID NE 1 AND ID NE 3 AND ID NE 4 THEN DO;
>> > > DISTANCE=(LAT-20)**2 + (LONG-7)**2;
>> > > /*NOTE THAT THESE ARE THE LAT AND LONG FOR ID #4*/
>> > > END;
>> > > RUN;
>> > >
>> > > /*AGAIN, WE DELETE ALL ID NUMBERS THAT ARE LESS THAN 10 MILES FROM ID
>> > > #4 (THERE ARE NONE IN THIS CASE)*/
>> > > DATA ITERATION3;
>> > > SET ITERATION3;
>> > > IF ID NE 1 AND ID NE 3 AND ID NE 4 THEN DO;
>> > > IF DISTANCE<10 THEN DELETE;
>> > > END;
>> > > RUN;
>> > >
>> > > /*ID #4 IS NOW TAKEN CARE OF SO WE TURN OUR ATTENTION TO ID #5 USING
>> > > THE DISTANCE FORMULA.
>> > > WE GET THE DATASET CALLED ITERATION4*/
>> > >
>> > > DATA ITERATION4 (DROP=DISTANCE);
>> > > SET ITERATION3;
>> > > RUN;
>> > >
>> > > DATA ITERATION4;
>> > > SET ITERATION4;
>> > > IF ID NE 1 AND ID NE 3 AND ID NE 4 AND ID NE 5 THEN DO;
>> > > DISTANCE=(LAT-1)**2 + (LONG-8)**2;
>> > > /*NOTE THAT THESE ARE THE LAT AND LONG FOR ID #5*/
>> > > END;
>> > > RUN;
>> > >
>> > > /*NOTE THAT SINCE ID #7 IS 1 MILE FROM ID #5, I WANT TO DELETE ID #7 TO
>> > > COMPLETE THE ITERATION*/
>> > > DATA ITERATION4;
>> > > SET ITERATION4;
>> > > IF ID NE 1 AND ID NE 3 AND ID NE 4 AND ID NE 5 THEN DO;
>> > > IF DISTANCE<10 THEN DELETE;
>> > > END;
>> > > RUN;
>> > >
>> > > /*WE HAVE NOW ARRIVED AT THE FINAL DATASET WHICH IS THE SAME AS THAT
>> > > INCLUDED ABOVE*/
>> > >
>> > > DATA FINAL (DROP=DISTANCE);
>> > > SET ITERATION4;
>> > > RUN;
>> > >
>>