LISTSERV at the University of Georgia
Date:         Sat, 16 Sep 2006 13:36:21 -0400
Reply-To:     "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject:      Re: Dynamic Looping Problem in SAS

Indeed, it would appear to be an operations research (OR) problem. Typically that means that one can design a simple algorithm which will work in theory but takes practically forever to run.
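To make that concrete, here is a minimal Python sketch (not SAS, and not anything posted in this thread) of the simple-in-theory approach: test every subset, largest first, and stop at the first one in which all pairs are far enough apart. It uses the seven sample points and the made-up squared-distance formula from the quoted message below; the names POINTS, distance, and largest_spread_subset are invented for illustration.

```python
from itertools import combinations

# Sample points from the quoted message: id -> (lat, long).
POINTS = {1: (5, 4), 2: (7, 2), 3: (8, 6), 4: (20, 7),
          5: (1, 8), 6: (3, 1), 7: (1, 9)}
CUTOFF = 10  # same cutoff as the quoted code


def distance(a, b):
    """The poster's made-up formula: sum of squared differences."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def largest_spread_subset(points, cutoff=CUTOFF):
    """Try subsets from largest to smallest; return the first valid one."""
    ids = sorted(points)
    for size in range(len(ids), 0, -1):
        for subset in combinations(ids, size):
            # A subset is valid when no pair is closer than the cutoff.
            if all(distance(points[i], points[j]) >= cutoff
                   for i, j in combinations(subset, 2)):
                return list(subset)
    return []


print(largest_spread_subset(POINTS))  # [1, 3, 4, 5, 6]
```

With n ids there are up to 2**n subsets to test, so this is fine for 7 observations and hopeless for 20,000.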

Is SAS/OR licensed?

On Fri, 15 Sep 2006 08:06:14 -0700, David Neal <afdbn@UAA.ALASKA.EDU> wrote:

>Sak,
>
>This gets a lot more complicated than you would think. Here is a
>simplified example. If you have 3 points with the distance between #1
>and #2 equal to 1 mile and the distance from #1 to #3 and from #2 to #3
>equal to 11 miles, do you delete #1 or #2 or both? This problem can
>become quite complicated and you may end up with several possible
>solutions. Since you did indicate that you had moved the
>more "desired" points to the front of the list, you might want to add a
>decision rule to the program that selects points based on their
>order. So, following that rule, you would keep #1 and delete #2. But
>what if #4 was 9 miles from #1 and 11 miles from #2 and more than 10
>miles from #3? If you delete #2, you would then end up deleting #4,
>resulting in two deletions. However, if you deleted #1 in the
>beginning, you would only lose one datapoint. If your original data is
>fairly sparse, this might not be an issue but my guess is that you
>aren't that lucky and you have to deal with fairly clustered data.
>
>David Neal
>
>
>
>----- Original Message -----
>From: sak071 <samuelkleiner@GMAIL.COM>
>Date: Friday, September 15, 2006 7:38 am
>Subject: Re: Dynamic Looping Problem in SAS
>
>> David,
>> Thanks for your reply. Yes, I do want a dataset in which each id is at
>> least 10 miles from any other. In reality, what I have done is first
>> sorted the ids so that the first ids in the dataset are the ones that
>> I most want to keep; however, the ideal methodology would give me the
>> largest set of ids such that no id is less than 10 miles from any
>> other id. Because of my less-than-stellar programming skills, I'm not
>> sure how to do this. If you know of a way, any input would be greatly
>> appreciated. Thanks!
>>
>> -sak
>>
>>
>> David Neal wrote:
>> > Sak,
>> >
>> > I think you need to clarify your request a bit. For example, do you
>> > want a dataset in which each id is at least 10 miles from any other?
>> > If so, your example doesn't quite work. Your comment in the code seems
>> > to indicate that this is your goal but the logic in your code would
>> > seem to indicate that you are looking for pairs of sequential ids that
>> > are 10 miles apart. Is that the case?
>> >
>> > David Neal
>> >
>> > sak071 wrote:
>> > > HI Everyone,
>> > > I'm trying to do something in SAS that seems like it should be fairly
>> > > straightforward but I simply can't figure out how to accomplish what I
>> > > need. Below is an entire file which takes a person through step by step
>> > > how to do what I need for an example 7-observation dataset but the real
>> > > dataset that I need to do this on is over 20,000 obs. Help!
>> > >
>> > > Thanks.
>> > >
>> > > -sak
>> > >
>> > > /*BEGINNING OF SAS FILE*/
>> > >
>> > > /*FIRST, I WILL STATE THAT MY OBJECTIVE IS TO GET THE SET OF ID NUMBERS
>> > > THAT ARE AT LEAST 10 MILES APART
>> > > FROM ANY ID NUMBER. I WILL WALK THROUGH THE PATTERN IN A 7 OBSERVATION
>> > > EXAMPLE BUT KEEP IN MIND THAT MY ACTUAL DATA IS
>> > > MORE THAN 20,000 OBSERVATIONS. */
>> > >
>> > >
>> > > /*THIS CONTAINS THE ORIGINAL DATASET*/
>> > >
>> > > DATA ORIGINAL;
>> > >   INPUT id lat long;
>> > >   CARDS;
>> > > 1 5 4
>> > > 2 7 2
>> > > 3 8 6
>> > > 4 20 7
>> > > 5 1 8
>> > > 6 3 1
>> > > 7 1 9
>> > > ;
>> > >
>> > > /*THE FINAL DATASET NEEDS TO LOOK LIKE THIS*/
>> > > DATA FINAL;
>> > >   INPUT id lat long;
>> > >   CARDS;
>> > > 1 5 4
>> > > 3 8 6
>> > > 4 20 7
>> > > 5 1 8
>> > > 6 3 1
>> > > ;
>> > >
>> > > /*THE LOGIC TO GET FROM THE ORIGINAL TO THE SAMPLE IS AS FOLLOWS:
>> > >
>> > > 1) WE START WITH OBSERVATION 1. USING THE (MADE-UP) FORMULA
>> > > DISTANCE=(LAT_J-LAT_I)^2+(LONG_J-LONG_I)^2
>> > > WE GET THE FOLLOWING DATASET CALLED ITERATION1 */
>> > >
>> > > DATA ITERATION1;
>> > >   SET ORIGINAL;
>> > >   IF ID NE 1 THEN DO;
>> > >     DISTANCE=(LAT-5)**2 + (LONG-4)**2; /*NOTE THAT THESE ARE THE LAT AND
>> > >                                          LONG FOR ID #1*/
>> > >   END;
>> > > RUN;
>> > >
>> > > /*NOTE THAT SINCE ID #2 IS 8 MILES FROM ID #1, I WANT TO DELETE ID #2
>> > > TO COMPLETE THE
>> > > ITERATION*/
>> > >
>> > > DATA ITERATION1;
>> > >   SET ITERATION1;
>> > >   IF ID NE 1 THEN DO;
>> > >     IF DISTANCE<10 THEN DELETE;
>> > >   END;
>> > > RUN;
>> > >
>> > > /*ID #1 IS NOW TAKEN CARE OF SO WE TURN OUR ATTENTION TO ID #3 USING
>> > > THE DISTANCE FORMULA.
>> > > WE GET THE DATASET CALLED ITERATION2*/
>> > >
>> > > DATA ITERATION2 (DROP=DISTANCE);
>> > >   SET ITERATION1;
>> > > RUN;
>> > >
>> > > DATA ITERATION2;
>> > >   SET ITERATION2;
>> > >   IF ID NE 1 AND ID NE 3 THEN DO;
>> > >     DISTANCE=(LAT-8)**2 + (LONG-6)**2; /*NOTE THAT THESE ARE THE LAT AND
>> > >                                          LONG FOR ID #3*/
>> > >   END;
>> > > RUN;
>> > >
>> > > /*AGAIN, WE DELETE ALL ID NUMBERS THAT ARE LESS THAN 10 MILES FROM ID
>> > > #3 (THERE ARE NONE IN THIS CASE)*/
>> > > DATA ITERATION2;
>> > >   SET ITERATION2;
>> > >   IF ID NE 1 AND ID NE 3 THEN DO;
>> > >     IF DISTANCE<10 THEN DELETE;
>> > >   END;
>> > > RUN;
>> > >
>> > >
>> > > /*ID #3 IS NOW TAKEN CARE OF SO WE TURN OUR ATTENTION TO ID #4 USING
>> > > THE DISTANCE FORMULA.
>> > > WE GET THE DATASET CALLED ITERATION3*/
>> > >
>> > > DATA ITERATION3 (DROP=DISTANCE);
>> > >   SET ITERATION2;
>> > > RUN;
>> > >
>> > > DATA ITERATION3;
>> > >   SET ITERATION3;
>> > >   IF ID NE 1 AND ID NE 3 AND ID NE 4 THEN DO;
>> > >     DISTANCE=(LAT-20)**2 + (LONG-7)**2; /*NOTE THAT THESE ARE THE LAT AND
>> > >                                           LONG FOR ID #4*/
>> > >   END;
>> > > RUN;
>> > >
>> > > /*AGAIN, WE DELETE ALL ID NUMBERS THAT ARE LESS THAN 10 MILES FROM ID
>> > > #4 (THERE ARE NONE IN THIS CASE)*/
>> > > DATA ITERATION3;
>> > >   SET ITERATION3;
>> > >   IF ID NE 1 AND ID NE 3 AND ID NE 4 THEN DO;
>> > >     IF DISTANCE<10 THEN DELETE;
>> > >   END;
>> > > RUN;
>> > >
>> > > /*ID #4 IS NOW TAKEN CARE OF SO WE TURN OUR ATTENTION TO ID #5 USING
>> > > THE DISTANCE FORMULA.
>> > > WE GET THE DATASET CALLED ITERATION4*/
>> > >
>> > > DATA ITERATION4 (DROP=DISTANCE);
>> > >   SET ITERATION3;
>> > > RUN;
>> > >
>> > > DATA ITERATION4;
>> > >   SET ITERATION4;
>> > >   IF ID NE 1 AND ID NE 3 AND ID NE 4 AND ID NE 5 THEN DO;
>> > >     DISTANCE=(LAT-1)**2 + (LONG-8)**2; /*NOTE THAT THESE ARE THE LAT AND
>> > >                                          LONG FOR ID #5*/
>> > >   END;
>> > > RUN;
>> > >
>> > > /*NOTE THAT SINCE ID #7 IS 1 MILE FROM ID #5, I WANT TO DELETE ID #7 TO
>> > > COMPLETE THE
>> > > ITERATION*/
>> > > DATA ITERATION4;
>> > >   SET ITERATION4;
>> > >   IF ID NE 1 AND ID NE 3 AND ID NE 4 AND ID NE 5 THEN DO;
>> > >     IF DISTANCE<10 THEN DELETE;
>> > >   END;
>> > > RUN;
>> > >
>> > > /*WE HAVE NOW ARRIVED AT THE FINAL DATASET WHICH IS THE SAME AS THAT
>> > > INCLUDED ABOVE*/
>> > >
>> > > DATA FINAL (DROP=DISTANCE);
>> > >   SET ITERATION4;
>> > > RUN;
>> > >
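The order-based decision rule David describes, which is also what the ITERATION1 through ITERATION4 steps do by hand, can be sketched as a single pass over the ids. This is a Python illustration rather than SAS, under stated assumptions: greedy_keep, sample_dist, and neal_dist are made-up names, and the distance between #3 and #4 in David's example is only given as "more than 10", so 12 is assumed here.

```python
def greedy_keep(ids, dist, cutoff=10):
    """Scan ids in priority order; keep an id only if it is at least
    `cutoff` away from every id kept so far."""
    kept = []
    for i in ids:
        if all(dist(i, k) >= cutoff for k in kept):
            kept.append(i)
    return kept


# Sample data from the original post: id -> (lat, long).
points = {1: (5, 4), 2: (7, 2), 3: (8, 6), 4: (20, 7),
          5: (1, 8), 6: (3, 1), 7: (1, 9)}


def sample_dist(i, j):
    """The poster's made-up formula: sum of squared differences."""
    (a, b), (c, d) = points[i], points[j]
    return (a - c) ** 2 + (b - d) ** 2


# David Neal's four-point example. The 3-4 distance is stated only as
# "more than 10"; 12 is an assumed value.
pairs = {(1, 2): 1, (1, 3): 11, (2, 3): 11,
         (1, 4): 9, (2, 4): 11, (3, 4): 12}


def neal_dist(i, j):
    return pairs[(min(i, j), max(i, j))]


print(greedy_keep(sorted(points), sample_dist))  # [1, 3, 4, 5, 6]
print(greedy_keep([1, 2, 3, 4], neal_dist))      # [1, 3]
print(greedy_keep([2, 3, 4, 1], neal_dist))      # [2, 3, 4]
```

The first call reproduces the FINAL dataset from the walkthrough. The last two show David's point: the result depends on the order, and keeping #1 first costs two points where dropping it costs one. The pass is quadratic in the worst case, so it should be workable for 20,000 ids, but it does not guarantee the largest possible set.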

