LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (September 2003, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 3 Sep 2003 13:16:28 -0400
Reply-To:     Glenn Heagerty <gheagerty@ADELPHIA.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Glenn Heagerty <gheagerty@ADELPHIA.NET>
Organization: Consumer Solutions, Inc.
Subject:      Re: An efficiency question
In-Reply-To:  <BAY1-F117AagmQ5WTaY00014850@hotmail.com>
Content-Type: text/plain; charset=us-ascii; format=flowed

Hi all,

I really should be working on a project, but since this came across last night I've been hooked.

I generated a test data set as follows:

options compress=no;

data test(drop=i j); array vars{30} $4 var1-var30; do i=1 to 3000000; do j=1 to 30; vars{j} = put(min(10000*round(ranuni(13), 0.0001), 9999), z4.); end; output; end; run;

I don't know how many unique subcategory codes Chris has, but the test data set should suffice. The bummer is that despite all the different ways of setting the flags (double loops, using OR, using index, using indexw, using direct addressing, making assumptions about the categories, formats, etc.) nothing beat the original code's processing time. Is this a feature of the test data set? I don't know. Maybe using call peek would help, but I'm not familiar with this function.

Now based on additional information provided by Chris and everyone's suggestions I think we can quantitify the problem as follows:

1) Chris has 30 input variables containing subcategory codes 2) Chris needs to set over 2500 indicator variables 3) Chris has lots of subcategory codes - at least 2,500 maybe 4) Subcategory codes can belong to one or more categories

It seems like there is no other choice but to check each variable against the list of subcategory codes for a given category and subcategory, any additional statements added to check to see if the indicator variable has already been set just adds to the time, and that it comes down to defining at least one array for the input variables. So the task is to enter the subcategory code lists most efficiently and use within a category (pick one):

* if-then-else statements * select statements * formats

As Paul Choate suggested, if the frequency of subcategory codes is skewed towards a few codes or categories have a dependence on other categories, than that would drive the order of the lists.

Well, thanks for following along. Hopefully, it made some sense.

Glenn

Christopher Sadler wrote: > Many thanks to everyone for the responses, but I made an embarrassing > mistake in the example. That second line should have been 'else if' as > I am > already using else statements. This, of course, does improve the speed. > However, I did a terrible job of providing an example of the true nature of > why I'm looking for more ways to improve efficiency. The volume of > comparisions is huge, as there are over 50 categories (more to come) and > all > their respective subcategories that need to be evaluated (again, this is to > be run on datasets of at least 3 million records). Some categories > overlap > and some will end up being blank, having no value for any of the flag > variables (so I think adding an 'until' clause as Paul suggested is not > really an option). I was hoping there might be some means to eliminate > the > loop and evaluate the entire array against an 'in' at the same time (?) > > Again, many thanks, > Chris > > A better example is this: > > data new; > set old; > array _vars_(30) var1-var30; > > do i = 1 to 30; > if _vars_(i) in ('3244','5664','6756') then category_a_1=1; > else if _vars_(i) in ('6546','4433') then category_a_2=1; > continue with about 50 more 'else if' statements for category a's > subcategories......... > > if _vars_(i) in ('3423','5533,'5543') then category_b_1=1 > else if _vars_(i) in ('5344','3244','6443') then category_b_2=1; > continue with about 50 more 'else if' statements for category b's > subcategories..... > > continue with about 50 more categories....... > end; > > run; > > > > >> From: "Paul Dorfman" <paul_dorfman@hotmail.com> >> Reply-To: sashole@bellsouth.net >> To: cjsadler@HOTMAIL.COM, SAS-L@LISTSERV.UGA.EDU >> Subject: Re: An efficiency question >> Date: Wed, 03 Sep 2003 06:16:08 +0000 >> >> Chris, >> >> You are searching sequentially but not stopping short when the >> condition is >> satisfied and keep going all the way to i=30. This should improve the >> loop >> performance about twice *on the average*: >> >> do i = 1 to 30 until ( flag1 and flag2 ) ; >> if _vars_(i) in ('3244','5664','6756') then flag1=1; >> if _vars_(i) in ('6546','4433') then flag2=1; >> end; >> >> Kind regards, >> ================= >> Paul M. Dorfman >> Jacksonville, FL >> ================= >> > > _________________________________________________________________ > MSN 8: Get 6 months for $9.95/month http://join.msn.com/?page=dept/dialup >


Back to: Top of message | Previous page | Main SAS-L page