LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (April 2010, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Fri, 23 Apr 2010 14:29:31 -0500
Reply-To:   "White, Svend A." <SvendW@HEALTH.OK.GOV>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "White, Svend A." <SvendW@HEALTH.OK.GOV>
Subject:   Re: dynamic data suppression
In-Reply-To:   <16FD64291482A34F995D2AF14A5C932C08BE54F1@MAIL002.prod.ds.russell.com>
Content-Type:   TEXT/plain; charset="us-ascii"

Thanks once again, Mark.

This looks great.

Svend

> -----Original Message----- > From: Terjeson, Mark [mailto:Mterjeson@russell.com] > Sent: Friday, April 23, 2010 10:48 AM > To: White, Svend A.; SAS-L@LISTSERV.UGA.EDU > Subject: RE: Re: dynamic data suppression > > Hi Svend, > > You are right the result1 suppressed > the row not just the value. However, > since we have the logic that "flags" > the condition we want, instead of > using the HAVING upon that flag to > chop out rows, then we'll just move > the logic up into the SELECT clause > and use that flag with a CASE statement > which is akin to your datastep IF > statement and use that flag to set the > values to missing where needed. > see result2 > > > data sample; > input id $ num; > cards; > A 10 > B 4 > C 31 > D 2 > E 9 > F 5 > G 12 > H 7 > I 18 > J 23 > K 14 > L 6 > M 15 > ; > run; > > > ***********************************; > * drop rows less than threshold * > * * > * also drop next closest value * > * if only one at threshold level * > ***********************************; > proc sql; > create table result1 as > select * > from sample > having num ge &threshold > and num ne > (select (sum(num lt &threshold) eq 1) > *(select min(num) > from sample > where num ge &threshold) > from sample) > ; > quit; > > > * merely suppress those under threshold ; > proc sql; > create table result2 as > select id, > case when (num ge &threshold > and num ne > (select (sum(num lt &threshold) eq 1) > *(select min(num) > from sample > where num ge &threshold) > from sample)) > then num > else . > end as num > from sample > ; > quit; > > > > Hope this is helpful. > > > Mark Terjeson > Investment Business Intelligence > Investment Management & Research > Russell Investments > 253-439-2367 > > > Russell > Global Leaders in Multi-Manager Investing > > > > > > -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of > White, Svend A. > Sent: Friday, April 23, 2010 8:29 AM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Re: dynamic data suppression > > Terjeson, Mark wrote: > > > proc sql; > > create table result1 as > > select * > > from sample > > having num ge &threshold > > and num ne > > (select (sum(num lt &threshold) eq 1) > > *(select min(num) > > from sample > > where num ge &threshold) > > from sample) > > ; > > quit; > > I'm working on integrating this into my report and I see that it works > a > little too well! This code populates my new dataset with only the > values > that made the cut. For example, > > > A 10 > > C 6 > > Whereas I'm looking for this: > > > A 10 > > B . > > C 6 > > D . > > I guess I should've specified that I didn't want to omit the "bad" > records entirely from the new dataset, but simply overwrite their > numerical values. > > The code I'm replacing looks something like this: > > > data result1; > > set sample; > > if num le 5 then do; > > number =.; > > pop_estimate = .; > > percent = .; > > crude_cancer_rate = .; > > age_adjusted_rate = .; > > end; > > run; > > > It would be straightforward enough to create an additional table with > the excluded ones items, merge the 2 tables into a 3rd one and then > sort > table 3 back to the original sequence, but is this the best way to do > it? > > Thanks. > > Svend > > > > > > -----Original Message----- > > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of > > Terjeson, Mark > > Sent: Friday, April 16, 2010 12:37 PM > > To: SAS-L@LISTSERV.UGA.EDU > > Subject: Re: dynamic data suppression > > > > Hi Svend, > > > > > > > > %let threshold=5; > > > > data sample; > > input id $ num; > > cards; > > A 10 > > B 4 > > C 31 > > D 2 > > E 9 > > F 8 > > G 12 > > H 7 > > I 18 > > J 23 > > K 14 > > L 6 > > M 15 > > ; > > run; > > > > > > ***********************************; > > * drop rows less than threshold * > > * * > > * also drop next closest value * > > * if only one at threshold level * > > ***********************************; > > proc sql; > > create table result1 as > > select * > > from sample > > having num ge &threshold > > and num ne > > (select (sum(num lt &threshold) eq 1) > > *(select min(num) > > from sample > > where num ge &threshold) > > from sample) > > ; > > quit; > > > > > > > > > > > > > > ***************************************; > > * another approach is to drop the * > > * bottom 4 or 5 entities regardless * > > * of the value * > > ***************************************; > > proc rank data=sample out=rankings; > > var num; > > ranks theRank; > > run; > > > > * optional sort ; > > proc sort data=rankings; > > by theRank; > > run; > > > > * if you want to drop the ; > > * bottom four or five ids ; > > data result2; > > set rankings; > > if theRank le &threshold then delete; > > run; > > > > > > The reason it could be four, even when > > the threshold is five is if there are > > duplicate values for position 5 and 6. > > i.e. a tie would be 5.5 > > > > > > > > > > Hope this is helpful. > > > > > > Mark Terjeson > > Investment Business Intelligence > > Investment Management & Research > > Russell Investments > > 253-439-2367 > > > > > > Russell > > Global Leaders in Multi-Manager Investing > > > > > > > > > > > > -----Original Message----- > > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of > > White, Svend A. > > Sent: Friday, April 16, 2010 8:37 AM > > To: SAS-L@LISTSERV.UGA.EDU > > Subject: dynamic data suppression > > > > I currently suppress counts within a data step that fall below a > > minimum > > threshold for reasons of privacy, but that occasionally results in > > comical cases where a single number has been suppressed but all the > > user > > has to do is to do a little arithmetic (i.e., subtract the subtotal > for > > data displayed from the total provided) to get the suppressed count. > > > > Currently, if the threshold is 5, I suppress B and D in this one, but > > that breaks down if there is only one number being suppressed. > > > > A 10 > > B 4 * > > C 6 > > D 2 * > > Total 22 > > > > I want to make the data suppression more dynamic based on the number > of > > below-threshold values present. If only one value falls under the > > threshold, I want to suppress another (ideally, the next lowest) > count. > > > > Is there a paper out there that explains this? > > > > Thanks. > > > > Svend


Back to: Top of message | Previous page | Main SAS-L page