| Date: | Fri, 23 Apr 2010 14:29:31 -0500 |
| Reply-To: | "White, Svend A." <SvendW@HEALTH.OK.GOV> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | "White, Svend A." <SvendW@HEALTH.OK.GOV> |
| Subject: | Re: dynamic data suppression |
| In-Reply-To: | <16FD64291482A34F995D2AF14A5C932C08BE54F1@MAIL002.prod.ds.russell.com> |
| Content-Type: | TEXT/plain; charset="us-ascii" |
Thanks once again, Mark.
This looks great.
Svend
> -----Original Message-----
> From: Terjeson, Mark [mailto:Mterjeson@russell.com]
> Sent: Friday, April 23, 2010 10:48 AM
> To: White, Svend A.; SAS-L@LISTSERV.UGA.EDU
> Subject: RE: Re: dynamic data suppression
>
> Hi Svend,
>
> You are right the result1 suppressed
> the row not just the value. However,
> since we have the logic that "flags"
> the condition we want, instead of
> using the HAVING upon that flag to
> chop out rows, then we'll just move
> the logic up into the SELECT clause
> and use that flag with a CASE statement
> which is akin to your datastep IF
> statement and use that flag to set the
> values to missing where needed.
> see result2
>
>
> data sample;
> input id $ num;
> cards;
> A 10
> B 4
> C 31
> D 2
> E 9
> F 5
> G 12
> H 7
> I 18
> J 23
> K 14
> L 6
> M 15
> ;
> run;
>
>
> ***********************************;
> * drop rows less than threshold *
> * *
> * also drop next closest value *
> * if only one at threshold level *
> ***********************************;
> proc sql;
> create table result1 as
> select *
> from sample
> having num ge &threshold
> and num ne
> (select (sum(num lt &threshold) eq 1)
> *(select min(num)
> from sample
> where num ge &threshold)
> from sample)
> ;
> quit;
>
>
> * merely suppress those under threshold ;
> proc sql;
> create table result2 as
> select id,
> case when (num ge &threshold
> and num ne
> (select (sum(num lt &threshold) eq 1)
> *(select min(num)
> from sample
> where num ge &threshold)
> from sample))
> then num
> else .
> end as num
> from sample
> ;
> quit;
>
>
>
> Hope this is helpful.
>
>
> Mark Terjeson
> Investment Business Intelligence
> Investment Management & Research
> Russell Investments
> 253-439-2367
>
>
> Russell
> Global Leaders in Multi-Manager Investing
>
>
>
>
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> White, Svend A.
> Sent: Friday, April 23, 2010 8:29 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: dynamic data suppression
>
> Terjeson, Mark wrote:
>
> > proc sql;
> > create table result1 as
> > select *
> > from sample
> > having num ge &threshold
> > and num ne
> > (select (sum(num lt &threshold) eq 1)
> > *(select min(num)
> > from sample
> > where num ge &threshold)
> > from sample)
> > ;
> > quit;
>
> I'm working on integrating this into my report and I see that it works
> a
> little too well! This code populates my new dataset with only the
> values
> that made the cut. For example,
>
> > A 10
> > C 6
>
> Whereas I'm looking for this:
>
> > A 10
> > B .
> > C 6
> > D .
>
> I guess I should've specified that I didn't want to omit the "bad"
> records entirely from the new dataset, but simply overwrite their
> numerical values.
>
> The code I'm replacing looks something like this:
>
> > data result1;
> > set sample;
> > if num le 5 then do;
> > number =.;
> > pop_estimate = .;
> > percent = .;
> > crude_cancer_rate = .;
> > age_adjusted_rate = .;
> > end;
> > run;
>
>
> It would be straightforward enough to create an additional table with
> the excluded ones items, merge the 2 tables into a 3rd one and then
> sort
> table 3 back to the original sequence, but is this the best way to do
> it?
>
> Thanks.
>
> Svend
>
>
>
>
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > Terjeson, Mark
> > Sent: Friday, April 16, 2010 12:37 PM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: Re: dynamic data suppression
> >
> > Hi Svend,
> >
> >
> >
> > %let threshold=5;
> >
> > data sample;
> > input id $ num;
> > cards;
> > A 10
> > B 4
> > C 31
> > D 2
> > E 9
> > F 8
> > G 12
> > H 7
> > I 18
> > J 23
> > K 14
> > L 6
> > M 15
> > ;
> > run;
> >
> >
> > ***********************************;
> > * drop rows less than threshold *
> > * *
> > * also drop next closest value *
> > * if only one at threshold level *
> > ***********************************;
> > proc sql;
> > create table result1 as
> > select *
> > from sample
> > having num ge &threshold
> > and num ne
> > (select (sum(num lt &threshold) eq 1)
> > *(select min(num)
> > from sample
> > where num ge &threshold)
> > from sample)
> > ;
> > quit;
> >
> >
> >
> >
> >
> >
> > ***************************************;
> > * another approach is to drop the *
> > * bottom 4 or 5 entities regardless *
> > * of the value *
> > ***************************************;
> > proc rank data=sample out=rankings;
> > var num;
> > ranks theRank;
> > run;
> >
> > * optional sort ;
> > proc sort data=rankings;
> > by theRank;
> > run;
> >
> > * if you want to drop the ;
> > * bottom four or five ids ;
> > data result2;
> > set rankings;
> > if theRank le &threshold then delete;
> > run;
> >
> >
> > The reason it could be four, even when
> > the threshold is five is if there are
> > duplicate values for position 5 and 6.
> > i.e. a tie would be 5.5
> >
> >
> >
> >
> > Hope this is helpful.
> >
> >
> > Mark Terjeson
> > Investment Business Intelligence
> > Investment Management & Research
> > Russell Investments
> > 253-439-2367
> >
> >
> > Russell
> > Global Leaders in Multi-Manager Investing
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > White, Svend A.
> > Sent: Friday, April 16, 2010 8:37 AM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: dynamic data suppression
> >
> > I currently suppress counts within a data step that fall below a
> > minimum
> > threshold for reasons of privacy, but that occasionally results in
> > comical cases where a single number has been suppressed but all the
> > user
> > has to do is to do a little arithmetic (i.e., subtract the subtotal
> for
> > data displayed from the total provided) to get the suppressed count.
> >
> > Currently, if the threshold is 5, I suppress B and D in this one,
but
> > that breaks down if there is only one number being suppressed.
> >
> > A 10
> > B 4 *
> > C 6
> > D 2 *
> > Total 22
> >
> > I want to make the data suppression more dynamic based on the number
> of
> > below-threshold values present. If only one value falls under the
> > threshold, I want to suppress another (ideally, the next lowest)
> count.
> >
> > Is there a paper out there that explains this?
> >
> > Thanks.
> >
> > Svend
|