Date: Tue, 7 Feb 2006 13:52:45 -0500
Reply-To: Kateri Heydon <heydon@EMAIL.CHOP.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Kateri Heydon <heydon@EMAIL.CHOP.EDU>
Subject: Re: dividing data into quintiles
Content-Type: text/plain; charset=US-ASCII
So, when dividing into quintiles, if there are ties, then the quintiles
may be unequal in size. Is this true?
>>> "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
02/07/06 1:46 PM >>>
PROC RANK is designed to handle this task, and it is part of Base
SAS.
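
A minimal sketch of the PROC RANK approach, offered as an illustration rather than as the poster's own code. YourFile and YourVar are the placeholder names used elsewhere in this thread; the output dataset Ranked and the variable Quintile are hypothetical names chosen here. Tied values always receive the same group, so the PROC FREQ step is one way to see whether ties have made the quintiles unequal in size.

```sas
/* Sketch only: YourFile and YourVar are the thread's placeholders;
   Ranked and Quintile are hypothetical names chosen for illustration. */
proc rank data=YourFile out=Ranked groups=5 ties=high;
  var YourVar;
  ranks Quintile;      /* GROUPS=5 yields group numbers 0 through 4 */
run;

data Ranked;
  set Ranked;
  /* Shift 0-4 to 1-5; a missing YourVar leaves Quintile missing */
  if not missing(Quintile) then Quintile = Quintile + 1;
run;

/* Count the observations per quintile; ties can make the counts unequal */
proc freq data=Ranked;
  tables Quintile;
run;
```

TIES=HIGH gives tied values the largest of their ranks; TIES=LOW and TIES=MEAN are the alternatives if ties should fall the other way.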
On Tue, 7 Feb 2006 12:16:58 -0600, Jiann-Shiun Huang
<Jiann-Shiun.Huang@AMERUS.COM> wrote:
>Kateri:
>
> With Toby's reminder about possible ties, the following code takes
>ties into consideration. The dataset is first sorted in descending
>order of YourVar. All ties are assigned to the higher quintile. If you
>want to assign ties to the lower quintile, then sort in ascending
>order and modify the code accordingly. Write back if you have
>questions on this.
>
>proc sort data=YourFile out=old;
>  by descending YourVar;
>run;
>data New;
>  set old nobs=ObsCount;
>  /* _temporary_ array elements are retained automatically, */
>  /* so no RETAIN statement is needed.                      */
>  array bound(2:5) _temporary_;
>  /* Each branch stores YourVar in bound(); the last value stored is */
>  /* the lowest value in that fifth. This assumes no single tied     */
>  /* value spans more than one quintile boundary.                    */
>  select;
>    when (_N_ le 0.2*ObsCount)
>      do;
>        Quintile=5;
>        bound(5)=YourVar;
>      end;
>    when (0.2*ObsCount lt _N_ le 0.4*ObsCount)
>      do;
>        if YourVar eq bound(5)
>          then Quintile=5;
>          else Quintile=4;
>        bound(4)=YourVar;
>      end;
>    when (0.4*ObsCount lt _N_ le 0.6*ObsCount)
>      do;
>        if YourVar eq bound(4)
>          then Quintile=4;
>          else Quintile=3;
>        bound(3)=YourVar;
>      end;
>    when (0.6*ObsCount lt _N_ le 0.8*ObsCount)
>      do;
>        if YourVar eq bound(3)
>          then Quintile=3;
>          else Quintile=2;
>        bound(2)=YourVar;
>      end;
>    when (_N_ le ObsCount)
>      do;
>        if YourVar eq bound(2)
>          then Quintile=2;
>          else Quintile=1;
>      end;
>  end;
>run;
>proc print data=New;
>  var YourVar Quintile;
>run;
>
>
>
>J S Huang
>1-515-557-3987
>fax 1-515-557-2422
>
>>>> "toby dunn" <tobydunn@hotmail.com> 2/7/2006 11:32:34 AM >>>
>Jiann,
>
>What happens in your code when there is a tie, or when one value
>spans a quantile boundary? You have to put it in one quantile or the
>other, since tied values can't be split across quantiles.
>
>
>
>Toby Dunn
>
>
>
>>>> "Kateri Heydon" <heydon@email.chop.edu> 2/7/2006 11:57:49 AM >>>
>awesome. this worked. thank you!!
>Kateri
>
>>>> "Jiann-Shiun Huang" <Jiann-Shiun.Huang@amerus.com> 02/07/06 12:27
>PM >>>
> The following code should work. ObsCount holds the total number of
>observations in YourFile.
>
>data New;
>  /* Assumes YourFile is already sorted in ascending order */
>  /* of the variable of interest.                          */
>  set YourFile nobs=ObsCount;
>  select;
>    when (0.8*ObsCount lt _N_ le ObsCount) Quintile=5;
>    when (0.6*ObsCount lt _N_ le 0.8*ObsCount) Quintile=4;
>    when (0.4*ObsCount lt _N_ le 0.6*ObsCount) Quintile=3;
>    when (0.2*ObsCount lt _N_ le 0.4*ObsCount) Quintile=2;
>    otherwise Quintile=1;
>  end;
>run;
>proc print data=New;
>run;
>
>
>J S Huang
>1-515-557-3987
>fax 1-515-557-2422
>
>>>> "Kateri H. Heydon" <heydon@EMAIL.CHOP.EDU> 2/7/2006 10:57:54 AM
>>>>
>Hi--
>I'm trying to divide my data into quintiles. I have sorted the
>variable of interest (a probability score for each observation). Now
>how do I create an indicator variable 1-5 for each quintile?
>
>Any input would be great!!