LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (August 2007, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 22 Aug 2007 16:15:42 -0500
Reply-To:     Mary <mlhoward@avalon.net>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Mary <mlhoward@AVALON.NET>
Subject:      Re: SAS data step compute COUNTs and Percentages
Comments: To: Tom White <tw2@MAIL.COM>
Content-Type: text/plain; charset="iso-8859-1"

Hi Tom,

I've played around with some code that would get your data in the format that you want it- before percentages. Here it is below.

Mary Howard

Research Assistant III

Dept. of Ophthalmology

Univ. of Iowa Hospitals and Clinics

data set1; informat diag $10.; input id diag $; cards; 1 heart 1 heart 1 diabetes 2 eye 3 hip 3 hip 4 diabetes 4 diabetes 4 diabetes ; run; proc print data=set1; run;

proc sql; create table set1a as select distinct id from set1;

data set2a; informat diag $10.; set set1a; diag='heart'; run;

data set2b; informat diag $10.; set set1a; diag='diabetes'; run;

data set2c; informat diag $10.; set set1a; diag='eye'; run;

data set2; set set2a set2b set2c; run; proc print data=set2 noobs; var id diag; run;

proc sql; create table set3 as select id, diag, count(diag) as count from set1 group by id,diag order by id, diag; quit; run;

proc print data=set3 noobs; run; proc sort data=set2; by id diag; proc sort data=set3; by id diag;

proc sql; create table set4 as select set2.id as id, set2.diag as diag, set3.count as count from set2 left outer join set3 on set2.id=set3.id and set2.diag=set3.diag order by set2.id,set2.diag; quit; run;

data set4a; set set4; if count=. then count=0;

proc print data=set4a noobs; var id diag count; run;

----- Original Message ----- From: "Tom White" <tw2@MAIL.COM> To: <SAS-L@LISTSERV.UGA.EDU> Sent: Wednesday, August 22, 2007 2:04 PM Subject: SAS data step compute COUNTs and Percentages

> Hello SAS-L, > > I have a data set containing about 7M records (and about 30 more fields besides the two fields, ID & CODE) I show here. > > ID is one of the fields in this data set. It stands for a patient ID. > It is a character field of length 1. > > CODE is another 5-digit character field. It stands for a medical procedure. > This field takes on thousands of values. There is, however, a subset of these > values I am interested in. This subset contains the CODE values > > NULL 59510 J2505 59400 J1745 J3490 27447 > A0431 FACIL J9355 J9310 E0784 > > Here is sample dataset containing 34 records. > > ID CODE > 1 J3490 > 1 NULL > 1 36415 > 1 88142 > 1 59510 > 2 99213 > 2 80053 > 2 J2505 > 3 FACIL > 3 J3490 > 4 99232 > 5 80053 > 5 E0784 > 5 E0784 > 5 J9310 > 6 A0431 > 7 59400 > 7 59400 > 8 NULL > 8 NULL > 8 NULL > 8 NULL > 9 27447 > 9 J9355 > 10 E0784 > 10 NULL > 10 E0784 > 10 NULL > 10 J2505 > 10 59510 > 10 NULL > 10 NULL > 10 E0784 > 10 59510 > > > I would like to create additional fields to augment the above data set for modeling. > > > ***************************************************************************************************** > I would like to say in words what I am trying to compute. > > Maybe you can find an easier way to get the results I show below. > > For each patient (ID) in my sample dataset, create new data columnns to compute > the total number of medical services (CODE) they had within any of the top medical services (CODEs) > shown below: > > NULL 59510 J2505 59400 J1745 J3490 27447 > A0431 FACIL J9355 J9310 E0784 > > The above CODEs are the medical procedures, out of many thousands of medical procedures, > I have determined to be of interest to me. > > Therefore I would like to know, given the thousands of procedures, and given my 12 proceduers shwon above, > the total number of procedures (CODEs) a patientb (ID) had within any of the 12 proceduers shown above. > > I would like to create counts and percentages. > ***************************************************************************************************** > > > > (1) I would like to know, for each ID, how many CODEs are found in > the subset of codes mentioned above (I repeat them here): > NULL 59510 J2505 59400 J1745 J3490 27447 > A0431 FACIL J9355 J9310 E0784 > > For example, create a new field, call it, say, SUM_COUNT, to count > the total number of codes per ID: > > ID CODE SUM_COUNT > 1 J3490 1 > 1 NULL 1 > 1 36415 0 > 1 88142 0 > 1 59510 1 > 2 99213 0 > 2 80053 0 > 2 J2505 1 > 3 FACIL 1 > 3 J3490 1 > 4 99232 0 > 5 80053 0 > 5 E0784 2 > 5 E0784 2 > 5 J9310 1 > 6 A0431 1 > 7 59400 2 > 7 59400 2 > 8 NULL 4 > 8 NULL 4 > 8 NULL 4 > 8 NULL 4 > 9 27447 1 > 9 J9355 1 > 10 E0784 3 > 10 NULL 4 > 10 E0784 3 > 10 NULL 4 > 10 J2505 1 > 10 59510 2 > 10 NULL 4 > 10 NULL 4 > 10 E0784 3 > 10 59510 2 > > (2) Now that I have counts, I would like to compute WITHIN ID percentages. > > For example, within ID=10, we have 10 records. Of those 10 records, we have: > > NULL=4 records or 4/10=40% > E0784=3 records or 3/10=30% > J2505=1 records or 1/10=10% > 59510=2 records or 2/10=20% > > My data set now will look like: > > ID CODE SUM_COUNT WITHIN_PCT > 1 J3490 1 0.20 > 1 NULL 1 0.20 > 1 36415 0 0.00 > 1 88142 0 0.00 > 1 59510 1 0.20 > > 2 99213 0 0.00 > 2 80053 0 0.00 > 2 J2505 1 0.33 > > 3 FACIL 1 0.50 > 3 J3490 1 0.50 > > 4 99232 0 0.00 > > 5 80053 0 0.00 > 5 E0784 2 0.50 > 5 E0784 2 0.50 > 5 J9310 1 0.25 > > 6 A0431 1 1.00 > > 7 59400 2 1.00 > 7 59400 2 1.00 > > 8 NULL 4 1.00 > 8 NULL 4 1.00 > 8 NULL 4 1.00 > 8 NULL 4 1.00 > > 9 27447 1 0.50 > 9 J9355 1 0.50 > > 10 E0784 3 0.30 > 10 NULL 4 0.40 > 10 E0784 3 0.30 > 10 NULL 4 0.40 > 10 J2505 1 0.10 > 10 59510 2 0.20 > 10 NULL 4 0.40 > 10 NULL 4 0.40 > 10 E0784 3 0.30 > 10 59510 2 0.20 > > > (3) Finally, I woul like to have an OVERALL percentage. > > For example, lets take ID=1. (We have a TOTAL of 34 records.) > > I would like a new field, call it, OVERALL_PCT. > > For ID=1 and CODE='J3490' we will have SUM_COUNT/34 = 1/34 = 0.02941 (or 2.94%). > For ID=1 and CODE='NULL' we will have SUM_COUNT/34 = 1/34 = 0.02941 (or 2.94%). > For ID=1 and CODE='36415' we will have SUM_COUNT/34 = 0/34 = 0.00000 (or 0.00%). > For ID=1 and CODE='88142' we will have SUM_COUNT/34 = 0/34 = 0.00000 (or 0.0%). > For ID=1 and CODE='59510' we will have SUM_COUNT/34 = 1/34 = 0.02941 (or 2.94%). > > ID CODE SUM_COUNT OVERALL_PCT > 1 J3490 1 0.02941 > 1 NULL 1 0.02941 > 1 36415 0 0.00000 > 1 88142 0 0.00000 > 1 59510 1 0.02941 > > > Therefore, my final desired data set will look like: > > ID CODE SUM_COUNT WITHIN_PCT OVERALL_PCT > 1 J3490 1 0.20 0.02941 > 1 NULL 1 0.20 0.02941 > 1 36415 0 0.00 0.00000 > 1 88142 0 0.00 0.00000 > 1 59510 1 0.20 0.02941 > 2 99213 0 0.00 0.00000 > 2 80053 0 0.00 0.00000 > 2 J2505 1 0.33 0.02941 > 3 FACIL 1 0.50 0.02941 > 3 J3490 1 0.50 0.02941 > 4 99232 0 0.00 0.00000 > 5 80053 0 0.00 0.00000 > 5 E0784 2 0.50 0.05882 > 5 E0784 2 0.50 0.05882 > 5 J9310 1 0.25 0.02941 > 6 A0431 1 1.00 0.02941 > 7 59400 2 1.00 0.05882 > 7 59400 2 1.00 0.05882 > 8 NULL 4 1.00 0.12500 > 8 NULL 4 1.00 0.12500 > 8 NULL 4 1.00 0.12500 > 8 NULL 4 1.00 0.12500 > 9 27447 1 0.50 0.02941 > 9 J9355 1 0.50 0.02941 > 10 E0784 3 0.30 0.08823 > 10 NULL 4 0.40 0.11765 > 10 E0784 3 0.30 0.08823 > 10 NULL 4 0.40 0.11765 > 10 J2505 1 0.10 0.02941 > 10 59510 2 0.20 0.05882 > 10 NULL 4 0.40 0.11765 > 10 NULL 4 0.40 0.11765 > 10 E0784 3 0.30 0.08823 > 10 59510 2 0.20 0.05882 > > Thank so much!!! > tom > > -- > We've Got Your Name at http://www.mail.com ! > Get a FREE E-mail Account Today - Choose From 100+ Domains >


Back to: Top of message | Previous page | Main SAS-L page