```Date: Wed, 22 Aug 2007 16:15:42 -0500 Reply-To: Mary Sender: "SAS(r) Discussion" From: Mary Subject: Re: SAS data step compute COUNTs and Percentages Comments: To: Tom White Content-Type: text/plain; charset="iso-8859-1" Hi Tom, I've played around with some code that would get your data in the format that you want it- before percentages. Here it is below. Mary Howard Research Assistant III Dept. of Ophthalmology Univ. of Iowa Hospitals and Clinics data set1; informat diag \$10.; input id diag \$; cards; 1 heart 1 heart 1 diabetes 2 eye 3 hip 3 hip 4 diabetes 4 diabetes 4 diabetes ; run; proc print data=set1; run; proc sql; create table set1a as select distinct id from set1; data set2a; informat diag \$10.; set set1a; diag='heart'; run; data set2b; informat diag \$10.; set set1a; diag='diabetes'; run; data set2c; informat diag \$10.; set set1a; diag='eye'; run; data set2; set set2a set2b set2c; run; proc print data=set2 noobs; var id diag; run; proc sql; create table set3 as select id, diag, count(diag) as count from set1 group by id,diag order by id, diag; quit; run; proc print data=set3 noobs; run; proc sort data=set2; by id diag; proc sort data=set3; by id diag; proc sql; create table set4 as select set2.id as id, set2.diag as diag, set3.count as count from set2 left outer join set3 on set2.id=set3.id and set2.diag=set3.diag order by set2.id,set2.diag; quit; run; data set4a; set set4; if count=. then count=0; proc print data=set4a noobs; var id diag count; run; ----- Original Message ----- From: "Tom White" To: Sent: Wednesday, August 22, 2007 2:04 PM Subject: SAS data step compute COUNTs and Percentages > Hello SAS-L, > > I have a data set containing about 7M records (and about 30 more fields besides the two fields, ID & CODE) I show here. > > ID is one of the fields in this data set. It stands for a patient ID. > It is a character field of length 1. > > CODE is another 5-digit character field. It stands for a medical procedure. > This field takes on thousands of values. There is, however, a subset of these > values I am interested in. This subset contains the CODE values > > NULL 59510 J2505 59400 J1745 J3490 27447 > A0431 FACIL J9355 J9310 E0784 > > Here is sample dataset containing 34 records. > > ID CODE > 1 J3490 > 1 NULL > 1 36415 > 1 88142 > 1 59510 > 2 99213 > 2 80053 > 2 J2505 > 3 FACIL > 3 J3490 > 4 99232 > 5 80053 > 5 E0784 > 5 E0784 > 5 J9310 > 6 A0431 > 7 59400 > 7 59400 > 8 NULL > 8 NULL > 8 NULL > 8 NULL > 9 27447 > 9 J9355 > 10 E0784 > 10 NULL > 10 E0784 > 10 NULL > 10 J2505 > 10 59510 > 10 NULL > 10 NULL > 10 E0784 > 10 59510 > > > I would like to create additional fields to augment the above data set for modeling. > > > ***************************************************************************************************** > I would like to say in words what I am trying to compute. > > Maybe you can find an easier way to get the results I show below. > > For each patient (ID) in my sample dataset, create new data columnns to compute > the total number of medical services (CODE) they had within any of the top medical services (CODEs) > shown below: > > NULL 59510 J2505 59400 J1745 J3490 27447 > A0431 FACIL J9355 J9310 E0784 > > The above CODEs are the medical procedures, out of many thousands of medical procedures, > I have determined to be of interest to me. > > Therefore I would like to know, given the thousands of procedures, and given my 12 proceduers shwon above, > the total number of procedures (CODEs) a patientb (ID) had within any of the 12 proceduers shown above. > > I would like to create counts and percentages. > ***************************************************************************************************** > > > > (1) I would like to know, for each ID, how many CODEs are found in > the subset of codes mentioned above (I repeat them here): > NULL 59510 J2505 59400 J1745 J3490 27447 > A0431 FACIL J9355 J9310 E0784 > > For example, create a new field, call it, say, SUM_COUNT, to count > the total number of codes per ID: > > ID CODE SUM_COUNT > 1 J3490 1 > 1 NULL 1 > 1 36415 0 > 1 88142 0 > 1 59510 1 > 2 99213 0 > 2 80053 0 > 2 J2505 1 > 3 FACIL 1 > 3 J3490 1 > 4 99232 0 > 5 80053 0 > 5 E0784 2 > 5 E0784 2 > 5 J9310 1 > 6 A0431 1 > 7 59400 2 > 7 59400 2 > 8 NULL 4 > 8 NULL 4 > 8 NULL 4 > 8 NULL 4 > 9 27447 1 > 9 J9355 1 > 10 E0784 3 > 10 NULL 4 > 10 E0784 3 > 10 NULL 4 > 10 J2505 1 > 10 59510 2 > 10 NULL 4 > 10 NULL 4 > 10 E0784 3 > 10 59510 2 > > (2) Now that I have counts, I would like to compute WITHIN ID percentages. > > For example, within ID=10, we have 10 records. Of those 10 records, we have: > > NULL=4 records or 4/10=40% > E0784=3 records or 3/10=30% > J2505=1 records or 1/10=10% > 59510=2 records or 2/10=20% > > My data set now will look like: > > ID CODE SUM_COUNT WITHIN_PCT > 1 J3490 1 0.20 > 1 NULL 1 0.20 > 1 36415 0 0.00 > 1 88142 0 0.00 > 1 59510 1 0.20 > > 2 99213 0 0.00 > 2 80053 0 0.00 > 2 J2505 1 0.33 > > 3 FACIL 1 0.50 > 3 J3490 1 0.50 > > 4 99232 0 0.00 > > 5 80053 0 0.00 > 5 E0784 2 0.50 > 5 E0784 2 0.50 > 5 J9310 1 0.25 > > 6 A0431 1 1.00 > > 7 59400 2 1.00 > 7 59400 2 1.00 > > 8 NULL 4 1.00 > 8 NULL 4 1.00 > 8 NULL 4 1.00 > 8 NULL 4 1.00 > > 9 27447 1 0.50 > 9 J9355 1 0.50 > > 10 E0784 3 0.30 > 10 NULL 4 0.40 > 10 E0784 3 0.30 > 10 NULL 4 0.40 > 10 J2505 1 0.10 > 10 59510 2 0.20 > 10 NULL 4 0.40 > 10 NULL 4 0.40 > 10 E0784 3 0.30 > 10 59510 2 0.20 > > > (3) Finally, I woul like to have an OVERALL percentage. > > For example, lets take ID=1. (We have a TOTAL of 34 records.) > > I would like a new field, call it, OVERALL_PCT. > > For ID=1 and CODE='J3490' we will have SUM_COUNT/34 = 1/34 = 0.02941 (or 2.94%). > For ID=1 and CODE='NULL' we will have SUM_COUNT/34 = 1/34 = 0.02941 (or 2.94%). > For ID=1 and CODE='36415' we will have SUM_COUNT/34 = 0/34 = 0.00000 (or 0.00%). > For ID=1 and CODE='88142' we will have SUM_COUNT/34 = 0/34 = 0.00000 (or 0.0%). > For ID=1 and CODE='59510' we will have SUM_COUNT/34 = 1/34 = 0.02941 (or 2.94%). > > ID CODE SUM_COUNT OVERALL_PCT > 1 J3490 1 0.02941 > 1 NULL 1 0.02941 > 1 36415 0 0.00000 > 1 88142 0 0.00000 > 1 59510 1 0.02941 > > > Therefore, my final desired data set will look like: > > ID CODE SUM_COUNT WITHIN_PCT OVERALL_PCT > 1 J3490 1 0.20 0.02941 > 1 NULL 1 0.20 0.02941 > 1 36415 0 0.00 0.00000 > 1 88142 0 0.00 0.00000 > 1 59510 1 0.20 0.02941 > 2 99213 0 0.00 0.00000 > 2 80053 0 0.00 0.00000 > 2 J2505 1 0.33 0.02941 > 3 FACIL 1 0.50 0.02941 > 3 J3490 1 0.50 0.02941 > 4 99232 0 0.00 0.00000 > 5 80053 0 0.00 0.00000 > 5 E0784 2 0.50 0.05882 > 5 E0784 2 0.50 0.05882 > 5 J9310 1 0.25 0.02941 > 6 A0431 1 1.00 0.02941 > 7 59400 2 1.00 0.05882 > 7 59400 2 1.00 0.05882 > 8 NULL 4 1.00 0.12500 > 8 NULL 4 1.00 0.12500 > 8 NULL 4 1.00 0.12500 > 8 NULL 4 1.00 0.12500 > 9 27447 1 0.50 0.02941 > 9 J9355 1 0.50 0.02941 > 10 E0784 3 0.30 0.08823 > 10 NULL 4 0.40 0.11765 > 10 E0784 3 0.30 0.08823 > 10 NULL 4 0.40 0.11765 > 10 J2505 1 0.10 0.02941 > 10 59510 2 0.20 0.05882 > 10 NULL 4 0.40 0.11765 > 10 NULL 4 0.40 0.11765 > 10 E0784 3 0.30 0.08823 > 10 59510 2 0.20 0.05882 > > Thank so much!!! > tom > > -- > We've Got Your Name at http://www.mail.com ! > Get a FREE E-mail Account Today - Choose From 100+ Domains > ```

Back to: Top of message | Previous page | Main SAS-L page