LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2003, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Tue, 2 Dec 2003 14:16:05 -0500
Reply-To:   Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject:   Re: Calculate highest Mode for character variables
Comments:   To: Pankaj Singal <pankaj.singal@WELLSFARGO.COM>
Content-Type:   text/plain

Pankaj,

You will have to write the code for such a specialized problem. here is a model using your sample data.

data w ; input key1 $ key2 $ c1 $ c2 $ ; cards ; 123 abc TB ban 123 abc TB ban 123 abc TB can 123 abc TB ban 123 abc PS ban 123 abc PS ban 123 abc KT ban 123 abc KT man 124 abd TM zan 125 abe TN lan 125 abe TN lan 125 abe TN van 125 abe PS van 125 abe PS van 126 adf NE pan 126 adf KT pan ;

proc freq data = w ; table key1 * key2 / missing noprint out = fkey ; table key1 * key2 * c1 / missing noprint out = fc1 ; table key1 * key2 * c2 / missing noprint out = fc2 ; run ;

proc sort data = fc1 ; by key1 key2 descending count c1 ; run ;

proc sort data = fc2 ; by key1 key2 descending count c2 ; run ;

data q ; merge fkey ( rename = ( count = total ) drop = percent ) fc1 ( rename = ( count = c1_high_mode ) drop = percent ) fc2 ( rename = ( count = c2_high_mode ) drop = percent ) ; by key1 key2 ; if first.key2 ; c1_percent = c1_high_mode / total ; c2_percent = c2_high_mode / total ; run ;

The code could be easily macroized to handle 15 C-variables. If you need help then ask.

The 80 million records may cause more trouble. It is quite likely that the number of categories is too many for one PROC FREQ to handle. PROC SUMMARY is probably better but still may need to be broken down into several runs.

IanWhitlock@westat.com -----Original Message----- From: Pankaj Singal [mailto:pankaj.singal@WELLSFARGO.COM] Sent: Tuesday, December 02, 2003 12:17 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Calculate highest Mode for character variables

Hi All:

We are having a problem to calculate modes for character variables,

This is what we want We need to calculate the mode for a number of variables and keep the highest frequency. Also, if there is more than one value with the same frequency, we just want to keep one of the values of mode.

e.g. Here is my data (originally we have 80 million records with 15 columns)

key1 key2 c1 c2 123 abc TB ban 123 abc TB ban 123 abc TB can 123 abc TB ban 123 abc PS ban 123 abc PS ban 123 abc KT ban 123 abc KT man 124 abd TM zan 125 abe TN lan 125 abe TN lan 125 abe TN van 125 abe PS van 125 abe PS van 126 adf NE pan 126 adf KT pan

This is what we need finally by key1 key2,

key1 key2 c1 c1_percent c1_high_mode c2 c2_percent c2_high_mode 123 abc TB .50 4 ban .75 6 124 abd TM 1.0 1 zan 1.0 1 125 abe TN .6 3 van .6 3 126 adf NE .5 1 pan 1.0 2

Could anyone help suggest a SAS utility/procedure to generate this kind of output.

Any help would be highly appreciated. Pankaj Singal


Back to: Top of message | Previous page | Main SAS-L page