LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (September 2005, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 30 Sep 2005 18:24:10 -0400
Reply-To:     "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject:      Re: Difference between 2 N's

You could code an output statement in the PROC MEANS step, then run a DATA step against the output data set to do the subtraction.

I'd advise Toby's approach instead. Here's a slight variation on that, with some live data:

data combined; input DUBPARTS PARTICIPANTS; cards; 1 . 1 1 1 2 1 3 ;

proc sql ; select count(DubParts) as DubPartsN , count(Participants) as ParticipantsN , range(calculated DubPartsN, calculated ParticipantsN) as Diff from combined ; quit ;

Result:

DubPartsN ParticipantsN Diff ------------------------------------- 4 3 1

On Fri, 30 Sep 2005 07:08:35 -0700, Alex Pavluck <apavluck@GMAIL.COM> wrote:

>Hello. I have a dataset where I want to display the number of >observations for 2 different variables and the difference between them. > Right now I am using the following code but I am not sure about how to >get the difference. Thanks in advance. > > >proc means data=combined N; >var DUBPARTS PARTICIPANTS; >label DUBPARTS='Count of Person_ID from original dataset' > PARTICIPANTS='Count of unique Person_ID'; >title 'Breakdown of participants'; >run;

On Fri, 30 Sep 2005 10:05:19 -0700, awz <aiwu.zhang@GMAIL.COM> wrote:

>I guess Toby's asking to figure out the number of unique id, those >duplicated or null should not be counted.

Duplicates are counted, nulls are not. That's consistent with the PROC MEANS results which Alex considered satisfactory.

Of course the SQL solution can be made to ignore dupes, were that desired, by inserting the keyword DISTINCT before the variable name.

> >Your SQL will give the total number of records in the table if the >condition clause "where" is not specified. So the diff will always be >0.

No. You are thinking of COUNT(*), which counts all rows and is analogous to the _FREQ_ variable in PROC MEANS/SUMMARY output data sets.

> >There is no simple solution to the problem because we need to count the >unique unduplicated or null variables. And these variables are >unlikely to be indexed.

Looks to me like Toby gave a simple solution.

>SQL has an option to count the dupicated records, but I don't remember >now..

I've covered the options, namely COUNT( DISTINCT [expression]) to exclude duplicates and COUNT(*) to include nulls.

> >AWZ


Back to: Top of message | Previous page | Main SAS-L page