Date: Fri, 30 Sep 2005 18:24:10 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: Difference between 2 N's
You could code an output statement in the PROC MEANS step, then run a DATA
step against the output data set to do the subtraction.
I'd advise Toby's approach instead. Here's a slight variation on that, with
some live data:
data combined;
input DUBPARTS PARTICIPANTS;
cards;
1 .
1 1
1 2
1 3
;
proc sql ;
select count(DubParts) as DubPartsN ,
count(Participants) as ParticipantsN ,
range(calculated DubPartsN, calculated ParticipantsN) as Diff
from combined ;
quit ;
Result:
DubPartsN ParticipantsN Diff
-------------------------------------
4 3 1
On Fri, 30 Sep 2005 07:08:35 -0700, Alex Pavluck <apavluck@GMAIL.COM> wrote:
>Hello. I have a dataset where I want to display the number of
>observations for 2 different variables and the difference between them.
> Right now I am using the following code but I am not sure about how to
>get the difference. Thanks in advance.
>
>
>proc means data=combined N;
>var DUBPARTS PARTICIPANTS;
>label DUBPARTS='Count of Person_ID from original dataset'
> PARTICIPANTS='Count of unique Person_ID';
>title 'Breakdown of participants';
>run;
On Fri, 30 Sep 2005 10:05:19 -0700, awz <aiwu.zhang@GMAIL.COM> wrote:
>I guess Toby's asking to figure out the number of unique id, those
>duplicated or null should not be counted.
Duplicates are counted, nulls are not. That's consistent with the PROC
MEANS results which Alex considered satisfactory.
Of course the SQL solution can be made to ignore dupes, were that desired,
by inserting the keyword DISTINCT before the variable name.
>
>Your SQL will give the total number of records in the table if the
>condition clause "where" is not specified. So the diff will always be
>0.
No. You are thinking of COUNT(*), which counts all rows and is analogous to
the _FREQ_ variable in PROC MEANS/SUMMARY output data sets.
>
>There is no simple solution to the problem because we need to count the
>unique unduplicated or null variables. And these variables are
>unlikely to be indexed.
Looks to me like Toby gave a simple solution.
>SQL has an option to count the dupicated records, but I don't remember
>now..
I've covered the options, namely COUNT( DISTINCT [expression]) to exclude
duplicates and COUNT(*) to include nulls.
>
>AWZ