Date: Fri, 26 Apr 1996 16:14:37 GMT
Reply-To: Michael Friendly <friendly@HOTSPUR.PSYCH.YORKU.CA>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Michael Friendly <friendly@HOTSPUR.PSYCH.YORKU.CA>
Organization: York University, Ontario, Canada
Subject: Lags with BY variables: Merge seeks help, possibly from SQL
Many thanks to all those who answered my query re constructing
lags with BY variables. Here's a new problem for anyone willing
to help, but first I'll explain the old problem and the context:
The simplest solutions to make the
lags restart when the BY variable(s) change were variations of
the code below to produce the dataset lags
data codes;
input subj @;
do i=1 to 16;
input code $ @;
output;
end;
cards;
1 c a a b a c a c b b a b a a b c
2 c c b b a c a c c a c b c b c c
;
data lags;
set codes;
by subj;
drop cnt;
lag1 = lag(code);
lag2 = lag2(code);
if first.subj then cnt=0;
cnt+1;
if cnt<=1 then lag1 = ' ';
if cnt<=2 then lag2 = ' ';
The goal here (for a macro I'm working on) is to produce a complete
n-way frequency table giving joint frequencies
of events for an arbitrary number of lags. By
'complete' I mean (for 2 subjects, 3 codes, lag1 x lag0), a dataset
with 2 x 3 x 3 observations, including those which occur with zero
frequencies. Unfortunately, it does not appear that I can convince
PROC FREQ to include the zero counts in the output dataset.
proc freq data=lags;
tables lag1 * code / noprint out=freq;
by subj;
proc print;
id subj;
by subj;
which gives only counts >=1:
SUBJ LAG1 CODE COUNT PERCENT
1 c 1 .
a a 2 13.3333
a b 3 20.0000
a c 2 13.3333
b a 3 20.0000
b b 1 6.6667
b c 1 6.6667
c a 2 13.3333
c b 1 6.6667
2 c 1 .
a c 3 20.0000
b a 1 6.6667
b b 1 6.6667
b c 2 13.3333
c a 2 13.3333
c b 3 20.0000
c c 3 20.0000
My original idea to fill in the zero entries was to construct
a dataset of all zeros and merge it with the freq datset for
each subject.
That is, I want all combinations of a,b,c in pairs filled in
with observations of COUNT=0 for those combinations which do
not occur.
This worked when there were no BY variables for the lag dataset,
but does not work now. Here's the simplified verion of the
code:
*-- Sort, putting subject last;
proc sort data=freq;
by lag1 code subj;
*-- Create a table of 0 counts to fill in missing entries;
data zeros;
do lag1='a', 'b', 'c';
do code='a', 'b', 'c';
count = 0;
end; end;
data freq;
merge freq(drop=percent) zeros;
by lag1 code;
if lag1 = ' ' then delete;
*-- Resort, putting subject first;
proc sort data=freq;
by subj lag1 code;
proc print;
id subj;
by subj;
and the output:
SUBJ LAG1 CODE COUNT
1 a a 2
a b 3
a c 2
b a 3
b b 1
b c 1
c a 2
c b 1
2 a c 3
b a 1
b b 1
b c 2
c a 2
c b 3
c c 0
Using SET instead of MERGE above gives a different, but incorrect
result. The whole job of filling in the zero entries BY subject
seems like something that could be much more easily accomplished
with PROC SQL, but I'm not familiar enough with it to see how.
Can anyone help?
--
Michael Friendly Internet: friendly@hotspur.psych.yorku.ca (NeXTmail
OK)
Psychology Department
York University Voice: 416 736-5118
4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT M3J 1P3 CANADA