LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (April 1996, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 26 Apr 1996 16:14:37 GMT
Reply-To:     Michael Friendly <friendly@HOTSPUR.PSYCH.YORKU.CA>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         Michael Friendly <friendly@HOTSPUR.PSYCH.YORKU.CA>
Organization: York University, Ontario, Canada
Subject:      Lags with BY variables: Merge seeks help, possibly from SQL

Many thanks to all those who answered my query re constructing lags with BY variables. Here's a new problem for anyone willing to help, but first I'll explain the old problem and the context:

The simplest solutions to make the lags restart when the BY variable(s) change were variations of the code below to produce the dataset lags

data codes; input subj @; do i=1 to 16; input code $ @; output; end; cards; 1 c a a b a c a c b b a b a a b c 2 c c b b a c a c c a c b c b c c ; data lags; set codes; by subj; drop cnt; lag1 = lag(code); lag2 = lag2(code); if first.subj then cnt=0; cnt+1; if cnt<=1 then lag1 = ' '; if cnt<=2 then lag2 = ' ';

The goal here (for a macro I'm working on) is to produce a complete n-way frequency table giving joint frequencies of events for an arbitrary number of lags. By 'complete' I mean (for 2 subjects, 3 codes, lag1 x lag0), a dataset with 2 x 3 x 3 observations, including those which occur with zero frequencies. Unfortunately, it does not appear that I can convince PROC FREQ to include the zero counts in the output dataset.

proc freq data=lags; tables lag1 * code / noprint out=freq; by subj; proc print; id subj; by subj;

which gives only counts >=1: SUBJ LAG1 CODE COUNT PERCENT

1 c 1 . a a 2 13.3333 a b 3 20.0000 a c 2 13.3333 b a 3 20.0000 b b 1 6.6667 b c 1 6.6667 c a 2 13.3333 c b 1 6.6667

2 c 1 . a c 3 20.0000 b a 1 6.6667 b b 1 6.6667 b c 2 13.3333 c a 2 13.3333 c b 3 20.0000 c c 3 20.0000

My original idea to fill in the zero entries was to construct a dataset of all zeros and merge it with the freq datset for each subject. That is, I want all combinations of a,b,c in pairs filled in with observations of COUNT=0 for those combinations which do not occur. This worked when there were no BY variables for the lag dataset, but does not work now. Here's the simplified verion of the code:

*-- Sort, putting subject last; proc sort data=freq; by lag1 code subj;

*-- Create a table of 0 counts to fill in missing entries; data zeros; do lag1='a', 'b', 'c'; do code='a', 'b', 'c'; count = 0; end; end;

data freq; merge freq(drop=percent) zeros; by lag1 code; if lag1 = ' ' then delete;

*-- Resort, putting subject first; proc sort data=freq; by subj lag1 code;

proc print; id subj; by subj;

and the output:

SUBJ LAG1 CODE COUNT

1 a a 2 a b 3 a c 2 b a 3 b b 1 b c 1 c a 2 c b 1

2 a c 3 b a 1 b b 1 b c 2 c a 2 c b 3 c c 0

Using SET instead of MERGE above gives a different, but incorrect result. The whole job of filling in the zero entries BY subject seems like something that could be much more easily accomplished with PROC SQL, but I'm not familiar enough with it to see how. Can anyone help?

-- Michael Friendly Internet: friendly@hotspur.psych.yorku.ca (NeXTmail OK) Psychology Department York University Voice: 416 736-5118 4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html Toronto, ONT M3J 1P3 CANADA


Back to: Top of message | Previous page | Main SAS-L page