| Date: | Thu, 5 Aug 1999 16:46:14 +0100 |
| Reply-To: | tra <tra@proteus.co.uk> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | tra <tra@PROTEUS.CO.UK> |
| Organization: | Proteus Molecular Design Ltd |
| Subject: | Re: A Challenging Problem [2] |
John,
what an interesting problem.
I have been worrying about my previous solution.
It will give the wrong answers for some data, because the same period of
observation for rater 1 may match more than one period for rater 2
(after allowing the 1-second leeway).
I cannot see an easy and complete way to fix this in SQL using
start-end intervals.
If you can assume that all the start and end times are integers, then a
simpler approach is possible (but at the cost of more computation).
Here is my second attempt at a solution.
data test;
   length id code 8 key $ 1 start end 8;
   input id code key start end;
   datalines;
2001 1 V 04 10
2001 1 B 10 15
2001 1 V 15 16
2001 1 N 16 30
2001 1 P 17 17
2001 1 V 30 35
2001 1 \ 35 35
2002 1 V 10 15
2002 1 B 15 20
2002 1 P 17 17
2002 1 V 20 35
2002 1 \ 35 35
2001 2 V 05 10
2001 2 B 10 16
2001 2 V 16 17
2001 2 N 17 30
2001 2 P 18 18
2001 2 V 30 36
2001 2 \ 36 36
2002 2 V 9 15
2002 2 B 15 21
2002 2 P 17 17
2002 2 V 21 37
2002 2 \ 37 37
2003 1 X 17 25
2003 1 \ 25 25
2003 2 Y 18 20
2003 2 X 20 21
2003 2 X 21 22
2003 2 \ 22 22
2004 2 X 17 25
2004 2 \ 25 25
2004 1 Z 18 20
2004 1 X 20 21
2004 1 X 21 22
2004 1 \ 22 22
;
%print;
/* discretised solution */
/* this is only sensible if all start and end times are integers */
/* again assume that 'P' is an 'event' */

/* expand data to one record for each key-second */
data tdisc;
   set test;
   drop start end;
   if key ne '\';   /* drop the end-of-file marker */
   /* duration keys cover start..end-1; the event key 'P' covers its single second */
   do time = start to end-1+(key in ('P'));
      output;
   end;
run;

/* same as above, but with 1-second leeway on either side */
data tdisc1;
   set test;
   drop start end;
   if key ne '\';
   do time = start-1 to end+(key in ('P'));
      output;
   end;
run;
proc sql;
   /* matchdsc - key-seconds in tdisc for which there is a match in tdisc1 */
   create table matchdsc as
   select a.id, a.key, a.time from tdisc a, tdisc1 b
   where a.code = 1
     and b.code = 2
     and a.id = b.id and a.key = b.key and a.time = b.time
   group by a.id, a.key, a.time
   having count(*) > 0;

   /* commndsc - key-seconds in tdisc for which there is a near match in tdisc1;
      near matches do NOT have to have the same key */
   create table commndsc as
   select a.id, a.key, a.time from tdisc a, tdisc1 b
   where a.code = 1
     and b.code = 2
     and a.id = b.id and a.time = b.time
   group by a.id, a.key, a.time
   having count(*) > 0;

   /* mis-matches */
   create table mismatch as
   select * from commndsc
   except
   select * from matchdsc;

   /* collect statistics: for each key, the fraction of common key-seconds that match */
   create table kappa as
   select a.key, a.count as duration, max(b.count,0) as durmatch,
          calculated durmatch/calculated duration as kappa
   from
      (
       select key, count(time) as count from commndsc
       group by key
      ) a
   left join
      (
       select key, count(time) as count from matchdsc
       group by key
      ) b
   on a.key = b.key
   ;
   select * from kappa;
quit;
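
As an illustration of the expansion step only, here is a minimal sketch that applies the same two DO loops to just two of the sample records. The dataset names demo, demo_exact and demo_leeway and the variable isevent are made up for this example and are not part of the program above; it assumes, as before, that 'P' is the only event key and that all times are integers.

```sas
/* Illustrative sketch: two sample records, one duration key and one event key */
data demo;
   length id code 8 key $ 1 start end 8;
   input id code key start end;
   datalines;
2001 1 V 15 16
2001 1 P 17 17
;

/* Expand each record into key-seconds, without and with the 1-second leeway */
data demo_exact demo_leeway;
   set demo;
   isevent = (key = 'P');   /* assumption: 'P' is the only event key */
   /* exact expansion, as in tdisc */
   do time = start to end - 1 + isevent;
      output demo_exact;
   end;
   /* expansion widened by one second on either side, as in tdisc1 */
   do time = start - 1 to end + isevent;
      output demo_leeway;
   end;
   keep id code key time;
run;

proc print data=demo_exact noobs;
run;
proc print data=demo_leeway noobs;
run;
```

With these two records, demo_exact should hold V at time 15 and P at time 17, while demo_leeway should hold V at times 14 to 16 and P at times 16 to 18. A key-second from one rater then counts as matched if the other rater's widened set contains the same id, key and time, which is what the join in matchdsc checks.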
JGerstle@SW.UA.EDU wrote:
> Greetings and Salutations All
> I posted a question last week about sequential analyses via
> SAS and received some responses that I could use SAS/ETS.
> Thank you for the info. Unfortunately, we do not have this module.
>
> As my subject line indicates, I have somewhat a challenge to
> any of you that have the time to come up with some sort of plan to
> address the problem I'm going to relate. First, I should tell you that
> I am using SAS 6.12 for Windows 95. What I'm looking for are
> ideas and clues that will, hopefully, lead me to solve my problem.
> OK, onto the actual query.
>
> I have a dataset (shown at the bottom) that was put together
> from several flat files created via a BASICA program that we use to
> do our observational data collection on residents in nursing homes.
> The program, in a nutshell, asks for certain header info (name, id,
> date, etc..) and then starts recording, using the internal clock of
> the laptop, the start and stop times, in seconds, of several different
> behaviors which are represented by various keys on the keyboard
> (i.e. 'V' for disruptive behavior, 'B' for talking to self, 'N' for talking to
> another resident, etc.).
> I wrote a SAS program (I will send a copy for any that are
> interested personally) that will read in the hundreds of flat files
> containing this info and separate the header and data portions of
> the files into separate datasets. Then I can simply do the analyses
> I need to do (like keypercents). The problem I need to address is a
> way to calculate reliability kappas for a pair of primary and rely flat
> files. The procedure we have now uses a couple of Pascal
> programs, but the composer of these programs does not work with
> us anymore and we have the need to modify how we calculate our
> kappas.
> Now some of our keys are event keys, only 'on' for a second,
> while the rest are duration keys, 'on' for several seconds. We want
> to give a one second window on either side of both types of keys
> so if one of the raters is off by a second with the onset of a key,
> the kappa program will take this into account and not discount the
> lost second.
>
> Here's a sample dataset with variable names ID, Primary/Rely
> Code (1 for primary, 2 for rely), KEY, START time, END time:
> (Keys V, B, & N are duration and key P is event, \ is used as
> end of file). The length of the file (the total number of seconds) and
> the number of lines of data are the last two lines of the header,
> which can be merged with the data and used.
>
> 2001 1 V 04 10
> 2001 1 B 10 15
> 2001 1 V 15 16
> 2001 1 N 16 30
> 2001 1 P 17 17
> 2001 1 V 30 35
> 2001 1 \ 35 35
> 2002 1 V 10 15
> 2002 1 B 15 20
> 2002 1 P 17 17
> 2002 1 V 20 35
> 2002 1 \ 35 35
> 2001 2 V 05 10
> 2001 2 B 10 16
> 2001 2 V 16 17
> 2001 2 N 17 30
> 2001 2 P 18 18
> 2001 2 V 30 36
> 2001 2 \ 36 36
> 2002 2 V 9 15
> 2002 2 B 15 21
> 2002 2 P 17 17
> 2002 2 V 21 37
> 2002 2 \ 37 37
>
> etc......
>
> Thank you much for any ideas and leads that you may come up
> with...
>
> John Gerstle
> Program Analyst, Sr.
> Applied Gerontology Program
> University of Alabama
--
T R Auton PhD MSc C.Math
Head of Biomedical Statistics
Proteus Molecular Design Ltd
Beechfield House
Lyme Green Business Park
Macclesfield
Cheshire SK11 0JL
UK
email: tra@proteus.co.uk