Date: Mon, 27 Nov 2006 21:10:29 -1000
Reply-To: Bob Schacht <email@example.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Bob Schacht <firstname.lastname@example.org>
Subject: Re: Jaccard used to infer importance
Content-Type: text/plain; charset=us-ascii; format=flowed
At 08:33 PM 11/27/2006, Mark Webb wrote:
>Typical statement x brand association data can be converted into inferred
>importance [of the statements] using Jaccard coefficients.
>Can anyone explain a-the format of the binary data, & b-how importance is
>inferred from the similarity scores.
One of the things to be wary about is what Jaccard's coefficient does to
double negatives. For example, if you are comparing whether or not two
things are present together, is it meaningful if both are *absent*? Your
answer to this question will vary with circumstances. For instance, if you
sample contexts in which one or the other is likely to occur, then the
absence of both might be quite unusual and significant. But if you sample
the whole environment without knowing whether either thing will be there,
and if the things are not very common to start with, you'll wind up looking
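in the wrong place a lot of the time. And that will mean lots of double
negatives. If those joint absences are counted as "agreements" (as a simple
matching coefficient does; Jaccard's coefficient itself leaves them out of
both numerator and denominator), the agreement score will become
artificially inflated, and it will *look* as though there's an important
correlation, even when there isn't.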
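To make the double-negative effect concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions: the function
names and the toy 0/1 vectors are made up, and the simple matching
coefficient is brought in purely as a contrast to Jaccard. With a = both
present, b and c = only one present, and d = both absent, Jaccard is
a / (a + b + c), while simple matching is (a + d) / (a + b + c + d).

```python
import math


def cell_counts(x, y):
    """Return (a, b, c, d) for two equal-length sequences of 0/1 values.

    a = both present, b = only x present, c = only y present, d = both absent.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = sum(1 for i, j in zip(x, y) if not i and not j)
    return a, b, c, d


def jaccard(x, y):
    """Jaccard coefficient: joint absences (d) are ignored entirely."""
    a, b, c, _ = cell_counts(x, y)
    return a / (a + b + c) if (a + b + c) else math.nan


def simple_matching(x, y):
    """Simple matching coefficient: joint absences count as agreements."""
    a, b, c, d = cell_counts(x, y)
    n = a + b + c + d
    return (a + d) / n if n else math.nan


# Hypothetical "targeted" sample: three contexts where at least one item occurs.
targeted_x = [1, 1, 0]
targeted_y = [1, 0, 1]

# Hypothetical "whole environment" sample: the same three contexts plus
# 97 contexts where neither item occurs (double negatives).
broad_x = targeted_x + [0] * 97
broad_y = targeted_y + [0] * 97

for label, x, y in [("targeted", targeted_x, targeted_y),
                    ("broad", broad_x, broad_y)]:
    print(f"{label}: jaccard={jaccard(x, y):.3f}  "
          f"simple_matching={simple_matching(x, y):.3f}")
```

In this toy case Jaccard stays at 1/3 in both samples, while simple matching
goes from 1/3 to 0.98 once the 97 double negatives are added. So the
question of whether joint absence is meaningful amounts to a choice of
coefficient, and that choice has to be made before reading any "importance"
into the scores.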
I don't know of any way to control for this factor when inferring
"importance" from the size of a Jaccard coefficient. If sampling is random
but misdirected, you'll be finding agreement where there may not be any. The
coefficient may not mean what you think.
Bob in HI