| Date: | Tue, 29 Jul 2003 13:54:23 +1000 |
| Reply-To: | paulandpen@optusnet.com.au |
| Sender: | "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU> |
| From: | Paul Dickson <paulandpen@optusnet.com.au> |
| Subject: | Re: CI for median of differences of 2 samples - Slow codeproblem |
|
| Content-Type: | text/plain |
Hi there
It takes about 1.5-2 seconds on my PC. Thank you so much for that syntax, I will
be using it in the future!!!
Cheers Paul
> Asesoría Bioestadística <bioestadistica@eresmas.net> wrote:
>
> Hi Richard
>
> > A. A lot of the time, the sort provided with the system is good.
> Could
> > you write the elements of your vector out as cases (VARSTOCASES or
> > LOOP/XSAVE), use SORT CASES, and re-assemble the vector?
>
> Mmmm. Right now I'm modifying the code to leave the original dataset
> untouched, so I'd rather not write anything into it.
>
> > B. I do not have this article anymore, but recall it as a clear
> > explanation of a fast sorting algorithm, not requiring (I think)
> > pointers or other techniques absent from SPSS:
> >
> > Robert Sedgewick: Implementing Quicksort Programs.
> > *Communications of the ACM* 21(10): 847-857 (1978).
>
> Thanks for the information. I made a Google search for "quicksort
> sorting
> algorithm", and I found Java code for that fast sorting method.
> Unfortunately, I have problems trying to translate it to SPSS MATRIX
> code.
> I think I would need a complex macro for that. Meanwhile, I sort of
> cheated
> and found a way of avoiding the sorting step by using a pointer
> created
> with the MATRIX function GRADE.
>
> The final version for the code works reasonably fast and in any SPSS
> version. I have tried it with W98SE/SPSS 11.5 in a Pentium at 350 Mhz
> with
> 64 Mb RAM with sample sizes up to 100 each one and it takes less than
> 3
> seconds.
>
> ¿Could anybody give me some feedback about the speed in different
> systems?
>
> Thanks
>
> Marta Garcia-Granero
>
> * CI for a difference in 2 medians by Hodges-Lehman method:
> Hodges JR and Lehman EL (1963) "Estimates of location based on rank
> tests"
> Annals of Mathematical Statistics 34, 598-611.
>
> * Although the example provided has equal sample sizes,
> it will work with unequal sample sizes, too.
>
> * Example dataset & Mann-Whitney test *.
> DATA LIST FREE /group(F4.0) wghtgain(F8.0).
> BEGIN DATA
> 1 175 1 149 1 132 1 187 1 218 1 123 1 151
> 1 248 1 200 1 206 1 219 1 179 1 234 1 206
> 2 142 2 214 2 311 2 249 2 337 2 176 2 262
> 2 211 2 302 2 216 2 195 2 236 2 253 2 199
> END DATA.
> VAR LABEL group 'Treatment group' /wghtgain 'Weight gain (pounds)'.
> VALUE LABELS group 1 'Control' 2 'Vit. A'.
>
> * Median & IQR calculation *.
> EXAMINE
> VARIABLES=wghtgain BY group
> /PLOT NONE
> /PERCENTILES(25,50,75)
> /STATISTICS NONE
> /NOTOTAL.
>
> * Mann-Whitney test *.
> NPAR TESTS /M-W= wghtgain BY group(1 2).
>
> ************************************************
> * CI CODE *
> ************************************************
> * If n1*n2>10000, change MXLOOPS accordingly.
> * There is a temporary listwise deletion of missing data before
> running
> MATRIX *.
>
> SET MXLOOPS=10000.
> TEMPORARY.
> SELECT IF((~MISSING(group)) AND (~MISSING(wghtgain))).
> MATRIX.
> PRINT /TITLE ' CONFIDENCE INTERVAL FOR THE DIFFERENCE OF TWO
> MEDIANS'.
> * Get data: replace variable names by your own *.
> GET group /var=group.
> GET depvar /var=wghtgain.
> * Count sample sizes & separate data in two groups *.
> * Assumes group values are 1 & 2, change code values if different*.
> COMPUTE n=nrow(group).
> COMPUTE n1=0.
> COMPUTE n2=0.
> LOOP i=1 TO n.
> DO IF group(i) EQ 1.
> COMPUTE n1=n1+1.
> ELSE IF group(i) EQ 2.
> COMPUTE n2=n2+1.
> END IF.
> END LOOP.
> COMPUTE group1=MAKE(1,n1,0).
> COMPUTE group2=MAKE(1,n2,0).
> COMPUTE n1=0.
> COMPUTE n2=0.
> LOOP i=1 to n.
> DO IF group(i) EQ 1.
> COMPUTE n1=n1+1.
> COMPUTE group1(n1)=depvar(i).
> ELSE IF group(i) EQ 2.
> COMPUTE n2=n2+1.
> COMPUTE group2(n2)=depvar(i).
> END IF.
> END LOOP.
> RELEASE group,depvar,n.
> * Compute vector of all differences *.
> COMPUTE linear=MAKE(1,n1*n2,0).
> LOOP i=1 to n1.
> LOOP j=1 to n2.
> COMPUTE linear(n2*(i-1)+j)=group1(i)-group2(j).
> END LOOP.
> END LOOP.
> COMPUTE pos=GRADE(linear).
> RELEASE group1,group2.
> * Computes median of differences *.
> COMPUTE half=(n1*n2)/2.
> COMPUTE pair=(TRUNC(half)=half).
> DO IF (pair eq 0).
> COMPUTE medpos=(n1*n2+1)/2.
> LOOP i=1 TO n1*n2.
> DO IF medpos=pos(i).
> BREAK.
> END IF.
> END LOOP.
> COMPUTE median=linear(i).
> ELSE IF (pair NE 0).
> COMPUTE medpos1=n1*n2/2.
> COMPUTE medpos2=1+medpos1.
> LOOP i=1 TO n1*n2.
> DO IF medpos1=pos(i).
> BREAK.
> END IF.
> END LOOP.
> COMPUTE lmedian=linear(i).
> LOOP i=1 TO n1*n2.
> DO IF medpos2=pos(i).
> BREAK.
> END IF.
> END LOOP.
> COMPUTE umedian=linear(i).
> COMPUTE median=(lmedian+umedian)/2.
> RELEASE lmedian,umedian.
> END IF.
> PRINT {n1;n2}
> /FORMAT="f5.0"
> /TITLE="Sample sizes"
> /RLABELS=" N1=" " N2=".
> PRINT median
> /TITLE="Point estimation (*)"
> /rlabels="Median"
> /FORMAT="f8.1".
> PRINT /TITLE="(*) Remember it is NOT the same as the difference of
> the 2
> sample medians".
> * Exact or asymptotic 95% & 99% CI *.
> DO IF ((n1 LE 20) AND (n2 LE 20)).
> COMPUTE data95={0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
> 0,0,0,0,2,3,5,6,7,8,9,11,12,13,14,15,17,18,19,20;
> 0,0,0,0,3,5,6,8,10,11,13,14,16,17,19,21,22,24,25,27;
> 0,0,0,0,5,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34;
> 0,0,0,0,6,8,10,13,15,17,19,22,24,26,29,31,34,36,38,41;
> 0,0,0,0,7,10,12,15,17,20,23,26,28,31,34,37,39,42,45,48;
> 0,0,0,0,8,11,14,17,20,23,26,29,33,36,39,42,45,48,52,55;
> 0,0,0,0,9,13,16,19,23,26,30,33,37,40,44,47,51,55,58,62;
> 0,0,0,0,11,14,18,22,26,29,33,37,41,45,49,53,57,61,65,69;
> 0,0,0,0,12,16,20,24,28,33,37,41,45,50,54,59,63,67,72,76;
> 0,0,0,0,13,17,22,26,31,36,40,45,50,55,59,64,69,74,78,83;
> 0,0,0,0,14,19,24,29,34,39,44,49,54,59,64,70,75,80,85,90;
> 0,0,0,0,15,21,26,31,37,42,47,53,59,64,70,75,81,86,92,98;
> 0,0,0,0,17,22,28,34,39,45,51,57,63,69,75,81,87,93,99,105;
> 0,0,0,0,18,24,30,36,42,48,55,61,67,74,80,86,93,99,106,112;
> 0,0,0,0,19,25,32,38,45,52,58,65,72,78,85,92,99,106,113,119;
> 0,0,0,0,20,27,34,41,48,55,62,69,76,83,90,98,105,112,119,127}.
> COMPUTE data99={0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
> 0,0,0,0,0,1,1,2,3,4,5,6,7,7,8,9,10,11,12,13;
> 0,0,0,0,1,2,3,4,5,6,7,9,10,11,12,13,15,16,17,18;
> 0,0,0,0,1,3,4,6,7,9,10,12,13,15,16,18,19,21,22,24;
> 0,0,0,0,2,4,6,7,9,11,13,15,17,18,20,22,24,26,28,30;
> 0,0,0,0,3,5,7,9,11,13,16,18,20,22,24,27,29,31,33,36;
> 0,0,0,0,4,6,9,11,13,16,18,21,24,26,29,31,34,37,39,42;
> 0,0,0,0,5,7,10,13,16,18,21,24,27,30,33,36,39,42,45,48;
> 0,0,0,0,6,9,12,15,18,21,24,27,31,34,37,41,44,47,51,54;
> 0,0,0,0,7,10,13,17,20,24,27,31,34,38,42,45,49,53,57,60;
> 0,0,0,0,7,11,15,18,22,26,30,34,38,42,46,50,54,58,63,67;
> 0,0,0,0,8,12,16,20,24,29,33,37,42,46,51,55,60,64,69,73;
> 0,0,0,0,9,13,18,22,27,31,36,41,45,50,55,60,65,70,74,79;
> 0,0,0,0,10,15,19,24,29,34,39,44,49,54,60,65,70,75,81,86;
> 0,0,0,0,11,16,21,26,31,37,42,47,53,58,64,70,75,81,87,92;
> 0,0,0,0,12,17,22,28,33,39,45,51,57,63,69,74,81,87,93,99;
> 0,0,0,0,13,18,24,30,36,42,48,54,60,67,73,79,86,92,99,105}.
> COMPUTE u95=data95(n1,n2).
> COMPUTE u99=data99(n1,n2).
> PRINT /TITLE"Exact confidence intervals calculated (N1 and N2=<20)".
> ELSE IF ((n1 GT 20) OR (n2 GT 20)).
> COMPUTE u95=TRUNC(n1*n2/2-0.5-1.96*sqrt(n1*n2*(n1+n2+1)/12)).
> COMPUTE u99=TRUNC(n1*n2/2-0.5-2.576*sqrt(n1*n2*(n1+n2+1)/12)).
> PRINT /TITLE"Asymptotic confidence intervals calculated (N1 and/or
> N2>20)".
>
> END IF.
> LOOP i=1 TO n1*n2.
> DO IF pos(i) EQ u95.
> BREAK.
> END IF.
> END LOOP.
> COMPUTE low95=linear(i).
> LOOP i=1 TO n1*n2.
> DO IF pos(i) EQ n1*n2-u95.
> BREAK.
> END IF.
> END LOOP.
> COMPUTE high95=linear(i).
> LOOP i=1 TO n1*n2.
> DO IF pos(i) EQ u99.
> BREAK.
> END IF.
> END LOOP.
> COMPUTE low99=linear(i).
> LOOP i=1 TO n1*n2.
> DO IF pos(i) EQ n1*n2-u99.
> BREAK.
> END IF.
> END LOOP.
> COMPUTE high99=linear(i).
> PRINT {low95,high95;low99,high99}
> /FORMAT="f8.1"
> /TITLE="CI for difference of medians"
> /RLABELS="95%Level" "99%Level"
> /CLABELS="Lower" "Upper".
> END MATRIX.
|