LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (July 2003)Back to main SPSSX-L pageJoin or leave SPSSX-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Tue, 29 Jul 2003 13:54:23 +1000
Reply-To:   paulandpen@optusnet.com.au
Sender:   "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:   Paul Dickson <paulandpen@optusnet.com.au>
Subject:   Re: CI for median of differences of 2 samples - Slow codeproblem
Comments:   To: Asesoría Bioestadística <bioestadistica@eresmas.net>
Content-Type:   text/plain

Hi there

It takes about 1.5-2 seconds on my PC. Thank you so much for that syntax, I will be using it in the future!!!

Cheers Paul

> Asesoría Bioestadística <bioestadistica@eresmas.net> wrote: > > Hi Richard > > > A. A lot of the time, the sort provided with the system is good. > Could > > you write the elements of your vector out as cases (VARSTOCASES or > > LOOP/XSAVE), use SORT CASES, and re-assemble the vector? > > Mmmm. Right now I'm modifying the code to leave the original dataset > untouched, so I'd rather not write anything into it. > > > B. I do not have this article anymore, but recall it as a clear > > explanation of a fast sorting algorithm, not requiring (I think) > > pointers or other techniques absent from SPSS: > > > > Robert Sedgewick: Implementing Quicksort Programs. > > *Communications of the ACM* 21(10): 847-857 (1978). > > Thanks for the information. I made a Google search for "quicksort > sorting > algorithm", and I found Java code for that fast sorting method. > Unfortunately, I have problems trying to translate it to SPSS MATRIX > code. > I think I would need a complex macro for that. Meanwhile, I sort of > cheated > and found a way of avoiding the sorting step by using a pointer > created > with the MATRIX function GRADE. > > The final version for the code works reasonably fast and in any SPSS > version. I have tried it with W98SE/SPSS 11.5 in a Pentium at 350 Mhz > with > 64 Mb RAM with sample sizes up to 100 each one and it takes less than > 3 > seconds. > > ¿Could anybody give me some feedback about the speed in different > systems? > > Thanks > > Marta Garcia-Granero > > * CI for a difference in 2 medians by Hodges-Lehman method: > Hodges JR and Lehman EL (1963) "Estimates of location based on rank > tests" > Annals of Mathematical Statistics 34, 598-611. > > * Although the example provided has equal sample sizes, > it will work with unequal sample sizes, too. > > * Example dataset & Mann-Whitney test *. > DATA LIST FREE /group(F4.0) wghtgain(F8.0). > BEGIN DATA > 1 175 1 149 1 132 1 187 1 218 1 123 1 151 > 1 248 1 200 1 206 1 219 1 179 1 234 1 206 > 2 142 2 214 2 311 2 249 2 337 2 176 2 262 > 2 211 2 302 2 216 2 195 2 236 2 253 2 199 > END DATA. > VAR LABEL group 'Treatment group' /wghtgain 'Weight gain (pounds)'. > VALUE LABELS group 1 'Control' 2 'Vit. A'. > > * Median & IQR calculation *. > EXAMINE > VARIABLES=wghtgain BY group > /PLOT NONE > /PERCENTILES(25,50,75) > /STATISTICS NONE > /NOTOTAL. > > * Mann-Whitney test *. > NPAR TESTS /M-W= wghtgain BY group(1 2). > > ************************************************ > * CI CODE * > ************************************************ > * If n1*n2>10000, change MXLOOPS accordingly. > * There is a temporary listwise deletion of missing data before > running > MATRIX *. > > SET MXLOOPS=10000. > TEMPORARY. > SELECT IF((~MISSING(group)) AND (~MISSING(wghtgain))). > MATRIX. > PRINT /TITLE ' CONFIDENCE INTERVAL FOR THE DIFFERENCE OF TWO > MEDIANS'. > * Get data: replace variable names by your own *. > GET group /var=group. > GET depvar /var=wghtgain. > * Count sample sizes & separate data in two groups *. > * Assumes group values are 1 & 2, change code values if different*. > COMPUTE n=nrow(group). > COMPUTE n1=0. > COMPUTE n2=0. > LOOP i=1 TO n. > DO IF group(i) EQ 1. > COMPUTE n1=n1+1. > ELSE IF group(i) EQ 2. > COMPUTE n2=n2+1. > END IF. > END LOOP. > COMPUTE group1=MAKE(1,n1,0). > COMPUTE group2=MAKE(1,n2,0). > COMPUTE n1=0. > COMPUTE n2=0. > LOOP i=1 to n. > DO IF group(i) EQ 1. > COMPUTE n1=n1+1. > COMPUTE group1(n1)=depvar(i). > ELSE IF group(i) EQ 2. > COMPUTE n2=n2+1. > COMPUTE group2(n2)=depvar(i). > END IF. > END LOOP. > RELEASE group,depvar,n. > * Compute vector of all differences *. > COMPUTE linear=MAKE(1,n1*n2,0). > LOOP i=1 to n1. > LOOP j=1 to n2. > COMPUTE linear(n2*(i-1)+j)=group1(i)-group2(j). > END LOOP. > END LOOP. > COMPUTE pos=GRADE(linear). > RELEASE group1,group2. > * Computes median of differences *. > COMPUTE half=(n1*n2)/2. > COMPUTE pair=(TRUNC(half)=half). > DO IF (pair eq 0). > COMPUTE medpos=(n1*n2+1)/2. > LOOP i=1 TO n1*n2. > DO IF medpos=pos(i). > BREAK. > END IF. > END LOOP. > COMPUTE median=linear(i). > ELSE IF (pair NE 0). > COMPUTE medpos1=n1*n2/2. > COMPUTE medpos2=1+medpos1. > LOOP i=1 TO n1*n2. > DO IF medpos1=pos(i). > BREAK. > END IF. > END LOOP. > COMPUTE lmedian=linear(i). > LOOP i=1 TO n1*n2. > DO IF medpos2=pos(i). > BREAK. > END IF. > END LOOP. > COMPUTE umedian=linear(i). > COMPUTE median=(lmedian+umedian)/2. > RELEASE lmedian,umedian. > END IF. > PRINT {n1;n2} > /FORMAT="f5.0" > /TITLE="Sample sizes" > /RLABELS=" N1=" " N2=". > PRINT median > /TITLE="Point estimation (*)" > /rlabels="Median" > /FORMAT="f8.1". > PRINT /TITLE="(*) Remember it is NOT the same as the difference of > the 2 > sample medians". > * Exact or asymptotic 95% & 99% CI *. > DO IF ((n1 LE 20) AND (n2 LE 20)). > COMPUTE data95={0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0; > 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0; > 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0; > 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0; > 0,0,0,0,2,3,5,6,7,8,9,11,12,13,14,15,17,18,19,20; > 0,0,0,0,3,5,6,8,10,11,13,14,16,17,19,21,22,24,25,27; > 0,0,0,0,5,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34; > 0,0,0,0,6,8,10,13,15,17,19,22,24,26,29,31,34,36,38,41; > 0,0,0,0,7,10,12,15,17,20,23,26,28,31,34,37,39,42,45,48; > 0,0,0,0,8,11,14,17,20,23,26,29,33,36,39,42,45,48,52,55; > 0,0,0,0,9,13,16,19,23,26,30,33,37,40,44,47,51,55,58,62; > 0,0,0,0,11,14,18,22,26,29,33,37,41,45,49,53,57,61,65,69; > 0,0,0,0,12,16,20,24,28,33,37,41,45,50,54,59,63,67,72,76; > 0,0,0,0,13,17,22,26,31,36,40,45,50,55,59,64,69,74,78,83; > 0,0,0,0,14,19,24,29,34,39,44,49,54,59,64,70,75,80,85,90; > 0,0,0,0,15,21,26,31,37,42,47,53,59,64,70,75,81,86,92,98; > 0,0,0,0,17,22,28,34,39,45,51,57,63,69,75,81,87,93,99,105; > 0,0,0,0,18,24,30,36,42,48,55,61,67,74,80,86,93,99,106,112; > 0,0,0,0,19,25,32,38,45,52,58,65,72,78,85,92,99,106,113,119; > 0,0,0,0,20,27,34,41,48,55,62,69,76,83,90,98,105,112,119,127}. > COMPUTE data99={0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0; > 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0; > 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0; > 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0; > 0,0,0,0,0,1,1,2,3,4,5,6,7,7,8,9,10,11,12,13; > 0,0,0,0,1,2,3,4,5,6,7,9,10,11,12,13,15,16,17,18; > 0,0,0,0,1,3,4,6,7,9,10,12,13,15,16,18,19,21,22,24; > 0,0,0,0,2,4,6,7,9,11,13,15,17,18,20,22,24,26,28,30; > 0,0,0,0,3,5,7,9,11,13,16,18,20,22,24,27,29,31,33,36; > 0,0,0,0,4,6,9,11,13,16,18,21,24,26,29,31,34,37,39,42; > 0,0,0,0,5,7,10,13,16,18,21,24,27,30,33,36,39,42,45,48; > 0,0,0,0,6,9,12,15,18,21,24,27,31,34,37,41,44,47,51,54; > 0,0,0,0,7,10,13,17,20,24,27,31,34,38,42,45,49,53,57,60; > 0,0,0,0,7,11,15,18,22,26,30,34,38,42,46,50,54,58,63,67; > 0,0,0,0,8,12,16,20,24,29,33,37,42,46,51,55,60,64,69,73; > 0,0,0,0,9,13,18,22,27,31,36,41,45,50,55,60,65,70,74,79; > 0,0,0,0,10,15,19,24,29,34,39,44,49,54,60,65,70,75,81,86; > 0,0,0,0,11,16,21,26,31,37,42,47,53,58,64,70,75,81,87,92; > 0,0,0,0,12,17,22,28,33,39,45,51,57,63,69,74,81,87,93,99; > 0,0,0,0,13,18,24,30,36,42,48,54,60,67,73,79,86,92,99,105}. > COMPUTE u95=data95(n1,n2). > COMPUTE u99=data99(n1,n2). > PRINT /TITLE"Exact confidence intervals calculated (N1 and N2=<20)". > ELSE IF ((n1 GT 20) OR (n2 GT 20)). > COMPUTE u95=TRUNC(n1*n2/2-0.5-1.96*sqrt(n1*n2*(n1+n2+1)/12)). > COMPUTE u99=TRUNC(n1*n2/2-0.5-2.576*sqrt(n1*n2*(n1+n2+1)/12)). > PRINT /TITLE"Asymptotic confidence intervals calculated (N1 and/or > N2>20)". > > END IF. > LOOP i=1 TO n1*n2. > DO IF pos(i) EQ u95. > BREAK. > END IF. > END LOOP. > COMPUTE low95=linear(i). > LOOP i=1 TO n1*n2. > DO IF pos(i) EQ n1*n2-u95. > BREAK. > END IF. > END LOOP. > COMPUTE high95=linear(i). > LOOP i=1 TO n1*n2. > DO IF pos(i) EQ u99. > BREAK. > END IF. > END LOOP. > COMPUTE low99=linear(i). > LOOP i=1 TO n1*n2. > DO IF pos(i) EQ n1*n2-u99. > BREAK. > END IF. > END LOOP. > COMPUTE high99=linear(i). > PRINT {low95,high95;low99,high99} > /FORMAT="f8.1" > /TITLE="CI for difference of medians" > /RLABELS="95%Level" "99%Level" > /CLABELS="Lower" "Upper". > END MATRIX.


Back to: Top of message | Previous page | Main SPSSX-L page