Date: Wed, 16 Jul 2008 13:47:50 +0000
Reply-To: brucercolton@comcast.net
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Bruce Colton <brucercolton@comcast.net>
Subject: REDUCING PROCESSING TIME
I cut processing off when it hadn't completed after 2 hours. The syntax I used follows.
The cases are hourly: 24 x 50 = 1200 cases for 50 days. The data are arranged from the most recent day, day 1 (cases 1-24), to the oldest day, day 50 (cases 1177-1200). My objective: for each of the first 30 days of the data, compute an average based on the prior 20 days. For example, for day 1, the average for any given hour of the day = fc<day1>*.05*(node<day2>/fc<day2> + ... + node<day21>/fc<day21>). For day 2, average = fc<day2>*.05*(node<day3>/fc<day3> + ... + node<day22>/fc<day22>). The variables are day, hr, fc, node1 - node1500.
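To pin down the formula, here is a minimal sketch in Python (not SPSS syntax) of the calculation for a single node and a single hour of the day. The function name, argument names, and toy data are made up for illustration; the only things taken from the description above are the 20-day window, the 30 target days, and the .05 (= 1/20) weight.

```python
def forecast_naive(fc, node, n_targets=30, window=20):
    """Direct evaluation of the formula for one hour and one node.

    fc[d] and node[d] hold the values for day d+1 of the file,
    so index 0 is the most recent day and higher indexes are older days.
    """
    out = []
    for d in range(n_targets):
        # Ratios node/fc for the `window` prior (older) days that follow day d.
        ratios = [node[k] / fc[k] for k in range(d + 1, d + 1 + window)]
        # fc<day> * .05 * (sum of ratios) when window = 20.
        out.append(fc[d] * sum(ratios) / window)
    return out


# Toy data (hypothetical): 50 days with a constant ratio node/fc = 2,
# so every average should come out as 2 * fc for that day.
fc = [10.0 + d for d in range(50)]
node = [2.0 * f for f in fc]
result = forecast_naive(fc, node)
```

With real data the same function would be called once per node and per hour, which is slow but useful as a reference against which a faster version can be checked.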
I attempt to do it all in one pass, storing 20 days' worth of values in the vector day_2 (720,000 variables), which creates too many variables and is probably the source of the problem. day_2 has a FIFO structure: as cases are read, I overwrite, and so eliminate, the most recent day held in day_2. Question: might there be a better, more efficient way to do this, possibly with multiple passes? Richard mentioned in an e-mail that SPSS is efficient at processing a 'long' data structure. Would it make sense, then, to split the data vertically, yielding 2 files with the same number of cases, each containing approximately half the variables, process them separately, and concatenate the results?
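For comparison, the running-sum-plus-FIFO idea can be sketched for one (hour, node) series at a time, which is roughly what a 'long' layout would allow. This is a Python illustration of the logic only, not SPSS syntax, and it assumes the series has already been pulled out in file order (most recent day first); the function name and toy data are hypothetical. The point is that the FIFO then needs to hold only 20 ratios rather than 20 days of every node.

```python
from collections import deque


def forecast_sliding(fc, node, n_targets=30, window=20):
    """One pass over a single (hour, node) series, most recent day first."""
    out = [None] * n_targets
    recent = deque()   # FIFO holding the ratios node/fc of the last days read
    running = 0.0      # sum of the ratios currently in the FIFO
    for d, (f, x) in enumerate(zip(fc, node)):
        ratio = x / f
        recent.append(ratio)
        running += ratio
        if len(recent) > window:
            # Drop the ratio that has fallen out of the 20-day window.
            running -= recent.popleft()
        # Once day d has been read, all `window` prior days of the day
        # at index d - window are in the FIFO.
        target = d - window
        if 0 <= target < n_targets:
            out[target] = fc[target] * running / window
    return out


# Toy data (hypothetical): 50 days for one hour and one node.
fc = [10.0 + d for d in range(50)]
node = [float((d % 7) + 1) for d in range(50)]
result = forecast_sliding(fc, node)
```

Each forecast becomes available 20 days after its target day has been read, which matches the one-pass design described above, where output starts at day 21.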
Any help/suggestions/recommendations are greatly appreciated - in the meantime, I'll pursue the long data structure idea.
SYNTAX
get file = 'c:\users\bruce\documents\spsstestdata_1500.sav'.
*file handle data_xout name = 'c:\users\bruce\documents\data_xout1.sav'.
* ----------.
* keep day 2 for weighted ave calc.
compute daymnth=xdate.mday(day).
do if lag(daymnth) ne daymnth.
compute day_1=day_1+1.
end if.
*leave day_1.
* -----------------.
* day_1 gives the day of the case, starting with the most recent (day_1 = 0) to the oldest day - in a dataset of 50 days - last day - day_1=49.
* -----------------.
* ---------.
vector loadpaste (720) node=var00001 to var01500 day_2(720000) weightedave(36000) nodeforecast(1500).
leave loadpaste1 to loadpaste720 weightedave1 to weightedave36000 nodeforecast1 to nodeforecast1500
day_21 to day_2720000 day_1.
* ---------------.
* process day1.
* --------------.
do if $casenum < 25.
compute loadpaste (hr) = fc.
loop #i=1 to 1500.
compute day_2(#i + 1500*(hr-1))=0.
end loop.
end if.
* ----------------.
* process days 2 thru 20.
* -----------------.
do if $casenum >24 and $casenum <481.
compute loadpaste(day_1*24 + hr)=fc.
loop #i=1 to 1500.
compute day_2((day_1)*36000 + #i + 1500*(hr-1)) = node(#i).
compute weightedave(#i + 1500*(hr-1))=weightedave(#i + 1500*(hr-1)) + node(#i)/fc.
end loop.
end if.
* ----------------.
* process days 21 thru 30 - loadpaste.
* -----------------.
do if $casenum >480 and $casenum <721.
compute loadpaste(day_1*24 + hr)=fc.
end if.
* -----------.
* print forecast day1-day30 starting with day21.
* ----------.
do if $casenum >480 and $casenum <1201.
loop #i=1 to 1500.
*compute day_2(day_1)*36000 + #i + 1500*(hr-1)) = node(#i).
compute weightedave(#i + 1500*(hr-1))=weightedave(#i + 1500*(hr-1)) + node(#i)/fc -
day_2((0)*36000 + #i + 1500*(hr-1))/loadpaste((day_1-20)*24+hr).
compute nodeforecast(#i)=loadpaste((day_1-20)*24+hr)*.05*weightedave(#i + 1500*(hr-1)).
end loop.
xsave outfile='c:\users\bruce\documents\data_xout1.sav'
/keep hr nodeforecast1 to nodeforecast1500.
* -----------------.
* streamline this later.
* --------------.
loop #i=1 to 1500.
compute day_2((0)*36000 + #i + 1500*(hr-1)) = day_2((1)*36000 + #i + 1500*(hr-1)).
compute day_2((1)*36000 + #i + 1500*(hr-1)) = day_2((2)*36000 + #i + 1500*(hr-1)).
compute day_2((2)*36000 + #i + 1500*(hr-1)) = day_2((3)*36000 + #i + 1500*(hr-1)).
compute day_2((3)*36000 + #i + 1500*(hr-1)) = day_2((4)*36000 + #i + 1500*(hr-1)).
compute day_2((4)*36000 + #i + 1500*(hr-1)) = day_2((5)*36000 + #i + 1500*(hr-1)).
compute day_2((5)*36000 + #i + 1500*(hr-1)) = day_2((6)*36000 + #i + 1500*(hr-1)).
compute day_2((6)*36000 + #i + 1500*(hr-1)) = day_2((7)*36000 + #i + 1500*(hr-1)).
compute day_2((7)*36000 + #i + 1500*(hr-1)) = day_2((8)*36000 + #i + 1500*(hr-1)).
compute day_2((8)*36000 + #i + 1500*(hr-1)) = day_2((9)*36000 + #i + 1500*(hr-1)).
compute day_2((9)*36000 + #i + 1500*(hr-1)) = day_2((10)*36000 + #i + 1500*(hr-1)).
compute day_2((10)*36000 + #i + 1500*(hr-1)) = day_2((11)*36000 + #i + 1500*(hr-1)).
compute day_2((11)*36000 + #i + 1500*(hr-1)) = day_2((12)*36000 + #i + 1500*(hr-1)).
compute day_2((12)*36000 + #i + 1500*(hr-1)) = day_2((13)*36000 + #i + 1500*(hr-1)).
compute day_2((13)*36000 + #i + 1500*(hr-1)) = day_2((14)*36000 + #i + 1500*(hr-1)).
compute day_2((14)*36000 + #i + 1500*(hr-1)) = day_2((15)*36000 + #i + 1500*(hr-1)).
compute day_2((15)*36000 + #i + 1500*(hr-1)) = day_2((16)*36000 + #i + 1500*(hr-1)).
compute day_2((16)*36000 + #i + 1500*(hr-1)) = day_2((17)*36000 + #i + 1500*(hr-1)).
compute day_2((17)*36000 + #i + 1500*(hr-1)) = day_2((18)*36000 + #i + 1500*(hr-1)).
compute day_2((18)*36000 + #i + 1500*(hr-1)) = day_2((19)*36000 + #i + 1500*(hr-1)).
compute day_2((19)*36000 + #i + 1500*(hr-1)) = node(#i).
end loop.
end if.
execute.
Data looks like:
day          hr  fc  node1  node2  node3  node4  node5
17-Nov-2008   1  23     45     12     17     25     77
17-Nov-2008   2  18     41     99     77     88     77
18-Nov-2008   1  66     33     45     22     44     11
etc.