LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (October 2010, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Thu, 21 Oct 2010 16:27:34 +0000
Reply-To:   "Keintz, H. Mark" <mkeintz@WHARTON.UPENN.EDU>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Keintz, H. Mark" <mkeintz@WHARTON.UPENN.EDU>
Subject:   Re: Transaction level processing with rolling window of time
Comments:   To: Denise Kruse <krusesas@YAHOO.COM>
In-Reply-To:   <932067.64811.qm@web57508.mail.re1.yahoo.com>
Content-Type:   text/plain; charset="iso-8859-1"

Denise:

Here's a DATA step approach, which will work if your data is already sorted by ID and DT (datetime). It assumes that no more than 3,000 transactions will be found in the entire history for any single ID. Of course you could increase that, if needed, to whatever the limit is for SAS array size in your situation.

data have; input id dt datetime18.0 amt @@; **Read multi records per line**; format dt datetime18.0; datalines; 1 01oct2008:01:30:05 1 1 01oct2008:05:30:05 2 1 01oct2008:10:30:05 3 1 01oct2008:16:30:05 4 1 01oct2008:20:30:05 5 1 01oct2008:23:30:05 6 1 02oct2008:02:30:05 7 1 02oct2008:07:30:05 8 1 02oct2008:15:30:05 9 1 01oct2008:01:30:05 1 2 01oct2008:05:30:05 2 2 01oct2008:10:30:05 3 2 01oct2008:16:30:05 4 2 01oct2008:20:30:05 5 2 01oct2008:23:30:05 6 2 02oct2008:02:30:05 7 2 02oct2008:07:30:05 8 2 02oct2008:15:30:05 9 run;

data want; array _dt {3000}; ** Expect <= 3000 trans per id **; array _amt {3000};

total_amt24=0; ** Initialize totals for each ID **; n_trans24=0; cutoff_i=1; ** Index of earliest within-range datetime **;

do i=1 by 1 until (last.id); set have; by id; _dt{i}=dt; ** Populate Arrays **; _amt{i}=amt;

n_trans24=n_trans24+1; ** Add new trans to running totals **; total_amt24=total_amt24+amt;

** Find earliest array element still within 24 hours **; do j=cutoff_i to i while (dt>_dt{j}+86400); n_trans24=n_trans24-1; ** Delete expired activity **; total_amt24=total_amt24 - _amt{i}; end; cutoff_i=j; ** Update the CUTOFF value **; drop _: i cutoff_i j; output; end; run;

----+----|----+----|----+----|----+----|----+----|----+----|----+----| Note: This is yet another application of the DOW (Do-Whitlock) loop.

Every time a transaction is read, the following happens: 1. The transaction is added to the totals. 2. The "Ith" transaction is placed in the "Ith" array element. 3. A new cutoff point (first array element within the 24-hour range) is located. 3a. While the cutoff location is determined, each newly- disqualified transaction is removed from the total. 4. output.

Array elements are automatically reset to missing by SAS every time a new ID is started, so the RETAIN statement would be superfluous and possibly conterproductive.

If your data are not sorted by ID DT, and pre-sorting is not feasible, then consider this:

If the data is pre-sorted by DT, let us know, because a hash table (actually a hash-of-hashes) approach might be considered.

> -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of > Denise Kruse > Sent: Thursday, October 21, 2010 10:48 AM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Transaction level processing with rolling window of time > > Hello SAS-L, > I am working on making the following PROC SQL code more efficient (that > may mean > removing it from PROC SQL). The query works but slows down > significantly when > dealing with more than 20K transactions. As I started to think of > different > ways to process this data (set/set, first/last, merge?) it ocurred to > me that > some of you may have very creative solutions. > > Here is some more detail on the goal: > The transaction data (VISA table) has columns: Account #, Tran Amt, > Dt_time and > I want to calculate and display the summary data in the last two > columns. The > tricky part is that I only want to sum up transactions that occurred in > the last > 24 hours. So, it's a rolling window of time. > > I mocked up an example below. It shows a number of transactions that > took place > on one account. > > > The two columns I'm trying to append are in bold. For each transaction, > they > count the number and total dollar amount of transactions that occurred > in the > preceding 24 hours (including the current transaction). > > ACCOUNT # Tran Amt Dt_time #of trans in last 24 hours SUM AMT of trans > in last > 24 hours > > 11112222 10 01OCT2010:17:51:23 1 10 > 11112222 21.11 02OCT2010:05:51:24 2 31.11 > 11112222 14.71 05OCT2010:01:47:01 1 14.71 > 11112222 48.47 05OCT2010:07:11:41 2 63.18 > 11112222 91.08 05OCT2010:10:17:58 3 154.26 > 11112222 35.68 05OCT2010:10:18:51 4 189.94 > 11112222 18.98 06OCT2010:03:44:37 4 194.21 > 11112222 42.79 07OCT2010:17:51:30 1 42.79 > 11112222 32.81 10OCT2010:23:13:31 1 32.81 > > Here is the current code: > proc sql; > create table tmp2 as > select curr.*, > (select sum(TRANAMT) > from visa prev > where curr.osn=prev.osn and prev.dt_tm between > intnx('SECOND', > curr.dt_tm ,-86400) and curr.dt_tm > ) as tran_amt_24, > > (select count(*) > from visa prev > where curr.osn=prev.osn and prev.dt_tm between > intnx('SECOND', > curr.dt_tm ,-86400) and curr.dt_tm > ) as tran_num_24 > from visa curr; > quit; > > Thanks for any ideas you choose to share! > > Denise > >


Back to: Top of message | Previous page | Main SAS-L page