Date: Fri, 17 Jul 2009 10:46:19 -0700
Reply-To: "Terjeson, Mark" <Mterjeson@RUSSELL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Terjeson, Mark" <Mterjeson@RUSSELL.COM>
Subject: Re: Best way to accumulate routine data?
In-Reply-To: A<037AB3FF38D44C4BAFB5DFF3D06B57BA07C540ED@EX-CMS01.westat.com>
Content-Type: text/plain; charset="us-ascii"
Hi Ya,
Michael's suggestion using the MODIFY is
one of many datastep or SQL techniques
supporting your Solution#1 of using a single
large dataset. I, and many others I know,
when this same question arises, usually
alway favor the single maintained dataset.
The "maintained" part is the same as Michael's
"but with a twist". Whether the values get
overwritten for the duplicate rows or the rows
get deleted and re-appended is just which is
best in your situation. Lots of different lingo
people apply to this: Kill-N-Fill, Drop&Insert,
Delete&Add, ..., etc. All amount to the same
thing, ... including Michael's
"Use Solution#1, but with a twist"
:o)
The bottom line is -- lean toward the single
large maintained dataset.
Hope this is helpful.
Mark Terjeson
Investment Business Intelligence
Investment Management & Research
Russell Investments
253-439-2367
Russell
Global Leaders in Multi-Manager Investing
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Michael Raithel
Sent: Friday, July 17, 2009 10:31 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Best way to accumulate routine data?
Dear SAS-L-ers,
Ya Huang posted the following interesting question:
> Hi there,
>
> My situation is that I have small datasets generated daily
> with some statistics in it. They have to be accumulated to
> server as historic data, so that each new data come in can be
> compared with the historic data for trending analysis.
>
> Two ways I'm think about are: (1) keep only one permanent
> dataset, use proc append to keep patching the new dataset to
> it. 2. Keep all the small dataset separate, and combine them
> only when trending is needed.
>
> For (1), my concern is that sometimes, the small dataset is a
> repeat run, proc append will create duplication. Another
> concern is that if something goes wrong and somehow corrupt
> the big dataset, I'll lose all the historic data.
>
> For (2), I don't have to worry about the repeat run, since
> the dataset name for repeat run is the same, so it will
> overwrite the old version. My concern is that the number of
> dataset will go too big, and it will be less efficient down the road.
>
> Any comments? Or maybe even better solution.
>
Ya, interesting problem, and I like your thinking on this! I would (and
actually do, in such situations) favor solution #1, but with a twist.
To guard against multiple runs that could append the same "Transaction
Data Set" to the "Master Data Set" again and again, I prefer to use
MODIFY instead of APPEND. So, my programs tend to look like this:
* Sort the data by product date/time/userid/pid/product. *;
proc sort data=linuxact;
by year month sysid userid;
run;
* Update accounting production SAS data set. *;
data prodfile.linuxact;
modify prodfile.linuxact linuxact;
by year month sysid userid;
if _iorc_= 0 then replace;
else output;
run;
...where LINUXACT is the Transaction SAS Data Set, PRODFILE.LINUXACT is
the Master SAS Data Set (that will be updated) and both data sets are
sorted by the same four "key" variables--WHICH MUST BE UNIQUE IN THE
MASTER SAS DATA SET FOR THIS SCHEME TO WORK RIGHT.
If SAS finds a match for the transaction observation key variables in
the Master SAS data set, it replaces the obs in Master with that in
Transaction. Otherwise, it "appends" the Transaction obs to the Master
SAS data set.
As for the issue of putting all of your informational eggs into one data
set basket--which could be electronically dropped--well, that is a
toughie, because any data set could be corrupted (if the drug cartels
offer them enough money). I would recommend that you have your Master
SAS Data set backed up frequently and/or you consider looking into
Generation Data Sets:
http://support.sas.com/documentation/cdl/en/lrdict/61724/PDF/default/lrd
ict.pdf
...See GENMAX and examples, starting on Page 57.
Ya, best of luck in all of your SAS endeavors!
I hope that this suggestion proves helpful now, and in the future!
Of course, all of these opinions and insights are my own, and do not
reflect those of my organization or my associates. All SAS code and/or
methodologies specified in this posting are for illustrative purposes
only and no warranty is stated or implied as to their accuracy or
applicability. People deciding to use information in this posting do so
at their own risk.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Michael A. Raithel
"The man who wrote the book on performance"
E-mail: MichaelRaithel@westat.com
Author: Tuning SAS Applications in the MVS Environment
Author: Tuning SAS Applications in the OS/390 and z/OS Environments,
Second Edition
http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58172
Author: The Complete Guide to SAS Indexes
http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=60409
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
When you think of the long and gloomy history of man, you will
find more hideous crimes have been committed in the name of
obedience than have ever been committed in the name of
rebellion. - C.P. Snow
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++