LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (July 2009, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 17 Jul 2009 10:46:19 -0700
Reply-To:     "Terjeson, Mark" <Mterjeson@RUSSELL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Terjeson, Mark" <Mterjeson@RUSSELL.COM>
Subject:      Re: Best way to accumulate routine data?
In-Reply-To:  A<037AB3FF38D44C4BAFB5DFF3D06B57BA07C540ED@EX-CMS01.westat.com>
Content-Type: text/plain; charset="us-ascii"

Hi Ya,

Michael's suggestion using the MODIFY is one of many datastep or SQL techniques supporting your Solution#1 of using a single large dataset. I, and many others I know, when this same question arises, usually alway favor the single maintained dataset. The "maintained" part is the same as Michael's "but with a twist". Whether the values get overwritten for the duplicate rows or the rows get deleted and re-appended is just which is best in your situation. Lots of different lingo people apply to this: Kill-N-Fill, Drop&Insert, Delete&Add, ..., etc. All amount to the same thing, ... including Michael's "Use Solution#1, but with a twist" :o)

The bottom line is -- lean toward the single large maintained dataset.

Hope this is helpful.

Mark Terjeson Investment Business Intelligence Investment Management & Research Russell Investments 253-439-2367

Russell Global Leaders in Multi-Manager Investing

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Michael Raithel Sent: Friday, July 17, 2009 10:31 AM To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Best way to accumulate routine data?

Dear SAS-L-ers,

Ya Huang posted the following interesting question:

> Hi there, > > My situation is that I have small datasets generated daily > with some statistics in it. They have to be accumulated to > server as historic data, so that each new data come in can be > compared with the historic data for trending analysis. > > Two ways I'm think about are: (1) keep only one permanent > dataset, use proc append to keep patching the new dataset to > it. 2. Keep all the small dataset separate, and combine them > only when trending is needed. > > For (1), my concern is that sometimes, the small dataset is a > repeat run, proc append will create duplication. Another > concern is that if something goes wrong and somehow corrupt > the big dataset, I'll lose all the historic data. > > For (2), I don't have to worry about the repeat run, since > the dataset name for repeat run is the same, so it will > overwrite the old version. My concern is that the number of > dataset will go too big, and it will be less efficient down the road. > > Any comments? Or maybe even better solution. > Ya, interesting problem, and I like your thinking on this! I would (and actually do, in such situations) favor solution #1, but with a twist.

To guard against multiple runs that could append the same "Transaction Data Set" to the "Master Data Set" again and again, I prefer to use MODIFY instead of APPEND. So, my programs tend to look like this:

* Sort the data by product date/time/userid/pid/product. *;

proc sort data=linuxact;

by year month sysid userid;

run;

* Update accounting production SAS data set. *;

data prodfile.linuxact;

modify prodfile.linuxact linuxact;

by year month sysid userid;

if _iorc_= 0 then replace;

else output;

run;

...where LINUXACT is the Transaction SAS Data Set, PRODFILE.LINUXACT is the Master SAS Data Set (that will be updated) and both data sets are sorted by the same four "key" variables--WHICH MUST BE UNIQUE IN THE MASTER SAS DATA SET FOR THIS SCHEME TO WORK RIGHT.

If SAS finds a match for the transaction observation key variables in the Master SAS data set, it replaces the obs in Master with that in Transaction. Otherwise, it "appends" the Transaction obs to the Master SAS data set.

As for the issue of putting all of your informational eggs into one data set basket--which could be electronically dropped--well, that is a toughie, because any data set could be corrupted (if the drug cartels offer them enough money). I would recommend that you have your Master SAS Data set backed up frequently and/or you consider looking into Generation Data Sets:

http://support.sas.com/documentation/cdl/en/lrdict/61724/PDF/default/lrd ict.pdf

...See GENMAX and examples, starting on Page 57.

Ya, best of luck in all of your SAS endeavors!

I hope that this suggestion proves helpful now, and in the future!

Of course, all of these opinions and insights are my own, and do not reflect those of my organization or my associates. All SAS code and/or methodologies specified in this posting are for illustrative purposes only and no warranty is stated or implied as to their accuracy or applicability. People deciding to use information in this posting do so at their own risk.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Michael A. Raithel "The man who wrote the book on performance" E-mail: MichaelRaithel@westat.com

Author: Tuning SAS Applications in the MVS Environment

Author: Tuning SAS Applications in the OS/390 and z/OS Environments, Second Edition

http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58172

Author: The Complete Guide to SAS Indexes

http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=60409

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ When you think of the long and gloomy history of man, you will find more hideous crimes have been committed in the name of obedience than have ever been committed in the name of rebellion. - C.P. Snow +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Back to: Top of message | Previous page | Main SAS-L page