LISTSERV at the University of Georgia
Date:         Fri, 20 May 2011 13:33:24 -0500
Reply-To:     Joe Matise <snoopy369@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Joe Matise <snoopy369@GMAIL.COM>
Subject:      Re: Does SAS really process data one record at a time?
Comments: To: Gerhard Hellriegel <gerhard.hellriegel@t-online.de>
In-Reply-To:  <201105201736.p4KArbX1002696@willow.cc.uga.edu>
Content-Type: text/plain; charset=ISO-8859-1

LAG2 and its relatives are not relevant to this discussion (except as an example of the other weird things that can happen in SAS). The LAG functions do not operate on the previous records at all; each LAG call maintains its own queue of values, separate from the place where the data read in by SET is held.
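
To make that concrete, here is a rough model in Python (not SAS, and not a claim about how SAS is implemented internally). The `Lag` class and the `run_step` helper are made-up names for illustration. The only assumption is that each LAGn call owns a FIFO queue of n values, initially missing, and that the queue changes only when the call actually executes.

```python
from collections import deque


class Lag:
    """Toy model of one LAGn call: a private FIFO queue holding n values."""

    def __init__(self, n=1):
        # The queue starts out filled with n missing values (None here).
        self.queue = deque([None] * n)

    def __call__(self, value):
        # Push the current value, hand back the oldest one.
        self.queue.append(value)
        return self.queue.popleft()


def run_step(values, only_odd_rows=False):
    """Loop over 'observations', calling LAG on every row or only on odd rows."""
    lag1 = Lag(1)
    out = []
    for i, x in enumerate(values, start=1):
        if only_odd_rows and i % 2 == 0:
            out.append(None)  # LAG is not executed on this iteration
            continue
        out.append(lag1(x))
    return out


print(run_step([10, 20, 30, 40]))                      # [None, 10, 20, 30]
print(run_step([10, 20, 30, 40], only_odd_rows=True))  # [None, None, 10, None]
```

In the second call, row 3 gets 10 (the value from the last time LAG ran), not 20 (the value of the previous record). That is why LAG says nothing about whether earlier records are still sitting in memory.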

-Joe

On Fri, May 20, 2011 at 12:36 PM, Gerhard Hellriegel <gerhard.hellriegel@t-online.de> wrote:

> I think there must be a buffer for the last "few" records. Physically that
> is right for sure: the input buffer contains much more than only one
> record.
> Let's take a kind of logical view of it:
>
> for the mentioned first / last logic, what's needed?
> First FIRST: if a new record is filled into the PDV, the BY variable has to
> be compared to the previous content. To do that, the previous record must
> be somewhere (ok, not the whole record, in our case only the BY list, but
> it seems to be easier to store the whole record. I don't think that the
> records are moved through memory; only a pointer change is necessary
> to get it. That means the current pointer contains the start address of the
> actual (new) record and the previous pointer is still present.)
> Comparing the BY variables sets the first.variable(s) to 1 or 0.
> When the next record is read, the "old" pointer and its memory can be
> reused for the next content; it is not needed any more. (In truth it IS
> needed and cannot be overwritten - think of LAG2! And as Mike pointed
> out, there is much more in a buffer and each obs gets its own pointer.)
> Do we also need a memory place for the "next obs"? I don't think so for
> FIRST.
> Now LAST: that cannot be accomplished by comparing with the previous
> content. The information must come from comparing to the next record. So
> for that, the following record must be present while the current one is
> processed.
>
> So we need at least 3 records in memory: the previous, the current and the
> next. For other purposes, e.g. for the LAG2 logic, some more.
>
> That also influences the buffer management: the idea of buffers is to
> avoid slow sequential reading and writing. Reading and writing 1000
> records in one block is much faster than reading / writing 1000 times only
> one. It is also possible to work with one buffer at the same time as
> another is read or written. It is not necessary to wait until the IO
> operation is finished.
> What might be a problem is: what do you do when a buffer is at its end,
> meaning you reach the last record? If you now switch to another buffer and
> the old one is overwritten with another (new) 1000 records, the previous
> record is gone as soon as the buffer is switched. So the refilling of a
> no-longer-needed buffer is not an easy thing. There must always be a small
> amount of memory for each buffer to contain some records from the
> start-point and from the end.
>
> Another thing is, the management must always "know" which obs are
> currently in memory. If you use direct access with point=, the management
> must decide if the needed obs is already in a buffer, or is still on disk,
> which causes a refilling of a buffer.
> In that case it is not sure what's better: very big buffers to get a
> better chance of hits, or smaller buffers to reduce IO time.
> For "normal" sequential access, which is used in 99.9% of all SAS
> programs, some rather big buffers are good. Also if direct access is needed
> (I think SQL might need such things) and the tables are not too big. You
> have the chance to get whole datasets in buffer.
>
> By the way: on mainframes you should use "half-track" buffers, which is
> related to the IO routines on the (in our days virtual) DASD. But I don't
> think you are talking about z/OS SAS...
>
> Gerhard
>
> On Fri, 20 May 2011 12:52:52 +0000, Michael Raithel
> <michaelraithel@WESTAT.COM> wrote:
>
> >Dear SAS-L-ers,
> >
> >Haikuo posted the following:
> >
> >> Dear SAS_Lers,
> >>
> >> I am sorry if this question has been asked before.
> >>
> >> I have been taught many times that SAS deals with data on "one record"
> >> basis. This can be seen in the mechanism of PDV among others. What
> >> puzzles me is that SAS behavior seems changed after "by" statement. In
> >> short, how could SAS determine the value of "FIRST.VARIABLE" or
> >> "LAST.VARIABLE" without foreseeing the next record?
> >>
> >Haikuo, I think that only the brainiacs at the SAS Institute can answer
> >your question properly, but the true answer likely lies in the way that
> >SAS reads SAS data sets into memory.
> >
> >The BUFNO= option specifies the number of buffers that SAS allocates for
> >each data set it opens (CBUFNO= is the related option for SAS catalogs).
> >In a SAS data set, observations are stored on data set "pages", which are
> >actually data "blocks" on your storage media. Each of the buffers that is
> >allocated in memory via the BUFNO= (or CBUFNO=) option is the size of the
> >page of the related data set. (See the BUFSIZE= and CBUFSIZE= options for
> >SAS data set and SAS catalog page size, respectively).
> >
> >When SAS reads a SAS data set, it reads pages from the media and
> >transfers them into the buffers in memory. So, SAS has scores, dozens,
> >hundreds, thousands, etc.--depending upon the page size and the
> >observation size--of observations in memory after the transfer.
> >Thereafter, individual observations are read into/built in the area of
> >memory known as the PDV. Nonetheless, SAS has access to all of the
> >observations that are currently in the buffers in memory. So, that is
> >likely where all of the action is taking place with SAS doing the FIRST.
> >and LAST. processing.
> >
> >This is a fascinating question, and I would love for a Birdie with some
> >insight to set the record straight. Hint... hint...
> >
> >Haikuo, best of luck in all your SAS endeavors!
> >
> >I hope that this suggestion proves helpful now, and in the future!
> >
> >Of course, all of these opinions and insights are my own, and do not
> >reflect those of my organization or my associates. All SAS code and/or
> >methodologies specified in this posting are for illustrative purposes only
> >and no warranty is stated or implied as to their accuracy or
> >applicability. People deciding to use information in this posting do so at
> >their own risk.
> >
> >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >Michael A. Raithel
> >"The man who wrote the book on performance"
> >E-mail: MichaelRaithel@westat.com
> >
> >Author: Tuning SAS Applications in the MVS Environment
> >
> >Author: Tuning SAS Applications in the OS/390 and z/OS Environments,
> >Second Edition
> >http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58172
> >
> >Author: The Complete Guide to SAS Indexes
> >http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=60409
> >
> >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >SAS buffers are the best performance tools in recent memory. - Michael A.
> >Raithel
> >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
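
For what it is worth, the "previous, current, next" reasoning in the quoted message can be sketched in a few lines of Python. This is only a model of the logic, under the assumption that the input is already sorted by the BY key; it says nothing about what SAS really does internally, and the `by_group_flags` function name is made up for illustration. Note that only the previous key has to be remembered, not the whole previous record, plus one record of look-ahead.

```python
def by_group_flags(records, key):
    """Yield (record, first, last) for records already sorted by key(record)."""
    it = iter(records)
    try:
        current = next(it)
    except StopIteration:
        return  # no records, nothing to flag
    prev_key = object()  # sentinel that compares unequal to every real key
    for nxt in it:
        k = key(current)
        # FIRST. needs only the previous key; LAST. needs a peek at the next record.
        yield current, int(k != prev_key), int(k != key(nxt))
        prev_key, current = k, nxt
    # The final record always ends its BY group.
    k = key(current)
    yield current, int(k != prev_key), 1


rows = [("a", 1), ("a", 2), ("b", 3)]
for rec, first, last in by_group_flags(rows, key=lambda r: r[0]):
    print(rec, first, last)
```

The loop is still "one record at a time" from the caller's point of view; the look-ahead is hidden inside the reader, which is roughly the picture the buffer discussion suggests.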

