| Date: | Fri, 20 May 2011 13:33:24 -0500 |
| Reply-To: | Joe Matise <snoopy369@GMAIL.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Joe Matise <snoopy369@GMAIL.COM> |
| Subject: | Re: Does SAS really process data one record at a time? |
|
| In-Reply-To: | <201105201736.p4KArbX1002696@willow.cc.uga.edu> |
| Content-Type: | text/plain; charset=ISO-8859-1 |
|---|
LAG2/etc. are not relevant to this discussion (except as an example of the
other weird things that can happen in SAS). LAG/LAG2/etc. do not operate on
the previous records at all; each call maintains its own queue of the values
passed to it (separate from the location of the read-in set data).
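
A toy model of the documented LAG behavior, written in Python rather than SAS
and not a claim about SAS internals (the class name Lag and the sample values
are made up for illustration). Each LAGn call site keeps a private FIFO queue
of length n, so what comes back is the argument from n calls ago. That equals
the value from n records back only if the function runs on every record.

```python
from collections import deque

class Lag:
    """Toy model of one LAGn call site: a private FIFO queue of n values."""
    def __init__(self, n=1):
        # None stands in for a SAS missing value.
        self.queue = deque([None] * n)

    def __call__(self, value):
        self.queue.append(value)       # store the current argument
        return self.queue.popleft()    # return the argument from n calls ago

values = [10, 20, 30, 40]

# Called on every record: behaves like "the value two records back".
lag2 = Lag(2)
lag_out = [lag2(x) for x in values]

# Called only on some records: returns the value from the previous CALL,
# not from the previous record.
cond = Lag(1)
cond_out = [cond(x) if x % 20 == 0 else None for x in values]

print(lag_out)   # [None, None, 10, 20]
print(cond_out)  # [None, None, None, 20]
```

In the second run the call for 40 returns 20, not 30, because 30 was never
passed to the function.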
-Joe
On Fri, May 20, 2011 at 12:36 PM, Gerhard Hellriegel <
gerhard.hellriegel@t-online.de> wrote:
> I think there must be a buffer for the last "few" records. Physically that
> is certainly true: the input buffer contains much more than just one
> record.
> Let's take a logical view of it:
>
> For the FIRST. / LAST. logic mentioned, what is needed?
> First, FIRST.: when a new record is loaded into the PDV, the BY variable
> has to be compared to its previous content. To do that, the previous record
> must be kept somewhere (not necessarily the whole record, in our case only
> the BY list, but it seems easier to keep the whole record. I don't think
> the records are moved through memory; only a pointer change is necessary
> to get at it. That is, the current pointer holds the start address of the
> current (new) record and the previous pointer is still available.)
> Comparing the BY variables sets the first.variable(s) to 1 or 0.
> When the next record is read, the "old" pointer and its memory could be
> reused for the next content, since it is no longer needed. (In truth it IS
> needed and cannot be overwritten: think of LAG2! And as Mike pointed
> out, there is much more in a buffer and each obs gets its own pointer.)
> Do we also need a memory place for the "next obs"? I don't think so for
> FIRST.
> Now LAST: that cannot be accomplished by comparing with the previous
> content. The information must come from comparing to the next record. So
> for that, the following record must be present while the current is
> processed.
>
> So we need at least 3 records in memory: the previous, the current and the
> next. For other purposes, e.g. for the LAG2 logic, some more.
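
The bookkeeping described above can be sketched with the previous BY value,
the current record, and a one-record look-ahead. This is Python, a model of
the logic only and not a statement about how SAS implements it; the function
name by_group_flags and the sample rows are hypothetical.

```python
_END = object()  # sentinel meaning "no record here"

def by_group_flags(records, key):
    """Yield (record, first, last) for input already sorted by key.

    Only three things are held at any time: the previous key,
    the current record, and the next record (the look-ahead).
    """
    it = iter(records)
    prev_key = _END
    cur = next(it, _END)
    while cur is not _END:
        nxt = next(it, _END)  # look one record ahead
        first = prev_key is _END or key(cur) != prev_key
        last = nxt is _END or key(nxt) != key(cur)
        yield cur, int(first), int(last)
        prev_key, cur = key(cur), nxt

rows = [("a", 1), ("a", 2), ("b", 3)]
flags = list(by_group_flags(rows, key=lambda r: r[0]))
for rec, first, last in flags:
    print(rec, first, last)
```

The LAST flag for a record cannot be set until the following record (or end
of input) has been seen, which is why the loop always reads one record ahead.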
>
> That also influences buffer management: the idea of buffers is to
> avoid slow record-by-record reading and writing. Reading or writing 1000
> records in one block is much faster than reading or writing a single
> record 1000 times. It is also possible to work on one buffer while
> another is being read or written. It is not necessary to wait until the
> I/O operation is finished.
> What might be a problem is: what do you do when a buffer reaches its end,
> i.e. you reach its last record? If you now switch to another buffer and
> the old one is overwritten with another (new) 1000 records, the previous
> record is gone as soon as the buffer is switched. So refilling a
> no-longer-needed buffer is not an easy thing. There must always be a
> small amount of memory for each buffer to hold some records from its
> start and its end.
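
The buffer-boundary concern can be illustrated with a small Python sketch,
assuming pages are modelled as plain lists. The function name with_previous
and the data are hypothetical, and this is not SAS's actual buffer manager.
The previous record is held in a variable outside the page, so it survives
when the page is swapped for the next one.

```python
def with_previous(pages):
    """Yield (previous, current) record pairs across page boundaries."""
    prev = None              # lives outside any page, so a refill cannot lose it
    for page in pages:       # each iteration models one buffer refill
        for rec in page:
            yield prev, rec
            prev = rec       # remember the last record seen

pages = [[1, 2, 3], [4, 5]]  # two "buffers" of records
pairs = list(with_previous(pages))
print(pairs)
```

The pair (3, 4) spans the page switch: record 3 came from the first page and
is still available while record 4 from the second page is processed.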
>
> Another thing is that the buffer management must always "know" which obs
> are currently in memory. If you use direct access with POINT=, it must
> decide whether the needed obs is already in a buffer or is still on disk,
> in which case a buffer has to be refilled.
> In that case it is not clear which is better: very big buffers to get a
> better chance of hits, or smaller buffers to reduce I/O time.
> For "normal" sequential access, which is used in 99.9% of all SAS
> programs, some rather big buffers are good. The same holds if direct
> access is needed (I think SQL might need such things) and the tables are
> not too big: you then have a chance of getting whole data sets into the
> buffers.
>
> By the way: on mainframes you should use "half-track" buffers, which
> relates to the I/O routines on the (nowadays virtual) DASD. But I don't
> think you are talking about SAS on z/OS...
>
> Gerhard
>
> On Fri, 20 May 2011 12:52:52 +0000, Michael Raithel
> <michaelraithel@WESTAT.COM> wrote:
>
> >Dear SAS-L-ers,
> >
> >Haikuo posted the following:
> >
> >> Dear SAS_Lers,
> >>
> >> I am sorry if this question has been asked before.
> >>
> >> I have been taught many times that SAS deals with data on "one record"
> >> basis. This can be seen in the mechanism of PDV among others. What
> >> puzzles me is that SAS behavior seems to change with a "by" statement. In
> >> short, how could SAS determine the value of "FIRST.VARIABLE" or
> >> "LAST.VARIABLE" without looking ahead to the next record?
> >>
> >Haikuo, I think that only the brainiacs at the SAS Institute can answer
> >your question properly, but the true answer likely lies in the way that
> >SAS reads SAS data sets into memory.
> >
> >The BUFNO= option specifies the number of buffers that SAS allocates for
> >each data set it opens (CBUFNO= is the related option for SAS catalogs).
> >In a SAS data set, observations are stored on data set "pages", which are
> >actually data "blocks" on your storage media. Each of the buffers that is
> >allocated in memory via the BUFNO= option is the size of the page of the
> >related data set. (See the BUFSIZE= and CBUFSIZE= options for SAS data
> >set and SAS catalog page size, respectively.)
> >
> >When SAS reads a SAS data set, it reads pages from the media and
> >transfers them into the buffers in memory. So, SAS has scores, dozens,
> >hundreds, thousands, etc.--depending upon the page size and the
> >observation size--of observations in memory after the transfer.
> >Thereafter, individual observations are read into/built in the area of
> >memory known as the PDV. Nonetheless, SAS has access to all of the
> >observations that are currently in the buffers in memory. So, that is
> >likely where all of the action is taking place with SAS doing the FIRST.
> >and LAST. processing.
> >
> >This is a fascinating question, and I would love for a Birdie with some
> >insight to set the record straight. Hint... hint...
> >
> >Haikuo, best of luck in all your SAS endeavors!
> >
> >I hope that this suggestion proves helpful now, and in the future!
> >
> >Of course, all of these opinions and insights are my own, and do not
> >reflect those of my organization or my associates. All SAS code and/or
> >methodologies specified in this posting are for illustrative purposes only
> >and no warranty is stated or implied as to their accuracy or
> >applicability. People deciding to use information in this posting do so at
> >their own risk.
> >
> >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >Michael A. Raithel
> >"The man who wrote the book on performance"
> >E-mail: MichaelRaithel@westat.com
> >
> >Author: Tuning SAS Applications in the MVS Environment
> >
> >Author: Tuning SAS Applications in the OS/390 and z/OS Environments,
> >Second Edition
> >http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58172
> >
> >Author: The Complete Guide to SAS Indexes
> >http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=60409
> >
> >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >SAS buffers are the best performance tools in recent memory. - Michael A.
> >Raithel
> >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
|