proc surveyselect data=have out=need rate=1 rep=100;
run;
On 4/29/09, Søren Lassen <s.lassen@post.tele.dk> wrote:
> Gerhard,
> You are absolutely right. Still, I think Dan has a point,
> as PROC APPEND is basically faster than the DATA step.
>
> Therefore, if the input data set is large enough,
> the repeated APPEND solution may be faster than any
> of the data step solutions.
>
> Regards,
> Søren
>
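The repeated PROC APPEND approach that Søren and Dan compare is never actually shown in the thread. Here is a minimal sketch of it, assuming the source data set is WORK.A from the original post and that the result goes to a separate data set WORK.WANT. The macro name %append_n and its parameters are hypothetical, chosen only for illustration.

```sas
/* Sample data from the original post */
data a;
  input id num;
  datalines;
1 11
2 22
3 33
;
run;

/* Hypothetical macro: stack &n copies of &ds into &out with PROC APPEND */
%macro append_n(ds=, out=, n=100);
  %local i;
  /* Start &out empty, but with the variables of &ds, so that  */
  /* rerunning the macro does not append to an old result.     */
  data &out;
    set &ds(obs=0);
  run;
  /* One PROC APPEND step per copy. Each step pays its own      */
  /* startup cost, which is the overhead debated in the thread. */
  %do i = 1 %to &n;
    proc append base=&out data=&ds;
    run;
  %end;
%mend append_n;

options fullstimer;  /* detailed timing notes for every step in the log */
%append_n(ds=a, out=want, n=100);
```

Because each step appends the whole of A, the copies come out block by block (1 2 3 1 2 3 ...), which is the order the original poster asked for. The price is 100 step-level notes in the log, which is what Søren objects to below.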
> On Wed, 29 Apr 2009 05:39:11 -0400, Gerhard Hellriegel
> <gerhard.hellriegel@T-ONLINE.DE> wrote:
>
> >I meant this statement:
> >
> >data A;
> > set A A A /* ...A repeated 100 times */;
> >run;
> >
> >That is not optimal. I once caused an IPL on a production mainframe with a
> >SET statement listing 50 input data sets with around 400 variables. That
> >caused a memory overflow and went into a loop, unfortunately just at the
> >moment when a message was sent to the system console... That message
> >blocked the console (I think one of the 10 billion must have been bad...).
> >If you try that out with sashelp.class, you will see that it takes a long
> >time to allocate all the buffers.
> >Using
> >
> >set a
> > a open=defer
> > a open=defer
> > /* ...and so on... */
> > ;
> >
> >opens the data sets one after another, so only one buffer is needed. All
> >the data sets must have the same structure for this to work.
> >This has an advantage over PROC APPEND, because the overhead of starting
> >the PROC each time is avoided.
> >
> >With 100 data sets and short observations you will see no difference. With
> >1000 or more, the difference may become visible.
> >
> >options fullstimer;
> >%macro test(iterate,ds);
> >data a;
> > set
> > %do i=1 %to &iterate;
> > &ds open=defer
> > %end;
> > ;
> >run;
> >%mend;
> >%test(1000,sashelp.class);
> >
> >Gerhard
> >
> >
> >
> >
> >
> >On Wed, 29 Apr 2009 05:01:25 -0400, Søren Lassen
> ><s.lassen@POST.TELE.DK> wrote:
> >
> >>Gerhard,
> >>I do not suggest using "set statement with 100 input datasets". I suggest
> >>iterating over the same set statement a number of times.
> >>
> >>Why so complicated? Because the original poster wanted the observations
> >>in that order.
> >>
> >>I think that Dan is right - as the size of the input data set grows,
> >>the advantage of using a single data step decreases, and may
> >>eventually disappear. On the other hand, I still prefer this log
> >>entry (the times were for the original 3 obs. sample data set):
> >>
> >>NOTE: The data set WORK.WANT has 300 observations and 2 variables.
> >>NOTE: DATA statement used:
> >> real time 0.00 seconds
> >> cpu time 0.00 seconds
> >>
> >>to parsing a log with 100 notes about PROC APPEND.
> >>
> >>But of course, if the order of the observations is not
> >>important, your suggestion is probably to be preferred.
> >>
> >>Regards,
> >>Søren
> >>
> >>On Wed, 29 Apr 2009 04:44:37 -0400, Gerhard Hellriegel
> >><gerhard.hellriegel@T-ONLINE.DE> wrote:
> >>
> >>>I'd not use a set statement with 100 input datasets! Note that SAS
> >>>creates a buffer for each dataset, which costs a lot of memory and a lot
> >>>of CPU time.
> >>>If you want to do that, at least use open=defer as an option for each
> >>>dataset.
> >>>One question: why so complicated? The following does the same, only with
> >>>the observations in a different order:
> >>>
> >>>data a;
> >>> set sashelp.class;
> >>> do i= 1 to 100;
> >>> output;
> >>> end;
> >>> drop i;
> >>>run;
> >>>
> >>>Gerhard
> >>>
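Gerhard's DO-loop step writes each observation 100 times in a row (1 1 1 ... 2 2 2 ...), whereas the original poster wanted the whole data set repeated block by block. If that order matters, one possible sketch (not from the thread) is to tag each copy with a counter and sort on it. The counter variable REP is a hypothetical name. PROC SORT keeps the original relative order within each BY group under the EQUALS option, which is the default. As a result, the blocks come out as 1 2 3 1 2 3 ...

```sas
/* Sample data from the original post */
data a;
  input id num;
  datalines;
1 11
2 22
3 33
;
run;

/* Write each observation 100 times, tagging every copy  */
/* with its copy number REP (hypothetical variable name). */
data want;
  set a;
  do rep = 1 to 100;
    output;
  end;
run;

/* Sort by copy number. EQUALS (the default, stated here  */
/* explicitly) keeps the original observation order       */
/* within each REP group; REP is dropped from the result. */
proc sort data=want out=want(drop=rep) equals;
  by rep;
run;
```

On large inputs the extra sort has its own cost, so the POINT= loop above may still be the cheaper way to get this order.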
> >>>
> >>>
> >>>On Wed, 29 Apr 2009 01:15:37 -0700, Daniel Nordlund
> >>><djnordlund@VERIZON.NET> wrote:
> >>>
> >>>>> -----Original Message-----
> >>>>> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> >>>>> Behalf Of Søren Lassen
> >>>>> Sent: Tuesday, April 28, 2009 11:55 PM
> >>>>> To: SAS-L@LISTSERV.UGA.EDU
> >>>>> Subject: Re: How to create a dataset by appending a single
> >>>>> (i.e same) dataset multiple times.??
> >>>>>
> >>>>> While I generally recommend using proc append for appending data,
> >>>>> there are limits - running the append procedure one hundred times
> >>>>> after each other will cost a lot of overhead compared to this
> >>>>> solution:
> >>>>>
> >>>>> data want;
> >>>>> do __i=1 to 100;
> >>>>> do __p=1 to __n;
> >>>>> set A nobs=__n point=__p;
> >>>>> output;
> >>>>> end;
> >>>>> end;
> >>>>> stop;
> >>>>> drop __:;
> >>>>> run;
> >>>>>
> >>>>> Regards,
> >>>>> Søren
> >>>>>
> >>>>> On Tue, 28 Apr 2009 23:24:43 -0700, pinu
> >>>>> <amarmundankar@GMAIL.COM> wrote:
> >>>>>
> >>>>> >There is a dataset A:
> >>>>> >id num
> >>>>> >1 11
> >>>>> >2 22
> >>>>> >3 33
> >>>>> >Now I want to create a dataset named A which will consist of the
> >>>>> >records from A appended 100 times.
> >>>>> >Sample output of dataset A will be:
> >>>>> >1 11
> >>>>> >2 22
> >>>>> >3 33
> >>>>> >1 11
> >>>>> >2 22
> >>>>> >3 33
> >>>>> >..
> >>>>> >..
> >>>>> >..
> >>>>> >..
> >>>>> >Is there any other way than using the set statement and writing A 100
> >>>>> >times after it, e.g.
> >>>>> > data A;
> >>>>> > set A A A /* ...A repeated 100 times */;
> >>>>> > run;
> >>>>
> >>>>Søren,
> >>>>
> >>>>I ran a few quick and dirty tests. The first test used a file of 100
> >>>>records. The second used a file with 100000 records. With the small file,
> >>>>the proc append solution ran in approx. 3 seconds (real time); your set
> >>>>with point solution ran in .09 seconds. With the large file, the proc
> >>>>append solution ran in 25-30 seconds and the set with point solution ran
> >>>>in 15-30 seconds. This is not conclusive because it was a quick and dirty
> >>>>test.
> >>>>
> >>>>But I did notice two points of interest. First, as one might expect, it
> >>>>appears that the overhead of proc append becomes a small percentage of
> >>>>the overall processing time as file size increases. Second, the times for
> >>>>both methods were quite variable, probably due to a variety of background
> >>>>tasks (I am running on a WinXP system). Interestingly, the individual
> >>>>times for each proc append step with the 100k record file varied between
> >>>>.04 and 3.1 seconds. It would seem that the 100 proc append steps could
> >>>>theoretically finish in as little as 4 seconds (100 x .04 seconds). So
> >>>>the overhead of running proc append may not rule out using it 100 times.
> >>>>These times will probably vary across systems depending on the amount
> >>>>of RAM (and how the OS manages it), type of file system, background
> >>>>activity, etc.
> >>>>
> >>>>I may try to benchmark this a little more carefully to get a better
> >>>>assessment of the timings for these two approaches. I would be interested
> >>>>in your comments (others feel free to jump in here as well).
> >>>>
> >>>>Dan
> >>>>
> >>>>Daniel Nordlund
> >>>>Bothell, WA USA
>