Date: Wed, 1 May 2002 23:39:42 GMT
Reply-To: julierog@ix.netcom.com
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Roger Lustig <trovato@BELLATLANTIC.NET>
Subject: Re: [DATA STEP] Zubrowka
Content-Type: text/plain; charset=us-ascii
Hi, Zubrowka!
First of all, do let us know how the different solutions perform.
700,000 records is enough to produce some interesting results. (You'll
also have to choose your formats and lengths carefully.)
To your questions:
--I don't understand either: why would one use LAG() unless it was
absolutely necessary?
--the caret (^) is a "not" sign, so ^= means "not equal" (the same as NE).
--the second program is data-dependent. The hash key is read with PIB6.,
which looks at only the first 6 bytes of COL, so if values of COL are
longer than 6 characters, you've got a problem. Consider:
COL
twenty
twenty
twentyone
twentytwo
.
.
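To see the effect, here is a minimal sketch (not part of Paul's program; KEY and SLOT are illustrative names, and 200003 is the table size from the quoted %let). Because PIB6. reads only the first six bytes, both values should print the same KEY and therefore the same SLOT:

```sas
data _null_;
  length col $ 9;
  do col = 'twentyone', 'twentytwo';
    key  = input(col, pib6.);   /* only bytes 1-6 ('twenty') are read */
    slot = mod(key, 200003);    /* starting slot in the hash table    */
    put col= key= slot=;
  end;
run;
```

How much harm the shared slot does depends on how the rest of the program resolves collisions and on the length of the character array that stores the keys.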
But Paul was trying to solve a slightly different problem there, namely:
how to note that the current value of COL is the Nth different value to
have been encountered during a pass through the data.
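For the run numbering asked about in the original post (as opposed to the distinct-value numbering just described), LAG() can be avoided: a BY statement with NOTSORTED sets FIRST.COL at the start of each run of equal values. A minimal sketch, assuming the input data set is XX with a character variable COL as in the quoted program; RUNS_BY, groupCount and the $20 length are illustrative choices, not from the thread:

```sas
data runs_by;
  set xx;
  by col notsorted;                   /* a BY group = a run of consecutive equal values */
  length colid $ 20;                  /* assumed wide enough for value, '_' and counter */
  if first.col then groupCount + 1;   /* sum statement: retained, starts at 0 */
  colid = trim(col) || '_' || compress(put(groupCount, 8.));
  keep col colid;
run;
```

With the ten sample rows this should give one_1, two_2, one_3 and three_4. The length of COLID matters: with $8, a value such as twentyone plus its suffix would be truncated.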
Best,
Roger
zubrowka wrote:
>
> hi folks,
>
> I wanted to say a big thanks to all of you. I feel really dumb when I
> see all your responses. I understand nearly all the solutions. The
> point is that since I'm a student I don't have access to all the
> documentation of the SAS System, and I didn't know the NOTSORTED
> option.
>
> I still don't understand
>
> data optionthree;
> set xx;
> length colid $8;
> collag=lag(col);
> if collag^=col then groupCount+1;
> colid=trim(col)||'_'||compress(put(groupCount,8.));
> keep col colid;
> run;
>
> I don't understand what the ^ is for in the "if collag^=col then
> groupCount+1;" statement. I noticed it doesn't do the trick when I
> leave the ^ out of the IF statement. What does it do?
>
> Second,
>
> %let h = 200003 ;
>
> data w ( drop = j n );
> array c (0:&h) $ _temporary_ ;
> array x (0:&h) _temporary_ ;
> set q ;
> do j = mod(input(col,pib6.), &h) until ( c(j) = col ) ;
> if j = &h then j = 0 ;
> if x(j) = . then do ;
> n ++ 1 ;
> x(j) = n ;
> c(j) = col ;
> end ;
> end ;
> colid = trim(col) || '_' || put(x(j), best.-l) ;
> run ;
>
> This solution works fine too, but beyond the fact that I don't understand
> anything in the script, what is the advantage of this solution?
>
> Last, my data has about 700000 lines and I wonder which solution will
> be the fastest?
>
> Again, thanks to all of you for the really professional responses.
>
> Zubrowka
>
>
>
> On Wed, 01 May 2002 17:38:48 +0200, zubrowka <zubrowka@gmx.net> wrote:
>
> >Hi all,
> >
> >Here is my small problem.
> >I have a table like this:
> >
> >obs col
> >1 one
> >2 one
> >3 one
> >4 two
> >5 two
> >6 two
> >7 one
> >8 one
> >9 three
> >10 three
> >
> >I want to obtain this
> >
> >
> > obs col colid
> >1 one one_1
> >2 one one_1
> >3 one one_1
> >4 two two_2
> >5 two two_2
> >6 two two_2
> >7 one one_3
> >8 one one_3
> >9 three three_4
> >10 three three_4
> >etc
> >
> >Obviously I can't do a PROC SORT by col because I will lose the order
> >of the data, which is important. I didn't manage to find a solution. How
> >can I solve that?
> >
> >Thanks in advance for replying.
> >
> >
> >Zubrowka