Date: Tue, 16 Oct 2001 15:43:39 -0400
Reply-To: Edward Heaton <HEATONE@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Edward Heaton <HEATONE@WESTAT.COM>
Subject: Re: Merge w/o sort first
Okay, I've been away from SAS-L for a few days, but I want to jump in
here. I think the problem is one of semantics. PROC SORT will still attempt
to sort data that is already in order but not sorted. ("Sorted" is the past
tense of "sort", and requires that a sort has happened. "In order" simply
refers to the condition of the data.)
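
A minimal sketch of that distinction, assuming two hypothetical data sets
WORK.A and WORK.B keyed on a variable ID (none of these names come from the
thread). The rows are typed in ascending ID order, so the data is "in order"
but nothing has marked it as "sorted":

```sas
/* Hypothetical data sets, entered in ascending ID order. */
data work.a;
  input id x;
  datalines;
1 10
2 20
3 30
;
run;

data work.b;
  input id y;
  datalines;
1 100
3 300
;
run;

/* PROC CONTENTS should show no stored sort information for WORK.A:
   the rows are in order, but no sort has been recorded in the header. */
proc contents data=work.a;
run;

/* The match-merge depends only on the actual order of the rows,
   so it is expected to run without a preceding PROC SORT. */
data work.both;
  merge work.a work.b;
  by id;
run;

/* With no sort information stored, PROC SORT has nothing to tell it
   the rows are already in ID order, so it is expected to go ahead. */
proc sort data=work.a;
  by id;
run;
```

After the final step, PROC CONTENTS on WORK.A should list ID as the sort
variable, which is the "sorted" state in the sense used above.
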
Of course, we can always lie to SAS and flag a data set as sorted, or
flag it as not sorted even if it is sorted. For that matter, we can flag
the data set as being sorted by VAR1, and by VAR2 within each level of VAR1,
even if the data is sorted by VAR3 without respect to VAR1 or VAR2, and SAS
will not execute a PROC SORT ; BY VAR1 VAR2 ; RUN ;.
So,
1. SAS really has no way of knowing if the data is in order without
performing a sort, regardless of what the manual says.
2. SAS will check to see if the header CLAIMS that the data has been
ordered. If the claim is yes, SAS will not run PROC SORT.
3. Even if the data is in order - even if it was put in order by PROC SORT -
SAS will attempt the sort if the SORTEDBY option claims that it was not put
in the correct order by SAS.
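
Here is a rough sketch of point 2, assuming a hypothetical data set WORK.DEMO
with variables VAR1, VAR2 and VAR3. It uses the SORTEDBY= data set option on
the MODIFY statement in PROC DATASETS to record a claim that is not true.
What PROC SORT then does is what the SAS 8 behavior described in this thread
suggests; other releases may treat a user-supplied claim differently, so
check the log rather than trusting the comments:

```sas
/* Hypothetical data set, deliberately NOT in VAR1 VAR2 order. */
data work.demo;
  input var1 var2 var3;
  datalines;
2 1 1
1 2 2
1 1 3
;
run;

/* Lie to SAS: record VAR1 VAR2 as the sort order in the header. */
proc datasets library=work nolist;
  modify demo (sortedby=var1 var2);
quit;

/* The header now CLAIMS the requested order, so by point 2 this
   step is expected to be skipped with a note in the log. */
proc sort data=work.demo;
  by var1 var2;
run;

/* If the sort was skipped, the rows print in their original,
   out-of-order sequence. */
proc print data=work.demo;
run;
```

A later MERGE or other BY-group step on WORK.DEMO would still find the rows
out of order, which is why the flag cannot stand in for the actual order.
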
Ed
Edward Heaton, Senior Systems Analyst,
Westat (An Employee-Owned Research Corporation),
1550 Research Boulevard, Room 2018, Rockville, MD 20850-3195
Voice: (301) 610-4818 Fax: (301) 294-3992
mailto:EdwardHeaton@westat.com http://www.westat.com
-----Original Message-----
From: Michael Gibson [mailto:michael_gibson@NOSPAM.STORTEK.COM]
Sent: Friday, October 12, 2001 4:18 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Merge w/o sort first
Let's RTFM again: PROC SORT will not sort the data set if it is already
sorted.
So I suggest ALWAYS putting a PROC SORT before doing a merge.
Just my 2 cents
Michael Gibson
----------From the SAS 8.1 documentation on the proc sort procedure -------
Stored Sort Information
PROC SORT records the BY variables, collating sequence, and character set
that it uses to sort the data set. This information is stored with the data
set to help avoid unnecessary sorts.
Before PROC SORT sorts a data set, it checks the stored sort information. If
you try to sort a data set the way that it is currently sorted, PROC SORT
does not perform the sort and writes a message to the log to that effect. To
override this behavior, use the FORCE option. If you try to sort a data set
the way that it is currently sorted and you specify an OUT= data set, PROC
SORT simply makes a copy of the DATA= data set.
To override the sort information that PROC SORT stores, use the _NULL_ value
with the SORTEDBY= data set option. For information about SORTEDBY=, see the
section on data set options in SAS Language Reference: Dictionary.
If you want to change the sort information for an existing data set, use the
SORTEDBY= data set option in the MODIFY statement in the DATASETS procedure.
To access the sort information that is stored with a data set, use the
CONTENTS statement in PROC DATASETS. For details, see
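
The options named in this excerpt can be sketched as follows, assuming a
hypothetical data set WORK.DEMO with a BY variable ID. The comments describe
what the excerpt leads one to expect; the log messages themselves are not
reproduced because their wording may differ between releases:

```sas
/* Hypothetical data set, out of order by ID. */
data work.demo;
  input id x;
  datalines;
3 30
1 10
2 20
;
run;

/* First sort: does the work and stores the sort information. */
proc sort data=work.demo;
  by id;
run;

/* Same BY list again: per the excerpt, PROC SORT checks the stored
   sort information and does not perform the sort. */
proc sort data=work.demo;
  by id;
run;

/* FORCE overrides that check and sorts regardless. */
proc sort data=work.demo force;
  by id;
run;

/* Clear the stored sort information with SORTEDBY=_NULL_. */
proc datasets library=work nolist;
  modify demo (sortedby=_null_);
quit;

/* Inspect what is now stored with the data set. */
proc datasets library=work nolist;
  contents data=demo;
quit;
```
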
"wei cheng" <cheng_wei@HOTMAIL.COM> wrote in message
news:LAW2-F27NeOckKacClH0000373e@hotmail.com...
> Thanks for all the answers. Especially Ian Whitlock's answer. I think he
> can read my mind. :) I know what to do now. At least, I corrected my false
> understanding that "must be sorted" means "marked sorted" (Thanks to
> Karsten M. Self). Where did I get that understanding though? :)
>
> Learning from SAS-L everyday!
>
> Wei Cheng
> =================================================================
> http://www.geocities.com/prochelp
> INTERNET and Web Resources for SAS Programmers and Statisticians
> =================================================================
>
>
>
> >From: Ian Whitlock <WHITLOI1@WESTAT.com>
> >To: 'wei cheng' <cheng_wei@HOTMAIL.COM>, SAS-L@LISTSERV.UGA.EDU
> >Subject: RE: Merge w/o sort first
> >Date: Wed, 10 Oct 2001 15:21:26 -0400
> >
> >Wei,
> >
> >The phrase "must be sorted" does not mean PROC SORT must be used. It only
> >means the observations must be in order, because SAS checks to ensure that
> >they are in order, because the correctness of the merge process depends on
> >the order.
> >
> >Now for the question should you always use PROC SORT before a merge. I
> >think it depends on
> >
> > 1) your knowledge of the data
> > 2) the consequences of being wrong
> > 3) the cost of an unneeded sort
> >
> >If I create the data I usually know what order it is in and do not use PROC
> >SORT unless needed. If the data comes from someone else, I usually ask or
> >test. If I find it in order then I assume it will be in order when I run.
> >Typically the cost for being wrong is less than a minute and I can easily
> >afford it. If I were writing programs that would result in my being hauled
> >out of bed at 3:00AM when a merge failed, I would change my policy with
> >respect to data created by a source other than myself. If the cost of
> >doing an extra sort was several hundred dollars or hours of run time, I
> >might change that policy again.
> >
> >I suggest you explain the situation to your colleague. If she is satisfied
> >with the consequences of her choice, then fine. If you are not satisfied
> >with the consequences then change your policy. If you are managing the
> >colleague and not satisfied with her choice then you can add your own
> >consequences and ask again. Just remember that your decisions and requests
> >also have consequences.
> >
> >IanWhitlock@westat.com
> >
> >-----Original Message-----
> >From: wei cheng [mailto:cheng_wei@HOTMAIL.COM]
> >Sent: Wednesday, October 10, 2001 1:46 PM
> >To: SAS-L@LISTSERV.UGA.EDU
> >Subject: Re: Merge w/o sort first
> >
> >
> >Hi there,
> >
> >Thanks for all the thoughts. But let's RTFM:
> >
> >SAS OnlineDoc, V8 SAS Language Reference: under MERGE statement
> >
> >Match-Merging
> >
> >Match-merging combines observations from two or more SAS data sets into a
> >single observation in a new data set according to the values of a common
> >variable. The number of observations in the new data set is the sum of the
> >largest number of observations in each BY group in all data sets. To
> >perform a match-merge, use a BY statement immediately after the MERGE
> >statement. The variables in the BY statement must be common to all data
> >sets. Only one BY statement can accompany each MERGE statement in a DATA
> >step. ----- (Read here) ----The data sets that are listed in the MERGE
> >statement must be sorted in order of the values of the variables that are
> >listed in the BY statement, or they must have an appropriate index. (Snip)
> >
> >Let's forget the index here (suppose the data set has no index). Does the
> >"sorted in order of the values... " mean you don't need the data set to be
> >marked sorted (Karsten M. Self: dataset has been ordered by a SORT, other
> >proc output, or a dataset with BY processing, and is so marked: marked
> >sorted.)? Of course I won't sort the data set again if it is marked sorted
> >for the BY variables before the MERGE.
> >
> >From all the answers, it seems that if the data set is already in order
> >without being marked sorted (Karsten M. Self: naturally collated:
> >collated.), the MERGE BY will work fine. Then what should I tell my
> >colleague about what she should do? Let her do as she always did and omit
> >the SORT step? She is a junior level SAS programmer, but she said: "Since
> >SAS runs correctly, why should we sort it if it does not have a sorted
> >mark but is collated?"
> >
> >Thanks again for your comments.
> >
> >Wei Cheng
> >
> >
> >
> > >From: "Lambert, Bob" <Bob_Lambert@AFCC.COM>
> > >Reply-To: "Lambert, Bob" <Bob_Lambert@AFCC.COM>
> > >To: SAS-L@LISTSERV.UGA.EDU
> > >Subject: Re: Merge w/o sort first
> > >Date: Wed, 10 Oct 2001 12:06:50 -0500
> > >
> > >Tom Mendicino wrote:
> > >
> > ><snip>
> > > > One of the main tasks of a programmer is to
> > > > try and anticipate "land mines" and develop routines which avoid them.
> > ><snip>
> > >
> > >"Main tasks" are determined by the programmer's manager. Typically, these
> > >are, e.g., "Produce reports as required". Especially in SAS, the "how" is
> > >left to the programmer. The programmer under discussion here probably
> > >lacks your programming knowledge and background and is being successful
> > >with her skillset and is meeting the "minimum job standards". Without
> > >changes in her current environment (Skinner, not Gestalt), no changes in
> > >her (programming) behavior are expected. The responsibility for
> > >environmental changes belongs to her manager. Unfortunately, many a
> > >manager of SAS programmers has little or no experience with SAS and hasn't
> > >a clue as to what needs to be changed. As long as reports are being
> > >produced, everything is fine.
> > >
> > >So your statement is correct from a programmer's standpoint -- but perhaps
> > >not from a manager's.
> > >
> > >One more thing -- Somebody once told me, "Every system is perfectly
> > >designed for its output."
> > >
> > >hth
> > >
> > >Bob Lambert
> >
> >