Date: Wed, 6 Sep 2000 14:16:00 -0400
Reply-To: Lee Medoff <lmedoff@VANTAGETRAVEL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Lee Medoff <lmedoff@VANTAGETRAVEL.COM>
Content-Type: text/plain; charset="iso-8859-1"
Dear Ian, Paul, Gerhard, et al.,
Thanks for all of your help. I've come a long way with this project because
of all of the assistance you've offered. While it's not quite done, it's
almost there. (Famous last words.)
Please accept my mea culpa for airing my difficulties over email. I'm a
statistician by training, not a programmer, and yet I'm finding that I
nevertheless need to know more and more about data management: for
reporting as well as for the construction of data sets. Not knowing the
data management capabilities that SAS has to offer, I've been learning as I
go. Let me tell you, wading through volume after volume of impenetrable
SAS manuals can bring about many a late night at the office. Anyhow, I did
eventually (and haphazardly) come to discover that PROC TRANSPOSE would be
ideally suited for handling this task.
So Doug's question is relevant for me, too. I'm off to NESUG in a few
weeks, hopefully to learn a thing or two about programming in general. Can
anyone offer other suggestions, in the way of resources, for learning SAS in
conjunction with programming skills?
Much thanks,
Lee Medoff
-----Original Message-----
From: Ian Whitlock [mailto:WHITLOI1@WESTAT.com]
Sent: Wednesday, September 06, 2000 10:52 AM
To: 'Lee Medoff'; SAS-L@LISTSERV.UGA.EDU
Cc: 'paul_dorfman@HOTMAIL.COM'; 'mikeh@AMGEN.COM';
'doug_zirbel@HOTMAIL.COM'
Subject: RE: Debugging question
Subject: Debugging question
Summary: Is it a debugging or an analysis question?
Respondent: Ian Whitlock <whitloi1@westat.com>
Last week Lee Medoff [lmedoff@VANTAGETRAVEL.COM] asked how to
correct a macro that he had almost working. Paul Dorfman
[paul_dorfman@HOTMAIL.COM] responded with a direct answer - use step
boundaries. Unfortunately this cheated me out of understanding the
problem. In reading the macro, I had the feeling that something
simple was happening, but I couldn't see what it was.
Later Mike Harris [mikeh@AMGEN.COM] gave a rather cogent answer to
Doug Zirbel's [doug_zirbel@HOTMAIL.COM] question - What do
well-rounded SAS programmers need to know??
MH> Doug poses a question that can be answered on many different
MH> levels. I am expected to provide programming support in a
MH> variety of languages including SAS, Java, Java Script, HTML,
MH> bourne and C shell script, and Visual Basic, among others. The
MH> higher level question is "what should every professional
MH> programmer know?" From my perspective, a solid knowledge of
MH> general programming concepts, systems analysis, and software
MH> engineering are just as important as knowing constructs and
MH> techniques in a particular language. For all but the most
MH> trivial programming job, knowing how to collect clear,
MH> testable requirements is the single skill that is most likely
MH> to help you build the right thing. The next steps are knowing
MH> how to document the design of your program, then comes the
MH> actual coding. In my experience, many SAS programmers are
MH> clueless about the application of a documented software
MH> development life cycle to SAS programming. Without that, it is
MH> impossible to build valid SAS systems.
DZ> Have you noticed big knowledge-gaps in programmers who are no longer SAS
DZ> beginners, i.e., those who have been doing SAS for a year or more?
DZ>
DZ> What are those techniques that every professional SAS programmer should
DZ> master that they generally don't?
DZ>
DZ> I suspect that I have many such gaps myself. The more responses, the
DZ> better. It could be an eye-opener to many of us.
I think the original problem posed by Lee gives a good illustration
to the above remarks. First, in his words, the problem:
LM> I am attempting to do the following with this macro:
LM>
LM> (1) take an initial data set (backend.jobs) and, for each unique "job"
LM> in the data set, create a subset (for each of x number of "programs",
LM> within the job) of another pre-existing set of data (for each "job")
LM> (2) The final data step in the macro attempts to merge ALL programs
LM> within a given job, into a final data set, which contains
LM> summary information;
LM>
LM> All of this is done within two nested do loops. As of now, it's not quite
LM> working. Anyone care to take a stab at helping me debug this?
LM> I'd most appreciate any help.
LM>
LM> %macro d;
LM> %let no=0;
LM> data _null_;
LM> set backend.jobs;
LM> call symput ("no",_n_);
LM> call symput ("j"||compress(put(_n_,7.)),job);
LM> %do i=1 %to &no;
LM> %let num=0;
LM> data _null_;
LM> set backend.&&j&i;
LM> call symput("num",_n_);
LM> call symput("k"||compress(put(_n_,3.)),program);
LM> run;
LM> %do p=1 %to &num;
LM> data backend.&&k&p;
LM> set backend.&&j&i;
LM> if program="&&k&p";
LM> data backend.&&k&p (drop=job1 job2 year circ cost profit);
LM> set backend.&&k&p;
LM> data backend.&&k&p (rename=(pax=&&k&p.._Pax ));
LM> set backend.&&k&p;
LM> data backend.&&k&p (drop=pax program);
LM> set backend.&&k&p;
LM> data backend.&&j&i..m;
LM> merge backend.&&j&i..m backend.&&k&p;
LM> by keycode;
LM> run;
LM> %end;
LM> %end;
LM> %mend d;
LM> %d;
To understand the problem I constructed the following data as work
data sets:
data jobs ;
input job $ ;
cards ;
joba
jobb
;
data joba ;
input program $ pax keycode ;
cards ;
a1 1 1
a2 2 1
a1 3 2
;
data jobam ;
input keycode ;
cards ;
1
2
3
;
data jobb ;
input program $ pax keycode ;
cards ;
b1 4 3
b2 5 4
b2 6 5
;
data jobbm ;
input keycode ;
cards ;
2
3
4
;
I then modified the macro D to fit these data sets, fix mistakes, and
remove extraneous steps.
%macro d;
%let no=0;
data _null_;
set jobs end = eof ;
if eof then call symput ("no",_n_);
call symput ("j"||compress(put(_n_,7.)),compress(job));
run ;
%do i=1 %to &no;
%let num=0;
data _null_;
set &&j&i end = eof ;
if eof then call symput("num",_n_);
call symput("k"||compress(put(_n_,3.)),compress(program));
run;
%do p=1 %to &num;
data &&k&p (rename=(pax=&&k&p.._Pax )
drop = program) ;
*(drop=job1 job2 year circ cost profit);
set &&j&i;
if program="&&k&p";
run ;
data &&j&i..m;
merge &&j&i..m &&k&p;
by keycode;
run;
%end;
%end;
%mend d;
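The RUN statements in the macro above are the "step boundaries" of Paul's answer. A value assigned by CALL SYMPUT only exists once the DATA step has actually executed, and inside a macro the step does not execute until a boundary such as RUN is generated. Without one, %DO I=1 %TO &NO is evaluated while NO still holds its initial 0, so the loop body never runs. The following is a minimal sketch of that timing; the macro name TIMING_DEMO and the variable N are hypothetical and used only for illustration.

```sas
/* Hypothetical demo of when a CALL SYMPUT value becomes visible. */
%macro timing_demo ;
   %let n = 0 ;                      /* initial value, as with NO and NUM above */
   data _null_ ;
      call symput ( "n" , "3" ) ;    /* assigned only when the step executes */
   /* No RUN yet: the step has been submitted but has not executed, */
   /* so the macro processor still sees the initial value.          */
   %put Before the step boundary: n=&n ;
   run ;
   /* The RUN above forced the step to execute before this line.    */
   %put After the step boundary: n=&n ;
%mend timing_demo ;

%timing_demo
```

The first %PUT should report n=0 and the second n=3, which is the same effect that leaves the outer loop in the original macro with nothing to do.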
After running the macro, it became clear that this was just a
series of transposes. Let's look at a macro to handle one job.
%macro job ( data = joba , keydata = jobam , out = tjobam ) ;
/* add column name as a function of PROGRAM */
data temp ;
set &data ;
_name_ = compress(program) || "_pax" ;
run ;
/* transpose the data */
proc transpose data = temp out = ttemp ( drop = _name_ ) ;
by keycode ;
var pax ;
id _name_ ;
run ;
/* merge with key data */
data &out ;
merge &keydata ttemp ;
by keycode ;
run ;
%mend job ;
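As a usage sketch, the macro can be called for one job and the result printed. This assumes the work data sets JOBA and JOBAM and the macro %JOB defined above already exist in the session, and that JOBA is sorted by KEYCODE, as the test data is. The expected rows in the comment follow from that test data only.

```sas
/* Assumes JOBA, JOBAM, and %JOB are already defined as shown above. */
%job ( data = joba , keydata = jobam , out = tjobam )

proc print data = tjobam noobs ;
run ;

/* Expected rows for the test data (one PAX column per program):
     keycode   a1_pax   a2_pax
        1         1        2
        2         3        .
        3         .        .
   Keycode 3 is in JOBAM but not in JOBA, so both columns are missing. */
```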
In this form, even I can understand the problem. Now, the solution
is to generate the correct macro calls.
data _null_ ;
set jobs ;
call execute ( '%job(data = work.' || job
|| ", keydata = " || compress(job) || "m"
|| ", out = t" || compress(job) || "m )" ) ;
run ;
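Before handing generated text to CALL EXECUTE, it can help to write the same text to the log and read it. The sketch below builds the identical string in a variable and PUTs it instead of executing it; the variable name STMT and its length are assumptions made for illustration, and COMPRESS is applied to JOB so the printed text has no embedded blanks.

```sas
/* Preview the macro calls that CALL EXECUTE would generate. */
data _null_ ;
   set jobs ;
   length stmt $ 200 ;               /* hypothetical holding variable */
   /* Single quotes keep %JOB from being invoked while this step compiles. */
   stmt = '%job(data = work.' || compress(job)
          || ", keydata = " || compress(job) || "m"
          || ", out = t" || compress(job) || "m )" ;
   put stmt ;
run ;

/* Expected log lines for the test JOBS data:
   %job(data = work.joba, keydata = jobam, out = tjobam )
   %job(data = work.jobb, keydata = jobbm, out = tjobbm )
*/
```

One point to keep in mind with this design: CALL EXECUTE resolves macro statements immediately, while the SAS steps the macro generates are held until the calling DATA step ends. That is harmless here because %JOB contains no macro logic that depends on values produced by its own steps.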
I agree with Mike that general analysis is very important, but as the
above problem illustrates, the analysis should be tempered with a
knowledge of the SAS tools available. No amount of C, Basic, and
Java; or general systems analysis, is going to lead one to say, "Oh
it's a transpose problem with CALL EXECUTE to generate the macro
invocations."
I am reminded of a story by Gerald Weinberg. In the early days of
computing when one took anyone off the street to make them a
programmer, the big question was - how do you spot ability when you
cannot depend on knowledge? Gerald's favorite answer: Give the
candidate ten shoe boxes and forty pages of paper with random words
written on them. Ask him to file away the pages in the shoe boxes.
Then a week later invite him back and time how long it takes him to
locate a given word. The shortest time shows the best chances of
finding a good programmer.
Ian Whitlock