LISTSERV at the University of Georgia
Date:         Thu, 20 May 2010 10:40:37 -0600
Reply-To:     Alan Churchill <alan.churchill@SAVIAN.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Alan Churchill <alan.churchill@SAVIAN.NET>
Subject:      Re: Tools to visualize dataset dependencies?
Content-Type: text/plain; charset="us-ascii"

Cool idea. Hard as hell to do.

I started work on it a few years ago and realized both the complexity and the sheer uselessness of it. SAS programs tend to be linear, so all of the processes were coming out as columns of one step after another. There is also no consistent way to output a SAS dataset from one proc to the next.

This is an exercise in parsing and the SAS language is very, very difficult to parse. You lay out a simple example but it doesn't look that way in the real world.

The log, btw, is a better place to tackle this issue IMO since the parsing has already occurred.
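As a rough illustration of the log-based approach, here is a minimal Python sketch. It assumes the log contains the usual "observations read from the data set", "The data set ... has ... observations" and "... used (Total process time)" NOTE lines, each on a single line; real logs wrap lines and vary by procedure, so the patterns and the function name edges_from_log are illustrative assumptions rather than a complete solution.

```python
import re

# Assumed NOTE formats; real logs vary and may wrap long lines.
READ_RE = re.compile(r"observations read from the data set (\w+\.\w+)")
WRITE_RE = re.compile(r"NOTE: The data set (\w+\.\w+) has \d+ observations")
STEP_END_RE = re.compile(r"NOTE: (DATA statement|PROCEDURE (\w+)) used")


def edges_from_log(log_text):
    """Return (source, target, label) tuples, one per input/output pair per step."""
    edges = []
    inputs, outputs = [], []
    for line in log_text.splitlines():
        m = READ_RE.search(line)
        if m:
            inputs.append(m.group(1).lower())
            continue
        m = WRITE_RE.search(line)
        if m:
            outputs.append(m.group(1).lower())
            continue
        m = STEP_END_RE.search(line)
        if m:
            # The "used" NOTE closes the step, so pair up what was collected.
            if m.group(2):
                label = "proc " + m.group(2).lower()
            else:
                label = "data step"
            edges.extend((src, dst, label) for src in inputs for dst in outputs)
            inputs, outputs = [], []
    return edges


SAMPLE_LOG = """\
NOTE: There were 10 observations read from the data set WORK.A.
NOTE: The data set WORK.B has 10 observations and 3 variables.
NOTE: DATA statement used (Total process time):
NOTE: There were 10 observations read from the data set WORK.B.
NOTE: The data set WORK.C has 1 observations and 5 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
"""

for src, dst, label in edges_from_log(SAMPLE_LOG):
    print(f'{src} -> {dst} [label="{label}"]')
```

Because SAS has already resolved macros and options by the time it writes the log, this sidesteps most of the parsing problem, at the cost of needing a complete run of the program first.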

Alan

Alan Churchill
Savian
Work: 719-687-5954
Cell: 719-310-4870

-----Original Message-----
From: W. Matthew Wilson [mailto:matt@TPLUS1.COM]
Sent: Thursday, May 20, 2010 8:32 AM
Subject: Tools to visualize dataset dependencies?

I inherited some REALLY long SAS programs that use lots and lots of data steps and I'm having a hard time keeping it all in my brain.

I'm a big fan of dot (http://graphviz.org) and I would like to use it to graph the dependencies. Has anyone done anything like this?

For example, I want to translate the SAS code below:

    data b;
        set a;
        /* skip lots of variable assignments here */
    run;

    proc summary data=b;
        /* skip various options here */
        output out=c;
    run;

    data e;
        merge c d;
    run;

Into something like this dot syntax:

    digraph G {
        a -> b [label="data step"];
        b -> c [label="proc summary"];
        c -> e [label="data step"];
        d -> e [label="data step"];
    };
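For the simple example above, the translation can be sketched in a few lines of Python. This is a deliberately naive sketch: it only understands a one-dataset data statement followed by set or merge, and a proc with data= on the proc statement and out= on a later statement. It assumes no macros, dataset options, or multiple output datasets, so it will not cope with real-world code. The function names extract_edges and to_dot are made up for illustration.

```python
import re


def extract_edges(source):
    """Naively pull (input, output, label) edges out of very simple SAS code."""
    # Drop /* ... */ comments, then split into statements on semicolons.
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.S)
    statements = [s.strip() for s in source.split(";") if s.strip()]
    edges = []
    target = None  # output dataset of the current data step
    proc = None    # (proc name, input dataset) of the current proc
    for stmt in statements:
        words = stmt.lower().split()
        if words[0] == "data" and len(words) == 2:
            target, proc = words[1], None
        elif words[0] in ("set", "merge") and target:
            edges += [(src, target, "data step") for src in words[1:]]
        elif words[0] == "proc" and len(words) > 1:
            m = re.search(r"\bdata\s*=\s*(\w+)", stmt, flags=re.I)
            proc = (words[1], m.group(1).lower()) if m else None
            target = None
        elif proc:
            m = re.search(r"\bout\s*=\s*(\w+)", stmt, flags=re.I)
            if m:
                edges.append((proc[1], m.group(1).lower(), "proc " + proc[0]))
        if words[0] == "run":
            target = proc = None
    return edges


def to_dot(edges):
    """Render the edges as a dot digraph."""
    lines = [f'    {src} -> {dst} [label="{label}"];' for src, dst, label in edges]
    return "digraph G {\n" + "\n".join(lines) + "\n}"


SAS_CODE = """
data b; set a; /* skip lots of variable assignments here */ run;
proc summary data=b; /* skip various options here */ output out=c; run;
data e; merge c d; run;
"""

print(to_dot(extract_edges(SAS_CODE)))
```

Saving the printed text to a file and running it through dot (for example with -Tpng) should give a picture along the lines of the one described here.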

And then dot will make a purty picture, like this one: http://scratch.tplus1.com/scratch.png

When I look at that picture, it is obvious to me that the two input datasets that must already exist for this code are a and d. That fact is NOT obvious when I read the code, especially since I really have > 50 intermediate data steps in this program and at least a dozen prerequisite datasets.
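Once the edges are known, the prerequisite datasets can also be listed without looking at the picture: they are the datasets that appear as a source but never as a target. A minimal Python sketch, with the edge list typed in by hand from the example (the function name prerequisite_datasets is made up for illustration):

```python
def prerequisite_datasets(edges):
    """Datasets that are read somewhere but never written by any step."""
    sources = {src for src, dst in edges}
    targets = {dst for src, dst in edges}
    return sorted(sources - targets)


# Edges from the example: a -> b, b -> c, c -> e, d -> e
edges = [("a", "b"), ("b", "c"), ("c", "e"), ("d", "e")]
print(prerequisite_datasets(edges))  # ['a', 'd']
```

A dataset that is read and then overwritten under the same name by a later step would be missed by this simple set difference, so treat the result as a starting point.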

Is there already a tool to visualize dependencies like this? Does anyone have any other ideas for how to attack this problem?

Thanks in advance.

-- W. Matthew Wilson http://tplus1.com

