|
I have a REALLY BIG file (14GB). I am doing this on the UNIX side (with a
graphical environment, of course). It's essentially patient diagnosis data
on 14 billion people (this is from the NIS over 6 years). In this file there
are 15 variables that list diagnoses, labelled DX1-DX15. For the
analysis I am doing I need to create dummy variables for certain diagnoses.
For instance, if I want to index everyone who had a heart attack I would
search each DX variable for the code "410" (text) and if found then assign a
dummy code of MI=1. I actually have several of these.
Originally, I was trying to do this by arraying all 15 DX variables.
Unfortunately, this caused the file to grow to 40GB and SAS would throw me a
pop-up window to tell me that it was out of resources and ask me if I wanted
to Retry, Tell SAS No More Resources, or Terminate SAS.
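Roughly, the array approach for the heart attack example looks like the sketch below. It is a minimal illustration, not the exact code that was run: it assumes DX1-DX15 are character variables, and the output name tmp.mi_sketch, the array name dxs, and the loop index i are placeholders.

```sas
/* Sketch only: flag anyone with a "410" code in any of DX1-DX15. */
data tmp.mi_sketch;              /* hypothetical output data set name */
  set nis.dx2;
  array dxs{15} $ dx1-dx15;      /* assumes DX1-DX15 are character variables */
  mi = 0;
  do i = 1 to 15;
    /* compare the first three characters of each diagnosis code */
    if substr(dxs{i}, 1, 3) = "410" then mi = 1;
  end;
  drop i;                        /* the loop index is not needed in the output */
run;
```

This reads the file once and checks all 15 diagnosis variables per record, instead of rewriting the data set once per DX variable.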
So, then I decided just to do this one DX at a time by macro-fying the code.
Well...same thing is happening.
Anyone have any experience with this? It is totally hamstringing my project....
Anyone use SAS UNIX on big ole heifer files like this?
Thanks, Jen
Here's my code, for what it's worth:
libname tmp "/tmp/nis"; *temporary folder with 180GB space;
libname nis "/home/prj/nis"; *my project folder, has 50GB space;
data tmp.dx3; set nis.dx2; run; *copy the original to preserve;
%macro dx(dx);
  /* Flags are only ever set to 1, then zeroed if still unset, so a hit
     from an earlier DX pass is not overwritten by a later one. */
  data tmp.dx3;
    set tmp.dx3;
    if substr(&dx,1,5) = "74685" then coronary=1;
    if substr(&dx,1,3) in ("630", "631", "632", "633", "634", "635", "636",
      "637", "638", "639", "640", "641", "642", "643", "644", "645", "646",
      "647", "648", "650", "651", "652", "653", "654", "655", "656", "657",
      "658", "659", "670") then pg=1;
    if substr(&dx,1,2) in ("72", "73", "74", "75") then pg=1;
    if substr(&dx,1,3) = "V27" then pg=1;
    if substr(&dx,1,4) = "3051" then smk=1;
    if substr(&dx,1,5) = "V1582" then smk=1;
    if substr(&dx,1,4) in ("6483", "3050", "3052", "3053", "3055", "3056",
      "3057") then s_abuse=1;
    if substr(&dx,1,4) in ("7100", "6954") then lupus=1;
    if substr(&dx,1,4) = "4461" then kawasaki=1;
    if substr(&dx,1,4) in ("2724", "2723", "2722", "2721", "2720") then
      hyperlipid=1;
    /* Zero any flag not set by this pass or an earlier one. */
    if coronary ne 1 then coronary=0;
    if pg ne 1 then pg=0;
    if smk ne 1 then smk=0;
    if s_abuse ne 1 then s_abuse=0;
    if lupus ne 1 then lupus=0;
    if kawasaki ne 1 then kawasaki=0;
    if hyperlipid ne 1 then hyperlipid=0;
    drop &dx;
  run;
  proc freq data=tmp.dx3;
    tables coronary pg smk s_abuse lupus kawasaki hyperlipid / missing;
  run;
%mend;
%dx(dx1); %dx(dx2); %dx(dx3); %dx(dx4); %dx(dx5); %dx(dx6); %dx(dx7);
%dx(dx8); %dx(dx9); %dx(dx10); %dx(dx11); %dx(dx12); %dx(dx13); %dx(dx14);
%dx(dx15);
*Now, put the file back into my project folder;
data nis.dx3; set tmp.dx3;
run;
|