|
Roy, thank you so much!!!!
The thing is that diag codes are not in v_medical_service. That is why I first had to join out.Main_diag to this table just to get ICD_DIAG_GRP_KEY of those 20 diagnoses (as in terms of dw, diagnoses codes and ICD_DIAG_GRP_KEY are different and there are a bunch of ICD_DIAG_GRP_Key-s).
Actually I even had to generated Main_diag (I named it so although it did not contain real diagnoses) based on V_ICD_DIAG_GRP table (that really had diagnoses) .....before all other steps in order to get icd_diag_grp_key. Sorry for this confusion.
Taking into account this fact and what you suggested, would the following code look reasonable?
**************************************************************************
PROC SQL;
CREATE TABLE out.ppl_with_diag AS
(SELECT s.icd_diag_grp_key
,s.clm_mem_key
,c.received_DT
FROM DW.V_ICD_DIAG_GRP t
INNER JOIN DW.V_MEDICAL_SERVICE S ON S.ICD_DIAG_GRP_KEY=t.ICD_DIAG_GRP_KEY
INNER JOIN DW.V_MEDICAL_CLAIM C ON c.clm_mem_key=S.clm_mem_key
WHERE (C.RECEIVED_DT GE '01Jan2006'd ) And ( C.RECEIVED_DT LE '31Dec2006'd ) AND
substr(t.icd_diag1_cd,1,3)="595" Or substr(t.icd_diag1_cd,1,3)="599" OR substr(t.icd_diag1_cd,1,4)="5950" OR
substr(t.icd_diag1_cd,1,4)="5952" OR substr(t.icd_diag1_cd,1,4)="5953" OR substr(t.icd_diag1_cd,1,4)="5958" OR
substr(t.icd_diag1_cd,1,4)="5959" OR substr(t.icd_diag1_cd,1,4)="5990" OR substr(t.icd_diag1_cd,1,5)="59589" OR substr(t.icd_diag1_cd,1,5)="59651"
OR substr(t.icd_diag1_cd,1,5)="59654" ) AND t.icd_DIAG1_DESC ne "DW Created" OR
(substr(t.icd_diag2_cd,1,3)="595" Or substr(t.icd_diag2_cd,1,3)="599" OR substr(t.icd_diag2_cd,1,4)="5950" OR
substr(t.icd_diag2_cd,1,4)="5952" OR substr(t.icd_diag2_cd,1,4)="5953" OR substr(t.icd_diag2_cd,1,4)="5958" OR
substr(t.icd_diag2_cd,1,4)="5959" OR substr(t.icd_diag2_cd,1,4)="5990" OR substr(t.icd_diag2_cd,1,5)="59589" OR substr(t.icd_diag2_cd,1,5)="59651"
OR substr(t.icd_diag2_cd,1,5)="59654" ) AND t.icd_DIAG2_DESC ne "DW Created" OR
(substr(t.icd_diag3_cd,1,3)="595" Or substr(t.icd_diag3_cd,1,3)="599" OR substr(t.icd_diag3_cd,1,4)="5950" OR
substr(t.icd_diag3_cd,1,4)="5952" OR substr(t.icd_diag3_cd,1,4)="5953" OR substr(t.icd_diag3_cd,1,4)="5958" OR
substr(t.icd_diag3_cd,1,4)="5959" OR substr(t.icd_diag3_cd,1,4)="5990" OR substr(t.icd_diag3_cd,1,5)="59589" OR substr(t.icd_diag3_cd,1,5)="59651"
OR substr(t.icd_diag3_cd,1,5)="59654" ) AND t.icd_DIAG3_DESC ne "DW Created"
);
after that the second part that you suggested:
create table women_with_diag as
select d.*
, mem_num
, mem_age label = "Member age on &sysdate"
from ppl_with_diag as d INNER JOIN
dw.v_claim_member_profile as m
on d.mem_prof_key = m.mem_prof_key
where m.mem_gender = "F" AND
m.mem_age between 19 and 64
Thank you once again!!!!
Irin
"Pardee, Roy" <pardee.r@GHC.ORG> wrote:
Hey Irin,
In general, you want to try to:
- write queries that don't cross the sas/db2 boundary, and
- start with your most restrictive criteria first, so
as to keep the number of records you have to drag
from step to step to a minimum.
It looks like you want to know how many women between the ages of 19 and
64 had any of a list of 20 diagnoses on claims received in 2006. The
diag codes are in a db table (or view?) called v_medical_service, the
claims are in v_medical_claim, and member demographics are in
v_member_profile.
My guess would be that the most restrictive part of your query will be
the date received and dx code requirements, so I'd start with a join
between v_medical_service and v_medical_claim, like so:
create table ppl_with_diag as
select s.icd_diag_grp_key
, s.clm_mem_key
, s.mem_prof_key
, c.received_dt
from v_medical_service as s INNER JOIN
v_medical_claim as c
on s.clm_mem_key = c.clm_mem_key
where s.icd_diag_grp_key in ('', '',
'', ...) AND
c.received_dt between "01Jan2006:00:00:00"dt and
"31Dec2006:00:00:00"dt
;
Note that you'll need to take your 20 dx codes from your sas dataset and
hand-write them into that in () list (you could also do a 3-way JOIN &
add your table of codes in, but I'd keep it simple if I were you).
I also changed your date constants to datetimes, on the theory that DB2
(like the rdbms' I'm familiar with) doesn't actually have date fields,
only datetimes. You may need to change that back if I'm wrong about
that.
SAS *should* be able to just hand that off to DB2 for processing and
catch the results in a dataset. That's going to involve much less i/o
and so be much faster than a query where SAS has to drag one or both
entire tables over the boundary into temp datasets & process the query
itself.
Now you need to figure out how many of these people are women of the
right age. For that we join your sas table ppl_with_diag to the db2
table to v_claim_member_profile. So we're crossing the sas/db2
boundary, but hopefully just for a few thousand records.
create table women_with_diag as
select d.*
, mem_num
, mem_age label = "Member age on &sysdate"
from ppl_with_diag as d INNER JOIN
dw.v_claim_member_profile as m
on d.mem_prof_key = m.mem_prof_key
where m.mem_gender = "F" AND
m.mem_age between 19 and 64
;
At the end of that, you should have a table of mem_nums you can do a
count(distinct mem_num) on & get the count you want.
HTH,
-Roy
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Irin later
Sent: Tuesday, August 14, 2007 11:38 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Long run against data warehouse (Will the Alternative work?)
I have a table out.Main_diag with about 20 diagnoses. All I need is to
know how many members have those diagnoses. For this purpose I generated
a table of diagnoses and joined it to DW.V_MEDICAL_SERVICE (in data
warehouse) to get mem_prof_key
LIBNAME DW DB2 UID=***
PROC SQL;
CREATE TABLE out.medSrvComb AS
(SELECT Distinct m.icd_diag_grp_key,m.clm_mem_key, m.mem_prof_key
FROM out.Main_diag g INNER JOIN DW.V_MEDICAL_SERVICE m ON
m.ICD_DIAG_GRP_KEY=g.ICD_DIAG_GRP_KEY
);
RUN;
I got 2,008,177 rows and 3 columns
Last step is to join it to DW.V_CLAIM_MEMBER_PROFILE table &
medsrvclaimsComb to apply demographic and date limitations. When I tried
to do this joins through PROC SQL CREATE TABLE, similar to above ,IT
terminated my job due to long 2-days run (which probably means
ineffective code).
My question is:
Would it be more effective if I try to do ALTER statement first to get
fields' stru that I need and then do update?
OR
It will not help me to avoid a problem? How should I modify my code to
resolve it?
Thank you in advance,
Irin
*********************************************************
PROC SQL;
(ALTER TABLE out.medsrvclaimsComb
ADD ( mem_gender char(1),
mem_num varchar(16),
received_dt date(10)
mem_age number (3));
);
quit;
proc sql ;
update out.medsrvclaimsComb
set mem_gender mem_num mem_age = (select mem_gender mem_num mem_age
from DW.V_CLAIM_MEMBER_PROFILE g
where
out.medsrvclaimsComb.mem_prof_key=g.mem_prof_key)
where out.medsrvclaimsComb.mem_prof_key in (select mem_prof_key from
DW.V_CLAIM_MEMBER_PROFILE)
AND mem_gender="F"
AND mem_age <65
AND mem_age >=19;
quit;
proc sql ;
update out.medsrvclaimsComb
set received_dt = (select received_dt from DW.V_MEDICAL_CLAIM g
where out.medsrvclaimsComb.clm_mem_key=g.clm_mem_key)
where out.medsrvclaimsComb.clm_mem_key in (select clm_mem_key from
DW.V_MEDICAL_CLAIM)
AND (RECEIVED_DT GE '01Jan2006'd ) And (RECEIVED_DT LE '31Dec2006'd)
;
quit;
---------------------------------
Luggage? GPS? Comic books?
Check out fitting gifts for grads at Yahoo! Search.
---------------------------------
Boardwalk for $500? In 2007? Ha!
Play Monopoly Here and Now (it's updated for today's economy) at Yahoo! Games.
|