| Date: | Wed, 17 Jun 1998 14:00:44 -0400 |
| Reply-To: | "Dorfman, Paul" <pdorfma@UCS.ATT.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU> |
| From: | "Dorfman, Paul" <pdorfma@UCS.ATT.COM> |
| Subject: | Re: Using 2 datasets at once |
|
| Content-Type: | text/plain |
|---|
Haven Battles <battles@SPRYNET.COM>, in particular, wrote:
>I have 2 datasets, one is basically a health insurance premium chart
>containing the carrier code (carcode), the plan type--hmo vs. ppo,
standard
>vs. preferred-(plancode), age group category-(agegp), family group
>category-(famgp) and the actual premium for each combination of the
>preceeding variables (rateamt). I also have a dataset containing
employer
>survey responses (about their health insurance contribution policy)
>employee and employer demographic variables and employee plan choice
>information. I've already merged the two to obtain the actual premium
for
>each employee. Now, I need to create a variable that reflects the
>difference between the price of the plan that the employee chose and
the
>price of the most expensive plan they could have chosen for their age
and
>family group. I think the way to do this is to create a program that
for
>each employee in the larger analysis file, goes into the 2nd premium
>database and searches for the most expensive 'rateamt' for the
>corresponding age and family group. I have no idea if this is
possible,
>and if so, how it would be done. I would GREATLY appreciate any input
you
>all have. I hope I was clear, but please feel free to ask questions to
>clarify. I can be reached directly at battles@sprynet.com. Thanks so
>much.
Haven,
Although it is possible to search for the most expensive 'rateamt' for
the corresponding age and family group, I'd approach it differently.
First, I'd create a subset of the 2nd premium database containing all
the agegp and famgp pairs with the rateamt corresponding to its maximum
for each agegp-famgp subgroup, and sort the subset by agegp and famgp.
Then I'd simply merge it with the dataset you have already created by
agegp and famgp and for each observation, calculate the difference being
sought.
I'm not aware of the nature of the 2nd premium database. If it can be
accessed from SAS via SQL then the following query:
PROC SQL;
CONNECT TO <database name> <other necessary parms>;
CREATE TABLE SUBSET AS
SELECT * FROM CONNECTION TO <database name>
(
SELECT DISTINCT X.AGEGP, X.FAMGP, X.PREM FROM <table name> X
WHERE X.PREM =
(SELECT MAX(PREM) FROM <table name>
WHERE AGEGP = X.AGEGP AND FAMGP = X.FAMGP)
ORDER BY 1, 2
);
DISCONNECT FROM <database name>;
QUIT;
can create the (already sorted) subset. If the 2nd premium is a SAS
data set the above query will work without the external database stuff :
PROC SQL;
CREATE TABLE SUBSET AS
SELECT DISTINCT X.AGEGP, X.FAMGP, X.PREM FROM <SAS data set name> X
WHERE X.PREM =
(SELECT MAX(PREM) FROM <SAS data set name>
WHERE AGEGP = X.AGEGP AND FAMGP = X.FAMGP)
ORDER BY 1, 2
;
QUIT;
even though in this case, it will be faster and simpler to avoid using
SQL at all:
PROC SORT DATA=BASE2 OUT=SUBSET NODUPKEY;
BY AGEGP FAMGP DESCENDING PREM; RUN;
PROC SORT DATA=SUBSET NODUPKEY; BY AGEGP FAMGP; RUN;
Alternatively, the second PROC SORT can be replaced by
DATA SUBSET; SET SUBSET; BY AGEGP FAMGP; IF FIRST.FAMGP; RUN;
or, better yet, the "IF FIRST.FAMGP;" statement can be used directly in
the forthcoming merge.
Cordially,
Paul.
++++++++++++++++++++++++++++++
Paul M. Dorfman
Citibank UCS Decision Support Systems
Jacksonville, FL
++++++++++++++++++++++++++++++
|