LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (June 1998, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Wed, 17 Jun 1998 14:00:44 -0400
Reply-To:   "Dorfman, Paul" <pdorfma@UCS.ATT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:   "Dorfman, Paul" <pdorfma@UCS.ATT.COM>
Subject:   Re: Using 2 datasets at once
Comments:   To: "battles@SPRYNET.COM" <battles@SPRYNET.COM>
Content-Type:   text/plain

Haven Battles <battles@SPRYNET.COM>, in particular, wrote:

>I have 2 datasets, one is basically a health insurance premium chart >containing the carrier code (carcode), the plan type--hmo vs. ppo, standard >vs. preferred-(plancode), age group category-(agegp), family group >category-(famgp) and the actual premium for each combination of the >preceeding variables (rateamt). I also have a dataset containing employer >survey responses (about their health insurance contribution policy) >employee and employer demographic variables and employee plan choice >information. I've already merged the two to obtain the actual premium for >each employee. Now, I need to create a variable that reflects the >difference between the price of the plan that the employee chose and the >price of the most expensive plan they could have chosen for their age and >family group. I think the way to do this is to create a program that for >each employee in the larger analysis file, goes into the 2nd premium >database and searches for the most expensive 'rateamt' for the >corresponding age and family group. I have no idea if this is possible, >and if so, how it would be done. I would GREATLY appreciate any input you >all have. I hope I was clear, but please feel free to ask questions to >clarify. I can be reached directly at battles@sprynet.com. Thanks so >much.

Haven,

Although it is possible to search for the most expensive 'rateamt' for the corresponding age and family group, I'd approach it differently. First, I'd create a subset of the 2nd premium database containing all the agegp and famgp pairs with the rateamt corresponding to its maximum for each agegp-famgp subgroup, and sort the subset by agegp and famgp. Then I'd simply merge it with the dataset you have already created by agegp and famgp and for each observation, calculate the difference being sought.

I'm not aware of the nature of the 2nd premium database. If it can be accessed from SAS via SQL then the following query:

PROC SQL; CONNECT TO <database name> <other necessary parms>; CREATE TABLE SUBSET AS SELECT * FROM CONNECTION TO <database name> ( SELECT DISTINCT X.AGEGP, X.FAMGP, X.PREM FROM <table name> X WHERE X.PREM = (SELECT MAX(PREM) FROM <table name> WHERE AGEGP = X.AGEGP AND FAMGP = X.FAMGP) ORDER BY 1, 2 ); DISCONNECT FROM <database name>; QUIT;

can create the (already sorted) subset. If the 2nd premium is a SAS data set the above query will work without the external database stuff :

PROC SQL; CREATE TABLE SUBSET AS SELECT DISTINCT X.AGEGP, X.FAMGP, X.PREM FROM <SAS data set name> X WHERE X.PREM = (SELECT MAX(PREM) FROM <SAS data set name> WHERE AGEGP = X.AGEGP AND FAMGP = X.FAMGP) ORDER BY 1, 2 ; QUIT;

even though in this case, it will be faster and simpler to avoid using SQL at all:

PROC SORT DATA=BASE2 OUT=SUBSET NODUPKEY; BY AGEGP FAMGP DESCENDING PREM; RUN; PROC SORT DATA=SUBSET NODUPKEY; BY AGEGP FAMGP; RUN;

Alternatively, the second PROC SORT can be replaced by

DATA SUBSET; SET SUBSET; BY AGEGP FAMGP; IF FIRST.FAMGP; RUN;

or, better yet, the "IF FIRST.FAMGP;" statement can be used directly in the forthcoming merge.

Cordially,

Paul.

++++++++++++++++++++++++++++++ Paul M. Dorfman Citibank UCS Decision Support Systems Jacksonville, FL ++++++++++++++++++++++++++++++


Back to: Top of message | Previous page | Main SAS-L page