Date: Wed, 12 May 1999 23:38:08 -0400
Reply-To: "Paul M. Dorfman" <sashole@EARTHLINK.NET>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: "Paul M. Dorfman" <sashole@EARTHLINK.NET>
Subject: Re: IF...THEN based upon formatted values
Content-Type: text/plain; charset=koi8-r
Dave,
One reason why your code does not work the way you want is that you are trying
to use SAS intrinsics, in particular, a format, in a query passed to DB2
optimizer unfamiliar with these kind of things. Therefore, DB2 parser issues the
syntax error message. Since you are using the Pass-Through, everything within
the parentheses after 'CONNECTION TO DB2' must conform to DB2 query rules.
Using as format to select records is a rational idea. I would use a hash table,
but it does not change the principle: The keys from the driver file (source
number) are loaded into memory (format or hash table) along with the satellite
information (program), the DB2 table is read sequentially, and for each row, a
search in memory is performed looking up for a matching key. If search is
successful, some processing is done based on the satellite information, and the
record is written out to a SAS dataset, otherwise the next row from the DB2
table is read.
Another way of achieving the goal is to reverse the roles of the keys-to-match
and lookup table: Create a DB2 query where all keys from the driver are listed
as operands of the IN ( ) operator, and let DB2 organize a lookup table. In this
case, the driver file is 'read', and the entire DB2 table serves as a lookup
table. If the key in the DB2 table is indexed, DB2 might choose to look it up
for every driver key without performing a full table scan. Whether it is more
efficient or not than the first path, depends on the number of the driver keys,
memory available to DB2, and the way DB2 optimizer sees the world. If the number
of the driver keys is too large, and so the number of bytes in the DB2 query
exceeds its 32767 limit, the driver keys can be passed to DB2 in chunks with the
results concatenated. Just off the top of my head, one of the posters at SUGI 24
gives an example of this kind of approach.
If it is decided to take the first path, I would create a view into DB2, and use
the format (or hash) and do the rest of the necessary processing outside DB2 in
SAS. That _might_ look like:
DATA SRCFMT;
RETAIN FMTNAME '$SRCECD' TYPE 'C';
INFILE SOURCEIN END=END;
INPUT @01 START $8. /*source #*/
@10 LABEL $3.; /*grade */
OUTPUT;
IF END THEN DO;
LABEL = ' ';
HLO = 'O';
END;
RUN;
CONNECT TO DB2 (SSID=DB2P);
CREATE VIEW PROSM AS
SELECT * FROM CONNECTION TO DB2
(SELECT CUST_NBR AS PROSP_NBR
,LIST_SEG_CD AS SOURCE
,CASE GNDR_CD AS GENDER
);
DISCONNECT FROM DB2;
QUIT;
DATA PROSM.PROSM;
SET PROSM;
PROGRAM = PUT(SOURCE,$SRCECD.);
IF PROGRAM > ' ';
IF GENDER NE 'M' THEN GENDER = 'O';
RUN;
Frankly, I have found some parts of your request a bit hazy (in particular,
variable names not exactly matching what you are calling them in the text), but
hope that my guesses have been more or less in the ballpark.
Kind regards,
================
Paul M. Dorfman
Jacksonville, FL
================
David Eppler <david.eppler@SARALEEDIRECT.COM>, in part, wrote:
I have 2 files, the first contains prospects. This is on a DB2 table
containing @18 million records today. I expect this table to max out
around 50 million eventually. Each prospect has a source number
(there are @900 sources).
In a separate file each source number has been assigned a "program".
(Actually for testing, only some of the "A" sources/programs are in
this file and everyone else is assumed to be a "B" program, but
eventually this file will have "A", "B", "C" programs with everyone
else assumed to be a "D" program).
I can get the desired results by merging the prospect file and the
program file by source, but with this many records, I don't want to
pass the file any more than I have to.
BTW... Later in my processing I need to perform certain actions based
upon what PROGRAM a prospect is in, not what their source# is.
Any advice either on fixing my current code, support for a separate
merge step or ways to perform actions based upon the "formatted" value
of a field would be greatly appreciated.
Thanks,
David Eppler
Below is my code when trying to use the format in the select step.
5 DATA SRCFMT;
6 INFILE SOURCEIN;
7 IF _N_ = 1 THEN FMTNAME = '$SRCECD';
8 TYPE = 'C';
9 RETAIN FMTNAME TYPE;
10 INPUT @01 START $8. /*source #*/
11 @10 LABEL $3.; /*grade */
12
NOTE: The infile SOURCEIN is:
Dsname=SDDEV.UEPPLD.PVTLIB(OHPJMSIN),
Unit=3390,Volume=TST032,Disp=SHR,Blksize=27930,
Lrecl=133,Recfm=FB
NOTE: 22 records were read from the infile SOURCEIN.
NOTE: The data set WORK.SRCFMT has 22 observations and 4 variables.
NOTE: The DATA statement used 0.16 CPU seconds and 2947K.
13 PROC FORMAT CNTLIN=SRCFMT;
NOTE: Format $SRCECD has been output.
14
NOTE: The PROCEDURE FORMAT used 0.05 CPU seconds and 3188K.
15 PROC SQL FEEDBACK STIMER NOERRORSTOP;
NOTE: The SQL Statement used 0.02 CPU seconds and 3367K.
16 CONNECT TO DB2 (SSID=DB2P);
NOTE: The SQL Statement used 0.05 CPU seconds and 3460K.
17 CREATE TABLE PROSM.PROSM
19 AS
20 SELECT * FROM CONNECTION TO DB2
21 ( SELECT CUST_NBR AS PROSP_NBR
22 ,LIST_SEG_CD FORMAT=$SRCECD AS SOURCE
23 ,CASE GNDR_CD
24 WHEN 'M' THEN 'M'
25 ELSE 'O'
26 END AS GENDER
.
.
.
36 ORDER BY PROSP_NBR);
NOTE: The SQL Statement used 0.06 CPU seconds and 3751K.
ERROR: A DB2 error has occurred while processing this statement.
-104 DSNT408I SQLCODE = -104, ERROR: ILLEGAL SYMBOL "=". SOME
SYMBOLS THAT MIGHT BE LEGAL ARE: , FROM INTO
*********************************************************************
When I tried to define the column I got this...
16 CONNECT TO DB2 (SSID=DB2P);
NOTE: The SQL Statement used 0.05 CPU seconds and 3460K.
17 CREATE TABLE PROSM.PROSM
18 (SOURCE CHAR(3))
19 AS
20 SELECT * FROM CONNECTION TO DB2
21 ( SELECT CUST_NBR AS PROSP_NBR
22 ,LIST_SEG_CD FORMAT=$SRCECD AS SOURCE
.
.
.
36 ORDER BY PROSP_NBR);
ERROR: SAS data set options for the table being created may not
include column definitions.