Date: Thu, 10 Jan 2002 23:14:11 -0000
Reply-To: "Vyverman, Koen" <koen.vyverman@FID-INTL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Vyverman, Koen" <koen.vyverman@FID-INTL.COM>
Subject: Re: MS Excel column widths
Content-Type: text/plain; charset="iso-8859-1"
LS,
Michael Kemp (original post included below the .sig) asks an inte-
resting question. Paraphrasing: when using DDE to read data into SAS
from an Excel worksheet, how to optimally define the lengths of SAS
character variables? In other words, is there a way to make Excel
cough up the maximal length of string-type data in a given column,
and thereby evading the creation of SAS variables that are either
too short, or unnecessarily long with respect to the incoming data.
Whether Excel does some internal book-keeping to keep track of such
things, I know not. But the answer to Michael's question is luckily
affirmative. As is my wont, I have included below a self-contained
slab of code demonstrating a possible solution. All this runs fine
on my WinNT4.0(SP6) box, with SAS8.2 and Office97. Copy to the pro-
gram editor, submit step-by-step, and observe the results ...
A few words about the technical bits are in order however. The core
of the solution as presented below relies on the usage of a beast
known as an Excel Formula Array (don't ask me...). In the present
case, the needed formula array is as follows:
{=max(len(a1:a1000))}
Simply typing this expression into a cell on the worksheet accom-
plishes nothing. The proper way to enter the above Formula Array is
as follows. Type
=max(len(a1:a1000))
into a cell. Do not hit the return or enter-key to confirm your en-
try, but press CONTROL-SHIFT-RETURN. That's the way it works ...
In order to get SAS/DDE to perform such acrobatics, we must resort
to the use of the Excel macro function SEND.KEYS, which makes the
solution less elegant, as this requires the Excel application to
be in the foreground. On the other hand, it is the only way I know
of to perform a CONTROL-SHIFT-RETURN, so it'll have to do. A further
complication is that parentheses cannot be included as such in the
argument of SEND.KEYS, but need to be masked by pairs of braces.
Which is the reason why the first SEND.KEYS below looks perhaps
slightly garbled. The second SEND.KEYS instruction is equivalent to
pressing CONTROL-SHIFT-RETURN and activates the Formula Array ...
%* Launch an instance of the Excel application, and define a DDE *;
%* system doublet fileref SAS2XL. *;
options noxsync noxwait xmin;
filename sas2xl dde 'excel|system';
data _null_;
length fid rc start stop time 8;
fid=fopen('sas2xl','s');
if (fid le 0) then do;
rc=system('start excel');
start=datetime();
stop=start+10;
do while (fid le 0);
fid=fopen('sas2xl','s');
time=datetime();
if (time ge stop) then fid=1;
end;
end;
rc=fclose(fid);
run;
%* Create a SAS data set STRINGY_DATA with 1000 observations and *;
%* a single variable STRING which gets populated with a random *;
%* number (maximally 1500) of uppercase A-Z characters. We will *;
%* dump this data set into Excel to generate a testing workbook. *;
data stringy_data(drop=i j k);
length
string $ 1500
i j k 8;
do i=1 to 1000;
string='';
k=1500*ranuni(0);
do j=1 to k;
string=trim(left(string))||byte(int(65+26*ranuni(0)));
end;
output;
end;
run;
%* Before we proceed, we compute the maximum length of STRING in *;
%* SAS, and write the resulting number to the log so as to be *;
%* able to compare with the Excel outcome later on. Note that *;
%* since we used RANUNI 1000 times to pick STRING lengths up to *;
%* 1500, it should come as no surprise that the resulting length *;
%* tends to get pretty close to 1500. *;
proc sql noprint;
select
trim(left(put(max(length(string)),4.)))
into :
maxlen
from
stringy_data
;
quit;
%put ===> SAS says: Max length is &maxlen;
%* Close the default workbook, create a new one containing a re- *;
%* gular worksheet (by default named SHEET1), toggle the Excel *;
%* error checking off, save somewhere to fixate the filename. *;
data _null_;
file sas2xl;
put '[file.close(false)]';
put '[new(1)]';
put '[error(false)]';
put '[save.as("c:\temp\X Find Max Col Len")]';
run;
%* Define a DDE triplet-style fileref pointing to 1000 cells in *;
%* the first column of SHEET1. *;
filename recrange dde
'excel|[x find max col len.xls]sheet1!r1c1:r1000c1' notab;
%* Dump the contents of STRINGY_DATA there. Note the increased *;
%* LRECL to accommodate for the long SAS variable. Clear the *;
%* fileref. *;
data _null_;
set stringy_data;
file recrange lrecl=1500;
put
string;
run;
filename recrange clear;
%* Save the workbook, insert an Excel macro-sheet, named MACRO1 *;
%* by default. *;
data _null_;
file sas2xl;
put '[save]';
put '[workbook.insert(3)]';
run;
%* Define a DDE triplet-style fileref pointing to 10 cells in the *;
%* first column of MACRO1. *;
filename xlmacro dde 'excel|macro1!r1c1:r10c1' notab lrecl=200;
%* We write an old-style Excel macro there, and subsequently run *;
%* it. The macro brings the Excel application to the foreground, *;
%* an unfortunate necessity for using the SEND.KEYS function, ac- *;
%* tivates our data sheet SHEET1, and employs the SEND.KEYS func- *;
%* tion to enter an Excel formula array into the R1C2 cells, *;
%* which evaluates to the maximum length of the first 1000 cells *;
%* in the first column. Observe that the formula array requires *;
%* the use of _named_ cell references A1:A1000. It will not work *;
%* with row-column style ranges like R1C1:R1000C1. *;
data _null_;
file xlmacro;
put '=app.activate("microsoft excel - x find max col len.xls")';
put '=workbook.activate("sheet1")';
put '=select("r1c2")';
put '=send.keys("=max{(}len{(}a1:a1000{)}{)}")';
put '=send.keys("^+{return}")';
put '=halt(true)';
put '!dde_flush';
file sas2xl;
put '[run("macro1!r1c1")]';
run;
%* Clear the fileref after running the Excel macro. *;
filename xlmacro clear;
%* Delete the MACRO1 sheet, save the workbook. *;
data _null_;
file sas2xl;
put '[workbook.delete("macro1")]';
put '[save]';
run;
%* The maximum length as perceived by Excel now sits in R1C2 of *;
%* the worksheet SHEET1. A simple reading operation stores it in *;
%* a SAS macro variable XLMAX. *;
filename xlmax dde "excel|sheet1!r1c2" lrecl=200;
data _null_;
length
xlmax 8;
infile xlmax;
input xlmax;
call symput('xlmax',trim(left(put(xlmax,4.))));
run;
filename xlmax clear;
%* And lo ... *;
%put ===> XL says: Max length is &xlmax;
I hope this will help? As a matter of fact, one of the things on my
to-do-in-case-of-insane-amounts-of-time-on-hands-list is writing a
SAS macro that, given the physical path of an Excel workbook, would
generate SAS data sets similar to the VTABLE and VCOLUMN SAS dictio-
nary tables, holding such information as: names of sheets on the
workbook, type of sheet, rows and columns used for each sheet, column
labels (or whatever sits in the first row), maximal value length for
each column, ... I'm sure this kind of meta-data would be useful
enough for those who attempt to optimize and automate the reading in
of Excel data, be it via DDE or by means of proc import.
Kind Regards,
Koen.
---------------------------------
Koen Vyverman
Database Marketing Manager
Fidelity Investments - Luxembourg
---------------------------------
> -----Original Message-----
> From: Kemp, Michael [mailto:Michael.Kemp@GS.COM]
> Sent: Thursday, January 10, 2002 17:25
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: MS Excel column widths
>
>
> I have often wondered how that "Autofit" command (in Excel) works
> internally. I mean, I can see that it could work, if once issued,
> the application, runs down the column, finds the value with the
> greatest length, and then adjusts the column width accordingly.
> However, it seems to work immediately upon execution, regardless
> of the number of rows. Obviously, then, no such processing is
> taking place. If this is true, then there must be some sort of
> internal record being kept about the length of values whenever a
> value is added. I believe that a record of the length of the
> longest value is internally stored, and whenever a new value is
> added, it compares the length of the new value to the existing
> value (which represents the length of the PREVIOUSLY longest
> value). If the length of the new cell value is larger than that
> already existing value, the existing value is replaced by the new.
> If not, no action is taken.
> Then, when the Autofit command is initiated, it simply refers to
> this internal data table, makes reference to this value and then
> adjusts the column width accordingly.
> Assuming, again, that all of this is true, is there a way to
> access this piece of data. I am loading Excel data into SAS, and
> it would be very helpful to know the length of the longest value
> in a given column. This is obviously not a SAS question, but since
> I think many of you use SAS in the same fashion as this, I thought
> you might have more insight into this issue.
> Thanking you in advance.
>
> Mike Kemp
> Goldman Sachs
> 100 Second Ave. South, Suite 200N
> St. Petersburg, FL 33701
> Phone 727-825-3830
> Fax 727-825-3834