**Date:** Mon, 27 Jan 2003 15:28:43 -0800
**Reply-To:** Mark Terjeson <mark.terjeson@NWCSR.COM>
**Sender:** "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
**From:** Mark Terjeson <mark.terjeson@NWCSR.COM>
**Subject:** Re: Proc Format comparison differences
**In-Reply-To:** <9B501B3774931C469BCCCC021BE537223D4358@remailnt2-re01.westat.com>
**Content-Type:** text/plain; charset="us-ascii"
Hi Ian,

>>To be explicit - how is the format handled that
>>it provides a discrepancy with everything else?

While I do think you'll agree with me that the
precision boundary is definitely where the problem
is triggered, I do now see your question.

What we do know is that Xgt50 Xeq50 Xlt50
and XgtY XeqY XltY are calling the same GT/EQ/LT
routine, most likely using the same logarithm
tables for the internal floating point conversions,
and indeed they yield the same results. I see your
point that the comparison operation in the proc format
yields a slightly different result than the GT/EQ/LT
in the datastep.

We know that with floating point numbers, anything
beyond the 15th digit is going to be random digits due
to the internal floating point conversions to and from
the internal representation (via an internal logarithm
table). In playing with the 15th, 16th and 17th digits,
such as ...

proc format ;
value test
low - < 50.0 = "small"
other = "big"
;
run;

data _null_;
x = 49.999999999999990; put '90 ' x=32.28 x=test.;
x = 49.999999999999991; put '91 ' x=32.28 x=test.;
x = 49.999999999999992; put '92 ' x=32.28 x=test.;
x = 49.999999999999993; put '93 ' x=32.28 x=test.;
x = 49.999999999999994; put '94 ' x=32.28 x=test.;
x = 49.999999999999995; put '95 ' x=32.28 x=test.;
x = 49.999999999999996; put '96 ' x=32.28 x=test.;
x = 49.999999999999997; put '97 ' x=32.28 x=test.;
x = 49.999999999999998; put '98 ' x=32.28 x=test.;
x = 49.999999999999999; put '99 ' x=32.28 x=test.;
run;

...we see above that, with a 9 as the 16th digit,
subdividing by one more digit still classifies as
small up through 6 and fails at 7. Taking this one
step further and subdividing the tenths between that
6 and 7, we get the fine-tuning point at 65...

data _null_;
x = 49.9999999999999960; put '960 ' x=32.28 x=test.;
x = 49.9999999999999961; put '961 ' x=32.28 x=test.;
x = 49.9999999999999962; put '962 ' x=32.28 x=test.;
x = 49.9999999999999963; put '963 ' x=32.28 x=test.;
x = 49.9999999999999964; put '964 ' x=32.28 x=test.;
x = 49.9999999999999965; put '965 ' x=32.28 x=test.;
x = 49.9999999999999966; put '966 ' x=32.28 x=test.;
x = 49.9999999999999967; put '967 ' x=32.28 x=test.;
x = 49.9999999999999968; put '968 ' x=32.28 x=test.;
x = 49.9999999999999969; put '969 ' x=32.28 x=test.;
run;

...granted, we know that digits 16, 17, 18+ are gonna be
spurious and we can't plan what they are. Is it
coincidental that at some point (as we've seen) the
turning point toggles on a 5? I agree with you that
the question is still 'live': "Is the proc format
comparison logic slightly different and incorporating
a rounding somewhere, or is there some other difference?"
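The toggle on a 5 can be checked outside SAS. The sketch below is Python, not SAS, and rests on an assumption: that SAS on the PC stores numbers as IEEE 754 doubles and converts decimal literals by rounding to the nearest double, as Python's float() does. Between 32 and 64, adjacent doubles are 2**-47 (about 7.1E-15) apart, so each literal lands either on the largest double below 50 or on 50 itself, whichever is nearer. The switch happens at their midpoint, 50 - 2**-48 = 49.99999999999999644..., which falls between ...964 and ...965. The names ULP, BELOW_50, MIDPOINT and classify are illustrative.

```python
from fractions import Fraction

# Assumption: SAS on the PC stores numbers as IEEE 754 doubles and converts
# decimal literals with round-to-nearest, as Python's float() does.
ULP = 2.0 ** -47               # spacing of adjacent doubles in [32, 64)
BELOW_50 = 50.0 - ULP          # largest double smaller than 50 (hex 4048FFFFFFFFFFFF)

# Exact midpoint between BELOW_50 and 50.0: literals above it round up to 50.
MIDPOINT = (Fraction(BELOW_50) + 50) / 2
assert MIDPOINT == 50 - Fraction(1, 2 ** 48)

def classify(literal: str) -> str:
    """Mimic 'low - < 50 = small / other = big' with a plain < test."""
    return "small" if float(literal) < 50.0 else "big"

PREFIX = "49." + "9" * 14      # the fourteen 9s shared by the test values above
for tail in ("6", "7", "64", "65"):
    literal = PREFIX + tail
    above_mid = Fraction(literal) > MIDPOINT
    print(literal, classify(literal), "above midpoint" if above_mid else "below midpoint")
```

Under that assumption the 5 is not a coincidence: ...964 is the last tested literal below the halfway point, so it rounds down to the double just under 50, while ...965 is past it and rounds up to 50 exactly.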

I saw something similar years ago when I was porting
languages across platforms: the floating point routines
running internally on Motorola and Intel CPUs for PC
and Unix OSs would match, but the new (at the time) RISC
for IBM's AIX OS yielded different results (at the
precision boundary), and I had to custom-patch the
language so that the comparison routines would match
across all platforms. The culprit was that the RISC/AIX
platform had a *different* internal logarithm table, so
the floating point conversion would yield subtle
differences way down in the mantissa.

While I doubt that these routines, being on the same
hardware and OS, would be using different log tables,
your finding that the proc format routine handles
comparisons of floating point numbers at or beyond the
precision boundary differently sure seems right.

Most problems are "we know the *what*, we just need
to find out the *why*." I think it's humorous that in this
case "we know the *why*, we need to find out the *what*."

Good find!,
Mark

-----Original Message-----
From: Ian Whitlock [mailto:WHITLOI1@WESTAT.com]
Sent: Monday, January 27, 2003 2:00 PM
To: 'Mark Terjeson'; SAS-L@LISTSERV.UGA.EDU
Subject: RE: Numeric format ranges query

Mark,

I do not think you understand my problem. So let me try again.
Consider the following.

i=15 goal=4049000000000000
x=4049000000000001 x=small Xgt50=1 Xeq50=0 Xlt50=0
y=4049000000000000 y=big XgtY=1 XeqY=0 XltY=0

The program to create the report is shown below.

In decimal GOAL is 50; it is shown in HEX16. to show how it is stored on
the PC.

Now look at X, also shown in HEX16. as stored. How does it compare with 50?
You can see that they differ only in the last bit. What about computer
comparison? Assume for the moment that the names involving comparisons were
created as indicated and are truth values. Hence X > 50 is true, and
consistently not(X = 50) and not(X < 50).

Now look at Y. We can see it is stored the same way as 50. And
correspondingly X > Y and not(X = Y) and not(X < Y). All as expected.

Finally let's look at the formatted values X=SMALL and Y=BIG. The format
was written to make anything less than 50 small and otherwise big. Hence Y
is classified correctly. But what happened to X? It is classified as SMALL.
That is the inconsistency that I wish to have explained.

Now I perfectly well understand that the creation of X may lead to a number
a little bit smaller or bigger than 50. That doesn't bother me. As you say,
it is a question of handling arithmetic beyond the precision of the machine.
What bothers me is that the comparison tests agree with what I think are
the facts, while the comparison reported by the format is at odds with them.
Can you explain that by referring to the format processing? To be explicit:
how is the format handled that it provides a discrepancy with everything
else?

Here is the program to reproduce the report.

%let n = 49 ;
proc format ;
value test
low - < %eval(&n+1) = "small"
other = "big"
;
run ;

data _null_ ;
array q ( 17 ) q1-q17 (
&n..
&n..9
&n..99
&n..999
&n..9999
&n..99999
&n..999999
&n..9999999
&n..99999999
&n..999999999
&n..9999999999
&n..99999999999
&n..999999999999
&n..9999999999999
&n..99999999999999
&n..999999999999999
&n..9999999999999999
) ;

goal = %eval(&n+1) ;
x = &n ;
do i = 1 to 17 ;
x = x + 9/10**i ;
diff = goal - x ;
p = 9/10**i ;
if i = 15 then
do ;
put i= goal= hex16. ;
Xgt50 = ( x > 50 ) ;
Xeq50 = ( x = 50 ) ;
Xlt50 = ( x < 50 ) ;
put x=hex16. x= test. xgt50= xeq50= xlt50= ;
y = q[i] + p ;
XgtY = ( x > y ) ;
XeqY = ( x = y ) ;
XltY = ( x < y ) ;
put y=hex16. y= test. xgty= xeqy= xlty= ;
end ;
end ;
run ;
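The bit patterns in the report above can be decoded directly. The Python sketch below assumes that HEX16. prints the big-endian IEEE 754 bit pattern of the double, which is what the report suggests; from_hex16 is a hypothetical helper name. It also recomputes Y the way the program does at i=15, from the literal in q15 (fourteen 9s) plus p = 9/10**15, and uses ordinary comparisons only.

```python
import math
import struct

def from_hex16(pattern: str) -> float:
    """Decode a 16-hex-digit IEEE 754 double, as printed by the HEX16. format."""
    return struct.unpack(">d", bytes.fromhex(pattern))[0]

goal = from_hex16("4049000000000000")
x = from_hex16("4049000000000001")

# X is the next double above 50: the two differ only in the last bit.
assert goal == 50.0
assert x == math.nextafter(50.0, math.inf)
print("x - 50 =", x - 50.0)            # 2**-47, about 7.1E-15

# Y as built in the program at i=15: q15 (fourteen 9s) plus p = 9/10**15.
y = float("49." + "9" * 14) + 9 / 10**15
print("Xgt50", int(x > 50), "Xeq50", int(x == 50), "Xlt50", int(x < 50))
print("XgtY", int(x > y), "XeqY", int(x == y), "XltY", int(x < y))
```

If the assumption holds, every plain comparison agrees with the reading above: X sits one bit above 50, Y is exactly 50, and only the formatted value disagrees.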

IanWhitlock@westat.com

-----Original Message-----
From: Mark Terjeson [mailto:mark.terjeson@NWCSR.COM]
Sent: Monday, January 27, 2003 3:18 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Numeric format ranges query

Hi Ian,

Good ol' precision. (so you probably remember now)

Many systems only have 11 to 13 digits of precision
and SAS has 15 digits of precision. So if you ignore
the decimal point you only get 15 valid digits.
Anything after 15 digits is either random garbage or
usually truncated.

For STORAGE (or actual value) of a number -- remember to
ignore the decimal point, this is the key. DISPLAYING
that number is a different thing: it means displaying a
bunch of characters.

Your typical floating point number is going to store
15 good digits plus knowledge of the position where the
decimal point is supposed to go.
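On the PC, the "position where the decimal point is supposed to go" is really a power of two: an IEEE 754 double holds a 53-bit binary significand plus a binary exponent, and 15 is the number of decimal digits guaranteed to survive a round trip through that format. A short Python sketch, assuming Python floats are IEEE 754 doubles as they are on mainstream platforms, shows the split with math.frexp:

```python
import math
import sys

# Assumption: Python floats are IEEE 754 doubles, like SAS numerics on the PC.
print(sys.float_info.mant_dig)   # 53 binary digits in the significand
print(sys.float_info.dig)        # 15 decimal digits always survive a round trip

# frexp splits a value into significand m (0.5 <= m < 1) and exponent e,
# so that value == m * 2**e.
for value in (50.0, 9.0e-15, 9.0e-18):
    m, e = math.frexp(value)
    print(value, "=", m, "* 2 **", e)
```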

When you have just a single digit of 9, but you move it
15 or more places to the right, such as:

data _null_;
x = 9.0E-15;
format x 30.20;
put x=;
run;

data _null_;
x = 9.0E-18; * more than 15 ;
format x 30.20;
put x=;
run;

The single digit of 9 shows up fine (even past 15 digits)
because the format routine is merely concatenating
leading "0" characters plus a period symbol somewhere in
the string. So what you see is not a numeric value but a
character string of a single digit 9 plus the zeros and
period that the routine added. The counting of 15 digits
of precision starts at the 9, and you have room for 14
more good digits, such as:

data _null_;
x = 91234567890.0E-25;
format x 32.30;
put x=;
run;

Once you "add" that single digit of 9.0E-15 to any
integer (a digit to the left of the decimal point)
such as:

data _null_;
x = 9.0E-18;
x = x + 1;
format x 30.20;
put x=;
run;

... you *now* have to start counting at the integer,
which means those 15 good digits are the "1" plus the
next 14 "0" digits (remember to ignore the decimal
point), and you find that your 9 digit is in the 16th
position and is going to go poof (get truncated).
The moral of the story is that "nothing is wrong".

(optical illusion by the format routine! har har)

Hope this is helpful,
Mark Terjeson
Northwest Crime and Social Research, Inc.
A SAS Alliance Partner
215 Legion Way SW
Olympia, WA 98501
360.870.2581 - voice,cell
360.570.7533 - fax
mailto:mark.terjeson@nwcsr.com
www.nwcsr.com

"Nothing is particularly hard
if you divide it into small jobs."
- Henry Ford, Industrialist

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Ian
Whitlock
Sent: Monday, January 27, 2003 11:20 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Numeric format ranges query

Although Simon has lost interest in his question, it does have an
interesting form, in which I am stuck. How good is a SAS format at
specifying a "Dedekind" cut of computer decimals? The following report is
the result of running the program below. Each line increments X (originally
49) by adding another 9 to the decimal part of X. As the amount added gets
smaller and smaller there is a point at which it no longer matters: the
number doesn't change. SMALL and BIG come from a formatted value of X.
Note that in the last 3 lines X is slightly bigger than 50, but the format
reports it as being smaller. Why is the value of X classified as small when
its difference from 50 is negative? (I don't see the question as making a
practical difference, but it still bothers me.)

p=9.0E-01 x=49.90000000000000000 x=4048F33333333333 small diff=0.1
p=9.0E-02 x=49.99000000000000000 x=4048FEB851EB851F small diff=0.01
p=9.0E-03 x=49.99900000000000000 x=4048FFDF3B645A1D small diff=0.001
p=9.0E-04 x=49.99990000000000000 x=4048FFFCB923A29D small diff=0.0001
p=9.0E-05 x=49.99999000000000000 x=4048FFFFAC1D29DD small diff=1E-5
p=9.0E-06 x=49.99999900000000000 x=4048FFFFF79C8430 small diff=1E-6
p=9.0E-07 x=49.99999990000000000 x=4048FFFFFF29406C small diff=9.9999994E-8
p=9.0E-08 x=49.99999999000000000 x=4048FFFFFFEA8672 small diff=9.9999937E-9
p=9.0E-09 x=49.99999999900000000 x=4048FFFFFFFDDA3F small diff=9.999965E-10
p=9.0E-10 x=49.99999999990000000 x=4048FFFFFFFFC907 small diff=9.999468E-11
p=9.0E-11 x=49.99999999999000000 x=4048FFFFFFFFFA81 small diff=9.997336E-12
p=9.0E-12 x=49.99999999999900000 x=4048FFFFFFFFFF74 small diff=9.947598E-13
p=9.0E-13 x=49.99999999999990000 x=4048FFFFFFFFFFF3 small diff=9.237056E-14
p=9.0E-14 x=50.00000000000000000 x=4049000000000000 big diff=0
p=9.0E-15 x=50.00000000000000000 x=4049000000000001 small diff=-7.10543E-15
p=9.0E-16 x=50.00000000000000000 x=4049000000000001 small diff=-7.10543E-15
p=9.0E-17 x=50.00000000000000000 x=4049000000000001 small diff=-7.10543E-15
======
x=50.00000000000000000 x=4049000000000000 big

Here is the program with parameter N to allow investigation at other points
on the line. Numbers of the form (2**i)-1 (i=1,2,3,...) form an interesting
series.

%let n = 49 ;
proc format ;
value test
low - < %eval(&n+1) = "small"
other = "big"
;
run ;

data _null_ ;
goal = %eval(&n+1) ;
x = &n ;
do i = 1 to 17 ;
x = x + 9/10**i ;
diff = goal - x ;
p = 9/10**i ;
put p= e7. x= 30.17 +1 x=hex16. +1 x test. +1 diff = ;
end ;
x = %eval(&n+1) ;
put "======" ;
put x= 30.17 +1 x=hex16. +1 x test. ;
run ;

If one adds

check = ( goal > x ) ;
put check= ;

after the calculation of X, one can see that the value of CHECK is
faithful to the situation, so why isn't the format? Unfortunately, either
my knowledge of computer arithmetic or my knowledge of the internal workings
of formats is too weak to provide an explanation.

Perhaps it indicates a small inconsistency in the handling of fuzzing
between the calculation of CHECK and the way formatted values are
calculated.
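The report, and the CHECK test suggested above, can be reproduced outside SAS. The Python sketch below assumes SAS on the PC performs ordinary IEEE 754 double arithmetic with round-to-nearest; hex16 is a hypothetical helper imitating HEX16., and since no PROC FORMAT is involved, the check column is simply goal > x, playing the part of CHECK rather than of the format. Under that assumption the HEX16. column should come out as in the report, and check turns to 0 from i=14 onward, once X reaches and then passes 50.

```python
import struct

def hex16(value: float) -> str:
    """Big-endian IEEE 754 bit pattern, as the HEX16. format shows it on a PC."""
    return struct.pack(">d", value).hex().upper()

n = 49
goal = float(n + 1)
x = float(n)
trace = {}                            # i -> x after the i-th addition
for i in range(1, 18):
    p = 9 / 10**i                     # exact 10**i, then one rounded division
    x = x + p
    trace[i] = x
    diff = goal - x
    check = int(goal > x)             # same test as the CHECK variable
    print(f"p={p:.1e} x={x:.17f} x={hex16(x)} check={check} diff={diff:.7g}")

print("======")
print(f"x={goal:.17f} x={hex16(goal)}")
```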

IanWhitlock@westat.com

-----Original Message-----
From: Simon Gillow [mailto:Simon.Gillow@BBG.CO.UK]
Sent: Monday, January 27, 2003 6:07 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Numeric format ranges query

I want a format that does what I would expect the following to do:

proc format;
value ltv
low - < 50 = '0-50'
50 > - 55 = '50-55'
55 > - 60 = '55-60'
60 > - 65 = '60-65'
65 > - 70 = '65-70'
70 > - 75 = '70-75'
75 > - 80 = '75-80'
80 > - 85 = '80-85'
85 > - 90 = '85-90'
90 > - 95 = '90-95'
95 > - 100 = '95-100'
100 > - 105 = '100-105'
105 > - 110 = '105-110'
110 > - high = '110+'
other = 'Unknown';
run;

Unfortunately this code doesn't work in SAS 8.2 (Windows NT based).

Basically, if a value is 50 I want it in the '50-55' category,
49.9999999999999 should be in '0-50'
and 50.0000000000000000001 should be in '50-55'. It appears that you
cannot put > before the -, am I right?

Will I have to resort to:
50.000000001 - < 55
55.000000001 -< 60
etc.....

Sorry if this is a really dumb question, but I have searched the web and
the archives of this list and found nothing.....

Thanks, Simon
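The behaviour Simon describes is a set of half-open intervals closed on the left: [50, 55), [55, 60), and so on, with 110 and up in the last bin. In PROC FORMAT the < goes next to the endpoint it excludes, so low -< 50 leaves 50 out, and a range written 50 -< 55 would include 50 and leave out 55. As a language-neutral illustration, the Python sketch below bins values that way with bisect_right; ltv_label, EDGES and LABELS are hypothetical names built from Simon's labels. Note that 50.0000000000000000001 is the same double as 50, so no format can put the two in different bins.

```python
import math
from bisect import bisect_right

# Left-closed edges: [50, 55), [55, 60), ..., [105, 110), then 110 and up.
EDGES = list(range(50, 115, 5))                     # 50, 55, ..., 110
LABELS = ["0-50"] + [f"{e}-{e + 5}" for e in EDGES[:-1]] + ["110+"]

def ltv_label(value):
    """Hypothetical helper: label a value the way Simon describes."""
    if value is None or math.isnan(value):
        return "Unknown"
    # bisect_right counts the edges <= value, so an edge goes to the bin above it.
    return LABELS[bisect_right(EDGES, value)]

for v in (49.9999999999999, 50, 50.0000000000000000001, 54.99, 55, 110, None):
    print(v, ltv_label(v))
```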
