LISTSERV at the University of Georgia
Date:         Mon, 27 Jan 2003 15:28:43 -0800
Reply-To:     Mark Terjeson <mark.terjeson@NWCSR.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Mark Terjeson <mark.terjeson@NWCSR.COM>
Subject:      Re: Proc Format comparison differences
Comments: To: Ian Whitlock <WHITLOI1@WESTAT.com>
In-Reply-To:  <9B501B3774931C469BCCCC021BE537223D4358@remailnt2-re01.westat.com>
Content-Type: text/plain; charset="us-ascii"

Hi Ian,

>> To be explicit - how is the format handled that
>> it provides a discrepancy with everything else?

While I do think you'll agree with me that the precision boundary is definitely where the problem arises, I do now see your question.

The things we do know are that the Xgt50 Xeq50 Xlt50 and Ygt50 Yeq50 Ylt50 are calling the same GT/EQ/LT routine, most likely using the same logarithm tables for the internal floating point conversions, and indeed do yield the same results. I see your point that the comparison operation in the proc format yields a slightly different result than the GT/EQ/LT in the DATA step.

We know that with floating point numbers, anything beyond the 15th digit is going to be random digits due to the internal floating point conversions to and from decimal via the internal logarithm table. In playing with the 15th-16th-17th digits, such as ...

proc format ;
  value test  low - < 50.0 = "small"
              other        = "big" ;
run;

data _null_;
  x = 49.999999999999990; put '90 ' x=32.28 x=test.;
  x = 49.999999999999991; put '91 ' x=32.28 x=test.;
  x = 49.999999999999992; put '92 ' x=32.28 x=test.;
  x = 49.999999999999993; put '93 ' x=32.28 x=test.;
  x = 49.999999999999994; put '94 ' x=32.28 x=test.;
  x = 49.999999999999995; put '95 ' x=32.28 x=test.;
  x = 49.999999999999996; put '96 ' x=32.28 x=test.;
  x = 49.999999999999997; put '97 ' x=32.28 x=test.;
  x = 49.999999999999998; put '98 ' x=32.28 x=test.;
  x = 49.999999999999999; put '99 ' x=32.28 x=test.;
run;

...we see above that, with digit 16 set to 9, subdividing it by one more digit works up through 6 and fails at 7. Taking this one step further and subdividing the tenths between that 6 and 7, we get the fine tuning point at 65...

data _null_;
  x = 49.9999999999999960; put '960 ' x=32.28 x=test.;
  x = 49.9999999999999961; put '961 ' x=32.28 x=test.;
  x = 49.9999999999999962; put '962 ' x=32.28 x=test.;
  x = 49.9999999999999963; put '963 ' x=32.28 x=test.;
  x = 49.9999999999999964; put '964 ' x=32.28 x=test.;
  x = 49.9999999999999965; put '965 ' x=32.28 x=test.;
  x = 49.9999999999999966; put '966 ' x=32.28 x=test.;
  x = 49.9999999999999967; put '967 ' x=32.28 x=test.;
  x = 49.9999999999999968; put '968 ' x=32.28 x=test.;
  x = 49.9999999999999969; put '969 ' x=32.28 x=test.;
run;

...granted, we know that digits 16-17-18+ are gonna be spurious and we can't plan what they are. Is it coincidental that at some point (as we've seen) the turning point toggles on a 5? I agree with you that the question is still 'live': "Is the proc format comparison logic slightly different and incorporating a rounding somewhere? Or some other difference?"
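On the question of the 5: a rough sketch of one way to check it, by printing the stored bits instead of the displayed decimals. It assumes IEEE 754 doubles (as on a Windows PC) and that SAS rounds a decimal literal to the nearest double; both are assumptions, not something established in this thread. Under them, adjacent doubles between 32 and 64 are 2**-47 apart, so the literal flips from "just below 50" to "exactly 50" at the midpoint 50 - 2**-48, which is about 3.55E-15 below 50. That would put the switch between ...964 and ...965 without any decimal rounding on a 5.

```sas
data _null_;
  /* Spacing of adjacent doubles in [32, 64), and half of it. */
  ulp  = 2**(-47);
  half = 2**(-48);           /* distance from 50 down to the midpoint */
  put ulp= e12. half= e12.;

  /* Literals on either side of the midpoint 50 - 2**-48. */
  a = 49.9999999999999964;   /* 3.6E-15 below 50, farther than half */
  b = 49.9999999999999965;   /* 3.5E-15 below 50, nearer than half  */
  put a= hex16. b= hex16.;

  /* Under the stated assumptions a is the double just below 50  */
  /* (4048FFFFFFFFFFFF) and b is 50 itself (4049000000000000).    */
  a_lt_50 = (a < 50);
  b_eq_50 = (b = 50);
  put a_lt_50= b_eq_50=;
run;
```

If the HEX16. values differ from that on a given platform, then the literal conversion is doing something else and the assumption above does not hold there.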

I did see something similar years ago when I was porting languages across platforms. The floating point routines processed internally on Motorola and Intel CPUs for PC and Unix o/s's would match, but the new (at the time) RISC for IBM's AIX o/s yielded different results (at the precision boundary), and I had to custom patch the language so that the comparison routines would match across all platforms. The culprit was that the RISC/AIX platform had a *different* internal logarithm table, thus the floating point conversion would yield subtle differences way down in the mantissa.

While I doubt that these routines, being on the same hardware and o/s, would be using different log tables, you have identified that the proc format routine for handling comparisons of floating point numbers next to or beyond the precision boundary sure seems to be different indeed.

With most problems, "we know the *what*, we just need to find out the *why*". I think it's humorous that in this case "we know the *why*, we need to find out the *what*".

Good find!
Mark

-----Original Message-----
From: Ian Whitlock [mailto:WHITLOI1@WESTAT.com]
Sent: Monday, January 27, 2003 2:00 PM
To: 'Mark Terjeson'; SAS-L@LISTSERV.UGA.EDU
Subject: RE: Numeric format ranges query

Mark,

I do not think you understand my problem. So let me try again. Consider the following.

i=15 goal=4049000000000000
x=4049000000000001 x=small Xgt50=1 Xeq50=0 Xlt50=0
y=4049000000000000 y=big XgtY=1 XeqY=0 XltY=0

The program to create the report is shown below.

In decimal, goal is 50; it is shown with HEX16. to display it as stored on the PC.

Now look at X also shown HEX16. as stored. How does it compare with 50? You can see that they differ only in the last bit. What about computer comparison? Assume for the moment that the names involving comparisons were created as indicated and are truth values. Hence X > 50 is true, and consistently not(X = 50) and not(X < 50).

Now look at Y. We can see it is stored the same way as 50. And correspondingly X > Y and not(X = Y) and not(X < Y). All as expected.

Finally let's look at the formatted values X=SMALL and Y=BIG. The format was written to make anything less than 50 small and otherwise big. Hence Y is classified correctly. But what happened to X? It is classified as SMALL. That is the inconsistency that I wish to be explained.

Now I perfectly well understand that the creation of X may lead to a number a little bit smaller or bigger than 50. That doesn't bother me. As you say, it is a question of handling arithmetic beyond the precision of the machine. What bothers me is that the comparison tests are consistent with what I think are the facts, yet at odds with the comparison as reported by the format. Can you explain that by referring to the format processing? To be explicit - how is the format handled that it provides a discrepancy with everything else?
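To take the accumulation loop out of the picture, the two stored values can be rebuilt directly from the bit patterns in the report. This is only a sketch: it assumes the HEX16. informat is the exact inverse of the HEX16. format on this platform, and the variable names ulp and gap_is_one_ulp are made up for illustration. It checks that X and 50 really are adjacent doubles, one unit in the last place (2**-47) apart.

```sas
data _null_;
  /* Rebuild both stored values from the bit patterns in the report. */
  goal = input('4049000000000000', hex16.);   /* exactly 50         */
  x    = input('4049000000000001', hex16.);   /* 50 with last bit set */

  diff = goal - x;
  ulp  = 2**(-47);          /* spacing of doubles between 32 and 64 */
  gap_is_one_ulp = (x - goal = ulp);

  put goal= x= 30.17;
  put diff= ulp= gap_is_one_ulp=;
run;
```

A diff of about -7.10543E-15 here would match the value shown for the last three rows of the earlier report.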

Here is the program to reproduce the report.

%let n = 49 ;
proc format ;
  value test  low - < %eval(&n+1) = "small"
              other               = "big" ;
run ;

data _null_ ;
  array q ( 17 ) q1-q17
    ( &n..
      &n..9
      &n..99
      &n..999
      &n..9999
      &n..99999
      &n..999999
      &n..9999999
      &n..99999999
      &n..999999999
      &n..9999999999
      &n..99999999999
      &n..999999999999
      &n..9999999999999
      &n..99999999999999
      &n..999999999999999
      &n..9999999999999999 ) ;

  goal = %eval(&n+1) ;
  x = &n ;
  do i = 1 to 17 ;
    x = x + 9/10**i ;
    diff = goal - x ;
    p = 9/10**i ;
    if i = 15 then do ;
      put i= goal= hex16. ;
      Xgt50 = ( x > 50 ) ;
      Xeq50 = ( x = 50 ) ;
      Xlt50 = ( x < 50 ) ;
      put x=hex16. x= test. xgt50= xeq50= xlt50= ;
      y = q[i] + p ;
      XgtY = ( x > y ) ;
      XeqY = ( x = y ) ;
      XltY = ( x < y ) ;
      put y=hex16. y= test. xgty= xeqy= xlty= ;
    end ;
  end ;
run ;

IanWhitlock@westat.com

-----Original Message-----
From: Mark Terjeson [mailto:mark.terjeson@NWCSR.COM]
Sent: Monday, January 27, 2003 3:18 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Numeric format ranges query

Hi Ian,

Good ol' precision. (so you probably remember now)

Many systems only have 11 to 13 digits of precision and SAS has 15 digits of precision. So if you ignore the decimal point you only get 15 valid digits. Anything after 15 digits is either random garbage or usually truncated.

For STORAGE (or actual value) of a number -- remember to ignore the decimal point, this is the key. DISPLAYING that number is a different thing in displaying a bunch of characters.

Your typical floating point number is going to store 15 good digits plus knowledge of the position where the decimal point is supposed to go.

When you have just a single digit of 9, but you move it 15 or more places to the right, such as:

data _null_;
  x = 9.0E-15;
  format x 30.20;
  put x=;
run;

data _null_;
  x = 9.0E-18; * more than 15 ;
  format x 30.20;
  put x=;
run;

The single digit of 9 shows up good (more than 15 digits) because the format routine is merely concatenating the leading zero string character "0" together plus a period symbol in the string somewhere. So what you see is not a numeric value but a character string of a single digit of 9 plus the routine adding zeros and period. The counting of 15 digits of precision starts at the 9 and you have room for 14 more good digits, such as:

data _null_;
  x = 91234567890.0E-25;
  format x 32.30;
  put x=;
run;

Once you "add" that single digit 9 (here 9.0E-18) to any integer (a digit to the left of the decimal point) such as:

data _null_;
  x = 9.0E-18;
  x = x + 1;
  format x 30.20;
  put x=;
run;

... you *now* have to start counting at the integer, which means those 15 good digits are the "1" plus the next 14 digits of "0" (remember, ignore the decimal point), and you find that your 9 digit falls past the 15th position and is going to go poof (get truncated).
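A small sketch of the "poof", checked by comparison and stored bits instead of by the displayed digits. It assumes IEEE 754 doubles, where the next double after 1 is 1 + 2**-52 (about 2.2E-16), so an addend as small as 9.0E-18 cannot change the sum. The variable names tiny, same and eps are only illustrative.

```sas
data _null_;
  tiny = 9.0E-18;
  x    = 1 + tiny;

  same = (x = 1);        /* 1 when the 9 was lost in the addition */
  eps  = 2**(-52);       /* gap between 1 and the next double     */

  put tiny= e12. eps= e12.;
  put x= hex16. same=;   /* 3FF0000000000000 is exactly 1 */
run;
```

Comparing tiny against eps gives a quicker read on whether a small addend can survive than counting decimal positions does.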

The moral of the story is that "nothing is wrong".

(optical illusion by the format routine! har har)

Hope this is helpful,
Mark Terjeson
Northwest Crime and Social Research, Inc.
A SAS Alliance Partner
215 Legion Way SW
Olympia, WA 98501
360.870.2581 - voice,cell
360.570.7533 - fax
mailto:mark.terjeson@nwcsr.com
www.nwcsr.com

"Nothing is particularly hard if you divide it into small jobs." - Henry Ford, Industrialist

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Ian Whitlock
Sent: Monday, January 27, 2003 11:20 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Numeric format ranges query

Although Simon has lost interest in his question, it does have an interesting form, on which I am stuck. How good is a SAS format at specifying a "Dedekind" cut of computer decimals? The following report is the result of running the program below. Each line increments X (originally 49) by adding another 9 to the decimal part of X. As the amount added gets smaller and smaller, there is a point at which it no longer matters and the number doesn't change. SMALL and BIG come from a formatted value of X. Note that in the last 3 lines X is slightly bigger than 50, but the format reports it as being smaller. Why is the value of X captured as small when the difference from 50 is negative? (I don't see the question as making a practical difference, but it still bothers me.)

p=9.0E-01 x=49.90000000000000000 x=4048F33333333333 small diff=0.1
p=9.0E-02 x=49.99000000000000000 x=4048FEB851EB851F small diff=0.01
p=9.0E-03 x=49.99900000000000000 x=4048FFDF3B645A1D small diff=0.001
p=9.0E-04 x=49.99990000000000000 x=4048FFFCB923A29D small diff=0.0001
p=9.0E-05 x=49.99999000000000000 x=4048FFFFAC1D29DD small diff=1E-5
p=9.0E-06 x=49.99999900000000000 x=4048FFFFF79C8430 small diff=1E-6
p=9.0E-07 x=49.99999990000000000 x=4048FFFFFF29406C small diff=9.9999994E-8
p=9.0E-08 x=49.99999999000000000 x=4048FFFFFFEA8672 small diff=9.9999937E-9
p=9.0E-09 x=49.99999999900000000 x=4048FFFFFFFDDA3F small diff=9.999965E-10
p=9.0E-10 x=49.99999999990000000 x=4048FFFFFFFFC907 small diff=9.999468E-11
p=9.0E-11 x=49.99999999999000000 x=4048FFFFFFFFFA81 small diff=9.997336E-12
p=9.0E-12 x=49.99999999999900000 x=4048FFFFFFFFFF74 small diff=9.947598E-13
p=9.0E-13 x=49.99999999999990000 x=4048FFFFFFFFFFF3 small diff=9.237056E-14
p=9.0E-14 x=50.00000000000000000 x=4049000000000000 big diff=0
p=9.0E-15 x=50.00000000000000000 x=4049000000000001 small diff=-7.10543E-15
p=9.0E-16 x=50.00000000000000000 x=4049000000000001 small diff=-7.10543E-15
p=9.0E-17 x=50.00000000000000000 x=4049000000000001 small diff=-7.10543E-15
======
x=50.00000000000000000 x=4049000000000000 big

Here is the program with parameter N to allow investigation at other points on the line. Numbers of the form (2**i)-1 (i=1,2,3,...) form an interesting series.

%let n = 49 ;
proc format ;
  value test  low - < %eval(&n+1) = "small"
              other               = "big" ;
run ;

data _null_ ;
  goal = %eval(&n+1) ;
  x = &n ;
  do i = 1 to 17 ;
    x = x + 9/10**i ;
    diff = goal - x ;
    p = 9/10**i ;
    put p= e7. x= 30.17 +1 x=hex16. +1 x test. +1 diff = ;
  end ;
  x = %eval(&n+1) ;
  put "======" ;
  put x= 30.17 +1 x=hex16. +1 x test. ;
run ;

If one adds

check = ( goal > x ) ; put check= ;

after the calculation of X, then one can see that the value of CHECK is faithful to the situation, so why isn't the format? Unfortunately either my computer arithmetic or knowledge of the internal working of formats is too weak to provide an explanation.

Perhaps it indicates a small inconsistency in the handling of fuzzing between the calculation of CHECK and the way formatted values are calculated.
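One way to test that guess, offered as a sketch and not as an established explanation: the VALUE statement in PROC FORMAT accepts a FUZZ= option, and the documented default fuzz factor for numeric formats is 1E-12. Defining a second format with FUZZ=0 and applying both to the same stored value shows whether fuzzing is what separates the format from CHECK. The format name nofuzz is made up for this illustration, and the value of X is rebuilt from its bit pattern on the assumption that the HEX16. informat inverts the HEX16. format.

```sas
proc format ;
  /* Same ranges twice: default fuzz factor, then exact matching only. */
  value test             low - < 50 = "small"  other = "big" ;
  value nofuzz (fuzz=0)  low - < 50 = "small"  other = "big" ;
run ;

data _null_ ;
  x = input('4049000000000001', hex16.) ;  /* one bit above 50 */
  check = ( 50 > x ) ;                     /* comparison operator's view */
  put x= hex16. check= ;
  put 'default fuzz: ' x test. ;
  put 'fuzz=0:       ' x nofuzz. ;
run ;
```

If the two formatted values differ, the fuzz factor is implicated; if both still say "small", the cause lies elsewhere in the format lookup.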

IanWhitlock@westat.com

-----Original Message-----
From: Simon Gillow [mailto:Simon.Gillow@BBG.CO.UK]
Sent: Monday, January 27, 2003 6:07 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Numeric format ranges query

I want a format that does what I would expect the following to do:

proc format;
  value ltv
    low   - <  50   = '0-50'
     50 > -    55   = '50-55'
     55 > -    60   = '55-60'
     60 > -    65   = '60-65'
     65 > -    70   = '65-70'
     70 > -    75   = '70-75'
     75 > -    80   = '75-80'
     80 > -    85   = '80-85'
     85 > -    90   = '85-90'
     90 > -    95   = '90-95'
     95 > -   100   = '95-100'
    100 > -   105   = '100-105'
    105 > -   110   = '105-110'
    110 > -   high  = '110+'
    other           = 'Unknown';
run;

Unfortunately this code doesn't work in SAS 8.2 (Windows NT based).

Basically, if a value is 50 I want it in the '50-55' category, 49.9999999999999 should be in '0-50' and 50.0000000000000000001 should be in '50-55'. It appears that you cannot put > before the -, am I right?

Will I have to resort to:
  50.000000001 - < 55
  55.000000001 -< 60
etc.....
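For reference, a sketch of how the intended ranges could be written with the exclusion operators PROC FORMAT does support: `a -< b` includes a and excludes b, while `a <- b` excludes a and includes b. Putting the `<` on the upper end of every range places 50 itself in '50-55' without resorting to offsets like 50.000000001. The test values in the DATA step are illustrative only, and values within roughly 1E-12 of a boundary may still be affected by the format's fuzz factor.

```sas
proc format;
  value ltv
    low -<  50  = '0-50'
     50 -<  55  = '50-55'
     55 -<  60  = '55-60'
     60 -<  65  = '60-65'
     65 -<  70  = '65-70'
     70 -<  75  = '70-75'
     75 -<  80  = '75-80'
     80 -<  85  = '80-85'
     85 -<  90  = '85-90'
     90 -<  95  = '90-95'
     95 -< 100  = '95-100'
    100 -< 105  = '100-105'
    105 -< 110  = '105-110'
    110 - high  = '110+'
    other       = 'Unknown';   /* e.g. missing values */
run;

data _null_;
  /* Spot-check a few values around the boundaries. */
  do v = 49.99, 50, 54.99, 55, 110, .;
    put v= +1 v ltv.;
  end;
run;
```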

Sorry if this is a really dumb question but I have searched the web and the archives of this list but got nothing.....

Thanks, Simon


