[squeak-dev] Re: [ANN] Number comparison, hash, NaN, Point,
and other partially ordered sets
Tim Olson
tim_olson at att.net
Fri Jan 9 15:10:41 UTC 2009
On Jan 9, 2009, at 2:29 AM, Hans-Martin Mosner wrote:
> First of all, does it matter? If I understand correctly, this behavior
> is only present for denormalized numbers.
If you set the internal rounding-precision mode in the x86 control
register to double precision, then yes, the multiple-rounding issue
goes away for most computations. It only remains when generating
denormal results, because the exponent field keeps its
extended-precision width during the computation; the denormalized result
is only generated during the conversion back to double-precision
format, leading to multiple rounding operations.
> Do these appear in real-world
> cases?
That's hard to say. It might be interesting to instrument the VM to
check for denormal operands or results on the float operations to get a
feel for how often (if ever) they are occurring.
> I've tried to analyze the case in question and came to the following
> results:
> The exact mantissa after multiplication is 3B16EF930A76E.80002C69F96C2
> (the hex digits after the point are those that should be rounded off
> when going to a 52-bit mantissa). The result with "A76F" as the last
> hex digits would therefore be the correct value for an IEEE 754
> double-precision multiplication (rounding to the nearest representable
> number), so the PPC implementation does it right.
> When doing an extended double-precision multiplication, there are some
> more bits in the mantissa, and the mantissa of the intermediate result
> looks like ...A76E.800, which is exactly in the middle between two
> representable numbers. Converting to double precision involves rounding
> the mantissa again, and the rounding rule for this case (exact middle)
> says to round to the nearest even number, which is ...A76E.
That's sort of what is going on, but it is complicated here due to the
way denorms are handled. What is actually happening is:
1st operand (binary):
1.0011001000001000101000100101111000000100111010000111
(note "hidden" 1 bit added to the left of the radix point because it
is a normalized value)
2nd operand (binary):
0.0011000101101101110100011101000000101101000110101110
(note no "hidden" 1 bit because it is a denormalized value)
Before multiplication, the 2nd operand is renormalized, adjusting the
exponent accordingly:
1.1000101101101110100011101000000101101000110101110000
The exact product is:
1.1101100010110111011111001001100001010011101101110100,0000000000000001...
The comma (,) is at the double-precision rounding position.
PPC operation:
The product exponent is smaller than can be represented in a normalized
double-precision format, so the result is first denormalized:
0.0011101100010110111011111001001100001010011101101110,1000000000000000001...
Then round-to-nearest rounding adds an ULP because the bits to the
right of the rounding position are greater than halfway:
0.0011101100010110111011111001001100001010011101101111
x86 operation (extended-precision FPU with rounding-precision mode set
to double):
Because the exponent field is still sized for 80-bit extended floats,
the exact product is still representable as a normalized number:
1.1101100010110111011111001001100001010011101101110100,0000000000000001...
Then round-to-nearest drops the bits to the right of the
double-precision rounding point without adding an ULP, because the bits
to the right are less than halfway:
1.1101100010110111011111001001100001010011101101110100
Then the result is converted to a double-precision representation when
storing to memory, which causes it to become denormalized:
0.0011101100010110111011111001001100001010011101101110,100
Round-to-nearest rounding does not add an ULP because the bits to the
right of the rounding position are exactly halfway, and in that case
the nearest even result is selected:
0.0011101100010110111011111001001100001010011101101110
> Is it at all possible to get the x86 FPU to produce the correct result
> for this situation?
Not with the extended-precision FPU, short of an extreme performance
loss: you would basically have to raise an exception for all imprecise
results (lots of them!) and fix them up in software. If you perform the
operations in the SSE unit, then it will work, provided all the x86
platforms you run on support SSE2 with double-precision scalars.
> Interestingly, this article
> (http://www.vinc17.org/research/extended.en.html) claims that the FPU
> is
> set up for incorrect rounding under Linux only, but I could not
> reproduce the test case given there with Squeak (which probably means
> that I mistranslated the Java example).
That example shows the effect for normalized intermediate results: the
intermediate result is rounded to extended precision, then rounded a
second time when converted to double-precision format. That can be
fixed by setting the rounding-precision mode to double in the FPU
control register but, as shown in the example above, it does not help
in the case of denormalized intermediate results.
I suspect that your Squeak VM has the rounding-precision mode set to
double, which fixes most of the cases.
-- tim