Berkeley TestFloat Release 3e: General Documentation

John R. Hauser
2018 January 20

1. Introduction

2. Limitations

3. Acknowledgments and License

4. What TestFloat Does

5. Executing TestFloat

6. Operations Tested by TestFloat

6.1. Conversion Operations

6.2. Basic Arithmetic Operations

6.3. Fused Multiply-Add Operations

6.4. Remainder Operations

6.5. Round-to-Integer Operations

6.6. Comparison Operations

7. Interpreting TestFloat Output

8. Variations Allowed by the IEEE Floating-Point Standard

8.1. Underflow

8.2. NaNs

8.3. Conversions to Integer

9. Contact Information

1. Introduction

Berkeley TestFloat is a small collection of programs for testing that an implementation of binary floating-point conforms to the IEEE Standard for Floating-Point Arithmetic. All operations required by the original 1985 version of the IEEE Floating-Point Standard can be tested, except for conversions to and from decimal. With the current release, the following binary formats can be tested: 16-bit half-precision, 32-bit single-precision, 64-bit double-precision, 80-bit double-extended-precision, and 128-bit quadruple-precision. TestFloat cannot test decimal floating-point.

Included in the TestFloat package are the testsoftfloat and timesoftfloat programs for testing the Berkeley SoftFloat software implementation of floating-point and for measuring its speed. Information about SoftFloat can be found at the SoftFloat Web page, http://www.jhauser.us/arithmetic/SoftFloat.html. The testsoftfloat and timesoftfloat programs are expected to be of interest only to people compiling the SoftFloat sources.

This document explains how to use the TestFloat programs. It does not attempt to define or explain much of the IEEE Floating-Point Standard. Details about the standard are available elsewhere.

The current version of TestFloat is Release 3e. This version differs from earlier releases 3b through 3d in only minor ways. Compared to the original Release 3:

Release 3b added the ability to test the 16-bit half-precision format.
Release 3c added the ability to test a rarely used rounding mode, round to odd, also known as jamming.
Release 3d modified the code for testing C arithmetic to potentially include testing newer library functions sqrtf, sqrtl, fmaf, fma, and fmal.

This release adds a few more small improvements, including modifying the expected behavior of rounding mode odd and fixing a minor bug in the all-in-one testfloat program.

Compared to Release 2c and earlier, the set of TestFloat programs, as well as the programs’ arguments and behavior, changed somewhat with Release 3. For more about the evolution of TestFloat releases, see TestFloat-history.html.

2. Limitations

TestFloat output is not always easily interpreted. Detailed knowledge of the IEEE Floating-Point Standard and its vagaries is needed to use TestFloat responsibly.

TestFloat performs relatively simple tests designed to check the fundamental soundness of the floating-point under test. TestFloat may also at times manage to find rarer and more subtle bugs, but it will probably only find such bugs by chance. Software that purposefully seeks out various kinds of subtle floating-point bugs can be found through links posted on the TestFloat Web page, http://www.jhauser.us/arithmetic/TestFloat.html.

3. Acknowledgments and License

The TestFloat package was written by me, John R. Hauser. Release 3 of TestFloat was a completely new implementation supplanting earlier releases. The project to create Release 3 (now through 3e) was done in the employ of the University of California, Berkeley, within the Department of Electrical Engineering and Computer Sciences, first for the Parallel Computing Laboratory (Par Lab) and then for the ASPIRE Lab. The work was officially overseen by Prof. Krste Asanovic, with funding provided by these sources:

Par Lab: Microsoft (Award #024263), Intel (Award #024894), and U.C. Discovery (Award #DIG07-10227), with additional support from Par Lab affiliates Nokia, NVIDIA, Oracle, and Samsung.

ASPIRE Lab: DARPA PERFECT program (Award #HR0011-12-2-0016), with additional support from ASPIRE industrial sponsor Intel and ASPIRE affiliates Google, Nokia, NVIDIA, Oracle, and Samsung.

The following applies to the whole of TestFloat Release 3e as well as to each source file individually.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions, and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS “AS IS”, AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

4. What TestFloat Does

TestFloat is designed to test a floating-point implementation by comparing its behavior with that of TestFloat’s own internal floating-point implemented in software. For each operation to be tested, the TestFloat programs can generate a large number of test cases, made up of simple pattern tests intermixed with weighted random inputs. The cases generated should be adequate for testing carry chain propagations, and the rounding of addition, subtraction, multiplication, and simple operations like conversions. TestFloat makes a point of checking all boundary cases of the arithmetic, including underflows, overflows, invalid operations, subnormal inputs, zeros (positive and negative), infinities, and NaNs. For the interesting operations like addition and multiplication, millions of test cases may be checked.

TestFloat is not remarkably good at testing difficult rounding cases for division and square root. It also makes no attempt to find bugs specific to SRT division and the like (such as the infamous Pentium division bug). Software that tests for such failures can be found through links on the TestFloat Web page, http://www.jhauser.us/arithmetic/TestFloat.html.

NOTE!
It is the responsibility of the user to verify that the discrepancies TestFloat finds actually represent faults in the implementation being tested. Advice to help with this task is provided later in this document. Furthermore, even if TestFloat finds no fault with a floating-point implementation, that in no way guarantees that the implementation is bug-free.

For each operation, TestFloat can test all five rounding modes defined by the IEEE Floating-Point Standard, plus possibly a sixth mode, round to odd (depending on the options selected when TestFloat was built). TestFloat verifies not only that the numeric results of an operation are correct, but also that the proper floating-point exception flags are raised. All five exception flags are tested, including the inexact flag. TestFloat does not attempt to verify that the floating-point exception flags are actually implemented as sticky flags.

For the 80-bit double-extended-precision format, TestFloat can test the addition, subtraction, multiplication, division, and square root operations at all three of the standard rounding precisions. The rounding precision can be set to 32 bits, equivalent to single-precision, to 64 bits, equivalent to double-precision, or to the full 80 bits of the double-extended-precision. Rounding precision control can be applied only to the double-extended-precision format and only for the five basic arithmetic operations: addition, subtraction, multiplication, division, and square root. Other operations can be tested only at full precision.

As a rule, TestFloat is not particular about the bit patterns of NaNs that appear as operation results. Any NaN is considered as good a result as another. This laxness can be overridden so that TestFloat checks for particular bit patterns within NaN results. See section 8 below, Variations Allowed by the IEEE Floating-Point Standard, plus the -checkNaNs and -checkInvInts options documented for programs testfloat_ver and testfloat.

TestFloat normally compares an implementation of floating-point against the Berkeley SoftFloat software implementation of floating-point, also created by me. The SoftFloat functions are linked into each TestFloat program’s executable. Information about SoftFloat can be found at the Web page http://www.jhauser.us/arithmetic/SoftFloat.html.

For testing SoftFloat itself, the TestFloat package includes a testsoftfloat program that compares SoftFloat’s floating-point against another software floating-point implementation. The second software floating-point is simpler and slower than SoftFloat, and is completely independent of SoftFloat. Although the second software floating-point cannot be guaranteed to be bug-free, the chance that it would mimic any of SoftFloat’s bugs is low. Consequently, an error in one or the other floating-point version should appear as an unexpected difference between the two implementations. Note that testing SoftFloat should be necessary only when compiling a new TestFloat executable or when compiling SoftFloat for some other reason.

5. Executing TestFloat

The TestFloat package consists of five programs, all intended to be executed from a command-line interpreter:

testfloat_gen    Generates test cases for a specific floating-point operation.
testfloat_ver    Verifies whether the results from executing a floating-point operation are as expected.
testfloat        An all-in-one program that generates test cases, executes floating-point operations, and verifies whether the results match expectations.
testsoftfloat    Like testfloat, but for testing SoftFloat.
timesoftfloat    A program for measuring the speed of SoftFloat (included in the TestFloat package for convenience).

Each program has its own page of documentation that can be opened through the links in the table above.

To test a floating-point implementation other than SoftFloat, one of three different methods can be used. The first method pipes output from testfloat_gen to a program that: (a) reads the incoming test cases, (b) invokes the floating-point operation being tested, and (c) writes the operation results to output. These results can then be piped to testfloat_ver to be checked for correctness. Assuming a vertical bar (|) indicates a pipe between programs, the complete process could be written as a single command like so:

testfloat_gen ... <type> | <program-that-invokes-op> | testfloat_ver ... <function>

The program in the middle is not supplied by TestFloat but must be created independently. If for some reason this program cannot take command-line arguments, the -prefix option of testfloat_gen can communicate parameters through the pipe.
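
As a rough illustration of what the middle program might look like, the sketch below is a Python filter for the f32_eq operation. It is not part of TestFloat, and it "tests" only a small Python reimplementation of the comparison, so its value is in showing the plumbing. The line format it uses (operands as space-separated hexadecimal words, with the Boolean result and a two-digit hexadecimal flags field appended, invalid being bit 4) is an assumption that must be checked against the documentation for testfloat_gen and testfloat_ver. The signaling-NaN test also assumes the common convention that a clear most-significant fraction bit marks a signaling NaN. The names eval_f32_eq and run_filter are hypothetical.

```python
import io

EXP_MASK = 0x7F800000    # f32 exponent field
FRAC_MASK = 0x007FFFFF   # f32 fraction field
QUIET_BIT = 0x00400000   # assumed: set for quiet NaNs, clear for signaling NaNs
FLAG_INVALID = 0x10      # assumed flag encoding: invalid is bit 4


def is_nan(bits):
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def is_signaling_nan(bits):
    return is_nan(bits) and (bits & QUIET_BIT) == 0


def eval_f32_eq(a, b):
    """Return (result, flags) for f32_eq applied to two raw 32-bit patterns."""
    flags = FLAG_INVALID if is_signaling_nan(a) or is_signaling_nan(b) else 0
    if is_nan(a) or is_nan(b):
        return 0, flags
    both_zero = ((a | b) & 0x7FFFFFFF) == 0   # +0 and -0 compare equal
    return int(a == b or both_zero), flags


def run_filter(src, dst):
    """Copy each test case from src to dst with the result and flags appended."""
    for line in src:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip anything that is not a two-operand test case
        a, b = (int(f, 16) for f in fields)
        result, flags = eval_f32_eq(a, b)
        dst.write(f"{fields[0]} {fields[1]} {result} {flags:02X}\n")


# Demonstration on in-memory data; as a pipe stage, call
# run_filter(sys.stdin, sys.stdout) instead.
sample = io.StringIO("3F800000 3F800000\n00000000 80000000\n7F800001 3F800000\n")
out = io.StringIO()
run_filter(sample, out)
print(out.getvalue(), end="")
```

A real middle program would replace eval_f32_eq with a call into the floating-point implementation under test and would read the exception flags from that implementation.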

A second method for running TestFloat is similar but has testfloat_gen supply not only the test inputs but also the expected results for each case. With this additional information, the job done by testfloat_ver can be folded into the invoking program to give the following command:

testfloat_gen ... <function> | <program-that-invokes-op-and-compares-results>

Again, the program that actually invokes the floating-point operation is not supplied by TestFloat but must be created independently. Depending on circumstance, it may be preferable either to let testfloat_ver check and report suspected errors (first method) or to include this step in the invoking program (second method).

The third way to use TestFloat is the all-in-one testfloat program. This program can perform all the steps of creating test cases, invoking the floating-point operation, checking the results, and reporting suspected errors. However, for this to be possible, testfloat must be compiled to contain the method for invoking the floating-point operations to test. Each build of testfloat is therefore capable of testing only the floating-point implementation it was built to invoke. To test a new implementation of floating-point, a new testfloat must be created, linked to that specific implementation. By comparison, the testfloat_gen and testfloat_ver programs are entirely generic; one instance is usable for testing any floating-point implementation, because implementation-specific details are segregated in the custom program that follows testfloat_gen.

Program testsoftfloat is another all-in-one program specifically for testing SoftFloat.

Programs testfloat_ver, testfloat, and testsoftfloat all report status and error information in a common way. As it executes, each of these programs writes status information to the standard error output, which should be the screen by default. In order for this status to be displayed properly, the standard error stream should not be redirected to a file. Any discrepancies that are found are written to the standard output stream, which is easily redirected to a file if desired. Unless redirected, reported errors will appear intermixed with the ongoing status information in the output.

6. Operations Tested by TestFloat

TestFloat can test all operations required by the original 1985 IEEE Floating-Point Standard except for conversions to and from decimal. These operations are:

conversions among the supported floating-point formats, and also between integers (32-bit and 64-bit, signed and unsigned) and any of the floating-point formats;
for each floating-point format, the usual addition, subtraction, multiplication, division, and square root operations;
for each format, the floating-point remainder operation defined by the IEEE Standard;
for each format, a “round to integer” operation that rounds to the nearest integer value in the same format; and
comparisons between two values in the same floating-point format.

In addition, TestFloat can also test

for each floating-point format except 80-bit double-extended-precision, the fused multiply-add operation defined by the 2008 IEEE Standard.

More information about all these operations is given below. In the operation names used by TestFloat, 16-bit half-precision is called f16, 32-bit single-precision is f32, 64-bit double-precision is f64, 80-bit double-extended-precision is extF80, and 128-bit quadruple-precision is f128. TestFloat generally uses the same names for operations as Berkeley SoftFloat, except that TestFloat’s names never include the M that SoftFloat uses to indicate that values are passed through pointers.

6.1. Conversion Operations

All conversions among the floating-point formats and all conversions between a floating-point format and 32-bit and 64-bit integers can be tested. The conversion operations are:

ui32_to_f16      ui64_to_f16      i32_to_f16       i64_to_f16
ui32_to_f32      ui64_to_f32      i32_to_f32       i64_to_f32
ui32_to_f64      ui64_to_f64      i32_to_f64       i64_to_f64
ui32_to_extF80   ui64_to_extF80   i32_to_extF80    i64_to_extF80
ui32_to_f128     ui64_to_f128     i32_to_f128      i64_to_f128

f16_to_ui32      f32_to_ui32      f64_to_ui32      extF80_to_ui32    f128_to_ui32
f16_to_ui64      f32_to_ui64      f64_to_ui64      extF80_to_ui64    f128_to_ui64
f16_to_i32       f32_to_i32       f64_to_i32       extF80_to_i32     f128_to_i32
f16_to_i64       f32_to_i64       f64_to_i64       extF80_to_i64     f128_to_i64

f16_to_f32       f32_to_f16       f64_to_f16       extF80_to_f16     f128_to_f16
f16_to_f64       f32_to_f64       f64_to_f32       extF80_to_f32     f128_to_f32
f16_to_extF80    f32_to_extF80    f64_to_extF80    extF80_to_f64     f128_to_f64
f16_to_f128      f32_to_f128      f64_to_f128      extF80_to_f128    f128_to_extF80

Abbreviations ui32 and ui64 indicate 32-bit and 64-bit unsigned integer types, while i32 and i64 indicate their signed counterparts. These conversions all round according to the current rounding mode as relevant. Conversions from a smaller to a larger floating-point format are always exact and so require no rounding. Likewise, conversions from 32-bit integers to 64-bit double-precision or to any larger floating-point format are also exact, as are conversions from 64-bit integers to 80-bit double-extended-precision and 128-bit quadruple-precision.
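
The exactness claims for integer conversions follow from the significand widths of the formats. The short Python sketch below checks them using the standard precisions (11, 24, 53, 64, and 113 bits, counting the leading bit). The function name is illustrative only, and the test it applies is a sufficient condition rather than a full analysis.

```python
# Significand precision in bits (including the leading bit) of each format.
PRECISION = {"f16": 11, "f32": 24, "f64": 53, "extF80": 64, "f128": 113}


def int_conversion_always_exact(int_bits, fmt):
    """True if every integer of the given width fits in fmt's significand.

    This is a sufficient condition: an integer with at most int_bits
    significant bits needs no rounding when the precision is at least that.
    """
    return int_bits <= PRECISION[fmt]


for fmt in PRECISION:
    print(fmt,
          int_conversion_always_exact(32, fmt),
          int_conversion_always_exact(64, fmt))
```

The loop reports that 32-bit integers always convert exactly to f64, extF80, and f128, and 64-bit integers to extF80 and f128, which agrees with the paragraph above.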

For the all-in-one testfloat program, this list of conversion operations requires amendment. For testfloat only, conversions to an integer type have names that explicitly specify the rounding mode and treatment of inexactness. Thus, instead of

<float>_to_<int>

as listed above, operations converting to integer type have names of these forms:

<float>_to_<int>_r_<round>
<float>_to_<int>_rx_<round>

The <round> component is one of ‘near_even’, ‘near_maxMag’, ‘minMag’, ‘min’, or ‘max’, choosing the rounding mode. Any other indication of rounding mode is ignored. The operations with ‘_r_’ in their names never raise the inexact exception, while those with ‘_rx_’ raise the inexact exception whenever the result is not exact.

TestFloat assumes that conversions from floating-point to an integer type should raise the invalid exception if the input cannot be rounded to an integer representable in the result format. In such a circumstance:

If the result type is an unsigned integer, TestFloat normally expects the result of the operation to be the type’s largest integer value. In the case that the input is a negative number (not a NaN), a zero result may also be accepted.
If the result type is a signed integer and the input is a number (not a NaN), TestFloat expects the result to be the largest-magnitude integer with the same sign as the input. When a NaN is converted to a signed integer type, TestFloat allows either the largest positive or largest-magnitude negative integer to be returned.

Conversions to integer types are expected never to raise the overflow exception.
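
The rules above for invalid conversions can be summarized as a small lookup. The Python sketch below is only a restatement of those rules, not TestFloat's own code; the function name and the 'positive', 'negative', and 'nan' labels are hypothetical.

```python
def accepted_invalid_results(int_type, input_kind):
    """Integer results described above as acceptable for an invalid conversion.

    int_type:   'ui32', 'ui64', 'i32', or 'i64'
    input_kind: 'positive' or 'negative' (an unrepresentable number of that
                sign, including an infinity), or 'nan'
    """
    bits = int(int_type.lstrip("ui"))
    if int_type.startswith("ui"):
        accepted = {2 ** bits - 1}            # the type's largest value
        if input_kind == "negative":
            accepted.add(0)                   # zero may also be accepted
        return accepted
    largest_pos = 2 ** (bits - 1) - 1
    largest_neg = -(2 ** (bits - 1))
    if input_kind == "nan":
        return {largest_pos, largest_neg}     # either extreme is allowed
    return {largest_pos} if input_kind == "positive" else {largest_neg}


print(sorted(accepted_invalid_results("ui32", "negative")))
print(sorted(accepted_invalid_results("i32", "nan")))
```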

6.2. Basic Arithmetic Operations

The following standard arithmetic operations can be tested:

f16_add      f16_sub      f16_mul      f16_div      f16_sqrt
f32_add      f32_sub      f32_mul      f32_div      f32_sqrt
f64_add      f64_sub      f64_mul      f64_div      f64_sqrt
extF80_add   extF80_sub   extF80_mul   extF80_div   extF80_sqrt
f128_add     f128_sub     f128_mul     f128_div     f128_sqrt

The double-extended-precision (extF80) operations can be rounded to reduced precision under rounding precision control.

6.3. Fused Multiply-Add Operations

For all floating-point formats except 80-bit double-extended-precision, TestFloat can test the fused multiply-add operation defined by the 2008 IEEE Floating-Point Standard. The fused multiply-add operations are:

f16_mulAdd
f32_mulAdd
f64_mulAdd
f128_mulAdd

If one of the multiplication operands is infinite and the other is zero, TestFloat expects the fused multiply-add operation to raise the invalid exception even if the third operand is a quiet NaN.

6.4. Remainder Operations

For each format, TestFloat can test the IEEE Standard’s remainder operation. These operations are:

f16_rem
f32_rem
f64_rem
extF80_rem
f128_rem

The remainder operations are always exact and so require no rounding.

6.5. Round-to-Integer Operations

For each format, TestFloat can test the IEEE Standard’s round-to-integer operation. For most TestFloat programs, these operations are:

f16_roundToInt
f32_roundToInt
f64_roundToInt
extF80_roundToInt
f128_roundToInt

Just as for conversions to integer types (section 6.1 above), the all-in-one testfloat program is again an exception. For testfloat only, the round-to-integer operations have names of these forms:

<float>_roundToInt_r_<round>
<float>_roundToInt_x

For the ‘_r_’ versions, the inexact exception is never raised, and the <round> component specifies the rounding mode as one of ‘near_even’, ‘near_maxMag’, ‘minMag’, ‘min’, or ‘max’. The usual indication of rounding mode is ignored. In contrast, the ‘_x’ versions accept the usual indication of rounding mode and raise the inexact exception whenever the result is not exact. This irregular system follows the IEEE Standard’s particular specification for the round-to-integer operations.

6.6. Comparison Operations

The following floating-point comparison operations can be tested:

f16_eq      f16_le      f16_lt
f32_eq      f32_le      f32_lt
f64_eq      f64_le      f64_lt
extF80_eq   extF80_le   extF80_lt
f128_eq     f128_le     f128_lt

The abbreviation eq stands for “equal” (=), le stands for “less than or equal” (≤), and lt stands for “less than” (<).

The IEEE Standard specifies that, by default, the less-than-or-equal and less-than comparisons raise the invalid exception if either input is any kind of NaN. The equality comparisons, on the other hand, are defined by default to raise the invalid exception only for signaling NaNs, not for quiet NaNs. For completeness, the following additional operations can be tested if supported:

f16_eq_signaling      f16_le_quiet      f16_lt_quiet
f32_eq_signaling      f32_le_quiet      f32_lt_quiet
f64_eq_signaling      f64_le_quiet      f64_lt_quiet
extF80_eq_signaling   extF80_le_quiet   extF80_lt_quiet
f128_eq_signaling     f128_le_quiet     f128_lt_quiet

The signaling equality comparisons are identical to the standard operations except that the invalid exception should be raised for any NaN input. Similarly, the quiet comparison operations should be identical to their counterparts except that the invalid exception is not raised for quiet NaNs.
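
The invalid-exception rules for the twelve comparison variants reduce to two groups. The Python sketch below restates them; the function name and the operand labels are hypothetical.

```python
def comparison_raises_invalid(op, operand_kinds):
    """Whether a comparison should raise the invalid exception.

    op:            operation suffix, e.g. 'eq', 'le_quiet', 'eq_signaling'
    operand_kinds: for each input, 'number', 'quiet_nan', or 'signaling_nan'
    """
    any_signaling = "signaling_nan" in operand_kinds
    any_nan = any_signaling or "quiet_nan" in operand_kinds
    signaling_ops = {"le", "lt", "eq_signaling"}   # invalid for any NaN
    quiet_ops = {"eq", "le_quiet", "lt_quiet"}     # invalid only for signaling NaNs
    if op in signaling_ops:
        return any_nan
    if op in quiet_ops:
        return any_signaling
    raise ValueError(f"unknown comparison: {op}")


print(comparison_raises_invalid("lt", ["number", "quiet_nan"]))
print(comparison_raises_invalid("eq", ["number", "quiet_nan"]))
```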

Obviously, no comparison operations ever require rounding. Any rounding mode is ignored.

7. Interpreting TestFloat Output

The “errors” reported by TestFloat programs may or may not really represent errors in the system being tested. For each test case tried, the results from the floating-point implementation being tested could differ from the expected results for several reasons:

The IEEE Floating-Point Standard allows for some variation in how conforming floating-point behaves. Two implementations can sometimes give different results without either being incorrect.
The trusted floating-point emulation could be faulty. This could be because there is a bug in the way the emulation is coded, or because a mistake was made when the code was compiled for the current system.
The TestFloat program may not work properly, reporting differences that do not exist.
Lastly, the floating-point being tested could actually be faulty.

It is the responsibility of the user to determine the causes for the discrepancies that are reported. Making this determination can require detailed knowledge about the IEEE Standard. Assuming TestFloat is working properly, any differences found will be due to either the first or last of the reasons above. Variations in the IEEE Standard that could lead to false error reports are discussed in section 8, Variations Allowed by the IEEE Floating-Point Standard.

For each reported error (or apparent error), a line of text is written to the standard output. If a line would be longer than 79 characters, it is divided. The first part of each error line begins in the leftmost column, and any subsequent “continuation” lines are indented with a tab.

Each error reported is of the form:

<inputs>  => <observed-output>  expected: <expected-output>

The <inputs> are the inputs to the operation. Each output (observed or expected) is shown as a pair: the result value first, followed by the exception flags.

For example, two typical error lines could be

-00.7FFF00  -7F.000100  => +01.000000 ...ux  expected: +01.000000 ....x
+81.000004  +00.1FFFFF  => +01.000000 ...ux  expected: +01.000000 ....x

In the first line, the inputs are -00.7FFF00 and -7F.000100, and the observed result is +01.000000 with flags ...ux. The trusted emulation result is the same but with different flags, ....x. Items such as -00.7FFF00 composed of a sign character (+/-), hexadecimal digits, and a single period represent floating-point values (here 32-bit single-precision). The two instances above were reported as errors because the exception flag results differ.
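
For post-processing a saved report, an error line of the form shown above can be split mechanically. The Python sketch below is one possible way to do so and is not part of TestFloat; it assumes that any continuation lines have already been joined onto the line they continue, and the function name is hypothetical.

```python
def parse_error_line(line):
    """Split one reported error into inputs, observed output, expected output."""
    left, expected = line.split("expected:")
    inputs, observed = left.split("=>")
    obs_value, obs_flags = observed.split()
    exp_value, exp_flags = expected.split()
    return {
        "inputs": inputs.split(),
        "observed": (obs_value, obs_flags),
        "expected": (exp_value, exp_flags),
    }


report = parse_error_line(
    "-00.7FFF00  -7F.000100  => +01.000000 ...ux  expected: +01.000000 ....x"
)
print(report["inputs"])
print(report["observed"], report["expected"])
```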

Aside from the exception flags, there are ten data types that may be represented. Five are floating-point types: 16-bit half-precision, 32-bit single-precision, 64-bit double-precision, 80-bit double-extended-precision, and 128-bit quadruple-precision. The remaining five types are 32-bit and 64-bit unsigned integers, 32-bit and 64-bit two’s-complement signed integers, and Boolean values (the results of comparison operations). Boolean values are represented as a single character, either a 0 (false) or a 1 (true). A 32-bit integer is represented as 8 hexadecimal digits. Thus, for a signed 32-bit integer, FFFFFFFF is −1, and 7FFFFFFF is the largest positive value. 64-bit integers are the same except with 16 hexadecimal digits.
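
The integer notation can be decoded in a few lines. The Python sketch below is a minimal illustration with a hypothetical function name; whether a value is signed must be supplied by the caller, since the digits alone do not say.

```python
def decode_integer(text, signed):
    """Interpret 8 or 16 hexadecimal digits as a 32-bit or 64-bit integer."""
    bits = 4 * len(text)
    if bits not in (32, 64):
        raise ValueError("expected 8 or 16 hexadecimal digits")
    value = int(text, 16)
    if signed and value >= 2 ** (bits - 1):
        value -= 2 ** bits   # two's-complement wraparound
    return value


print(decode_integer("FFFFFFFF", signed=True))
print(decode_integer("7FFFFFFF", signed=True))
print(decode_integer("FFFFFFFF", signed=False))
```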

Floating-point values are written decomposed into their sign, encoded exponent, and encoded significand. First is the sign character (+ or -), followed by the encoded exponent in hexadecimal, then a period (.), and lastly the encoded significand in hexadecimal.

For 16-bit half-precision, notable values include:

+00.000    +0
+0F.000    1
+10.000    2
+1E.3FF    maximum finite value
+1F.000    +infinity

-00.000    −0
-0F.000    −1
-10.000    −2
-1E.3FF    minimum finite value (largest magnitude, but negative)
-1F.000    −infinity

Certain categories are easily distinguished (assuming the xs are not all 0):

+00.xxx    positive subnormal numbers
+1F.xxx    positive NaNs
-00.xxx    negative subnormal numbers
-1F.xxx    negative NaNs

Likewise for other formats:

32-bit single   64-bit double         128-bit quadruple

+00.000000      +000.0000000000000    +0000.0000000000000000000000000000    +0
+7F.000000      +3FF.0000000000000    +3FFF.0000000000000000000000000000    1
+80.000000      +400.0000000000000    +4000.0000000000000000000000000000    2
+FE.7FFFFF      +7FE.FFFFFFFFFFFFF    +7FFE.FFFFFFFFFFFFFFFFFFFFFFFFFFFF    maximum finite value
+FF.000000      +7FF.0000000000000    +7FFF.0000000000000000000000000000    +infinity

-00.000000      -000.0000000000000    -0000.0000000000000000000000000000    −0
-7F.000000      -3FF.0000000000000    -3FFF.0000000000000000000000000000    −1
-80.000000      -400.0000000000000    -4000.0000000000000000000000000000    −2
-FE.7FFFFF      -7FE.FFFFFFFFFFFFF    -7FFE.FFFFFFFFFFFFFFFFFFFFFFFFFFFF    minimum finite value
-FF.000000      -7FF.0000000000000    -7FFF.0000000000000000000000000000    −infinity

+00.xxxxxx      +000.xxxxxxxxxxxxx    +0000.xxxxxxxxxxxxxxxxxxxxxxxxxxxx    positive subnormals
+FF.xxxxxx      +7FF.xxxxxxxxxxxxx    +7FFF.xxxxxxxxxxxxxxxxxxxxxxxxxxxx    positive NaNs
-00.xxxxxx      -000.xxxxxxxxxxxxx    -0000.xxxxxxxxxxxxxxxxxxxxxxxxxxxx    negative subnormals
-FF.xxxxxx      -7FF.xxxxxxxxxxxxx    -7FFF.xxxxxxxxxxxxxxxxxxxxxxxxxxxx    negative NaNs
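
When reading many reported values, it can help to convert the sign, exponent, and significand notation back to ordinary numbers. The Python sketch below does this for the four formats with a hidden leading bit, returning an exact fraction. It is an illustration only: the function name is hypothetical, the format is inferred from the number of significand digits, and the sign of a zero is lost in the returned value.

```python
from fractions import Fraction

# Keyed by number of significand hex digits: (exponent bits, fraction bits)
# for f16, f32, f64, and f128 respectively.
FORMATS = {3: (5, 10), 6: (8, 23), 13: (11, 52), 28: (15, 112)}


def decode_float(text):
    """Decode e.g. '+7F.000000' to an exact Fraction, or to 'inf'/'-inf'/'nan'."""
    sign = -1 if text[0] == "-" else 1
    exp_text, sig_text = text[1:].split(".")
    exp_bits, frac_bits = FORMATS[len(sig_text)]
    exp, frac = int(exp_text, 16), int(sig_text, 16)
    bias = 2 ** (exp_bits - 1) - 1
    if exp == 2 ** exp_bits - 1:                 # all-ones exponent
        if frac:
            return "nan"
        return "-inf" if sign < 0 else "inf"
    if exp == 0:                                 # zero or subnormal
        return sign * Fraction(frac, 2 ** frac_bits) * Fraction(2) ** (1 - bias)
    return sign * (1 + Fraction(frac, 2 ** frac_bits)) * Fraction(2) ** (exp - bias)


print(decode_float("+7F.000000"))
print(decode_float("-400.0000000000000"))
print(decode_float("+1E.3FF"))
```

Applied to the rows of the tables above, the three calls give 1, −2, and 65504 (the maximum finite half-precision value).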

The 80-bit double-extended-precision values are a little unusual in that the leading bit of precision is not hidden as with other formats. When canonically encoded, the leading significand bit of an 80-bit double-extended-precision value will be 0 if the value is zero or subnormal, and will be 1 otherwise. Hence, the same values listed above appear in 80-bit double-extended-precision as follows (note the leading 8 digit in the significands):

+0000.0000000000000000    +0
+3FFF.8000000000000000    1
+4000.8000000000000000    2
+7FFE.FFFFFFFFFFFFFFFF    maximum finite value
+7FFF.8000000000000000    +infinity

-0000.0000000000000000    −0
-3FFF.8000000000000000    −1
-4000.8000000000000000    −2
-7FFE.FFFFFFFFFFFFFFFF    minimum finite value
-7FFF.8000000000000000    −infinity
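
Because the leading significand bit is explicit, decoding an 80-bit double-extended-precision value differs slightly from the other formats: no implicit 1 is added. The Python sketch below illustrates this for canonically encoded values only; the function name is hypothetical, and the sign of a zero is lost in the returned value.

```python
from fractions import Fraction

EXTF80_BIAS = 16383


def decode_extF80(text):
    """Decode a canonically encoded extF80 value such as '+3FFF.8000000000000000'."""
    sign = -1 if text[0] == "-" else 1
    exp_text, sig_text = text[1:].split(".")
    exp, sig = int(exp_text, 16), int(sig_text, 16)
    if exp == 0x7FFF:
        if sig & 0x7FFFFFFFFFFFFFFF:      # any bit below the leading bit set
            return "nan"
        return "-inf" if sign < 0 else "inf"
    # The leading bit is part of sig, so no implicit 1 is added; the 64-bit
    # significand is scaled by 2**-63. Exponent 0 uses the same scale as 1.
    scale = Fraction(2) ** (max(exp, 1) - EXTF80_BIAS - 63)
    return sign * sig * scale


print(decode_extF80("+3FFF.8000000000000000"))
print(decode_extF80("-4000.8000000000000000"))
```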

Lastly, exception flag values are represented by five characters, one character per flag. Each flag is written as either a letter or a period (.) according to whether the flag was set or not by the operation. A period indicates the flag was not set. The letter used to indicate a set flag depends on the flag:

v    invalid exception
i    infinite exception (“divide by zero”)
o    overflow exception
u    underflow exception
x    inexact exception

For example, the notation ...ux indicates that the underflow and inexact exception flags were set and that the other three flags (invalid, infinite, and overflow) were not set. The exception flags are always written following the value returned as the result of the operation.
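
A five-character flags field can be expanded into flag names as in the Python sketch below. The sketch assumes the characters appear in the fixed order v, i, o, u, x, which is consistent with the ...ux example but is inferred rather than stated outright; the function name is hypothetical.

```python
FLAG_LETTERS = "vioux"
FLAG_NAMES = ["invalid", "infinite", "overflow", "underflow", "inexact"]


def decode_flags(text):
    """Return the names of the flags set in a five-character field like '...ux'."""
    if len(text) != 5:
        raise ValueError("expected five flag characters")
    names = []
    for char, letter, name in zip(text, FLAG_LETTERS, FLAG_NAMES):
        if char == letter:
            names.append(name)
        elif char != ".":
            raise ValueError(f"unexpected character {char!r}")
    return names


print(decode_flags("...ux"))
print(decode_flags("....x"))
```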

8. Variations Allowed by the IEEE Floating-Point Standard

The IEEE Floating-Point Standard admits some variation among conforming implementations. Because TestFloat expects the two implementations being compared to deliver bit-for-bit identical results under most circumstances, this leeway in the standard can result in false errors being reported if the two implementations do not make the same choices everywhere the standard provides an option.

8.1. Underflow

The standard specifies that the underflow exception flag is to be raised when two conditions are met simultaneously: (1) tininess and (2) loss of accuracy.

A result is tiny when its magnitude is nonzero yet smaller than any normalized floating-point number. The standard allows tininess to be determined either before or after a result is rounded to the destination precision. If tininess is detected before rounding, some borderline cases will be flagged as underflows even though the result after rounding actually lies within the normal floating-point range. By detecting tininess after rounding, a system can avoid some unnecessary signaling of underflow. All the TestFloat programs support options -tininessbefore and -tininessafter to control whether TestFloat expects tininess on underflow to be detected before or after rounding. One or the other is selected as the default when TestFloat is compiled, but these command options allow the default to be overridden.
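
A worked borderline case may make the distinction concrete. The Python sketch below uses exact rational arithmetic to model a 32-bit single-precision result whose exact value lies just below the smallest normalized number but rounds up to it under round-to-nearest-even. The helper round_nearest_even and the chosen value are illustrative and are not taken from TestFloat.

```python
from fractions import Fraction

PRECISION = 24                      # significand bits of 32-bit single-precision
MIN_NORMAL = Fraction(2) ** -126    # smallest positive normalized f32 value


def round_nearest_even(x, precision):
    """Round positive x to `precision` significant bits, exponent unbounded."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1                      # now 2**e <= x < 2**(e + 1)
    ulp = Fraction(2) ** (e - precision + 1)
    q = x / ulp
    n = q.numerator // q.denominator
    rem = q - n
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return n * ulp


exact = MIN_NORMAL * (1 - Fraction(1, 2 ** 26))   # just below the normal range
rounded = round_nearest_even(exact, PRECISION)

tiny_before = exact < MIN_NORMAL      # tininess judged on the exact result
tiny_after = rounded < MIN_NORMAL     # tininess judged on the rounded result
inexact = rounded != exact

print("underflow if tininess is detected before rounding:", tiny_before and inexact)
print("underflow if tininess is detected after rounding: ", tiny_after and inexact)
```

Here the rounded result equals the smallest normalized number, so the first line prints True and the second prints False. A mismatch of this kind between the implementation under test and the TestFloat setting would show up as the same result value reported with flags ...ux on one side and ....x on the other.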

Loss of accuracy occurs when the subnormal format is not sufficient to represent an underflowed result accurately. The original 1985 version of the IEEE Standard allowed loss of accuracy to be detected either as an inexact result or as a denormalization loss; however, few if any systems ever chose the latter. The latest standard requires that loss of accuracy be detected as an inexact result, and TestFloat can test only for this case.

8.2. NaNs

The IEEE Standard gives the floating-point formats a large number of NaN encodings and specifies that NaNs are to be returned as results under certain conditions. However, the standard allows an implementation almost complete freedom over which NaN to return in each situation.

By default, TestFloat does not check the bit patterns of NaN results. When the result of an operation should be a NaN, any NaN is considered as good as another. This laxness can be overridden with the -checkNaNs option of programs testfloat_ver and testfloat. In order for this option to be sensible, TestFloat must have been compiled so that its internal floating-point implementation (SoftFloat) generates the proper NaN results for the system being tested.

8.3. Conversions to Integer

Conversion of a floating-point value to an integer format will fail if the source value is a NaN or if it is too large. The IEEE Standard does not specify what value should be returned as the integer result in these cases. Moreover, according to the standard, the invalid exception can be raised or an unspecified alternative mechanism may be used to signal such cases.

TestFloat assumes that conversions to integer will raise the invalid exception if the source value cannot be rounded to a representable integer. In such cases, TestFloat expects the result value to be the largest-magnitude positive or negative integer or zero, as detailed earlier in section 6.1, Conversion Operations. If option -checkInvInts is selected with programs testfloat_ver and testfloat, integer results of invalid operations are checked for an exact match. In order for this option to be sensible, TestFloat must have been compiled so that its internal floating-point implementation (SoftFloat) generates the proper integer results for the system being tested.

9. Contact Information

At the time of this writing, the most up-to-date information about TestFloat and the latest release can be found at the Web page http://www.jhauser.us/arithmetic/TestFloat.html.
