On the Precision Attainable with Various Floating-Point Number Systems

1 June 1973

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers

Vol. C-22 (6), 601-607
https://doi.org/10.1109/tc.1973.5009113

Abstract

For scientific computations on a digital computer the set of real numbers is usually approximated by a finite set F of "floating-point" numbers. We compare the numerical accuracy possible with different choices of F having approximately the same range and requiring the same word length. In particular, we compare different choices of base (or radix) in the usual floating-point systems. The emphasis is on the choice of F, not on the details of the number representation or the arithmetic, but both rounded and truncated arithmetic are considered. Theoretical results are given, and some simulations of typical floating-point computations (forming sums, solving systems of linear equations, finding eigenvalues) are described. If the leading fraction bit of a normalized base-2 number is not stored explicitly (saving a bit), and the criterion is to minimize the mean square roundoff error, then base 2 is best. If unnormalized numbers are allowed, so the first bit must be stored explicitly, then base 4 (or sometimes base 8) is the best of the usual systems.
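The base-2 versus base-4 comparison in the abstract can be illustrated with a small numerical experiment. The sketch below is not the paper's simulation: the function names `round_to_base` and `rms_relative_error` are hypothetical, the inputs are assumed to be log-uniformly distributed on [1, 256), and only rounded (not truncated) arithmetic is modelled. It compares a base-2 system with a 24-bit significand (23 stored bits plus one implicit leading bit) against a base-4 system with 12 digits (24 stored bits), on the assumption that the extra stored bit in base 4 is paid for by an exponent field that is one bit shorter for the same range.

```python
import math
import random


def round_to_base(x, base, digits):
    """Round positive x to the nearest number with `digits` base-`base` fraction digits."""
    e = math.floor(math.log(x, base)) + 1
    # Guard against log() landing on the wrong side of an exact power of the base.
    while x >= float(base) ** e:
        e += 1
    while x < float(base) ** (e - 1):
        e -= 1
    ulp = float(base) ** (e - digits)  # spacing of representable numbers near x
    return round(x / ulp) * ulp


def rms_relative_error(base, digits, samples=20000, seed=0):
    """Root mean square relative rounding error over log-uniform inputs (an assumption)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = 2.0 ** rng.uniform(0.0, 8.0)  # log-uniform on [1, 256)
        rel = (round_to_base(x, base, digits) - x) / x
        total += rel * rel
    return math.sqrt(total / samples)


if __name__ == "__main__":
    print("base 2, 24 bits  :", rms_relative_error(2, 24))
    print("base 4, 12 digits:", rms_relative_error(4, 12))
```

Under these assumptions the base-2 system shows the smaller root mean square relative error, which is consistent with the abstract's conclusion for the hidden-bit case. The gap arises because a normalized base-4 fraction can be as small as 1/4, so the same absolute rounding step is a larger relative error for those values.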

All Related Versions

Version 1, 2010-04-20, ArXiv (Unconfirmed version)

This publication has 19 references indexed in Scilit:

A statistical study of the accuracy of floating point number systems
Communications of the ACM, 1973
A Mean Square Estimate of the Generated Roundoff Error in Constant Matrix Iterative Processes
Journal of the ACM, 1971
Accumulation of Round-Off Error in Fast Fourier Transforms
Journal of the ACM, 1970
On the Distribution of Numbers
Bell System Technical Journal, 1970
Roundoff noise in floating point fast Fourier transform computation
IEEE Transactions on Audio and Electroacoustics, 1969
The QR and QL algorithms for symmetric matrices
Numerische Mathematik, 1968
Householder's tridiagonalization of a symmetric matrix
Numerische Mathematik, 1968
27 bits are not enough for 8-digit accuracy
Communications of the ACM, 1967
Test of probabilistic models for the propagation of roundoff errors
Communications of the ACM, 1966
Tests of probabilistic models for propagation of roundoff errors
Communications of the ACM, 1966