Misunderstanding Go Language specification on floating-point rounding


The Go language specification on the section about Constant expressions states:

A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.

Does the sentence

This rounding may cause a floating-point constant expression to be invalid in an integer context

point to something like the following:

func main() {
    a := 853784574674.23846278367
    fmt.Println(int8(a)) // output: 0


The quoted part from the spec does not apply to your example, as a is not a constant expression but a variable, so int8(a) is converting a non-constant expression. This conversion is covered by Spec: Conversions, Conversions between numeric types:

When converting a floating-point number to an integer, the fraction is discarded (truncation towards zero).

[…] In all non-constant conversions involving floating-point or complex values, if the result type cannot represent the value the conversion succeeds but the result value is implementation-dependent.

Since you convert a non-constant expression a being 853784574674.23846278367 to an integer, the fraction part is discarded, and since the result does not fit into an int8, the result is not specified, it’s implementation-dependent.

The quoted part means that while constants are represented with a lot higher precision than the builtin types (eg. float64 or int64), the precision that a compiler (have to) implement is not infinite (for practical reasons), and even if a floating point literal is representable precisely, performing operations on them may be carried out with intermediate roundings and may not give mathematically correct result.

The spec includes the minimum supportable precision:

Implementation restriction: Although numeric constants have arbitrary precision in the language, a compiler may implement them using an internal representation with limited precision. That said, every implementation must:

  • Represent integer constants with at least 256 bits.
  • Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed binary exponent of at least 16 bits.
  • Give an error if unable to represent an integer constant precisely.
  • Give an error if unable to represent a floating-point or complex constant due to overflow.
  • Round to the nearest representable constant if unable to represent a floating-point or complex constant due to limits on precision.

For example:

const (
    x = 1e100000 + 1
    y = 1e100000

func main() {
    fmt.Println(x - y)

This code should output 1 as x is being 1 larger than y. Running it on the Go Playground outputs 0 because the constant expression x - y is executed with roundings, and the +1 is lost as a result. Both x and y are integers (have no fraction part), so in integer context the result should be 1. But the number being 1e100000, representing it requires around ~333000 bits, which is not a valid requirement from a compiler (according to the spec, 256 bit mantissa is sufficient).

If we lower the constants, we get correct result:

const (
    x = 1e1000 + 1
    y = 1e1000

func main() {
    fmt.Println(x - y)

This outputs the mathematically correct 1 result. Try it on the Go Playground. Representing the number 1e1000 requires around ~3333 bits which seems to be supported (and it’s way above the minimum 256 bit requirement).

Answered By – icza

Answer Checked By – Mary Flores (GoLangFix Volunteer)

Leave a Reply

Your email address will not be published.