Those languages which print 0.30000000000000004 are doing something which is sil...

StefanKarpinski · on Nov 13, 2015

This is a common but fundamentally wrong view of floating-point numbers. You are not "making shit up" by printing more than 15 digits of a floating-point number. Floats, doubles, etc. have precise values. As I said in another comment, the double represented by the literal `0.1` is precisely 0.1000000000000000055511151231257827021181583404541015625; similarly, `0.2` is precisely 0.200000000000000011102230246251565404236316680908203125 and `0.3` is precisely 0.299999999999999988897769753748434595763683319091796875. From these exact values, you can see why `0.1 + 0.2 != 0.3`: the left-hand side is greater than 0.3 while the right-hand side is smaller than 0.3. Printing only 15 digits is not doing programmers any favors: even though `0.1 + 0.2` will print as "0.3", that just makes the lie even worse. They will be even more confused when `0.1 + 0.2 != 0.3` even though both values print identically.
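A minimal C sketch of both claims, assuming IEEE 754 doubles and a C library whose printf prints exact decimal expansions when asked for enough digits (glibc and musl do; some older runtimes do not). The helper names are illustrative, not from any library.

```c
#include <stdio.h>
#include <string.h>

/* 1 if x, printed with 55 digits after the decimal point, matches expected.
   55 places is exactly enough for 0.1, whose double value is
   3602879701896397 / 2^55. */
static int exact_decimal_is(double x, const char *expected) {
    char buf[128];
    snprintf(buf, sizeof buf, "%.55f", x);
    return strcmp(buf, expected) == 0;
}

/* 1 if a and b are different doubles that nevertheless print identically
   with `prec` significant digits (%.*g). */
static int prints_same_but_differs(double a, double b, int prec) {
    char sa[64], sb[64];
    snprintf(sa, sizeof sa, "%.*g", prec, a);
    snprintf(sb, sizeof sb, "%.*g", prec, b);
    return a != b && strcmp(sa, sb) == 0;
}
```

With 15 digits, `0.1 + 0.2` and `0.3` both print as "0.3" while comparing unequal; with 17 digits they print as "0.30000000000000004" and "0.29999999999999999".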

kazinator · on Nov 14, 2015

Your claim is fallacious, because a floating-point value denotes a range of the real number line, whereas you're insisting that, no, it stands for its literal value: the absolutely precise rational number that lies at the center of that range.

That is no more true than my current body temperature being precisely 36 1/10 degrees Celsius because the thermometer reads 36.1.

All numbers in that range alias to the same double, yet differ wildly in their decimal digits beyond the fifteenth. That "long tail" of digits is a meaningless residue arising from the arbitrary difference between the chosen center-of-range point and the actual number it approximates.

> Printing only 15 digits is not doing programmers any favors

Note that the ISO C function printf uses 6 digits of precision under the %g conversion specifier by default. So a team of experts can find it justifiable to severely truncate precision on printing. I find that justifiable too, for this reason: printing is not only for constants and trivial expressions like 0.1 + 0.2, but also for the results of complex calculations, which accumulate significant rounding errors. Six digits of precision is prudent: for many complex calculations, it will avoid misleading the user with too much precision. Fifteen decimal digits will be wrong after just a few operations; there is hardly any "headroom" to absorb error.
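A small C illustration of accumulated rounding error, assuming IEEE 754 doubles: adding 0.1 ten times performs ten roundings, and the default %g precision (6) hides the residue that %.17g exposes. The helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Adds 0.1 to an accumulator n times; each += rounds to the nearest double. */
static double sum_tenths(int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += 0.1;
    return s;
}

/* 1 if x formatted with `prec` significant digits (%.*g) equals expected. */
static int prints_as_g(int prec, double x, const char *expected) {
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", prec, x);
    return strcmp(buf, expected) == 0;
}
```

`sum_tenths(10)` is the double just below 1.0: it prints as "1" under plain %g (6 digits) but as "0.99999999999999989" with 17 digits.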

> just makes the lie even worse

If merely neglecting to reveal some aspect of the stark truth is tantamount to lying, then you're also lying when you pull a number with 54 significant decimal figures out of a 64-bit double. Revealing some truth without an explanation can also be construed as "lying", as in "[N]one of woman born shall harm Macbeth".

I personally find it convenient that when I bang the token 0.3 into my REPL, it comes back with a tidy 0.3. I know that there are several values of double which will print as 0.3, and don't compare with exact equality except in special circumstances (like when deliberately using integral values not far from zero).

acidflask · on Nov 15, 2015

> Your claim is fallacious, because a floating-point value denotes a range of the real number line, whereas you're insisting that, no, it stands for its literal value: the absolutely precise rational number that lies at the center of that range

See Misconception #2 of http://lipforge.ens-lyon.fr/www/crlibm/documents/cern.pdf.

Floating point numbers are _not_ intervals. If you read carefully any formal definition of floating point numbers (The IEEE standards, TAoCP Chapter 4, or Higham's _Accuracy and Stability of Numerical Algorithms_, to name just three possible references), you will see that floating point numbers by definition form an exact rational subset of the extended real line.

> All numbers in that range alias to the same double, yet differ wildly in their decimal digits beyond the fifteenth. That "long tail" of digits is a meaningless residue arising from the arbitrary difference between the chosen center-of-range point and the actual number it approximates.

See Misconceptions #1 and #2 on Kahan's list (https://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf).

You are not being clear about the set of real numbers a user may input and their ultimate representation as a floating point number. While it's true that many real numbers round to the same floating point number _f_, that's not the same as then saying that _f_ carries that interval around with it. The latter is false, since the intervals do _not_ propagate in floating point arithmetic. It's also false that the floating point number _f_ is the midpoint of the set of numbers that round to _f_; the precise set depends on the rounding mode and the granularity of the set of floating point numbers around _f_.
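A C sketch of the last point, assuming IEEE 754 binary64 and the default round-to-nearest-even mode. The gap from 1.0 to the next double above is twice the gap to the next double below, so the reals that round to 1.0 form the lopsided range [1 - 2^-54, 1 + 2^-53], not a range centred on 1.0. The bit-stepping helpers are illustrative; C99's `nextafter` in `<math.h>` does the same job.

```c
#include <stdint.h>
#include <string.h>

/* Neighbouring doubles of a positive, finite, nonzero x, obtained by stepping
   the IEEE 754 bit pattern (positive doubles are ordered like their bits). */
static double next_up_pos(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits += 1;
    memcpy(&x, &bits, sizeof x);
    return x;
}

static double next_down_pos(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits -= 1;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Spacing between x and its neighbour above / below. */
static double gap_above(double x) { return next_up_pos(x) - x; }
static double gap_below(double x) { return x - next_down_pos(x); }
```

For 1.0 the gaps are 2^-52 above and 2^-53 below; both halfway points (1 + 2^-53 and 1 - 2^-54) round back to 1.0 because ties go to the even significand.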

kazinator · on Nov 15, 2015

I didn't say they were intervals (I'm well aware of interval representations for numbers, which single point rationals are not).

> that floating point numbers by definition form an exact rational subset of the extended real line.

Sure, that's what they form; it's not what they (usually) denote.

> It's also false that the floating point number _f_ is the midpoint of the set of numbers that round to _f_;

It is close enough to being the case for my purpose in the grandparent article. Whether the cluster of numbers is lopsided one way or the other doesn't really detract from my point.

StefanKarpinski · on Nov 15, 2015

Are they specific values or are they intervals? You can't have it both ways.

StefanKarpinski · on Nov 14, 2015

Your view has been repeatedly refuted by Kahan and others. See page 25 of [1], page 1 of [2], page 22 of [3] for just a few examples. Conceptually, the reason the interval view is wrong is that if floating point values represented intervals, then their arithmetic would be incorrect. For example, the sum of an interval of width w1 and an interval of width w2 should be an interval of width w1 + w2. That, of course, is not how floats work at all. The correct view is that floats represent exact rational values and operations on them, instead of being the true mathematical ones – under which floats are not closed – compute the closest float to the true result of each operation.

[1] https://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf

[2] https://math.mit.edu/~stevenj/18.335/fp-myths.pdf

[3] http://lyoncalcul.univ-lyon1.fr/IMG/pdf/FloatingPoint.pdf
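The exact-value view can be checked directly with Knuth's TwoSum, a standard error-free transformation (not something proposed in the thread). It recovers the exact rounding error of one addition as another double, which only makes sense if the inputs are exact rationals. This sketch assumes binary64 arithmetic with round-to-nearest and no x87 extended precision.

```c
/* Knuth's TwoSum: returns s = fl(a + b) and stores in *err the exact
   rounding error, so that a + b == s + *err holds exactly over the reals
   (barring overflow). */
static double two_sum(double a, double b, double *err) {
    double s = a + b;
    double bv = s - a;          /* portion of b that ended up in s */
    double av = s - bv;         /* portion of a that ended up in s */
    *err = (a - av) + (b - bv); /* what rounding discarded */
    return s;
}
```

For 0.1 + 0.2 the error comes out as exactly -2^-55, and 0.3 is exactly 2^-54 below the computed sum: the exact sum of the two doubles sits precisely halfway between 0.3 and the double above it, and round-half-to-even picks the upper one.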

crpatino · on Nov 13, 2015

Or, for that matter, 1.0 + 2.0 != 3.0

Stratoscope · on Nov 13, 2015

1.0, 2.0, and 3.0 are all represented perfectly and precisely in IEEE 754 floating point, as are all integers up through 2^24 (for 32-bit floats) or 2^53 (for 64-bit doubles).

So 1.0 + 2.0 == 3.0.
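A quick C check of those limits, assuming IEEE 754 binary32 `float` and binary64 `double`: an integer is exactly representable if it survives a round trip through the type. The helper names are hypothetical.

```c
/* 1 if v is exactly representable as a float / double. If it is not, the
   conversion picks a neighbouring value, so the round trip changes v. */
static int exact_in_float(long long v) {
    return (long long)(float)v == v;
}

static int exact_in_double(long long v) {
    return (long long)(double)v == v;
}
```

2^24 = 16777216 fits in a float but 2^24 + 1 does not; likewise 2^53 = 9007199254740992 fits in a double but 2^53 + 1 does not.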

crpatino · on Nov 13, 2015

How is that? I assume the mantissa is chosen to represent the full number, but how is the exponent set so that the granularity is of exactly 1.0?

And of course 1x10^7 + 2x10^7 != 3x10^7, right?

Stratoscope · on Nov 13, 2015

2^0 is 1, so if the exponent is 0 then the mantissa, read as an integer, times one is your value, so every integer that fits in the mantissa has an exact representation.

If the exponent is 1, then the mantissa is multiplied by 2 to get the actual value. Put another way, the mantissa is the value divided by 2 and rounded, i.e. shifted right one bit.

> 1x10^7 + 2x10^7 != 3x10^7, right?

This particular case is interesting.

If we're talking about 64-bit doubles, then you have 53 bits of precision for integers. All those values are well within that range, so this arithmetic and comparison are done with precise integer values, and the expression will compare equal.

If we're talking about 32-bit floats, the range of precise integer values is -16,777,216 through 16,777,216, or -2^24 through 2^24.

1x10^7 is 10,000,000, within that range, but the other two values are outside the range.

So you might expect that != would be the answer here.

But if you test it, that isn't the case: the expression compares equal!

The reason: although 20,000,000 and 30,000,000 are outside the range where every integer has a precise representation, they are within the range where every even integer is precise: -33,554,432 to 33,554,432. Values within this range but outside the range of precise integers are rounded to a multiple of 2.

Similarly in the range -67,108,864 to 67,108,864, all integers which are a multiple of 4 are represented precisely.

Basically, as you go outside the range of precise integers, values get rounded to a multiple of 2, 4, 8, etc. as required.

When the mantissa overflows the available precision, it is shifted to the right enough so that it fits without losing the most significant bits. Instead, the least significant bits are rounded away (to nearest, ties to even), and the exponent is incremented by the number of bits removed.

Of course you could choose other values where the rounding doesn't work out in your favor, and then you'd get the unequal comparison you expect. A simple example:

33333333.0f + 1.0f != 33333334.0f

Here, the value 33,333,333 is rounded down to 33,333,332. Add 1 to that and you get 33,333,333, which is again rounded down to 33,333,332.

33,333,334 is represented precisely, so the two sides are unequal and the != holds.

https://en.wikipedia.org/wiki/Single-precision_floating-poin...
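The single-precision cases above can be reproduced with a short C sketch. It assumes IEEE 754 binary32 floats evaluated in single precision (FLT_EVAL_METHOD == 0, as with SSE on x86-64); x87 extended-precision evaluation could change results. The helper names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Adds two floats, stores the result in a float, and compares it with c. */
static int float_sum_equals(float a, float b, float c) {
    float s = a + b;
    return s == c;
}

/* Distance from a positive, finite float x to the next float above it,
   found by incrementing its bit pattern. */
static float float_gap(float x) {
    uint32_t bits;
    float up;
    memcpy(&bits, &x, sizeof bits);
    bits += 1;
    memcpy(&up, &bits, sizeof up);
    return up - x;
}
```

The gap is 2 throughout [2^24, 2^25) and 4 throughout [2^25, 2^26), which is why 1e7 + 2e7 == 3e7 still holds while 33333333.0f silently becomes 33333332.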

dnautics · on Nov 13, 2015

Because integers are an integer multiple of a clean power of 2 (2^0 == 1)

crpatino · on Nov 13, 2015

Stratoscope has already done a superb job explaining. But just to show why my question was not that dumb...

Let's say you want to represent "two-million" in 32bit float:

Your answer is: 0x1e8480 * 2^0.

However, I was expecting something like: 0x1.e848 * 2^X. But then "125-thousand" is 0x1.e848 * 2^Y, so how do you standardize what is the correct exponent for integers, X or Y? And how do you know there is not a pair (M,Z) != (0x1.e848, X) such that M * 2^Z is also two-million?

dnautics · on Nov 14, 2015

The fraction of a floating-point number always starts with an invisible 1 (1.xxxx), so that forces a unique exponent for any given number. Is it not clear why this is the case?

let's take a simple case where you have one bit of fraction.

   exponent 0, 2^0 = 1, fraction (1)0 = 1.0b = 1

   exponent 0, 2^0 = 1, fraction (1)1 = 1.1b = 1.5

   exponent 1, 2^1 = 2, fraction (1)0 = 10b = 2

   exponent 1, 2^1 = 2, fraction (1)1 = 11b = 3

if we have more fraction bits it looks like this:

   exponent 0, 2^0 = 1, fraction (1)00 = 1.00b = 1

   exponent 0, 2^0 = 1, fraction (1)01 = 1.01b = 1.25

   exponent 0, 2^0 = 1, fraction (1)10 = 1.10b = 1.5

   exponent 0, 2^0 = 1, fraction (1)11 = 1.11b = 1.75

   exponent 1, 2^1 = 2, fraction (1)00 = 10.0b = 2

   exponent 1, 2^1 = 2, fraction (1)01 = 10.1b = 2.5

   exponent 1, 2^1 = 2, fraction (1)10 = 11.0b = 3

   exponent 1, 2^1 = 2, fraction (1)11 = 11.1b = 3.5

So for any given exponent n, multiplying 2^n by a fraction with an invisible leading 1 gives values between 1.0*2^n (inclusive) and 2.0*2^n (exclusive), so there are no collisions between any two pairs (e, f) in your representation.
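To make the two-million example concrete, here is a C sketch, assuming IEEE 754 binary32, that extracts the unbiased exponent and the 23 stored fraction bits from a normal positive float. Two million (0x1.e848 * 2^20) and 125,000 (0x1.e848 * 2^16) share the same fraction bits and differ only in exponent, so X = 20 and Y = 16. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Splits a normal, positive binary32 float into its unbiased exponent and
   its 23 stored fraction bits; the leading 1 is implicit and not stored. */
static void decompose_float(float x, int *exponent, uint32_t *fraction) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    *exponent = (int)((bits >> 23) & 0xFFu) - 127; /* remove the bias */
    *fraction = bits & 0x7FFFFFu;
}
```

Because the implicit 1 pins the significand to [1, 2), each normal value has exactly one (exponent, fraction) pair.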

on Nov 13, 2015

[deleted]

fryguy · on Nov 13, 2015

You quoted out of context. The full quote was:

> the double represented by the literal `0.1` is precisely

Which is completely true. 2/3 represented in a format that has 5 decimal digits is precisely 0.66667 as well.

ArchReaper · on Nov 13, 2015

I don't think you understand what he is saying here. The value is stored in memory as a precise number.

mortehu · on Nov 13, 2015

> It is misleading to print more decimal digits out of a double than DBL_DIG

If you want round-trip conversion for all `double` numbers, i.e. from `double` to text and back, you need 17 digits of precision (%.17g).

Likewise, if you want round-trip conversion for all `float` numbers, you need 9 digits (%.9g).

The reason for this is that even though these types can really only reliably give 15 and 6 digits of precision, some of the numbers they can represent are closer to each other than that.

In C++, you can get the former limit using std::numeric_limits<double>::digits10, and the latter using std::numeric_limits<double>::max_digits10.

http://en.cppreference.com/w/cpp/types/numeric_limits/max_di...
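A C sketch of the double-to-text-to-double round trip, assuming IEEE 754 types (where C11's DBL_DECIMAL_DIG is 17 and FLT_DECIMAL_DIG is 9). The helper names are illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

/* Prints x with `digits` significant digits, parses the text back with
   strtod, and reports whether the identical double came back. */
static int round_trips(double x, int digits) {
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, x);
    return strtod(buf, NULL) == x;
}

/* Same check for float, using strtof (C99). */
static int float_round_trips(float x, int digits) {
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, (double)x);
    return strtof(buf, NULL) == x;
}
```

`0.1 + 0.2` survives with 17 digits but not with 15 (it prints as "0.3", which parses to a different double); `1.0f / 3.0f` survives with 9 digits but not with 6.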

kazinator · on Nov 13, 2015

> In C++, you can get the former limit using std::numeric_limits<double>::digits10, and the latter using std::numeric_limits<double>::max_digits10.

Also, via GCC specific constants:

  $ echo | gcc -E -dM - | grep -E '(FLT|DBL).*DIG'
  #define __DBL_DIG__ 15
  #define __FLT_MANT_DIG__ 24
  #define __LDBL_MANT_DIG__ 64
  #define __FLT_DIG__ 6
  #define __DBL_MANT_DIG__ 53
  #define __DBL_DECIMAL_DIG__ 17
  #define __LDBL_DIG__ 18
  #define __FLT_DECIMAL_DIG__ 9

__DBL_DIG__ versus __DBL_DECIMAL_DIG__. Useful to know.

__DBL_DECIMAL_DIG__ corresponds to DBL_DECIMAL_DIG which was added to C in C11.

In C90 or C99 we can do some hack like:

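  #include <float.h>

  /* PRINT_DIG is an illustrative name. */
  #if defined(DBL_DECIMAL_DIG)          /* C11 */
  #define PRINT_DIG DBL_DECIMAL_DIG
  #elif defined(__DBL_DECIMAL_DIG__)    /* GCC built-in */
  #define PRINT_DIG __DBL_DECIMAL_DIG__
  #else                                 /* fall back on DBL_DIG + 2 */
  #define PRINT_DIG (DBL_DIG + 2)
  #endif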

kazinator · on Nov 13, 2015

> If you want round-trip conversion for all `double` numbers, i.e. from `double` to text and back

I don't. Rather, I want round-trip conversion of all 15-digit-precision decimal numbers to the machine and back. If the machine calculates some numbers which are different from some of these but alias to them textually, I don't care.

Someone else might want the decimal text representation to provide bit-exact storage semantics for arbitrary doubles.

For that reason it might be a better design choice to obtain the default printing precision from a special variable, which itself defaults to 15, rather than a hard-coded default.
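The opposite direction (decimal text to double and back) is what DBL_DIG guarantees: any decimal with at most 15 significant digits survives. A C sketch, assuming IEEE 754 doubles and input already in %g's canonical spelling (no trailing zeros); the helper name is hypothetical.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parses s to a double, prints it back with DBL_DIG significant digits,
   and reports whether the original text reappears. */
static int decimal_round_trips(const char *s) {
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", DBL_DIG, strtod(s, NULL));
    return strcmp(buf, s) == 0;
}
```

"0.3", "0.123456789012345", and "9.87654321098765e+100" all come back unchanged; "0.30000000000000004" has 17 significant digits and comes back as "0.3".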

thedufer · on Nov 13, 2015

The downside to printing only 15 is that you then have numbers that print the same but compare as not equal.

mortehu · on Nov 13, 2015

If you want that, that comes at the cost of printing 0.99999999999999989 as "1".

kazinator · on Nov 13, 2015

Well, yes; 0.99999999999999989 isn't a 15 digit number, so it lies outside of the requirement.

jordigh · on Nov 13, 2015

> that only supports 15.

it supports somewhere between 15 and 16, actually. There are 52 bits of precision in a double's mantissa[1]. This gives you log_10(2^52) = 52*log(2)/log(10) ~ 15.654 digits of precision, that is, the 16th decimal place is accurate about 65% of the time.

You can also exactly represent in decimal the binary value stored, because with enough decimal places (52 for values between 1 and 2, more for smaller magnitudes, e.g. 55 for 0.1) you have enough 2's in the denominator for all of the bits that you can represent (each 10 in the denominator pairs up with each 2 from each bit).

--

[1] Plus an implicit bit, but not in the fractional part of the mantissa.

kazinator · on Nov 13, 2015

Better all-round solution:

  $ ./txr
  This is the TXR Lisp interactive listener of TXR 123.
  Use the :quit command or type Ctrl-D on empty line to exit.
  1> (tostring (+ 0.1 0.2))
  "0.3"
  2> (let ((*flo-print-precision* flo-max-dig)) (tostring (+ 0.1 0.2)))
  "0.30000000000000004"
  3> *flo-print-precision*
  15
  4> flo-dig
  15
  5> flo-max-dig
  17

- 15 by default is reasonable for everyday programming; we don't need results like 0.30000....4 popping up in our faces.

- provide the constant which indicates how many decimal digits are needed to preserve the binary value exactly: this is needed for faithful storage and communication --- we wouldn't want a Sexp-based RPC call to behave differently from a local computation.

- provide the printing precision default as a special variable which can be overridden over a dynamic scope.
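C has no dynamically scoped variables, so the third point can only be approximated there. A minimal sketch, assuming a hypothetical global that plays the role of TXR's `*flo-print-precision*` and is overridden by saving and restoring its value:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for TXR's *flo-print-precision*. */
static int flo_print_precision = 15;

/* Installs a new default precision and returns the previous one,
   so the caller can restore it afterwards. */
static int set_flo_print_precision(int prec) {
    int old = flo_print_precision;
    flo_print_precision = prec;
    return old;
}

/* 1 if x prints as expected under the current default precision. */
static int flo_prints_as(double x, const char *expected) {
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", flo_print_precision, x);
    return strcmp(buf, expected) == 0;
}
```

Unlike a real dynamic binding such as TXR's `let` on a special variable, this manual save/restore is not undone automatically on an early exit, which is one argument for having the language provide it.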

jordigh · on Nov 13, 2015

   15 by default is reasonable for everyday programming; 
   we don't need results like 0.30000....4 popping up in
   our faces.

That just changes the problem. You can't get rid of rounding error by representing decimal numbers in binary. 0.3 - 0.2 - 0.1 still won't show up as zero even if you only display 15 decimal digits.
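A C check of that example, assuming IEEE 754 doubles: every literal is already rounded, so 0.3 - 0.2 - 0.1 leaves a residue of exactly -2^-55, and 15-digit printing still shows it. The helper names are illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Evaluates (0.3 - 0.2) - 0.1 in double precision. */
static double residue(void) {
    return 0.3 - 0.2 - 0.1;
}

/* 1 if x printed with 15 significant digits (%.15g) equals expected. */
static int prints_as_15(double x, const char *expected) {
    char buf[64];
    snprintf(buf, sizeof buf, "%.15g", x);
    return strcmp(buf, expected) == 0;
}
```

The result prints as "-2.77555756156289e-17" rather than "0".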

kazinator · on Nov 13, 2015

I'm not about to change the floating-point type to a non-binary base.

I think that 15 digits is the "right" value for ordinary display purposes. If you present a literal value of up to 15 digits to the machine, it gets converted to some approximation which, when printed back with 15 digits of precision, gives you the same digits.

By using 15 we avoid this:

  1> (let ((*flo-print-precision* flo-max-dig)) (tostring 0.3))
  "0.29999999999999999"
  2> (tostring 0.3)
  "0.3"

I have no "rounding error" because I haven't performed a calculation. I understand that when I enter 0.3 into the machine, the floating-point value isn't exactly 0.3. I have an error, because the representation can only provide a close approximation of 0.3.

Yet I know that this approximation is so close to 0.3 that I would like it printed that way. Now of course it will be printed that way if I round to, say, four places. But that will throw away precision for other numbers. If I round to DBL_DIG digits (15), I get the maximum precision, without having my input numbers altered into something else.

It is useful and elegant for the default printing precision to be as high as possible, yet such that numbers that I enter into a REPL are echoed back at me pretty much as I entered them, modulo choice of notation.

And no, of course DBL_DIG doesn't guarantee that I can do any sequence of calculations, like 0.3 - 0.2 - 0.1, and still get back a result which is exact to DBL_DIG digits! There is no bound on how inaccurate a floating-point calculation can become if it is carried through enough iterations!

ant6n · on Nov 13, 2015

Nitpick: the 64-bit double type has a mantissa of 52 bits, but there's a 1 implied in front, so the precision is 53 bits.

kazinator · on Nov 13, 2015

Implied fixed material doesn't contribute precision. Otherwise we could pretend that there are twenty-four implied 1's and the precision is 76 bits.

Because the implied 1 cannot be zero, it isn't a bit; it doesn't carry information.

For instance, if we define a four-bit binary number type which has an implied 1, it can only represent values from 16 to 31. That's four bits of precision; the range is just displaced.

ant6n · on Nov 13, 2015

The information of the first bit is implied via the exponent.

Twenty four implied 1's would give you a precision of 76 bits, but you would restrict yourself to a very small subset of all possible numbers with that precision.

EDIT: Let's put it another way. Assume you have 3 decimal digits of precision for a float. The first digit cannot be zero. Would you then claim that you don't have 3 digits of precision, but rather log(10x10x9,10) = 2.95.., because the first digit only carries 0.95.. digits of information?

kazinator · on Nov 13, 2015

Strictly speaking, the 2.95 is right; but there is justification in rounding it up to 3. There is much less justification for rounding up 2.0 to 3.

dragonwriter · on Nov 13, 2015

> Those languages which print 0.30000000000000004 are doing something which is silly as a default

Yes, but the silly thing isn't what you say.

The silly thing is having binary floating points as the default representation for literals expressed as precise decimals.

kazinator · on Nov 13, 2015

> binary floating points as the default representation for literals expressed as precise decimals.

For better or worse, it is a widespread practice appearing in numerous programming languages, including some widely used popular ones.

To conform with the practice, if you want, in your language, a way to use decimals to write exact fractional numbers, it's better to have an explicit notation for that like, say, 10.5r (rational number, another spelling for 21/2). Or hey: 10R5. Electrical Engineers will love you to death. :)

dragonwriter · on Nov 14, 2015

> For better or worse, it is a widespread practice appearing in numerous programming languages, including some widely used popular ones.

Well, yeah, I've been programming since the 1980s. I know that.

> To conform with the practice, if you want, in your language, a way to use decimals to write exact fractional numbers, it's better to have an explicit notation for that

Sure, given that your goal is to conform with that practice. I'm just saying it's a bad practice that reflects a generally premature optimization that often harms correctness, so we shouldn't conform to it in new languages.

raiph · on Nov 14, 2015

> [implicitly approximating numbers like `0.1`] is a widespread practice

Sure, but the key point is that it leads to approximation that may cause problems.

It makes much more sense to interpret `0.1` as meaning exactly that number. Same with, say, `29/7`. If someone wants approximation, then have an explicit notation for that, like, say, `1.23e45`.

dnautics · on Nov 13, 2015

Defaulting to a Binary Encoded Decimal is also silly because it's not as performant and loses compactness. Fundamentally. (Having a composite base comes with problems.) What is your suggestion, having a default floating point have a binary or hexadecimal representation for literals? Let's say I want to input a planet's mass as 6e24 kg: quick, what is that in your literals?

kazinator · on Nov 13, 2015

It doesn't have to lose performance; it's just the default representation. For instance, if you have

   a + 0.1

where a is of binary floating point type, the 0.1 constant could denote an exact number. Because of the addition with a float, it gets coerced to the float's type. But that can be done at compile time. So effectively 0.1 is like a binary floating point constant in that situation.

But in another situation it remains exact, such as:

  0.1 + 0.2

this gets folded by the compiler at compile time, and uses exact math producing an exact 0.3.

Exactness is just the default.

If your language has rationals expressed as digs/digs, you can imagine that those are used in their place:

  a + 1/10;

  1/10 + 2/10;

In other words, suppose that 0.1 is just another "spelling" for 1/10.

This will be no less performant than using integer constants in floating contexts, e.g:

   double x;

   // ...

   x += 1;  // not slower than x += 1.0!
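The "exact by default" idea can be sketched in C, which has no such literals, using a hypothetical rational type as the exact meaning of a decimal literal (0.1 read as 1/10). Sums stay exact, and conversion to binary floating point happens once, at the point of use. This illustrates the idea only; it is not how any particular compiler does it, and overflow is not handled.

```c
/* Hypothetical exact representation of a decimal literal: 0.1 is 1/10. */
typedef struct {
    long long num;
    long long den; /* assumed > 0 */
} rational;

/* Greatest common divisor of |a| and |b| (Euclid's algorithm). */
static long long gcd_ll(long long a, long long b) {
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) {
        long long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Builds num/den in lowest terms; den must be positive, so g >= 1. */
static rational make_rational(long long num, long long den) {
    long long g = gcd_ll(num, den);
    rational r = { num / g, den / g };
    return r;
}

/* Exact addition (no overflow checking in this sketch). */
static rational rational_add(rational a, rational b) {
    return make_rational(a.num * b.den + b.num * a.den, a.den * b.den);
}

/* Coerce to double only when needed; for small num and den this is a
   single correctly rounded division. */
static double rational_to_double(rational r) {
    return (double)r.num / (double)r.den;
}
```

Here 1/10 + 2/10 reduces to exactly 3/10, and converting that once yields the same double as the literal 0.3, whereas the binary sum 0.1 + 0.2 does not.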

dnautics · on Nov 14, 2015

Here's a question: How do you code an arbitrary-precision-result (i.e. not an interpolated Padé approximant) BCD log10 function?

The algorithm in binary is simple, because of special properties of two. It is harder for a prime base p, and for a composite base it is very tough.

dragonwriter · on Nov 13, 2015

> Defaulting to a Binary Encoded Decimal is also silly because it's not as performant and loses compactness.

Defaulting to correctness makes a lot more sense than often premature time and space optimizations by default.

> What is your suggestion, having a default floating point have a Binary or hexadecimal representation for literals?

There are lots of good solutions, but Scheme's is the best: a numeric tower with literals using the minimal-scope exact representation by default (e.g., integer if there is no fractional part, decimal if it's a decimal literal, rational if it is a rational literal not expressed as a decimal), unless they have a modifier specifying intent to use an inexact representation.

dnautics · on Nov 14, 2015

The point is, there are still unrepresentables (e.g. 1/3) in binary coded decimal. If it's all about literals, just make the user put the correct literal in, and make them understand that floats are imperfect anyway. If you're doing anything remotely useful with Binary Coded Decimal, probably about two or three operations in, you're going to lose resolution on your exactness. It doesn't buy you much. GIGO.

kamaal · on Nov 13, 2015

Your comment made me look up 'txr'. Looks like a wonderful tool.

Do you use that often for data munging tasks? How big is this tool in that area of work?

kazinator · on Nov 13, 2015

Since I'm the author, I absolutely use it as a matter of "eat your own dogfood" and because it is actually convenient.

TXR is not "big" in this area because it is very difficult to get people to notice new things like this.

It is being rapidly developed. OpenHub stats: https://www.openhub.net/p/txr

In the last half a year or so, I added major features such as: structs with good OOP support, an interactive REPL (based on "linenoise" with a lot of my hacks applied), and delimited continuations.