Those languages which print 0.30000000000000004 are doing something which is silly as a default: they are printing 17 decimal digits, out of a type that only supports 15.
$ txr -p '(+ .1 .2)'
0.3
Why is that? Here, the underlying type is the C double type. The printed representation is obtained according to a default precision. That default is taken from the C constant DBL_DIG. For an IEEE 754 double, that constant is 15.
It is misleading to print more decimal digits out of a double than DBL_DIG; that is the constant which tells you how many decimal digits of precision double can reliably store. Thus I chose that constant as the default printing precision for floats: to give the programmer/user the maximum realistic precision. That is to say, print the decimal digits which are plausibly there, and not any fictional ones.
17 decimal digits requires a 57 bit mantissa -- ceil((log 10 / log 2) * 17). The 64 bit double type has only 52. So if you print 17, you're "making shit up".
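For anyone who wants to reproduce the numbers in this comment, here is a minimal Python sketch. It assumes Python floats are C doubles (true on mainstream platforms); the variable names are mine and have nothing to do with TXR. It only replays the arithmetic as stated and doesn't settle whether the extra digits are "fictional".

```python
import math
import sys

x = 0.1 + 0.2

# sys.float_info.dig is Python's view of C's DBL_DIG.
digits_15 = format(x, ".%dg" % sys.float_info.dig)
digits_17 = format(x, ".17g")

# Bits needed to tell apart every 17-digit decimal significand,
# i.e. the ceil((log 10 / log 2) * 17) from the comment.
bits_for_17_digits = math.ceil(math.log2(10) * 17)

print(sys.float_info.dig)   # 15
print(digits_15)            # 0.3
print(digits_17)            # 0.30000000000000004
print(bits_for_17_digits)   # 57
```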
This is a common but fundamentally wrong view of floating-point numbers. You are not "making shit up" by printing more than 15 digits of a floating-point number. Floats, doubles, etc. have precise values. As I said in another comment, the double represented by the literal `0.1` is precisely 0.1000000000000000055511151231257827021181583404541015625; similarly, `0.2` is precisely 0.200000000000000011102230246251565404236316680908203125 and `0.3` is precisely 0.299999999999999988897769753748434595763683319091796875. From these exact values, you can see why `0.1 + 0.2 != 0.3` – the left hand side is greater than 0.3 while the right hand side is smaller than 0.3. Printing only 15 digits is not doing programmers any favors: even though `0.1 + 0.2` will print as "0.3", that just makes the lie even worse – they will be even more confused when `0.1 + 0.2 != 0.3` even though both values print identically.
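The exact values quoted above are easy to check. A minimal sketch in Python, relying on the fact that `Decimal(float)` converts the stored binary value without rounding (variable names are just for illustration):

```python
from decimal import Decimal

# Decimal(float) converts the binary value exactly, with no rounding.
exact_01 = Decimal(0.1)
exact_02 = Decimal(0.2)
exact_03 = Decimal(0.3)
exact_sum = Decimal(0.1 + 0.2)   # the double that the addition produces

print(exact_01)   # 0.1000000000000000055511151231257827021181583404541015625
print(exact_02)   # 0.200000000000000011102230246251565404236316680908203125
print(exact_03)   # 0.299999999999999988897769753748434595763683319091796875

# The computed sum lies above the real number 0.3;
# the double for the literal 0.3 lies below it.
print(exact_03 < Decimal("0.3") < exact_sum)   # True
```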
Your claim is fallacious, because a floating-point value denotes a range of the real number line, whereas you're insisting that, no, it stands for its literal value: the absolutely precise rational number that lies at the center of that range.
That is no more true than my current body temperature being precisely 36 4/10 degrees Celsius because the thermometer reads 36.4.
All numbers in that range alias to the same double, yet differ wildly in their decimal digits beyond the fifteenth. That "long tail" of digits is a meaningless residue arising from the arbitrary difference between the chosen center-of-range point and the actual number it approximates.
> Printing only 15 digits is not doing programmers any favors
Note that the ISO C function printf uses 6 digits of precision under the %g conversion specifier by default. So a team of experts can find it justifiable to severely truncate precision on printing. I find that justifiable also, like this: printing is not only for constants, and trivial expressions like 0.1 + 0.2, but for the results of complex calculations, which accumulate significant rounding errors. Six digits of precision is prudent: for many complex calculations, it will avoid misleading the user with too much precision. Fifteen decimal digits will be wrong after just a few operations; there is hardly any "headroom" to absorb error.
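The six-digit default is easy to see from Python, whose `%` string formatting follows the C printf rules for `%g`. A small sketch (the variable names are illustrative only):

```python
import math

# With no explicit precision, %g keeps six significant digits,
# the same default as C's printf.
default_g = "%g" % math.pi
sum_g = "%g" % (0.1 + 0.2)

# Asking for 17 digits exposes the full stored value instead.
full_g = "%.17g" % (0.1 + 0.2)

print(default_g)   # 3.14159
print(sum_g)       # 0.3
print(full_g)      # 0.30000000000000004
```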
> just makes the lie even worse
If merely neglecting to reveal some aspect of the stark truth is tantamount to lying, then you're also lying when you pull a number with 54 significant decimal figures out of a 64 bit double. Revealing some truth without an explanation can also be construed as "lying", as in "[N]one of woman born shall harm Macbeth".
I personally find it convenient that when I bang the token 0.3 into my REPL, it comes back with a tidy 0.3. I know that there are several values of double which will print as 0.3, and don't compare with exact equality except in special circumstances (like when deliberately using integral values not far from zero).
> Your claim is fallacious, because a floating-point value denotes a range of the real number line, whereas you're insisting that, no, it stands for its literal value: the absolutely precise rational number that lies at the center of that range
Floating point numbers are _not_ intervals. If you read carefully any formal definition of floating point numbers (The IEEE standards, TAoCP Chapter 4, or Higham's _Accuracy and Stability of Numerical Algorithms_, to name just three possible references), you will see that floating point numbers by definition form an exact rational subset of the extended real line.
> All numbers in that range alias to the same double, yet differ wildly in their decimal digits beyond the fifteenth. That "long tail" of digits is a meaningless residue arising from the arbitrary difference between the chosen center-of-range point and the actual number it approximates.
You are not being clear about the set of real numbers a user may input and their ultimate representation as a floating point number. While it's true that many real numbers round to the same floating point number _f_, that's not the same as then saying that _f_ carries that interval around with it. The latter is false, since the intervals do _not_ propagate in floating point arithmetic. It's also false that the floating point number _f_ is the midpoint of the set of numbers that round to _f_; the precise set depends on the rounding mode and the granularity of the set of floating point numbers around _f_.
I didn't say they were intervals (I'm well aware of interval representations for numbers, which single point rationals are not).
> that floating point numbers by definition form an exact rational subset of the extended real line.
Sure, that's what they form; it's not what they (usually) denote.
> It's also false that the floating point number _f_ is the midpoint of the set of numbers that round to _f_;
It is close enough to being the case for my purpose in the grandparent article. Whether the cluster of numbers is lopsided one way or the other doesn't really detract from my point.
Your view has been repeatedly refuted by Kahan and others. See page 25 of [1], page 1 of [2], page 22 of [3] for just a few examples. Conceptually, the reason the interval view is wrong is that if floating point values represented intervals, then their arithmetic would be incorrect. For example, the sum of an interval of width w1 and an interval of width w2 should be an interval of width w1 + w2. That, of course, is not how floats work at all. The correct view is that floats represent exact rational values and operations on them, instead of being the true mathematical ones – under which floats are not closed – compute the closest float to the true result of each operation.
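The "closest float to the true result" description can be checked for one case. A minimal Python sketch, assuming IEEE 754 doubles and the default round-to-nearest mode; `neighbors` and `error` are hypothetical helpers written for this illustration:

```python
import struct
from fractions import Fraction

def neighbors(x):
    """Return the doubles just below and just above a positive finite double x."""
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    below = struct.unpack("<d", struct.pack("<q", bits - 1))[0]
    above = struct.unpack("<d", struct.pack("<q", bits + 1))[0]
    return below, above

# Fraction(float) is exact, so this is the true rational sum of the two doubles.
true_sum = Fraction(0.1) + Fraction(0.2)

computed = 0.1 + 0.2
below, above = neighbors(computed)

def error(f):
    """Exact distance between the double f and the true sum."""
    return abs(Fraction(f) - true_sum)

# The true sum is not itself a double, and the computed result is
# at least as close to it as either neighboring double.
print(Fraction(computed) == true_sum)    # False
print(error(computed) <= error(below))   # True
print(error(computed) <= error(above))   # True
```

Nothing in this computation carries an interval along: the inputs are two exact rationals and the output is a third exact rational, chosen by rounding.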
1.0, 2.0, and 3.0 are all represented perfectly and precisely in IEEE 754 floating point, as are all integers up through 2^24 (for 32-bit floats) or 2^53 (for 64-bit doubles).
2^0 is 1, so if the exponent is 0 then the mantissa times one is your value, so every integer that fits in the mantissa has an exact representation.
If the exponent is 1, then the mantissa is multiplied by 2 to get the actual value. Put another way, the mantissa is the value divided by 2 and rounded, i.e. shifted right one bit.
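A quick way to see the 2^53 boundary for doubles, as a Python sketch (Python's `float` is a 64-bit double on mainstream platforms; the names are illustrative):

```python
limit = 2 ** 53

# Integers up to 2**53 convert to a double and back unchanged.
below_ok = int(float(limit - 1)) == limit - 1
at_ok = int(float(limit)) == limit

# 2**53 + 1 needs 54 significant bits, so it is rounded to a neighbor.
rounded = int(float(limit + 1))

# Just past the limit, even integers are still exact.
even_ok = int(float(limit + 2)) == limit + 2

print(below_ok, at_ok)   # True True
print(rounded)           # 9007199254740992
print(even_ok)           # True
```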
> 1x10^7 + 2x10^7 != 3x10^7, right?
This particular case is interesting.
If we're talking about 64-bit doubles, then you have 53 bits of precision for integers. All those values are well within that range, so this arithmetic and comparison are done with precise integer values, and the expression will compare equal.
If we're talking about 32-bit floats, the range of precise integer values is -16,777,216 through 16,777,216, or -2^24 through 2^24.
1x10^7 is 10,000,000, within that range, but the other two values are outside the range.
So you might expect that != would be the answer here.
But if you test it, that isn't the case: the expression compares equal!
The reason: although 20,000,000 and 30,000,000 are outside the range where every integer has a precise representation, they are within the range where every even integer is precise: -33,554,432 to 33,554,432. Values within this range but outside the range of precise integers are rounded to a multiple of 2.
Similarly in the range -67,108,864 to 67,108,864, all integers which are a multiple of 4 are represented precisely.
Basically, as you go outside the range of precise integers, values get rounded to a multiple of 2, 4, 8, etc. as required.
When the mantissa overflows the available precision, it is shifted to the right enough so that it fits without losing the most significant bits. Instead, the least significant bits are discarded, and the exponent is incremented by the number of discarded bits.
Of course you could choose other values where the rounding doesn't work out in your favor, and then you'd get the unequal comparison you expect. A simple example:
33333333.0f + 1.0f != 33333334.0f
Here, the value 33,333,333 is rounded down to 33,333,332. Add 1 to that and you get 33,333,333, which is again rounded down to 33,333,332.
33,333,334 is represented precisely, so the two sides are not equal.
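These 32-bit cases can be reproduced without a C compiler. A minimal Python sketch that emulates float32 by packing through `struct`; `f32` and `add32` are hypothetical helpers, and the emulation is only claimed to be faithful for the integer values used here, where the intermediate double sum is exact:

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def add32(a, b):
    """Emulate a float32 addition: round the inputs, add, round the sum.
    The double sum is exact for the integers used below, so the single
    final rounding matches what a real float32 addition would produce."""
    return f32(f32(a) + f32(b))

print(add32(1e7, 2e7) == f32(3e7))                 # True
print(f32(33333333.0))                             # 33333332.0
print(add32(33333333.0, 1.0))                      # 33333332.0
print(add32(33333333.0, 1.0) != f32(33333334.0))   # True
```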
Stratoscope has already done a superb job explaining. But just to show why my question was not that dumb...
Let's say you want to represent "two-million" in 32bit float:
Your answer is: 0x1e8480 * 2^0.
However, I was expecting something like: 0x1.e848 * 2^X. But then "125-thousand" is 0x1.e848 * 2^Y, so how do you standardize what is the correct exponent for integers, X or Y? And how do you know there is not a pair (M,Z) != (0x1.e848, X) such that M * 2^Z is also two-million?
The fraction of a floating-point number always starts with an invisible "1." (it has the form 1.xxxx), which forces a unique exponent for any given number. Is it not clear why this is the case?
Let's take a simple case where you have one bit of fraction.
For any given exponent n, multiplying 2^n by a fraction with an invisible leading bit gives products hemmed in between 1.0 x 2^n and 2.0 x 2^n, so there are no collisions between any two pairs (e,f) in your representation.
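The normalized form is visible directly in Python via `float.hex()`. This sketch uses doubles rather than 32-bit floats, but the leading "1." convention is the same; `normalized` is just an illustrative wrapper:

```python
def normalized(x):
    """Hex form of a double. For normal, nonzero values it is always
    1.<fraction> times a power of two, so the exponent is unique."""
    return float(x).hex()

print(normalized(2000000))   # 0x1.e848000000000p+20
print(normalized(125000))    # 0x1.e848000000000p+16
```

Both numbers share the fraction 0x1.e848 and differ only in the exponent (20 versus 16), which answers the X versus Y question for this pair.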
> It is misleading to print more decimal digits out of a double than DBL_DIG
If you want round-trip conversion for all `double` numbers, i.e. from `double` to text and back, you need 17 digits of precision (%.17g).
Likewise, if you want round-trip conversion for all `float` numbers, you need 9 digits (%.9g).
The reason for this is that even though these types can really only reliably give 15 and 6 digits of precision, some of the numbers they can represent are closer to each other than that.
In C++, you can get the reliable-digits limit (15 for double) from std::numeric_limits<double>::digits10, and the round-trip limit (17 for double) from std::numeric_limits<double>::max_digits10.
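A small Python sketch of the double-to-text-to-double round trip; `round_trips` is a hypothetical helper written for this example:

```python
x = 0.1 + 0.2

def round_trips(value, digits):
    """True if printing with `digits` significant digits and parsing
    the text back yields exactly the same double."""
    return float(format(value, ".%dg" % digits)) == value

print(round_trips(x, 15))   # False: "0.3" parses to a different double
print(round_trips(x, 17))   # True
```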
> If you want round-trip conversion for all `double` numbers, i.e. from `double` to text and back
I don't. Rather, I want round-trip conversion of all 15-digit-precision decimal numbers to the machine and back. If the machine calculates some numbers which are different from some of these but alias to them textually, I don't care.
Someone else might want the decimal text representation to provide bit-exact storage semantics for arbitrary doubles.
For that reason it might be a better design choice to obtain the default printing precision from a special variable, which itself defaults to 15, rather than a hard-coded default.
It supports somewhere between 15 and 16, actually. There are 52 bits of precision in a double's mantissa[1]. This gives you log_10(2^52) = 52*log(2)/log(10) ~ 15.654 digits of precision, that is, the 16th decimal place is accurate about 65% of the time.
You can also exactly represent in decimal the binary value stored, because as long as you have 52 decimal places, you have enough 2's in the denominator for all of the bits that you can represent (each 10 in the denominator pairs up with each 2 from each bit).
--
[1] Plus an implicit bit, but not in the fractional part of the mantissa.
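The digit count above is a one-liner to check. This Python sketch also shows the figure for 53 bits, in case you count the implicit bit from the footnote; which of the two is the "right" count is exactly what is being argued in this thread:

```python
import math

# Decimal digits corresponding to 52 stored fraction bits.
digits_52 = 52 * math.log10(2)

# Same calculation if the implicit leading bit is counted as well.
digits_53 = 53 * math.log10(2)

print(round(digits_52, 3))   # 15.654
print(round(digits_53, 3))   # 15.955
```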
$ ./txr
This is the TXR Lisp interactive listener of TXR 123.
Use the :quit command or type Ctrl-D on empty line to exit.
1> (tostring (+ 0.1 0.2))
"0.3"
2> (let ((*flo-print-precision* flo-max-dig)) (tostring (+ 0.1 0.2)))
"0.30000000000000004"
3> *flo-print-precision*
15
4> flo-dig
15
5> flo-max-dig
17
- 15 by default is reasonable for everyday programming; we don't need results like 0.30000....4 popping up in our faces.
- provide the constant which indicates how many decimal digits are needed to preserve the binary value exactly: this is needed for faithful storage and communication --- we wouldn't want a Sexp-based RPC call to behave differently from a local computation.
- provide the printing precision default as a special variable which can be overridden over a dynamic scope.
> 15 by default is reasonable for everyday programming; we don't need results like 0.30000....4 popping up in our faces.
That just changes the problem. You can't get rid of rounding error by representing decimal numbers in binary. 0.3 - 0.2 - 0.1 still won't show up as zero even if you only display 15 decimal digits.
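To make that concrete, a short Python sketch printing the residue at 15 significant digits (`residue` is just an illustrative name):

```python
residue = 0.3 - 0.2 - 0.1

# Even at 15 significant digits the result is visibly not zero.
shown = format(residue, ".15g")

print(shown)                    # -2.77555756156289e-17
print(residue == -2.0 ** -55)   # True: the residue is exactly -2^-55
```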
I'm not about to change the floating-point type to a non-binary representation.
I think that 15 digits is the "right" value for ordinary display purposes. If you present a literal value of up to 15 digits to the machine, it gets converted to some approximation which, when printed back with 15 digits of precision, gives you the same digits.
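A sketch of that decimal-to-double-to-decimal property in Python, using a few sample literals of at most 15 significant digits. `echo` is a hypothetical helper, and the samples are arbitrary ones I picked, not an exhaustive test:

```python
import sys

def echo(literal, digits=sys.float_info.dig):
    """Parse a decimal literal to a double and print it back
    with the given number of significant digits."""
    return format(float(literal), ".%dg" % digits)

samples = ["0.3", "0.123456789012345", "123456.789012345"]

for s in samples:
    print(s, "->", echo(s))      # each literal comes back unchanged

# With 17 digits the same input no longer echoes back as typed.
print(echo("0.3", 17))           # 0.29999999999999999
```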
I have no "rounding error" because I haven't performed a calculation. I understand that when I enter 0.3 into the machine, the floating-point value isn't exactly 0.3. I have an error, because the representation can only provide a close approximation of 0.3.
Yet I know that this approximation is so close to 0.3 that I would like it printed that way. Now of course it will be printed that way if I round to, say, four places. But that will throw away precision for other numbers. If I round to DBL_DIG digits (15), I get the maximum precision, without having my input numbers altered into something else.
It is useful and elegant for the default printing precision to be as high as possible, yet such that numbers that I enter into a REPL are echoed back at me pretty as I entered them, modulo choice of notation.
No, of course, DBL_DIG doesn't guarantee that I can do any sequence of calculations, like 0.3 - 0.2 - 0.1 and still get back a result which is exact to DBL_DIG digits! There is no upper bound on how inaccurate a floating-point calculation can be, if it is carried through enough iterations!
Implied fixed material doesn't contribute precision. Otherwise we could pretend that there are twenty-four implied 1's and the precision is 76 bits.
Because the implied 1 cannot be zero, it isn't a bit; it doesn't carry information.
For instance if we define a four bit binary number type which has an implied 1, it can only represent values from 16 to 31. That's four bits of precision; the range is just displaced.
The information of the first bit is implied via the exponent.
Twenty four implied 1's would give you a precision of 76 bits, but you would restrict yourself to a very small subset of all possible numbers with that precision.
EDIT: Let's put it another way. Assume you have 3 decimal digits of precision for a float. The first digit cannot be zero. Would you then claim that you don't have 3 digits of precision, but rather log(10x10x9,10) = 2.95.., because the first digit only carries 0.95.. digits of information?
> binary floating points as the default representation for literals expressed as precise decimals.
For better or worse, it is a widespread practice appearing in numerous programming languages, including some widely used popular ones.
To conform with the practice, if you want, in your language, a way to use decimals to write exact fractional numbers, it's better to have an explicit notation for that like, say, 10.5r (rational number, another spelling for 21/2). Or hey: 10R5. Electrical Engineers will love you to death. :)
> For better or worse, it is a widespread practice appearing in numerous programming languages, including some widely used popular ones.
Well, yeah, I've been programming since the 1980s. I know that.
> To conform with the practice, if you want, in your language, a way to use decimals to write exact fractional numbers, it's better to have an explicit notation for that
Sure, given that your goal is to conform with that practice. I'm just saying it's a bad practice that reflects a generally premature optimization that often harms correctness, so we shouldn't conform to it in new languages.
> [implicitly approximating numbers like `0.1`] is a widespread practice
Sure, but the key point is that it leads to approximation that may cause problems.
It makes much more sense to interpret `0.1` as meaning exactly that number. Same with, say, `29/7`. If someone wants approximation, then have an explicit notation for that, like, say, `1.23e45`.
Defaulting to a Binary Encoded Decimal is also silly because it's not as performant and loses compactness. Fundamentally. (Having a composite base comes with problems). What is your suggestion, having a default floating point have a Binary or hexadecimal representation for literals? Let's say I want to input a planet's mass as 6e24 kg - quick, what is that in your literals?
It doesn't have to lose performance; it's just the default representation. For instance, if you have
a + 0.1
where a is of binary floating point type, the 0.1 constant could denote an exact number. Because of the addition with a float, it gets coerced to the float's type. But that can be done at compile time. So effectively 0.1 is like a binary floating point constant in that situation.
But in another situation it remains exact, such as:
0.1 + 0.2
this gets folded by the compiler at compile time, and uses exact math producing an exact 0.3.
Exactness is just the default.
If your language has rationals expressed as digs/digs, you can imagine that those are used in their place:
a + 1/10;
1/10 + 2/10;
In other words, suppose that 0.1 is just another "spelling" for 1/10.
This will be no less performant than using integer constants in floating contexts, e.g:
double x;
// ...
x += 1; // not slower than x += 1.0!
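Python has no exact decimal literals, but `fractions.Fraction` can stand in for them to sketch the semantics described above. This is only an approximation of the idea: Python does the coercion at run time, not at compile time, and the variable names are mine.

```python
from fractions import Fraction

# Stand-ins for exact literals 0.1 and 0.2 (i.e. 1/10 and 2/10).
tenth = Fraction("0.1")
fifth = Fraction("0.2")

# Exact + exact stays exact.
exact = tenth + fifth
print(exact, exact == Fraction("0.3"))   # 3/10 True

# Float + exact: the exact operand is coerced to the float's type.
a = 1.5
mixed = a + tenth
print(type(mixed).__name__, mixed == a + 0.1)   # float True
```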
> Defaulting to a Binary Encoded Decimal is also silly because it's not as performant and loses compactness.
Defaulting to correctness makes a lot more sense than often premature time and space optimizations by default.
> What is your suggestion, having a default floating point have a Binary or hexadecimal representation for literals?
There are lots of good solutions. Scheme's is the best: a numeric tower in which literals use the minimal-scope exact representation by default (integer if there is no fractional part, decimal if it's a decimal literal, rational if it's a rational literal not expressed as a decimal) unless they carry a modifier specifying intent to use an inexact representation.
The point is, there are still unrepresentables (1/3 e.g.) in binary encoded decimal. If it's all about literals, just make the user put the correct literal in, and make them understand that floats are imperfect anyways. If you're doing anything remotely useful with Binary Coded Decimal, probably about two or three operations in, you're going to lose resolution on your exactness. It doesn't buy you much. GIGO.
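The 1/3 case can be shown with Python's `decimal` module, which is a decimal floating-point type and not BCD as such, so treat this as a sketch of the general point. A 28-digit precision is set explicitly here; a rational type is included for contrast.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

with localcontext() as ctx:
    ctx.prec = 28                     # 28 significant decimal digits
    third = Decimal(1) / Decimal(3)   # rounded: 1/3 has no finite decimal form
    back = third * 3

print(back)                        # 0.9999999999999999999999999999
print(back == 1)                   # False
print(Fraction(1, 3) * 3 == 1)     # True: rationals keep 1/3 exact
```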
In the last half a year or so, I added major features such as: structs with good OOP support, an interactive REPL (based on "linenoise" with a lot of my hacks applied), and delimited continuations.