Tech Support > Computers & Technology > Programming > BigNum -- Floating Point
BigNum -- Floating Point
Posted by Programmer Dude on December 15th, 2003


Been moving my BigInt package into a BigNum package (that is, giving
it "floating point"), and I'm considering implementations.

I'm looking for some discussion from previous implementors and some
pointers to good documentation (Google returns a LOT of hits and
it'll take me some time to wade through them seeking what I want).

I already have a BigNum package (Digits) that uses separate digits
and does all its math in "long" form. But my BigInt package uses
a binary format internally (uses 32-bit ints as the base type),
and it's that I'd like to extend.

Thing I'm pondering is how to implement that DP. One technology
seems to keep the fractional and whole parts separate. This is
attractive in certain ways, but seems to make the math messy.

Another idea is to keep integral values and remember where the DP
goes. For example, 42.5 would be 425(dp=1). The problem here is
this makes simple operations expensive. Merely comparing 42 and
42.0000 requires multiplying the former by 10000 (granted, each
10x shift can be done with two shifts and a fast add, but....).

ANYWAY, just wanted to open a discussion thread.....

--
|_ CJSonnack <Chris@Sonnack.com> _____________| How's my programming? |
|_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL |
|_____________________________________________|___ ____________________|

Posted by Willem on December 15th, 2003


Programmer wrote:
) Been moving my BigInt package into a BigNum package (that is, giving
) it "floating point"), and I'm considering implementations.
)
) I'm looking for some discussion from previous implementors and some
) pointers to good documentation (Google returns a LOT of hits and
) it'll take me some time to wade through them seeking what I want).
)
) I already have a BigNum package (Digits) that uses separate digits
) and does all its math in "long" form. But my BigInt package uses
) a binary format internally (uses 32-bit ints as the base type),
) and it's that I'd like to extend.
)
) Thing I'm pondering is how to implement that DP. One technology
) seems to keep the fractional and whole parts separate. This is
) attractive in certain ways, but seems to make the math messy.

I think you can do moth math operations by assuming that the fractional and
whole parts are one array, and aligning the operands so that the place
where they are concatenated (i.e. the 'binary' point) are aligned.

) Another idea is to keep integral values and remember where the DP
) goes. For example, 42.5 would be 425(dp=1). The problem here is
) this makes simple operations expensive. Merely comparing 42 and
) 42.0000 requires multiplying the former by 10000 (granted, each
) 10x shift can be done with two shifts and a fast add, but....).

How about you use quotients ?

That has the big advantage that numbers like 1/3 are exact.

One disadvantage is when you want to display them in DP notation,
but I'm sure there is some really clever way of doing that.

Another disadvantage is that you have to decide when to normalize,
for example when comparing 1/3 and 2/6. To normalize, you need to
do a gcd algo of some sort, which may be expensive.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Posted by Arthur J. O'Dwyer on December 16th, 2003



On Mon, 15 Dec 2003, Willem wrote:
<big snip>
Snippet from my naive bigfrac implementation, which used a
BigNum numerator and a BigNum denominator for each BigFrac. The
"normalized" representation was in lowest terms, with the minus
sign (if any) in the numerator. Normalization happened pretty
much everywhere. So it was probably very slow, but hey, it wasn't
supposed to be blazingly fast anyway.
Here's one of the output routines. "places" could be set via
methods, which now seems silly, but back then it seemed like the
easiest way to get some sort of control over the output.


int BigFrac:utput_dec(ostream & out) const
{
BigNum tnum;
long int i;

if (numerator < 0) out << "-";
out << abs(numerator / denominator);

if (places == 0) return 0;
if ((numerator % denominator) == 0) return 0;

out << ".";

tnum = abs(numerator % denominator);

for (i = 0; i < places; i++)
{
tnum *= 10;
out << (tnum / denominator);
tnum %= denominator;
}

return 0;
}


my $.02,
-Arthur


Posted by Paul Hsieh on December 16th, 2003


Programmer Dude <Chris@Sonnack.com> wrote:
What's wrong with IEEE-754? I mean simply mapping those semantics to
your package. So pick up denormals, NaN, inf, -inf semantics, as well
as all the ordering flags.

A floating point number in IEEE-754 is typically represented by:

sign * (1 + mantissa / (2 ^ mSize)) * 2 ^ exponent

With special values for the exponent to represent 0, denormal, NaN and
inf. Otherwise, the exponent is just a (possibly biased) integer.
The sign bit indicates a value of -1 or 1. mantissa / (2 ^ mSize) is
a fractional number from 0 to up to but not including 1. mSize and
mantissa are just large integers, and typically mSize is fixed. The
advantage you presumably wish to bring is that mSize may change
according to the needs of the operation and operands.

Since you have an integer package, it should be fairly easy to fill
each entry as described above without too much trouble.

The *real* problem is that you cannot retain all the bits of an
operation in general. Consider the simple problem of representing
1/3. You need an infinite number of digits to accurately represent
this number. Even if you decide to make a BigRational package as an
intermediate step to avoid this, you'd still be screwed on sqrt()
which will yield irrational numbers.

Given this, it seems that you need to have a way of clamping mSize to
some maximum size. I would recommend, for maximum flexibility, that
you in fact pass in the maximum mSize that you are willing to accept
in each operation. This will allow you to be as flexible as desired
in terms of things like growable precision on the fly. Following this
reason it seems you would also want to pass in the rounding mode flag
as well.

And if you are thinking of doing crazy things like "interval
arithmetic", go look up the "Lorentz attractor" to see why such
efforts are bound to be fruitless.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Posted by Robert Wessel on December 16th, 2003


Programmer Dude <Chris@Sonnack.com> wrote in message news:<3FDE2823.BA74549F@Sonnack.com>...

Is there some reason you're tied to decimal exponents? Normally you'd
pick an exponent that was convenient with respect to the base of the
mantissa. For example, IEEE float is a binary format, and the
exponent is interpreted as 2**n. (Old) IBM mainframe float is hex and
exponents are 16**n. In your case, it would probably make sense to
make the base (2**32), and thus exponents would be interpreted as
(2**32)**n. For example, the value 5.125 (aka 5 1/8) would be
represented by a pair of 32 bit digits, something like: 0x00000005 (.)
0x02000000, with the "decimal" point between the two words. Any
alignments and normalizations required in that format are conveniently
a shift of a whole number of words.

If you want to use a decimal exponent, I'd strongly suggest changing
to a decimal(ish) base for the mantissa. 10**9 would be a good
choice, that gets you nine decimal digits per 32 bit memory word, and
you'd do all your operations in base one-billion. Of course, that
makes operations on individual words a bit more painful, since you are
always having to do the (mod (10**9)) operation manually. In many
cases that's worse than it sounds, for example, after adding two
words, you can just compare to 10**9 and subtract that amount if the
result is higher. After multiplying two words, you will have to
divide by 10**9, but the division is generally an expense of similar
magnitude as the multiplication (and if you're doing carry save, which
you should be anyway, your can delay all those divisions to a single
pass at the end of the operation).

If you insist on sticking with decimal exponents and a binary(ish)
base, you're going to be stuck doing a log of multiplications and
divisions by 10**n.

Posted by Thad Smith on December 16th, 2003


Programmer Dude wrote:
The choice is partly determined by the expected use. In general, I
would use something like standard floating point: mantissa and
exponent. You will need to decide what size to make the exponent and
what base to use. A convenient base is 2^32, so that when you normalize
numbers, you shift them by whole words. A 31 bit signed exponent would
give you an amazing range of 2^(-(2^30)*32) to 2^((2^30)*32), which
would let you place about 1e10 zeros, in decimal representation, before
or after
the mantissa. By making the exponent 31 bits, you can use the left-over
bit for the sign of the number. You might decide to use some of the
exponent space, though, to store other information.

Of course, if that exponent isn't large enough, you can make the
exponent a big int, but I suspect that would rarely be required, since
floating point values are normally used for measurable quantities, and
the number of atoms in the universe is estimated around 1e80 or so.

If you want decimal-based numbers, you can consider using base 1e9,
which will fit in a 32-bit value. It makes the arithmetic routines
slightly harder, but makes decimal conversion a breeze. If you do that,
42.5 would be represented as M = 000000042 500000000, E = -1. If you
used based 2^32, 42.5 would be represented as M=0x0000002a 0x80000000, E
= -1.

Thad

Posted by Matthieu Villeneuve on December 16th, 2003


"Programmer Dude" <Chris@Sonnack.com> wrote in message
news:3FDE2823.BA74549F@Sonnack.com...
What would you want your package to handle? Ratios, reals?
Thinking that BigNums will never represent non real numbers
accurately, I would think that a reasonable goal would be to
handle rational numbers only. Then the implementation should
be pretty straightforward (at least for a "naive" implementation,
smarter ways to do the maths may exist).

If your goal is more ambitious, then rational numbers may be
a first step for your module...


--
Matthieu Villeneuve


Posted by Robert Wessel on December 16th, 2003


robertwessel2@yahoo.com (Robert Wessel) wrote in message news:<bea2590e.0312152107.5d8a039f@posting.google. com>...

*ahem*

That should be "that sounds worse than it is,"


Posted by Programmer Dude on December 16th, 2003


Willem wrote:

Yeth, thatth one approach I'm conthidering very theriously. :-|

But that's the rub. Consider converting 42.5 and 3.1415. To put
them in one array, (my approach is to) convert to (binary) 425 and
31415, but remember that dp=1 and dp=4, respectively.

Now, if you go to add, subtract, logicalOp, even compare, you need
to "align" the two values. You certainly can't do anything with
them as are. Remember, internally, they're: basically bit "strings"
of 110101001 and 111101010110111, respectively. There's no shift
of binary digits you can do to align them.

What you CAN do--and it's not *horribly* expensive--is do a fast
"multiply-by-10" on the lesser value (425, in this case). A fast
x10 is (left_shift(1) + left_shift(3)). Since dp=1 and 4, you
need to x10 three times to give you: 425000.

NOW you can treat them as aligned for operations that require it.

But it's an awfully expensive way to compare or "OR" two numbers.

It's attractive for cases, like 1/3, but seems more complex to
implement than I want. I have a BigNum package already that uses
individual digits and does all I want.... I'm just looking for a
possibly better solution that's reasonably simple to implement.

Thanks for the comments. Food for thought!
--
|_ CJSonnack <Chris@Sonnack.com> _____________| How's my programming? |
|_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL |
|_____________________________________________|___ ____________________|

Posted by Willem on December 16th, 2003


Programmer wrote:
)> and aligning the operands so that the place where they are
)> concatenated (i.e. the 'binary' point) are aligned.
)
) But that's the rub. Consider converting 42.5 and 3.1415. To put
) them in one array, (my approach is to) convert to (binary) 425 and
) 31415, but remember that dp=1 and dp=4, respectively.

I said *binary* point. Which makes everything a whole lot easier. :-)

)> How about you use quotients ?
)
) It's attractive for cases, like 1/3, but seems more complex to
) implement than I want. I have a BigNum package already that uses
) individual digits and does all I want.... I'm just looking for a
) possibly better solution that's reasonably simple to implement.

You can use the bignum package "as-is" if you use quotients.
Just build a class around the bignums, with two bignums in one bigquot.

The odd thing is, adding is more expensive than multiplying... :-)

(a/b) * (c/d) = (ac) / (bd)
(a/b) + (c/d) = (ad+bc) / (bd)

But I didn't need to tell you that.

Then you need a gcd algorithm, and a way to print your bigquot in decimal
notation. The last one seems like something where you can make a really
cool clever algorithm.

Powers are easy, square roots maybe a bit more difficult, and other
operations, I don't know. Although I'm interested enough to go figure it
out.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Posted by Programmer Dude on December 16th, 2003


Matthieu Villeneuve wrote:

Ratios would be nice, but aren't currently part of the spec.
The purpose of the package is to be an underlying native numeric
type for a language I'm throwing together (two, actually, a
prototype I'll use in certain production settings and a more
advanced, debugged, featured version I'll use in a language I've
been idly designing for over a decade (BOOL--I've mentioned it
here before).

Non-real? Do you mean the transcendentals? Yeah, I recognize
I can't accurately represent, e.g. pi (no one can), but a primary
design criterion is it should support as many fractional digits
as *desired* (by the user). There is a DivPrecision global value
(as well as a matching instance variable for each number object)
that determines when an operation, like 1/3, stops.

Indeed. It is pretty straight-forward, but there are some pros
and cons to weigh about exactly how it's implemented.

--
|_ CJSonnack <Chris@Sonnack.com> _____________| How's my programming? |
|_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL |
|_____________________________________________|___ ____________________|

Posted by Programmer Dude on December 16th, 2003


Thad Smith wrote:

It's to be a native numeric type for a language.

I confess, I hadn't considered that route. Interesting...

In that the current implementation already uses 32-bit "digits",
this would be a decent fit.

Range is less important than precision. Precision must be as
high as (1) required OR (2) requested by the user (see previous
post about the DivPrecision setting). That means the user should
have the ability to write an algorithm that accurately generates
10,000 digits of pi if they want. The package *must* support that.

Which, when I think about it, is why I didn't consider the mantissa
and exponent technique. I don't see that it buys me anything, as
the real "meat" is all in the mantissa.

Number objects currently have instance variables for sign, dp
and length. Space is not really a design criterion at all in
this, so I'm not concerned with using bits.

Understood. And for that sort of thing, the values of the lowest
digits don't much matter. For my project, they do (the user might
want to play with writing RSA, for example, or might want to write
something else that canNOT be "lossy").

(Conversion to decimal is fairly trivial, so is not a consideration
one way or other.) I think my example wasn't clear. The storage
representation is *binary*, not decimal. But the binary value is
(in one design I'm considering) the "normalized integer value" of
the number, by which I mean that, e.g., 3.14159 would be stored as
the binary value the largish integer, 314159. The system simply
remembers (as a sort of exponent, if you will) that five of the
digits are right of the DP.

If I were to add/subtract/logicalOp/compareOp that value of pi with,
e.g., 42.5--stored as the integer 425 (with one decimal digit)--I'd
need to multiply the 425 by 10,000 (giving 4250000). Then the
values have equal "rank" and those operations will work as expected.

Multiplication and division don't require those games, just keeping
track of--and adjusting in the result--the new number of fractional
digits. For example, 425 * 314159 = 133517575. And the dp=1 and
dp=5 are here summed, so the result is 133517575 (dp=6): 133.517575

I was rather bemused to realize that this design makes multiplication
and division *less* expensive than the "simpler" operations!

--
|_ CJSonnack <Chris@Sonnack.com> _____________| How's my programming? |
|_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL |
|_____________________________________________|___ ____________________|

Posted by Willem on December 16th, 2003


Programmer wrote:
) (Conversion to decimal is fairly trivial, so is not a consideration
) one way or other.) I think my example wasn't clear. The storage
) representation is *binary*, not decimal. But the binary value is
) (in one design I'm considering) the "normalized integer value" of
) the number, by which I mean that, e.g., 3.14159 would be stored as
) the binary value the largish integer, 314159. The system simply
) remembers (as a sort of exponent, if you will) that five of the
) digits are right of the DP.

Why do you use a *decimal* point ?
IMHO, you'd be much better off with a *binary* point.
If you keep the binary point at multiples of 8 (or 32), you don't even have
to do binary shifts.

) I was rather bemused to realize that this design makes multiplication
) and division *less* expensive than the "simpler" operations!

Interestingly, the same goes when you use quotients...


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Posted by Programmer Dude on December 16th, 2003


Willem wrote:

Sure. The problem, then, is accuracy. These must be as "lossless"
as possible. Conversion from a fractional number in one base to
a fractional number in another base can lose precision.

Consider something as simple as "1.4".
In binary, it's: 1.0110011001100110011001100110011

Stored per the above technique, it's just 14{dp=1}. No loss.


This is an interesting idea (which allows accurate representation
of things like 1/3). I'll have to think more about doing this
(gives me some mind meat for the long dog walks).

Similar in the technique described above due to the need to
adjust the value with the lesser number of decimal digits.

Also required are the comparison and logic operations, plus the
shifts and rotates. Ratios are a little more challenging with
those factored in, but it's worth some thought.

--
|_ CJSonnack <Chris@Sonnack.com> _____________| How's my programming? |
|_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL |
|_____________________________________________|___ ____________________|

Posted by Joe \Nuke Me Xemu\ Foster on December 16th, 2003


"Programmer Dude" <Chris@Sonnack.com> wrote in message <news:3FDF68D7.B6001DD7@Sonnack.com>...

Sounds to me like you're reinventing MAPM:

URL:http://www.tc.umn.edu/~ringx004/mapm-main.html

--
Joe Foster <mailto:jlfoster%40znet.com> DC8s in Spaace: <http://www.xenu.net/>
WARNING: I cannot be held responsible for the above They're coming to
because my cats have apparently learned to type. take me away, ha ha!



Posted by Programmer Dude on December 16th, 2003


Robert Wessel wrote:

Primarily the high requirement of lossless performance. Converting
(fractional) decimal numbers written by the user to binary can lose
accuracy (consider 1.4 in binary: 1.0110011001100110011001100110011).

In writing to another poster, I realized the mantissa/exponent
concept probably doesn't work for me, because maintaining precision
is a primary criterion. There's not much point in an exponent if
I do need to maintain ALL digits of the mantissa.

I *think* exponents are a dead end for my purposes.

I (*think*! :-) I can get away with multiplications only. The need
to maintain precision allows me to take the tactic of always bringing
the lesser value (number of fractional digits-wise) up to the greater.
For instance, 42.5 + 3.14159 would--internally--be 4250000 + 314159
(with the result having dp=5). In generating the decimal number for
output, you just convert that large integer and stick a dp in the
appropriate place.

The thing about doing multiplications only is a x10 multiply can
be done as [left-shift(1) + left-shift(3)]. The shifts are pretty
fast, and here one can implement a fast binary adder (no need to
worry about where the dp is). Doing higher "decade multiplies"
requires trading off between looping over x10 multiplies or choosing
a similar pattern of shifts--but more are required:

10 = <<1 + <<3
100 = <<2 + <<5 + <<6
1000 = <<3 + <<5 + <<6 + <<7 + <<8 + <<9
10000 = <<4 + <<8 + <<9 + <<10 + <<13
100000 = <<5 + <<7 + <<9 + <<10 + <<15 + <<16

--
|_ CJSonnack <Chris@Sonnack.com> _____________| How's my programming? |
|_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL |
|_____________________________________________|___ ____________________|

Posted by Programmer Dude on December 16th, 2003


Paul Hsieh wrote:

Because I've come to believe the semantics don't provide what I want.
I require lossless precision (to the highest extent possible). This
means I need to maintain all digits in the mantissa, and that *seems*
to make an exponent superfluous.

Yes. This--from what I can tell (and IANAM!)--only becomes a major
problem with division. The other operations seem fully deterministic
from the point of view of significant digits. But, as you say, an
operation like 1/3 explodes precision.

In this case, my traditional method is to provide a mechanism of
some sort for the user to specify max division precision. My "Nbrs"
calculator, for example, has a "divp" switch. The formula:

[One Third]
\divp 32
div 1 3

Outputs: 0.33333333333333333333333333333333


The calculator mentioned above uses a Digits class that does have
a DivPrecision instance variable, which allows individual tweaking
of numbers. Never did much with it, though. The calculator uses
the global setting set by the switch.

Currently not bothering with rounding (just truncating). I should
though.

--
|_ CJSonnack <Chris@Sonnack.com> _____________| How's my programming? |
|_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL |
|_____________________________________________|___ ____________________|

Posted by Programmer Dude on December 16th, 2003


Willem wrote:

I think I'm not communicating this effectively. Under the current
(proposed) design, there is no "point". Internally, all numbers
are stored as binary integers. Decimal points only come into play
during input and output.

Doesn't that require converting a decimal input faction (e.g. .4)
to a binary representation? I don't want to do that, because you
lose accuracy.

Or do you mean something I'm not getting? An example might help.

--
|_ CJSonnack <Chris@Sonnack.com> _____________| How's my programming? |
|_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL |
|_____________________________________________|___ ____________________|

Posted by Programmer Dude on December 16th, 2003


Joe \"Nuke Me Xemu\" Foster wrote:

Yes, it sounds very much like what I'm trying to do.
Thanks for the URL!

(Since this is a hobby project, reinventing is desirable here.)

--
|_ CJSonnack <Chris@Sonnack.com> _____________| How's my programming? |
|_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL |
|_____________________________________________|___ ____________________|

Posted by Willem on December 16th, 2003


Programmer wrote:
)> Why do you use a *decimal* point ?
)
) I think I'm not communicating this effectively. Under the current
) (proposed) design, there is no "point". Internally, all numbers
) are stored as binary integers. Decimal points only come into play
) during input and output.

But when you add two numbers, you have to align their decimal points, no ?

)> IMHO, you'd be much better off with a *binary* point.
)> If you keep the binary point at multiples of 8 (or 32), you
)> don't even have to do binary shifts.
)
) Doesn't that require converting a decimal input faction (e.g. .4)
) to a binary representation? I don't want to do that, because you
) lose accuracy.
)
) Or do you mean something I'm not getting? An example might help.

Well, yes, that's what I was getting at.
My idea was to have two binary integers, one for the 'whole' part, and one
for the 'fractional' part. In binary representation, of course.

The only place you lose accuracy there is when someone inputs numbers in
decimal notation. But you're going to lose accuracy anyhow, as soon as you
start dividing, or taking roots, or whatever.


I'd still say using quotients is a good alternative, especially as you'll
have perfect accuracy with division.

You can easily limit accuracy by imposing a limit on the divider.

I don't think there's much speed difference between the two methods,
especially not on multiplication, and division suddenly becomes trivial.
You of course need a gcd algorithm to normalize your quotients, but that's
not too hard, especially if you use the optimized binary algorithm.

Addition may be a bit more expensive, especially if you can optimize
multiply-by-ten.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


Similar Posts