- Integer Square Root Algorithm for Processor with MUL, DIV
- Posted by Pubkeybreaker on May 2nd, 2008
On May 2, 11:35*am, Mark Borgerson <mborger...@comcast.net> wrote:
Newton-Raphson for isqrt can be done with only integer arithemtic.....
- Posted by Nils on May 2nd, 2008
Pubkeybreaker schrieb:
Please show me... I have to take roots of 64 bit integers on a 32 bit
CPU and I currently do the two bit per iteration method. I can do an
educated guess via tables, and the significant bit-doubling of the
Newton-Raphosn method would be great.
However, my attempts at doing reciprocal square-roots in fixed-point
always failed and never reached the exact result.
But maybe i'm just to dumb to do it..
- Posted by Virgil on May 2nd, 2008
In article <481A1D0D.8516D9DF@yahoo.com>,
CBFalconer <cbfalconer@yahoo.com> wrote:
If it is using integer multiplication and division shouldn't it be
Rn+1 = (Rn + N/Rn) / 2
or even
Rn+1 = (Rn*Rn + N) / (2 * Rn)
- Posted by Didi on May 2nd, 2008
Virgil wrote:
But if you need even only one division per calculated root
this will in most if not all cases make the algorithm slower
than mine (which uses just as many mutiplications as there
are bits in the result, I posted an example earlier to this
thread - http://groups.google.com/group/comp....b?dmode=source
)
Dimiter
------------------------------------------------------
Dimiter Popoff Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tg...7600228621276/
Original message: http://groups.google.com/group/comp....8?dmode=source
- Posted by Virgil on May 3rd, 2008
In article
<0da2c6eb-aa7c-494b-9fc1-80c41f74b1f3@59g2000hsb.googlegroups.com>,
Didi <dp@tgi-sci.com> wrote:
The formula that I objected to involved a multiplication by a
non-integer in a supposedly integer only operation.
Admittedly a multiplication by 0.5 is easy enough as a binary right
shift, but that is not what indicated.
- Posted by Christopher Henrich on May 3rd, 2008
In article <a6SdnYl0KKE-ZYTVnZ2dnUVZ_uednZ2d@giganews.com>,
"David T. Ashley" <dta@e3ft.com> wrote:
Here's an algorithm for finding r = sqrt(x^2 + y^2).
Assume x > y > 0. Let
x_1 = (4 x^2 + 3)x / (4 x^2 + y^2)
y_1 = y^3/(4 x^2 + y^2).
Then x_1^2 + y_1^2 = x^2 + y^2. If the transformation x,y) \mapsto
(x_1, y_1) is iterated, then the point converges *cubically* to (r, 0).
I read this in a magazine many years ago.
Adapting it to your problem, I would first find r = sqrt(x^2 + y^2), then
s = sqrt(r^2 + z^2).
I would use scaled integer arithmetic, that is, represent x etc. by
j_x/D etc. where j_x is an integer and I choose D to be some convenient
common denominator like 65536.
--
Christopher J. Henrich
chenrich@monmouth.com
htp://www.mathinteract.com
- Posted by Paul E. Bennett on May 4th, 2008
Didi wrote:
If you had focused on the first part;
<quote>
A classical Forth puzzle is: What does the following do?
: ???? -1 TUCK DO 2 + DUP +LOOP 2/ ;
It is a standard method of calculating square root by summing
the odd integers. It works for all unsigned integers from 0 to
FFFFFFFF (or FFFF).
</quote>
You would have realised that the definition ???? is actually a sqrt by
the method stated. Even in the more spread out style used by many of us
these days this would only have been 5 lines at most.
--
************************************************** ******************
Paul E. Bennett...............<email://Paul_E.Bennett@topmail.co.uk>
Forth based HIDECS Consultancy
Mob: +44 (0)7811-639972
Tel: +44 (0)1235-811095
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk..
************************************************** ******************
- Posted by Didi on May 4th, 2008
Paul E. Bennett wrote:
Are you claiming that your unreadable 5 line algorithm which uses
summing somehow (I am not able to read that line and from what I
see I can say it is not worth learning how to read it) is in any way
superior to my 12 line example which finishes in a guaranteed
16 times through the loop for any 32 bit integer - and can be
read by any programmer? I might agree it is if the purpose is
to waste CPU and programmer time.
Dimiter
------------------------------------------------------
Dimiter Popoff Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tg...7600228621276/
Original message: http://groups.google.com/group/comp....9?dmode=source
- Posted by CBFalconer on May 4th, 2008
"Paul E. Bennett" wrote:
Which points out that the mathematics is more important than the
coding. Three or four lines of C can easily produce the same
effect.
--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
** Posted from http://www.teranews.com **
- Posted by James Waldby on May 4th, 2008
On Sat, 03 May 2008 20:22:47 +0000, Christopher Henrich wrote:
It probably would be a little better to find r = sqrt(x^2 + z^2),
then s = sqrt(r^2 + y^2), if x > y > z. On a processor with fast
multiply and divide relative to comparison and swap operations it
would make no difference, but on small cpu's the extra ops to put
x,y,z in order would cost much less on average than the potentially
avoided iteration steps.
- Posted by 42Bastian Schick on May 5th, 2008
On Thu, 1 May 2008 13:34:52 -0700 (PDT), Didi <dp@tgi-sci.com> wrote:
I guess this will benefit from the ff1 (Coldfire ISA A+ AFAIK) or
cntlzw (PowerPC) opcodes, which return the position of the first 1
bit, and could be used to setup the loop counter (ff1 / 2).
--
42Bastian
Do not email to bastian42@yahoo.com, it's a spam-only account :-)
Use <same-name>@monlynx.de instead !
- Posted by Didi on May 5th, 2008
42Bastian Schick wrote:
It will indeed - it is really good to have the ff1 opcode back (as in
the 020,
there was a bfff1 there).
I am a frequent cntlzw user on the PPC, but have not used it on this
algorithm yet (never thought of it, I did it first for the CPU32 - or
was it the
HC11 - where there was no such opcode and have not changed it since).
I may do so next time I dig in that vicinity.
Dimiter
------------------------------------------------------
Dimiter Popoff Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tg...7600228621276/
Original message: http://groups.google.com/group/comp....b?dmode=source
- Posted by Paul Carpenter on May 6th, 2008
"David Brown" <david.brown@hesbynett.removethisbit.no> wrote in message
news:d_ydnXqQPZd614fVnZ2dnUVZ8umdnZ2d@lyse.net...
I am glad someone else said that. as often I look at algorithms and then
determine
what results are needed and if some of the final satges are actually needed!
Too often the mathematicians and straight coders are looking for the perfect
mousetrap where a sufficient one will do. By this I mean
Are our input or output ranges limited anyway
(for _most_ light reading applications negative light is not
possible!)
Do we need to process using the result in a way that a partial result
will
give the same relationship, i.e. could we just use the sum of
squares
or is the data being passed to a PC or higher computational device
better suited to do the intensive calculations.
What is out native (8/16/32/64bit) and software (emulated floating
point) ranges
and limitations on what we are doing.
How often are we doing this (once a day, down to millions a second)
How fast do we have to do this (closely associated with previous and
total system
and application
requirements)
How accurate is the response needed and implications of any inaccuracies
(e.g. image processing will calculating image position results to 10
decimal places
help us draw a circle on a fixed pixel array - integer).
--
Paul Carpenter | paul@pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/> PC Services
<http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font
<http://www.gnuh8.org.uk/> GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
<http://www.badweb.org.uk/> For those web sites you hate
- Posted by lowcost on May 7th, 2008
Vladimir Vassilevsky ha scritto:
good question.
if you can tolerate +3 lsb or +7 % (that's better) accuracy, you can
convert polynomial to exponent (approx),
to halve exponent,
convert exponent to polynomial (approx).
none MUL_DIV: INC_DEC_SHIFT_ROT only.
best accuracy is at input values = 2^even.
worst accuracy is at input values = 2^odd.
regards
- Posted by Everett M. Greene on May 9th, 2008
lowcost <die.spam@invalid.com> writes:
You can use the common binary form of the square root process
and be accurate to 1 LSB for any value. The process involves
only shifts and conditional subtraction, a variation of
division. With some pre- and post-processing, it works for
floating point values as well.