- Integer Square Root Algorithm for Processor with MUL, DIV
- Posted by David T. Ashley on May 1st, 2008
What is the best integer square root algorithm for a processor with MUL and
DIV (integer multiplication and division) built-in?
For one without MUL and DIV (or where the instructions operate on shorter
operands than would be required to form the square in question)?
Thanks for all ideas and thoughts.
I will check TOACP, of course.
The application is a low-cost microcontroller with a 3-axis accelerometer.
To get the magnitude of the acceleration, we'd want to use
sqrt(x^2+y^2+z^2). The squaring operations are cheap (MUL). The addition
is cheap (ADD, ADC). The square root extraction is what I'm not sure how
best to do.
The algorithm that comes to mind is just to do a binary search by squaring
potential solutions and comparing them to the quantity whose square root is
to be extracted.
For example, if the sum of the squares is 16 bits or less, and since the
processor has a 8 x 8 = 16 MUL instruction, it should be possible to iterate
to the solution in no more than 16 square/compare cycles.
But maybe there is a better way?
And it isn't clear what to do if the sum of the squares is a bit larger.
Thanks for all.
--
David T. Ashley (dta@e3ft.com)
http://www.e3ft.com (Consulting Home Page)
http://www.dtashley.com (Personal Home Page)
http://gpl.e3ft.com (GPL Publications and Projects)
- Posted by Hans-Bernhard Bröker on May 1st, 2008
David T. Ashley wrote:
There's never such a thing as "the best" way to do something that's been
described so vaguely. Different circumstances render different aspects
important, and thus change what "good" actually means.
There's one exception to the above rule: not doing something at all, if
you can get away, is always the best way of doing it. So: do you really
*need* that sqrt() taken on your microcontroller?
Failing that you can still use use good old Newton-Raphson iteration to
locate the zero of (x^2 - input).
x |--> x - (x^2 - input) / (2*x)
= (x + input/x) / 2
- Posted by Vladimir Vassilevsky on May 1st, 2008
David T. Ashley wrote:
Here is the simplest (but not the very fastest) sqrt algorithm for the
CPU with cheap multiplication:
//-------------------------
s16 sqrt_s16(s32 x)
{
s32 tmp_result;
s16 root, tmp_root, tmp_add;
root = 0;
tmp_add = 0x4000;
while(tmp_add)
{
tmp_root = root + tmp_add;
tmp_result = ((s32)tmp_root)*tmp_root;
if(tmp_result <= x) root = tmp_root;
tmp_add >>= 1;
}
return root;
}
//-----------------------
Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
- Posted by Dann Corbit on May 1st, 2008
"David T. Ashley" <dta@e3ft.com> wrote in message
news:a6SdnYl0KKE-ZYTVnZ2dnUVZ_uednZ2d@giganews.com...
Take a look here:
http://www.azillionmonkeys.com/qed/sqroot.html
A zillion monkeys can't all be wrong.
** Posted from http://www.teranews.com **
- Posted by Pubkeybreaker on May 1st, 2008
On May 1, 1:06*pm, "David T. Ashley" <d...@e3ft.com> wrote:
Approximate the square root by the lowest half of the bit
representation
of x^2 + y^2 (or any other easy, convenient approximation), and then
use Newton-Raphson.
- Posted by Rich Webb on May 1st, 2008
On Thu, 1 May 2008 13:06:42 -0400, "David T. Ashley" <dta@e3ft.com>
wrote:
Jack Crenshaw had a series of columns on numerical methods in
"Embedded Systems" a few years ago. Has a good book on the subject, as
well. http://www.powells.com/biblio/72-9781929629091-0
(The past tense isn't meant to imply that his current columns aren't
great too... ;-)
As far as the columns, the one on integer square roots is at
<http://pdf.aiaa.org/jaPreview/JGCD/1997/PVJAIMP4026.pdf>
--
Rich Webb Norfolk, VA
- Posted by Vladimir Vassilevsky on May 1st, 2008
David T. Ashley wrote:
How accurate do you want to be?
let X = min(a,b)/max(a,b)
Then you can use a polynomial approximation:
sqrt(a^2 + b^2) ~ max(a,b)*(1 + P(X))
The simplest approximation of the 1st order (somewhat 10% accurate):
sqrt(a^2 + b^2) ~ max(a,b) + 0.318 * min(a,b)
In the 3-dimensional case, you have to do this procedure twice:
sqrt(x^2 + y^2 + z^2) = sqrt(max(x,y,z)^2 + sqrt(mid(x,y,z)^2 +
min(x,y,z)^2))^2
Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
- Posted by Walter Banks on May 1st, 2008
"David T. Ashley" wrote:
For no MUL and DIV search for Dann Corbit's postings. Dann
posted one a few years ago with N/2 interations. If you can't find it
contact me off line.
The following square root routine was originally done for
the 6805. It is probably not the fastest
The basic loop spits out one bit of result per pass. This
routine takes a 16 bit argument and returns an eight bit
integer and 8 bit fraction.
Walter Banks
http://www.bytecraft.com
================================================== ==========
#pragma option v;
#pragma has MUL;
long SQRT (long l)
/*
Square root V1.0.
Hayward/Banks.
This routine generates a square root to
eight bit resolution taking a 16 bit argument
normalizing the argument and returning a 16 bit
real number that consists of 8 bits integer and
8 bit fractional parts. The MS8bits are the integer
part.
i i i i i i i i . f f f f f f f f
| | | | | | | | | | | | | | | |
128 |32 | 8 4 2 1 .5 | | | | | | .00390625
64 16 .25 | | | | --.0078125
.125- | | ----.015625
.0625-- ------.03125
These routines may be adapted for any purpose. No
warranty is implied or given as to their usability for
any purpose.
(c) Byte Craft Limited
Waterloo, Ontario
Canada N2J 4E4
(519) 888-6911
Walter Banks/Gordon Hayward
*/
{
unsigned int n,j,k;
unsigned long t;
n = 0;
while( (l & 0xc000) == 0)
{
n++;
l <<= 2;
}
for (j = 0, k = 0x80; k != 0; k >>=1)
{
j = j | k;
t = j * j;
if (t > l) j = j & (~k);
}
t = j;
return (t << (8-n));
}
unsigned long result;
unsigned int int_part, frac_part;
void main(void)
{
result = SQRT(0x4000);
int_part = result >> 8;
frac_part = result & 0xff;
}
- Posted by Didi on May 1st, 2008
David T. Ashley wrote:
Actually you can do this in 8 multiplies. This is how I usually
implement
square root - I have probably reinvented it some 15+ years ago for
myself.
You just start with $80, square and compare to the number you want
the root of, and do successive approximation down to bit 0 (i.e.
second time
with $40+result so far, then $20 etc.).
Dimiter
------------------------------------------------------
Dimiter Popoff Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tg...7600228621276/
Original message: http://groups.google.com/group/comp....6?dmode=source
- Posted by Arlet on May 1st, 2008
On May 1, 8:51 pm, Pubkeybreaker <pubkeybrea...@aol.com> wrote:
If the values of x, y, z don't change too much from sample to sample
you could use the previous value as the starting point for Newton-
Raphson iteration.
For a different application (not involving acceleration or square
roots), I performed only a single Newton-Raphson iteration per sample.
For slow moving inputs, this gave perfect results, and for fast moving
inputs, I got a free low-pass filter 
- Posted by CBFalconer on May 1st, 2008
"David T. Ashley" wrote:
Depends on needs. Probably your best bet is Newton approximation:
Rn+1 = 0.5 * (Rn + N/Rn)
and repeat until the error is negligible. Start with R == 1.0.
--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
** Posted from http://www.teranews.com **
- Posted by user923005 on May 1st, 2008
On May 1, 10:06*am, "David T. Ashley" <d...@e3ft.com> wrote:
(I posted this also from TerraNews, but sometimes those posts take a
few days to show up):
Take a look here:
http://www.azillionmonkeys.com/qed/sqroot.html
A zillion monkeys can't all be wrong.
- Posted by Paul E. Bennett on May 1st, 2008
David T. Ashley wrote:
See:-
<http://groups.google.co.uk/group/comp.lang.forth/browse_thread/thread/67130b2beb453d93/51fa6f17a670fe31?hl=en&lnk=st&q=Forth+Square+Root# 51fa6f17a670fe31>
Nice and simple method that works quite well.
--
************************************************** ******************
Paul E. Bennett...............<email://Paul_E.Bennett@topmail.co.uk>
Forth based HIDECS Consultancy
Mob: +44 (0)7811-639972
Tel: +44 (0)1235-811095
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk..
************************************************** ******************
- Posted by Didi on May 1st, 2008
Paul E. Bennett wrote:
Well my algorithm takes less lines and is a lot easier to read than
those
in HLL which were suggested, but I hope it counts in spite of
that :-).
Here is it excerpted from a post I have made many years ago,
calculates a 16 bit squre root of a 32 bit number.
The example is in CPU32 (or CF) assembly.
*
* calc. the square root of d1.l; return it in d1.l
* destroys d2,d3,d4
*
sqrtd1
moveq #15,d2 counter
clr.l d3 colect result here
set loop,*
bset d2,d3 try this
move.w d3,d4 extra copy result so far
mulu.w d4,d4 square
cmp.l d1,d4 higher?
bls keepbit branch not
bclr d2,d3 else bit back down
keepbit
dbra d2,loop process all bits
move.l d3,d1 update in d1
rts
*
Dimiter
------------------------------------------------------
Dimiter Popoff Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tg...7600228621276/
Original message: http://groups.google.com/group/comp....9?dmode=source
- Posted by user923005 on May 1st, 2008
On May 1, 12:05*pm, Rich Webb <bbew...@mapson.nozirev.ten> wrote:
I have Mr. Crenshaw's book "Math Toolkit for REAL-TIME Programming"
"Square Root" runs from pages 43-87 and he discusses a boatload of
methods.
Unfortunately, I have misplaced the CD. Looking for it now.
- Posted by David Brown on May 1st, 2008
David T. Ashley wrote:
First make sure you really need to find the square root. For many uses,
you could just use the calculated square of the magnitude.
- Posted by user923005 on May 2nd, 2008
On May 1, 1:59*pm, user923005 <dcor...@connx.com> wrote:
A benchmark shows this to be the fastest on modern hardware, but I do
not know about embedded CPUs. I guess they should be tested.
unsigned long isqrt26(unsigned long input)
{
unsigned long s = 0;
unsigned long s21 = 1073741824;
if (s21 <= input)
s += 32768, input -= s21;
s21 = (s << 15) + 268435456;
if (s21 <= input)
s += 16384, input -= s21;
s21 = (s << 14) + 67108864;
if (s21 <= input)
s += 8192, input -= s21;
s21 = (s << 13) + 16777216;
if (s21 <= input)
s += 4096, input -= s21;
s21 = (s << 12) + 4194304;
if (s21 <= input)
s += 2048, input -= s21;
s21 = (s << 11) + 1048576;
if (s21 <= input)
s += 1024, input -= s21;
s21 = (s << 10) + 262144;
if (s21 <= input)
s += 512, input -= s21;
s21 = (s << 9) + 65536;
if (s21 <= input)
s += 256, input -= s21;
s21 = (s << 8) + 16384;
if (s21 <= input)
s += 128, input -= s21;
s21 = (s << 7) + 4096;
if (s21 <= input)
s += 64, input -= s21;
s21 = (s << 6) + 1024;
if (s21 <= input)
s += 32, input -= s21;
s21 = (s << 5) + 256;
if (s21 <= input)
s += 16, input -= s21;
s21 = (s << 4) + 64;
if (s21 <= input)
s += 8, input -= s21;
s21 = (s << 3) + 16;
if (s21 <= input)
s += 4, input -= s21;
s21 = (s << 2) + 4;
if (s21 <= input)
s += 2, input -= s21;
s21 = (s << 1) + 1;
if (s21 <= input)
s++;
return s;
}
- Posted by user923005 on May 2nd, 2008
- Posted by Mark Borgerson on May 2nd, 2008
In article <481A1D0D.8516D9DF@yahoo.com>, cbfalconer@yahoo.com says...
an FPU! ;-)
Mark Borgerson
- Posted by Mark Borgerson on May 2nd, 2008
In article <3d850608-e3e7-40ee-8374-
b2f9149c95df@f63g2000hsf.googlegroups.com>, dcorbit@connx.com says...
has a 32-bit barrel shifter. Those aren't all that common on 16-bit
and smaller embedded processors. IIRC, you do get them when you get
up the the 32-bit processors such as the ARM. Would the
shift and add operations end up as a single instruction
on the ARM7?
Mark Borgerson