- code optimization
- Posted by aamer on May 10th, 2008
Dear all,
Are there any hard and fast rules for code optimization in C targeting a
processor?
Thanks and regards
- Posted by Chris H on May 10th, 2008
In message <sPCdnTW5QIonxbjVnZ2dnUVZ_o3inZ2d@giganews.com>, aamer
<raqeebhyd@yahoo.com> writes
No.
Which compiler?
Which target?
--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills Staffs England /\/\/\/\/
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
- Posted by aamer on May 10th, 2008
Thanks Chris for your reply. I am working on an ARM7TDMI simulator and the
basics I know about code optimisation are
1. converting floating point code to fixed point.
2. writing assembly for critical modules.
Apart from this, are there any other methods?
Do you have any idea about textbooks on advanced C programming
covering code optimization methods?
regards
- Posted by Walter Banks on May 10th, 2008
aamer wrote:
The only hard and fast rule is the "as if" rule in the C99
standards. Optimized code must function as if it were
implemented as the standard describes.
There is surprisingly little good information available on
optimization of C. Compiler texts in general devote
a lot of space to parsing, an activity that takes a small
fraction of the time needed to implement a compiler.
The best that most texts do is describe individual optimization
techniques. As far as I know none of them deals with the
much tougher problem of managing optimization in compiled
code. Determining where to apply optimizations and, more
importantly, where not to, and compiling with an
application-level optimization strategy, makes a big
difference in the ultimate code generated.
Regards
--
Walter Banks
Byte Craft Limited
Tel. (519) 888-6911
http://www.bytecraft.com
walter@bytecraft.com
- Posted by Walter Banks on May 10th, 2008
aamer wrote:
You may want to look carefully at both of these.
1) Fixed point is about precision and floating point is about dynamic
range.
A few months ago we did some detailed metrics between fixed and floating
point code. The biggest surprise in well implemented comparisons was that,
although the fixed point was slightly smaller and faster than floating
point, the conclusion was that the choice would depend on other things
in the application.
2) In well implemented compilers asm is not an advantage. In most
compilers, algorithm choice in critical modules is more important than
asm vs C implementation.
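To make the precision versus dynamic range trade-off in point 1 concrete, here is a minimal sketch of a Q16.16 fixed-point type. The format and the names q16_16, Q16_ONE, q16_from_int, q16_mul and q16_to_int are assumptions made up for illustration; they are not taken from the measurements mentioned above.

```c
#include <stdint.h>

/* Hypothetical Q16.16 format: 16 integer bits, 16 fractional bits.
   Resolution is fixed at 1/65536; the range is only about +/-32768,
   whereas a float trades resolution for a much wider range. */
typedef int32_t q16_16;

#define Q16_ONE ((q16_16)1 << 16)

/* Convert a small integer to Q16.16 (caller keeps n within range). */
static q16_16 q16_from_int(int32_t n)
{
    return (q16_16)(n * Q16_ONE);
}

/* Multiply through a 64-bit intermediate so the product cannot overflow
   before it is scaled back down. Right-shifting a negative value is
   implementation-defined in C; an arithmetic shift is assumed here. */
static q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) >> 16);
}

/* Drop the fractional bits (same assumption about the shift). */
static int32_t q16_to_int(q16_16 x)
{
    return (int32_t)(x >> 16);
}
```

For example, q16_mul(q16_from_int(3), Q16_ONE / 2) gives the Q16.16 encoding of 1.5. Whether this beats floating point on a given target is something to measure, not assume.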
What is your application area?
Regards
--
Walter Banks
Byte Craft Limited
Tel. (519) 888-6911
http://www.bytecraft.com
walter@bytecraft.com
- Posted by Chris H on May 10th, 2008
In message <OLWdnZow-bhT7LjVnZ2dnUVZ_uLinZ2d@giganews.com>, aamer
<raqeebhyd@yahoo.com> writes
Is this for yourself or something for general use? It's a big job.
What ARM compilers are you thinking of working with (and why not use
their simulators)?
Writing ASM is not always optimisation (if anyone wants to argue, can we do
this in a separate thread please :-)
Lots. Some depend on the compiler you are using.
Are you trying to write optimised C code or a simulator that handles
optimised C code where you need to match the source to the object code?
--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills Staffs England /\/\/\/\/
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
- Posted by Grant Edwards on May 10th, 2008
On 2008-05-10, aamer <raqeebhyd@yahoo.com> wrote:
Yes.
You're welcome.
--
Grant Edwards grante Yow! Gibble, Gobble, we
at ACCEPT YOU...
visi.com
- Posted by Hans-Bernhard Bröker on May 10th, 2008
aamer wrote:
That's quite unclear for a problem statement.
Are you working on producing yet another ARM simulator, or are you
working on a C program for some ARM chip, with a simulator as your
current platform?
If the former: it's unclear a) why you're doing that, and b) why you
think that's an embedded system, and thus on-topic in this newsgroup.
If the latter, it's unclear why you think mentioning the simulator is of
any relevance.
Either way you're missing the most important rules of "code
optimization":
1) Don't do it.
2) _Still_ don't do it.
3) If you're really sure you have to: _measure_ before you do it.
And consider algorithm changes before you invest your time into code
changes that likely as not will have no effect at all, or even make
things worse.
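As a rough illustration of rule 3, here is a minimal timing sketch using the standard clock() function. The names time_it, candidate and sink are hypothetical, and on a bare-metal target without a working clock() a hardware timer or cycle counter would have to take its place.

```c
#include <time.h>

/* Hypothetical helper: call fn() `iterations` times and return the
   elapsed CPU time in seconds as reported by clock(). */
static double time_it(void (*fn)(void), unsigned long iterations)
{
    clock_t start = clock();
    for (unsigned long i = 0; i < iterations; ++i) {
        fn();
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* volatile keeps the compiler from optimising the work away. */
static volatile unsigned long sink;

/* Stand-in for the code under test. */
static void candidate(void)
{
    sink += 1;
}
```

Calling time_it(candidate, N) before and after a change, with N large enough to swamp the timer resolution, gives a number to compare instead of a guess.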
- Posted by Tomás Ó hÉilidhe on May 10th, 2008
On May 10, 9:11 am, "aamer" <raqeeb...@yahoo.com> wrote:
I'd advocate using types like "uint_fast8_t" instead of "unsigned
int"; that way you'll get good performance out of all kinds of
machine, whether they be 8-Bit, 16-Bit or 5-billion-Bit. For instance
if you use "unsigned int" on an 8-Bit microcontroller where an 8-Bit
integer would suffice, then your code will be at least twice as slow
because multiple instructions are used every time you do simple
arithmetic.
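A small sketch of what this looks like in practice, assuming a C99 <stdint.h>. The function name count_matches is made up for illustration, and whether the "fast" types actually help depends on the compiler and target.

```c
#include <stdint.h>

/* Count the bytes in buf equal to value. The counter and the index use
   "fast" types, so the implementation picks whatever width is cheapest
   on the target while still holding at least 16 bits. */
static uint_fast16_t count_matches(const uint8_t *buf, uint_fast16_t len,
                                   uint8_t value)
{
    uint_fast16_t count = 0;
    for (uint_fast16_t i = 0; i < len; ++i) {
        if (buf[i] == value) {
            ++count;
        }
    }
    return count;
}
```

The data itself stays uint8_t, since that fixes the storage size; only the working variables are left to the implementation.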
Also I'd advocate using "built-in" parts of the language where
possible, e.g.:
unsigned arr[12] = {0};
instead of:
unsigned arr;
memset(arr,0,sizeof arr);
(Also the former is fully portable for dealing with types like
pointers and floating point types whose "zero value" might not be all-
bits-zero)
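A minimal sketch of the portability point, with a hypothetical struct record and helper zero_init_ok: the {0} initialiser gives every member its proper zero value (0.0, a null pointer, '\0') whatever the bit pattern for those happens to be, which memset() does not guarantee.

```c
#include <stddef.h>

/* Hypothetical record mixing floating point, pointer and char members. */
struct record {
    double scale;
    int   *link;
    char   name[8];
};

/* Returns 1 if every member of a {0}-initialised record compares
   equal to its type's zero value, 0 otherwise. */
static int zero_init_ok(void)
{
    struct record r = {0};

    if (r.scale != 0.0 || r.link != NULL) {
        return 0;
    }
    for (size_t i = 0; i < sizeof r.name; ++i) {
        if (r.name[i] != '\0') {
            return 0;
        }
    }
    return 1;
}
```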
Another thing would be about the use of the post-increment and post-
decrement operators in a conditional. For instance:
void strcpy(char *dst, char const *src)
{
while (*dst++ = *src++);
}
The idiom of using *p++ is widespread, but unfortunately its use is no
longer advisable because hardware has moved on. I think it was the
PDP11 that had a single instruction for dereferencing a pointer and
also incrementing it at the same time, thus it was beneficial to use
*p++ wherever possible -- however modern machines don't have such an
instruction, so the assembler produced for *p++ when used as the
conditional in an if statement, for instance, might be sub-optimal. So
I'd say opt for:
for ( ; *dst = *src; ++dst, ++src) ;
Moving on...
On most machines, I would use pointers instead of element indices for
iterating thru an array. For example:
char *p = arr;
char const *const pend = arr + LENGTH;
do if ('a' == *p) return 1;
while (pend != ++p);
instead of:
unsigned i = 0;
do if ('a' == arr[i]) return 1;
while (LENGTH != ++i);
The latter, on most architectures, is a hell of a lot slower. But then
again there are some PC's that have a single instruction for "pointer
+ offset", so I can't discredit that technique altogether.
On all architectures, I advocate the use of look-up tables instead of
switch statements where applicable, especially when it's possible to
have a look-up table containing function pointers.
If you're ever dealing with a struct that has a lot of information in
it which is common to a "type", then it might be advisable to follow
C++'s idiom of removing that stuff from the struct and replacing it with
a pointer to a single object which contains all the relevant
information for that type (a V-Table, that is).
Emmm they're the main ones that come to mind right now.
- Posted by Mark Borgerson on May 10th, 2008
In article <sPCdnTW5QIonxbjVnZ2dnUVZ_o3inZ2d@giganews.com>,
raqeebhyd@yahoo.com says...
speed and optimization for space. The latter is more common
on smaller processors with limited code storage.
Here are a few tricks I've played around with to make C code
run faster:
When you have arrays of structures, pad the structures to
end up 8,16,32 or 64 bytes long. That allows the compiler
to index into the array by a left shift of the index.
This probably isn't worth the trouble on a 32-bit processor
with a hardware multiply instruction.
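A sketch of the padding trick, using a hypothetical struct log_entry whose 13 bytes of payload are padded to 16. The _Static_assert assumes a C11 compiler; on older tools a run-time check of sizeof would have to do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log record: 4 + 8 + 1 = 13 bytes of payload,
   padded by hand to 16 so that &table[i] is base + (i << 4). */
struct log_entry {
    uint32_t timestamp;
    uint16_t channel[4];
    uint8_t  flags;
    uint8_t  pad[3];      /* explicit padding up to a power of two */
};

/* Fail the build if the layout ever drifts away from 16 bytes. */
_Static_assert(sizeof(struct log_entry) == 16,
               "log_entry must stay 16 bytes long");

/* Small helper to check a size is a power of two. */
static int is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```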
When traversing arrays of elements longer than one byte, the compiler
will sometimes generate faster code if you use a local pointer (which
ends up in a register) and increment the pointer rather
than incrementing an array index, then multiplying the index by
the element size.
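The two styles side by side, as a sketch with a made-up element type struct sample. Both functions compute the same sum; which one compiles to tighter code is exactly the thing to check in the generated assembly for a given compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical multi-byte array element. */
struct sample {
    uint16_t value;
    uint16_t flags;
};

/* Index style: the address is nominally arr + i * sizeof(struct sample). */
static uint32_t sum_indexed(const struct sample *arr, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += arr[i].value;
    }
    return total;
}

/* Pointer style: a local pointer is stepped by one element per pass. */
static uint32_t sum_pointer(const struct sample *arr, size_t n)
{
    uint32_t total = 0;
    const struct sample *const end = arr + n;
    for (const struct sample *p = arr; p != end; ++p) {
        total += p->value;
    }
    return total;
}
```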
Some things like this may still be worthwhile on 8 and
16-bit processors. With 32-bit processors like the ARM,
which can combine shifts with other functions in a single
instruction, I generally trust the compiler writer, then
verify by looking at the generated assembly code.
Since I'm still solving the same kind of data logging problems
that I was working with a decade ago, while the processors
now have about 8 times the speed and memory with about 1/2
the power consumption, I've mostly quit worrying about
optimizations. That's allowed me to concentrate on clean,
maintainable code that can be delivered on schedule. Since
I'm working in a niche market where unit cost is not the primary
constraint, I can afford to use good tools and good materials.
As others will no doubt tell you: It's better to think about
algorithms than to worry about optimizations.
Mark Borgerson
- Posted by David Brown on May 10th, 2008
Tomás Ó hÉilidhe wrote:
Using the "fast" types can make sense, especially for speed-critical
code. There are advantages in using the size-specific types, however -
specifying "uint8_t" rather than "uint_fast8_t" may let the compiler (or
linter) spot range errors that would not be found if "uint_fast8_t"
boils down to a 32-bit value. Given that the compiler can often
optimise the generated code to use the best sized types available, it's
seldom worth specifying "fast" types explicitly.
That's good advice, except that using "unsigned" contradicts your
previous advice. Personally, I dislike abbreviated types like
"unsigned" - I always write the implicit "int" explicitly.
I presume you meant "unsigned arr[12];" here.
The main reason for using the {} initialiser rather than memset() or
other methods is that it gives clearer and shorter source code - smaller
and faster object code is a bonus (in some circumstances, compilers will
optimise the memset() call to the same code anyway).
It is virtually impossible to write fully portable code - and totally
impossible within the world of embedded programming. Forget the
machines that have weird values for zeros, or bizarre numbers of bits
(although some DSP's have 16-bit or 32-bit chars), or something other
than two's complement arithmetic, or non-ASCII for their basic character
set. It's not worth it - code suitable for an ARM is not suitable for
running on a 1970's mainframe anyway.
In any code review, that form would be taken out and shot. Just because
it is legal in C to write an ugly mess inside a for() statement, does
not mean that it is sensible to write it. It's not even going to
produce smaller or faster code - any compiler that can't produce tight
code for the original while() will produce poor code from this construct
too.
The first idiom is so commonly used that it is clear to any reader -
although I'd have two sets of parentheses (gcc convention to disable a
warning) and perhaps a comment to say that I really meant a single "=".
For less capable compilers, you are probably better with:
while ((*dst = *src)) {
dst++;
src++;
}
That's far clearer to the reader, and easier for a less sophisticated
compiler.
It's always important to examine the generated assembly code, and learn
to know your target architecture and your compiler's idiosyncrasies if
you want to get the best from it - don't guess randomly at the most
obfuscated expression you can think of.
First off, get yourself a decent compiler. It will do the same job, and
let you write the source code using proper array constructs.
Secondly, don't write a loop like that (first or second forms) without
using brackets - it's unclear, and changes can easily break the code.
Third, forget the silly "if (constant == variable)" form of expression
unless you are working for MISRA nazis (i.e., those that think the rules
are unbendable). The logical and sensible ordering when reading such a
comparison is normally "if (variable == constant)". If your compiler
does not spot mistakes such as using a single "=" when you meant "==",
get a better compiler or a better linter.
Do you have any evidence whatsoever for such a wild claim? A good
compiler will use pointer instructions for array access, and will do the
strength reduction turning the array loop into an incrementing pointer.
Also, there are plenty of current modern architectures that have array
memory modes that will be used as appropriate.
You can advocate all you want - fortunately most people will ignore you.
The compiler will almost always generate better code for common switch
cases than a lookup table - and will generate a jump table automatically
as necessary. This will be significantly smaller and faster than a
lookup table of function pointers. (There are plenty of good reasons
for using a table of function pointers as a code construct - it's just
that replacing switch statements is not one of them.)
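For reference, a sketch of the two forms being compared, using made-up names (enum op, apply_switch, apply_table and the op_* handlers). Both give the same results; which one produces smaller or faster code is a question for the compiler listing, not for this example.

```c
/* Hypothetical operations used by both dispatch styles. */
enum op { OP_ADD, OP_SUB, OP_MUL, OP_COUNT };

static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

/* Switch form: the compiler is free to emit compares, a jump table,
   or inline code, whichever it judges best. */
static int apply_switch(enum op op, int a, int b)
{
    switch (op) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
    default:     return 0;
    }
}

/* Table form: always an indexed load followed by an indirect call. */
static int apply_table(enum op op, int a, int b)
{
    static int (*const handlers[OP_COUNT])(int, int) = {
        op_add, op_sub, op_mul
    };

    if ((unsigned)op >= OP_COUNT) {
        return 0;             /* out-of-range guard, like default: */
    }
    return handlers[op](a, b);
}
```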
It *might* be, but it sounds very unlikely. What you describe is not a
C++ idiom, and it's not a vtable - you are describing static data members.
- Posted by Nils on May 10th, 2008
My main-points (for speed and size) are:
1. Benchmark what's worth optimizing
2. Do all algorithmic optimizations first.
3. Get a good compiler. If you're using GCC consider compiling a new one.
4. Learn what the restrict-keyword from C99 does. Most compilers support
it these days. Use restrict whenever possible, but never if you're not
sure if it can be applied.
5. Don't use unsigned integers for loop-variables unless you need the
wrap-around feature.
6. Let the compiler decide what to inline and what not. Don't inline
functions just because you think the code will benefit from it.
7. Embedded CPU's often have small caches and slow external memory. Try
to keep your working-set small. Packing multiple booleans or enums in a
single integer may look dirty (less so if you hide the dirty details
with macros), but it can increase the cache efficiency a lot.
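A sketch of point 7, packing several booleans into one byte and hiding the bit twiddling behind macros. The flag names and the FLAG_SET, FLAG_CLEAR and FLAG_TEST macros are hypothetical.

```c
#include <stdint.h>

/* Hypothetical status flags, one bit each, sharing a single byte. */
#define FLAG_READY   (1u << 0)
#define FLAG_ERROR   (1u << 1)
#define FLAG_LOGGING (1u << 2)

/* Macros that hide the masking; `word` must be a uint8_t lvalue. */
#define FLAG_SET(word, f)   ((word) |= (uint8_t)(f))
#define FLAG_CLEAR(word, f) ((word) &= (uint8_t)~(f))
#define FLAG_TEST(word, f)  (((word) & (f)) != 0)
```

With a uint8_t status variable, FLAG_SET(status, FLAG_READY) followed by FLAG_TEST(status, FLAG_READY) reads almost like separate booleans, while three flags occupy one byte instead of three or more.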
And last: It's not worth trying to outsmart the compiler. Changing loops from
indexing to pointer increment style is not worth it anymore. The
compiler will do this job for you.
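On point 4, a minimal sketch of restrict with a made-up function add_arrays. The qualifier is a promise by the programmer that the three arrays do not overlap, which lets the compiler keep loaded values in registers and reorder accesses; calling it with overlapping arrays is undefined behaviour.

```c
#include <stddef.h>

/* dst[i] = a[i] + b[i] for i in [0, n).
   restrict asserts that dst, a and b never alias one another. */
static void add_arrays(int *restrict dst,
                       const int *restrict a,
                       const int *restrict b,
                       size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```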
- Posted by Dombo on May 10th, 2008
Nils wrote:
0. Starting with clear, well structured, code will help if you need to
optimize it later.
Also use the benchmark to determine if there is a performance issue in
the first place. Though aiming for efficient code is a lofty goal, other
goals like correctness, robustness, maintainability, clarity...etc are
often at least as (and usually more) important. No one will care how
fast your code can produce incorrect results.
Algorithmic optimizations can improve performance by orders of
magnitude, code optimizations rarely improve performance by more than
30% and usually much less than that.
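To illustrate the difference in scale, a sketch comparing a linear search with a binary search over a sorted array. The names find_linear and find_binary are made up; the point is only that the second needs about log2(n) comparisons where the first needs up to n, a gap no amount of loop tweaking closes.

```c
#include <stddef.h>

/* O(n): scan every element. Returns the index of key, or -1. */
static long find_linear(const int *a, size_t n, int key)
{
    for (size_t i = 0; i < n; ++i) {
        if (a[i] == key) {
            return (long)i;
        }
    }
    return -1;
}

/* O(log n): requires a[] sorted in ascending order.
   Returns the index of key, or -1. */
static long find_binary(const int *a, size_t n, int key)
{
    size_t lo = 0;
    size_t hi = n;                      /* search the range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key) {
            lo = mid + 1;
        } else if (a[mid] > key) {
            hi = mid;
        } else {
            return (long)mid;
        }
    }
    return -1;
}
```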
Or more general: never assume some 'clever trick' will generate faster
or smaller code - instead prove that the 'clever trick' will yield the
desired effect. By prove I mean measure (before and after) and/or
check the compiler output (which also helps to develop a feel for what is
expensive and what not). Also remember that not all compilers are alike;
some compilers optimize certain code sequences better than others.
I have seen too many examples of people obfuscating the source code
assuming they are helping the compiler to generate more efficient code,
while in reality they made things performance wise no better and
sometimes even worse.
- Posted by Tom on May 10th, 2008
In article <68mc12F2u0qngU1@mid.uni-berlin.de>, Nils <n.pipenbrinck@cubic.org> wrote:
Not necessarily true:
.................... while (my_unsigned8var < 5) {}
1BF0: MOVF x85,W
1BF2: SUBLW 04
1BF4: BNC 1BF8
1BF6: BRA 1BF0
.................... while (my_signed8var < 5) {}
1BF8: BTFSC x86.7
1BFA: BRA 1C02
1BFC: MOVF x86,W
1BFE: SUBLW 04
1C00: BNC 1C04
1C02: BRA 1BF8
....................
On this particular combination of target and compiler (PIC18 with CCS C)
unsigned is always faster than signed.
- Posted by Grant Edwards on May 10th, 2008
On 2008-05-10, Tom <tom@nospam.com> wrote:
I've seen various other compilers/targets where use of unsigned
loop indexes is faster. For example, one of the tips/tricks
listed when using GCC for the MSP430 target:
Tips and trick for efficient programming
[...]
10. Use unsigned int for indices - the compiler will snip
_lots_ of code.
On second thought, that might be referring to array indexes
instead of loop indexes. Hmm...
--
Grant Edwards grante Yow! I want to read my new
at poem about pork brains and
visi.com outer space ...
- Posted by Tomás Ó hÉilidhe on May 11th, 2008
On May 10, 8:19 pm, David Brown
<david.br...@hesbynett.removethisbit.no> wrote:
I write fully-portable code all the time and I find it to be a simple
task a lot of the time. The C Standard provides you with plenty of
information to write fully-portable algorithms and programs.
- Posted by Tomás Ó hÉilidhe on May 11th, 2008
I myself only use signed integer types when I need to store negative
numbers. Other reasons for going with unsigned are:
1) With signed integer types, you get undefined behaviour upon
overflow.
2) On machines other than two's complement, arithmetic can be less
efficient with signed.
3) You can be left with a trap representation if you play around with
the bits of a signed, depending on the system.
I see signed integer types as nasty and so I only use them when I
really have to.
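A small sketch of reason 1, with hypothetical helpers next_unsigned and checked_increment: unsigned arithmetic wraps modulo 2^N by definition, while a signed increment has to be guarded because INT_MAX + 1 is undefined.

```c
#include <limits.h>

/* Unsigned: well defined for every input; UINT_MAX + 1 wraps to 0. */
static unsigned next_unsigned(unsigned x)
{
    return x + 1u;
}

/* Signed: test before adding, since overflow is undefined behaviour.
   Returns 1 and stores x + 1 in *out on success, 0 if it would overflow. */
static int checked_increment(int x, int *out)
{
    if (x == INT_MAX) {
        return 0;
    }
    *out = x + 1;
    return 1;
}
```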
- Posted by Tim Wescott on May 11th, 2008
On Sat, 10 May 2008 19:56:03 -0700, Tomás Ó hÉilidhe wrote:
"I'm only 21 years of age".
Chances are good that you're pontificating to someone who's been earning
money at this game since before you were an orgasm.
So is your experience vast, or your statement half-vast?
--
Tim Wescott
Control systems and communications consulting
http://www.wescottdesign.com
Need to learn how to apply control theory in your embedded system?
"Applied Control Theory for Embedded Systems" by Tim Wescott
Elsevier/Newnes, http://www.wescottdesign.com/actfes/actfes.html
- Posted by Eric Smith on May 11th, 2008
Walter Banks <walter@bytecraft.com> writes:
Most compilers don't need optimization, since most compilers are for
things other than general-purpose programming languages. Therefore
far more compiler writers need to know about parsing than need to
know about optimization.
Try books like _Advanced Compiler Design and Implementation_ by
Steven Muchnick.
Eric
- Posted by Walter Banks on May 11th, 2008
Eric Smith wrote:
This is a good point.
It was one of the books I was referring to that has
good descriptions of individual optimization techniques but
does not deal with optimization management and application
level optimization strategy very well.
Regards
--
Walter Banks
Byte Craft Limited
Tel. (519) 888-6911
http://www.bytecraft.com
walter@bytecraft.com