- mixing C and assembly
- Posted by David Brown on April 27th, 2008
cbarn24050@aol.com wrote:
I don't think generalisations like that are appropriate, no matter how
you phrase them. But don't worry - no offence taken.
As I say, you are not likely to get useful direct evidence here. Even
if someone posts some code (such as Walter's white paper), it's easy to
say "that's just a test case". The same applies the other way too. So
all you get is witness declarations, which you can credit or not.
I *could*, but I won't. Again, it would be unprofessional.
Choose the tools that work for *you*.
But for most people (even good, experienced assembly programmers) and
most applications, C is normally a better choice of development language.
I'm not making claims that C compilers always generate smaller and
faster code than assembly - just that with a good C compiler, C is an
appropriate choice of language even on small devices like AVR Tinys.
There are some sorts of code that can be coded more compactly in
assembly, and other sorts of code where the compiler can optimise better
than an assembler programmer writing clear and maintainable assembly code.
What you are saying, in a somewhat exaggerated fashion, is that C is a
procedural programming language (although it is more correct to say that
C *supports* procedural programming). That's true, but it is also an
imperative programming language. That is to say, most lines of C code
are declarations or statements saying how a task is to be accomplished -
function calls are only one type of statement or expression available to
the programmer (compare this to Forth, in which a much higher percentage
of statements are calls). Splitting your code into separate functions
is an important aspect of structured programming in C - but it is not
the "whole point of the language". Most assembly programmers divide up
their code into functions and procedures in a similar manner to C
programming.
- Posted by Chris H on April 27th, 2008
In message <4813CE3E.A2ECC0FD@yahoo.com>, CBFalconer
<cbfalconer@yahoo.com> writes
Possibly, but the C programmer will beat that assembler programmer on all
other MCUs...
He will also be a close second on MCU A... More to the point, he will be
able to turn out reliable, more easily maintainable applications
faster.
--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills Staffs England /\/\/\/\/
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
- Posted by David Brown on April 27th, 2008
CBFalconer wrote:
It took a bit of convincing to make me understand how C could work
better than assembly for small processors. Part of that was from my own
experience, and part of it was that, until fairly recently, C compilers
were not good enough to compete with an experienced assembly programmer.
But let me give you a brief example of the sort of thing a C compiler
can easily do, that an assembly programmer cannot do while still writing
maintainable and legible code.
On the COP8 processor, the two most important addressing modes for
arithmetic instructions are direct access (in which the memory address
is specified in the instruction), and indirect via the B register.
Direct access instructions take 3 bytes and 4 cycles, indirect take 1
byte and 1 cycle. (This is from memory, so I might make a few errors
here.) Suppose you have a function that adds two global variables, and
stores the result in a third. The natural assembly code is something
like this:
        .sect   .data
var1:   .dsb    1
var2:   .dsb    1
sum:    .dsb    1
        .endsect

        .sect   .code
AddNumbers:
        ld      a, var1         ; 3 bytes, 4 cycles
        add     a, var2         ; 3 bytes, 4 cycles
        x       a, sum          ; 3 bytes, 4 cycles
        ret                     ; 1 byte, x cycles
        .endsect
Total: 10 bytes, 12 cycles + ret (I can't remember how many cycles ret
takes).
C code:
#include <stdint.h>

uint8_t var1, var2, sum;

void AddNumbers(void) {
    sum = var1 + var2;
}
Possible compiler-generated assembly code:
        .sect   .data
var1:   .dsb    1
var2:   .dsb    1
sum:    .dsb    1
        .endsect

        .sect   .code
AddNumbers:
        ld      b, #var1        ; 2 bytes, 2 cycles IIRC
        ld      a, [b+]         ; 1 byte, 1 cycle
        add     a, [b+]         ; 1 byte, 1 cycle
        x       a, [b]          ; 1 byte, 2 cycles IIRC
        ret                     ; 1 byte, x cycles
        .endsect
Total: 6 bytes, 6 cycles + ret
Obviously, an assembler programmer could write this code directly as
well. But it only works as long as var1, var2 and sum are ordered in
this manner. If they were split up, the assembly code would break -
maintenance and legibility suffer greatly. Perhaps you have other
routines that could be optimised using [b] mode if the data were in a
different order. Writing the assembly by hand, you've got to figure out
which ordering works best - and re-write your functions to take
advantage of the ordering. A small change to one part of the code means
a re-write for other parts of the code - that's not a good plan for
software development. Thus in realistic programs, the programmer will
go for the pessimistic code that works regardless of the orderings. A
compiler, on the other hand, can pick a reasonable (not *optimal* - that
is not achievable in polynomial time, but pretty good nonetheless)
ordering based on variable usage, and it will make use of that ordering
when generating function code.
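As a rough sketch of the access pattern only (this is not output from any real COP8 compiler), the pointer walk below mirrors the `ld b, #var1` / `[b+]` sequence in portable C. The array `vars`, the index names and the function name are made up for illustration. Putting the three bytes in one array is what guarantees the adjacency that the optimised assembly silently depends on.

```c
#include <stdint.h>

/* Hypothetical layout: the three bytes sit next to each other in one
   array, which is the ordering the [b+] version relies on. */
enum { VAR1, VAR2, SUM, VAR_COUNT };
uint8_t vars[VAR_COUNT];

void AddNumbersSequential(void)
{
    uint8_t *b = &vars[VAR1];   /* like: ld  b, #var1 */
    uint8_t a = *b++;           /* like: ld  a, [b+]  */
    a = (uint8_t)(a + *b++);    /* like: add a, [b+]  */
    *b = a;                     /* store the result in the third byte */
}
```

If someone later moves `sum` out of the array, this function has to be rewritten, which is exactly the maintenance cost described above when the ordering is managed by hand.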
- Posted by David Brown on April 27th, 2008
Walter Banks wrote:
That's just tail call elimination (changing a "call X; ret" into a "jmp
X"), which is a standard optimisation technique (some assemblers will do
that for you).
A better example would be:
WriteSpace:
        ld      a, #' '
WriteChar:
        st      a, outputCharacter
        ret
with C code:
extern volatile char outputCharacter;

void WriteChar(char c) {
    outputCharacter = c;
}

void WriteSpace(void) {
    WriteChar(' ');
}
- Posted by David Brown on April 27th, 2008
Robert Adsett wrote:
A quick test on avr-gcc 4.2.2, using 16-bit and 8-bit ints rather than
32-bit and 16-bit (since it's an 8-bit cpu) reveals that avr-gcc is
smart enough to do an 8-bit x 8-bit -> 16-bit multiply as desired. It's
a little harder to see exactly what is happening for bigger numbers and
for division, since these use library calls - certainly the compiler
will generalise some of these functions. But for the very common case
of the multiply like this, you get optimal code.
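For reference, a minimal sketch of the kind of source such a test might use. The function name `mul8x8` is hypothetical, and whether a particular compiler turns this into a single hardware multiply has to be checked in its own listing.

```c
#include <stdint.h>

/* 8-bit x 8-bit -> 16-bit multiply. One operand is widened before the
   multiplication so the product is computed in at least 16 bits. */
uint16_t mul8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a * b);
}
```

The largest possible product, 255 * 255 = 65025, fits in 16 bits, so the result never overflows.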
- Posted by Walter Banks on April 27th, 2008
Robert Adsett wrote:
Robert,
A lot of the approach depends on the processor. We use the "as if"
rule a lot in code generation. In general, an 8*8->16 bit multiply will
use the processor's 8*8 multiply if we can. Similarly, we grab the MS 8 bits
when we multiply two 8-bit fracts rather than casting and using
a 32-bit multiply.
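A rough illustration of the fract case in plain C (not Byte Craft's actual code generation). It assumes unsigned 8-bit fractions where the stored byte means value / 256; the name `fract8_mul` is made up.

```c
#include <stdint.h>

/* Multiply two unsigned 8-bit fractions in [0, 1), each stored as
   value / 256. Only the most significant byte of the 16-bit product
   is kept, so no wider multiply is needed. */
uint8_t fract8_mul(uint8_t a, uint8_t b)
{
    uint16_t product = (uint16_t)((uint16_t)a * b);  /* 8 x 8 -> 16 bits */
    return (uint8_t)(product >> 8);                  /* keep the MS 8 bits */
}
```

For example, 0.5 * 0.5 is `fract8_mul(128, 128)`, which gives 64, i.e. 0.25.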
Regards
--
Walter Banks
Byte Craft Limited
Tel. (519) 888-6911
http://www.bytecraft.com
walter@bytecraft.com
- Posted by Walter Banks on April 27th, 2008
CBFalconer wrote:
http://www.bytecraft.com/C_versus_Assembly
Regards
- Posted by Walter Banks on April 27th, 2008
CBFalconer wrote:
I should have used fixed point type to make the listing fragment
clearer. This is the source used in the example.
void bar (void);

void foo (void)
{
    NOP();
    bar();
}

void bar (void)
{
    NOP();
}

void main (void)
{
    foo();
    bar();
}
Regards
- Posted by Hans-Bernhard Bröker on April 27th, 2008
Walter Banks wrote:
Huh? Is something wrong with my writing or with your reading? Where in
the above did you see me talking about maintainability or difficulty?
The issue at hand is _speed_ and _size_. No more, no less.
That's why the prudent assembly programmer would secure such tricks with
assembly-time assertions, i.e. make the assumptions explicit, and make
sure that the code fails to translate if any of them is no longer true.
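The same idea can be sketched in C11 with `_Static_assert`, shown here only as an analogue of an assembly-time check. The struct `adder_vars` and the function name are hypothetical; they group the three variables from the earlier AddNumbers example so that the layout assumption can be stated and checked at translation time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical grouping of the three variables from the AddNumbers example. */
struct adder_vars {
    uint8_t var1;
    uint8_t var2;
    uint8_t sum;
};

/* Translation fails if the layout assumption stops holding. */
_Static_assert(offsetof(struct adder_vars, var2) ==
               offsetof(struct adder_vars, var1) + 1,
               "var2 must directly follow var1");
_Static_assert(offsetof(struct adder_vars, sum) ==
               offsetof(struct adder_vars, var2) + 1,
               "sum must directly follow var2");

struct adder_vars adder;

void AddNumbersChecked(void)
{
    /* Walk the struct byte by byte; valid only because of the checks above. */
    unsigned char *b = (unsigned char *)&adder + offsetof(struct adder_vars, var1);
    unsigned char a = *b++;
    a = (unsigned char)(a + *b++);
    *b = a;
}
```

If someone inserts a field between `var1` and `var2`, the build stops with the assertion message instead of silently producing wrong sums.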
Agreed. But you're still missing the point under discussion.
- Posted by cbarn24050@aol.com on April 27th, 2008
On Apr 27, 10:23 am, David Brown
Walter's paper isn't even a test case. You're right that I won't get
evidence here; that would require some effort rather than just waffle.
That's a different claim from Walter's. Not being familiar with AVRs, I
couldn't say one way or the other, but it's no good on small PICs.
Splitting your code into separate functions
A C function is not just a subroutine; it's much more than that. It's
a complete stand-alone program, with no dependencies on either the
calling program or the programs it calls. The idea is that each function
can be developed, tested and debugged independently.
I've seen that one before. gforth! It won't run on Windows.
They can't sell this one any more: some European directive, though they don't
say which one.
So I'm still waiting.
- Posted by Chris H on April 27th, 2008
In message <fv1ut9$a74$02$1@news.t-online.com>, Hans-Bernhard Bröker
<HBBroeker@t-online.de> writes
In which case you lose... I can read the C. I can't read the ASM so I
won't be able to see that what you have done is the same as the C or
even correct.... :-)
The whole point is that the C can be as fast and as small as the ASM but
MUCH easier to read, debug and maintain. Certainly far faster to write.
(BTW I do enjoy writing in asm but that is not the point)
Also the compilers can do some optimisations that humans find difficult
to do. Some optimisations involve the linker, not just the compiler, so I
am told by a compiler writer (no, it was not Walter).
So in SOME cases an experienced asm writer MIGHT be able to do smaller
faster code than the compiler but certainly NOT in the same time frame.
Also that particular experienced ASM programmer can probably only do
that for one or two MCU and not for all types of program.
- Posted by CBFalconer on April 27th, 2008
David Brown wrote:
Defining 'optimal' is a varying target. Among others, see Knuth.
In particular, in the past I have compromised on an 8 * 16 -> 24
bit heart, two of which, with an addition, produced a 16 * 16 -> 32
multiplication. This had, on the machine of interest (an 8080),
significant advantages, i.e. about a 50% decrease in multiplication
times. Other games are available at the compile stage where one
operand is constant, especially those where the multiplier consists
of some solid string of 1 bits.
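A minimal C sketch of both ideas, written for a host compiler rather than an 8080, with all function names made up. `mul8x16` stands in for the 8 * 16 -> 24 bit core, `mul16x16` builds the 32-bit product from two calls plus an addition, and `mul_by_0x78` shows the run-of-ones trick for one assumed constant.

```c
#include <stdint.h>

/* 8 x 16 -> 24-bit product, returned in a 32-bit integer. */
uint32_t mul8x16(uint8_t a, uint16_t b)
{
    return (uint32_t)a * b;
}

/* 16 x 16 -> 32 built from two 8 x 16 products and one addition:
   a * b = lo(a) * b + (hi(a) * b) * 256. */
uint32_t mul16x16(uint16_t a, uint16_t b)
{
    uint32_t low  = mul8x16((uint8_t)(a & 0xFF), b);
    uint32_t high = mul8x16((uint8_t)(a >> 8), b);
    return low + (high << 8);
}

/* Constant multiplier with a solid run of 1 bits:
   0x78 = 0111 1000b = 2^7 - 2^3, so x * 0x78 = (x << 7) - (x << 3). */
uint32_t mul_by_0x78(uint16_t x)
{
    return ((uint32_t)x << 7) - ((uint32_t)x << 3);
}
```

The run-of-ones form replaces four shift-and-add steps (one per set bit) with two shifts and a subtraction.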
--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
** Posted from http://www.teranews.com **
- Posted by CBFalconer on April 27th, 2008
Walter Banks wrote:
Well, that executes foo (and thus bar), followed by bar. I see no
savings there from fall-thru. See my message of Sat. 11:13 am EDT
-0400.
- Posted by CBFalconer on April 27th, 2008
David Brown wrote:
But that doesn't do anything, because normal C executes a return on
the closing brace. Am I missing something?
- Posted by Walter Banks on April 27th, 2008
Hans-Bernhard Bröker wrote:
It is this type of check that is already embedded in C compilers.
Programming in asm is an exercise in both application programming
and implementation. In C the focus is on application algorithms
with an implementation outline.
I don't think so. Most of what I have been saying is to use the correct
tool for the job. This is not an asm vs C issue. The importance
of the work we did that created the white paper is that it is proof that
C did not have to be at a performance disadvantage to asm.
That said, let's look at the other issues and see where C has
an advantage.
We are increasingly seeing ISAs that were designed specifically
for machine-generated code. Our focus has always been on
making the code generation process easier.
Regards
- Posted by Robert Adsett on April 27th, 2008
In article <6fmdnRynGbho0onVRVnyjAA@lyse.net>, David Brown says...
So at least some compilers do so. Thanks.
Robert
- Posted by Walter Banks on April 27th, 2008
CBFalconer wrote:
There is a savings.
Look at the listing I posted before. It follows in fixed point type.
Please don't start a rant about HTML.
w..
                              void bar (void);

                              void foo (void)
                              {
0100 9D        NOP              NOP();
                                bar();
                              }

                              void bar (void)
                              {
0101 9D        NOP              NOP();
0102 81        RTS            }

                              void main (void)
                              {
0103 AD FB     BSR   $0100      foo();
0105 20 FA     BRA   $0101      bar();
                              }

__MAIN:
FFFE 01 03
- Posted by David Brown on April 27th, 2008
CBFalconer wrote:
You must be missing something :-) Your example code was not very
helpful, because your first version implied that foo is a callable
function in its own right - making a combined fall-through foobar would
require duplicating the code for foo. Thus Walter did a direct
translation to C and generated code that was slightly better than your
first assembly code. In the code I've given, I wrote an assembly
function with two distinct entry points, and the typical equivalent C
code for it. The question is, will Walter's C compiler generate a
fall-through here?
- Posted by David Brown on April 27th, 2008
CBFalconer wrote:
Yes, "optimal" can mean different things - code size, speed, stack use
and ram size being the most common points. "optimal" also depends on
things like shared library code, and any other information that the
compiler may have. That's why I restricted my test to a simple 8x8->16
multiply on the AVR - the generated code is simple enough to be optimal
in every way.
- Posted by Robert Adsett on April 27th, 2008
In article <481452FF.C82B6E1C@bytecraft.com>, Walter Banks says...
Good to know, thanks Walter.
Robert