Warning for scientists: G4, gcc and trigonometric math functions are dog slow
Posted by Siegfried Gonzi on March 7th, 2004


I can now show you at least 3 real-life problems where a G4 falls behind the
Celeron by a wide margin. When I was researching the ibook I always wondered
whether my Linux programs would work correctly on it, but I had not
expected gcc to perform that much worse on Mac OSX. It is unbelievable how
badly the gcc compiler actually performs.

Today, in addition to the matrix-matrix multiplication and almabench
(coyote), I tested my Mie scattering code. The code is common in atmospheric
science and describes the scattering of radiation by a particle. And it
turned out that in that case, too, the Celeron outperforms the G4.

I mean the difference between 5 seconds (for the G4 ibook 800 MHz) and the
2.5 seconds for the 1000 MHz Celeron laptop is more than embarrassing.

Hey, we are speaking here of an inferior Celeron laptop vs. a G4 processor.

In retrospect, I am not sure whether I would have bought the ibook if I had
been aware of that thriller.

From scientist to scientist, I will tell you the following:

a) Buy the C/C++ $500.- IBM compiler for Mac OSX
b) Buy a Celeron 2.4 GHz laptop and you will get the power of a G4 ibook
with 5 GHz, provided you use gcc
c) Buy a 400 MHz Celeron laptop and you will get the performance of a G4 800
MHz ibook.
d) Do not fall for the belief that a desktop machine will perform
better: G4/G5 in combination with gcc cannot do it better - no chance.

[All of that does not take into account INTEL's own C compiler, which by
itself could make the situation even worse for the G4].

A good rule of thumb: Celeron for power.

I am by no means against commercial software, but paying an additional $500.-
is not very consumer friendly. What are your arguments at this point? Say
you succeeded in convincing your colleagues that the higher prices of the
Apple are worth the money and that the G4/G5 has the speed of an INTEL
processor which must run at higher frequencies. Then do not forget to tell
them the goody that only IBM's $500.- compiler will be their friend.

What does that mean? Suggest INTEL to your friends and suggest Apple to your
enemies. Again, we are speaking here: Celeron vs. G4. And Apple does not
educate people at their www.apple.com:

==
gcc 3.3 - Generate code using the latest C, C++, Objective-C open source
compiler with optimizations for the G5 processor.
==

What are those optimizations? Again: we are speaking of a Celeron vs. G4. The
IBM compiler cannot be an excuse for that.

Fensterbrett

Posted by Jim Polaski on March 7th, 2004


In article <BC712AE6.4FF%siegfried.gonzi@stud.uni-graz.at>,
Siegfried Gonzi <siegfried.gonzi@stud.uni-graz.at> wrote:

Do you have anything positive to say?

--
Jim Polaski
"The measure of a man is what he will do
knowing he will get nothing in return."

Posted by Woofbert on March 7th, 2004


In article <BC712AE6.4FF%siegfried.gonzi@stud.uni-graz.at>,
Siegfried Gonzi <siegfried.gonzi@stud.uni-graz.at> wrote:

<snip>

Are you a scientist?
Why are you comparing the IBM C/C++ compiler on PPC with gcc on
Celeron? It would be interesting to see what happens when the same
comparison is made with gcc on both platforms.

The argument that "The IBM compiler cannot be an excuse for that" is
really just an appeal to authority, and your whole point is incomplete
until the rest of the research is done.

--
Woofbert, Chief Rocket Surgeon, Infernosoft
Woofbert's Law on Learning Linux: When attempting to learn Linux,
study it thoroughly before you begin.

Posted by Carsten Hansen on March 7th, 2004



"Siegfried Gonzi" <siegfried.gonzi@stud.uni-graz.at> wrote in message
news:BC712AE6.4FF%siegfried.gonzi@stud.uni-graz.at...
You are providing no proof that the IBM compiler would do any better. That
disqualifies you as a scientist in my book.

The x86 architecture has instructions (fsin and fcos) for doing sine and
cosine directly. Without doubt the micro-code for doing these instructions
has been optimized by Intel (and AMD). They may even take advantage of SSE2
for doing things in parallel. All the data needed for the calculations is
internal to the processor.
The PowerPC on the other hand does not have transcendental instructions. So,
it has to do sine and cosine in software. The coefficients for the power
series approximations have to be read from memory. Moreover, as Altivec does
not support double precision floating-point arithmetic, the Altivec unit
cannot be used.
That stacks the deck against the PowerPC. I doubt any compiler can alleviate
the problem.
In fact it isn't really a compiler issue. It's a library issue. It would be
trivial for Apple to provide optimized math libraries for the gcc compiler.
Actually, I would be surprised if they aren't already doing that.
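
As a rough illustration of what evaluating sine "in software" involves, here
is a minimal C sketch, assuming a plain Taylor polynomial with a crude range
reduction. It is not what Apple's or anyone else's math library actually does
(production libraries use more carefully chosen coefficients and a far more
careful reduction), and the names poly_sin and SIN_COEFF are made up for this
example. The point is only that every call costs a series of multiplies and
adds plus loads of the coefficients from memory, where x86 code can issue a
single fsin.

```c
#include <assert.h>

#define PI 3.14159265358979323846

/* Taylor coefficients of sin(x)/x in powers of x*x: 1, -1/3!, 1/5!, ...
   These are the constants that have to be fetched from memory.
   (Illustrative choice; real libraries use tuned coefficients.) */
static const double SIN_COEFF[7] = {
    1.0,
    -1.0 / 6.0,
    1.0 / 120.0,
    -1.0 / 5040.0,
    1.0 / 362880.0,
    -1.0 / 39916800.0,
    1.0 / 6227020800.0
};

/* Hypothetical software sine: range reduction plus a polynomial. */
double poly_sin(double x)
{
    double x2, p;
    long k;
    int i;

    /* The crude reduction below is only sensible for moderate arguments. */
    assert(x > -1.0e6 && x < 1.0e6);

    /* Step 1: drop whole periods, leaving x in (-2pi, 2pi). */
    k = (long)(x / (2.0 * PI));
    x -= (double)k * 2.0 * PI;
    /* Step 2: fold into [-pi, pi]. */
    if (x > PI)  x -= 2.0 * PI;
    if (x < -PI) x += 2.0 * PI;
    /* Step 3: fold into [-pi/2, pi/2] using sin(pi - x) = sin(x). */
    if (x > PI / 2.0)  x = PI - x;
    if (x < -PI / 2.0) x = -PI - x;

    /* Horner evaluation of the polynomial in x*x. */
    x2 = x * x;
    p = 0.0;
    for (i = 6; i >= 0; --i)
        p = p * x2 + SIN_COEFF[i];
    return x * p;
}
```

With seven terms the truncation error on the reduced interval stays below
about 1e-9, so poly_sin(0.5) should agree with the library sin(0.5) to
roughly eight decimal places.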

Carsten Hansen



Posted by Siegfried Gonzi on March 7th, 2004


In article woofbert.spam-0B56D7.11581007032004@typhoon.sonic.net, Woofbert
(woofbert.spam@infernosoft.com) wrote on 07.03.2004 at 20:58:

/I actually made the two comparisons only with gcc/! See also the other two
threads on comp.sys.mac.advocacy. I do not even have the INTEL compiler.

The first two benchmarks are somewhat arbitrary, but the last one is code
which I use nearly daily.

My tests can be verified by anybody. If you would also like the Mie code,
drop me a note (though you would also need Bigloo, because I call the core
from within Bigloo).


Fensterbrett


Posted by Siegfried Gonzi on March 7th, 2004


In article hVL2c.166372$hR.3088673@bgtnsc05-news.ops.worldnet.att.net,
Carsten Hansen (hansen.c@worldnet.att.net) wrote on 07.03.2004 at 21:49:

I doubt you know how science works, otherwise you would know that science
does not happen in newsgroups. Or do you really believe I would claim such
things in a peer-reviewed journal? However, I still stand by my timings:
ibook vs. Celeron laptop (and note: they are not silly Fibonacci benchmarks).


I did an internet search and found nothing. I also tried all the
optimizations in "man gcc".

By chance: are there any people out there who own such an IBM compiler? It
would be helpful if you grabbed the code from the coyote gulch benchmark.

That gcc spends most of its time in math functions is plausible, especially
since some Bigloo programs take exactly the same time as the original C
code. Bigloo compiles to C via gcc, and it would otherwise be really strange
for the Scheme code and the C code to show exactly the same execution time.
It is a hint that something is going on under the hood and that there is a
bottleneck somewhere.


Thanks,
Fensterbrett



Posted by Jim Polaski on March 7th, 2004


In article
<hVL2c.166372$hR.3088673@bgtnsc05-news.ops.worldnet.att.net>,
"Carsten Hansen" <hansen.c@worldnet.att.net> wrote:

I think you'd better run over to VT and let them know that their "Big
Mac" will be unsuitable for scientific calculations. That is what
they're planning on using it for you know.

Let us know what they say.

--
Jim Polaski
"The measure of a man is what he will do
knowing he will get nothing in return."

Posted by Snit on March 7th, 2004


"Siegfried Gonzi" <siegfried.gonzi@stud.uni-graz.at> wrote in
BC71506B.594%siegfried.gonzi@stud.uni-graz.at on 3/7/04 2:19 PM:

LOL. I think the idea that the scientific method is limited to any
particular venue is silly. Did you really mean to imply that?

If so, how was science done before the journals existed? Or why did the
journals exist before science did?


Posted by Siegfried Gonzi on March 7th, 2004



In article jpolaski-F9CFB0.13093307032004@netnews.comcast.net, Jim
Polaski (jpolaski@NOync.net) wrote on 07.03.2004 at 20:09:

Yes, no problem. I said a lot of good things (see other threads). But I also
know that only provocative posts gain attention and people will try to prove
that I am silly and wrong. I learn the most then (see the other response of
the guy with math libraries explanations; it was fairly good).

In fact, it is only because of Mac OSX that I did not go back to the Apple
dealer and trade it in. The ibook is fairly new (a few days old). My old
Linux laptop is nearly dead and I am just migrating my programs from it to
the ibook. The ibook is only a third computer. I use Linux and the Sun OS at
my university.

Sorry, but I use my ibook a lot for off-line calculations (preparing files
for the workstations) and it is legitimate to look at how well it
stacks up against my old (allegedly) inferior Celeron.

As I said: if I didn't like the Mac OSX I would have traded it in for an
INTEL.

Still, the performance penalty of gcc is outstanding. Okay, it sucks most on
math functions.

It maybe sounds harsher than it is, because if one used Python on the
Mac nobody would notice a penalty, since Python itself is more than slow.

I was never impressed by benchmarks. Hey, I use Bigloo, which is always 2 to
3 times slower than good C. But it is reasonable to expect that an ibook G4
800 MHz is at least on par with a Celeron 1000 MHz laptop.

Peace brother,
Fensterbrett


Posted by Carsten Hansen on March 7th, 2004



"Siegfried Gonzi" <siegfried.gonzi@stud.uni-graz.at> wrote in message
news:BC71506B.594%siegfried.gonzi@stud.uni-graz.at...
I have not questioned your timings. I believe them. I actually gave a
plausible explanation (that you snipped).
But you gave no justification for recommending people to buy the IBM
compiler. As is clear from your statement below, you have no idea how it
would perform. That is science???


I have no idea what you searched for or what you expected to find.


If the bottleneck is the transcendental functions, then any programming
language would show similar execution time.

Carsten Hansen



Posted by Siegfried Gonzi on March 7th, 2004



In article AHM2c.166636$hR.3092940@bgtnsc05-news.ops.worldnet.att.net,
Carsten Hansen (hansen.c@worldnet.att.net) wrote on 07.03.2004 at 22:42:


Oh wait, friend. If you believe I am an authority and that you should buy
what I say, oh man. I got the compiler advice in another thread. However,
since I did not try it myself I should better shut up, but on the other hand
IBM must be aware of the slow math functions; otherwise what are they
benching?

Still all that has nothing to do with science (not my first post and not any
post which will follow in that thread).

I do not know what you mean by that, but do the following: grab the code
for the coyote gulch benchmarks, run it on a Mac OSX (gcc) and an
INTEL (gcc) machine, and then report your results.

Fensterbrett


Posted by Herb Singleton on March 7th, 2004


In article <BC712AE6.4FF%siegfried.gonzi@stud.uni-graz.at>,
Siegfried Gonzi <siegfried.gonzi@stud.uni-graz.at> wrote:

[snip]

Have you tried voicing your concerns on the Apple SciTech mailing list?

<http://lists.apple.com/mailman/listinfo/scitech>

Posted by Siegfried Gonzi on March 7th, 2004


In article usenet2-ECF4EE.17335707032004@comcast.ash.giganews.com,
Herb Singleton (usenet2@cross-spectrum.com) wrote on 07.03.2004 at 23:33:

Thanks for the reference. I posted my concerns on Apple's Unix list and
clearly stated that I would be more than happy if an Apple engineer read
them. I mean they are not stupid there and things will go gangbusters,
though often slowly. But hey, I am young (30 years old):

http://discussions.info.apple.com/We...ec5.0@.eeeb8d4

Fensterbrett



Posted by Carsten Hansen on March 7th, 2004



"Jim Polaski" <jpolaski@NOync.net> wrote in message
news:jpolaski-9FF960.15390307032004@netnews.comcast.net...

I have made no judgement about the suitability of the PowerPC chip for
scientific calculations. I gave an explanation for why transcendental
functions may be slower on the PowerPC than on x86. There is a lot more to
scientific calculations than just using sine and cosine (e.g. Linpack used
for solving linear equations; "The best performance on the Linpack benchmark
is used as performance measure for ranking the computer systems",
www.top500.org).
But even if that turns out to be true, there are ways around it. You
structure your code differently, e.g. using lookup tables (and possibly
interpolation) instead of recalculating sine and cosine over and over again.
My hunch is that code originally written for x86 using transcendental
functions will not do especially well on the PowerPC if you just do a
recompile.
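
The lookup-table idea can be sketched as follows in C. This is a minimal
sketch under stated assumptions, not tuned code: the table size of 1024
intervals and the names sine_table_init, table_sin and slow_sin are all
chosen for illustration. slow_sin is only a stand-in for the expensive
library routine, so that the example needs nothing beyond the C standard
headers; in real code the table would simply be filled with sin() from
<math.h> once at startup.

```c
#include <assert.h>

#define TWO_PI 6.28318530717958647692
#define TABLE_SIZE 1024            /* intervals over one period (assumed) */

static double sine_table[TABLE_SIZE + 1];
static int table_ready = 0;

/* Stand-in for the expensive routine: a plain power series that is
   accurate on [0, 2pi]. Replace with sin() from <math.h> in real code. */
static double slow_sin(double x)
{
    double term = x, sum = x;
    int n;

    for (n = 1; n <= 20; ++n) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

/* Pay for the expensive evaluation once per table entry. */
void sine_table_init(void)
{
    int i;

    for (i = 0; i <= TABLE_SIZE; ++i)
        sine_table[i] = slow_sin(TWO_PI * i / TABLE_SIZE);
    table_ready = 1;
}

/* Approximate sin(x) by linear interpolation between table entries. */
double table_sin(double x)
{
    double pos, frac;
    long k;
    int i;

    assert(table_ready);
    assert(x > -1.0e6 && x < 1.0e6);

    /* Reduce x to [0, 2pi]. */
    k = (long)(x / TWO_PI);
    x -= (double)k * TWO_PI;
    if (x < 0.0)
        x += TWO_PI;

    pos = x / TWO_PI * TABLE_SIZE;
    i = (int)pos;
    if (i >= TABLE_SIZE)           /* guard against rounding at the upper edge */
        i = TABLE_SIZE - 1;
    frac = pos - i;
    return sine_table[i] + frac * (sine_table[i + 1] - sine_table[i]);
}
```

Call sine_table_init() once, then use table_sin(x) in the inner loop. With
1024 intervals and linear interpolation the absolute error is at most about
5e-6, so whether this trade is acceptable depends on the accuracy the
calculation needs.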


Carsten Hansen





Posted by GreyCloud on March 7th, 2004


Siegfried Gonzi wrote:

Now this is real funny. How come a 2.7 GHz laptop can't play a DVD
without skipping or stalling while trying to go do something else on it,
like check my email, while on a 1 GHz G4 I watch the same DVD and check my
email and experience smooth operation?

Maybe you should go back to college and learn about how to optimize code
properly and to use the right tools for the job. After all, IBM did
license the means to AMD to manufacture their new AMD64.


Posted by Woofbert on March 7th, 2004


In article <BC714CDB.593%siegfried.gonzi@stud.uni-graz.at>,
Siegfried Gonzi <siegfried.gonzi@stud.uni-graz.at> wrote:

So in other words, in this thread you reported only on how the $500 IBM
compiler sucks compared to gcc on an Intel box, and if we want to see
the whole picture, we have to do some tedious research. Thank you ever
so much.

--
Woofbert, Chief Rocket Surgeon, Infernosoft
Woofbert's Law on Learning Linux: When attempting to learn Linux,
study it thoroughly before you begin.

Posted by Siegfried Gonzi on March 7th, 2004


In article pNN2c.166996$hR.3101468@bgtnsc05-news.ops.worldnet.att.net,
Carsten Hansen (hansen.c@worldnet.att.net) wrote on 07.03.2004 at 23:57:

As I said in another post. Your reasoning and explanations are very
valuable.

In the afternoon I found the following on the internet. Some researchers
complained that under OS 9 their scientific C code is 4 times faster than
when compiled under Mac OSX! Their reasoning then was that this is due to
Mac OSX's "native" math function library or the like.

You are right about the math functions. For example, as posted in the coyote
gulch benchmarks thread, my profiling of gmon.out clearly shows that most of
the time is spent in the cosine and sine functions. The coyote gulch
benchmark is just for that.
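
Whether the time really goes into the sine and cosine calls can also be
checked in isolation. The following is a minimal C sketch of a timing
harness, not the coyote gulch benchmark itself; time_calls and baseline are
hypothetical names. It times n calls of any function with the signature of
sin() and cos() using the standard clock() function, and the do-nothing
baseline gives the loop and call overhead to subtract.

```c
#include <assert.h>
#include <time.h>

/* Baseline with the same signature as sin() and cos(); timing it
   measures the loop and call overhead alone. */
double baseline(double x)
{
    return x;
}

/* Call f n times and return the elapsed CPU time in seconds. */
double time_calls(double (*f)(double), long n)
{
    volatile double sink = 0.0;  /* keeps the calls from being optimized away */
    clock_t start, stop;
    long i;

    assert(f != NULL && n > 0);

    start = clock();
    for (i = 0; i < n; ++i)
        sink += f((double)i * 1e-3);
    stop = clock();

    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

For example, after including <math.h> and linking with -lm, compare
time_calls(sin, 10000000) with time_calls(baseline, 10000000) on each
machine, using the same gcc options on both.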

However, my Mie code is real code and I use it quite often and it also has a
lot of math functions to perform. It is not just that 5 sec vs 2.5 sec of
the Celeron for one Mie calculation - no problem. However, if I use that Mie
code on a whole data set I have to wait 1 hour or the like on the Celeron.
In the end it would mean I have to wait 2 hours on the ibook.


Posted by Tim Smith on March 7th, 2004


In article <woofbert.spam-0B56D7.11581007032004@typhoon.sonic.net>, Woofbert wrote:
He did do that comparison. Read the thread again.

--
--Tim Smith

Posted by Tim Smith on March 7th, 2004


In article <jpolaski-9FF960.15390307032004@netnews.comcast.net>, Jim Polaski wrote:
I am unaware of any G4 supercomputer at VT. Can you give a reference?

--
--Tim Smith

Posted by Siegfried Gonzi on March 7th, 2004


In article yNN2c.747$Wc4.1590@bcandid.telisphere.com, GreyCloud
(mist@Cumulus.com) wrote on 07.03.2004 at 23:57:


Oh man, oh man. Such a clever guy. I will now go to bed. But in the meantime
write a short essay on improving code and post it here. A lot of people will
profit from your suggestions and tutorials.

Your code improvement argument is so ridiculous. Have you ever thought about
the fact that in science there exists such a thing called "sharing
of code"? Science is not college where you get tiny tasks to perform. You
cannot start over and over again implementing the same code. I pinched the
Mie code from the freely accessible scatter library. The "Bohren Huffman"
code itself is documented.

Mankind would like to move on and there is no time for implementing code
over and over again.

Note: you would be better off if you went back to college. My English is not
that good but my English reading comprehension is okay. If I re-read my
posts, I never said a word about OSX being slow or that it sucks, which
you imply in your introduction. And your DVD does not make my calculations
any faster - do you understand?

In which world are you living, guy? You read a benchmark result of a new G5
somewhere and then you believe that your own Macintosh automatically turns
itself into a Formula One racing car?


