VME Auto system controller ID issue
Posted by Bo on January 8th, 2007


We are using Motorola MVME5100 boards and VxWorks 6.3. We modified the
delivered BSP to allow us to have shared memory windows across all processor
cards. Now, our problem appears to be that the auto syscon feature of the
board is not working properly. That is to say, if any card not in slot 0
is present and has its auto-config jumper set to AUTO, then we see the
following behavior:

The system controller in slot 0 can read/write into slave card N with no
problems. Then slave N can read/write into the slot 0 shared memory. So far
so good. But after a slave access to the slot 0 shared memory, any access
across the VME bus results in a hang, i.e. the system controller in slot 0
can no longer read/write to slot N.

If the jumpers on all cards (other than slot 0) are set to NO SYSCON, then all
accesses across the VME bus appear to function properly and there are no
hangs. This indicates a hardware issue to us, but we are not 100% certain.

We verified the same behavior across multiple 5100 cards (with various RAM
amounts) and got the same result. We also verified the same behavior with
slot 0 being an MVME6100 card and slot 1 being an MVME5100 card.

So... the questions are:

1) is this a known HW issue with MVME5100 cards?
2) if not, is there any possibility that the VxWorks BSP could cause the
behavior?
3) can we conclude that both/all boards think they are system controller?
4) is there a SW fix to make the auto-config jumper work as intended?


Thanks,

Bo


Posted by William Dennen on January 12th, 2007


On Mon, 08 Jan 2007 10:57:04 -0600, Bo opined:


I rather much wonder if this isn't some issue with the RMW mechanism being
used within VxWorks resulting in a lock on the local bus. Such nasty
behavior isn't seen if the transfers are not into shared memory spaces.
My recollection on the auto-config jumper is that it's sensed by the
Universe at initialization to determine if it needs to provide bus
arbitration and isn't used afterwards. That this problem comes and goes
depending on whether auto-configuration is selected does suggest
otherwise; but I doubt the root cause is the jumper setting.
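
For readers unfamiliar with the read-modify-write (RMW) idea mentioned above,
here is a minimal software model of a test-and-set lock. It is only a sketch:
TasCell and try_acquire are hypothetical names, the threading.Lock merely
stands in for the bus being held for the duration of the RMW cycle, and
nothing here is VxWorks or Universe code. The bounded retry shows why a lock
that is never released makes every later requester spin.

```python
import threading


class TasCell:
    """Models one shared-memory location with an atomic test-and-set."""

    def __init__(self) -> None:
        self._value = 0
        # Stand-in for the bus lock held while the read-modify-write runs.
        self._bus = threading.Lock()

    def test_and_set(self) -> bool:
        """Atomically read the cell and set it; True if it was free."""
        with self._bus:
            was_free = self._value == 0
            self._value = 1
            return was_free

    def clear(self) -> None:
        """Release the lock by writing zero back to the cell."""
        with self._bus:
            self._value = 0


def try_acquire(cell: TasCell, max_tries: int) -> bool:
    """Spin on test-and-set, giving up after max_tries attempts."""
    for _ in range(max_tries):
        if cell.test_and_set():
            return True
    return False
```

If the holder never calls clear(), every later try_acquire() fails no matter
how many tries it is given, which is the software-visible shape of a stuck
lock. Whether that is what happens on the hardware in this thread is not
established.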

Regards
--
Cluelessness: There are no stupid questions,
but there are a LOT of inquisitive idiots.
(despair.com)

Posted by Bo on January 15th, 2007



"William Dennen" <wdennen@gmail.com> wrote in message
news:eo901m$vbs$1@aioe.org...
Good point Bill. Do you know how I can test/change VxWorks to confirm it is
or isn't a RMW issue?


This is what I thought as well. I do recall at a previous employer we had
similar issues with the same Tundra chip---and the workaround was extra crap
that the BSP had to perform during initialization.... but I really don't
want to go that route again if avoidable.

Thanks,

Bo



Posted by William Dennen on January 16th, 2007


On Mon, 15 Jan 2007 10:54:28 -0600, Bo queried:

I wish I did; still don't have a good handle on how shared memory is
_really_ implemented in spite of mucking with it on and off for a number
of years. The point is that if the memory spaces weren't shared the
transactions would succeed, otherwise Tundra wouldn't be able to sell chip
one. I suspect your implementation is drawing out a latent defect in the
implementation of shared memory; I'm aware of another who encountered a
similar hang using a more standard configuration (but totally weird in
other ways). That too is unresolved as far as I know.

Regards

Posted by CBFalconer on January 17th, 2007


William Dennen wrote:
I have no idea whether this is applicable to the OP's problem, but
in general memory is shared as long as it is not written. If a
process wants to write in it, the page table for that process is
modified to remap that portion, a copy of the original made, and
the write then proceeds. That portion of the memory is then no
longer shared.

If the memory is truly shared, so that one process's writes show up
in other processes' memory space, then various synchronization
protocols must be used. This can involve semaphores, monitors,
critical sections, etc.

Threads are generally lightweight processes, using memory shared
with other threads in the same process, and will need the
synchronization primitives to access it.
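
As a small illustration of the point about synchronization primitives, the
sketch below has several threads update one shared counter under a lock. It
is generic thread-level sharing within one process, not VME shared memory,
and run_counter is a hypothetical name chosen for this example.

```python
import threading


def run_counter(n_threads: int = 4, increments: int = 10_000) -> int:
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # The read-modify-write of the counter is the critical section.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock in place the result is always n_threads * increments; without
it, updates from different threads could be lost.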

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>



Posted by William Dennen on January 19th, 2007


Bo
I've an idea, if you've got sufficient hardware, that may shed some light
on where the problem is. You need 4 boards, 2 5100s and 2 anything VME.
Call the 5100s A & B, the others C & D. Set up C & D so they can
read/write each other's memory and also either A or B. NO shared memory
configured for these two. You've got A & B already set up. Create the
hang condition and then:
(1) can C & D still read/write each other?
(2) can either C or D read/write to either A or B?
(3) can either A or B read/write to either C or D?

The essence of what you're trying to determine is whether the system
controller function is hosed or not. If C & D can still read/write each
other, then it is not, and the hang condition is local to A/B.
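
The interpretation of that experiment can be written down as a tiny decision
function. This is a sketch only: ProbeResults and interpret are hypothetical
names, and the rule encoded is just the one stated above (check 1 decides);
checks 2 and 3 are recorded but no further conclusion is drawn from them here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResults:
    """Outcome of the three checks run after the hang is provoked."""
    c_d_each_other: bool  # (1) can C and D still read/write each other?
    c_d_to_a_b: bool      # (2) can C or D read/write A or B?
    a_b_to_c_d: bool      # (3) can A or B read/write C or D?


def interpret(results: ProbeResults) -> str:
    """Apply the rule from the post: check (1) decides the verdict."""
    if results.c_d_each_other:
        return "system controller still working; hang is local to A/B"
    return "bus-wide failure; system controller function is suspect"
```

For example, interpret(ProbeResults(False, False, False)) gives the bus-wide
verdict, while any result with the first field True points back at A/B.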

Would I be correct in assuming that you've left the BSP configured for
a hardware TAS? (I can't remember the #define precisely, but if you mucked
with it, you know the one I mean).

Regards

Posted by Bo on January 23rd, 2007



"William Dennen" <wdennen@gmail.com> wrote in message
news:eopafk$s74$1@aioe.org...
1) C&D cannot read/write.
2) no
3) no

i.e., it 'appears' to be an honest-to-God hardware lock-up, from which only a
power cycle will recover. Scary, huh?

I don't think that TAS has been changed---at least not by me.

Thanks for the suggestions and help,

Bo



Posted by William Dennen on January 25th, 2007


On Tue, 23 Jan 2007 12:46:58 -0600, Bo opined:

Indeed it's scary and smells of an erratum; it appears that the system
controller has left the scene. I would recommend getting Tundra to look
at the issue. I'm sure they'll want a dump of the Universe and a trace if
you've got the capability. You can initiate the dialog at
http://www.tundra.com/support.aspx?bid=481&id=962. Hopefully they can
simulate the sequence ...

Regards