Very Strange Network Problem HELP!!!
Posted by Craig N. on July 16th, 2004


I have a very strange problem here. I've been working on it for 4 months now,
and hopefully someone can give me some advice.

Anyways, I have a client with 200 users running Citrix. The network
configuration is:

All main servers are HP DL380s with dual 3.0 GHz Xeons and 1 GB of RAM
2 domain controllers - Windows 2000
4 Citrix servers
1 SQL server
1 Print Server
1 EMC CLARiiON SAN
1 Lotus
1 HP-UX running BPCS
Switches and hubs:
4 Intel Express Hubs 12 Port
2 3COM hubs
2 HP 2524 Switches
2 Cisco Catalyst 2900 Switches
1 Cisco Catalyst 3500 Switch
4 Old SMC Hubs for the phone system

Anyways, the client was running old servers, so they upgraded to HP DL380
servers. Citrix ran great at first, then a few days later it went to crap.
Logins were slow, opening Office documents from within Office was slow (but
opening them through Explorer was not), and typing would lag really badly.
(This is Citrix specific.)

I came in to try to fix it, and we tried everything until we finally rebuilt
the domain controllers and Citrix farm, since the problem really seemed like
a server issue. Well, after rebuilding from scratch, we still had the same
problem, so we tested everything, and it all looked good. Then I got an idea:
it felt like it might be network traffic.

I took a Catalyst 2900 switch that had never been connected to the network,
and ran the servers to it, along with 6 thin clients. We turned it on, and the
thing cruised. It rocked, logins were instant, it was the first time it had
worked in 4 months. So I figured I would track down the problem, and I started
plugging in switches one at a time until the problem surfaced. I got all the
Intel hubs connected, nothing. Then I connected the 3Coms, still nothing,
and then the HP switches, so far so good.

By now it's 1:00 AM, and I decide to go home. I quickly connect the rest
of the switches all at once, and BOOM, the problem shows up. So I figure I have
it down to only a few switches, let's go home and try tomorrow. I move the
network cables over to the old servers, then I connect this switch into the
network, and leave. (It was easier than moving cables everywhere, since the
switch room is separate from the server room, and I was tired.)

Next day, I'm excited, so I decide to show it off. WELL, I move my network
cables to the old servers, and unplug the switch from the rest of the
network. The problem is back!!! I reboot the switch, I reboot the servers,
nothing. I even swap network cables. This is the exact same configuration as
last night when it was running great. I don't get it. I guess maybe something
from one Cisco switch has infected this one, who knows.

I do know that the HP switches didn't cause a problem, so I empty them out,
and plug in only my servers and thin clients, nothing else, completely
separate from the network. STILL THE SAME PROBLEM!!

This is weird, and I need to fix it. It's like whenever you add a switch, it
acquires this problem, but a brand new switch runs great until it has been
connected to the network for any length of time. It ran great on the other
switch for a good 5 hours, so I know it's not the servers, it's gotta be the
switch. Does anyone have any ideas?

I am resetting my Catalyst to factory defaults and then going to try it again.
Someone please help, I need this up by Monday.


Posted by PES on July 16th, 2004


I would make absolutely sure that there are no network worms like Blaster
on the network. I would then make sure that there are no duplex mismatches.
If a PC/server card is set to auto duplex, make sure the corresponding
switch port is auto. If duplex is forced full on either end (but not both)
you will have a duplex mismatch. At that point, I would use a packet
sniffer to troubleshoot the problem. If you've not done quite a bit of
this, it would be worth your money to hire someone who looks at packets on a
daily (or at least weekly) basis.
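
The duplex rule above can be sketched in a few lines of Python. This is only
an illustration of the reasoning, not a tool from the thread: the function
names are made up, and it assumes both ends support full duplex and that a
forced port does not take part in autonegotiation, so the auto side falls
back to half duplex.

```python
# Hypothetical helpers illustrating the duplex-mismatch rule.
# Each setting is one of "auto", "full", or "half".

def resolved_duplex(local: str, peer: str) -> str:
    """Return the duplex mode ("full" or "half") that `local` ends up using."""
    if local != "auto":
        return local   # a forced setting is used as-is
    if peer == "auto":
        return "full"  # both sides negotiate; assumes both support full duplex
    return "half"      # peer is forced and does not negotiate: auto falls back to half


def has_duplex_mismatch(nic: str, switch_port: str) -> bool:
    """True when the two ends of the link settle on different duplex modes."""
    return resolved_duplex(nic, switch_port) != resolved_duplex(switch_port, nic)


if __name__ == "__main__":
    for nic, port in [("auto", "auto"), ("full", "auto"), ("full", "full")]:
        print(nic, port, "mismatch" if has_duplex_mismatch(nic, port) else "ok")
```

The one case that can look surprising is a card forced to half against an auto
port: under these assumptions both ends run half duplex, so there is no
mismatch.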



Posted by Craig N. on July 16th, 2004


Thanks, I ran a sniffer, and I had 0 network connectivity during the system
hang. Virus scans show no virus activity, but I will keep it in mind.




Posted by Craig N. on July 16th, 2004


Here are a few things I caught using Ethereal, if anyone can tell me what they
mean. Everything looks like normal traffic, except from one PC and the
Citrix boxes. I can export the file to text and e-mail it if anyone wants
some more detail, just email me at cnash@plego.com.

This is only half of it, but you get the idea. Out of hundreds of PCs and
about 15 random servers, these are the only ones doing this particular
thing.

Source                  Destination      Proto  Info

Colleen-pc.company.int  192.168.102.6    TCP    3480 > 5321 [PSH, ACK] Seq=1 Ack=0 Win=54512 Len=120
Colleen-pc.company.int  192.168.102.6    TCP    3480 > 5321 [ACK] Seq=1 Ack=0 Win=54512 Len=120
Colleen-pc.company.int  192.168.102.6    TCP    3480 > 5321 [SYN] Seq=0 Ack=0 Win=54512 Len=120 MSS=1460
Colleen-pc.company.int  192.168.102.14   TCP    3479 > 1352 [ACK] Seq=1 Ack=0 Win=64512 Len=0
Colleen-pc.company.int  192.168.102.14   TCP    3479 > 1352 [SYN] Seq=0 Ack=0 Win=64512 Len=0 MSS=1460
---------------
Then on Citrix, I have a bunch of these, on all the servers:

Cxp03.company.int       192.168.102.150  TCP    1494 > 1041 [ACK] Seq=0 Ack=0 Win=63412 Len=0
-----------------------------------
Along with a LOT of these:

Cxp03.company.int       192.168.102.150  TCP    [TCP Previous segment lost] 1494 > 1041 [PSH, ACK] Seq=121622 Ack=4049 Win=63783 Len=1459

Cxp03.company.int       192.168.102.150  TCP    [TCP Previous segment lost] 1494 > 1041 [PSH, ACK] Seq=2045131 Ack=22451 Win=63783 Len=1459
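
For anyone who wants to tally these from a text export, here is a rough Python
sketch that counts the "[TCP Previous segment lost]" lines per port pair. It
assumes the Info column looks like the lines above; the regular expression and
the function name are my own guesses and not part of Ethereal.

```python
import re
from collections import Counter

# Matches an Info field such as
#   "[TCP Previous segment lost] 1494 > 1041 [PSH, ACK] Seq=... Len=1459"
# The "lost" group is only set when the analysis marker is present.
INFO_RE = re.compile(
    r"(?P<lost>\[TCP Previous segment lost\]\s*)?"
    r"(?P<sport>\d+) > (?P<dport>\d+) \[(?P<flags>[A-Z, ]+)\]"
)


def count_lost_segments(info_lines):
    """Count lines flagged as 'Previous segment lost', keyed by (src port, dst port)."""
    counts = Counter()
    for line in info_lines:
        match = INFO_RE.search(line)
        if match and match.group("lost"):
            counts[(int(match.group("sport")), int(match.group("dport")))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "1494 > 1041 [ACK] Seq=0 Ack=0 Win=63412 Len=0",
        "[TCP Previous segment lost] 1494 > 1041 [PSH, ACK] Seq=121622 Ack=4049 Win=63783 Len=1459",
        "[TCP Previous segment lost] 1494 > 1041 [PSH, ACK] Seq=2045131 Ack=22451 Win=63783 Len=1459",
    ]
    for ports, n in count_lost_segments(sample).items():
        print(ports, n)
```

A count that is concentrated on one server or one client would point at that
link or port rather than at the network as a whole.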


Posted by PES on July 16th, 2004




The traffic to port 5321 may be Firehotcker (a trojan). I'm not familiar with
it.

The traffic to port 1352 may be Lotus Notes.

The plain ACKs from port 1494 are normal for an established Citrix session.


Why are the TCP segments lost? Could it be that your sniffer does not have
the same capacity as the server's connection? Or could the network be dropping
packets? Also, I'm not sure, but a length of 1459 seems large for Citrix.
Is there a router between cxp03.company.int and 192.168.102.150? If so, is
ip unreachables enabled? How about the duplex settings on the switch ports
that cxp03 and 192.168.102.150 are on? Any input or output errors on the
switch ports?



Posted by Craig N. on July 17th, 2004


Problem seems to be fixed. This is very very strange. It is either the HP
Lights-Out board, or it is the kernel that HP installs if you use the HP
install disks. We disabled the board, then reinstalled the server with an
actual Win2k disk, and I can't recreate the problem. Of course, I had it
working once before, and, well, that went to crap.

Anyways, I still don't get why it was working great the other day on that one
single switch, and bad when I plugged in the others. BUT, it works now
plugged into all switches.

Thanks for your help.




