
FS#1667 — FS#5651 — rbx-31

Attached to Project— Network
Incident
Whole Network
CLOSED
100%
There is a problem with the chassis. A card appears to be defective and is blocking the chassis.

Date:  Thursday, 04 August 2011, 21:35PM
Reason for closing:  Done
Comment by OVH - Tuesday, 02 August 2011, 22:47PM

Card #2 is out. The other cards are no longer being detected.
We are doing a hardware reboot.


Comment by OVH - Tuesday, 02 August 2011, 22:48PM

Uptime is 1051 days, 15 hours, 22 minutes
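
For reference, the uptime figure can be turned into an approximate last-boot date. This is only a rough sketch: it assumes the uptime was read at about the time of this comment (22:48 on 02 August 2011), which the ticket does not state, and the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Assumption: the uptime was read at roughly the time the comment was posted.
observed_at = datetime(2011, 8, 2, 22, 48)

# Uptime as quoted in the comment: 1051 days, 15 hours, 22 minutes.
uptime = timedelta(days=1051, hours=15, minutes=22)

# Subtracting the uptime gives the approximate time of the previous boot.
last_boot = observed_at - uptime
print(last_boot.strftime("%A, %d %B %Y, %H:%M"))
```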


Comment by OVH - Tuesday, 02 August 2011, 22:48PM

Meanwhile, we are preparing a spare for card #2.
At least one card is defective.


Comment by OVH - Tuesday, 02 August 2011, 22:49PM

The boot is in progress.

We are taking out #1 and putting in #2.


Comment by OVH - Tuesday, 02 August 2011, 22:52PM

#2 is dead.

We are putting #1 back in and removing the other cards. We are trying to boot with #1 alone first to check whether it works.


Comment by OVH - Tuesday, 02 August 2011, 22:53PM

Removing #4 blocked the boot, so we suspect it is the source of the problem.


Comment by OVH - Tuesday, 02 August 2011, 22:54PM

#1 is continuing to boot.

#2 is booting too.


Comment by OVH - Tuesday, 02 August 2011, 22:54PM

EOBC channel fail on #2


Comment by OVH - Tuesday, 02 August 2011, 22:55PM

We have grabbed a spare chassis and replaced the faulty one with it.


Comment by OVH - Tuesday, 02 August 2011, 22:57PM

The chassis and 2 sups are fried. We have replaced them all and had to reset the router entirely. The service is up on #1. We are finishing with #2.

That was quite a workout ...


Comment by OVH - Wednesday, 03 August 2011, 13:41PM

m2 is set.


Comment by OVH - Wednesday, 03 August 2011, 13:51PM

We need to find the problem's origin. We are testing different cards in the chassis.
http://yfrog.com/kl9ambvj

Then we will test the router's cards in another chassis.
http://yfrog.com/kiwgtqrj

We put a new card in slot #2, but the card does not power on. We change the power to that slot in the chassis, and it powers on.
All right, it is the chassis. So we replace it: we take the cards out of the chassis, pull the chassis out of the rack from the back,
then we fit the chassis back in from the back and reinsert the cards.
http://yfrog.com/kepqdsyj

Everything is green and it is working. All that remains is to load the backup configuration.
http://yfrog.com/gzk7nftsj

It will go in the bin along with the 2 fried cards.
http://yfrog.com/kexkduxj


Comment by OVH - Wednesday, 03 August 2011, 13:51PM

It is going to be alright tonight :)


Comment by OVH - Wednesday, 03 August 2011, 13:52PM

Card #1 has just crashed again.


Comment by OVH - Wednesday, 03 August 2011, 13:52PM

Aug 3 13:15:38 rbx-31-c1.routers.ovh.net 2011 Aug 03 11:15:17 %SYS-4-SUPERVISOR_ERR:Forwarding engine IP checksum error counter = 6
Aug 3 13:15:35 rbx-31-c1.routers.ovh.net 2011 Aug 03 11:15:14 %SYS-5-MOD_OK:Module 16(WS-F6K-MSFC,SAD040604MY) is online
Aug 3 13:15:34 rbx-31-c1.routers.ovh.net 2011 Aug 03 11:15:13 %SYS-3-MOD_PORTINTFINSYNC:Port Interface in sync for Module 2
Aug 3 13:15:34 rbx-31-c1.routers.ovh.net 2011 Aug 03 11:15:12 %SYS-5-MOD_OK:Module 5(WS-X6408A-GBIC,SAD05030JDD) is online
Aug 3 13:15:32 rbx-31-m2.routers.ovh.net 58: Aug 3 13:15:13 GMT: %SCP-5-ONLINE: Module online (supervisor switchover)
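
The log lines above carry a tag in the `%FACILITY-SEVERITY-MNEMONIC:` form. As a minimal sketch (not a tool used by OVH), the following splits each line into reporting host, facility, severity, and message. The function name `parse_line` and the choice of the fourth whitespace-separated field as the host are assumptions based only on the lines shown here.

```python
import re

# Tag format seen in the lines above: %FACILITY-SEVERITY-MNEMONIC: text
TAG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)

def parse_line(line):
    """Return host, facility, severity, mnemonic and text, or None if no tag is found."""
    fields = line.split()
    match = TAG.search(line)
    if match is None or len(fields) < 4:
        return None
    return {
        "host": fields[3],  # assumption: host follows the "Aug 3 13:15:38" prefix
        "facility": match.group("facility"),
        "severity": int(match.group("severity")),
        "mnemonic": match.group("mnemonic"),
        "text": match.group("text"),
    }

# Two of the lines quoted in the comment above.
LINES = [
    "Aug 3 13:15:38 rbx-31-c1.routers.ovh.net 2011 Aug 03 11:15:17 "
    "%SYS-4-SUPERVISOR_ERR:Forwarding engine IP checksum error counter = 6",
    "Aug 3 13:15:32 rbx-31-m2.routers.ovh.net 58: Aug 3 13:15:13 GMT: "
    "%SCP-5-ONLINE: Module online (supervisor switchover)",
]

parsed = [parse_line(line) for line in LINES]
for entry in parsed:
    print(entry["host"], entry["severity"], entry["mnemonic"], "-", entry["text"])
```

Sorting or filtering on `severity` (lower numbers are more severe) is one way to pull the `SUPERVISOR_ERR` line out of a longer capture.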


Comment by OVH - Wednesday, 03 August 2011, 13:54PM

Card #2 took over while #1 was rebooting. A card other than the sups is probably the source of the problems encountered since last night. We are going to replace card #5.


Comment by OVH - Wednesday, 03 August 2011, 15:13PM

Another crash. We are doing a cold restart of the chassis. Cards 3 and 4 have still not been replaced.


Comment by OVH - Wednesday, 03 August 2011, 15:13PM

card1:
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03
PC = 0xbfc0a6f4, Cause = 0x4c00, Status Reg = 0x2441fc03


Comment by OVH - Wednesday, 03 August 2011, 15:14PM

card1:
*** Bus Timeout NMI ***
PC = 0x80b808c8, SP = 0x87fff110 frame = 0xa0005ea8

*** Unknown External Interrupt ***
Stacked Cause = 0x800, Stacked Status Reg = 0x2441fc03
Current Cause IP[7..0] = 0x8, Current SREG IP[7..0] = 0xfc
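
The register values in these dumps can be cross-checked with a little bit arithmetic. The sketch below assumes a MIPS-style layout, where the interrupt bits occupy bits 15..8 of both the Cause and Status registers and the exception code occupies bits 6..2 of Cause. The ticket does not name the CPU, so treat the layout as an assumption; the helper names are illustrative.

```python
def interrupt_bits(reg):
    """Return bits 15..8 of a 32-bit register value."""
    return (reg >> 8) & 0xFF

def exception_code(cause):
    """Return bits 6..2 of a Cause value (ExcCode in the assumed MIPS layout)."""
    return (cause >> 2) & 0x1F

# Values copied from the dumps above.
stacked_cause = 0x800
status_reg = 0x2441FC03
looping_cause = 0x4C00

# Under the assumed layout these reproduce the IP[7..0] fields printed in the dump.
print(hex(interrupt_bits(stacked_cause)))  # compare with "Current Cause IP[7..0] = 0x8"
print(hex(interrupt_bits(status_reg)))     # compare with "Current SREG IP[7..0] = 0xfc"

# The repeated "Cause = 0x4c00" lines: interrupt bits and exception code.
print(hex(interrupt_bits(looping_cause)), exception_code(looping_cause))
```

That the extracted fields agree with the `IP[7..0]` values printed by the card is consistent with this layout, but it is not proof of it.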


Comment by OVH - Wednesday, 03 August 2011, 15:15PM

Card #1 is not restarting:

Local Test Mode encounters Minor hardware problem in Module # 1
Supervisor module 1 encontered CRITICAL failure: 0x1e - EARL_FAILURE L3_FAILURE RWENGINE_FAILURE L2_FAILURE
Failed Module Bringup Process
Use 'show test 1' to see results of tests.
Use 'reset 1' to reset the module.

We are trying to restart the chassis without cards 3 and 4, which are the last elements in common with the previous setup.
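
The failure code `0x1e` in the console output above is followed by four failure names. A small sketch shows that the code has exactly four bits set, which would fit a bitmask with one bit per failure. Which bit corresponds to which name is not stated in the output, so no mapping is assumed here.

```python
# Values copied from the console output above.
FAILURE_CODE = 0x1E
NAMES = ["EARL_FAILURE", "L3_FAILURE", "RWENGINE_FAILURE", "L2_FAILURE"]

# Collect the positions of the bits that are set in the failure code.
set_bits = [bit for bit in range(FAILURE_CODE.bit_length()) if (FAILURE_CODE >> bit) & 1]

print(bin(FAILURE_CODE), set_bits)

# Assumption: one bit per failure name; the bit-to-name order is unknown.
print(len(set_bits) == len(NAMES))
```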


Comment by OVH - Wednesday, 03 August 2011, 15:19PM

We are restarting on a new card in slot 1, with a single sup. We are loading the configuration from the backup again.


Comment by OVH - Wednesday, 03 August 2011, 15:35PM

We are currently running on one sup in slot 1. Apparently at least one of the spare cards inserted yesterday was defective. We are retesting all the cards in the lab and planning an intervention tonight to insert a new sup card in slot 2. We may also replace cards 3 and 4 as a preventive measure.


Comment by OVH - Thursday, 04 August 2011, 10:47AM

We are inserting a new card in slot #2.


Comment by OVH - Thursday, 04 August 2011, 10:52AM

The intervention did not go smoothly because of a version problem on the new card. We were forced to cold-reboot the chassis once more. The chassis is now stable, but we suspect another card is the source of the problem, so as a precaution we are replacing card 3.