
FS#7823 — FS#11706 — pcc-27-n5

Attached to Project: Dedicated Cloud
Incident
Backend / Core
CLOSED
100%
Several FEXs are down on this switch. Because pcc-26 is still being configured, some hosts are down.
Date:  Monday, 29 September 2014, 11:10AM
Reason for closing:  Done
Comment by OVH - Monday, 29 September 2014, 11:01AM

4 of the 13 FEXs on pcc-27-n5 are down following a load peak caused by a process.

As the situation cannot be remedied at this level, we forced pcc-27 to reload in order to remount the FEXs. All FEXs are now up, and the switch is running the configuration from 16:36. We will reapply the subsequent changes on top of it.

The network is stable again. The team will now work on remounting the hosts.


Comment by OVH - Monday, 29 September 2014, 11:02AM

The switch is no longer experiencing any issues, and its configuration is back to normal.


Comment by OVH - Monday, 29 September 2014, 11:10AM

More details on this afternoon's downtime (approx. 18:30 Paris time):

Following hardware issues (fans) on pcc-26 this morning, we replaced it with the spare unit, and service was maintained by pcc-27 alone. Synchronising the configuration took a few hours, which is normal. However, one of the resync scripts appears to have caused a CPU load peak on pcc-27 (ethpm process). As a result, pcc-27 lost its connection to the FEXs. At that point, around 18:15, we had an isolated pcc-26 still being reconfigured and a pcc-27 cut off from the FEXs. The two hosts connected to this pair were cut off, which caused downtime until pcc-27 came back after a forced reboot around 19:00. Only from then did the hosts begin to remount.

We are currently finishing bringing pcc-26 back up so that this pair is fully redundant again.