
FS#1608 — FS#5592 — rbx-g1/g2

Attached to Project— Network
Incident
Whole Network
CLOSED
100%
We have a problem on the ASR9000

Jul 6 12:58:05 rbx-g1-a9.fr.eu 5919: LC/0/0/CPU0:Jul 6 10:57:46 UTC: fib_mgr[161]: %ROUTING-FIB-4-RSRC_LOW : CEF running low on DATA_TYPE_TABLE_SET resource memory. CEF will now begin resource constrained forwarding. Only route deletes will be handled in this state, which may result in mismatch between RIB/CEF. Traffic loss on certain prefixes can be expected. The CEF will automatically resume normal operation, once the resource utilization returns to normal level
Jul 6 12:57:42 rbx-g2-a9.fr.eu 15654: LC/0/3/CPU0:Jul 6 10:57:23 UTC: fib_mgr[161]: %PLATFORM-PLAT_FIB-6-INFO : PD FIB object LEAF OOR state changed to GREEN
Jul 6 12:57:42 rbx-g2-a9.fr.eu 15655: LC/0/3/CPU0:Jul 6 10:57:23 UTC: fib_mgr[161]: %ROUTING-FIB-6-RSRC_OK : CEF resource state has returned to normal. CEF has exited resource constrained operation and normal forwarding has been restored
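
The fib_mgr messages above share one shape: a syslog prefix, the router name, the line card, then `%FACILITY-SEVERITY-MNEMONIC`. As a rough sketch only, assuming every line keeps exactly that layout, the resource transitions (RSRC_LOW / RSRC_OK) could be extracted as follows. The names `FibEvent` and `parse_fib_event` are hypothetical and not part of any Cisco or OVH tool.

```python
import re
from typing import NamedTuple, Optional


class FibEvent(NamedTuple):
    router: str    # e.g. "rbx-g1-a9.fr.eu"
    node: str      # line card that emitted the message, e.g. "LC/0/0/CPU0"
    severity: int  # syslog severity digit taken from the %...-N-... tag
    mnemonic: str  # e.g. "RSRC_LOW", "RSRC_OK", "INFO"


# Assumed layout, taken from the lines quoted in this ticket:
# "<Mon> <day> <time> <router> <seq>: <node>:<...>: %FAC-SUB-<sev>-<MNEMONIC> : text"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<router>\S+)\s+\d+:\s+"
    r"(?P<node>[^:\s]+):.*?"
    r"%(?P<facility>[A-Z_]+(?:-[A-Z_]+)*)-(?P<sev>\d)-(?P<mnemonic>[A-Z_]+)\s*:"
)


def parse_fib_event(line: str) -> Optional[FibEvent]:
    """Return the event described by one log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return FibEvent(m["router"], m["node"], int(m["sev"]), m["mnemonic"])


# Shortened copies of two lines from this ticket (message text truncated).
low = ("Jul 6 12:58:05 rbx-g1-a9.fr.eu 5919: LC/0/0/CPU0:Jul 6 10:57:46 UTC: "
       "fib_mgr[161]: %ROUTING-FIB-4-RSRC_LOW : CEF running low")
ok = ("Jul 6 12:57:42 rbx-g2-a9.fr.eu 15655: LC/0/3/CPU0:Jul 6 10:57:23 UTC: "
      "fib_mgr[161]: %ROUTING-FIB-6-RSRC_OK : CEF resource state has returned")

for event in (parse_fib_event(low), parse_fib_event(ok)):
    print(event)
```

Filtering on `mnemonic == "RSRC_LOW"` per router and line card gives a quick view of which cards entered resource constrained forwarding and when.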

Date:  Thursday, 07 July 2011, 14:51PM
Reason for closing:  Done
Comment by OVH - Wednesday, 06 July 2011, 16:41PM

The problem resembles this one:
http://status.ovh.net/?do=details&id=752
but it is not quite the same.


Comment by OVH - Wednesday, 06 July 2011, 16:42PM

We have added next-hop-self on IPv6.

No change: the problem is still the same.

We have just opened a TAC case with Cisco.


Comment by OVH - Wednesday, 06 July 2011, 16:43PM

RP/0/RSP1/CPU0:rbx-g2-a9# show bgp nexthops statistics
Wed Jul 6 12:34:19.284 UTC
Total Nexthop Processing
Time Spent: 871.632 secs

Maximum Nexthop Processing
Received: 6w3d
Bestpaths Deleted: 0
Bestpaths Changed: 144079
Time Spent: 2.918 secs

Last Notification Processing
Received: 1d14h
Time Spent: 0.021 secs

Gateway Address Family: IPv4 Unicast
Table ID: 0xe0000000
Nexthop Count: 147
Critical Trigger Delay: 3000msec
Non-critical Trigger Delay: 10000msec

Nexthop Version: 1, RIB version: 1

Total Critical Notifications Received: 119
Total Non-critical Notifications Received: 11570
Bestpaths Deleted After Last Walk: 0
Bestpaths Changed After Last Walk: 1961
Nexthop register:
Sync calls: 426747, last sync call: 00:15:14
Async calls: 1697, last async call: 14w6d
Nexthop unregister:
Async calls: 426603, last async call: 00:14:38
Nexthop batch finish:
Calls: 947770, last finish call: 00:14:37
Nexthop flush timer:
Times started: 853358, last time flush timer started: 00:14:38
RIB update: 0 rib update runs, last update: 00:00:00
0 prefixes installed, 0 modified, 0 removed

RP/0/RSP1/CPU0:rbx-g2-a9#show controller np struct 6 summary location 0/0/cpu0
Wed Jul 6 12:34:29.161 UTC

Node: 0/0/CPU0:
----------------------------------------------------------------
NP: 0 Struct 6: R_LDI
1685 of 65536 entries in use (1685 reserved)
Buddy allocator information:
Block Size : 1 2 4 8 16 32
Free Blocks: 288 57 8 1 1 1981
Used Blocks: 1673 0 3 0 0 0

NP: 1 Struct 6: R_LDI
1685 of 65536 entries in use (1685 reserved)
Buddy allocator information:
Block Size : 1 2 4 8 16 32
Free Blocks: 288 57 8 1 1 1981
Used Blocks: 1673 0 3 0 0 0

NP: 2 Struct 6: R_LDI
1685 of 65536 entries in use (1685 reserved)
Buddy allocator information:
Block Size : 1 2 4 8 16 32
Free Blocks: 288 57 8 1 1 1981
Used Blocks: 1673 0 3 0 0 0

NP: 3 Struct 6: R_LDI
1685 of 65536 entries in use (1685 reserved)
Buddy allocator information:
Block Size : 1 2 4 8 16 32
Free Blocks: 288 57 8 1 1 1981
Used Blocks: 1673 0 3 0 0 0

NP: 4 Struct 6: R_LDI
1685 of 65536 entries in use (1685 reserved)
Buddy allocator information:
Block Size : 1 2 4 8 16 32
Free Blocks: 288 57 8 1 1 1981
Used Blocks: 1673 0 3 0 0 0

NP: 5 Struct 6: R_LDI
1685 of 65536 entries in use (1685 reserved)
Buddy allocator information:
Block Size : 1 2 4 8 16 32
Free Blocks: 288 57 8 1 1 1981
Used Blocks: 1673 0 3 0 0 0

NP: 6 Struct 6: R_LDI
1685 of 65536 entries in use (1685 reserved)
Buddy allocator information:
Block Size : 1 2 4 8 16 32
Free Blocks: 288 57 8 1 1 1981
Used Blocks: 1673 0 3 0 0 0

NP: 7 Struct 6: R_LDI
1685 of 65536 entries in use (1685 reserved)
Buddy allocator information:
Block Size : 1 2 4 8 16 32
Free Blocks: 288 57 8 1 1 1981
Used Blocks: 1673 0 3 0 0 0
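
To read the `show controller np struct` output at a glance, the "N of M entries in use" lines can be turned into a fill ratio per NP. This is a minimal sketch that assumes the text layout shown above; `np_utilization` is a hypothetical helper name.

```python
import re

# Matches "NP: <id> Struct <n>: <name>" followed by "<used> of <total> entries in use".
ENTRY_RE = re.compile(
    r"NP:\s*(\d+)\s+Struct\s+\d+:\s*(\S+)\s+(\d+) of (\d+) entries in use"
)


def np_utilization(text: str) -> dict:
    """Return {np_id: fraction of entries in use} for each NP block in the output."""
    usage = {}
    for np_id, _name, used, total in ENTRY_RE.findall(text):
        usage[int(np_id)] = int(used) / int(total)
    return usage


# Two of the eight identical blocks from the output above.
sample = """\
NP: 0 Struct 6: R_LDI
1685 of 65536 entries in use (1685 reserved)
NP: 1 Struct 6: R_LDI
1685 of 65536 entries in use (1685 reserved)
"""

for np_id, fraction in np_utilization(sample).items():
    print(f"NP {np_id}: {fraction:.1%} in use")
```

With the figures pasted above, each NP comes out at roughly 2.6% of its R_LDI entries in use, so this particular structure does not look exhausted on line card 0/0.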


Comment by OVH - Wednesday, 06 July 2011, 16:44PM

RP/0/RSP1/CPU0:rbx-g2-a9#sh cef resource detail location 0/0/cpu0
Wed Jul 6 12:35:19.098 UTC
CEF resource availability summary state: YELLOW
CEF will drop route updates
No. of times HW caused oor: 26
CEF entered oor at : Jul 6 12:30:33.573
CEF came out of oor at : Jul 6 12:29:48.370
ipv4 shared memory resource:
CurrMode GREEN, CurrAvail 866398208 bytes, MaxAvail 984129536 bytes
ipv6 shared memory resource:
CurrMode GREEN, CurrAvail 866398208 bytes, MaxAvail 984129536 bytes
mpls shared memory resource:
CurrMode GREEN, CurrAvail 866398208 bytes, MaxAvail 984129536 bytes
common shared memory resource:
CurrMode GREEN, CurrAvail 866398208 bytes, MaxAvail 984129536 bytes
DATA_TYPE_TABLE_SET hardware resource: YELLOW
DATA_TYPE_TABLE hardware resource: YELLOW
DATA_TYPE_IDB hardware resource: YELLOW
DATA_TYPE_IDB_EXT hardware resource: YELLOW
DATA_TYPE_LEAF hardware resource: YELLOW
DATA_TYPE_LOADINFO hardware resource: YELLOW
DATA_TYPE_PATH_LIST hardware resource: YELLOW
DATA_TYPE_NHINFO hardware resource: YELLOW
DATA_TYPE_LABEL_INFO hardware resource: YELLOW
DATA_TYPE_FRR_NHINFO hardware resource: YELLOW
DATA_TYPE_ECD hardware resource: YELLOW
DATA_TYPE_RECURSIVE_NH hardware resource: YELLOW
DATA_TYPE_TUNNEL_ENDPOINT hardware resource: YELLOW
DATA_TYPE_LOCAL_TUNNEL_INTF hardware resource: YELLOW
DATA_TYPE_ECD_TRACKER hardware resource: YELLOW
DATA_TYPE_ECD_V2 hardware resource: YELLOW
DATA_TYPE_ATTRIBUTE hardware resource: YELLOW
DATA_TYPE_LSPA hardware resource: YELLOW
DATA_TYPE_LDI_LW hardware resource: YELLOW
DATA_TYPE_LDSH_ARRAY hardware resource: YELLOW
DATA_TYPE_TE_TUN_INFO hardware resource: YELLOW
DATA_TYPE_DUMMY hardware resource: YELLOW
DATA_TYPE_IDB_VRF_LCL_CEF hardware resource: YELLOW
DATA_TYPE_TABLE_UNRESOLVED hardware resource: YELLOW
DATA_TYPE_MOL hardware resource: YELLOW
DATA_TYPE_MPI hardware resource: YELLOW
DATA_TYPE_SUBS_INFO hardware resource: YELLOW
DATA_TYPE_GRE_TUNNEL_INFO hardware resource: YELLOW
RP/0/RSP1/CPU0:rbx-g2-a9#
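
The same output can be summarised mechanically. The sketch below assumes the exact line format printed above; `summarize_cef` is a hypothetical name. It separates the shared memory pools from the hardware resources and computes how much shared memory is still available.

```python
import re

# "DATA_TYPE_xxx hardware resource: STATE"
HW_RE = re.compile(r"^(DATA_TYPE_\w+) hardware resource: (\w+)[ \t]*$", re.MULTILINE)

# "<name> shared memory resource:" followed by the CurrMode/CurrAvail/MaxAvail line
MEM_RE = re.compile(
    r"^(\w+) shared memory resource:\s*\n\s*CurrMode (\w+), "
    r"CurrAvail (\d+) bytes, MaxAvail (\d+) bytes",
    re.MULTILINE,
)


def summarize_cef(text: str):
    """Return (hardware states, shared memory pools) parsed from the CLI text.

    hardware states: {resource name: state}
    shared memory:   {pool name: (mode, fraction of MaxAvail still available)}
    """
    hw = {name: state for name, state in HW_RE.findall(text)}
    mem = {
        name: (mode, int(avail) / int(maximum))
        for name, mode, avail, maximum in MEM_RE.findall(text)
    }
    return hw, mem


# A few lines copied from the output above.
sample = """\
ipv4 shared memory resource:
CurrMode GREEN, CurrAvail 866398208 bytes, MaxAvail 984129536 bytes
DATA_TYPE_TABLE_SET hardware resource: YELLOW
DATA_TYPE_LEAF hardware resource: YELLOW
"""

hw, mem = summarize_cef(sample)
not_green = [name for name, state in hw.items() if state != "GREEN"]
print("hardware resources not GREEN:", not_green)
for pool, (mode, fraction) in mem.items():
    print(f"{pool}: {mode}, {fraction:.0%} of MaxAvail still available")
```

On the full output above, the shared memory pools are GREEN with about 88% of MaxAvail still available, while every DATA_TYPE_* hardware resource reports YELLOW.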


Comment by OVH - Thursday, 07 July 2011, 00:48AM

The new IPs are still not being registered.

We are in contact with Cisco TAC to fix the problem.


Comment by OVH - Thursday, 07 July 2011, 00:50AM

Traffic to the new IPs is going round in a loop inside the network.
We are waiting for Cisco.

6 th2-1-6k.fr.eu (213.186.32.181) 55.409 ms * 50.620 ms
7 th1-1-6k.fr.eu (213.186.32.165) 58.132 ms * 50.333 ms
8 rbx-g2-a9.fr.eu (91.121.131.141) 55.075 ms 53.812 ms 54.613 ms
9 gsw-2-6k.fr.eu (91.121.131.214) 77.756 ms * *
10 rbx-g1-a9.fr.eu (91.121.131.33) 57.627 ms 57.028 ms 57.390 ms
11 gsw-2-6k.fr.eu (91.121.131.38) 263.777 ms
gsw-2-6k.fr.eu (91.121.131.34) 205.179 ms
gsw-2-6k.fr.eu (213.251.128.106) 209.499 ms
12 rbx-g1-a9.fr.eu (91.121.131.33) 62.124 ms 59.690 ms 62.422 ms
13 gsw-2-6k.fr.eu (91.121.131.38) 62.392 ms *
gsw-2-6k.fr.eu (213.251.128.106) 61.387 ms
14 rbx-g1-a9.fr.eu (91.121.131.33) 65.804 ms 65.402 ms 65.773 ms
15 gsw-2-6k.fr.eu (91.121.131.38) 65.205 ms *
gsw-2-6k.fr.eu (213.251.128.106) 64.206 ms
16 rbx-g1-a9.fr.eu (91.121.131.33) 69.591 ms 67.366 ms 68.669 ms
17 * * gsw-2-6k.fr.eu (213.251.128.106) 220.553 ms
18 rbx-g1-a9.fr.eu (91.121.131.33) 71.096 ms 73.312 ms 71.266 ms
19 gsw-2-6k.fr.eu (91.121.131.38) 70.817 ms
gsw-2-6k.fr.eu (91.121.131.34) 70.360 ms
gsw-2-6k.fr.eu (213.251.128.106) 71.530 ms
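
The loop is visible in the trace because the same addresses come back at several hop numbers. A minimal sketch of that check follows. It assumes the traceroute layout shown above, where continuation lines carry no hop number; `repeated_hops` is a hypothetical helper name.

```python
import re
from collections import defaultdict

HOP_RE = re.compile(r"^\s*(\d+)\s")                 # leading hop number, if any
IP_RE = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)")  # "(a.b.c.d)" after the hostname


def repeated_hops(traceroute: str) -> dict:
    """Return {ip: [hop numbers]} for every address seen at more than one hop."""
    seen = defaultdict(list)
    hop = None
    for line in traceroute.splitlines():
        m = HOP_RE.match(line)
        if m:
            hop = int(m.group(1))
        if hop is None:
            continue  # text before the first numbered hop
        for ip in IP_RE.findall(line):
            if hop not in seen[ip]:
                seen[ip].append(hop)
    return {ip: hops for ip, hops in seen.items() if len(hops) > 1}


# Hops 8 to 13 from the trace above (some continuation lines omitted).
sample = """\
8 rbx-g2-a9.fr.eu (91.121.131.141) 55.075 ms 53.812 ms 54.613 ms
9 gsw-2-6k.fr.eu (91.121.131.214) 77.756 ms * *
10 rbx-g1-a9.fr.eu (91.121.131.33) 57.627 ms 57.028 ms 57.390 ms
11 gsw-2-6k.fr.eu (91.121.131.38) 263.777 ms
gsw-2-6k.fr.eu (91.121.131.34) 205.179 ms
12 rbx-g1-a9.fr.eu (91.121.131.33) 62.124 ms 59.690 ms 62.422 ms
13 gsw-2-6k.fr.eu (91.121.131.38) 62.392 ms *
"""

for ip, hops in repeated_hops(sample).items():
    print(ip, "seen at hops", hops)
```

An address that reappears every second hop, as rbx-g1-a9 and gsw-2-6k do here, means packets are bouncing between the two devices until the TTL runs out.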


Comment by OVH - Thursday, 07 July 2011, 00:50AM

RP/0/RSP1/CPU0:rbx-g2-a9(admin-config)#hw-module profile scale l3xl
Wed Jul 6 18:50:16.520 UTC
In order to activate this new memory resource profile, you must manually reboot the system.

We have to restart the router.


Comment by OVH - Thursday, 07 July 2011, 00:51AM

All routing currently goes through g1.
We are ready to reload g2.


Comment by OVH - Thursday, 07 July 2011, 00:51AM

RP/0/RSP1/CPU0:rbx-g2-a9(admin)#reload location all
Wed Jul 6 18:58:42.597 UTC

Preparing system for backup. This may take a few minutes especially for large configurations.
Status report: node0_RSP1_CPU0: START TO BACKUP
Status report: node0_RSP1_CPU0: BACKUP HAS COMPLETED SUCCESSFULLY
[Done]
Proceed with reload? [confirm]RP/0/RSP1/CPU0::This node received reload command. Reloading in 5 secs


Comment by OVH - Thursday, 07 July 2011, 00:52AM

g2 is UP.

We are checking it.


Comment by OVH - Thursday, 07 July 2011, 00:53AM

g2 is OK.
We have put it back into routing; it is in the loop.


Comment by OVH - Thursday, 07 July 2011, 00:54AM

We will now take g1 out of routing.


Comment by OVH - Thursday, 07 July 2011, 00:55AM

g1 is out of the loop; everything is routed through g2.
We are ready to restart g1.


Comment by OVH - Thursday, 07 July 2011, 00:56AM

RP/0/RSP0/CPU0:rbx-g1-a9(admin)#reload location all
Wed Jul 6 19:13:11.504 UTC

Preparing system for backup. This may take a few minutes especially for large configurations.
Status report: node0_RSP0_CPU0: START TO BACKUP
Status report: node0_RSP0_CPU0: BACKUP HAS COMPLETED SUCCESSFULLY
[Done]
Proceed with reload? [confirm]RP/0/RSP0/CPU0::This node received reload command. Reloading in 5 secs

Restart in progress.


Comment by OVH - Thursday, 07 July 2011, 00:56AM

g1 is up.
We will check it now.


Comment by OVH - Thursday, 07 July 2011, 00:57AM

Card 0/4 has died.


Comment by OVH - Thursday, 07 July 2011, 01:01AM

We have started the card replacement with Cisco under our T+2H hardware support. This means that, in case of a hardware problem on one of the elements of the router, Cisco supplies a replacement for the failed card in less than 2 hours.

We checked the ports that are down and we do not expect any impact on traffic, even without the card. All ports are lined and the links should not saturate.

We have just put the router back into routing.

Now we will check the links for saturation.


Comment by OVH - Thursday, 07 July 2011, 01:07AM

Cisco asked us to restart the card to see whether it is definitely dead.

RP/0/RSP0/CPU0:rbx-g1-a9(admin)#reload location 0/4/CPU0
Wed Jul 6 19:37:06.607 UTC

Preparing system for backup. This may take a few minutes especially for large configurations.
[Done]
Proceed with reload? [confirm]


Comment by OVH - Thursday, 07 July 2011, 01:10AM

Traffic was reloaded and everything is running correctly.

The initial problem is fixed.

Now we need to replace the card. The RMA is in progress.


Comment by OVH - Thursday, 07 July 2011, 01:15AM

Well, that settles it: Cisco's databases have not been updated with the recently signed contract, so we will not have the card within 2 hours.


Comment by OVH - Thursday, 07 July 2011, 01:19AM

Apparently the card is not in Cisco's databases.
This is probably because we have already had two broken cards, and the databases were not updated after the previous RMA.


Comment by OVH - Thursday, 07 July 2011, 01:22AM

http://status.ovh.co.uk/?do=details&id=1154

[...]
We will replace card #6 of g1 with card #4 of g2, on which the ports are unused or carry little traffic.
[...]

That is why the card does not match Cisco's databases.


Comment by OVH - Thursday, 07 July 2011, 14:50PM

We received the spare card from Cisco at 4:00 am.
http://yfrog.com/z/kejb0uj

The old card is still in the router.
First of all, we disconnect the optical fibres.
http://yfrog.com/z/kg4rknnj

Done: the card is ready to be pulled out.
http://yfrog.com/z/kl2d5jj

Ready to go? Go... The card is out.
http://yfrog.com/z/kl1aslhj

We verify the logs and everything is OK
http://yfrog.com/z/kj1kfij

We put down the old card and unpack the new one
http://yfrog.com/z/kh47sqj

The card is ready to be inserted
http://yfrog.com/z/kiz82vtj

The card is inserted and it boots
http://yfrog.com/z/kjh2dvj

We verify the logs: the boot goes well
http://yfrog.com/z/kl42ttj

We re-connect the optical fibres.
http://yfrog.com/z/h7iialhxj

We verify the logs: everything is OK
http://yfrog.com/z/khd74nj

We verify the weathermap and the traffic flow to Paris and Frankfurt: everything is OK
http://weathermap.ovh.net/backbone

The old card is re-packed and will be sent to Cisco.

We thank the Cisco team for their follow-up during the night. The internal bug was fixed at 1:00 am.