Problem Description
The EGS2 board in slot 30 of an OSN7500 device reports BD_STATUS and COMMUN_FAIL alarms simultaneously at intervals of about half an hour. Each time, the board resets and services are briefly interrupted.
Alarm information
BD_STATUS and COMMUN_FAIL alarms are reported on the EGS2 board at intervals of about half an hour.
Processing
1. Because the board reported a BD_STATUS alarm, a board fault was suspected first. The board was replaced, but the fault recurred about half an hour later.
2. A slot fault was suspected next. The board was moved to another slot, but the fault persisted.
3. A loop or other abnormality in the Ethernet access services was then suspected, and data was collected for analysis. The output of the mon-show-cpu:30 command showed that CPU usage on the board was very high: the tRstpBpdu task accounted for up to 77% of the CPU, while the idle task VIDL had 0%. As a result, the board's watchdog-feeding task could not run, the board underwent a software watchdog reset and reported the COMMUN_FAIL alarm, and services were interrupted.
TASK-NAME   SWITCH-COUNT  MIN-TIME  MAX-TIME  RECENT-TIME  TOTAL-TIME(us)  PERCENT
tRstpBpdu   1181          74        10178     7556         10439226        77.82%
vidl        0             0         0         0            0               0.00%
4. The services on the Ethernet ingress ports were sorted out, and those carrying Ethernet loops or protocol packets were moved to ports on other boards. After that, the alarms were no longer reported and the fault was cleared. After the EGS2 board software was upgraded to version 5.53, the problem was resolved completely.
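The check performed in step 3 can be sketched in Python. This is only an illustration, not a Huawei tool: it assumes the task table is whitespace-separated with the task name first and the CPU percentage last, as in the output quoted above. The function names, the 50% threshold, and the lowercase idle task name are assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class TaskCpu(NamedTuple):
    name: str
    percent: float


# Sample rows taken from the case; the column layout is assumed.
SAMPLE = """\
TASK-NAME SWITCH-COUNT MIN-TIME MAX-TIME RECENT-TIME TOTAL-TIME(us) PERCENT
tRstpBpdu 1181 74 10178 7556 10439226 77.82%
vidl 0 0 0 0 0 0.00%
"""


def parse_task_table(text: str) -> List[TaskCpu]:
    """Extract (task name, CPU percent) from each data row."""
    tasks = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "TASK-NAME":
            continue
        try:
            percent = float(fields[-1].rstrip("%"))
        except ValueError:
            continue  # last field is not a percentage; ignore the row
        tasks.append(TaskCpu(fields[0], percent))
    return tasks


def watchdog_risk(tasks: List[TaskCpu],
                  idle_name: str = "vidl",
                  busy_threshold: float = 50.0) -> Tuple[List[str], bool]:
    """Return the tasks above the (hypothetical) threshold and whether
    the idle task received no CPU time at all."""
    busy = [t.name for t in tasks
            if t.name.lower() != idle_name and t.percent >= busy_threshold]
    idle = [t for t in tasks if t.name.lower() == idle_name]
    idle_starved = bool(idle) and idle[0].percent == 0.0
    return busy, idle_starved


if __name__ == "__main__":
    busy, starved = watchdog_risk(parse_task_table(SAMPLE))
    print("busy tasks:", busy, "| idle task starved:", starved)
```

A busy protocol task combined with a starved idle task matches the pattern described in step 3, where lower-priority tasks such as the watchdog-feeding task get no chance to run.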
Root cause
1. Board fault.
2. Slot fault.
3. Abnormal access services.
Recommendations and Summary
When the EGS2 board receives a large number of Spanning Tree Protocol packets from external equipment, the board's CPU usage becomes too high and the board undergoes a software watchdog reset. Two situations typically cause this fault:
1. The external network sends a large number of Spanning Tree Protocol packets to the board.
2. The external network sends only a small number of Spanning Tree Protocol packets to the board, but a physical loop exists in the network.
In version 5.53 and later, the EGS2 board rate-limits protocol packets, which fundamentally solves the problem.
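The case does not describe how the rate limiting in version 5.53 is implemented. As a generic illustration of the idea only, the following Python sketch uses a token bucket, a common rate-limiting technique, to cap how many protocol packets reach the CPU during a flood. The class name, the rate, and the burst size are all hypothetical.

```python
class TokenBucket:
    """Generic token-bucket limiter (illustrative, not the EGS2 design)."""

    def __init__(self, rate_per_s: float, burst: float) -> None:
        self.rate = float(rate_per_s)   # tokens added per second
        self.capacity = float(burst)    # maximum tokens that can be stored
        self.tokens = float(burst)      # start with a full bucket
        self.last = 0.0                 # timestamp of the previous packet

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at time `now` may be
        passed to the CPU, False if it should be dropped."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def simulate_flood(bucket: TokenBucket, frames: int, interval_s: float) -> int:
    """Feed `frames` packets spaced `interval_s` apart and count
    how many the limiter lets through."""
    return sum(bucket.allow(i * interval_s) for i in range(frames))


if __name__ == "__main__":
    # Hypothetical numbers: 1000 packets in one second against a
    # limit of 10 packets per second with a burst of 5.
    passed = simulate_flood(TokenBucket(10, 5), 1000, 0.001)
    print(f"{passed} of 1000 packets reached the CPU")
```

With a limiter of this kind, the CPU load from protocol packets stays bounded regardless of how many packets a loop or a misbehaving external network generates, so the idle and watchdog-feeding tasks still get scheduled.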







