Problem description
An OSN3500 lost power during a power outage in the equipment room. After power was restored to the OSN3500, the main control (SCC), cross-connect, and service boards reported a series of alarms, and services were unavailable.
Host version: 5.21.20.55. The NE has a single SCC board and two cross-connect boards; the cross-connect board in slot 9 was the active board at the time of the fault.
Alarm information
HARD_BAD
Slot 7 EGS2, parameters: 0x01 0x00 0x06 0xFF 0xFF
Slot 13 EFS0, parameters: 0x01 0x00 0x06 0xFF 0xFF
Slot 18 GSCC, parameters: 0x02 0xFF 0xFF 0xFF 0xFF
Slot 9 SXCSA, parameters: 0x02 0x00 0x04 0xFF 0xFF
CHIP_FAIL
Slot 9 SXCSA, parameters: 0x00 0x00 0x00 0x01 0x00
Slot 2 PQ1
OOL
Slot 9 SXCSA, parameters: 0x03 0x00 0x01 0xFF 0xFF
Slot 10 SXCSA, parameters: 0x01 0x00 0x01 0xFF 0xFF
TEMP_OVER
Slot 9 SXCSA, parameters: 0x01 0x00 0x01 0x01 0xFF
HSC_UNAVAIL
Slot 9 SXCSA, parameters: 0x03 0x01 0x09 0xFF 0xFF
BUS_ERR
Slot 10 SXCSA, parameters: 0x0D 0x01 0x03 0x01 0xFF
SYN_BAD
Slot 10 SXCSA, parameters: 0x08 0x01 0xFF 0xFF 0xFF 0xFF
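To see at a glance where the alarms concentrate, the list above can be tallied per slot. The following is only a minimal sketch: the alarm/slot pairs are transcribed by hand from the list (alarm names written in upper case), and the names ALARMS and alarms_per_slot are made up for this illustration, not part of any Huawei tool.

```python
from collections import Counter

# (alarm name, slot) pairs transcribed by hand from the alarm list above.
ALARMS = [
    ("HARD_BAD", 7), ("HARD_BAD", 13), ("HARD_BAD", 18), ("HARD_BAD", 9),
    ("CHIP_FAIL", 9), ("CHIP_FAIL", 2),
    ("OOL", 9), ("OOL", 10),
    ("TEMP_OVER", 9),
    ("HSC_UNAVAIL", 9),
    ("BUS_ERR", 10),
    ("SYN_BAD", 10),
]


def alarms_per_slot(alarms):
    """Count how many alarm entries were reported against each slot."""
    return Counter(slot for _name, slot in alarms)


counts = alarms_per_slot(ALARMS)
# List the slots from most to fewest alarm entries.
for slot, n in counts.most_common():
    print(f"slot {slot}: {n} alarm(s)")
```

In this tally the slot 9 cross-connect board carries the most entries, followed by the slot 10 cross-connect board, which matches the direction the analysis takes.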
Processing
1. The supply voltage measured on site was -54 V, which is within the normal range.
2. The alarms were synchronized and checked again. The AUX board reported no alarms, and the site confirmed that the board indicators were normal. This ruled out the AUX board, because the other boards cannot start if the AUX board is faulty.
3. The NE reported many alarms, but a command-line query showed that the physical and logical board states were normal, and the site reported that the board indicators were also normal. Because all services were down, a fault on the SCC board or a cross-connect board was the most likely cause. Analysis of the HARD_BAD alarm on the single SCC board showed that its parameters pointed to an abnormality on the PQ1 board in slot 2, so an SCC fault was unlikely. Further analysis showed that most of the alarms were on the active cross-connect board in slot 9. An attempt to switch over the cross-connect boards from the NMS failed.
4. The NMS then showed that the cross-connect board in slot 10 had switched to the active state by itself, but the number of alarms and their parameters did not change. A hard reset of the slot 9 board from the NMS also left the number of alarms and their parameters unchanged.
5. The cross-connect board temperature was queried from the NMS with the command :cfg-get-bdtemp:9. The reading was 70℃, which exceeded the temperature threshold, so the TEMP_OVER alarm was being reported correctly. The site confirmed that the equipment room air conditioning had not been working since the power outage and that the room temperature was high. It was therefore suspected that the abnormal operation of the slot 9 board was temperature-related.
6. The site was advised to pull out the slot 9 board and observe the result while spare parts were being arranged. The site reported that a few minutes after the board was pulled out, all alarms gradually cleared, and services were verified as restored.
7. To help locate the cause of the abnormal temperature on the slot 9 cross-connect board (the board had been reporting TEMP_OVER continuously), the board was reinserted into slot 9. Services remained normal, and the queried cross-connect board temperature was 10 degrees lower than before.
Root cause
The high temperature caused the board to work abnormally after it was powered on again, and the board reported a series of alarms as a result.
Suggestions and summary
Clean the fan air filter regularly, and control the temperature and humidity of the equipment room.
The temperature and humidity requirements for normal operation of OptiX OSN equipment are as follows. (Temperature and humidity are measured 1.5 m above the floor and 0.4 m in front of the rack, with no protective panels in front of or behind the rack.)
Long-term operating temperature: 0℃~45℃
Short-term operating temperature: -5℃~55℃ (short-term operation means no more than 96 hours of continuous operation and no more than 15 days in total per year)
Long-term operating humidity: 5%~85%
Short-term operating humidity: 5%~95%
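The ranges above can be turned into a simple check for ambient readings taken at the stated measurement point. This is only a sketch under assumptions: the function names are invented for illustration, the range limits are treated as inclusive, and the readings are ambient room values, not board temperatures such as the one returned by :cfg-get-bdtemp.

```python
# Limits copied from the requirements above; inclusive bounds are an assumption.
LONG_TERM_TEMP_C = (0, 45)
SHORT_TERM_TEMP_C = (-5, 55)
LONG_TERM_RH = (5, 85)
SHORT_TERM_RH = (5, 95)
MAX_SHORT_TERM_CONTINUOUS_HOURS = 96
MAX_SHORT_TERM_DAYS_PER_YEAR = 15


def _within(value, bounds):
    low, high = bounds
    return low <= value <= high


def classify_environment(temp_c, rh_percent):
    """Return 'long-term', 'short-term', or 'out-of-range' for one ambient reading."""
    if _within(temp_c, LONG_TERM_TEMP_C) and _within(rh_percent, LONG_TERM_RH):
        return "long-term"
    if _within(temp_c, SHORT_TERM_TEMP_C) and _within(rh_percent, SHORT_TERM_RH):
        return "short-term"
    return "out-of-range"


def short_term_budget_ok(continuous_hours, days_this_year):
    """Check the short-term limits: at most 96 h continuous and 15 days per year."""
    return (continuous_hours <= MAX_SHORT_TERM_CONTINUOUS_HOURS
            and days_this_year <= MAX_SHORT_TERM_DAYS_PER_YEAR)


print(classify_environment(25, 50))   # long-term
print(classify_environment(50, 60))   # short-term
print(classify_environment(60, 50))   # out-of-range
print(short_term_budget_ok(100, 3))   # False: more than 96 hours continuous
```

A room that stays in the short-term band after an air-conditioning failure therefore also needs its duration tracked, since the short-term band is only acceptable within the hour and day limits.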
In addition, to improve the reliability of the equipment in service, the equipment room should be fitted with dedicated precision air conditioning that keeps the temperature and humidity within the following ranges:
Air conditioning control temperature: 15℃~30℃
Air conditioning control humidity: 40%~75%
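As a rough illustration of how these control ranges could be checked against equipment room readings, the sketch below reports which limit a reading violates. The function name and message wording are made up for this example, and inclusive limits are assumed.

```python
# Air conditioning control ranges from the text above; inclusive limits assumed.
AC_TEMP_C = (15, 30)
AC_RH_PERCENT = (40, 75)


def ac_deviations(temp_c, rh_percent):
    """Return a list of messages for each control limit the reading violates."""
    issues = []
    if temp_c < AC_TEMP_C[0]:
        issues.append(f"temperature {temp_c} C is below {AC_TEMP_C[0]} C")
    elif temp_c > AC_TEMP_C[1]:
        issues.append(f"temperature {temp_c} C is above {AC_TEMP_C[1]} C")
    if rh_percent < AC_RH_PERCENT[0]:
        issues.append(f"humidity {rh_percent}% is below {AC_RH_PERCENT[0]}%")
    elif rh_percent > AC_RH_PERCENT[1]:
        issues.append(f"humidity {rh_percent}% is above {AC_RH_PERCENT[1]}%")
    return issues


# A room whose air conditioning has stopped might read something like this.
for message in ac_deviations(35, 80):
    print(message)
```

An empty list means the reading is inside both control ranges.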
Note: Do not install air conditioners above the equipment. Air conditioner vents should not blow directly at the equipment, and air conditioners should be installed as far from windows as possible so that they do not blow moisture from the windows onto the equipment.
The technical information and SDH equipment troubleshooting procedures in this chapter are provided by Shenzhen Optical Transmission Network Technology Co., Ltd (www.opticaltrans.com). Please retain this notice.







