Problem description
At one customer office, the GSCC boards of five OSN7500 devices on the same ring reported the chip_abn alarm. Alarm parameters: 0x01 0x00 0x01 0x01 0xff; host software version: 5.21.16.13. Each occurrence lasted between 10 seconds and 2 minutes, with more than 10 minutes between occurrences. After a main control board was replaced with another board of the same version, the new board reported the chip_abn alarm again.
Troubleshooting process
1. Statistics showed that the main control boards of the network elements reporting the alarm all used DS18S20 temperature chips from the same batch. The main control board of one network element was therefore replaced with a board of the same version but from a different production batch. The new board soon reported the chip_abn alarm as well, which ruled out a faulty batch of temperature chips.
2. The operating temperatures of the main control boards reporting the chip_abn alarm were collected: 16.5, 19, 28.5, 30, and 28 °C. An on-site check of the environment found nothing abnormal, so environmental factors were ruled out.
3. Version 5.21.16.13 is a mainstream shipping version, and no other site had reported this alarm; only this office did. A software version problem was therefore ruled out.
4. The replaced board did not reproduce the fault in the laboratory. R&D analysis then found that ECC Qufull (queue full) packet loss was occurring at these sites at the same time. ECC packet loss is caused by a large volume of DCC communication data; under that load the CPU's ECC processing runs at full load and consumes a large share of CPU resources. In the laboratory, R&D used a SmartBits tester to flood the network management Ethernet port with data, simulating heavy DCC traffic and driving the CPU to full load, and the fault was reproduced.
5. The fault was therefore attributed to an oversized ECC subnet. The heavy ECC traffic and its extra overhead preempted low-priority tasks such as temperature polling, so the software-simulated timing for the DS18S20 temperature chip became inaccurate, an incorrect temperature value was read, and the chip_abn alarm was reported.
6. The ECC subnet was divided into smaller subnets. The alarm no longer appeared, and the fault was resolved.
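The mechanism in step 5 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the GSCC firmware: the sample point, the time the sensor holds the line low, and the function names are all hypothetical. It models a software-timed read of the DS18S20 temperature register (LSB first, 0.5 °C per step) in which a delayed sample misses a 0 bit and reads it as 1.

```python
# Hypothetical timing values, chosen only for illustration.
SAMPLE_POINT_US = 12.0  # assumed nominal sample offset after the falling edge
ZERO_HOLD_US = 30.0     # assumed time the sensor holds the line low for a 0 bit


def read_bit(sent_bit, delay_us):
    """Return the bit the master sees when its sample is delayed by delay_us."""
    if sent_bit == 1:
        return 1  # the line is already high, so a late sample still reads 1
    sample_time = SAMPLE_POINT_US + delay_us
    # A 0 bit is only visible while the sensor still holds the line low.
    return 0 if sample_time < ZERO_HOLD_US else 1


def read_temperature(raw, delays_us):
    """Read a 16-bit register LSB first and decode it in 0.5 degC steps."""
    value = 0
    for i in range(16):
        sent = (raw >> i) & 1
        value |= read_bit(sent, delays_us[i]) << i
    if value & 0x8000:  # sign-extend the two's complement value
        value -= 1 << 16
    return value * 0.5


raw = 0x0039                 # register value for 28.5 degC
idle = [0.0] * 16            # no preemption: every sample is on time
busy = [0.0] * 16
busy[6] = 50.0               # the polling task is preempted during bit 6

print(read_temperature(raw, idle))  # 28.5
print(read_temperature(raw, busy))  # 60.5
```

A single late sample is enough to turn a plausible reading into an implausible one, which is consistent with the alarm appearing only while the CPU is saturated by ECC traffic.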
Root cause
The ECC subnet was too large, which caused the main control boards of multiple OSN7500 network elements to report chip_abn alarms.
Solution
Re-divide the ECC subnet. It is recommended that each subnet contain no more than 64 network elements.
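As a planning aid, the size limit can be checked with a few lines of Python. This is a minimal sketch: the subnet names, the network element counts, and both helper functions are hypothetical, and the real division has to follow the network topology rather than a simple list split.

```python
MAX_NES_PER_SUBNET = 64  # limit recommended above


def oversized(subnet_sizes, max_size=MAX_NES_PER_SUBNET):
    """Return the names of subnets whose size exceeds the limit."""
    return [name for name, size in subnet_sizes.items() if size > max_size]


def split_subnet(ne_ids, max_size=MAX_NES_PER_SUBNET):
    """Split a list of network element IDs into groups within the limit."""
    return [ne_ids[i:i + max_size] for i in range(0, len(ne_ids), max_size)]


# Hypothetical inventory: subnet name -> number of network elements.
sizes = {"ring-A": 150, "ring-B": 40}
print(oversized(sizes))  # ['ring-A']

groups = split_subnet(list(range(1, 151)))
print([len(g) for g in groups])  # [64, 64, 22]
```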
The technical information in this chapter and the SDH equipment troubleshooting process were collected and organized by Shenzhen Optical Transmission Network Technology Company Limited (www.opticaltrans.com).







