Problem Description
On a certain day, the SSN2EGS2 board of an OSN3500 network element reported a "COMMUN_FAIL" alarm and services were interrupted. Investigation showed that the SSN2EGS2 board had reset abnormally, which caused the "COMMUN_FAIL" alarm. Services recovered once the board had reset, and neither the "COMMUN_FAIL" alarm nor the abnormal reset recurred afterwards.
Host version is 5.21.12.42
EGS2 board version is 2.14
COMMUN_FAIL
Upgrade the SSN2EGS2 board to R6C02B014 (board software version 3.15) to solve the problem.
According to feedback from the site, the board reset and the service interruption occurred suddenly while services were running normally, and no operation was being performed at the time.
Analysis of the black box data shows that, while Hello messages were being processed, the first two passes succeeded. In the third pass, a memory release failed, and this exception led to the board reset.
Log message:
bb1.log 2008-10-14 20:19:33 D:/3500prj/public/HardDrv/NP3454Drv/NP3454PktMng.cpp,813, Hardware operation failed ,(null)
Error processing Hello messages, memory needs to be freed
bb1.log 2008-10-14 20:19:33 Error:0x70008, freeAddr=0x1cc8070, BufSize=0x90, dmm_intf.cpp, line:1439, ,
01cc80f0: 00000000 00000000 64656164 64656164
bb1.log 2008-10-14 20:19:33 Reset: File_NP3454PktMng.cpp, Line_859, Type_0xf0000010.
Error freeing message memory, memory release failed
bb1.log 2008-10-14 20:19:33 Error:0x70008, freeAddr=0x1cc7380, BufSize=0x90, NP3454PktMng.cp, line:669, ,
01cc7400: 00000000 04001200 0180c200 00000000
Error in requesting memory for message, memory request failure
bb1.log 2008-10-14 20:19:33 Error:0x70008, freeAddr=0x1cc7380, BufSize=0x90, dmm_intf.cpp, line:1439, ,
01cc7400: 00000000 04001200 0180c200 00000000
The reset record indicates that the reset was caused by a memory request failure or an out-of-bounds memory write.
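As a rough aid for reading records like the ones above, the following C++ sketch pulls the fields out of an "Error:" line, counts how often each freeAddr is reported, and decodes a dump word into ASCII. The names FreeError, parseFreeError, countByAddress and wordToAscii are illustrative and are not part of the board software; the record layout is inferred only from the lines quoted in this case.

```cpp
#include <cstdint>
#include <map>
#include <regex>
#include <string>
#include <vector>

// One "Error:" record from the black box log (layout assumed from the quoted lines).
struct FreeError {
    std::uint32_t code = 0;      // value after "Error:"
    std::uint32_t freeAddr = 0;  // value after "freeAddr="
    std::uint32_t bufSize = 0;   // value after "BufSize="
    std::string file;            // source file name exactly as printed in the log
    int line = 0;                // source line number
};

// Fills 'out' from a log line; returns false if the line is not an "Error:" record.
bool parseFreeError(const std::string& text, FreeError& out) {
    static const std::regex pattern(
        R"re(Error:0x([0-9a-fA-F]+), freeAddr=0x([0-9a-fA-F]+), BufSize=0x([0-9a-fA-F]+), ([^,]+), line:(\d+))re");
    std::smatch m;
    if (!std::regex_search(text, m, pattern)) {
        return false;
    }
    out.code = static_cast<std::uint32_t>(std::stoul(m[1].str(), nullptr, 16));
    out.freeAddr = static_cast<std::uint32_t>(std::stoul(m[2].str(), nullptr, 16));
    out.bufSize = static_cast<std::uint32_t>(std::stoul(m[3].str(), nullptr, 16));
    out.file = m[4].str();
    out.line = std::stoi(m[5].str());
    return true;
}

// Counts how many "Error:" records mention each freeAddr; other lines are skipped.
std::map<std::uint32_t, int> countByAddress(const std::vector<std::string>& lines) {
    std::map<std::uint32_t, int> counts;
    for (const std::string& text : lines) {
        FreeError e;
        if (parseFreeError(text, e)) {
            ++counts[e.freeAddr];
        }
    }
    return counts;
}

// Decodes a 32-bit dump word into four characters, most significant byte first.
// Non-printable bytes are shown as '.'.
std::string wordToAscii(std::uint32_t word) {
    std::string s;
    for (int shift = 24; shift >= 0; shift -= 8) {
        const unsigned byte = (word >> shift) & 0xFFu;
        s += (byte >= 0x20u && byte < 0x7Fu) ? static_cast<char>(byte) : '.';
    }
    return s;
}
```

Applied to the three "Error:" records above, countByAddress reports 0x1cc7380 twice and 0x1cc8070 once, and wordToAscii(0x64656164) yields "dead". Each dump line also starts 0x80 bytes above the corresponding freeAddr. The case does not say what that marker means inside this memory manager, so these observations are pointers for further analysis, not conclusions.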
By analyzing the code, we found that:
When the protocol message is processed, there are three defects. First, the pointer is not initialized. Second, because of the memory leak, memory requests fail once both the slice and non-slice memory are exhausted, but the code only writes a level-3 black box record and neither returns an error nor restarts the board directly. Third, a coding slip in the check that decides whether to release memory when sending fails means that the memory is not released.
Because of the memory leak, once memory is exhausted, requests in other tasks also fail and the board restarts. In addition, if pSendMsgBuf is uninitialized and no buffer has been allocated to it, releasing it later in the code may cause a repeated release and a restart.
Therefore, the cause of the problem is that, in special scenarios, memory is requested but not released, which leads to a memory leak and eventually to a board reset. A memory leak accumulates over a long period, so the problem disappears after the board resets. This is a board quality issue.
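The three defects described above can be illustrated with a minimal C++ sketch of the handling pattern that avoids them. This is not the board's actual code: only the name pSendMsgBuf comes from the case. SendFn, HelloResult, allocMsgBuf, freeMsgBuf, processHello and the counter g_liveBuffers are hypothetical, std::malloc stands in for the board's memory manager, and the send routine is assumed not to take ownership of the buffer.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the real send routine; returns true on success.
using SendFn = bool (*)(const unsigned char* data, std::size_t len);

enum class HelloResult { Ok, AllocFailed, SendFailed };

// Hypothetical counter of buffers currently allocated, used to check for leaks.
int g_liveBuffers = 0;

// Allocates a message buffer; returns nullptr on failure.
unsigned char* allocMsgBuf(std::size_t size) {
    unsigned char* p = static_cast<unsigned char*>(std::malloc(size));
    if (p != nullptr) {
        ++g_liveBuffers;
    }
    return p;
}

// Releases a buffer and clears the pointer, so a repeated call is a no-op.
void freeMsgBuf(unsigned char*& p) {
    if (p == nullptr) {
        return;  // nothing was allocated, so there is nothing to release
    }
    std::free(p);
    p = nullptr;
    --g_liveBuffers;
}

HelloResult processHello(const unsigned char* payload, std::size_t len, SendFn send) {
    unsigned char* pSendMsgBuf = nullptr;  // defect 1 avoided: initialized before use

    pSendMsgBuf = allocMsgBuf(len);
    if (pSendMsgBuf == nullptr) {
        // defect 2 avoided: the failure is reported to the caller, not only logged
        return HelloResult::AllocFailed;
    }
    std::memcpy(pSendMsgBuf, payload, len);

    const bool sent = send(pSendMsgBuf, len);

    // defect 3 avoided: the buffer is released whether or not sending succeeded
    freeMsgBuf(pSendMsgBuf);
    return sent ? HelloResult::Ok : HelloResult::SendFailed;
}
```

With this pattern, a failed send still returns the buffer to the allocator, and calling freeMsgBuf on a pointer that never received a buffer does nothing. If the real send routine takes ownership of the buffer on success, the release would have to be limited to the failure path.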