Problem Description
One day, the customer reported that an OSN3500 network element (NE) repeatedly went offline from the network management system and could be logged in to again automatically after a few minutes. The NE is not a gateway NE. It connects to the gateway NE through an SLQ4 board in a 1+1 linear MSP (LMSP) configuration, and it has two SNCP access rings below it.
Alarm message
The network element is offline.
Processing
1. Query the reset records of the system control board. A large number of resets are recorded, so the NE goes offline because the system control board resets frequently. The reset type is TYPE10, a hardware watchdog reset.
2. Check the ECC connectivity of the NE. Only 100 NEs are reachable over ECC and there are no ECC errors, which rules out an excessive number of ECC interrupts.
3. Replace the system control board and download the data again. The fault persists, which rules out a faulty system control board.
4. Move the GSCC board from slot 24 to slot 25. The fault persists, which rules out the system control board slot.
5. Replace the EOW and AUX boards. The fault persists, which rules out a faulty EOW or AUX board.
6. Remove the EOW board alone. The system control board stops resetting, so a problem with the EOW board slot is suspected.
7. Replace the OSN7500 subrack and use new spare parts for all boards. The fault persists, which rules out the backplane and the boards.
8. With the equipment itself ruled out, the fault is narrowed down to an interconnection problem or an external factor.
9. Unplug all fibers from the subrack and observe. The system control board stops resetting. Reconnect the fibers one at a time. The resets start again when the fiber on optical port 11-SLQ4-2 is reconnected.
10. The fault is therefore located at the interconnection between 11-SLQ4-2 and the peer device. According to the customer, the peer device is MSAP equipment.
11. Run the command :cfg-set-lineused:23,11,2,unused to set the orderwire of this optical port to unused. After a period of observation the fault does not recur, and the problem is resolved.
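The fiber isolation in step 9 is a one-at-a-time search: confirm the board is stable with everything unplugged, then reconnect fibers individually until the resets return. A minimal Python sketch of that logic follows. The function names, the `observe` stand-in, and every port name other than 11-SLQ4-2 are hypothetical and used only for illustration; in the field, "observe" means watching the board for long enough to see whether it resets.

```python
from typing import Callable, Iterable, Optional, Set


def find_faulty_fiber(
    ports: Iterable[str],
    resets_with: Callable[[Set[str]], bool],
) -> Optional[str]:
    """Reconnect fibers one at a time and return the first port whose
    reconnection makes the control board start resetting again."""
    connected: Set[str] = set()
    # Baseline: with every fiber unplugged the board must be stable,
    # otherwise the fault is not fiber-related and the search is pointless.
    if resets_with(connected):
        return None
    for port in ports:
        connected.add(port)
        if resets_with(connected):
            return port
    return None


def observe(connected: Set[str]) -> bool:
    """Hypothetical stand-in for watching the board after each change.
    Here it simply reproduces the outcome reported in this case."""
    return "11-SLQ4-2" in connected


# Illustrative port list; only 11-SLQ4-2 comes from the case record.
ports = ["11-SLQ4-1", "11-SLQ4-2", "11-SLQ4-3", "11-SLQ4-4"]
print(find_faulty_fiber(ports, observe))
```

The baseline check matters: if the board still resets with all fibers removed, reconnecting them one by one cannot identify a culprit, and the search returns nothing.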
Root cause
Erroneous information sent by the interconnected device in the E1 (orderwire) overhead byte of the optical port caused frequent orderwire interrupts, which in turn caused the hardware watchdog to reset the system control board.
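One common way an interrupt flood leads to a watchdog ("hard dog") reset is that interrupt handling consumes all CPU time, so the main loop never gets to feed the watchdog before it expires. The toy Python model below illustrates that mechanism only. It is not a description of the actual board firmware, and all names and numbers (work budget, timeout, interrupt rates) are assumptions chosen for illustration.

```python
def watchdog_expires(
    interrupts_per_tick: int,
    ticks: int = 100,
    budget_per_tick: int = 10,
    watchdog_timeout: int = 5,
) -> bool:
    """Return True if the watchdog would expire, i.e. the board would reset.

    Toy model: each tick the CPU can do `budget_per_tick` units of work.
    Interrupts are served first and cost one unit each. The main loop feeds
    the watchdog only if at least one unit of work is left over.
    """
    ticks_since_feed = 0
    for _ in range(ticks):
        spare = budget_per_tick - interrupts_per_tick
        if spare > 0:
            ticks_since_feed = 0      # main loop ran and fed the watchdog
        else:
            ticks_since_feed += 1     # CPU fully consumed by interrupts
        if ticks_since_feed >= watchdog_timeout:
            return True
    return False


print(watchdog_expires(interrupts_per_tick=2))   # normal load
print(watchdog_expires(interrupts_per_tick=50))  # interrupt flood
```

In this model, removing the interrupt source (here, setting the orderwire of the port to unused) brings the interrupt rate back under the budget, and the watchdog is fed again.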
Suggestions and summary
A hardware watchdog reset of the system control board is generally caused by interrupts. The usual sources are ECC interrupts, Ethernet interrupts, and orderwire interrupts. When this problem occurs, check each source in turn.







