Description of the problem
Huawei OSN3500 equipment is deployed in a ring: -----NE1-12(Slot11)-------(Slot8)NE1-23(Slot11)-----(Slot8)NE1-13-------. None of the fiber spans is a direct connection; every span is patched through Siemens WDM equipment. The whole network uses the extended SSM protocol for clock protection, and NE1-23 initially tracks the clock from the NE1-12 direction. Field engineers reported that when the fiber between NE1-12 and NE1-23 is disconnected, the NE1-23 clock first switches to the internal clock source and only after some time switches to the line board facing NE1-13. Alarms reported: LTI, S1_SYN_CHANGE.
Processing
Please refer to the attached document for a detailed description
The returned data shows that the clock behavior is normal. The LTI alarm reporting, however, is somewhat misleading: LTI alarms are reported after the S1_SYN_CHANGE alarms, and the S1_SYN_CHANGE alarm itself has a fixed delay of 7-8 seconds. In this case, the CLK_NO_TRACE_MODE alarm is a more accurate indicator.
The other key point about the cause is the recovery time. Suppose a network element has two line clock sources, A and B. When source A fails, the clock switches to source B. After the fault on source A clears, a recovery (wait-to-restore) time applies before A can be selected again; the default is 5 minutes. If source B fails within those 5 minutes, the clock cannot track source A until the recovery time expires, so it switches to the internal clock source instead. According to R&D, this mechanism is implemented as recommended. The problem is most likely caused by R_LOF being reported briefly on both clock sources within a short period.
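The selection rule described above can be sketched as a small simulation. This is a minimal illustration, not the NE's actual algorithm: the class and method names (`ClockSelector`, `LineSource`, `select`) are hypothetical, SSM quality comparison is left out, and the only rule modeled is that a recovered source is not selectable until its 5-minute recovery time has expired.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

WTR_SECONDS = 300      # default 5-minute recovery time stated in the text
INTERNAL = "internal"  # fallback when no line source is selectable


@dataclass
class LineSource:
    failed: bool = False
    recovered_at: Optional[float] = None  # time the last fault cleared

    def usable(self, now: float) -> bool:
        """Usable only if healthy and any recovery timer has expired."""
        if self.failed:
            return False
        if self.recovered_at is None:
            return True
        return now - self.recovered_at >= WTR_SECONDS


class ClockSelector:
    """Hypothetical model: pick the first usable source in priority order."""

    def __init__(self, priority: List[str]) -> None:
        self.priority = list(priority)
        self.sources: Dict[str, LineSource] = {n: LineSource() for n in priority}

    def fail(self, name: str) -> None:
        self.sources[name].failed = True

    def clear(self, name: str, now: float) -> None:
        self.sources[name].failed = False
        self.sources[name].recovered_at = now  # recovery timer starts here

    def select(self, now: float) -> str:
        for name in self.priority:
            if self.sources[name].usable(now):
                return name
        return INTERNAL


sel = ClockSelector(["A", "B"])
sel.fail("A")
print(sel.select(0))    # B: source A has failed
sel.clear("A", now=10)  # A recovers, recovery time runs until t=310
sel.fail("B")
print(sel.select(60))   # internal: B failed, A still in recovery time
print(sel.select(310))  # A: recovery time has expired
```

The middle call reproduces the reported symptom: both line sources exist, but neither is selectable, so the NE falls back to the internal clock until the recovery time runs out.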
Solution
1. A clock configuration problem was suspected first. After the MO data was retrieved and analyzed, it was confirmed that the extended SSM protocol is enabled on every network element in the ring and that the clock IDs of the central network elements are set. In addition, about 30 seconds after switching to the internal clock, the clock source can still switch to the other direction. A clock configuration problem was therefore provisionally ruled out.
2. Optical signal processing delay in the intermediate Siemens WDM equipment was suspected next. However, when field engineers tested other sites on the ring, the clock switched immediately, so this was provisionally ruled out as well.
3. A fault on the cross-connect board at the NE1-23 site was suspected, but the symptom persisted after a cross-connect board switchover.
4. Field engineers repeated the test and collected clock data. Analysis of that data finally revealed the cause; see the Cause Analysis section.
Recommendations and summary
It is important to understand the difference between clock switching and service switching. A service switches to the standby channel as long as that channel is available, regardless of whether its recovery time has expired. A clock switches to the standby source only after the recovery time of that source has expired.
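The contrast can be written as a single decision function. This is only a sketch of the rule as stated in this case; the function name and parameters are hypothetical and do not correspond to any NE command or API.

```python
def may_switch_to_standby(kind: str, standby_ok: bool, wtr_remaining_s: float) -> bool:
    """Return True if a switch to the standby channel/source is allowed.

    kind            -- "service" or "clock" (labels chosen for this sketch)
    standby_ok      -- the standby currently has no fault
    wtr_remaining_s -- seconds left on the standby's recovery time (0 if none)
    """
    if not standby_ok:
        return False
    if kind == "service":
        # Service switching ignores the recovery time.
        return True
    if kind == "clock":
        # Clock switching waits for the recovery time to expire.
        return wtr_remaining_s <= 0
    raise ValueError(f"unknown kind: {kind!r}")


# Standby recovered 60 s ago, so 240 s of the 5-minute recovery time remain.
print(may_switch_to_standby("service", True, 240))  # True
print(may_switch_to_standby("clock", True, 240))    # False
```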
Cause Analysis
1. 00:44:46: the MS_AIS alarm on board 8 ends:
107726 8 MS_AIS MJ end 2009-03-05 00:40:17 2009-03-05 00:44:46 0x01 0x00 0x01 0xff 0xff
The clock quality of board 8 changes from DNU (do not use for synchronization) to PRC (primary reference clock):
#1-23:szhw [Castellana S.][][2009-03-05 00:44:45-05:00]>
The clock source switches from board 11 to board 8:
EVENT-SYN-SWITCH
FROM-SYN TO-SYN SWITCH-STATE SYN-TABLE
0x0b01 0x0801 auto syn-tbl
#1-23:szhw [Castellana S.][][2009-03-05 00:44:45-05:00]>
2. Immediately afterwards, an R_LOF alarm is reported on board 8:
107739 8 R_LOF CR end 2009-03-05 00:44:46 2009-03-05 00:44:47 0x01 0x00 0x01 0xff 0xff
The clock quality of board 8 changes from PRC to DNU:
EVENT-SSM-CHANGE
SYN FROM-SSM TO-SSM MANUAL
0x0801 QL_PRC QL_DNU 0
#1-23:szhw [Castellana S.][][2009-03-05 00:44:46-05:00]>
The clock source switches from board 8 back to board 11:
EVENT-SYN-SWITCH
FROM-SYN TO-SYN SWITCH-STATE SYN-TABLE
0x0801 0x0b01 auto syn-tbl
At this point the clock quality of board 8 is DNU, and the clock quality of board 11 is 0x12 (LTI).
3. 00:45:45: an R_LOF alarm is reported on board 11:
107741 11 R_LOF CR end 2009-03-05 00:45:45 2009-03-05 00:50:44 0x01 0x00 0x01 0xff 0xff
The clock source switches again:
107742 10 S1_SYN_CHANGE MJ end 2009-03-05 00:45:45 2009-03-05 00:45:51 0x01 0x00 0x01 0xff 0xff
107743 9 S1_SYN_CHANGE MJ end 2009-03-05 00:45:45 2009-03-05 00:45:52 0x01 0x00 0x01 0xff 0xff
SYN-SWITCH-STATE
SWITCH-STATE CURRENT-SYN
syn-auto 0xf101
Total records :1
#1-23:szhw [Castellana S.][][2009-03-05 00:48:42-05:00]>
The NE is now tracking the internal source and reports LTI alarms:
107746 9 LTI MJ end 2009-03-05 00:45:53 2009-03-05 00:49:59 0x01 0x00 0x01 0xff 0xff
107745 10 LTI MJ end 2009-03-05 00:45:53 2009-03-05 00:50:00 0x01 0x00 0x01 0xff 0xff
4. 00:49:52: the recovery time (5 minutes) of board 8 expires, its clock quality is restored, and the clock source switches back to board 8:
EVENT-SSM-CHANGE
SYN FROM-SSM TO-SSM MANUAL
0x0801 QL_DNU QL_PRC 0
#1-23:szhw [Castellana S.][][2009-03-05 00:49:52-05:00]>
EVENT-SYN-SWITCH
FROM-SYN TO-SYN SWITCH-STATE SYN-TABLE
0xf101 0x0801 auto syn-tbl
#1-23:szhw [Castellana S.][][2009-03-05 00:49:52-05:00]>
The LTI alarm ends after the S1_SYN_CHANGE alarm ends.
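The 5-minute recovery time can be checked against the timestamps quoted in the log excerpts. The short calculation below uses only times that appear above; treating the end of the R_LOF alarm on board 8 as the start of the recovery timer is an assumption made for this check.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the log excerpts above.
rlof_board8_end = datetime.strptime("2009-03-05 00:44:47", FMT)    # R_LOF on board 8 clears
rlof_board11_start = datetime.strptime("2009-03-05 00:45:45", FMT)  # R_LOF on board 11 raised
board8_restored = datetime.strptime("2009-03-05 00:49:52", FMT)     # board 8 DNU -> PRC, switch back

# Assumed: the recovery timer starts when the R_LOF on board 8 clears.
wtr_elapsed = (board8_restored - rlof_board8_end).total_seconds()
# Time spent on the internal source (0xf101) before switching back to board 8.
internal_elapsed = (board8_restored - rlof_board11_start).total_seconds()

print(wtr_elapsed)       # 305.0 seconds, roughly the 5-minute default
print(internal_elapsed)  # 247.0 seconds on the internal clock
```

The 305 seconds between the fault clearing on board 8 and its quality returning to PRC is consistent with a 5-minute recovery time plus a few seconds of reporting latency.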
From the above, the clock behavior is judged to be normal. Only the LTI alarm reporting is misleading: LTI alarms are raised and cleared after the S1_SYN_CHANGE alarm, and the S1_SYN_CHANGE alarm itself has a fixed delay of 7-8 seconds. In this case, the CLK_NO_TRACE_MODE alarm is a more accurate indicator.
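The alarm timing can be read directly from the alarm records. The sketch below parses three records copied from step 3 and measures how long after the R_LOF on board 11 the LTI alarm was raised. The field layout (sequence number, board, alarm name, severity, state, start date and time, end date and time) is inferred from the excerpts and is an assumption; `parse_alarm` is a hypothetical helper, not an NE tool.

```python
from datetime import datetime
from typing import NamedTuple

FMT = "%Y-%m-%d %H:%M:%S"


class Alarm(NamedTuple):
    seq: int
    board: int
    name: str
    start: datetime
    end: datetime


def parse_alarm(line: str) -> Alarm:
    """Parse one alarm record; the field positions are assumed from the log."""
    f = line.split()
    return Alarm(
        seq=int(f[0]),
        board=int(f[1]),
        name=f[2],
        start=datetime.strptime(f"{f[5]} {f[6]}", FMT),
        end=datetime.strptime(f"{f[7]} {f[8]}", FMT),
    )


LOG = """\
107741 11 R_LOF CR end 2009-03-05 00:45:45 2009-03-05 00:50:44 0x01 0x00 0x01 0xff 0xff
107742 10 S1_SYN_CHANGE MJ end 2009-03-05 00:45:45 2009-03-05 00:45:51 0x01 0x00 0x01 0xff 0xff
107746 9 LTI MJ end 2009-03-05 00:45:53 2009-03-05 00:49:59 0x01 0x00 0x01 0xff 0xff
"""

alarms = {a.name: a for a in map(parse_alarm, LOG.splitlines())}

# Seconds between the R_LOF on board 11 and the LTI alarm being raised.
lti_delay = (alarms["LTI"].start - alarms["R_LOF"].start).total_seconds()
# Duration of the S1_SYN_CHANGE alarm itself.
s1_duration = (alarms["S1_SYN_CHANGE"].end - alarms["S1_SYN_CHANGE"].start).total_seconds()

print(lti_delay)    # 8.0
print(s1_duration)  # 6.0
```

In this capture the LTI alarm is raised 8 seconds after the line fault, which is why the LTI timestamps alone are a poor guide to when the NE actually lost its reference.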
The other key point about the cause is the recovery time. Suppose a network element has two line clock sources, A and B. When source A fails, the clock switches to source B. After the fault on source A clears, a recovery (wait-to-restore) time applies before A can be selected again; the default is 5 minutes. If source B fails within those 5 minutes, the clock cannot track source A until the recovery time expires, so it switches to the internal clock source instead. Here, board 8 was still within its recovery time when R_LOF was reported on board 11. According to R&D, this mechanism is implemented as recommended. The problem is most likely caused by R_LOF being reported briefly on both clock sources within a short period.
The technical information in this chapter and the SDH equipment troubleshooting process were collected and organized by Shenzhen Optical Transmission Network Technology Company Limited (www.opticaltrans.com); please retain this notice.







