Description of the problem
A major customer's ring, 211-212-213-214-215-216, is a 2.5G bidirectional multiplex section protection ring. Sites 211 and 212 are 10G equipment and the other sites are 2500+ equipment. Sites 211 and 212 are connected over bare fiber (that is, a direct optical cable), while the other sites are networked over wavelength channels of Metro6100 equipment.
One day, the optical cable between sites 211 and 212 was cut, and the S16 board in slot 9 at site 216 (connected to the slot 10 S16 at site 215, Anhua) intermittently and transiently reported R_LOS or R_LOF, causing abnormal multiplex section switching and service interruption. The SL16A in slot 15 at site 211 occasionally reported R_LOS or R_LOF at the same time as the slot 9 S16 at site 216.
During troubleshooting, the first-line engineers could reproduce the fault every time by shutting down the laser of the SL16A in slot 16 or slot 15 at site 211 while the multiplex section protocol was running normally network-wide. After the multiplex section protocol was stopped on the whole network, the same operation no longer reproduced the fault. Swapping the east and west optical interface boards at sites 211, 215, and 216 did not clear the fault, nor did swapping the cross-connect boards of sites 215 and 216.
Processing
The fault was located between August 24 and August 31 as follows:
1. Because the fault did not recur with the protocol stopped network-wide, it appeared to be related only to the switching action (that is, page switching). To verify this, we performed a multiplex section exercise switch and a forced switch in the east and west directions of site 211. The fault did not recur with the exercise switch but did recur with the forced switch. This shows that the fault is triggered by multiplex section switching (page switching) and is not related to the protocol itself.
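The elimination in step 1 can be restated as a small truth table. The sketch below is illustrative only: the field names are hypothetical, and it assumes (consistent with the text) that an exercise switch runs the protocol without switching pages, while a forced switch, or a laser shutdown with the protocol running, does switch pages.

```python
# Observations taken from the text; the field names are hypothetical labels.
observations = [
    # (test, protocol_running, pages_switched, fault_seen)
    ("laser off, protocol running", True, True, True),
    ("laser off, protocol stopped", False, False, False),
    ("exercise switch", True, False, False),
    ("forced switch", True, True, True),
]

def explains_fault(column):
    """Return True if the factor in `column` matches the fault outcome in every test."""
    return all(row[column] == row[3] for row in observations)

protocol_explains = explains_fault(1)
page_switch_explains = explains_fault(2)

print("protocol running explains the fault:", protocol_explains)
print("page switching explains the fault:", page_switch_explains)
```

Only the page-switching column tracks the fault in every test; the exercise switch is the case that separates it from the protocol.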
2. An optical board reports R_LOS in two cases: the optical port receives no light, or the optical port receives white light. To determine which of these triggered the fault, we monitored the transmit port of the slot 10 S16 at site 215 with an optical power meter while the fault was being reproduced. Over 5 minutes of observation, no jump in optical power was seen. We then set the clock tracking mode to free-run at all sites in the network and shut down the laser of the SL16A in slot 16 at site 211, and the fault was reproduced. This indicates that the transmit power of slot 10 at site 215 was normal when the fault occurred and that the fault was unrelated to the clock, which suggests there should be no problem at site 215 or upstream of it.
3. Given the above, the next step was to go to Taojiang and check whether the optical power at the receive port of slot 9 at site 216 was normal. While waiting for test instruments, we could only analyze site 215 and the upstream sites further. To check whether the protection channels of the multiplex section ring on the 2500+ NEs were at fault, we configured VC4 pass-through services on the protection channel of every multiplex section node from the network management system. The fault did not recur, indicating that the protection channels were good.
4. Next, we used the PTP command to deliver pass-through pages directly to sites 215, 214, and 213, in that order. After the pages were delivered to 215 and 214, the fault did not recur; soon after they were delivered to 213, the fault recurred. Judging from this behavior, the fault appears to build up cumulatively.
We then set loopbacks. With a loopback on slot 16 of 211, the fault persisted; with a loopback on slot 15 of 212, the fault persisted; with a loopback on slot 16 of 212, the fault disappeared; and with a loopback on slot 9 of 213, the fault disappeared. With the optical port inloop set on slot 9 of 213, the fault did not recur when the PTP command was used to deliver the pass-through page.
5. Because the exact fault location could not yet be found and we were working in Anhua, we replaced the optical boards in slots 9 and 10 at site 215 (the upstream site) with new 62S16 and 63S16 boards, but the fault remained. We then replaced the cross-connect board with a 62XCS board and pulled out the standby cross-connect board. The fault still remained, indicating that the boards at this site were not the cause. We also inspected the NE carefully and found no bent pins.
6. Next, we obtained an optical splitter and installed a 63S16 board in slot 6 at site 215. The splitter divided the output of the slot 10 transmit port at site 215: one part continued to the WDM equipment as normal, and the other part went to the receive port of the S16 board in slot 6. We then reproduced the fault. During reproduction, the slot 6 S16 reported neither R_LOS nor R_LOF, indicating that the light transmitted by slot 10 at site 215 was normal.
Optical power readings at this time:
Optical power of the 63S16 in slot 6: -12.7 dBm
Optical power of the slot 9 S16 in the direction of Taojiang: -7.8 dBm (this value held steady)
Optical power into the WDM equipment after the splitter on the slot 10 board at 215: -3.4 dBm
To confirm that the absence of R_LOS on slot 6 was reliable, we replaced the board in slot 6 with a 62S16 and reproduced the fault again. The 62S16 in slot 6 also reported neither R_LOS nor R_LOF.
Optical power readings at that time:
Optical power of the 62S16 in slot 6: -15.30 dBm
Optical power of the slot 9 S16 in the direction of Taojiang: -7.8 dBm (this value held steady)
Optical power into the WDM equipment after the splitter on the slot 10 board at 215: -3.4 dBm
Under normal conditions, the queried optical power of the slot 9 S16 at site 215 was -5.60 dBm. Suspecting that this optical power was abnormal, we added an optical attenuator on the slot 9 S16 at site 215, which brought its optical power to -15.00 dBm; at that time the optical power of the slot 6 board was -15.30 dBm. The fault still recurred.
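As a rough cross-check of the readings in step 6, the two splitter outputs can be converted from logarithmic to linear power. This is only a sketch under stated assumptions: the readings are taken to be in dBm, the splitter is treated as ideal (no excess or connector loss), and the variable names are hypothetical.

```python
import math

def dbm_to_mw(dbm):
    """Convert optical power from dBm to milliwatts."""
    return 10 ** (dbm / 10)

def mw_to_dbm(mw):
    """Convert optical power from milliwatts to dBm."""
    return 10 * math.log10(mw)

# Readings quoted in the text (assumed to be dBm).
to_wdm_dbm = -3.4      # splitter output continuing to the WDM equipment
to_slot6_dbm = -12.7   # splitter output tapped to the 63S16 in slot 6

to_wdm_mw = dbm_to_mw(to_wdm_dbm)
to_slot6_mw = dbm_to_mw(to_slot6_dbm)
total_mw = to_wdm_mw + to_slot6_mw

# Fraction of the light diverted to slot 6, assuming a lossless splitter.
tap_fraction = to_slot6_mw / total_mw
# Implied power entering the splitter from slot 10 (a lower bound in practice).
source_dbm = mw_to_dbm(total_mw)
# Attenuation added on the slot 9 S16: -5.60 dBm down to -15.00 dBm.
attenuation_db = -5.60 - (-15.00)

print(f"tap fraction: {tap_fraction:.1%}")
print(f"implied splitter input: {source_dbm:.2f} dBm")
print(f"added attenuation: {attenuation_db:.2f} dB")
```

Under these assumptions, only about a tenth of the light goes to the monitoring board, so the tap costs the working path toward the WDM equipment only about half a decibel.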
7. With site 215 cleared, we next needed to determine whether the slot 9 board at site 216 was faulty or whether it was genuinely receiving a signal that caused R_LOS. We therefore ran a split-light test with an SDH tester at the Taojiang site.
We split the light at the receive port of slot 9 at site 216 and at the transmit port of the Taojiang WDM equipment toward slot 9 of that NE. The results were the same in both cases: the SDH tester reported R_LOS.
To confirm that the split-light method used on slot 6 at site 215 was valid, we also split light to an S16 board in slot 6 at site 216. When the fault was reproduced, that slot 6 S16 board reported an R_LOS alarm.
In summary, neither site 215 nor site 216 was at fault, so the problem was judged to lie between the two sites. The operator then tested the WDM boards to identify the faulty board.
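The localization in steps 6 and 7 amounts to finding the first measurement point along the signal path where the alarm appears. The sketch below is a minimal illustration: the tap labels are hypothetical and reflect one reading of step 7, in which both SDH tester taps were taken at Taojiang on the signal heading to slot 9 at site 216.

```python
# Measurement points in signal order from site 215 toward site 216.
# Each entry is (label, alarm_seen_during_fault_reproduction).
taps = [
    ("215 slot 10 transmit (split to 215 slot 6)", False),
    ("Taojiang WDM transmit toward 216 (SDH tester)", True),
    ("216 slot 9 receive (SDH tester)", True),
]

def locate_fault(points):
    """Return (last clean point, first alarmed point), or None if no boundary exists."""
    for (prev_name, prev_alarm), (name, alarm) in zip(points, points[1:]):
        if not prev_alarm and alarm:
            return prev_name, name
    return None

segment = locate_fault(taps)
print("fault lies between:", segment)
```

The signal is clean leaving slot 10 at site 215 and already alarmed when it leaves the WDM equipment at Taojiang, which places the fault on the WDM path between the two sites.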
Root cause
Based on on-site analysis of the network-wide data and SDH tester measurements of the optical signals at sites 215 (Anhua) and 216 (Taojiang), the problem was determined to lie between sites 215 and 216. Further analysis preliminarily located it to a faulty LWX board in the WDM equipment at Meicheng. Because no spare board was available, the operator decided not to test at Meicheng for the time being; once the spare part was issued, the operator's engineer would go there to test and replace the board directly. At that point, therefore, the conclusion had not been fully confirmed. The fault was triggered by multiplex section switching (that is, page switching) and was located to the WDM equipment; the problem was resolved after the operator replaced the board.
Solution
Emergency recovery measures:
None
Complete Resolution Measures:
Replace the LWX board of the WDM equipment at Meicheng.







