Problem Description
Network elements 170-178-179-180-181-182 (listed counterclockwise) form an MSP ring. NE 170 is an OSN7500; the other devices are OSN3500s. ECC link management at site 178 shows that 178 reaches NE 170 by forwarding through NE 179 instead of over the direct fiber link, even though board 11-T2SL64 on NE 170 is connected directly to board 8-N2SL64 on NE 178.
Alarm messages
None
Processing
1 ) From the network management system, run the command cm-get-bdinfo to check the ECC allocation of NE 170. The allocation is normal.
The display is as follows:
FIBER-PORT-STATE
BID  PORT  PORT-STATE   PORT-RATE  LINK-CHAN  LOGIC-CHAN-STATE
6    1     port-enable  D1-D3      0          ok
6    2     port-enable  D1-D3      1          ok
7    1     port-enable  D1-D3      2          ok
8    1     port-enable  D1-D3      3          ok
11   1     port-enable  D1-D3      4          ok
12   1     port-enable  D1-D3      5          ok
13   1     port-enable  D1-D3      6          ok
Use the command cm-get-chaninfo: 4 to query the ECC channel status of the corresponding board (board 11, link channel 4). Both sent and received byte counts are present and increasing (see the attachment for details), but DNEID shows 0X00FFFFFF and SNEID shows 0x000900b2 (178).
2 ) Run cm-get-bdinfo on NE 178. It shows a receive failure (rx_f) on board 8:
FIBER-PORT-STATE
BID  PORT  PORT-STATE   PORT-RATE  LINK-CHAN  LOGIC-CHAN-STATE
8    1     port-enable  D1-D3      0          rx_f
11   1     port-enable  d1-d3      1          ok
A query with the command cm-get-chaninfo: 0 shows that this channel has sent bytes but no received bytes. Both DNEID and SNEID show 0000000000.
3 ) On NEs 170 and 178, run the command cm-get-chanerror. No bit errors are reported on the ECC channels of either board.
4 ) Hard reset the GSCC at site 170 and the GSCC at site 178; the symptom persists. Hard reset the SL64 at site 170 and the SL64 at site 178; the symptom persists.
5 ) Replace the SL64 board at site 178. The problem is solved.
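The first check in steps 1 and 2 is reading the LOGIC-CHAN-STATE column of the cm-get-bdinfo output. A minimal Python sketch of that check follows. It assumes the output has been saved as plain text in the six-column layout quoted in this case; the function names are illustrative and are not part of any Huawei tool.

```python
# Sketch: flag ECC channels whose logical state is not "ok".
# Assumption: the text follows the six-column FIBER-PORT-STATE layout
# quoted in this case; real output may differ between software versions.

COLUMNS = ["BID", "PORT", "PORT-STATE", "PORT-RATE",
           "LINK-CHAN", "LOGIC-CHAN-STATE"]


def parse_port_state(text):
    """Return one dict per data row of a FIBER-PORT-STATE table."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have six fields and begin with a numeric board ID,
        # which skips the title line and the column header line.
        if len(fields) == len(COLUMNS) and fields[0].isdigit():
            rows.append(dict(zip(COLUMNS, fields)))
    return rows


def failed_channels(rows):
    """Return the rows whose logical channel state is anything but 'ok'."""
    return [r for r in rows if r["LOGIC-CHAN-STATE"].lower() != "ok"]


# Output of NE 178 as quoted in step 2.
NE178_OUTPUT = """\
FIBER-PORT-STATE
BID PORT PORT-STATE PORT-RATE LINK-CHAN LOGIC-CHAN-STATE
8 1 port-enable D1-D3 0 rx_f
11 1 port-enable d1-d3 1 ok
"""

for row in failed_channels(parse_port_state(NE178_OUTPUT)):
    print(f"board {row['BID']} port {row['PORT']}: {row['LOGIC-CHAN-STATE']}")
# prints: board 8 port 1: rx_f
```

Run against the NE 170 output from step 1, the same check returns no rows, which matches the observation that the fault shows up only on the receive side of board 8 at site 178.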
Root cause
1 ) ECC configuration problems at sites 170 and 178 (ECC not enabled, inconsistent overhead bytes in use)
2 ) Insufficient number of ECC channels
3 ) Main control board problem
4 ) Line board problem (the cause in this case: replacing the SL64 board at site 178 cleared the fault)
5 ) Other problems, such as inconsistent ECC protocol stack checksum switches
Recommendations and Summary
ECC failures have many possible causes, including bit errors, inconsistent ECC checksum state, port assignment, enable state, channel bytes (D1-D12, D1-D3, D4-D12), the host, and the line board. When locating such a failure, cm-get-ECC is the most effective tool. The commands cm-get-eccroute, cm-get-bdinfo, cm-get-chaninfo, and cm-get-chanerror are very useful: by analyzing the returned parameters, you can usually identify the faulty site and the faulty board.
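One returned parameter that is easier to read once decoded is the NE ID shown in the DNEID and SNEID fields of cm-get-chaninfo. The sketch below splits such a value into an extended ID and a base ID. The 8-bit/16-bit split is an assumption based on how OptiX NE IDs are commonly described; this case itself only confirms that 0x000900b2 corresponds to NE 178. The function name is illustrative.

```python
# Sketch: decode the NE ID values reported by cm-get-chaninfo.
# Assumption: a 24-bit NE ID with the extended ID in the upper 8 bits
# and the base ID in the lower 16 bits.


def decode_ne_id(value):
    """Split a hexadecimal NE ID string into (extended_id, base_id)."""
    raw = int(value, 16)  # accepts strings with or without a 0x/0X prefix
    return (raw >> 16) & 0xFF, raw & 0xFFFF


# The three values seen in this case.
for field in ("0x000900b2", "0X00FFFFFF", "0000000000"):
    ext_id, base_id = decode_ne_id(field)
    print(f"{field}: extended ID {ext_id}, base ID {base_id}")
# first line printed: 0x000900b2: extended ID 9, base ID 178
```

Decoding makes it easier to see at a glance whether a channel reports a real NE ID, as with 0x000900b2, or a value such as 0X00FFFFFF or all zeros that does not match any NE on the ring.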







