Description of the problem
On the osn3500 equipment deployed in the customer's live network, the pex1 and peg16 boards repeatedly reported transient lsr_will_die alarms, each lasting about 20 seconds. After the bias current threshold was modified, the alarm was still reported on some optical ports in the live network. The customer took the issue very seriously and asked for a root cause analysis and a permanent solution.
Host version: v200r011c00spc200
Data board: ssn1peg16, ssn1pex1
Alarm information
lsr_will_die
Processing
Temporary solution: On the network management system, mask the lsr_will_die alarm on the ge and 10ge ports, set the alarm threshold to 900, enable monitoring of the lsr_bcm_alm alarm, and use the lsr_bcm_alm alarm to judge whether an optical module is abnormal.
Complete solution: A patch or a later version needs to be developed to fix the abnormal floating-point operation.
Root Cause
The lsr_will_die alarm is reported when the bias current exceeds the configured threshold. It indicates that the laser is approaching the end of its life. The alarm does not mean that the optical module fails immediately; the module can continue to be used for a period of time, during which a replacement should be prepared.
Based on the symptoms in the live network and the data collected by frontline engineers, several sites reported the alarm at the same time, the optical modules had been in use for less than 2 years (the life of an optical module is generally 3 to 5 years), and the alarm was transient. The preliminary conclusion was therefore that several optical modules failing at the same time is unlikely, so the problem was investigated from both the board hardware side and the board software side.
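The threshold rule described above can be sketched as follows. This is only an illustration: the `PortReading` type, the port names, and the sample bias current values are hypothetical, and the threshold of 900 is taken from the temporary solution without assuming any particular unit.

```python
from dataclasses import dataclass


@dataclass
class PortReading:
    port: str            # hypothetical port label
    bias_current: float  # same (unspecified) unit as the threshold


def lsr_will_die_raised(reading: PortReading, threshold: float) -> bool:
    """Return True when the bias current exceeds the configured threshold."""
    return reading.bias_current > threshold


# Hypothetical sample readings, checked against the threshold of 900.
readings = [PortReading("port-1", 850.0), PortReading("port-2", 905.0)]
alarmed = [r.port for r in readings if lsr_will_die_raised(r, 900.0)]
print(alarmed)
```

A reading at or below the threshold raises no alarm, so only ports whose bias current is strictly above the threshold are listed.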
Hardware:
1, Analyzed the board manufacturing information and found that some of the optical modules are not supported by the data boards, and that a single board uses optical modules from many different vendors. This is a potential risk but not the key to the problem, because the same optical modules work normally at other sites.
2, Built a mirror environment in the laboratory to reproduce the alarm.
3, Completed a temperature chamber test in the laboratory to observe the relationship between temperature and the alarm.
Software:
1, Software R&D reviewed the code. In theory, the lsr_will_die alarm and the lsr_bcm_alm alarm appear in pairs. Although lsr_bcm_alm is masked in the live network, it can still be viewed through the navigator.
2, Coordinated with frontline engineers to set the bias current high alarm threshold to 1 in the live network. The alarm was reported after about 25 seconds, and the bias current high alarm and the laser-life-about-to-end alarm appeared as a pair. In the earlier alarm logs from the live network, however, the two alarms did not appear in pairs. The only difference between the two alarm judgments is an additional floating-point operation, so the floating-point operation is strongly suspected.
3, The laboratory simulation was consistent with this analysis. It can therefore be judged that the lsr_will_die alarm is a false alarm caused by an abnormal floating-point operation. Other offices had previously seen the power_abnormal alarm reported abnormally because of abnormal floating-point operations. When a temporary version that prints the floating-point results was loaded in the laboratory, abnormal floating-point results appeared there as well.
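The pairing check from step 2 can be sketched as a small log analysis. This is a minimal sketch under stated assumptions: the `AlarmEvent` record, the port labels, the timestamps, and the 5-second matching window are all hypothetical and do not reflect the actual log format of the equipment or the network management system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmEvent:
    name: str      # alarm name, e.g. "lsr_will_die" or "lsr_bcm_alm"
    port: str      # hypothetical port label
    time_s: float  # hypothetical timestamp in seconds


def unpaired_will_die(events, window_s=5.0):
    """Return lsr_will_die events with no lsr_bcm_alm on the same port
    within window_s seconds (window_s is an assumed tolerance)."""
    bcm = [e for e in events if e.name == "lsr_bcm_alm"]
    unpaired = []
    for e in events:
        if e.name != "lsr_will_die":
            continue
        matched = any(
            b.port == e.port and abs(b.time_s - e.time_s) <= window_s
            for b in bcm
        )
        if not matched:
            unpaired.append(e)
    return unpaired


# Hypothetical log: port-1 shows a pair, port-2 shows lsr_will_die alone.
log = [
    AlarmEvent("lsr_will_die", "port-1", 100.0),
    AlarmEvent("lsr_bcm_alm", "port-1", 101.0),
    AlarmEvent("lsr_will_die", "port-2", 200.0),
]
suspects = unpaired_will_die(log)
print([e.port for e in suspects])
```

Any lsr_will_die event returned by this check has no matching lsr_bcm_alm, which is the pattern that pointed suspicion at the alarm judgment itself and away from the optical modules.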
At this point, it can be determined that the lsr_will_die alarm is a false alarm caused by floating-point operation anomalies.
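The mechanism can be illustrated with a simplified model. Everything here is an assumption made for illustration: the real board code is not shown in this case, so the function names, the integer-only judgment for lsr_bcm_alm, the extra floating-point step for lsr_will_die, and the way the fault is simulated (an infinite result for one reading) are all hypothetical.

```python
def bcm_alarm(raw_bias: int, raw_threshold: int) -> bool:
    """Assumed integer-only judgment for lsr_bcm_alm."""
    return raw_bias > raw_threshold


def will_die_alarm(raw_bias: int, raw_threshold: int, to_float=float) -> bool:
    """Assumed judgment for lsr_will_die: the same comparison, but after
    an additional floating-point conversion step."""
    return to_float(raw_bias) > to_float(raw_threshold)


def faulty_to_float(raw: int) -> float:
    """Simulated fault: one particular reading yields an abnormal result."""
    return float("inf") if raw == 500 else float(raw)


# Healthy floating-point path: both judgments agree, so the alarms pair up.
healthy = (bcm_alarm(950, 900), will_die_alarm(950, 900))

# Faulty floating-point path: bias 500 is below the threshold 900, yet
# lsr_will_die is raised while lsr_bcm_alm is not.
faulty = (bcm_alarm(500, 900), will_die_alarm(500, 900, faulty_to_float))
print(healthy, faulty)
```

In this model an abnormal floating-point result produces lsr_will_die without its lsr_bcm_alm partner, which matches the unpaired alarms seen in the earlier live-network logs.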
Recommendations and Summary
The lsr_will_die alarm on data board optical ports was designed with reference to ptn products; the data boards of traditional mstp products do not have this alarm. When designing an alarm, attention should therefore be paid to its applicability and sensitivity. In particular, some sensitive overseas customers treat this alarm as very important and ask for the root cause, which creates unnecessary trouble for us. It is recommended to rename the alarm or mask it directly.
In addition, alarm thresholds must be set according to a unified and effective standard. It is best not to modify a threshold casually, otherwise customers are likely to become dissatisfied and question the change.
The technical information and SDH equipment troubleshooting procedures in this chapter were collected by Shenzhen Optical Transmission Network Technology Limited (www.opticaltrans.com). Please retain this notice when reproducing.

