Problem Description
The TEMP_OVER alarm is reported on several boards of an OSN1500 at a site in northern China. The alarm parameters and the equipment's working environment together indicate that the alarm is caused by low board temperature. This case also discusses the temperature detection mechanism of NG-SDH equipment and some of the problems encountered.
Current network version: 5.36.18.50
Alarm message
TEMP_OVER
Processing
1. Check the parameters of the current TEMP_OVER alarms:
2929679 12 TEMP_OVER MJ start 2011-12-01 04:52:16 None 0x01 0x00 0x01 0x02 0xff
3000350 80 TEMP_OVER MJ start 2011-12-24 04:59:15 None 0x01 0x00 0x01 0x02 0xff
According to the current OSN1500 product manual (V100R008C02), parameter 4 indicates the type of operating-temperature limit violation for SDH boards, and parameter 1 indicates it for cross-connect boards, where 0x01 means the upper limit of the board operating temperature was exceeded and 0x02 means the lower limit was exceeded. By that description, parameter 1 should be read for the three-in-one board, and its value 0x01 means the upper limit was exceeded. This is clearly the opposite of the actual situation: querying the current board temperature with cfg-get-bdtemp:80 shows that the board is currently at -2 °C.
BOARD-TEMP
BID TEMP-NOW
80 -20
Total records :1
The product manual is therefore wrong in saying that parameter 1 gives the limit type. R&D confirmed that for cross-connect boards only parameter 4 of the TEMP_OVER alarm needs to be checked; parameter 1 is a fixed value. The OSN3500 (V100R008) and OSN7500 (V100R008) manuals explain the parameters incorrectly as well. In other versions, such as the R11 product documentation, the explanation of the alarm parameters has been changed, and only parameter 1 needs attention.
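The parameter check described above can be sketched in Python. This is only an illustration, not a vendor tool. It assumes that the alarm record is whitespace-separated, that the second field is the board ID, and that the five trailing hex bytes are parameters 1 to 5. The function name decode_temp_over is hypothetical.

```python
# Sketch: decode the limit type of a TEMP_OVER alarm from parameter 4.
# Assumptions (not confirmed by the product manual): fields are separated by
# whitespace, field 2 is the board ID, and the trailing hex bytes are
# parameters 1..5 in order.

LIMIT_TYPE = {
    0x01: "upper limit exceeded",
    0x02: "lower limit exceeded",
}


def decode_temp_over(record: str) -> dict:
    """Return the board ID, raw parameters and decoded limit type."""
    fields = record.split()
    if "TEMP_OVER" not in fields:
        raise ValueError("not a TEMP_OVER record")
    # Collect every hex token in order; these are the alarm parameters.
    params = [int(tok, 16) for tok in fields if tok.lower().startswith("0x")]
    if len(params) < 4:
        raise ValueError("expected at least four alarm parameters")
    return {
        "bid": int(fields[1]),
        "params": params,
        # Parameter 4 (index 3) carries the limit type; parameter 1 is fixed.
        "limit_type": LIMIT_TYPE.get(params[3], "unknown"),
    }


record = ("3000350 80 TEMP_OVER MJ start 2011-12-24 04:59:15 "
          "None 0x01 0x00 0x01 0x02 0xff")
print(decode_temp_over(record))
```

For the record of board 80 quoted above, parameter 4 is 0x02, so the sketch reports a lower-limit violation, which matches the low temperature measured on the board.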
2. Temperature reporting mechanism of NG-SDH equipment: each board has a temperature chip that measures the board temperature in real time. The other boards communicate with the main control board over the backplane and report their real-time temperature to the host. The host takes the maximum of the temperatures reported by all boards on the NE and records it in the temperature performance events. Therefore, the board temperature performance events (bdtempmax, bdtempmin, bdtempcur) can be queried only on the main control board. On other boards the query returns the laser temperature, which is higher than the board temperature.
3. Support for temperature queries differs between versions of the same board. For example, SSN2PQ1 VER.C has a temperature chip and SSN2PQ1S VER.C does not. The former is the earlier version and the latter is a cost-reduced product, so the former supports temperature queries and temperature alarm reporting while the latter does not. To check a specific board, run cfg-get-bdtemp:bid and determine from the returned result whether the board contains a temperature chip.
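As a rough aid for that check, the following Python sketch extracts the TEMP-NOW value for one board from cfg-get-bdtemp output shaped like the sample shown earlier. The helper name parse_bdtemp is hypothetical. What the NE actually returns for a board without a temperature chip is not shown in this case, so the sketch only reports "no reading found" (None) and leaves the interpretation to the engineer. The raw value is returned as-is, with no assumption about its unit.

```python
from typing import Optional


def parse_bdtemp(output: str, bid: int) -> Optional[int]:
    """Return the raw TEMP-NOW value for `bid`, or None if no row matches.

    Assumption: each data row consists of exactly two integers,
    the board ID followed by the TEMP-NOW value.
    """
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # title line, "Total records" line, or blank line
        try:
            row_bid, temp = int(parts[0]), int(parts[1])
        except ValueError:
            continue  # header row such as "BID TEMP-NOW"
        if row_bid == bid:
            return temp
    return None


sample = """BOARD-TEMP
BID TEMP-NOW
80 -20
Total records :1"""

print(parse_bdtemp(sample, 80))  # reading for board 80
print(parse_bdtemp(sample, 12))  # no row for board 12 in this sample
```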
4. Temperature alarm: the temperature chip reads the real-time board temperature, which is compared with the alarm thresholds; if a threshold is exceeded, the TEMP_OVER alarm is reported. The alarm thresholds can be modified from the command line: cfg-set-bdtempth:Bid, temphighgate, templowgate. The threshold values are range-limited: the upper threshold must fall within 60-80, where the maximum is 80 for some data boards (SSN1EAS1/SSN1EAS2/SSN1EMS4/SSN1EGS4) and 70 for other boards, and the default is 65. Setting the temperature threshold above 70 is generally not recommended.
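The threshold comparison can be illustrated with a minimal Python sketch. It is not the board software. The function name temp_over_type is hypothetical, the strict comparisons (greater than, less than) are an assumption, and the lower threshold of 0 used in the example is a placeholder because the default lower threshold is not given in this case. The return values reuse the limit-type codes 0x01 and 0x02 from the alarm parameters.

```python
from typing import Optional

UPPER_LIMIT_EXCEEDED = 0x01
LOWER_LIMIT_EXCEEDED = 0x02


def temp_over_type(temp: float, high_gate: float, low_gate: float) -> Optional[int]:
    """Return the limit-type code if `temp` violates a threshold, else None.

    Assumption: the alarm is raised when the temperature is strictly above
    the upper threshold or strictly below the lower threshold.
    """
    if low_gate >= high_gate:
        raise ValueError("lower threshold must be below upper threshold")
    if temp > high_gate:
        return UPPER_LIMIT_EXCEEDED
    if temp < low_gate:
        return LOWER_LIMIT_EXCEEDED
    return None


# Upper threshold 65 is the default stated above; lower threshold 0 is a
# placeholder chosen for illustration only.
print(temp_over_type(-2, high_gate=65, low_gate=0))
print(temp_over_type(40, high_gate=65, low_gate=0))
print(temp_over_type(72, high_gate=65, low_gate=0))
```

With these placeholder thresholds, a board at -2 falls below the lower threshold and yields the code 0x02, the same value seen in parameter 4 of the alarms in this case.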
Root Cause
None
Recommendations and Summary
TEMP_OVER is usually reported because the temperature is too high, but it can also be reported when the temperature is too low. To clear this alarm, make sure the ambient temperature of the equipment is within the normal range (0 to 45 °C).







