Problem Description
An HSC_UNAVAIL alarm was reported at a network site. Troubleshooting showed that the alarm has many possible causes and that NG-SDH equipment can report HSC_UNAVAIL in many different situations. The meaning of the alarm is described in detail below.
Alarm Information
HSC_UNAVAIL
Processing
In this case the fault was finally located to a cross-connect board failure. Analysis shows, however, that a cross-connect board failure is not the only thing that causes this alarm to be reported. The possible faults are listed below according to the parameter of the reported alarm:
1. When alarm parameter PARA1 = 0x80
This means the cross-connect board was reset less than 5 minutes ago. (The 5-minute timing is not exact; anything from 5 to 8 minutes is normal, because the timer only starts counting once the alarm module's task has started.) If there is no other abnormality, the alarm clears automatically after 5 minutes.
This alarm does not necessarily affect active/standby cross-connect switchover. It only reminds the user that, for the moment, it is best not to hard reset or remove the active board, so as not to affect services (a standby board that has just come up still has to synchronize some data from the active board). In addition, the SRV indicator on the board lights yellow.
2. When alarm parameter PARA1 = 0x01
This indicates that the standby board has detected that its own state is bad. The alarm is generally accompanied by a HARD_BAD alarm. If there is no HARD_BAD among the current alarms, be sure to check the alarm history for a HARD_BAD alarm and back up the black-box records, so that the specific hardware fault can be located. (Note: if there is no HARD_BAD among the current alarms, another possibility is that PARA1=0x81 was reported after a hard reset and the 0x80 part cleared when the 5-minute timer expired, leaving PARA1=0x01. In that case the remaining alarm is caused by the ready line still being bad because host backup has not finished, and it ends only after host backup completes.)
3. When alarm parameter PARA1 = 0x02
This means that a service board has detected a bad ready line. When this alarm occurs, the service board generally reports alarms such as T_LOSEX (the service board detects an alarm on the service bus sent by the cross-connect board) or TR_LOC (the service board detects a fault in the frame header or clock sent by the cross-connect board, a bad board, or a similar fault), and at the same time reports the cross-connect board as bad through the 0xCE4 command. On a UXCSB with an extended subrack, the bad line may instead be reported by the XCE board, which may then report a BUS_ERR alarm.
On the cross-connect board, you can use the 0xC47 command to check the cross-connect board status reported by the service boards:
:optp:9,0,77,1,c,47,0
:optp:9,0,77,1,c,47,1
4. When alarm parameter PARA1 = 0x04
This indicates that the cross-connect board has detected a type 2 BUS_ERR alarm (that is, for the bus reporting the alarm, the active board detects it as normal while the standby board detects it as abnormal).
On the cross-connect board, you can use the 0xC47 command to check the combined status of type 2 BUS_ERR detection:
:optp:9,0,77,1,c,47,2
:optp:9,0,77,1,c,47,3
The specific alerts are as follows
Note: The alarm actually reported by the cross-connect board may also be a combination of two, three, or all four of the cases above, such as PARA1=0x81, PARA1=0x03, PARA1=0x05, or PARA1=0x07. In that case, locate the fault step by step according to the meaning of each bit.
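As a minimal sketch of that bit-by-bit reading, the Python snippet below splits a PARA1 value into the four cases described above. The names PARA1_BITS and decode_para1 are illustrative only and are not part of any equipment software; the descriptions are paraphrased from cases 1 to 4.

```python
# Hypothetical lookup table: one entry per PARA1 bit described in cases 1 to 4.
PARA1_BITS = {
    0x01: "case 2: the board detects its own state as bad",
    0x02: "case 3: a service board detects a bad ready line",
    0x04: "case 4: the board detects a type 2 BUS_ERR",
    0x80: "case 1: the board was hard reset only a few minutes ago",
}


def decode_para1(value):
    """Return the description of every known bit set in PARA1, lowest bit first."""
    return [text for mask, text in sorted(PARA1_BITS.items()) if value & mask]


# The composite values mentioned in the note above.
for sample in (0x81, 0x03, 0x05, 0x07):
    print(f"PARA1=0x{sample:02X}")
    for cause in decode_para1(sample):
        print("  " + cause)
```

Each listed case then still has to be checked on its own, for example by looking for the accompanying alarms described for that case.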
5. The alarm does not clear
After the active cross-connect board is hard reset to trigger an active/standby cross-connect switchover, the HSC_UNAVAIL alarm does not clear.
The usual sequence is as follows. Before issuing the hard reset command to the cross-connect board, the host first sends the 0xC52 pre-reset command, which sets the cross-connect board bad and offline (this makes the active/standby switchover fast and reduces the impact on services). Before the board is actually reset, the cross-connect software detects the bad board state and reports HSC_UNAVAIL (PARA1=0x01). Afterwards, because of a fault in host-side processing, the alarm does not clear after the board has been reset, and no BD_STATUS alarm is reported.
In such cases, go to the site promptly to determine whether the problem lies with the cross-connect board.
Method 1:
Use the 0x211 command directly to check whether the alarm is still present on the board:
:optp:9,0,2,6,2,11,0,0
Method 2:
You can also use the alm-set-bsrep command to enable raw alarm reporting and see whether the cross-connect board reports the alarm-end event to the host; this alarm should be registered as coming from the board. If the cross-connect board does not report it, the end of the alarm is checked automatically on the host side after 3 minutes. If the cross-connect board is not continuously reporting the alarm to the host, the problem is not on the cross-connect board.
Root Cause
HSC_UNAVAIL is reported only by the standby board. The alarm parameters have the following meaning:
para[0] (PARA1 above) indicates the type of alarm:
BIT[0]: this board is in a bad state
BIT[1]: a service board detects that this board is bad
BIT[2]: this board has detected a type 2 BUS_ERR
BIT[3~6]: reserved
BIT[7]: the standby board came up from a hard reset less than 5 minutes ago
para[1] indicates the active/standby status of the unavailable board:
0: active board; 1: standby board
para[2] indicates the physical slot number of the unavailable board:
9, 10 (3500 devices)
80, 81 (1500 and 2500 devices)
para[3~4]: 0xff
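The parameter layout above can also be read programmatically. The following Python sketch is an illustration only: the names HscCause, HscUnavail, and parse_hsc_unavail are hypothetical, and it assumes the five parameters are already available as a list of integers, for example copied by hand from an alarm query.

```python
from dataclasses import dataclass
from enum import IntFlag


class HscCause(IntFlag):
    """Defined bits of para[0]; BIT[3~6] are reserved and ignored here."""
    BOARD_BAD = 0x01          # BIT[0]: this board is in a bad state
    REPORTED_BAD = 0x02       # BIT[1]: a service board detects this board as bad
    TYPE2_BUS_ERR = 0x04      # BIT[2]: this board detected a type 2 BUS_ERR
    RECENT_HARD_RESET = 0x80  # BIT[7]: recently came up from a hard reset


DEFINED_BITS = 0x87  # mask that drops the reserved bits 3 to 6

# para[1]: 0 = main (active) board, 1 = spare (standby) board
ROLES = {0: "active", 1: "standby"}

# para[2]: slot numbers listed above, mapped to the device family they imply
SLOT_TO_DEVICE = {9: "3500", 10: "3500", 80: "1500/2500", 81: "1500/2500"}


@dataclass
class HscUnavail:
    causes: HscCause
    role: str
    slot: int
    device: str


def parse_hsc_unavail(params):
    """Interpret para[0] to para[4] of one HSC_UNAVAIL alarm."""
    if len(params) != 5:
        raise ValueError("expected 5 alarm parameters, para[0] to para[4]")
    return HscUnavail(
        causes=HscCause(params[0] & DEFINED_BITS),
        role=ROLES.get(params[1], "unknown"),
        slot=params[2],
        device=SLOT_TO_DEVICE.get(params[2], "unknown"),
    )


alarm = parse_hsc_unavail([0x05, 1, 10, 0xFF, 0xFF])
print(alarm.role, alarm.slot, alarm.device)
print(HscCause.BOARD_BAD in alarm.causes, HscCause.TYPE2_BUS_ERR in alarm.causes)
```

Masking with DEFINED_BITS keeps the reserved bits out of the result, so a value with a reserved bit set is not mistaken for a known cause.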
Recommendations and Summary
The alarm is summarized as follows:
1. Within 5 minutes after the cross-connect board comes up from a hard reset, if there are no other accompanying faults, only the HSC_UNAVAIL alarm is reported;
2. The cross-connect board software detects a hardware failure on its own board and sets the board bad. In addition to HSC_UNAVAIL, a HARD_BAD alarm is reported, together with the other alarms that indicate the bad hardware, such as CHIP_FAIL or POWER_ABNORMAL;
3. The FPGA of the cross-connect board detects a hardware failure. In addition to HSC_UNAVAIL, a HARD_BAD alarm is reported;
4. The cross-connect software detects a type 2 BUS_ERR. In addition to HSC_UNAVAIL, a BUS_ERR alarm is reported;
5. A service board detects that the clock, frame header, or signal sent by the cross-connect board is faulty and reports the cross-connect board as bad (the cross-connect board itself has no fault). In this case the cross-connect board reports only the HSC_UNAVAIL alarm, and the service board reports the TR_LOC or T_LOSEX alarm.
The HSC_UNAVAIL alarm is reported only by the standby cross-connect board. The active cross-connect board does not process it.
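The summary above can be turned into a rough first-pass triage aid. The Python sketch below is only an illustration under stated assumptions: the function name likely_causes is hypothetical, the alarm names are passed in as plain strings collected by hand, and the rule that separates item 2 from item 3 (whether CHIP_FAIL or POWER_ABNORMAL accompanies HARD_BAD) is a simplification of the summary, not a documented rule.

```python
def likely_causes(cross_alarms, service_alarms=()):
    """Map the alarms seen alongside HSC_UNAVAIL to the summary items above."""
    cross = set(cross_alarms)
    service = set(service_alarms)
    if "HSC_UNAVAIL" not in cross:
        return []

    hints = []
    if "HARD_BAD" in cross:
        # Assumption: extra hardware alarms point to item 2, HARD_BAD alone to item 3.
        if cross & {"CHIP_FAIL", "POWER_ABNORMAL"}:
            hints.append("item 2: software detected a hardware failure on the board")
        else:
            hints.append("item 3: hardware failure, possibly detected by the FPGA")
    if "BUS_ERR" in cross:
        hints.append("item 4: type 2 BUS_ERR detected by the cross-connect software")
    if service & {"TR_LOC", "T_LOSEX"}:
        hints.append("item 5: a service board reports the cross-connect board as bad")
    if not hints:
        # No accompanying alarm: consistent with the window after a hard reset.
        hints.append("item 1: possibly within 5 minutes of a hard reset")
    return hints


for hint in likely_causes({"HSC_UNAVAIL", "HARD_BAD", "CHIP_FAIL"}, {"T_LOSEX"}):
    print(hint)
```

The result is only a starting point; the alarm parameter and the alarm history still need to be checked as described in the processing steps.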

