1 Problem Description
Last week, the CDT trace failed when it was run in LMT because of flow control.
After analysis, we found that the SPU CPU was overloaded, and then the CPU overload
From the daily performance data analysis, SPU CPU usage is high, above 60% from 9:00
[Figure: VS.MeanCPUUtil.SPU(%) by hour of day, 0:00 to 16:00 (y-axis 0 to 80%)]
The following is the average CPU usage across all subsystems; it is around 65% from 10:00 to 18:00:
[Figure: Average of MeanCPUUtil.SPU(%) by hour of day, 0:00 to 16:00 (y-axis 0 to 70%)]
1.2 Massive SHO leads to CPU overload
The main factors affecting SPU CPU usage are the following: RRC Connect sub per BH,
From the following statistics, SHO accounts for 40% of the CPU usage.
The distribution of SHO attempts over one day is almost the same as the CPU usage distribution; both of
[Figure: Average of MeanCPUUtil.SPU(%) (0 to 70%) and VS.SHO.Att.RNC (0 to 800,000) by hour of day]
From the historical performance data analysis, the number of SHO attempts has increased since 09/12/2009, which is exactly the date when RNC201 was cut over from KD3MSC1 to AB3MSC1.
[Figure: VS.SHO.Att.RNC from 2009-12-7 11:30 to 2010-1-28 16:00 (0 to 800,000)]
Because the high signaling load leads to CPU overload, this issue at the same time causes the
2010-02-17
HUAWEI Confidential
Issue Of CPU Overload For RNC201
From the following graphs, we can see that the paging success rate and the RRC SSR both declined at
[Figure: % Paging Succ Rate, weekly from 11-22 to 2-7 (0% to 100%); series: ABJRNC201, IBDRNC601, KANRNC501, KADRNC801, Total Average]
[Figure: % RRC SSR (service), weekly from 11-22 to 2-7 (50% to 100%)]
According to the performance data analysis, the main reason for the high SPU CPU utilization is
[Figures: VS.SHO.Att.RNC over time, December 2009 to January 2010 (x-axes: Time(As hour) and Time)]
09/12/2009, and it has stayed at a high level in the busy hour ever since. From the following daily
The difference of SHO before and after MSC cutover
The following analysis is based on the performance data of 7 Dec and 10 Dec. It shows the
differences in the SHO counter statistics for CS, PS and signal-only SHO. The blue line
indicates the counters before the MSC cutover, and the red line the counters after the MSC cutover.
CS: VS.SHO.AMR.AttOut
PS: VS.SHO.PS64.AttOut, VS.SHO.PS128.AttOut, VS.SHO.PS384.AttOut, VS.HSDPA.SHO.CellChg.AttOut
Signal only: VS.SHO.SigOnly.AttOut
From the above statistics, we conclude that the CS and PS domains changed little before and
after the MSC cutover, but signal-only SHO attempts increased sharply after the MSC cutover,
reaching as many as 230,000, and this is consistent
Signal-only SHO occurs in the period after RRC connection setup and before the RAB connection is set up.
Usually, RAB assignment happens immediately after RRC connection, as the following shows:
So there should be few signal-only SHOs during this period. But why do so many signal-only
The IOS trace analysis shows that many scenarios like the following exist in the system:
According to the above graph, the UE requests service from the CN at 10:25:11.55, but it does not
receive the RAB assignment from the CN until 10:25:17.50, so this process takes almost 6 s. Because
of this long wait, during which the radio environment changes, signal-only SHO
occurs.
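The mechanism described above can be illustrated with a minimal Python sketch: given one UE's event timeline, it picks out the SHOs that fall between RRC connection and RAB assignment, which is the window in which the report says signal-only SHO occurs. The `Event` class, the event labels (`rrc_connected`, `rab_assigned`, `sho`) and all times are hypothetical and are not taken from an IOS trace.

```python
from dataclasses import dataclass


@dataclass
class Event:
    t: float   # seconds from an arbitrary origin (hypothetical)
    kind: str  # hypothetical labels: "rrc_connected", "rab_assigned", "sho"


def signal_only_shos(events):
    """Return the SHO events that occur after RRC connection and before RAB assignment.

    Assumes the list holds exactly one RRC connection and one RAB assignment.
    """
    start = next(e.t for e in events if e.kind == "rrc_connected")
    end = next(e.t for e in events if e.kind == "rab_assigned")
    return [e for e in events if e.kind == "sho" and start < e.t < end]


# Hypothetical timeline with a ~6 s wait for RAB assignment.
timeline = [
    Event(0.0, "rrc_connected"),
    Event(1.5, "sho"),
    Event(4.0, "sho"),
    Event(6.0, "rab_assigned"),
    Event(7.5, "sho"),  # after RAB assignment, so not signal-only
]
print(len(signal_only_shos(timeline)))  # 2
```

Lengthening the gap between RRC connection and RAB assignment, as happened after the MSC cutover, widens the window in which SHOs are counted as signal-only.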
configuration) delays the RAB assignment from the CN, which then leads to massive signal-only SHO.
2.3 Predicted CPU usage after the signal-only SHO issue is resolved
Here is a prediction of the CPU usage once the number of signal-only SHOs returns to the level
before the MSC cutover. If the number of signal-only SHOs drops back to that level
(that is, 230,000 fewer attempts), we can calculate the CPU usage using the CPU load
calculation model:
In the present network, the statistics period for performance data is 30 minutes (1,800 s), and there are 8
subsystems. Taking 230,000 attempts for the calculation, we get the rate per second per subsystem in the BH:
230000/1800/8 = 15.972.
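The busy-hour rate above can be written as a worked formula. The symbols are introduced here only for illustration: N is the number of removed signal-only SHO attempts per statistics period, T the period length in seconds, and n the number of SPU subsystems; as in the calculation above, the load is assumed to be spread evenly over the subsystems.

```latex
r = \frac{N}{T \cdot n}
  = \frac{230\,000}{1800 \times 8}
  = \frac{230\,000}{14\,400}
  \approx 15.972 \ \text{attempts per second per subsystem}
```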
According to lab tests, the main CPU consumers are the following:
Control plane traffic parameter    Unit    CPU usage per procedure
This means the present CPU usage can be reduced by 7.67 percentage points, which brings it back to the
level before the MSC cutover. Because the average busy-hour CPU usage is now 64.87%, subtracting
7.67 points gives 57.20%, which is below 60% and therefore avoids the CPU overload problem.
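The prediction follows a linear load model: the CPU saved equals the removed procedure rate multiplied by a per-procedure CPU cost. The sketch below is an illustration under stated assumptions, not the lab's method. The lab cost per SHO is not reproduced in the table above, so `implied_cost` is back-calculated from the report's own 7.67-point figure. The function names are hypothetical.

```python
THRESHOLD_PCT = 60.0  # overload threshold used in this report


def cpu_saving(rate_per_s, cost_per_unit_rate):
    """Linear load model: percentage points saved = procedure rate x CPU cost per unit rate."""
    return rate_per_s * cost_per_unit_rate


def predicted_cpu(current_pct, saving_pct):
    """Busy-hour CPU usage after removing the saved load."""
    return current_pct - saving_pct


rate = 230_000 / 1800 / 8      # removed signal-only SHOs per second per subsystem
implied_cost = 7.67 / rate     # back-calculated from the report, NOT the lab value
after = predicted_cpu(64.87, cpu_saving(rate, implied_cost))
print(f"{rate:.3f} {implied_cost:.2f} {after:.2f} {after < THRESHOLD_PCT}")
# 15.972 0.48 57.20 True
```

With the real per-SHO cost from the lab table, `cpu_saving` would produce the reduction directly instead of merely reproducing 7.67 points.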
The positions of the UE service request to the CN (RRC_UL_DIR_TRANS) and the RAB assignment from
In this process, the RNC only relays the messages, while the CN plays the key role in processing
them.
Here is a comparison of the IOS traces for RNC201 before and after the MSC cutover, with RNC601 for reference:
FROM-UE: time of RRC_UL_DIR_TRANSF from the UE; FROM-CN: time of RANAP_RAB_ASSIGNMENT_REQ from the CN.

RNC / period              URNTI      FROM-UE      FROM-CN      Duration (s)  Average (s)
RNC201 before MSC change  210863312  16:30:23.24  16:30:23.93  0.69          1.23
                          210867568  16:30:21.47  16:30:23.29  1.82
                          210872472  16:29:37.70  16:29:38.54  0.84
                          210872832  16:30:24.22  16:30:24.95  0.73
                          210875112  16:27:38.53  16:27:41.14  2.51
                          210876360  16:27:52.05  16:27:52.43  0.38
                          210879352  16:30:55.53  16:30:55.90  0.37
                          210880216  16:30:15.46  16:30:17.92  2.46
RNC201 after MSC change   210931288  10:25:11.55  10:25:17.50  5.95          4.12
                          210933344  10:21:44.49  10:21:45.47  1.00
                          210933648  10:57:20.16  10:57:24.51  4.35
                          210934288  10:57:38.90  10:57:40.23  1.33
                          210935168  10:19:06.20  10:19:12.29  6.09
                          210936784  10:22:05.43  10:22:09.50  4.13
                          210938592  10:27:51.41  10:27:55.90  4.49
                          210943248  10:46:00.13  10:46:05.63  5.60
IBADAN RNC601             630196824  11:01:34.86  11:01:35.35  0.49          0.43
                          630201048  11:20:31.52  11:20:31.91  0.40
                          630203544  10:47:26.50  10:47:26.92  0.42
                          630208312  11:06:58.38  11:06:58.78  0.40
                          630211048  11:01:40.46  11:01:40.89  0.43
                          630211776  10:40:48.70  10:40:49.08  0.38
                          630325360  10:47:49.04  10:47:49.42  0.38
                          630325704  10:50:44.58  10:50:45.00  0.42
                          630326120  11:01:49.70  11:01:50.12  0.58
As the table shows, after RNC201 was cut over to the new MSC, the average duration between the
two signaling messages is 4.12 s, and the longest is 6.09 s, whereas before the MSC cutover the average duration
was 1.23 s. For comparison, the average duration for RNC601, which belongs to another MSC, is only 0.43 s.
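To double-check the averages quoted in the paragraph above, the following Python sketch recomputes them with exact decimal arithmetic, both from the listed durations and from the timestamp differences. `TRACE`, `to_seconds`, `mean` and `summarize` are illustrative names, not part of any Huawei tool, and the timestamps are assumed to be HH:MM:SS with centisecond fractions.

```python
from decimal import Decimal, ROUND_HALF_UP

# (URNTI, FROM-UE time, FROM-CN time, listed duration in s), copied from the trace table.
TRACE = {
    "RNC201 before MSC change": [
        ("210863312", "16:30:23.24", "16:30:23.93", "0.69"),
        ("210867568", "16:30:21.47", "16:30:23.29", "1.82"),
        ("210872472", "16:29:37.70", "16:29:38.54", "0.84"),
        ("210872832", "16:30:24.22", "16:30:24.95", "0.73"),
        ("210875112", "16:27:38.53", "16:27:41.14", "2.51"),
        ("210876360", "16:27:52.05", "16:27:52.43", "0.38"),
        ("210879352", "16:30:55.53", "16:30:55.90", "0.37"),
        ("210880216", "16:30:15.46", "16:30:17.92", "2.46"),
    ],
    "RNC201 after MSC change": [
        ("210931288", "10:25:11.55", "10:25:17.50", "5.95"),
        ("210933344", "10:21:44.49", "10:21:45.47", "1.00"),
        ("210933648", "10:57:20.16", "10:57:24.51", "4.35"),
        ("210934288", "10:57:38.90", "10:57:40.23", "1.33"),
        ("210935168", "10:19:06.20", "10:19:12.29", "6.09"),
        ("210936784", "10:22:05.43", "10:22:09.50", "4.13"),
        ("210938592", "10:27:51.41", "10:27:55.90", "4.49"),
        ("210943248", "10:46:00.13", "10:46:05.63", "5.60"),
    ],
    "IBADAN RNC601": [
        ("630196824", "11:01:34.86", "11:01:35.35", "0.49"),
        ("630201048", "11:20:31.52", "11:20:31.91", "0.40"),
        ("630203544", "10:47:26.50", "10:47:26.92", "0.42"),
        ("630208312", "11:06:58.38", "11:06:58.78", "0.40"),
        ("630211048", "11:01:40.46", "11:01:40.89", "0.43"),
        ("630211776", "10:40:48.70", "10:40:49.08", "0.38"),
        ("630325360", "10:47:49.04", "10:47:49.42", "0.38"),
        ("630325704", "10:50:44.58", "10:50:45.00", "0.42"),
        ("630326120", "11:01:49.70", "11:01:50.12", "0.58"),
    ],
}

CENT = Decimal("0.01")


def to_seconds(ts):
    """'HH:MM:SS.cc' -> seconds since midnight as an exact Decimal."""
    h, m, s = ts.split(":")
    return Decimal(h) * 3600 + Decimal(m) * 60 + Decimal(s)


def mean(values):
    """Arithmetic mean rounded half-up to 0.01 s, like the table's Average column."""
    values = list(values)
    return (sum(values) / len(values)).quantize(CENT, rounding=ROUND_HALF_UP)


def summarize(rows):
    """Return (mean of listed durations, mean of timestamp differences, URNTIs that disagree)."""
    listed = [Decimal(d) for _, _, _, d in rows]
    recomputed = [to_seconds(cn) - to_seconds(ue) for _, ue, cn, _ in rows]
    mismatches = [u for (u, _, _, _), l, r in zip(rows, listed, recomputed) if l != r]
    return mean(listed), mean(recomputed), mismatches


for group, rows in TRACE.items():
    print(group, *summarize(rows))
```

Averaging the listed durations reproduces 1.23 s, 4.12 s and 0.43 s. For six rows, however, the listed duration does not equal the timestamp difference (for example, URNTI 210875112: 41.14 - 38.53 = 2.61 s, while 2.51 s is listed). The averages recomputed from the timestamps (about 1.24 s, 4.10 s and 0.41 s) support the same conclusion.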
3 Conclusion
1. SPU CPU usage is high because of signaling overload, and the high usage exceeds the threshold of
2. The main factor behind the high SPU CPU usage is the number of SHOs.
5. The massive signal-only SHO is caused by the CN delaying the RAB assignment.
6. The predicted calculation indicates that if signal-only SHO returns to the level before
4 Suggestion
Based on the above analysis, the long duration comes from the CN, so further investigation
1. After the MSC cutover, were any parameters adjusted, especially the parameters related to
2. Compare KD3MSC1 and AB3MSC1: is there any difference in function, performance or
capacity?