
Issue Of CPU Overload For RNC201


Huawei 3G RNO team 2010-2-16

1 Problem Description

1.1 SPU CPU Usage is high in busy hour

Last week a CDT trace started from the LMT failed because of flow control. Analysis showed that the SPU CPU was overloaded and that the overload had triggered the flow control alarm on this RNC.

The daily performance data shows that SPU CPU usage is high, staying above 60% from 9:00 to 18:00; subsystem 5:0 even stays above 70% throughout:

[Figure: VS.MeanCPUUtil.SPU(%) by hour for subsystems WSPU:1:0, WSPU:1:1, WSPU:3:0, WSPU:3:1, WSPU:4:0, WSPU:4:1, WSPU:5:0 and WSPU:5:1; y-axis 0 to 80]

The following figure shows the average CPU usage across all subsystems; it is around 65% from 10:00 to 18:00:

2010-02-17 HUAWEI Confidential Page1, Total14



[Figure: Average of MeanCPUUtil.SPU(%) by hour; y-axis 0 to 70]

1.2 Mass SHO leads to CPU overload

The main factors that affect SPU CPU usage are the following: RRC connections per subscriber in the busy hour (BH), CS and PS calls per subscriber in the BH, and the number of handovers in the BH.

According to the following statistics, SHO accounts for 40% of the CPU usage.

[Figure: CPU usage per procedure per sub (pie chart). Categories: RRC Connect sub per BH; CS voice call per CS voice sub per BH; PS call per PS sub per BH; Handover times per call (Intra RNC soft&softer handover). Shares shown: 26%, 40%, 15%, 19%]

The distribution of SHO attempts over one day is almost the same as the distribution of CPU usage; both follow the same trend:


[Figure: Average of MeanCPUUtil.SPU(%) (y-axis 0 to 70) and VS.SHO.Att.RNC (y-axis 0 to 800000) by hour]

1.3 Tracing back in time to locate the issue

The historical performance data shows that the number of SHO attempts increased from 09/12/2009, which is exactly the date when RNC201 was cut over from KD3MSC1 to AB3MSC1.

[Figure: VS.SHO.Att.RNC over time, 2009-12-7 11:30 to 2010-1-28 16:00; y-axis 0 to 800000]

1.4 Impact on the KPIs of RRC success rate and paging success rate

The high signaling load causes the CPU overload, and the overload in turn triggers flow control on both paging and RRC connection.

The following graphs show that both the paging success rate and the RRC SSR have declined in busy hours since 09/12/2009.

[Figure: % Paging Succ Rate, weekly from 11-22 to 2-7, for ABJRNC201, IBDRNC601, KANRNC501, KADRNC801 and Total Average; y-axis 0% to 100%]

[Figure: % RRC SSR (service), weekly from 11-22 to 2-7, for ABJRNC201, IBDRNC601, KANRNC501, KADRNC801 and Total Average; y-axis 50% to 100%]

2 Further Analysis of the Issue

According to the performance data analysis, the main reason for the high SPU CPU utilization is the excessive number of SHOs since the MSC cutover on 09/12/2009.



2.1 The difference in SHO before and after MSC cutover

[Figure: VS.SHO.Att.RNC over time, 2009-12-7 11:30 to 2010-1-28 16:00; y-axis 0 to 800000]

As the figure above shows, the number of SHO attempts increased suddenly on the morning of 09/12/2009 and has stayed at a high level in busy hours ever since. The following statistics make this even more obvious.

[Figure: VS.SHO.Att.RNC by time (as hour), 2009-12-8 1:00 to 2010-1-24 17:00; y-axis 0 to 1600000]


2.2 Further analysis for number of SHO Att

The following analysis is based on the performance data of 7th Dec and 10th Dec. It shows how the SHO counters for CS, PS and signal only differ between the two days. The blue line shows the counters before the MSC cutover, and the red line shows the counters after the MSC cutover.

CS:

[Figure: VS.SHO.AMR.AttOut]

PS:

[Figure: VS.SHO.PS64.AttOut]

[Figure: VS.SHO.PS128.AttOut]

[Figure: VS.SHO.PS384.AttOut]

[Figure: VS.HSDPA.SHO.CellChg.AttOut]

Signal only period:

[Figure: VS.SHO.SigOnly.AttOut]

From the above statistics, one conclusion is that the CS domain and the PS domain changed little before and after the MSC cutover, but signal only SHO attempts increased sharply after the cutover, reaching as many as 230,000. This is consistent with the increase in total SHO attempts after the MSC cutover.

Signal only SHO happens in the period after the RRC connection is set up and before the RAB is set up. Usually, RAB assignment follows immediately after RRC connection setup; as the following trace shows, it takes no more than 200 ms.

So there should be very few signal only SHOs during this period. Why, then, did so many signal only SHOs appear after the MSC cutover?

Analysis of the IOS trace shows that many scenarios like the following exist in the system:


According to the above graph, the UE requests service from the CN at 10:25:11(55), but it does not receive the RAB assignment from the CN until 10:25:17(50). The process takes almost 6 s. Because the wait is so long, the radio environment changes in the meantime and a signal only SHO happens.
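The delay quoted above can be checked with a few lines of Python. This is a minimal sketch, assuming the bracketed digits in the trace times are hundredths of a second (as in the comparison table in section 2.4); the function name `delay_seconds` is illustrative and not part of any Huawei tool.

```python
from datetime import datetime


def delay_seconds(start: str, end: str) -> float:
    """Return the seconds between two same-day timestamps written as HH:MM:SS.cc."""
    fmt = "%H:%M:%S.%f"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    return (t1 - t0).total_seconds()


# Timestamps from the IOS trace example above.
# Assumption: "(55)" and "(50)" are hundredths of a second.
ue_request = "10:25:11.55"      # UE requests service from the CN
rab_assignment = "10:25:17.50"  # RAB assignment arrives from the CN

delay = delay_seconds(ue_request, rab_assignment)
print(f"RAB assignment delay: {delay:.2f} s")
```

The result is just under 6 s, far longer than the 200 ms or less seen in the normal case.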

More examples follow:


Conclusion: After the RNC was cut over to the new MGW, the new CN (with its new parameter configuration) delays the RAB assignment, which leads to a large number of signal only SHOs.

2.3 Prediction of the CPU usage after the signal only SHO issue is resolved

Here is a prediction of the CPU usage once the number of signal only SHOs returns to its level before the MSC cutover. If the number of signal only SHOs falls back to that level (that is, 230,000 are removed), the CPU usage can be calculated according to the CPU load calculation method:

In the present network, the statistic period for performance data is 30 minutes (1800 s), and there are 8 subsystems. Taking 230,000 for the calculation gives: Times per BH: 230000/1800/8 = 15.972, that is, about 16 signal only SHOs per second per subsystem.

According to lab tests, the main consumers of CPU are the following:

Control plane traffic parameter                                  Unit         CPU usage per procedure
RRC Connect sub per BH                                           times        0.90%
CS voice call per CS voice sub per BH                            times        1.97%
PS call per PS sub per BH                                        times        3.20%
Handover times per call (Inter/Intra RNC soft&softer handover)   times/call   0.48%

So 230,000 signal only SHOs consume CPU: 15.972*0.48% = 7.67%

That means the present CPU usage can be reduced by 7.67 percentage points, which brings it back to the level before the MSC cutover. The average busy hour CPU usage is now 64.87%; subtracting 7.67% gives 57.20%, which is under 60% and avoids the CPU overload problem.
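The prediction above can be reproduced with a short Python sketch. All input values come from the text and the lab table; the function and constant names are illustrative only, and the linear load model (rate times per-procedure cost) is the one the report itself applies.

```python
# Inputs taken from the text and the lab table above.
SIGNAL_ONLY_SHO = 230_000     # extra signal only SHO attempts per statistic period
PERIOD_SECONDS = 30 * 60      # 30-minute statistic period
SUBSYSTEMS = 8                # number of SPU subsystems
CPU_PER_HANDOVER_PCT = 0.48   # lab figure: CPU usage (%) per handover procedure
BUSY_HOUR_CPU_PCT = 64.87     # present average busy hour CPU usage (%)


def sho_rate_per_subsystem(attempts: int, period_s: int, subsystems: int) -> float:
    """SHO attempts per second handled by one subsystem."""
    return attempts / period_s / subsystems


def cpu_cost_pct(rate_per_s: float, cost_pct: float) -> float:
    """CPU usage (%) consumed at the given procedure rate (linear model)."""
    return rate_per_s * cost_pct


rate = sho_rate_per_subsystem(SIGNAL_ONLY_SHO, PERIOD_SECONDS, SUBSYSTEMS)
saving = cpu_cost_pct(rate, CPU_PER_HANDOVER_PCT)
predicted = BUSY_HOUR_CPU_PCT - saving

print(f"Rate per subsystem: {rate:.3f} per second")
print(f"CPU consumed by signal only SHO: {saving:.2f}%")
print(f"Predicted busy hour CPU usage: {predicted:.2f}%")
```

Changing `SIGNAL_ONLY_SHO` shows how sensitive the busy hour CPU usage is to the number of signal only SHOs under this linear assumption.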

2.4 Comparison analysis

The positions of the UE service request to the CN (RRC_UL_DIR_TRANS) and the RAB assignment from the CN (RANAP_RAB_ASSIGNMENT_REQ) in the signaling flow are as follows, marked by the blue line.

In this process, the RNC only relays the messages, while the CN plays the key role in processing them.

The following compares IOS traces for RNC201 before and after the MSC cutover, and IOS traces between RNC201 and RNC601:


Group                      URNTI       Time FROM-UE          Time FROM-CN                 Duration (s)   Average (s)
                                       (RRC_UL_DIR_TRANSF)   (RANAP_RAB_ASSIGNMENT_REQ)
RNC201 Before MSC Change   210863312   16:30:23.24           16:30:23.93                  0.69           1.23
                           210867568   16:30:21.47           16:30:23.29                  1.82
                           210872472   16:29:37.70           16:29:38.54                  0.84
                           210872832   16:30:24.22           16:30:24.95                  0.73
                           210875112   16:27:38.53           16:27:41.14                  2.51
                           210876360   16:27:52.05           16:27:52.43                  0.38
                           210879352   16:30:55.53           16:30:55.90                  0.37
                           210880216   16:30:15.46           16:30:17.92                  2.46
RNC201 After MSC Change    210931288   10.25.11.55           10.25.17.50                  5.95           4.12
                           210933344   10.21.44.49           10.21.45.47                  1.00
                           210933648   10.57.20.16           10.57.24.51                  4.35
                           210934288   10.57.38.90           10.57.40.23                  1.33
                           210935168   10.19.06.20           10.19.12.29                  6.09
                           210936784   10.22.05.43           10.22.09.50                  4.13
                           210938592   10.27.51.41           10.27.55.90                  4.49
                           210943248   10.46.00.13           10.46.05.63                  5.60
IBADAN RNC601              630196824   11.01.34.86           11.01.35.35                  0.49           0.43
                           630201048   11.20.31.52           11.20.31.91                  0.40
                           630203544   10.47.26.50           10.47.26.92                  0.42
                           630208312   11.06.58.38           11.06.58.78                  0.40
                           630211048   11.01.40.46           11.01.40.89                  0.43
                           630211776   10.40.48.70           10.40.49.08                  0.38
                           630325360   10.47.49.04           10.47.49.42                  0.38
                           630325704   10.50.44.58           10.50.45.00                  0.42
                           630326120   11.01.49.70           11.01.50.12                  0.58

The table above makes it clear that after RNC201 was cut over to the new MSC, the average duration between the two signaling messages is 4.12 s, and the longest is 6.09 s. Before the MSC cutover, the average duration was 1.23 s. For comparison, the average duration for RNC601, which belongs to another MSC, is only 0.43 s.
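The group averages can be recomputed from the Duration column of the table. The sketch below is illustrative: it copies the durations exactly as listed (it does not recompute them from the timestamps), uses `Decimal` so that rounding to two places behaves like hand arithmetic, and the dictionary keys are labels chosen here for readability.

```python
from decimal import Decimal, ROUND_HALF_UP

# Durations in seconds, copied as listed in the Duration column above.
DURATIONS = {
    "RNC201 before MSC change": ["0.69", "1.82", "0.84", "0.73", "2.51", "0.38", "0.37", "2.46"],
    "RNC201 after MSC change": ["5.95", "1.00", "4.35", "1.33", "6.09", "4.13", "4.49", "5.60"],
    "IBADAN RNC601": ["0.49", "0.40", "0.42", "0.40", "0.43", "0.38", "0.38", "0.42", "0.58"],
}


def average(values):
    """Mean of decimal strings, rounded half-up to two places."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


averages = {group: average(values) for group, values in DURATIONS.items()}
longest = {group: max(Decimal(v) for v in values) for group, values in DURATIONS.items()}

for group in DURATIONS:
    print(f"{group}: average {averages[group]} s, longest {longest[group]} s")
```

The recomputed averages match the Average column: 1.23 s before the cutover, 4.12 s after it, and 0.43 s for RNC601.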

3 Conclusion

The analysis leads to the following conclusions:

1. SPU CPU usage is high because of signaling overload; the usage exceeds the flow control threshold, so flow control is triggered.

2. The main factor behind the high SPU CPU usage is the number of SHOs.

3. The large number of SHOs has appeared since the MSC cutover.

4. The increase in SHOs comes mainly from signal only SHOs.

5. The large number of signal only SHOs is caused by the CN delaying the RAB assignment.

6. The predictive calculation indicates that if signal only SHOs return to their level before the MSC cutover, SPU CPU usage will decline by about 7 percentage points.



4 Suggestion

Based on the above analysis, the long delay comes from the CN, so further investigation should be carried out on the CN side.

The following points should be checked on the CN side:

1. After the MSC cutover, were any parameters adjusted, especially the parameters that control the timing of RAB assignment?

2. Comparing KD3MSC1 and AB3MSC1, is there any difference in function, performance or capacity?

3. Is there a large amount of equipment under AB3MSC1?

4. Was there any adjustment at the HLR after the MSC cutover?
