
Infrastructure Monitoring: Solving & Predicting Flapping Events

11/21/16

Prodapt Confidential @ 2015

Slide 1

Presentation Structure


Infrastructure Monitoring & Flapping Events
Top Flapping Events
Top 5 Flapping Events by Occurrence
Root Causes


Slide 2

Infrastructure Monitoring & Flapping Events

Approximately 50% of the tickets raised during monitoring are flap events. Event flaps, or flapping events, occur when a service or host changes state too frequently. This produces a huge number of notifications, which can indicate either transient or real network problems.
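The definition above (a service or host changing state too frequently) can be sketched as a count of state transitions over a recent window of checks. This is a minimal illustration, not the method used in the deck; the function name, the window size, and the `max_changes` limit are hypothetical values chosen for the example.

```python
def is_flapping(states, window=10, max_changes=4):
    """Return True if the most recent `window` states contain more than
    `max_changes` transitions (assumed limits, tune per environment)."""
    recent = list(states)[-window:]
    # Count every position where the state differs from the previous one.
    changes = sum(1 for prev, cur in zip(recent, recent[1:]) if prev != cur)
    return changes > max_changes


# Hypothetical check histories, oldest first.
stable_host = ["UP"] * 10
noisy_host = ["UP", "DOWN"] * 5
single_outage = ["UP"] * 5 + ["DOWN"] * 5
```

A host that goes down once and stays down has a single transition, so it is reported as a real outage rather than a flap; only rapid alternation crosses the limit.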

Flapping events are a major source of headaches for data centers, as they cause:
Significant resource wastage
Decreased performance
Diminished customer experience


Slide 3

Top Flapping Events


Flapping Events that Affect Visibility of Critical Events

Flapping Events that Affect Productivity

Device Failed Availability Check: Component device xxxx is not available: Event triggers when a component in a device is unavailable. This event contributes the most to flapping.

Site Down: Event triggers when a site is down for a specific threshold period.

Port Not Responding: Event triggers when a specific port is down or not responding.

Network Latency: Event triggers when a packet sent from one hop to another takes longer than the threshold time.

Disk Space: Event triggers when the space occupied on a drive exceeds the threshold.

Site Down: Event triggers when a site is down for a specific threshold period.

CPU: Event triggers when the CPU usage of a device exceeds the set threshold.

App Snippet: These events relate to the dynamic applications in ScienceLogic, which govern what gets monitored on a device.
SNMP Down: Event triggers when the

Physical Memory: Event triggers when the RAM usage of a device exceeds the set threshold.
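Several of the events above (Disk Space, CPU, Physical Memory) fire when a metric crosses a threshold, so a metric hovering near that threshold can flip the event on and off repeatedly. One common way to damp this, which the deck itself does not describe, is hysteresis: raise at one level and clear only at a lower one. The sketch below assumes hypothetical percentage thresholds and state names.

```python
def threshold_states(samples, raise_at=90.0, clear_at=80.0):
    """Map metric samples to OK/ALERT states using two thresholds.

    The alert is raised above `raise_at` and cleared only below `clear_at`
    (both values are assumptions for illustration).
    """
    state = "OK"
    states = []
    for value in samples:
        if state == "OK" and value > raise_at:
            state = "ALERT"
        elif state == "ALERT" and value < clear_at:
            state = "OK"
        states.append(state)
    return states


# Hypothetical CPU usage samples hovering around 90%.
cpu_samples = [89, 91, 89, 91, 79]
cpu_states = threshold_states(cpu_samples)
```

With a single 90% threshold these samples would change state four times; with the gap between the two thresholds they change state twice.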

Slide 4

Top 5 Flapping Events by Occurrence


[Chart: Occurrence of Top 5 Flapping Events (y-axis: 0% to 30%). Labeled category: Device Failed Availability Check: Component device xxxx is not available]

The above data reflects flapping event occurrences over a period of one month while monitoring infrastructure for a tier-1 managed hosting provider.
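A ranking like the one charted above can be derived from a month of event records by counting each event type and expressing it as a share of the total. The sketch below is only an illustration: the function name and the sample event list are hypothetical and are not the provider's data.

```python
from collections import Counter


def occurrence_share(event_types, top_n=5):
    """Return the `top_n` most frequent event types with their share
    of all occurrences, as (name, percent) pairs."""
    counts = Counter(event_types)
    total = sum(counts.values())
    return [(name, 100.0 * n / total) for name, n in counts.most_common(top_n)]


# Hypothetical flapping event log (made-up counts, for illustration only).
sample_events = (
    ["Device Failed Availability Check"] * 3
    + ["Network Latency"] * 1
)
shares = occurrence_share(sample_events)
```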

Slide 5

Root Causes
Component Device xxxx Is Not Available
Sometimes events also trigger for components or devices that do not need to be monitored.

Network Latency
When network latency is present, dependent systems can sometimes raise false events because they are waiting for inputs from other systems.

Site Down
These events trigger when the timeout threshold is set very low and a response takes slightly longer than that limit.
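The Site Down root cause above (a timeout threshold set too low, so a slightly slow response raises an event) suggests confirming a failure over several checks before alerting. The sketch below shows one such approach; it is an assumption, not a remedy stated in the deck, and the function name and the `required_failures` value are hypothetical.

```python
def should_raise_site_down(check_results, required_failures=3):
    """Raise Site Down only after `required_failures` consecutive failed checks.

    `check_results` is a list of booleans, oldest first, where True means
    the check responded within the timeout.
    """
    if len(check_results) < required_failures:
        return False
    # Every one of the most recent checks must have failed.
    return not any(check_results[-required_failures:])


# Hypothetical check histories.
one_slow_response = [True, True, False, True]
sustained_outage = [True, False, False, False]
```

A single slow response no longer produces an event, at the cost of detecting a real outage a few check intervals later.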

App Snippet Down
These events trigger when a device goes down or the network experiences any issues.

SNMP Port Not Responding
When a device goes down, alerts are also triggered for its dependent devices; those alerts are false events.
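The dependency problem described above (one device going down triggers alerts for everything behind it) is often handled by suppressing alerts whose upstream device is already down. The sketch below assumes a simple child-to-parent map; the function name, the topology, and the device names are hypothetical, and the deck does not specify how dependencies are modeled.

```python
def suppress_dependent_alerts(alerts, parent_of):
    """Keep only alerts for devices that have no down device upstream.

    `alerts` is an iterable of device names currently reported down.
    `parent_of` maps each device to the device it depends on.
    """
    down = set(alerts)

    def has_down_ancestor(device):
        seen = set()  # guards against accidental cycles in the map
        parent = parent_of.get(device)
        while parent is not None and parent not in seen:
            if parent in down:
                return True
            seen.add(parent)
            parent = parent_of.get(parent)
        return False

    return sorted(d for d in down if not has_down_ancestor(d))


# Hypothetical topology: servers sit behind a switch, which sits behind a router.
topology = {"switch-1": "router-1", "server-1": "switch-1", "server-2": "switch-1"}
root_causes = suppress_dependent_alerts({"router-1", "switch-1", "server-1"}, topology)
```

Only the router alert survives here, so a single ticket is raised for the device that actually failed.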


Slide 6

Thank You


Slide 7
