
Distributed Fault Management approach for Next Generation Networks

CHAPTER 1

INTRODUCTION

Most architectures of currently deployed network management systems (NMS) for
telecommunication networks can be characterized as centralized and hierarchical.
While marking a great achievement in making operation more efficient, such NMS
solutions require powerful machines because of the complex logic and the large amount
of management information to be processed. Furthermore, they involve costly
redundancy mechanisms in order to avoid single points of failure. The hierarchy
additionally introduces different levels of abstraction, from element management of
individual network elements (NEs) up to business management at the top of the
Telecommunication Management Network (TMN) pyramid. Management information
flow in both directions, up and down this static hierarchy, is cascaded, with
information mapping performed at each layer. From a functional point of view, the
five FCAPS disciplines (fault, configuration, accounting, performance, and security
management) are rather separated. Interactions between these disciplines
typically happen at a higher management layer, or even through a human operator.
This approach to managing communication networks is very well understood and works
well for classic telecommunication networks; plenty of technically mature systems
that incorporate that approach have been on the market for years. The last couple of
years, however, have revealed several emerging technologies like Voice over IP
(VoIP) and IP-TV, ubiquitous and pervasive computing, new wireless technologies,
and various implications of ad-hoc and peer-to-peer (P2P) networks. This results in
interesting Next Generation Network (NGN) scenarios that have already been (or are
about to be) realized by network operators.
Even if these NGN scenarios cover a wide spectrum of use cases and technologies,
they have a few characteristics in common:
(1) Large scale - up to 10^6 possibly small nodes;
(2) Heterogeneity - different HW and SW platforms, different vendors, different
access and communication protocols; and
(3) Dynamics - nodes might appear and disappear regularly, and topology changes
will be the rule rather than the exception.
In a typical state-of-the-art fault management system the main fault processing logic
resides in a powerful correlation engine at the top of the processing chain where a lot

Dr. AIT, Dept. of ISE 2010-2011 1



of alarms from each network node are received. Using sophisticated statistical
analysis this engine might find out the most probable root cause for a given sequence
of alarms. Further, in order to be meaningful to an operator, the information contained
in this large number of alarms has to be enriched by correlation against databases
containing topology and other configuration-related information, stored on the central
server. If the number of emitting nodes and alarms exceeds a certain critical value,
such a centralized correlation engine will become a bottleneck. Moreover, the
heterogeneity of NEs increases the complexity of the system. Finally, and perhaps
most critically, if static information on network topology is used for fault
analysis, the system will inevitably suffer from inconsistencies as soon as this
topology exhibits dynamic aspects.
Motivated by these issues, we present a conceptually different fault management
design that is based on a peer-to-peer (P2P) approach to network management
suggested in the CELTIC research project Madeira.


CHAPTER 2

NETWORK MANAGEMENT GOALS AND REQUIREMENTS

2.1 OPERATIONAL GOALS

• Proactive monitoring of network infrastructure and service levels.
• Streamline network operations functions through NMS tools optimization.
• Scalability of NMS architecture to support new network technologies such as
Multiprotocol Label Switching (MPLS), wireless, quality of service (QoS), and others.
• Increase the ability to detect soft failures at the protocol, hardware, system software,
and interface levels.
• Help enable proactive maintenance to be performed by the network operations
center (NOC) support team upon detecting faults or performance degradation.
• Help enable intelligent forwarding of network events to the NOC.

2.2 FUNCTIONAL REQUIREMENTS

Given the high-level manageability goals outlined above, the following sections
highlight the functional requirements in specific network management functional
areas.

Fault Management
Fault management encompasses the discipline of identifying faults in a network
environment. Faults are identified by receiving events such as syslog and Simple
Network Management Protocol (SNMP) traps from network devices, polling network
device MIBs, and identifying real or potential error conditions and setting thresholds
that trigger events. In addition, the NMS should be able to provide event correlation
as well as reporting and tracking. The NMS used should also provide a northbound
interface for exporting critical messages to a higher level manager or MoM (manager
of managers).
In an ideal environment, the fault manager would collect both syslog and SNMP
information, filter that information, and pass the filtered data to a MoM for further
processing. This method helps decrease the amount of data that an end user needs to
see or react upon. The MoM, in turn, can provide further analysis and automation
based on the incoming event streams such as verifying down circuits, testing
connectivity, and opening trouble tickets based on those findings.
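
The filtering step described above can be illustrated with a minimal sketch. It assumes that both syslog messages and SNMP traps have already been normalized to the standard syslog severity scale (0 = emergency, 7 = debug); the `Event` type, the `filter_events` function, the device names, and the threshold value are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass

# Assumed cut-off: forward events of severity "error" (3) or worse.
# On the syslog scale a lower number means a more severe event.
FORWARD_THRESHOLD = 3


@dataclass(frozen=True)
class Event:
    source: str    # "syslog" or "snmp" (hypothetical labels)
    device: str    # name of the emitting device
    severity: int  # 0 (emergency) .. 7 (debug)
    message: str


def filter_events(events, threshold=FORWARD_THRESHOLD):
    """Keep only the events severe enough to be passed on to the MoM."""
    return [e for e in events if e.severity <= threshold]


events = [
    Event("syslog", "core-rtr-1", 6, "configuration saved"),
    Event("snmp", "core-rtr-1", 2, "linkDown on interface 3"),
    Event("syslog", "edge-sw-7", 4, "high temperature warning"),
]

# Only the critical linkDown event passes the filter.
forwarded = filter_events(events)
```

In a real deployment the threshold would more likely come from operator policy than from a constant, but the principle is the same: the fault manager discards low-value events so that the MoM only receives what needs further processing.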

Unmanaged Events
Stand-alone fault managers are used to gather event data from devices throughout the
network and report their findings. They have little to no capability of automating
reactions based on gathered data. When a message comes into the fault manager, the
typical course of action is simply to report the fault to a screen being monitored by
operations personnel.

Managed Events
By employing a MoM, your system can react to these events automatically,
which can drastically reduce downtime in mission-critical networks. For example,
when an event comes in from the fault manager, the MoM can:
• Verify connectivity to the reported down device/interface by ping/Telnet or other
means
• Gather information about the device, such as vendor, serial number, location, contact
information, circuit IDs, and site IDs, from a device inventory database
• Attach historical reports gathered from other NMSs, such as bandwidth, CPU, and
memory usage
• Open a trouble ticket automatically and have that ticket prepopulated with important
information from the device information database
This method would not only relieve operations personnel from having to look up the
information for an outage, but would also save critical time in bringing the fault to a
resolution.
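
As a rough sketch of this kind of automation, the following example builds a trouble ticket that is prepopulated from an inventory lookup. The inventory contents, the field names, and the `open_ticket` function are assumptions made for illustration; the connectivity check is passed in as a plain boolean so that the example does not depend on any particular ping or Telnet tooling.

```python
# Hypothetical device inventory, keyed by device name.
INVENTORY = {
    "core-rtr-1": {
        "vendor": "ExampleVendor",
        "location": "Site A",
        "circuit_id": "CKT-001",
    },
}


def open_ticket(device, fault, reachable, inventory=INVENTORY):
    """Build a trouble ticket prepopulated from the inventory database.

    `reachable` is the result of a connectivity check performed elsewhere.
    """
    details = inventory.get(device, {})
    return {
        "device": device,
        "fault": fault,
        "verified_down": not reachable,
        "vendor": details.get("vendor", "unknown"),
        "location": details.get("location", "unknown"),
        "circuit_id": details.get("circuit_id", "unknown"),
    }


ticket = open_ticket("core-rtr-1", "interface down", reachable=False)
```

Devices missing from the inventory still yield a ticket, with the unknown fields marked as such, so that an incomplete inventory does not prevent the fault from being recorded.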

Event Correlation
Event management encompasses event-correlation and root-cause analysis. It allows
for multiple input streams from various network devices and environments and, using
knowledge of the network topology and a sophisticated rule set, attempts to identify
the source or root cause of a network fault or problem.
• At the top level (MoM), event correlation features should be supported to aggregate
and correlate incoming alarms. The system needs to have the intelligence to correlate
event types (SNMP, syslog, and so on) as well as to provide automation of tasks
based on event criteria.
• Filtering capability should be supported to selectively display relevant alarms.
• The system should be capable of escalating critical alarms based on the number of
occurrences and time delays in acknowledgement.
• Alarm severity should be customizable based on end-user or operational needs.
• Alarm properties and escalation should be policy based, dependent on the role of the
device in the network.
• The system should be able to virtually partition the managed network into multiple
logical entities based on geographical locations.
• The fault management system should support role-based access to fault events based
on job responsibilities.
• A knowledge base consisting of troubleshooting guidelines or methodologies should
be part of the fault management system. This is to facilitate rapid problem isolation on
network-related issues.
• The system should provide integration between the fault and the inventory
management system to support auto population of information.
• The system should provide integration between the inventory system and the
trouble-ticketing system for auto population of relevant trouble ticket fields.
• The system should provide the flexibility to forward traps and alarms to a different
location/system for after-hours monitoring.
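
Two of the requirements above, escalation based on the number of occurrences and acknowledgement delay, and policies that depend on the role of the device, can be combined in a small sketch. The role names, the threshold values, and the `should_escalate` function are hypothetical.

```python
# Hypothetical per-role policy:
# role -> (occurrence limit, acknowledgement delay limit in seconds).
ESCALATION_POLICY = {
    "core": (2, 60),
    "access": (10, 900),
}


def should_escalate(role, occurrences, seconds_unacknowledged,
                    policy=ESCALATION_POLICY):
    """Escalate when either limit for the device role is reached."""
    max_occurrences, max_ack_delay = policy[role]
    return (occurrences >= max_occurrences
            or seconds_unacknowledged >= max_ack_delay)
```

With these assumed values, an alarm seen twice on a core device is escalated immediately, while the same alarm on an access device is escalated only after ten occurrences or fifteen minutes without acknowledgement.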

Log Management
Logging is a critical part of network management. Good logs can help you find
configuration errors, understand past intrusions, troubleshoot service disruptions, and
react to probes and scans of your network. Cisco devices have the ability to log a
great deal of their status.
Syslog is also a great resource for network compliance, allowing companies to adapt
quickly to changing regulations such as Sarbanes-Oxley (SOX), Control Objectives
for Information and related Technology (COBIT), IT Infrastructure Library (ITIL),
Gramm-Leach-Bliley Financial Modernization Act (GLBA), Visa Card Holder
Information Security Program (Visa CISP), Payment Card Industry (PCI) Data
Security Standards, Health Insurance Portability and Accountability Act (HIPAA),
Committee of Sponsoring Organizations (COSO) of the Treadway Commission, and
custom regulations.
Defining all aspects of a syslog server is outside the scope of this document.

NMS North and Southbound API Interfaces


Communication between multiple network management systems is extremely
important for event correlation and data aggregation. Most, if not all, NMSs should be
able to communicate bidirectionally. This helps ensure the ability to provide correlated
events as well as the coordination of data sources throughout the network, such as
inventory, access, performance data, and so on.


CHAPTER 3

HIERARCHICAL APPROACH TO NETWORK MANAGEMENT

Layering of network management not only allows NMSs to communicate
better, it also reduces the number of alerts seen by network operations support staff. At the
lowest layer, it is nearly impossible to keep up with the events displayed from each
network element reported in the NMS architecture. For example, it is not feasible to
have someone watching every syslog event that occurs on the network. Instead, you
rely on systems at the Network Management Layer (NML) to filter through all events
and show only those events deemed most important. The Service Management
Layer (SML), meanwhile, is used to further summarize events from the NML and tie
multiple network management systems together. A good NMS will also
provide deduplication of these network events in order to further reduce the number of
unnecessary messages seen by operations personnel.
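
Deduplication of this kind can be sketched in a few lines. The example below assumes that two events count as duplicates when they share the same device and message text; real systems typically also apply a time window, which is omitted here. The `deduplicate` function and the sample data are hypothetical.

```python
from collections import Counter


def deduplicate(events):
    """Collapse repeated (device, message) pairs into one entry with a count."""
    counts = Counter(events)
    return [(device, message, n) for (device, message), n in counts.items()]


raw = [
    ("rtr-1", "link down"),
    ("rtr-1", "link down"),
    ("sw-2", "fan failure"),
    ("rtr-1", "link down"),
]

# Four raw events become two summary entries, in order of first appearance.
summary = deduplicate(raw)
```
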
The hierarchical model in Figure 1 shows the major components that make up a
comprehensive NMS system and provides a high-level integration scenario. Cisco
Advanced Services encourages the adoption of a layered, hierarchical network
management system. This type of architecture involves data flow and integration of
multiple NMS tools to be effective. Figure 1 depicts those tool and data relationships.

Figure 1: Hierarchical Network Model


The underlying hierarchical philosophy is to get the organization to a basic level of
integrated network management. The foundation for this architecture comes from the
Telecommunications Management Network (TMN) (M.3000) model. "TMN provides
a framework for achieving interconnectivity and communication across heterogeneous
operations system and telecommunication networks. To achieve this, TMN defines a
set of interface points for elements which perform the actual communications
processing (such as a call processing switch) to be accessed by elements, such as
management workstations, to monitor and control them. The standard interface allows
elements from different manufacturers to be incorporated into a network under a
single management control."

Element Management Layer


The first level, the Element Management Layer, defines the individual network elements
used in a deployment. At this layer, a single anomaly in the network can affect
multiple devices, each of which may independently alert the network management
systems that an event has occurred, resulting in multiple reports of the same problem.

Network Management Layer


In the middle of the diagram is the Network Management Layer. This function takes
input from multiple elements (which in reality might be different applications),
correlates the information received from the various sources (also referred to as
root-cause analysis), and identifies the event that has occurred. The NML provides a
level of abstraction above the Element Management Layer in that operations
personnel are not "weeding" through potentially hundreds of Unreachable or Node
Down alerts but instead are focusing on the actual event such as, "an area-border
router has failed."

Service Management Layer


At the top of the diagram is the Service Management Layer. This layer is responsible
for adding intelligence and automation to filtered events, event correlation, and
communication between databases and incident management systems. The goal is to
move traditional network management environments and the operations personnel
from element management (managing individual alerts) to network management
(managing network events) to service management (managing identified problems).

As an evolution of TMN, the TeleManagement Forum, or TMF, proposed the
Telecom Operations Map (TOM) and, more recently, the eTOM. These models
describe at a high level the processes a telecom operator needs to fulfill to manage its
network and services infrastructure. Furthermore, TMF has defined the NGOSS
architecture, which is a technology-agnostic framework for the construction of
management applications. NGOSS fosters a component-based architecture with
interfaces between components defined as contracts, a shared information model, and
the separation of implementation from business logic.
An interesting implementation of NGOSS concepts is OSS/J, which pursues the
implementation of a set of APIs, based on J2EE technologies, to allow the integration
of OSSs. It is also worth mentioning the 3GPP approach to network management,
which is based on the IRP (Integration Reference Point) concept. The IRP is
analogous to the TMN Q3 reference point, improving on it by defining the information
models in implementation-independent UML.
An important assumption behind TMN and similar frameworks is that network
elements have limited management capabilities, being focused on their
communications role. Therefore, management functions are performed externally by
dedicated systems, while network elements only provide simple management agents
that allow these external systems to access and manipulate management data. It was
recognized a considerable time ago, however, that network devices can do much more
than run a simple agent, and are even capable of managing themselves.
This is the approach taken in IP networks, where nodes are able to perform
management tasks for routing, signaling, path provisioning, etc. Following this trend,
the control plane paradigm has emerged in telecom networks, giving more autonomy
to network elements for certain tasks. Particularly in optical networks, research is
being conducted to create an optical control plane that enables automated
multi-vendor network operation. Examples are the ITU's Architecture for
Automatically Switched Optical Networks (ASON) and the IETF's Generalized
Multi-Protocol Label Switching (GMPLS). However, decentralized standards are not
(yet) available for wireless networks. Note that the existence of a control plane does
not mean that the management plane is no longer necessary, but rather that it must
adapt to the new scenario, focusing on the tasks for which it is better suited and
relying on the control plane for other tasks such as routing and signaling.

Benefits of Hierarchical Layers

From a practical perspective, integrating these elements involves:

• Assembling a robust set of event correlation rules that consistently and
accurately identify the source of an event.
• Opening a trouble ticket in an incident management application that
operations personnel can begin working on.

This helps enable an operations organization to:


• Proactively manage the network.
• Identify and correct potential network issues before they become problems.
• Prevent a loss of network connectivity, thus ensuring organizational productivity.
• Focus on the solution instead of the problem.

However, hierarchical network management also has some disadvantages:

• TMN, the dominant network management framework of the past years,
promotes a well-known centralized approach, which adversely affects
the scalability of the network management application.

• Managing a large network from a single, central point increases the load on
the central manager and can create bandwidth bottlenecks on links that are
close to that central manager.

• Another disadvantage is the lack of flexibility, since the current generation of
management architectures uses static topology data, often based on manually
generated files.


Peer to Peer
In a P2P system, the nodes have a significant or total degree of autonomy from central
servers. P2P systems enable the utilization of previously unused
resources, such as storage, processing cycles, or content, by tolerating and working with
the variable connectivity of numerous devices. An overall characteristic of a
peer-to-peer network is that the nodes can send and receive information in a way that makes
them both servers and clients, or “servents”. A distinction is commonly made
between pure peer-to-peer networks and hybrid peer-to-peer networks:
Pure P2P architectures are completely decentralized: there is no central server or
router. Each node can issue and respond to requests, or route requests to other nodes.

In Hybrid P2P architectures, more types of nodes exist. The leaf nodes are nodes
with an information need or information resource; in other words, they can provide
information to or request information from other leaf nodes. Another type of node,
the super peer, has a more “server-like” role in the network. These nodes provide
regionally centralized services to the network in order to improve the routing of
information requests. Such nodes are also called directory nodes or ultra peers. Each
directory node provides directory services for a portion of the network, and directory
nodes work in a cooperative manner to cover the whole network.
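
The cooperation between directory nodes in a hybrid P2P architecture can be sketched as follows. This is a deliberately simplified illustration, not a description of any particular P2P protocol: the `SuperPeer` class, its methods, and the node and resource names are all hypothetical, and a lookup consults only directly connected super peers.

```python
class SuperPeer:
    """Directory node that indexes the resources of its attached leaf nodes."""

    def __init__(self, name):
        self.name = name
        self.index = {}       # resource name -> leaf node that offers it
        self.neighbours = []  # other cooperating super peers

    def register(self, leaf, resource):
        """Record that a leaf node in this region offers a resource."""
        self.index[resource] = leaf

    def lookup(self, resource):
        """Answer from the local index, otherwise ask the other super peers."""
        if resource in self.index:
            return self.index[resource]
        for peer in self.neighbours:
            if resource in peer.index:
                return peer.index[resource]
        return None


north, south = SuperPeer("north"), SuperPeer("south")
north.neighbours.append(south)
south.neighbours.append(north)

north.register("leaf-1", "topology-map")
south.register("leaf-9", "alarm-log")

# A request arriving at "north" is resolved through "south".
owner = north.lookup("alarm-log")
```
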

Figure 2: Pure Peer-to-Peer (left) and Hybrid Peer-to-Peer (right)


CHAPTER 4
MADEIRA ARCHITECTURE

In an attempt to overcome the shortcomings of the traditional management
approaches and to face the challenges of next generation telecommunication networks,
Madeira aims to develop a new management framework based on peer-to-peer
networking concepts. Furthermore, it provides novel technologies for a logically
meshed Network Management System that facilitates self-management and dynamic
behavior of nodes within the network. Madeira also takes advantage of the Policy
Based Management Paradigm that pursues the separation of management logic from
the actual applications. This logic is then specified as a set of rules or policies that can
be dynamically fed into the management system allowing a change of its behavior
without the need of changing the application or even restarting it. Besides the
innovative architectural framework, the Madeira project will provide interface
protocols, standards and a reference software implementation and apply it to a
specific network management scenario. Ultimately, by enabling the management of
network elements of increasing numbers, heterogeneity and transience, the Madeira
approach should reduce the Operational Expenses, or OPEX.
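
The idea of changing behavior by feeding in new policies, without changing or restarting the application, can be illustrated with a minimal sketch. It does not reproduce the Madeira policy language or API; the `PolicyEngine` class, the event type, and the actions are hypothetical stand-ins.

```python
class PolicyEngine:
    """Maps event types to actions that can be replaced while running."""

    def __init__(self):
        self.policies = {}  # event type -> action callable

    def load_policy(self, event_type, action):
        """Install or replace a policy at run time."""
        self.policies[event_type] = action

    def handle(self, event_type, payload):
        """Apply the current policy for the event, if one is installed."""
        action = self.policies.get(event_type)
        return action(payload) if action else None


engine = PolicyEngine()

# Initial policy: only log the failure.
engine.load_policy("link_down", lambda node: f"log {node}")
first = engine.handle("link_down", "E")

# A new policy is fed in; the engine itself is neither modified nor restarted.
engine.load_policy("link_down", lambda node: f"reroute around {node}")
second = engine.handle("link_down", "E")
```

The management logic lives entirely in the installed actions, so the same engine code reacts differently to the same event before and after the policy update.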
Madeira focuses on Fault and Configuration Management functional areas and,
especially, on the way they can co-operate to solve management problems. In doing
so, it will act as a complement to traditional management systems. There are many
management tasks that current network management systems perform well, such as
Performance Management. For these tasks, a hierarchical approach is entirely
appropriate. Madeira will investigate the feasibility of distributing management
responsibilities among peer nodes in order to perform certain tasks more efficiently.
In other words, the Madeira management approach is applied to management tasks
that are difficult to carry out using conventional methods, or tasks that can be carried
out more efficiently using a distributed approach.


Figure 3: Madeira Architecture

Based on the approach of applying P2P concepts to the management domain, the
Madeira architecture has been designed with a number of key principles in mind. The
most important of these principles is 'heterogeneity', or the ability of the Madeira
system to be applied to many management domains and across heterogeneous devices
and platforms. The main vehicle for this generic management is the usage of policies,
notifications and applications.
The Madeira architecture is essentially composed of an Adaptive Management
Component (AMC) and a Platform. The AMC is the component that manages a given
node and together many AMCs can orchestrate the overall behavior of the meshed
network. These AMCs have the ability to exchange and export Network Management
information between peer management applications and are deployed as an overlay
network, communicating using the peer-to-peer paradigm. The AMC itself is
composed of a number of sub-elements which facilitate this management.
The AMC Core is the primary component, or 'brain', of the AMC and, based on
notifications and policies, orchestrates the services and applications to facilitate the
required network management function. Services are components that provide some
functionality required by the AMC Core. Applications provide the actual
management functionality specific to a particular device. Applications are physically
divided into parts that run within each AMC but are logically connected through peer
interactions. For example, each AMC will run some Fault Management (FM) application
which, together with the FM applications running on other nodes, constitutes the overall
FM application of the entire meshed network. Policies provide the generic
management functionality shared by all AMCs within the same management domain.
The generic management specified in policies is then mapped to applications.
Notifications enable inter-AMC communication, via the Madeira platform, about
events within the mesh network and also provide a means of logically distributing
applications.

The following groups of services are available in an AMC:

The Configuration Management and Fault Management services contain the specific
network management applications. They provide the ability to set up the network,
react to faults, and perform other FM and CM related tasks. How these tasks
are performed is described in the section dedicated to the scenario.

The Northbound Interface is optional. It offers services that communicate with a
higher layer Operation Support System (OSS) via Web Services. The OSS can, for
example, retrieve information like network topology, events or alarms. More
information on the connection with an external OSS will be given in the section
“Connecting to the North”.
The AMC Specific Services offer a base for the Network Management
Applications. They mainly provide services to communicate with other AMCs. This can
be either publish-subscribe based, or a direct peer-to-peer connection to another AMC.


CHAPTER 5
THE MADEIRA SCENARIO

The goal of the scenario is to prove the capabilities of the Madeira approach to
deal with real life management problems. It provides a number of challenging
tasks to test the management approach. To emphasize the strengths of Madeira,
the problems that arise in the scenario are difficult to solve with traditional
management approaches, especially with respect to dynamic reconfiguration and
changing topologies that occur in Wi-Fi networks.
The scenario focuses on the areas of Configuration Management and Fault
Management, with an emphasis on the integration between both of them. In the
scenario, a number of wireless base stations are deployed in such a way that
wireless equipment (for example, laptops or PDAs) may have coverage from one
or more base stations. Unlike in a traditional wireless network, not every base
station has a wired connection to the backhaul network. Base stations
directly connected to the backhaul network are called gateways.

Configuration management
The rest of the base stations can only use a wireless connection to reach a gateway
and thus the backhaul network. After deploying the base stations, Madeira
automatically sets up the wireless meshed network using OLSR (Optimized Link
State Routing protocol) as the routing algorithm. OLSR is a link-state routing
protocol that is specifically developed for mobile ad-hoc networks. Based on
pre-installed policies, base stations are grouped into a number of clusters by the
Grouping Service. These policies can be based on a number of criteria such as, for
example, the number of nodes per cluster or topological proximity.
Figure 4 depicts an example topology that could be the result of this process. As
can be seen, clusters may or may not have direct backhaul connectivity. The network
elements in a cluster monitor each other and exchange management information on a
peer-to-peer basis. As mentioned, wireless network equipment that wants to
use the network is in range of one or more base stations. If the wireless equipment is
in range of more than one base station, it selects one of them as its preferred base
station, and uses this connection to use services on the Internet, for example. If the
wireless equipment is in range of just one base station, it must select that base station
as its preferred base station.

Figure 4: Management clusters formed in wireless Mesh Network

Figure 5: Management Cluster Hierarchy

Each cluster has exactly one Cluster Head. Policies are used for this election, and can
be based on criteria like load, optimal connectivity or robustness. The Cluster Head is
responsible for coordination and topology publishing of its cluster. Different levels of
clustering can exist in Madeira.
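
A policy-driven Cluster Head election could look roughly like the sketch below. The text names load, connectivity, and robustness as possible criteria but does not give the actual election rule, so the weighted score used here is an assumption, as are the `elect_cluster_head` function, the metric names, and the sample values.

```python
def elect_cluster_head(nodes, weights):
    """Return the name of the node with the best weighted score.

    Higher connectivity raises the score, higher load lowers it.
    """
    def score(item):
        _name, metrics = item
        return (weights["connectivity"] * metrics["connectivity"]
                - weights["load"] * metrics["load"])

    return max(nodes.items(), key=score)[0]


# Hypothetical cluster: load as a fraction, connectivity as a neighbour count.
cluster = {
    "A": {"load": 0.2, "connectivity": 3},
    "B": {"load": 0.9, "connectivity": 3},
    "C": {"load": 0.1, "connectivity": 1},
}

# Policy expressed as weights; changing them changes the election outcome.
head = elect_cluster_head(cluster, {"load": 1.0, "connectivity": 1.0})
```

With equal weights, node A wins because it is as well connected as B but far less loaded; a policy that ignored connectivity would instead favor the lightly loaded node C.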
The creation of this hierarchy is also based on policies. As mentioned in the previous
section dedicated to the architecture, the top level Cluster Head is responsible for
publishing the topology of the complete network. This can be done to a higher layer
Operation Support System or another Network Management System for example. The
cluster hierarchy is the basic management overlay that is used by all management
functionality in Madeira. It creates a scalable environment for network management.
The Madeira Configuration Management application is responsible for the
construction, maintenance and viewing of the topology. Other applications, such as
Fault Management, use this management overlay to implement their functionality.

Fault management
During usage of the network, it is inevitable that unexpected faults occur. When such
a fault occurs, it is important that:
• Appropriate action is undertaken quickly in order to reduce the service impact.
• Meaningful information on the fault is presented to the operator (in particular in
those cases where automatic restoration is not or not fully possible).
Besides Configuration Management (CM), Madeira focuses on Fault
Management (FM) and how CM actions and events are related to FM faults and
alarms. Correlation between CM events and FM faults is an important aspect in order
to discover the actual cause of a problem in the network.
Alarms can be generated by two different sources:
1. Hardware level alarms are generated by the base station in case of a hardware
fault.
2. Platform level alarms are generated by either the Directory Service or the CM
application. The Directory Service can indicate loss of connection with a neighboring
node, and the CM application can indicate changes in the topology (a node leaves or
joins a cluster).
When a fault occurs, the FM application will receive one or more
alarms. For example, a hardware level problem may also cause a fault on platform
level, creating two alarms. These alarms are correlated into a new alarm by FM and
sent to the Cluster Head. This Cluster Head also performs correlation of the alarm
with alarms originating from other nodes in order to get a clearer picture of the
probable cause and possible solution. It can then forward the alarm to a higher
hierarchy level. This process is repeated until it reaches the Top Level Cluster Head,
which can notify the Northbound Interface in order to produce an alarm for the
external OSS. This paper provides a few example scenarios in order to explain the
basic concepts and functionality of the FM application in Madeira. These scenarios
describe two faults with similar impact but very different in nature (in the first case a
node goes down, while in the second there is just a loss of connectivity between two
nodes), and focus on the way Madeira distinguishes these two cases by correlating
alarms and CM events at different levels of the management hierarchy.

Base Station E outage

Figure 6: Base Station Outage

Figure 6 depicts two Madeira Management Clusters, with node A and node G being
the Cluster Heads. The solid lines indicate “physical” OLSR links between nodes. The
dotted line represents an inter-cluster connection between node E and node F. When
node E fails, the Directory Service of its one-hop neighbours D and F will notice this.
Both nodes will notify their Cluster Heads that the link with node E has failed, and
that therefore E might be faulty. Besides receiving this alarm from node D, Cluster
Head A also receives a notification from the CM application that node E is no longer
part of the cluster. Cluster Head A then sends an alarm containing this knowledge to
a higher hierarchy level.

When Cluster Head G, the Cluster Head of node F, receives the alarm that node E is
not reachable, it also forwards this alarm to a higher hierarchy level.
After receiving the alarms from nodes A and G, the higher level Cluster Head tries to
correlate the information with other alarms. Since both alarms contain the same
knowledge (a possible fault of node E), they are merged into a single alarm with the
same content. The alarm is then forwarded up the hierarchy pyramid until the Top
Level Cluster Head notifies the NBI, which informs the external OSS of the failure
of node E.
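
The merge step at the higher level Cluster Head can be sketched as follows. This is an illustrative assumption about how "alarms containing the same knowledge" might be recognized (here: same suspect and same cause); the `Alarm` tuple and the `merge_equivalent` function are hypothetical names, not part of Madeira.

```python
from typing import Dict, List, NamedTuple, Tuple


class Alarm(NamedTuple):
    reporter: str   # Cluster Head that forwarded the alarm
    suspect: str    # node believed to be faulty
    cause: str      # short description of the suspected fault


def merge_equivalent(alarms: List[Alarm]) -> List[Tuple[str, str, List[str]]]:
    """Collapse alarms that carry the same knowledge into one entry each.

    Returns (suspect, cause, reporters) tuples in first-seen order.
    """
    merged: Dict[Tuple[str, str], List[str]] = {}
    for alarm in alarms:
        merged.setdefault((alarm.suspect, alarm.cause), []).append(alarm.reporter)
    return [(suspect, cause, reporters) for (suspect, cause), reporters in merged.items()]


# Cluster Heads A and G both report that node E may be faulty.
incoming = [
    Alarm("A", "E", "node unreachable"),
    Alarm("G", "E", "node unreachable"),
]
merged_alarms = merge_equivalent(incoming)
```

Keeping the list of reporters alongside the merged alarm preserves the information that two independent clusters observed the same fault.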

Link outage between node D and node E

Figure 7: Link Outage

In this scenario, shown in Figure 7, the link between node D and node E fails. The
link between E and F remains intact. Both nodes D and E receive a notification
from their Directory Service indicating that the neighboring node is no longer
reachable. Node D concludes that E might be faulty and forwards this information to
its Cluster Head A, which also receives a notification from the CM application that
E is no longer part of its cluster. After combining this information, A sends an
alarm to the next hierarchy level, identical to the one in the previous scenario.
Besides receiving the notification from the Directory Service, node E recognizes
that it is no longer connected to its Cluster Head and joins another cluster (it is
assumed that E joins the cluster containing F and G). After this reconfiguration
process, E forwards the information that D is no longer reachable to its new Cluster
Head G, which additionally receives a notification from the CM application that E
has joined its cluster and forwards this information to the next hierarchy level.

The higher level Cluster Head of this next hierarchy level receives the alarm from A,
indicating that node D reported that E is unavailable and probably faulty. It also
receives a notification from the CM application indicating that node E has joined
another cluster, and an alarm from G indicating that node E reported that D is
unavailable and probably faulty. After correlating these three notifications, the
Cluster Head concludes that a link outage between D and E has occurred. It either
suppresses this alarm (if configured to do so) or forwards this knowledge as a minor
alarm up the hierarchy pyramid until the Top Level Cluster Head and the NBI are
reached. The NBI then informs the external OSS of the link outage between D and E.
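
The way the two scenarios are told apart can be sketched with a simple rule: a suspect that itself reports an alarm, or that the CM application sees joining a cluster, is evidently still alive, so the fault must lie in the link. The following Python sketch encodes that rule under stated assumptions; the `Unreachable` and `Joined` records and the `classify` function are hypothetical, and a real correlation engine would also have to consider timing and "node left cluster" events.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Unreachable:
    reporter: str      # node whose Directory Service lost a neighbor
    suspect: str       # neighbor that is no longer reachable


@dataclass(frozen=True)
class Joined:
    node: str          # node the CM application saw joining a cluster
    cluster_head: str  # head of the cluster it joined


def classify(alarms: List[Unreachable], cm_events: List[Joined]) -> List[Tuple[str, ...]]:
    """Distinguish a node outage from a link outage.

    Simplifying assumption: any node that reported an alarm or joined a
    cluster within the correlation window is considered alive.
    """
    alive: Set[str] = {a.reporter for a in alarms} | {e.node for e in cm_events}
    findings = set()
    for alarm in alarms:
        if alarm.suspect in alive:
            # The "faulty" node is still talking, so only the link is down.
            pair = tuple(sorted((alarm.reporter, alarm.suspect)))
            findings.add(("link outage",) + pair)
        else:
            findings.add(("node outage", alarm.suspect))
    return sorted(findings)


# Scenario 1: node E fails; neighbors D and F both lose it.
node_case = classify([Unreachable("D", "E"), Unreachable("F", "E")], [])

# Scenario 2: only the D-E link fails; E joins G's cluster and reports D.
link_case = classify(
    [Unreachable("D", "E"), Unreachable("E", "D")],
    [Joined("E", "G")],
)
```

Under these assumptions the first call yields a single node outage finding for E, and the second a single link outage finding for the pair D and E, which could then be suppressed or forwarded as a minor alarm.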


CHAPTER 6
FUTURE ENHANCEMENT

Currently the project is focusing on the implementation and integration of the
different components that make up the Madeira solution. A prototype management
system dealing with specific Configuration and Fault Management scenarios of Wi-Fi
networks will be developed and tested on top of a test bed provided by the partners.
At the time of writing, several aspects have already been implemented. The release of
a first prototype is planned for the beginning of this year. A second iteration
demonstrating the main aspects of Madeira will be finished at the end of the project,
in July 2006. Although it is not explicitly mentioned as being in the scope of Madeira,
some efforts are being undertaken to research security mechanisms for peer-to-peer
environments.

In such environments, where there is no centralized security solution, security and
trust are important aspects. A possible solution for this could be to use Public Key
Cryptography to create a web-of-trust, similar to the PGP approach. This should not
be confused with the FCAPS Security Management area, which focuses, for example, on
authorized use of the network, data integrity and confidentiality.


CHAPTER 7
CONCLUSION
The Madeira Management framework proposes the use of peer-to-peer techniques to
fulfill management tasks. The ability to perform self-management and the use of
Management Clusters address scalability issues that are present in current
hierarchical network management approaches. So-called Adaptive Management
Components, or AMCs, that run on network elements are in charge of performing the
various management tasks. By using the peer-to-peer interface, these AMCs can
communicate with each other, creating a Management Overlay, in order to execute
management tasks and make up network management applications.

Furthermore, because of the peer-to-peer concept, Madeira does not need a
central server. Besides reducing operating expenses, this also eliminates the single
point of failure that exists in traditional systems. Madeira also offers support for a
higher layer Operations Support System (OSS). Such an OSS can access Madeira via
the Northbound Interface. This Web Services based interface enables an OSS to
acquire topology information, receive alarms, introduce policies and perform other
management tasks. By using a publish/subscribe system for notifications from the
network, Madeira can notify multiple external systems about events or alarms at the
same time. In order to prove the feasibility of the Madeira approach, a challenging
scenario has been identified, dealing with Configuration and Fault Management of
highly dynamic wireless networks. Based on the Madeira framework, a
Management System addressing this scenario will be prototyped and tested on a real
test bed.


