CHAPTER 1
INTRODUCTION
of alarms from each network node are received. Using sophisticated statistical
analysis this engine might find out the most probable root cause for a given sequence
of alarms. Further, in order to be meaningful to an operator, the information contained
in this large amount of alarms has to be enriched by correlation against data bases
containing topology and other configuration related information, stored on the central
server. If the number of emitting nodes and alarms exceeds a certain critical value
such a centralized correlation engine will become a bottleneck. Moreover, the
heterogeneity of network elements (NEs) increases the complexity of the system.
Finally, and perhaps most critically, if static information on network topology is
used for fault analysis, the system will inevitably suffer from inconsistencies as
soon as this topology exhibits dynamic aspects.
Motivated by these issues, we present a conceptually different fault management
design based on the peer-to-peer (P2P) approach to network management suggested in
the CELTIC research project Madeira.
CHAPTER 2
Given the high-level manageability goals outlined above, the following sections
highlight the functional requirements in specific network management functional
areas.
Fault Management
Fault management encompasses the discipline of identifying faults in a network
environment. Faults are identified by receiving events such as syslog and Simple
Network Management Protocol (SNMP) traps from network devices, polling network
device MIBs, and identifying real or potential error conditions and setting thresholds
that trigger events. In addition, the NMS should be able to provide event correlation
as well as reporting and tracking. The NMS used should also provide a northbound
interface for exporting critical messages to a higher level manager or MoM (manager
of managers).
In an ideal environment, the fault manager would collect both syslog and SNMP
information, filter that information, and pass the filtered data to a MoM for further
processing. This method helps decrease the amount of data that an end user needs to
see or react to. The MoM, in turn, can provide further analysis and automation
based on the incoming event streams such as verifying down circuits, testing
connectivity, and opening trouble tickets based on those findings.
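The filtering step described above can be sketched as follows. This is a minimal illustration, not a description of any particular NMS: the `Event` type, the `SEVERITY` table and the `filter_for_mom` function are hypothetical names, and the severity numbers are assumed to follow the usual syslog ordering (lower number means more severe).

```python
from dataclasses import dataclass

# Hypothetical severity scale: lower number = more severe (same ordering as syslog).
SEVERITY = {"critical": 2, "error": 3, "warning": 4, "info": 6}


@dataclass(frozen=True)
class Event:
    source: str    # device that emitted the event
    kind: str      # "syslog" or "snmp"
    severity: str  # key of SEVERITY
    message: str


def filter_for_mom(events, least_severe="warning"):
    """Return events at least as severe as `least_severe`, without exact duplicates."""
    threshold = SEVERITY[least_severe]
    seen = set()
    forwarded = []
    for event in events:
        if SEVERITY[event.severity] > threshold:
            continue  # too minor to forward to the MoM
        if event in seen:
            continue  # identical event already forwarded
        seen.add(event)
        forwarded.append(event)
    return forwarded
```

Only the events that survive this filter would be passed to the MoM, which keeps the volume an operator has to look at small.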
Unmanaged Events
Stand-alone fault managers are used to gather event data from devices throughout the
network and report their findings. They have little to no capability of automating
reactions based on gathered data. When a message comes into the fault manager, the
typical course of action is simply to report the fault to a screen being monitored by
operations personnel.
Managed Events
By employing a MoM, your system can react to these events automatically,
which can drastically reduce downtime in mission-critical networks. For example,
when an event comes in from the fault manager, the MoM can:
• Verify connectivity to the reported down device/interface by ping/Telnet or other
means
• Gather information about the device (vendor, serial number, location, contact
information, circuit IDs, site IDs, and so on) from a device inventory database
• Attach historical reports gathered from other NMSs such as bandwidth, CPU,
memory, and so on
• Open a trouble ticket automatically and have that ticket prepopulated with important
information from the device information database
This method would not only relieve operations personnel from having to look up the
information for an outage, but would save critical time in bringing the fault to a
resolution.
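The automated reaction listed above can be sketched as a single function. This is an illustrative sketch under stated assumptions: `handle_down_event`, the inventory record fields and the ticket layout are all hypothetical, and the connectivity probe is passed in as a callable so that no real ping or Telnet call is implied.

```python
def handle_down_event(device, inventory, is_reachable):
    """Return a prepopulated ticket dict, or None if the device answers the probe.

    `inventory` maps a device name to a record dict; `is_reachable` is any
    callable that probes the device (ping, Telnet, ...) and returns a bool.
    """
    if is_reachable(device):
        return None  # assumption: a reachable device does not need a ticket
    record = inventory.get(device, {})
    return {
        "summary": f"{device} reported down and failed connectivity check",
        "device": device,
        "vendor": record.get("vendor", "unknown"),
        "location": record.get("location", "unknown"),
        "contact": record.get("contact", "unknown"),
        "circuit_id": record.get("circuit_id", "unknown"),
    }
```

A real MoM would hand the returned dict to its trouble-ticketing system; here it is simply returned so the enrichment step is visible.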
Event Correlation
Event management encompasses event-correlation and root-cause analysis. It allows
for multiple input streams from various network devices and environments and, using
knowledge of the network topology and a sophisticated rule set, attempts to identify
the source or root cause of a network fault or problem.
• At the top level (MoM), event correlation features should be supported to aggregate
and correlate incoming alarms. The system needs to have the intelligence to correlate
event types (SNMP, syslog, and so on) as well as to provide automation of tasks
based on event criteria.
• Filtering capability should be supported to selectively display relevant alarms.
• The system should be capable of escalating critical alarms based on the number of
occurrences and time delays in acknowledgement.
• Alarm severity should be customizable based on end-user or operational needs.
• Alarm properties and escalation should be policy based, dependent on the role of the
device in the network.
• The system should be able to virtually partition the managed network into multiple
logical entities based on geographical locations.
• The fault management system should support role-based access to fault events based
on job responsibilities.
• A knowledge base consisting of troubleshooting guidelines or methodologies should
be part of the fault management system. This is to facilitate rapid problem isolation on
network-related issues.
• The system should provide integration between the fault and the inventory
management system to support auto population of information.
• The system should provide integration between the inventory system and the
trouble-ticketing system for auto population of relevant trouble ticket fields.
• The system should provide the flexibility to forward traps and alarms to a different
location/system for after-hours monitoring.
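As an illustration of the escalation requirement in the list above (escalating alarms based on the number of occurrences and on delays in acknowledgement), a minimal sketch could look like this. The three-step severity scale and the default thresholds are assumptions chosen for the example, not values taken from any product.

```python
# Hypothetical severity ladder, from least to most severe.
LEVELS = ["minor", "major", "critical"]


def escalated_severity(severity, occurrences, unacked_seconds,
                       max_occurrences=5, max_unacked_seconds=900):
    """Raise the severity one step for each escalation rule that fires."""
    level = LEVELS.index(severity)
    if occurrences >= max_occurrences:
        level += 1  # the alarm keeps repeating
    if unacked_seconds >= max_unacked_seconds:
        level += 1  # nobody has acknowledged it in time
    return LEVELS[min(level, len(LEVELS) - 1)]
```

In a policy-based system the thresholds would be looked up per device role rather than passed as defaults.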
Log Management
Logging is a critical part of network management. Good logs can help you find
configuration errors, understand past intrusions, troubleshoot service disruptions, and
react to probes and scans of your network. Cisco devices have the ability to log a
great deal of their status.
Syslog is also a great resource for network compliance, allowing companies to adapt
quickly to changing regulations such as Sarbanes-Oxley (SOX), Control Objectives
for Information and related Technology (COBIT), IT Infrastructure Library (ITIL),
Gramm-Leach-Bliley Financial Modernization Act (GLBA), Visa Card Holder
Information Security Program (Visa CISP), Payment Card Industry (PCI) Data
CHAPTER 3
adapt to the new scenario, focusing on the tasks for which it is better suited and
relying on the control plane for other tasks such as routing and signaling.
Benefits of Hierarchical Layers
• In recent years, TMN has been the dominant network management
framework. It promotes a well-known centralized approach which has a
number of effects on the scalability of the network management application.
• Managing a large network from a single, central point will increase the load of
the central manager and could create bandwidth bottlenecks on links that are
close to that central manager.
Peer to Peer
In a P2P system, the nodes have a significant or total degree of autonomy from central
servers. P2P systems enable the utilization of previously unused resources such as
storage, cycles or content by tolerating and working with the variable connectivity
of numerous devices. An overall characteristic of a peer-to-peer network is that the
nodes can send and receive information in a way that makes them both servers and
clients, or “servents”. A distinction is made between pure peer-to-peer networks and
hybrid peer-to-peer networks, as follows:
Pure P2P architectures are completely decentralized: There is no central server or
router. Each node can issue and respond to requests, or route requests to other nodes.
In Hybrid P2P architectures, more types of nodes exist: The leaf nodes are nodes
with an information need or information resource. In other words, they can provide
information to or request information from other leaf nodes. Another type of node,
the super peer, has a more “server-like” role in the network. Super peers provide
regionally centralized services to the network in order to improve the routing of
information requests. These nodes are also called directory nodes or ultra peers. Each
directory node provides directory services for portions of the network and directory
nodes work in a cooperative manner to cover the whole network.
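The cooperation between directory nodes in a hybrid P2P architecture can be illustrated with a small sketch. The `SuperPeer` class and its methods are hypothetical and only model the idea from the text: each super peer keeps a directory for its own leaf nodes and asks neighbouring super peers when it cannot answer a request itself.

```python
class SuperPeer:
    """Directory node that serves its own leaves and cooperates with neighbours."""

    def __init__(self, name):
        self.name = name
        self.directory = {}   # resource name -> leaf node offering it
        self.neighbours = []  # other SuperPeer instances

    def register(self, leaf, resource):
        """A leaf node announces a resource it can provide."""
        self.directory[resource] = leaf

    def lookup(self, resource, visited=None):
        """Return the leaf offering `resource`, or None if no directory knows it."""
        visited = set() if visited is None else visited
        if self.name in visited:
            return None  # already asked on this request, avoid loops
        visited.add(self.name)
        if resource in self.directory:
            return self.directory[resource]
        for peer in self.neighbours:
            found = peer.lookup(resource, visited)
            if found is not None:
                return found
        return None
```

A pure P2P network would have no such directory role: every node would route the request itself.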
CHAPTER 4
MADEIRA ARCHITECTURE
Based on the approach of applying P2P concepts to the management domain, the
Madeira architecture has been designed with a number of key principles in mind. The
most important of these principles is 'heterogeneity', or the ability of the Madeira
system to be applied to many management domains and across heterogeneous devices
and platforms. The main vehicle for this generic management is the use of policies,
notifications and applications.
The Madeira architecture is essentially composed of an Adaptive Management
Component (AMC) and a Platform. The AMC is the component that manages a given
node and together many AMCs can orchestrate the overall behavior of the meshed
network. These AMCs have the ability to exchange and export Network Management
information between peer management applications and are deployed as an overlay
network, communicating using the peer-to-peer paradigm. The AMC itself is
composed of a number of sub-elements which facilitate this management.
The AMC Core is the primary component or 'brain' of the AMC and, based on
notifications and policies, orchestrates the services and applications to facilitate the
required network management function. Services are components that provide some
functionality required by the AMC Core. Applications provide the actual
CHAPTER 5
THE MADEIRA SCENARIO
The goal of the scenario is to prove the capabilities of the Madeira approach to
deal with real life management problems. It provides a number of challenging
tasks to test the management approach. To emphasize the strengths of Madeira,
the problems that arise in the scenario are difficult to solve with traditional
management approaches, especially with respect to dynamic reconfiguration and
changing topologies that occur in Wi-Fi networks.
The scenario focuses on the areas of Configuration Management and Fault
Management, with an emphasis on the integration between both of them. In the
scenario, a number of wireless base stations are deployed in such a way that
wireless equipment (for example, laptops or PDAs) may have coverage from one
or more base stations. Unlike in a traditional wireless network, not every base
station has a wired connection to the backhaul network. Base stations
directly connected to the backhaul network are called gateways.
Configuration management
The rest of the base stations can only use a wireless connection to reach a gateway
and thus the backhaul network. After deploying the base stations, Madeira
automatically sets up the wireless meshed network using OLSR (Optimized Link
State Routing protocol) as the routing algorithm. OLSR is a link-state routing
protocol that is specifically developed for mobile ad-hoc networks. Based on
pre-installed policies, base stations are grouped into a number of clusters by the
Grouping Service. These policies can be based on a number of criteria such as
the number of nodes per cluster or topological proximity.
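To make the grouping policies above concrete, the following sketch groups base stations by topological proximity with an upper bound on the number of nodes per cluster. It is not the Madeira Grouping Service algorithm, which the text does not specify; `group_nodes` is a hypothetical greedy breadth-first grouping used only for illustration.

```python
from collections import deque


def group_nodes(links, max_size):
    """Greedily grow clusters breadth-first until each holds at most max_size nodes.

    `links` maps each node name to the list of its direct neighbours.
    """
    unassigned = set(links)
    clusters = []
    for start in sorted(links):
        if start not in unassigned:
            continue
        cluster, queue = [], deque([start])
        unassigned.discard(start)
        while queue:
            node = queue.popleft()
            cluster.append(node)
            for neighbour in sorted(links[node]):
                # Only reserve a neighbour if the cluster still has room for it.
                if neighbour in unassigned and len(cluster) + len(queue) < max_size:
                    unassigned.discard(neighbour)
                    queue.append(neighbour)
        clusters.append(cluster)
    return clusters
```

Because a cluster only grows along existing links, its members are always connected to each other, which matches the proximity criterion.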
Figure 4 depicts an example topology that could be the result of this process. As
can be seen, clusters may or may not have direct backhaul connectivity. The network
elements in a cluster monitor each other and exchange management information on a
peer-to-peer basis. As mentioned, wireless network equipment that wants to
use the network is in range of one or more base stations. If the wireless equipment is
in range of more than one base station, it selects one of them as its preferred base
station, and uses this connection to access services, for example on the Internet. If the
wireless equipment is in range of just one base station, it must select that base station
as its preferred base station.
Each cluster has exactly one Cluster Head. The Cluster Head is elected according to
policies, which can be based on criteria like load, optimal connectivity or robustness.
The Cluster Head is
responsible for coordination and topology publishing of its cluster. Different levels of
clustering can exist in Madeira.
The creation of this hierarchy is also based on policies. As mentioned in the previous
section dedicated to the architecture, the top level Cluster Head is responsible for
publishing the topology of the complete network. This can be done to a higher layer
Operation Support System or another Network Management System for example. The
cluster hierarchy is the basic management overlay that is used by all management
functionality in Madeira. It creates a scalable environment for network management.
The Madeira Configuration Management application is responsible for the
construction, maintenance and viewing of the topology. Other applications, such as
Fault Management, use this management overlay to implement their functionality.
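The policy-based Cluster Head election described in this section can be sketched as a weighted score over node metrics. This is an assumption made for illustration: the text only names the criteria (load, connectivity, robustness), so the metric names, the weights and the `elect_cluster_head` function are hypothetical.

```python
def elect_cluster_head(nodes, policy):
    """Pick the node with the highest policy score.

    `nodes` maps a node name to its metrics; `policy` maps a metric name to a
    weight (a negative weight penalises the metric). Ties go to the node whose
    name sorts first, so every peer reaches the same result.
    """
    def score(name):
        metrics = nodes[name]
        return sum(weight * metrics.get(metric, 0.0)
                   for metric, weight in policy.items())

    return max(sorted(nodes), key=score)
```

The same function could be reused at each level of the cluster hierarchy, with a different policy per level.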
Fault management
During usage of the network, it is inevitable that unexpected faults occur. When such
a fault occurs, it is important that:
• Appropriate action is undertaken quickly in order to reduce the service impact.
• Meaningful information on the fault is presented to the operator (in particular in
those cases where automatic restoration is not possible or only partly possible).
Besides Configuration Management (CM), Madeira focuses on Fault
Management (FM) and how CM actions and events are related to FM faults and
alarms. Correlation between CM events and FM faults is an important aspect in order
to discover the actual cause of a problem in the network.
Alarms can be generated by two different sources:
1. Hardware level alarms are generated by the base station in case of a hardware
fault.
2. Platform level alarms are generated by either the Directory Service or the CM
application. The Directory Service can indicate loss of connection with a neighboring
node, and the CM application can indicate changes in the topology (a node leaves or
joins a cluster).
When a fault occurs, the FM application will receive one or more alarms. For example,
a hardware level problem may also cause a fault on platform level, creating two
alarms. These alarms are correlated into a new alarm by FM and sent to the Cluster
Head. This Cluster Head also performs correlation of the alarm with alarms
originating from other nodes in order to get a clearer picture of the probable cause
and possible solution. It can then forward the alarm to a higher hierarchy level. This
process is repeated until it reaches the Top Level Cluster Head, which can notify the
Northbound Interface in order to produce an alarm for the external OSS.
This paper provides a few example scenarios in order to explain the basic concepts
and functionality of the FM application in Madeira. These scenarios describe two
faults with similar impact but very different in nature (in the first case a node goes
down, while in the second there is just a loss of connectivity between two nodes), and
focus on the way Madeira distinguishes these two cases by correlating alarms and CM
events at different levels of the management hierarchy.
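The merging step that each Cluster Head performs before forwarding can be sketched as follows. This is a simplified illustration, not the Madeira FM implementation: alarms are assumed to be plain dicts with hypothetical `cause` and `source` keys, and two alarms are treated as carrying the same knowledge when their `cause` strings are equal.

```python
def merge_alarms(alarms):
    """Merge alarms that name the same probable cause into one alarm per cause.

    Each merged alarm keeps the list of sources that reported it, in the
    order in which they were first seen.
    """
    merged = {}
    for alarm in alarms:
        entry = merged.setdefault(
            alarm["cause"], {"cause": alarm["cause"], "sources": []}
        )
        if alarm["source"] not in entry["sources"]:
            entry["sources"].append(alarm["source"])
    return list(merged.values())
```

Applying this at every hierarchy level means that the number of alarms forwarded upwards depends on the number of distinct causes, not on the number of nodes that noticed them.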
Figure 6 depicts two Madeira Management Clusters, with node A and node G being
the Cluster Heads. The solid lines indicate “physical” OLSR links between nodes. The
dotted line represents an inter-cluster connection between node E and node F. When
node E fails, the Directory Service of its one-hop neighbours D and F will notice this.
Both nodes will notify their Cluster Heads that the link with node E has failed, and
that therefore E might be faulty. Besides receiving this alarm from node D, Cluster
Head A also receives a notification from the CM application that node E isn’t part of
the cluster anymore. It will then send an alarm with this knowledge to a higher
hierarchy level.
When the Cluster Head G of node F receives the alarm that node E isn’t reachable, it
will also forward this alarm to a higher hierarchy level.
After receiving the alarms from node A and G, the higher level Cluster Head tries to
correlate the information with other alarms. Since both alarms contain the same
knowledge (a possible fault of node E), they will be merged into a single alarm with
the same content. Afterwards the alarm will be forwarded up the hierarchy pyramid
until the Top Level Cluster Head notifies the Northbound Interface (NBI), which will
inform the external OSS of the failure of node E.
In the second scenario, shown in Figure 7, the link between node D and node E fails. The
link between E and F remains intact. Both nodes D and E will receive a notification
from their Directory Service indicating the neighboring node is no longer reachable.
Node D concludes E might be faulty and forwards this information to its
Cluster Head A, which also receives a notification from the CM application that E is no
longer part of its cluster. After combining this information, it will send an alarm to the
next hierarchy level, identical to the previous scenario. Besides receiving the
notification from the Directory Service, node E will recognize that it is no longer
connected to its cluster head and it will join another cluster (it is assumed E joins the
cluster containing F and G). After this reconfiguration process E will forward the
information that D is no longer reachable to its new cluster head G, which will
additionally receive a notification from the CM application that E has joined its
cluster and forward this information to the next hierarchy level.
The higher level Cluster Head of this next hierarchy level receives the alarm from A,
indicating that node D reported that E is unavailable and probably faulty. It also
receives a notification from the CM application indicating that node E has joined
another cluster, and an alarm from G indicating that node E reported that D is
unavailable and probably faulty. After correlating these three notifications, the Cluster
Head concludes that a link outage between D and E occurred and either suppresses
this alarm (if configured to do so) or forwards this knowledge as a minor alarm up the
hierarchy pyramid until the Top Level Cluster Head and NBI are reached. The NBI
then informs the external OSS of the link outage between D and E.
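The reasoning that separates the two scenarios can be condensed into a short sketch. It is a deliberate simplification and not the Madeira correlation logic itself: alarms are reduced to hypothetical (reporter, suspect) pairs, CM events to (node, change) pairs, and the order of events is ignored. The rule applied is the one the scenarios suggest: a suspect that still reports alarms or has joined a cluster is alive, so the fault must be on the link.

```python
def correlate(alarms, cm_events):
    """Classify faults from neighbour-unreachable alarms and CM events.

    alarms:    iterable of (reporter, suspect) pairs
    cm_events: iterable of (node, change) pairs, change is "left" or "joined"
    Returns a set of ("node-failure", node) or ("link-outage", (a, b)) tuples.
    """
    alarms = set(alarms)
    # A node that still reports alarms, or that joined a cluster, is alive.
    alive = {reporter for reporter, _ in alarms}
    alive |= {node for node, change in cm_events if change == "joined"}
    findings = set()
    for reporter, suspect in alarms:
        if suspect in alive:
            findings.add(("link-outage", tuple(sorted((reporter, suspect)))))
        else:
            findings.add(("node-failure", suspect))
    return findings
```

With the first scenario's input (D and F both report E, and E leaves its cluster) the result is a single node failure for E. With the second scenario's input (D and E report each other, and E leaves one cluster and joins another) the result is a single link outage between D and E.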
CHAPTER 6
FUTURE ENHANCEMENT
CHAPTER 7
CONCLUSION
The Madeira Management framework proposes the use of peer-to-peer techniques to
fulfill management tasks. The ability to perform self-management and the usage of
Management Clusters solve scalability issues that are present in current hierarchical
network management approaches. So-called Adaptive Management Components, or
AMCs, that run on network elements, are in charge of performing the various
management tasks. By using the peer-to-peer interface, these AMCs can communicate
with each other, creating a Management Overlay, in order to execute management
tasks and make up network management applications.
Furthermore, because of the peer-to-peer concept, Madeira does not need a
central server. Besides reducing the operating expenses, this also eliminates the single
point of failure that exists in traditional systems. Madeira also offers support for a
higher layer Operation Support System (OSS). Such an OSS can access Madeira via
the Northbound Interface. This Web Services based interface enables an OSS to
acquire topology information, receive alarms, introduce policies and perform other
management tasks. By using a publish/subscribe system for notifications from the
network, Madeira can notify multiple external systems about events or alarms at the
same time. In order to prove the feasibility of the Madeira approach, a challenging
scenario has been identified, dealing with Configuration and Fault Management of
highly dynamic wireless networks. Based on the Madeira framework, a
Management System addressing this scenario will be prototyped and tested on a real
test bed.
BIBLIOGRAPHY
[1] Markus Leitner, Philipp Leitner, Martin Zach, Sandra Collins, Claire Fahy, “Fault
Management based on peer-to-peer paradigms”, IEEE, 2007, pp. 697-700.
[2] Ray Carroll, Claire Fahy, Elyes Lehtihet, Sven van der Meer, Nektarios
Georgalas, David Cleary, “Applying the P2P paradigm to management of large-scale
distributed networks using a Model Driven Approach”, IEEE, 2006.
[3] Pablo Arozarena Llopis, Martijn Frints, David Ortega Abad, Javier González
Ordás, Liam Fallon, Martin Zach, Hai Nguyen Thi Van, Joan Serrat Fernández,
“Madeira: A peer-to-peer approach to network management”, 2006, pp. 141-153.
[5] Bela Berde, Carolina Pinart, Javier Gonzales Ordas, Piet Demeester and Koen
Casier, “An Experience on Implementing Network Management for a GMPLS Network”,
IV Workshop in MPLS/GMPLS networks, 21-22 April 2005, Gerona, Spain.