
6 Case Study: IT Infrastructure Monitoring

In my role as Enterprise Virtualization & Storage Specialist at UofT, I was

assigned the task of implementing a monitoring system for the new Data Centre (DC),

recently completed and used by the University for its central administrative computing.

Already implemented was an OSS package called Cacti; it had been selected by

management due to their familiarity with it, having used it extensively to monitor Java

Virtual Machine (JVM) for the Learning Management System called Blackboard Learn.

In addition to JVM monitoring, Cacti had been leveraged to monitor host network

interfaces, memory usage, CPU usage, and Apache Web Server statistics. The goal was

and remains to extend Cacti's monitoring capabilities to hundreds if not thousands of

devices, physical and virtual within and connected to this new facility.

Questions arose regarding Cacti's limitations, which had been experienced and

motivated a review of available monitoring packages. These questions included: does our

current monitoring package meet all our needs? Can the package be scaled up as the DC

and its virtual infrastructure and systems grow? Should we change to another OSS

solution like Zabbix or Nagios? Each person had expressed his or her own preference, and

agreement without an evaluation seemed unlikely.

This model revealed to management the strengths and weaknesses of Cacti in

addition to Zabbix and Nagios. These two OSS Monitoring packages are commonly used

in the community.

The target users of this system are the staff responsible for this facility, thus we

have implemented it with the assistance of the Hardware Infrastructure Group (HIG), in

consultation with the Manager of the Data Centre. In this case we have included real

employee salary, hardware, and services costs in Canadian dollars (sign: $; code: CAD),

for the financial criterion.

Starting with the Manager, Data Centers, John Calvin, and in consultation with other

members of his team, we have conducted several interviews to understand why the

customer seeks to use OSS instead of CSS. Additionally, we needed to establish the

business and technical requirements motivating their choice.

The following section presents an overview of the Information and Technology

Services division (I+TS) of The University and outlines the importance of this system.

We then show how the model was applied to three OSS packages, and finally how these

results justify retaining the already implemented Cacti solution. In addition, the outcome

of the evaluation suggested the use of additional plugins and performance tuning to

achieve the stated objectives.

6.1 Introduction

The University of Toronto is a publicly funded undergraduate, graduate and

research university in Toronto, Canada. At the time of writing it had 65,612 full time

undergraduates, of which 56,380 are domestic and 9,232 are international students, and

15,287 graduate students, of which 13,210 are domestic and 2,077 are international. The

University's annual operating budget for the current fiscal year is $1.8B CAD (University

of Toronto, 2012).

According to the I+TS official web site61, the Office of the Chief Information

Officer (CIO) is responsible for planning and provision [sic] of central IT services at the

University of Toronto (University of Toronto, 2010). Eight key areas integrate I+TS as

described in Appendix F I+TS Organizational Structure.

Within the portfolio of the CIO, the Enterprise Infrastructure Solutions group is

responsible for designing and implementing networking, server, storage and other

enterprise-level solutions that are secure, efficient, reliable and cost-effective (University

of Toronto, 2012).

The renovation of the University's primary Data Centre is listed as an action in

the I+TS Roadmap (University of Toronto, 2011), in order to provide reliable and secure

61
Information + Technology services - http://jmll.me/tbc2

network, server, and storage services while decreasing the use of space and energy. In

May of 2011 the Data Centre redesign started with an approved budget of $5.1M CAD

and ROI in two years. The DC redesign project was led by Patrick Hopewell, Director of EIS,

Tom Molnar, Manager HIG, and John Calvin, Manager, Data Centers, working with

Ehvert Mission Critical to build a state of the art, sustainable and efficient Data Centre.

The performance, availability and overall health of a state-of-the-art DC depend on a

reliable IT Infrastructure Monitoring System that covers not only network devices such as

switches, routers, fiber channel switches, servers, and storage heads, but also very

specific devices such as Uninterruptible Power Supplies (UPS), Power Distribution Units

(PDUs), and airflow, temperature and humidity sensors, among

other instrumentation devices. For instance, according to Calvin (2012), the accepted

safe rate of change of temperature for most server equipment is less than 10 °C/hr. A loss

of airflow in a sealed vented cabinet operating at about 10 kW would cause a 20 °C rise of

temperature in 30 to 40 seconds. Therefore, monitoring is an essential part of controls

automation and alerting to respond to events that threaten production services.
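
To put these two figures on a common scale, the cabinet example can be converted into an hourly rate. The following is a minimal sketch that uses only the numbers quoted above; the function name `rate_per_hour` is ours and is introduced purely for illustration.

```python
SAFE_RATE_C_PER_HR = 10.0  # accepted safe rate of change quoted from Calvin (2012)

def rate_per_hour(delta_c: float, seconds: float) -> float:
    """Convert a temperature rise observed over `seconds` into degrees C per hour."""
    return delta_c * 3600.0 / seconds

# A 20 C rise in 30 to 40 seconds, as in the loss-of-airflow example.
slow = rate_per_hour(20.0, 40.0)  # 1800 C/hr
fast = rate_per_hour(20.0, 30.0)  # 2400 C/hr

# How many times faster than the safe rate each case is.
print(slow / SAFE_RATE_C_PER_HR, fast / SAFE_RATE_C_PER_HR)
```

The rise is therefore roughly 180 to 240 times faster than the accepted safe rate, which is why such sensors need to be polled and alerted on at short intervals.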

The I+TS ecosystem is categorized as hybrid, due to the mixture of OSS and

CSS-based information technology services. However there is a push toward the adoption

of OSS to cap and reduce licensing costs. One scenario often mentioned by Calvin (2013) is

the monitoring of large-scale networks having many thousands of devices, which in the

case of some CSS licensing models would require licensing on a per-device basis. On the

other hand, OSS offers no added software purchase or licensing costs as the number of

devices increases. Other disadvantages of using CSS were enumerated, such as the extra

costs for integrating software via pay-for-use APIs, software lifecycles imposed by the

vendors, the end-of-support-life (EOSL) announcements made by vendors to push the

sale of new releases, upgrades and software support.

According to Gartner (2011), Cacti, Nagios and Zabbix are among the most

popular OSS web-based monitoring systems in the current market. Thus, these three

candidates were evaluated using The Integral OSS Evaluation Model, defined in this

document. It is important to reiterate that Cacti has already been implemented, and thus

the outcome of this analysis is to be used to support its continued use or justify the

migration to a more suitable package.

6.2 Phase 1. Definition

The primary user community for the monitoring software is a group of highly

skilled technical staff responsible for the design and implementation of datacenter

networking, servers, storage and other enterprise level solutions. The secondary user

community is the Enterprise Applications and Solutions Integration (EASI) team

responsible for the development and implementation of all computer applications offered

at the institutional level (University of Toronto, 2012).

During the interview with the manager of the Data Centre, we asked the question: What

are the requirements for the IT Infrastructure Monitoring System? The answers to this

question and others are documented in Appendix G. As a result,

almost 50 elements were listed, and for a better understanding as the model recommends,

they were classified into three main categories: functional, non-functional and

technological requirements, as shown in Table 42.

Functional | Non-functional | Technological
Network fault determination - Logic hierarchy | User Management tools, Role based access control | Support SNMP v1 and v2c, v3
Auto topology creation | Ownership of devices | Linux/Unix platforms.
Interface Discovery should be automatic for switches and devices | Graphs should be created on demand, not in mass with every sample | DB backend should be configurable to use any OS/CS RDBMS
Topographic map of the devices. | Authentication framework, such as LDAP, Shibboleth | Handle 64bit values
Device Auto discovery capabilities | Data Importing/Exporting formats XML, CSV, XLS | Gather SNMP data at 1min intervals
Threshold triggered | Reconcile missing data and identify the network fault. | Compiled not interpreted.
Alerting lists and scheduled alerts (alert schedule) | Keep at least 25 months of data at 1 minute intervals |
Mobile alerting. SMS/Email or PushNotifications | Import/Export data and templates |
Independent probing capabilities | Scale to thousand of devices |
Agentless non-intrusive | Handle more than 64K outstanding requests |
Applications, services, operating systems, network protocols, system metrics | Memory footprint should not increase with the number of devices being monitored |
Script based monitoring | Modular. Plugin based |
Network fault determination - Logic hierarchy | Billing system. |
Scheduled downtime | Logging capability. |
Devices, Graphs and data templates | Decentralized and centralized configuration |
Summary reporting capability | Automatic Configuration tools - SSH keys exchange. |
Reporting Indicators (throughput, peaks, traffic, etc.) | Manageable MIBs Library |
Historical reports | Dashboard and the ability to create custom dashboards |
User Reports: accounts, first use, last use | Mobile app for viewing data/graphs/logs/thresholds. Mobile handset friendly. |
 | Produce time aligned graphs in a standard format configurable by the administrator |
 | Graph-Noise filtering - Removal of erroneal points |
 | Multi-display identification. |

Table 42 IT Infrastructure Monitoring System Requirements

This list of requirements provides a better idea of what must be evaluated to

ensure that the chosen software will fulfill the needs expressed by the user. In this case

the evaluator must have a fair amount of technological knowledge to recognize, for

instance, how a monitoring system will handle 64-bit values; and of course, if the

technological requirement is not well defined, working closely with the user to gain a

better understanding of the requirements is strongly recommended.
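
As an illustration of why the 64-bit requirement matters, consider how a poller turns two successive SNMP counter readings into a difference. The sketch below is a generic wrap-aware delta, not code taken from any of the evaluated packages; the function names and the 10 Gbit/s link speed are assumptions chosen for the example.

```python
def counter_delta(previous: int, current: int, bits: int = 64) -> int:
    """Difference between two readings of a monotonically increasing counter
    that wraps to zero after reaching 2**bits - 1 (at most one wrap assumed)."""
    modulus = 1 << bits
    return (current - previous) % modulus

def seconds_to_wrap(bits: int, link_bits_per_second: float) -> float:
    """Time for an octet counter of the given width to wrap at full line rate."""
    return (1 << bits) / (link_bits_per_second / 8.0)

# A 32-bit octet counter on a saturated 10 Gbit/s link (assumed speed) wraps in
# a few seconds, well inside a 1-minute polling interval.
print(round(seconds_to_wrap(32, 10e9), 2))

# Wrap-aware delta: the counter passed its maximum between the two polls.
print(counter_delta(4_294_967_000, 704, bits=32))
```

With a 32-bit counter, several wraps can occur between two polls and cannot be detected from the readings alone, whereas a 64-bit counter makes a wrap within one polling interval practically impossible at these speeds.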

6.3 Phase 2. Identification

Tables 43, 44, and 45 show the identification cards for the evaluated packages;

they provide a quick reference of the relevant characteristics of each package. For

instance, Cacti is written in PHP, Nagios in C, and Zabbix in three different languages

(Java, PHP and C), each applied in different modules.

Category | Sub-category | Description
General | Name | Cacti
General | Version | 0.8.8a
General | License | GNU General Public License
General | Type | Monitoring System
General | Site | http:///cacti.net
General | Language | PHP
Requirements | Hardware | Network Access
Requirements | Software | Web Server (Apache), MySQL, PHP, RRDTool, net-snmp. Runs over Unix and Windows
Documentation | Official | http://docs.cacti.net/ ; http://www.cacti.net/downloads/docs/html/
Documentation | Non-Official | http://blog.cactiusers.org/
Documentation | Relevant | Comprehensive Linux Install Guide by Lee Carter, Solaris Install Guide by Javier Vidal Postigo, German Install Guide by Sebastian Larisch.
Documentation | Books | Cacti 0.8 Beginner's Guide, Thomas Urban
Support & Community | Official | http://forums.cacti.net; http://cacti.net/mailing_lists.php
Support & Community | Non-Official | http://blog.cactiusers.org/
Support & Community | Issue tracker site | http://bugs.cacti.net/
Support & Community | Relevant | https://help.ubuntu.com/community/Cacti
Distribution | Source | svn checkout svn://svn.cacti.net/cacti; http://www.cacti.net/downloads/cacti-0.8.8a.tar.gz
Distribution | Binaries | N/A
Distribution | Platforms | Windows, Linux/Unix
Architecture | Modularity | PIA - Plugin Architecture
Architecture | Plugins | http://docs.cacti.net/plugins
+ Services | Training | http://gregsowell.com/?page_id=86; http://www.transitiv.co.uk/services/training/cacti; http://www.credativ.co.uk/services/training/monitoring/cacti/
+ Services | Support | http://www.transitiv.co.uk/services/consultancy/cacti
+ Services | Consulting | http://www.transitiv.co.uk/services/consultancy/cacti

Table 43 Cacti Identification Card

Areas in which Zabbix seems to be very strong are official documentation,

community support, and issue tracker systems. On the other hand, Cacti has a user

organization providing bundles such as CactiEZ62, a self installing Linux Distribution

based off [sic] CentOS that sets up and configures a customized Cacti install (Conner,

2012).

Category | Sub-category | Description
General | Name | Nagios Core
General | Version | 3.4.4
General | License | GNU General Public License
General | Type | Monitoring System
General | Site | http://www.nagios.org/
General | Language | C
Requirements | Hardware | Network Access
Requirements | Software | C Compiler, Web Server (Apache), GD Library and Unix/Linux as OS.
Documentation | Official | http://nagios.sourceforge.net/docs/3_0/toc.html
Documentation | Non-Official | http://exchange.nagios.org/directory/Documentation/Nagios-Core-Documentation
Documentation | Relevant | http://www.fullyautomatednagios.org/wordpress/documentation/
Documentation | Books | Nagios Core Administration Cookbook By: Tom Ryder; Nagios: Building Enterprise-Grade Monitoring Infrastructures for Systems and Networks, Second Edition By: David Josephsen
Support & Community | Official | http://library.nagios.com/; http://support.nagios.com/forum/; http://support.nagios.com/wiki/index.php/Main_Page
Support & Community | Non-Official | https://help.ubuntu.com/community/Nagios
Support & Community | Issue tracker site | http://tracker.nagios.org/my_view_page.php
Support & Community | Relevant | http://nagiosplugins.org/support
Distribution | Source | http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.4.4.tar.gz
Distribution | Binaries | N/A
Distribution | Platforms | Linux/Unix
Architecture | Modularity | Plugin Architecture
Architecture | Plugins | http://www.nagios.org/download/plugins; http://nagiosplugins.org
+ Services | Training | http://www.nagios.com/services/training/
+ Services | Support | http://support.nagios.com/; http://support.nagios.com/wiki/index.php/Main_Page
+ Services | Consulting | http://www.nagios.org/support/servicepartners/

Table 44 Nagios Identification Card

Another important fact is that both Nagios and Zabbix have their own business

solutions as a paid service providing support, training and consulting services.

62
CactiEZ - http://jmll.me/tbc19

Category | Sub-category | Description
General | Name | Zabbix
General | Version | 2.0.4
General | License | GNU General Public License version 2
General | Type | Monitoring System
General | Site | http://www.zabbix.org/
General | Language | C (server, proxy, agent), PHP (frontend), Java (Java gateway)
Requirements | Hardware | Network Access, 100MB Disk Space, 256M RAM, Pentium IV or equivalent
Requirements | Software | Apache Web Server, MySQL, PostgreSQL, SQLite, Oracle or IBM DB2
Documentation | Official | https://www.zabbix.com/wiki/doku.php; http://blog.zabbix.com/
Documentation | Non-Official | https://s3.amazonaws.com/analyticarts/zabbix/Zabbix2-0Manual.pdf
Documentation | Relevant | N/A
Documentation | Books | Zabbix 1.8 Network Monitoring By: Rihards Olups
Support & Community | Official | https://www.zabbix.com/forum/; https://support.zabbix.com/secure/Dashboard.jspa; https://lists.sourceforge.net/lists/listinfo/zabbix-announce; https://lists.sourceforge.net/lists/listinfo/zabbix-users
Support & Community | Non-Official | N/A
Support & Community | Issue tracker site | https://support.zabbix.com/browse/ZBX
Support & Community | Relevant | N/A
Distribution | Source | http://sourceforge.net/projects/zabbix/files/ZABBIX%20Latest%20Stable/2.0.4/zabbix-2.0.4.tar.gz/download
Distribution | Binaries | http://www.zabbix.com/download.php
Distribution | Platforms | Cross Platform
Architecture | Modularity | Plugins
Architecture | Plugins | N/A
+ Services | Training | http://www.zabbix.com/business_solutions.php
+ Services | Support | http://www.zabbix.com/business_solutions.php
+ Services | Consulting | http://www.zabbix.com/business_solutions.php

Table 45 Zabbix Identification Card

6.4 Phase 3. Qualification

Using the results from the Definition phase, the identification cards for the three

candidates, and the criteria previously defined in the model, each package is scored by first

setting weights for each desired feature set.

6.4.1 Functionality

Recall that requirements shown in Table 42 have been classified in three main

sections. Each requirement has been assigned a weight relative to its importance to the

customer. These requirements are listed in Appendix I. Summarizing, Cacti met 14 out of

19 functional requirements, 11 out of 23 non-functional requirements and 5 out of 7

technological requirements. Nagios met 15 out of 19 functional, 11 out of 23 non-

functional and 6 out of 7 technological. Zabbix fulfilled 15 out of 19 functional, 16 out of

23 non-functional and 6 out of 7 technological. Finally, Table 46 shows that Zabbix

meets more requirements for an IT Infrastructure Monitoring System than Nagios and

Cacti.

Functionality
Cacti Nagios Zabbix
Functional 0.8 1.0 1.0
Non- 0.7 0.7 0.9
functional
Technological 1.3 1.3 1.3
Total 0.9 1.0 1.1

Table 46 Classified requirements with the importance set.
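
The totals in Table 46 are consistent with a simple mean of the three category scores. The sketch below checks this reading against the published values; the averaging rule is our inference from the table rather than a formula stated in the text, and the function name is hypothetical.

```python
# Category scores copied from Table 46.
table_46 = {
    "Cacti":  {"Functional": 0.8, "Non-functional": 0.7, "Technological": 1.3},
    "Nagios": {"Functional": 1.0, "Non-functional": 0.7, "Technological": 1.3},
    "Zabbix": {"Functional": 1.0, "Non-functional": 0.9, "Technological": 1.3},
}

def functionality_total(scores: dict) -> float:
    """Mean of the category scores, rounded to one decimal as in the table."""
    return round(sum(scores.values()) / len(scores), 1)

totals = {name: functionality_total(s) for name, s in table_46.items()}
print(totals)
```

Under this assumption the computed totals match the Total row of the table for all three packages.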

6.4.2 License

The user did not specify a type of OSS license agreement as being a requirement.

However, the user did specify a preference for OSS vs. CSS. Therefore, Table 47 scores

all three packages with a one.

License
Cacti Nagios Zabbix
License GPL GPL GPL
Required N/A
Total Score 1 1 1

Table 47 License scorecard: Cacti, Nagios, and Zabbix.

6.4.3 Community

Cacti is categorized by its characteristics as an Organization. It has an identified

bug tracking system at bugs.cacti.net that is updated frequently. According to the

Cacti Forums, the official community site, it has 228,715 posts, 43,831 topics and a total

of 41,032 members (Cacti.net, 2013). While sub-communities for Ubuntu, openSUSE

and VMware exist, there is not much activity.

The following mailing lists were identified: cacti-announce for announcements,

cacti-user for users in general and cacti-devel for developers (Cacti.net, 2012). The

mailing list cacti-user is the one with the most activity. Further analysis of data gathered

from the List Archive site provided by SourceForge.net from 2009 to 2012 (illustrated in

Figure 26) showed that 2009 was the most active year, with 633 messages; in

2012 the standard deviation of the monthly message counts was 16.5, which indicates how much

the activity changes from month to month.

Figure 26 Cacti-user mailing list activity from 2009 to 2012. Data gathered from SourceForge.net (2013).
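
The activity indicator used here, a yearly total together with the standard deviation of the monthly message counts, can be sketched as follows. The monthly counts below are invented purely for illustration and are not the SourceForge.net data; the text also does not say whether the population or the sample standard deviation was used, so the population form is an assumption.

```python
import statistics

# Hypothetical monthly message counts for one year (NOT the SourceForge.net data).
monthly_messages = [42, 35, 51, 28, 30, 22, 19, 25, 33, 40, 27, 20]

yearly_total = sum(monthly_messages)
# Population standard deviation; statistics.stdev would give the sample form.
spread = statistics.pstdev(monthly_messages)

print(yearly_total, round(spread, 1))
```

A larger standard deviation relative to the monthly mean indicates burstier list traffic, so the figure is most informative when read alongside the yearly total.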

Cacti has a Plugins site at docs.cacti.net/plugins, where around 100 add-ons,

scripts, data, graph, hosts templates and data queries, both official (supported) and user-

supplied (not supported), may be downloaded.

The Nagios community can be categorized as a Commercial Organization,

because Nagios Enterprises (NE), which is a commercial entity, is behind its

development, support, and community sites. Even though NE provides special

distributions with additional features for an extra fee, Nagios Core, the foundation, is free

and is led by Nagios.org, a community site funded by NE. Nagios.org estimates its

worldwide community at more than 1 million users, including individuals and companies

(Nagios.org, 2013). Exchange is a sub-site of Nagios.org where all types of projects such

as plugins (2500+), add-ons (500+) and utilities (16) among others can be found

(Nagios.org, 2012). The Nagios forum is hosted at support.nagios.com/forum with a total

of 40,009 posts and 18,312 members. This forum includes private areas for customer

support as well as general support, thus it is not the best reference.

Figure 27 Nagios-user mailing list activity from 2009 to 2012. Data gathered from SourceForge.net (2013).

Eight mailing lists are hosted on SourceForge.net: nagios-announce, nagios-

checkins, nagios-devel, nagios-devteam, nagios-project, nagios-users, nagios-users-br,

and nagios-users-ru. Similar to the results we saw for Cacti, the Nagios user mailing list

had more activity than others with a total of 77,205 messages from 2001 to 2012

(SourceForge.net, 2013).

Further analysis on data gathered from the List Archive site provided by

SourceForge.net for the period 2009 to 2012 (illustrated in Figure 27) showed that 2009

was the most active year, with 7,017 messages; in 2012 the standard deviation of the

monthly message counts was 40.9. Finally, there is a World Conference

promoted by NE, which offers attendees a central place to collaborate, experience,

exchange and enhance their knowledge on everything Nagios related.

The Zabbix Community official site is at Zabbix.org, which names itself as the

community platform. The Zabbix Forum (www.zabbix.com/forum) is composed of

107,561 posts and 12,185 members (Zabbix SIA, 2013; Zabbix SIA, 2013) and in the

Blog (blog.zabbix.com) site, users find a fair amount of information about Zabbix

versions, solved issues, useful hints and recommendations from Zabbix experts and

developers, special events, conferences, features, etc. SourceForge.net manages Zabbix

mailing lists for announcements, users, developers and translators who translate the

product and documentation into various languages. Among these four, the Zabbix-users

list is the most used, with 1266 messages from 565 subscribers (SourceForge.net, 2013)

in the period from 2002 to 2012.

Figure 28 presents an analysis of the Zabbix-users mailing list data from

SourceForge.net for the period 2009 to 2012, showing that

2009 was the most active year, with 195 messages; in 2012 the standard deviation of the

monthly message counts was 8.16.

Figure 28 Zabbix-user mailing list activity from 2009 to 2012. Data gathered from SourceForge.net (2013).

We know that Zabbix has a modular architecture; however, we could not identify

an official site hosting plugins or modules. On the other hand, there are posts in the

official forums, but nothing specific with regard to obtaining plugins. A simple Google

search demonstrates how these plugins are hosted on various source-code hosting sites

such as Freecode.com, GitHub.com or within personal blogs.

The Zabbix community does not seem to be as well structured as a Commercial

Organization and, given its lack of activity, it is correctly classified as an Organization.

The Nagios community is certainly well structured and NE has done a great job

involving users in events, blogs, etc. Therefore, the Nagios community is categorized as a

Commercial Organization.

Cacti, on the other hand, relies completely on its community, and although no

commercial entity supports the product, the community is characterized by extensive

activity and remarkable support; thus Cacti's community is categorized as an

Organization.

Community
OSS Package Type Score
Cacti ORG 1
Nagios COR 4
Zabbix ORG 1

Table 48 Community scorecard: Cacti, Nagios and Zabbix.

6.4.4 Seniority

Cacti's initial version, 0.5, was released on September 23, 2001,

approximately 11 years ago (SourceForge.net, 2001). The latest version, and the one

being evaluated in this case, is 0.8.8a released on April 29, 2012. This is the 40th release

in its lifetime.

NetSaint (Nagios version 0.0.1) was made publicly available on March 14, 1999

and was renamed Nagios 34 releases later in 2002. After 14 years of development (2013)

it is now at version 3.4.4 (Nagios Core), which is the version evaluated in this case.

Zabbix was started as an internal project in a bank by Alexei Vladishev in 1998

and was not released until March 23rd 2004 as Zabbix 1.0. To date, Zabbix has had a total

of 75 stable releases (SourceForge.net, 2013; SourceForge.net, 2013) in its nine years of

life; the latest version (2.0.4) was released on December 8th, 2012 and is the version

evaluated in this case.

The model recommends assessing lifespan and versions released.

However, in this case we have chosen the latest stable release, meaning this sub-criterion

can be ignored. On the other hand, lifespan has been taken into account and, in

this case, Nagios ranks first with 14 years; Cacti is in second place with 11 years and

Zabbix ranks last with nine years.

Seniority
Package Lifespan Score Final Score
Cacti 11 3 3
Nagios 14 3 3
Zabbix 9 2 2

Table 49 Seniority scorecard: Cacti, Nagios and Zabbix.

Cacti and Nagios are the senior OSS offerings in the IT infrastructure monitoring

market, and as such have the confidence of the user community.

6.4.5 Support

The community in the Cacti forums provides support. Cacti-Support is a main

forum divided into four main sub-forums: General, with 80,184 posts and 18,184 topics;

Linux/Unix Specific, with 45,263 posts and 93,337 topics; Windows Specific, with

21,514 posts and 3,469 topics; and Unstable Development Versions, with 1,664 posts and

294 topics (Cacti.net, 2013). With respect to paid support, an England-based company

called Transitiv Technologies specializes in Open Source Support & Services and can

provide support through their Cacti Consultancy Services (Transitiv Technologies, 2013).

An American company called credativ LLC also provides OSS support in the US, UK,

Germany, and Canada. credativ's Cacti support covers all of the following Linux

supported distributions: Debian, Ubuntu, Red Hat, SuSE, openSuSE, CentOS, Xandros,

OpenBSD, and FreeBSD for a monthly fee of $305USD (credativ LLC, 2013).

Nagios Enterprises offers professional annual support for organizations that

require this service (Nagios Enterprises, 2013). NE provides support directly in the

public forum63, however a specific forum called Customer Support requires credentials

to access, as well as a product license key associated with one of the following products:

Nagios XI, Fusion, Core or Incident Manager. A General Support forum is also

available for community support. The Nagios XI license includes support and

maintenance, from $1,295USD to $2,495USD per year (Nagios Enterprises, 2013). For

Nagios Core, the OSS version, annual support plans are available starting at $2,495USD

(Nagios Enterprises, 2013).

The Zabbix community support is provided through the Zabbix Help Forum64,

which hosts 35,560 posts and 10,047 topics. Zabbix offers its customers five different

support tiers: per-incident based support plans up to complex support tiers, version

upgrade, on-site training and on-site consulting (Zabbix SIA, 2013). Nonetheless, there is

no publicly available pricing for this kind of support. The support web site states that the

Sales department can provide a quotation upon request.

During the interview of the Manager, Data Centres, we asked his opinion of

supporting the OSS package internally, and he pointed out that he doubted that the

expertise existed within the organization and could be leveraged to support this package.

Staffing levels pose a challenge for any complex monitoring system. If monitoring

63
Nagios Support - http://jmll.me/tbc34
64
Zabbix Support - http://jmll.me/tbc9

solutions are to remain effective, they must constantly evolve; not only with moves, adds,

and changes, but by supporting new devices and new alerting mechanisms - from

numeric pagers to the thing after twitter. Influencing the direction of that evolution is

simpler when you have the source code and one good programmer (Calvin, 2013).

Consolidating the information above, Table 50 shows the score for each kind of

support and shows that support is not a significant concern for any of them.

Support
Kind Cacti Nagios Zabbix
Self-Support 1 0 1
Paid Support 1 1 1
Community Support 1 1 1
Total 3 2 3

Table 50 Support scorecard: Cacti, Nagios and Zabbix.

However, an organization might face different challenges down the road when

opting for community support. For instance, support response times and the quality of

response provided by an unpaid entity likely have no SLA; the response depends on the

availability of users within the community.

6.4.6 Interoperability

For this criterion, technological requirements were classified into four main

categories: interface, standards, protocols and data formats, in order to evaluate the OSS

monitoring software interoperability.

Cacti, for instance, has a web interface following the W3C standards and mobile

clients, such as iCacti65 for iOS devices; for Android devices, there is CactiViewer66 and

nmidClient Cacti67. Furthermore, with the support of additional modules or plugins this

OSS can handle PDF and HTML reporting; out-of-the-box it has PNG as its default

image standard. Cacti supports SNMP v1, v2c and v3 out-of-the-box and can export raw

data to CSV, plain text, and XML via RRDTools.

Nagios also has a standardized web interface and mobile clients developed by the

community and the proof of this can be seen simply by searching the Google Play Store

(Android store) for the word Nagios where one can find aNag68 and Nagbag69, among

others; for iOS there is OnCall70 and iNag71, for example. Between core functionalities

and additional plugins, Nagios handles PDF, PNG and RRD, but appears to lack HTML

support. Nagios can export data in all the previously stated required formats.

Finally, Zabbix has both a web interface following W3C and community

developed mobile clients for iOS, such as MobileOp72 and Mozaby73; for Android

65
iCacti, iOS Cacti client - http://jmll.me/tbc35
66
CactiViewer, Android Cacti client - http://jmll.me/tbc36
67
nmidClient Cacti, Android Cacti client - http://jmll.me/tbc37
68
aNag, Android Nagios client - http://jmll.me/tbc38
69
Nagbag, Android Nagios client - http://jmll.me/tbc39
70
OnCall, iOS Nagios client - http://jmll.me/tbc40
71
iNag, iOS Nagios Client - http://jmll.me/tbc41
72
MobileOp, iOS Zabbix client - http://jmll.me/tbc42
73
Mozaby, iOS Zabbix client - http://jmll.me/tbc43

devices there are apps including ZAX Zabbix74 and Zabbix on the go, among others in

the Google Play Store. This OSS can handle PDF and PNG but lacks support for the RRD and

HTML standards. For the protocols listed, it is fully compatible with SNMP v1,

v2c and v3. Lastly, the only means of exporting data from Zabbix are the plain text and

XML formats, the latter with the support of additional modules.

Interoperability
Element | Importance | Cacti | Nagios | Zabbix | Cacti Score | Nagios Score | Zabbix Score
Interface
W3C | 1 | 1 | 1 | 1 | 1 | 1 | 1
Mobile | 1 | 1 | 1 | 1 | 1 | 1 | 1
Standards
Portable Document Format | 2 | 1 | 1 | 1 | 2 | 2 | 2
Portable Network Graphics | 1 | 1 | 1 | 1 | 1 | 1 | 1
Hypertext Markup Language | 2 | 1 | 0 | 0 | 2 | 0 | 0
RRDTool | 2 | 1 | 1 | 0 | 2 | 2 | 0
Protocols
SNMP v1 | 2 | 1 | 1 | 1 | 2 | 2 | 2
SNMP v2c | 2 | 1 | 1 | 1 | 2 | 2 | 2
SNMP v3 | 0 | 1 | 1 | 1 | 0 | 0 | 0
Data formats
Spreadsheets (XLS) | 1 | 0 | 1 | 0 | 0 | 1 | 0
Comma-separated Values | 2 | 1 | 1 | 0 | 2 | 2 | 0
Plain text | 1 | 1 | 1 | 1 | 1 | 1 | 1
XML | 2 | 0 | 1 | 1 | 0 | 2 | 2
Final Score | | | | | 1.2 | 1.3 | 0.9

Table 51 Interoperability scorecard: Cacti, Nagios and Zabbix

Table 51 shows the scores for every Interoperability element evaluated as well as

the final score for each OSS package.
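
The final scores in Table 51 can be reproduced by multiplying each element's importance by its support flag, summing per package, and dividing by the number of elements. The sketch below encodes the table rows and applies that rule; the rule itself is our reading of the table, and the function and variable names are hypothetical.

```python
# (element, importance, Cacti, Nagios, Zabbix) copied from Table 51;
# 1 means the package supports the element, 0 means it does not.
ROWS = [
    ("W3C", 1, 1, 1, 1),
    ("Mobile", 1, 1, 1, 1),
    ("PDF", 2, 1, 1, 1),
    ("PNG", 1, 1, 1, 1),
    ("HTML", 2, 1, 0, 0),
    ("RRDTool", 2, 1, 1, 0),
    ("SNMP v1", 2, 1, 1, 1),
    ("SNMP v2c", 2, 1, 1, 1),
    ("SNMP v3", 0, 1, 1, 1),
    ("XLS", 1, 0, 1, 0),
    ("CSV", 2, 1, 1, 0),
    ("Plain text", 1, 1, 1, 1),
    ("XML", 2, 0, 1, 1),
]
PACKAGES = ("Cacti", "Nagios", "Zabbix")

def final_scores(rows):
    """Sum importance * supported per package and divide by the number of elements."""
    result = {}
    for col, name in enumerate(PACKAGES, start=2):
        weighted = sum(row[1] * row[col] for row in rows)
        result[name] = round(weighted / len(rows), 1)
    return result

print(final_scores(ROWS))
```

Because SNMP v3 carries an importance of 0, it contributes nothing to any package's score even though all three support it.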

74
ZAX Zabbix. http://jmll.me/tbc48

6.4.7 Security

Cacti, Nagios, and Zabbix were queried at the CERT Vulnerability Notes

database and no results were obtained. However, to check for any vulnerability

already registered for these packages in 2013, we also consulted the Common Vulnerabilities and

Exposures (CVE) list, which registered two incidents for Nagios, but they affect versions prior to

3.4.4 (CVE, 2013); thus the score remains equal at one for all three evaluated OSS

packages. For both Zabbix75 and Cacti76, one vulnerability in 2012 and none in 2013

have been found.

6.4.8 Roadmap

The Cacti development roadmap is available for future releases, including the one

that was targeted. Cacti 1.0.0 will be released during the first quarter of 2013; Cacti 1.1.0

in the third quarter of 2013 (Cacti.net, 2011). The roadmap also contains major features

for each version. Additionally, the last development check-in in Cacti's SVN repository

(Cacti.net, 2013) was on January 4th, 2013, by the author "Gandalf", the primary

contributor, which indicates that this contributor is still involved in the project.

75
Zabbix: Vulnerability Statistics - http://jmll.me/tbc44
76
Cacti: Vulnerability Statistics - http://jmll.me/tbc17

No product roadmap resource was found for Nagios, either on the NE site or on the

Nagios Community site. With respect to activity, the last Nagios Core stable version

(3.4.4), part of 3.x distributions, dates to January 12th, 2013.

The Zabbix roadmap for the next version (2.2) is available on the Zabbix.org wiki77. The
roadmap is divided into two main branches: time and functional. The time roadmap gives no
explicit year, but because the page, last edited on January 9th, 2013, states May 1st (no
year), we anticipate that version 2.2 will be released on that date in 2013. The functional
roadmap provides a significant feature set description, which is detailed in a document
called "What's new" available on the documentation site78. The most recent modification to
the Pre-2.1.0 (alpha) source code was made on February 9th, 2013, according to the
Developers page on the Zabbix site (Zabbix SIA, 2013).

Using the values from Figure 24 we completed Table 52 to obtain the final score

for the roadmap criterion including project activity as well.

Roadmap
Indicator\Package    Cacti   Nagios   Zabbix
Roadmap              2       0        2
Project Activity     3       3        3
Final Score          2.5     1.5      2.5

Table 52 Roadmap scorecard: Cacti, Nagios and Zabbix.

77
Zabbix Roadmap - http://jmll.me/tbc45
78
What's new on Zabbix 2.2 - http://jmll.me/tbc46

6.4.9 Performance

In this case a simple question is asked: Does the OSS package have performance

tuning parameters?

Consider first Cacti. Installing a plugin called BOOST lets it use more memory and process
a much greater number of data sources per pass (up to 400,000). The plugin also enables
image caching, which saves rendering and processing resources and improves front-end
performance. In addition, Cacti's default poller (written in PHP) can be replaced with
Spine (written in C) for high-performance multi-threaded polling.

"Tuning Nagios For Maximum Performance" (Nagios Core, 2013) is a document available on the
Nagios Core Documentation site. It describes how to monitor a large number of hosts and
services (more than 1,000). Subjects include optimizing hardware for maximum performance
and a set of tunable configuration parameters for the application and operating system.

Zabbix, on its main documentation site, has a best practices article that offers general
advice on hardware, such as using SCSI or SAS instead of IDE or SATA, fast RAID storage, a
fast Ethernet adapter and plenty of memory. While it does not specify how much of each
resource to use, it does discuss which resources matter most. Furthermore, it shows
recommended configuration parameters for optimal performance. Because Zabbix is a
multi-tier system, the article also includes best practices for the database engine, which
is the most important part of Zabbix tuning (Zabbix SIA, 2013).

In this case Cacti is assigned a score of one (1) for this criterion, because it has been
designed to handle large environments and offers plugins and other options for tuning in a
larger environment. Nagios is also assigned a score of one (1): the OSS version, Nagios
Core, has limited performance tuning options, but being pre-compiled is a significant
performance advantage which we felt offsets those limits. Finally, a score of two (2) is
assigned to Zabbix for providing explicit recommendations and best practices documentation,
and for being pre-compiled in C.

No additional metrics were included because it would require a pilot project,

which was beyond the scope of this evaluation.

6.4.10 Scalability

Cacti can be considered a vertically scalable monitoring system, because its capacity is
increased by adding memory, CPU and storage rather than by adding more nodes, which makes
its lack of horizontal scalability evident. It has been documented that the largest Cacti
installation comprises more than 1,000,000 data sources. Two resources were required to
accomplish this: the BOOST plugin and MySQL memory tables (Scheck, 2012). The current 0.8.8
architecture supports a multi-polling strategy, using Spine to gather data from several
devices at the same time for better performance.

Nagios can scale vertically and horizontally; however, configuring a large-scale
installation is difficult and differs significantly from the default installation. To
achieve horizontal scalability, plugins like Distributed Nagios eXecutor79 are needed. This
plugin offloads a significant portion of the work normally done by Nagios to a distributed
network of remote hosts (Intellectual Reserve, Inc., 2007).

Zabbix scales both horizontally and vertically and is cloud-enabled (Vladishev, 2011).
Zabbix describes itself as Enterprise Ready, due to its ability to scale from small
environments to large ones with thousands of devices. There are Zabbix installations with
over 100,000 devices monitored, showing that Zabbix is able to process more than 1,000,000
checks per minute using mid-range hardware while collecting gigabytes of historical data
daily (Zabbix SIA, 2013).

                       Cacti      Nagios   Zabbix   Cacti   Nagios   Zabbix
                                                    Score   Score    Score
Overall Scalability    Vertical   Both*    Both     1.0     2.0      3.0
Linear Scalable        Yes        Yes      Yes      1.0     1.0      1.0
Total                                               1.0     1.5      2.0

Table 53 Scalability scorecard: Cacti, Nagios and Zabbix.

79
Distributed Nagios eXecutor - http://jmll.me/tbc47

According to Calvin (2013), when comparing vertical and horizontal scalability to linear
scalability, linear scalability wins every time; there is no advantage to horizontal
scalability if it is not linear. An application may be linearly scalable only within defined
boundaries; for example, exceeding the maximum number of devices may push an application
beyond its linearly scalable window. Since all evaluated OSS packages have linear
scalability, the linear scalability score does not distinguish between them and can be
ignored. However, given a choice between linear and non-linear scalability, we would always
choose the linearly scalable application. While Nagios should have received a score of
three (3) for being both horizontally and vertically scalable, we have elected to deduct a
point due to the extraordinary complexity of a large-scale deployment. Strictly speaking,
this should have been part of the implementation cost.

6.4.11 Documentation

Our investigation showed that all three packages have adequate documentation sites that
cover most of the sub-criteria; however, certain elements could not be found. Appendix H
lists all the available documentation resources that we discovered for Cacti, Zabbix and
Nagios. It is important to mention that not all sources were evaluated for their content.

Table 54 shows both sub-criteria of documentation, user and technical. Our three

candidates have been scored considering the official sources only. The Importance

weighting that has been assigned to each sub-criterion was determined by the customer.

User Documentation
                   Importance   Cacti   Nagios   Zabbix   Cacti   Nagios   Zabbix
                                                          Score   Score    Score
U-Guides           2            1       1        1        2       2        2
How-Tos            2            1       1        1        2       2        2
FAQs               1            1       1        1        1       1        1
Total Score                                               1.67    1.67     1.67

Technical Documentation
                   Importance   Cacti   Nagios   Zabbix   Cacti   Nagios   Zabbix
                                                          Score   Score    Score
Developers
  API              1            1       1        1        1       1        1
  SDK              0            0       0        0        0       0        0
  SC               1            0       0        0        0       0        0
KB
  Known Issues     2            1       1        1        2       2        2
  FAQs             2            1       1        1        2       2        2
  Problems         1            1       1        1        1       1        1
Troubleshooting
  Diagnosis        1            1       0        1        1       0        1
  Logs             2            1       1        1        2       2        2
Maintenance
  Install          2            1       1        1        2       2        2
  Configure        2            1       1        1        2       2        2
  Optimize         2            1       0        0        2       0        0
Total Score                                               1.36    1.09     1.18

Table 54 User and Technical documentation scorecard: Cacti, Nagios and Zabbix.
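
The totals in Table 54 appear to follow a single rule: each element's raw score is multiplied by its Importance weighting, and the weighted scores are averaged over the number of elements. The Python sketch below checks that reading against the values in the table. The function name `documentation_total` and the list layout are illustrative assumptions and are not part of the evaluation model itself.

```python
def documentation_total(importance, raw_scores):
    """Average of importance-weighted scores over all listed elements."""
    weighted = [w * s for w, s in zip(importance, raw_scores)]
    return round(sum(weighted) / len(weighted), 2)

# User documentation, in table order: U-Guides, How-Tos, FAQs
user_importance = [2, 2, 1]
user_raw = {
    "Cacti":  [1, 1, 1],
    "Nagios": [1, 1, 1],
    "Zabbix": [1, 1, 1],
}

# Technical documentation, in table order: API, SDK, SC, Known Issues, FAQs,
# Problems, Diagnosis, Logs, Install, Configure, Optimize
tech_importance = [1, 0, 1, 2, 2, 1, 1, 2, 2, 2, 2]
tech_raw = {
    "Cacti":  [1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    "Nagios": [1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0],
    "Zabbix": [1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0],
}

user_totals = {p: documentation_total(user_importance, s) for p, s in user_raw.items()}
tech_totals = {p: documentation_total(tech_importance, s) for p, s in tech_raw.items()}
```

Under this reading, an element with an Importance of zero (SDK) still counts in the denominator, which is why the technical totals are divided by eleven elements.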

Finally, Table 55 provides the final score for each OSS package. While the user
documentation for all three packages was equally useful, the technical documentation is
arguably more important for the use of the OSS as the enterprise monitoring system. Cacti
obtained the highest score in that regard.

Documentation
Cacti Nagios Zabbix
Overall 2.00 1.00 2.00
Technical 1.36 1.09 1.18
User 1.67 1.67 1.67
Final 1.68 1.25 1.62

Table 55 Documentation scorecard: Cacti, Nagios and Zabbix.
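
The Final row of Table 55 is consistent with a plain average of the Overall, Technical and User rows. A minimal Python sketch of that check follows; the function name `documentation_final` is a hypothetical label, and the unweighted average is our reading of the table rather than a stated formula.

```python
def documentation_final(overall, technical, user):
    """Unweighted mean of the three documentation scores, to two decimals."""
    return round((overall + technical + user) / 3, 2)

# Rows of Table 55: (Overall, Technical, User)
table_55 = {
    "Cacti":  (2.00, 1.36, 1.67),
    "Nagios": (1.00, 1.09, 1.67),
    "Zabbix": (2.00, 1.18, 1.67),
}

final_scores = {package: documentation_final(*row) for package, row in table_55.items()}
```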

6.4.12 TCO

Many organizations believe that OSS is absolutely free, and this belief is also perceptible
in the University of Toronto culture. Many of the software packages that UofT runs are OSS;
such is the case of the official Identity Provider Service, which is Shibboleth, and the
Next Generation Student Information Services (NGSIS)80, based on the Kuali Foundation81.
With the application of this model to Enterprise Monitoring Software, we have exposed many
of the hidden costs that free software imposes.

The United Steelworkers (USWA), Staff-Appointed Unit, Local 1998 (USW 1998
Staff-Appointed) represents staff-appointed full and part-time administrative and technical
employees at the University of Toronto (University of Toronto, 2012). The I+TS staff fall
within the scope of the union and will henceforth be called unionized staff.

The USW Salary Grid (effective July 1, 2012)82 mandates unionized staff salaries. The
annual salary used in our TCO calculation is the hiring rate for pay band 16: $77,568 CAD
(subject to deductions required by law); thus the hourly rate would be $42.61 CAD before
taxes. The hourly rate also includes an annual increase in salary of 2%. As previously
mentioned, we have not included the overhead for the employee in our TCO calculations, only
the salary. From now on, we will refer to this type of unionized staff as an Information
Technology Analyst 16 (ITA-16).

80
NGSIS - http://jmll.me/tbc49
81
Kuali Foundation - http://jmll.me/tbc53
82
USW1998 - http://jmll.me/tbc50

As previously mentioned, our analysis of OSS packages has a cost, which we will include as
part of the Up-front evaluation study category within the TCO calculation. The application
of this model to IT Enterprise Monitoring Software has taken roughly four hours a day for a
week, which makes a total of 20 hours or $852.20 CAD (ITA-16).
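
Every labour figure in this TCO is a number of hours multiplied by an hourly rate. The Python sketch below shows that arithmetic for the ITA-16 rate. It assumes 1,820 paid hours per year (a 35-hour week over 52 weeks) and truncation to whole cents; both are our inferences, chosen because they reproduce the $42.61 rate quoted above, and neither is stated in the salary grid. The function and constant names are illustrative.

```python
from decimal import Decimal, ROUND_DOWN

# Assumption: 35 paid hours per week over 52 weeks (not stated in the source).
HOURS_PER_YEAR = Decimal(35 * 52)
CENT = Decimal("0.01")

def hourly_rate(annual_salary):
    """Annual salary divided by assumed paid hours, truncated to cents."""
    return (Decimal(annual_salary) / HOURS_PER_YEAR).quantize(CENT, rounding=ROUND_DOWN)

def labour_cost(hours, rate):
    """Cost of a task: hours worked multiplied by the hourly rate."""
    return (Decimal(hours) * rate).quantize(CENT)

ita_16_rate = hourly_rate("77568")               # pay band 16 hiring rate
evaluation_cost = labour_cost(20, ita_16_rate)   # the 20-hour evaluation study
```

Using `Decimal` rather than floating point keeps the cent values exact, which matters when many small line items are summed.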

To streamline the proof of concept (POC) in this evaluation, we used an external service
called JumpBox83 that provides small "ready-to-use" virtual machine instances for testing or
POC work. The instances come with pre-packaged configurations and can be run in any
computing environment that supports virtualization, e.g. VMware, OpenStack, VirtualBox, etc.
The cost of a Gold license for this service is $150 USD/month. We chose the Gold offering
because it included 2 Priority Support incidents.

The internal cost of virtual hardware for a medium-size virtual machine (1 vCPU; 2 GB RAM;
1 Network Interface Card (NIC); 150 GB of storage using RAID groups with 7200-RPM SATA),
provided by the I+TS Virtualization and Storage Services84, is $1,210 CAD per annum, which
includes VM support but does not include operating system installation and administration.
HIG staff would perform those tasks, and the cost is reflected in the TCO as the man-hour
cost (ITA-16). Using a 30-day trial offering, we performed the POC without committing to
this cost (University of Toronto, I+TS, 2012).

83
JumpBox - http://jmll.me/tbc51
84
I+TS Digital Assets - http://jmll.me/tbc52

The Initial Configuration cost is the time invested in configuring the OSS for the first
run, and thus there is neither a migration cost (data & users) nor a training cost. Process
and best practices refer to the proper configuration of the core elements and operating
systems in order to achieve optimal performance. These obviously depend on a knowledge base
for each product, which may take significant time to acquire.

The Cost of support services is in our case simply an in-house unionized resource tasked on
a 5x5 basis (five hours a day, five days a week) in pay band 10: $54,118 CAD, or $29.73 CAD
per hour (ITA-10). A total of $15,459.60 CAD is projected as the annual internal cost of
support.

Upgrades and maintenance, executed on an annual basis, will rely on unionized staff
(ITA-16) with neither external consulting services nor external training.

For Cacti, the Integration, Initial Configuration cost consists of two elements:
Installation time, which we estimate at 25 hours (ITA-16) or ~$1,065.25 CAD, and
configuration at the application level, which includes additional modules, templates (data,
graphs, etc.), security and backup, and which we estimate at 20 hours (ITA-16) or $852.20
CAD. Customization for business needs, Initial Configuration is simply the branding of the
login page and specific data-gathering templates, estimated at 10 hours (ITA-16) or $426.10
CAD.

Training, Initial training consists of gathering all the required information for the
installation and use of the OSS, which is related to the documentation criterion;
considering that Cacti has more technical documentation, we have estimated 5 hours (ITA-16)
as being plenty of time, therefore $213.05 CAD.

Cost of Support services in this case will rely on a unionized staff member (ITA-10). We
have estimated 30 hours (ITA-10) per annum of Support, training for support, or $891.90
CAD.

Processes and best practices, Initial Configuration, including performance tuning and
special configurations, is estimated at 10 hours (ITA-16) or $426.10 CAD; recurring
configuration at 25 hours (ITA-16) per annum or $1,065.25 CAD; ongoing training for users at
30 hours (ITA-16) per annum or $1,278.30 CAD; and Maintenance at 24 hours (ITA-16) per annum
or $1,022.64 CAD. Thus, the TCO for Cacti is $72,073.92 with a discount factor of 5%,
previously described. Further details can be seen in Appendix J.
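
The discount factor is defined earlier in the document and its exact formula is not repeated in this section. As a rough illustration only, the Python sketch below applies a standard present-value calculation at 5%. It assumes that costs are grouped by year and that year 0 (up-front costs) is not discounted. The function name, the year-0 convention and the sample cash flow are all hypothetical, and the result is not meant to reproduce the figure in Appendix J.

```python
def present_value(costs_by_year, rate):
    """Sum of yearly costs, each discounted back to year 0.

    costs_by_year[0] is the up-front cost and is left undiscounted
    (an assumed convention); costs_by_year[t] is divided by (1 + rate) ** t.
    """
    return sum(cost / (1 + rate) ** year for year, cost in enumerate(costs_by_year))

# Hypothetical cash flow: an up-front cost followed by three years of
# identical recurring costs. These numbers are placeholders, not appendix data.
sample_costs = [2000.00, 1000.00, 1000.00, 1000.00]
discounted_total = present_value(sample_costs, 0.05)
undiscounted_total = sum(sample_costs)
```

With any positive discount rate, `discounted_total` is lower than `undiscounted_total`, because recurring costs in later years weigh less than costs paid up front.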

The Nagios Integration, Initial Configuration cost consists of two elements: Installation
Time is projected at 40 hours (ITA-16) or $1,704.40 CAD, due to the lack of expertise within
the organization, and another 30 hours (ITA-16) or $1,278.30 CAD is projected for the
Initial configuration.

Customization for business needs, Initial Configuration (performance tuning and special
configurations) includes 20 hours (ITA-16) or $852.20 CAD. The Customization for business
needs, initial resources cost includes the data-gathering templates, which will have to be
built from scratch or after further study. In either case, 20 hours (ITA-16) or ~$852.20
CAD is estimated.

The Training, Initial training cost assumes self-training. The staff responsible will
investigate, learn and share knowledge in order to achieve an acceptable level of expertise.
However, the documentation results showed that Nagios documentation is limited. Therefore,
at least 15 hours (ITA-16) or ~$639.15 CAD is estimated. Additionally, 30 hours (ITA-10) or
$891.90 CAD per annum have been budgeted for Recurring Support training. Finally, for
Maintenance we have considered 24 hours (ITA-16) per annum or $1,022.64 CAD. The Nagios TCO
has been estimated at $74,422.47 over a three-year period, with a discount factor of 5% as
well. Further details can be seen in Appendix K.

For Zabbix, the Integration, Initial Configuration cost consists of two elements: the
Installation Time, which is estimated at 30 hours (ITA-16) or $1,278.30 CAD, considering
that the available documentation is good enough, and the Initial Configuration cost, which
is estimated at 25 hours (ITA-16) or ~$1,065.25 CAD.

Customization for business needs, initial resources cost, as for Nagios, includes the
data-gathering templates, which need to be built from scratch unless an investigation has
been done previously. Hence 15 hours (ITA-16) or $639.15 CAD have been estimated. The
Processes and Best Practices, Initial Configuration cost includes performance tuning and
special configurations. We have estimated 15 hours (ITA-16) or $639.15 CAD to set everything
up.

The Training, Initial training cost, as already discussed, is strongly related to the
quality of the OSS documentation. For Zabbix, the documentation scored well, thus an
estimate of 10 hours (ITA-16) or ~$426.10 CAD is reasonable. Additionally, 30 hours (ITA-10)
or $891.90 CAD per annum have been estimated for Support, Training. Finally, for Maintenance
we have estimated 24 hours (ITA-16) per annum or $1,022.64 CAD. The calculated Zabbix TCO,
including a discount factor of 5% for a three-year period, is $73,144.17 CAD. Further
details can be seen in Appendix L.

Appendices J, K and L show the TCO breakdown for Cacti, Nagios and Zabbix respectively. In
summary, Cacti has the lowest TCO of the three, primarily because of the staff time
investment required for the other two. The reason for this is evident in the documentation
scores. Documentation and resource availability are critical when staff must learn by
themselves without any additional or third-party training.

Therefore, the cost reduction perceived by EIS in using OSS does not bear scrutiny. Hidden
costs exist. For instance, a week of general training on any package might reduce the
learning curve and the time investment for several of the activities that impact TCO.

Support costs can be all but dismissed when compared with CSS support costs, which routinely
exceed one hundred thousand dollars per year for an institution the size of UofT. However,
paid annual support for OSS can decrease the time wasted in resolving common problems.

By calculating the TCO for a free software product we dispel the myth that free software has
no cost. Furthermore, the calculation shows that only the purchase price is reduced with
OSS; practically all other costs remain similar.

From this TCO analysis we are now prepared to rank our three OSS candidates

for IT Enterprise Monitoring software: Cacti had the lowest TCO; Zabbix was second

and Nagios was third, but in all fairness, the delta between highest and lowest is less than

3.5% of the lowest TCO or about $2,400CAD. Thus we assigned Cacti a score of three

(3), Zabbix a score of two (2) and Nagios a score of one (1).
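
The ranking and the size of the gap can be checked directly from the three TCO totals quoted above. The Python sketch below sorts the candidates by TCO, assigns three points to the cheapest down to one point for the most expensive, and computes the spread. The variable names are illustrative, and the scoring rule is our reading of the paragraph above.

```python
from decimal import Decimal

# Three-year TCO totals quoted in this section (CAD).
tco = {
    "Cacti": Decimal("72073.92"),
    "Nagios": Decimal("74422.47"),
    "Zabbix": Decimal("73144.17"),
}

# Lowest TCO first; the cheapest candidate receives the highest score.
ranked = sorted(tco, key=tco.get)
scores = {name: len(ranked) - position for position, name in enumerate(ranked)}

# Spread between the most and least expensive candidates.
delta = tco[ranked[-1]] - tco[ranked[0]]
delta_percent = delta / tco[ranked[0]] * 100
```

The spread works out to $2,348.55 CAD, a little over 3.2% of the lowest TCO, which agrees with the rounded figures given in the text.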

6.5 Phase 4 Valuation

In Table 56 all of the criteria scores have been consolidated along with a weighting or
importance assigned to each criterion. By multiplying each individual score by its weighting
factor and then totaling the weighted scores, we obtained a final score for each candidate.
The right-hand column holds the maximum score attainable for each criterion, which yields a
value of 18 for a Perfect score.

Criterion          Weight   Cacti %   Nagios %   Zabbix %   Perfect %
Functionality      2        73%       78%        83%        100%
License            1        100%      100%       100%       100%
Community          2        33%       100%       33%        100%
Seniority          1        100%      100%       67%        100%
Support            1        100%      67%        100%       100%
Interoperability   2        84%       89%        63%        100%
Security           1        100%      100%       100%       100%
Roadmap            1        83%       50%        83%        100%
Performance        2        33%       33%        67%        100%
Scalability        2        33%       50%        67%        100%
Documentation      2        98%       73%        95%        100%
TCO                1        100%      33%        66%        100%
Final Score                 12.77     12.99      13.15      18.00
Final Percentile            71%       72%        73%        100%

Table 56 final scorecards of Cacti, Nagios and Zabbix.
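
The valuation step is a weighted sum, so the Perfect score and the Final Percentile row can be checked from the weights alone. The Python sketch below does so; the names `weights`, `weighted_total` and `percentiles` are illustrative assumptions. Because the table prints rounded percentages, totals recomputed from them would only approximate the printed final scores, so the sketch takes the final scores as given.

```python
# Weight (importance) of each criterion, as listed in Table 56.
weights = {
    "Functionality": 2, "License": 1, "Community": 2, "Seniority": 1,
    "Support": 1, "Interoperability": 2, "Security": 1, "Roadmap": 1,
    "Performance": 2, "Scalability": 2, "Documentation": 2, "TCO": 1,
}

def weighted_total(fractions, weights):
    """Sum of weight times achieved fraction (0.0 to 1.0) over all criteria."""
    return sum(weights[criterion] * fractions[criterion] for criterion in weights)

# A candidate scoring 100% everywhere reaches the upper bound.
perfect = weighted_total({criterion: 1.0 for criterion in weights}, weights)

# Final scores as printed in Table 56, converted to a share of the upper bound.
final_scores = {"Cacti": 12.77, "Nagios": 12.99, "Zabbix": 13.15}
percentiles = {name: round(score / perfect * 100) for name, score in final_scores.items()}
```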

Using this technique affords us an upper bound against which we can judge the relative
importance of the actual scores of the three candidates. This is made clear by converting
the scores into a percentage. A margin of error in our evaluation process of more than 1%
would, in this case, mean that all three scores were effectively identical.

[Radar graph: one axis per criterion (Functionality, License, Community, Seniority, Support,
Interoperability, Security, Roadmap, Performance, Scalability, Documentation, TCO), scaled
from 0% to 100%, with one series each for Cacti %, Nagios % and Zabbix %.]

Figure 29 Radar graph comparing criteria scores for Cacti, Nagios and Zabbix.

6.6 Phase 5 Selection

Consider Table 56. The scores tell us that we have a statistical dead heat; that is, there
is no obvious best choice based on the final scores alone. However, Table 56 gives us
insight into the deficiencies of the existing OSS package (Cacti) and, since Zabbix's
scalability and core features fulfill most of UofT's instrumentation needs, suggests that a
closer look at Zabbix is warranted.

The results do not suggest that a change from Cacti to Zabbix would substantially change the
cost or efficacy of the Enterprise Monitoring Solution. On reviewing these results with the
customer, it was concluded that operating both could offer significant benefits. Additional
proposed strategies include a Zabbix implementation integrated with the current Cacti
instance, so that the two work as a distributed monitoring system and take advantage of the
features of both systems.

Since EIS and HIG are responsible for the installation, operation and maintenance of

the Data Centre and the entire Virtualization Infrastructure, and considering the

remarkable expertise of the staff, the TCO of a second package is dwarfed by the other

operating costs of the Data Centre.

The current EIS Cacti installation login screen is shown in Figure 30, and further
screenshots, showing where the installed plugins and current configuration parameters are
managed, appear in Appendix J. Considering that Cacti is already implemented and monitoring
the electrical, environmental and operational indicators, and that it is already in use to
monitor the Learning Management System, it makes little sense to retire it.

Cacti scalability is still an ongoing concern, and the short-term strategy is to implement
Spine, the fast C-compiled replacement for the PHP poller. Another strategy to mitigate this
issue is to create isolated Cacti instances and to create a development position in order to
customize Cacti or add the features that it lacks. Ironically, even student labor is paid at
union rates for programming.

Figure 30 Enterprise Infrastructure Solutions Cacti Instance at https://cacti.eis.utoronto.ca
