Académique Documents
Professionnel Documents
Culture Documents
http://quattor.org
IT Post-C5, 12.12.2003
Outline
Concepts
quattor in a nutshell
Used
>1700 nodes out of ~ 2000 Multiple functionality (batch nodes, disk servers, tape servers, DB, web, ) Heterogeneous hardware (memory, HD size,..)
Part
of
, together with
Autonomous nodes:
Central control:
Reproducibility:
Scalability:
Use of standards:
Portability:
Linux, Solaris
Managing Computer Centre Machines with Quattor Post-C5 Cancio, Poznaski - n 4
Node
Configuration Management
Node Management
Configuration Management
Configuration Database Configuration access and caching Graphical and Command Line Interfaces
Automated node installation Node Configuration Management Software distribution and management
Managing Computer Centre Machines with Quattor Post-C5 Cancio, Poznaski - n 5
Configuration Management
Configuration Information
Configuration Information
is arranged in templates
Using
lxbatch
lxplus
lxplus001
lxplus029
GUI
CDB RDBMS S O A P
S Q L
CLI
pan XML H T T P
Scripts
Cache
CCM
P E R L Node
Configuration
machines.
Data
Configuration
Going
Conflicts
can ask about properties spanning across machines can run SQL queries (SELECT) and create views:
give me all machines with more than 512 Mbytes of memory give me all machines that belong to lxplus
Portability:
Hardware
CPU Hard disk Network card Memory size Node location in CC Repository definitions Service definitions = groups of packages (RPMs) Partition table Load balancing information Cluster name and type Batch master Contract type and number Purchase date
Managing Computer Centre Machines with Quattor Post-C5 Cancio, Poznaski - n 11
Software
System
Cluster information
Audit information
Provides
Information
Access
Software Servers
http nfs ftp
packages (RPM, PKG)
cache
RPM, PKG
SWRep
packages
Installed software
kernel, system, applications..
Install server
System services
AFS,LSF,SSH,accounting..
nfs/http
base OS
CCM
dhcp pxe
Install Manager
Node (re)install
CDB
Managing Computer Centre Machines with Quattor Post-C5 Cancio, Poznaski - n 15
Install Manager
Which OS version to install Network and partition information What core packages Custom post-installation instructions
Automated generation of control file (KickStart) It also takes care of managing DHCP (and TFTP/PXE) entries Can get its configuration information from CDB or via command line
NCM (Node Configuration Manager) is responsible for ensuring that reality on a node reflects the desired state in CDB. Framework system, where service specific plug-ins called Components make the necessary system changes
Regenerate local config files (eg. /etc/sshd/sshd_config) Restard/reload services (SysV scripts) configuration dependencies (eg. configure network before sendmail)
Porting of SUE features to NCM components started, to be completed with next CERN certified Linux and Solaris versions.
Currently available: grub, quota, snmp, dns, automounter, network, inetd, globuscfg, spma, sysaccounting, edg-cfg,
keep portability between Linux/Solaris whenever possible
Managing Computer Centre Machines with Quattor Post-C5 Cancio, Poznaski - n 17
tool geared towards sysadmins/operators allows to query/visualize the nodes configuration profile.
Universal
Extendable to multiple platforms and packagers (RH Linux RPM, Solaris PKG, others like Debian pkg) Multiple package versions/releases
Management
Client
Replication:
SPMA = Software Package Management Agent Manage all or a subset of packages on the nodes
On production nodes: wipe out unknown packages, (re)install missing ones. On development nodes (or desktops): non-intrusive, configurable management of system and security updates.
Plug-ins available for Linux RPM and Solaris PKG, (can be extended)
Scalability:
Package pre-caching
Possible to access multiple repositories (division/experiment specific) Modularity: can be configured via CDB, or locally
Managing Computer Centre Machines with Quattor Post-C5 Cancio, Poznaski - n 20
Deployment
>1700 nodes, 15 clusters to be scaled up to >5000 in 2006-8 (LHC) LXPLUS, LXBATCH, LXSHARE, LXBUILD, disk and tape servers, Oracle DB servers
4 RH73 nodes CDB: ~ 260 general templates, and 2 templates per node (one derived from LANDB) : >3800 templates in total
Upgrade from LSF 4.2 to LSF 5.1 on >1000 nodes within 15 minutes, without service interruption
Security
updates:
SSH security updates KDE upgrades (~ 400 MB per node) on >700 nodes etc (~once a week!)
Kernel
upgrades:
SPMA can handle multiple versions of the same package -> Allows to separate in time installation and activation (after reboot) of new kernel NCM component configures which kernel version to use
Estimated effort for moving from LCFG to quattor exceeded remaining EDG lifetime EDG focus on stability rather than middleware functionality
Tutorials
held at HEPiX and EDG conferences have caused positive feedback and interests:
EDG finish-up
End user documentation, install guide, packaging FIO will continue maintaining Quattor (as part of ELFms) after EDG finishes. CDB fine grained access control
Remaining developments
Scalability audit
Data encryption mechanisms: for sensitive data (ACLs) Specialized GUIs and CLIs (eg. operators) Take into account new commands and functionality
Finish SUE migration port of SUE features to NCM components in time for next certified Linux
Deploy software from EDG, LCG, experiment SW Deploy NCM components for local grid services
http://quattor.org
SUE:
Scalability
True hierarchical structures Extendable data manipulation language (user defined) typing and validation Sharing of configuration data between components now possible
Pre/post dependencies
Modularity
True hierarchical structures Extendable data manipulation language (user defined) typing and validation
Clearly defined interfaces and protocols Mostly independent modules light functionality built in (eg. package management)
Portability
Enhanced components
Installation subsystem uses system installer Components dont replace SysV init.d subsystem