Turbo Linux Cluster Server Users Guide

019.453.655.
618.233.785.8
125.664.857.367
978.449.356.785 257.514.369.516
744.335.695.7 451.009.658.007 122.342.981.161 549.326.784.677 738.309.304.390
154.019.250.391
125.664.857.367
463.655.735.962
214.324.369.457
214.324.369.457
122.342.981.161
451.009.658.007
659.212.773.536
645.375.986.542 112.905.35.987.635
744.335.695.787
234.354.956.117
553.125.69.145
738.309.304.390
553.125.69.145
019.453.655.612
978.449.356.785
679.345.245.667
659.212.773.536
767.234.679.565
547.383.211.231 679.345.245.667 767.234.679.565
112.323.612.962
547.383.211.231
212.8
618.233.785.818
234.354.956.117
125.664.857.367
978.449.356.785
214.324.369.457
549.326.784.677
553.125.69.145
019.453.655.612
744.335.695.787 112.323.612.962
? KIJAH IAHLAH $
USER GUIDE
TurboLinux Cluster Server 6 User Guide Version 6..0 September 2000 1999-2000 TurboLinux Inc. All Rights Reserved. The information in this manual is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by TurboLinux Inc. TurboLinux assumes no responsibility or liability for any errors or inaccuracies that may appear in this book. This publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means -- electronic, mechanical, recording, or otherwise without the prior written permission of TurboLinux Inc., as long as this copyright notice remains intact and unchanged on all copies. TurboLinux, Inc., TurboLinux, and TurboLinux logo are trademarks of TurboLinux Incorporated. All other names and trademarks are the property of their respective owners. Written and designed at TurboLinux Inc. 8000 Marina Boulevard, Suite 300 Brisbane, CA 94005 USA T. 650.228.5000 F. 650.228.5001 http://www.turbolinux.com/
TABLE OF CONTENTS
P REFACE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
About TurboLinux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii TurboLinux Cluster Server Contents . . . . . . . . . . . . . . . . . . . . . . . . . . .viii Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .ix Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .ix Contacting Us . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x Typographic Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
C HAPTER 1 I NTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1

What Is Cluster Server? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Target Audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Why Use Cluster Server? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4

What Services Can Be Clustered? . . . . . . . . . . . . . . . . . . . . . . . . 1-5
Whats New In This Release . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6

Separate Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6 New Installer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7 Runs on Red Hat or TurboLinux . . . . . . . . . . . . . . . . . . 1-7 New Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7 Technical Improvements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8 NAT Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8 Stateless Fail-over Support . . . . . . . . . . . . . . . . . . . . . 1-9 Delay Settings Separated . . . . . . . . . . . . . . . . . . . . . . 1-9 More Application Stability Agents . . . . . . . . . . . . . . . . 1-9 Added Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-10 Security Settings. . . . . . . . . . . . . . . . . . . . . . . . . . . 1-10 Synchronization Tools . . . . . . . . . . . . . . . . . . . . . . . 1-10 Cluster Management Console. . . . . . . . . . . . . . . . . . . . . . . . . . 1-11
TurboLinux Cluster Server 6 User Guide
Enhanced Usability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Configuration Tools. . . . . . . . . . . . . . . . . . . . . . . . . Configuration File Format. . . . . . . . . . . . . . . . . . . . . Error Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Licensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Registration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1-11 1-11 1-12 1-12 1-12 1-13
Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-14
Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-14 Hardware. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-15 Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-16
C HAPTER 2 C LUSTERING C ONCEPTS . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1

What Is a Cluster? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
What Makes a Cluster a Cluster? . . . . . . . . . . . . . . . . . . . . . . . . Related Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . NUMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MPP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Distributed Processing. . . . . . . . . . . . . . . . . . . . . . . . 2-2 2-3 2-3 2-4 2-5 2-6
Components of a Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7

Cluster Nodes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7 Cluster Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
Types of Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9

Shared Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 Load Balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10 Fail-over . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10 High Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
How a Cluster Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12

Traffic Management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Direct Forwarding . . . . . . . . . . . . . . . . . . . . . . . . . . Tunneling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . NAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12 2-13 2-13 2-14
Cluster Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16 Shared Data Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17

Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17 Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17 Distributed File Systems. . . . . . . . . . . . . . . . . . . . . . 2-18
ii
Hardware. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Storage Area Networks. . . . . . . . . . . . . . . . . . . . . . . Network Attached Storage . . . . . . . . . . . . . . . . . . . . High Speed Drive Interfaces . . . . . . . . . . . . . . . . . . .
2-19 2-20 2-20 2-21
C HAPTER 3 I NSTALLATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1

Installation Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2 Installing Cluster Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3 Post-Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14 Troubleshooting Installation Issues . . . . . . . . . . . . . . . . . . . . . . . . . 3-15
Unable to Find Installation Files . . . . . . . . . . . . . . . . . . . . . . . 3-15 Undetectable Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15 Installing on an Unsupported Distribution . . . . . . . . . . . . . . . . 3-16
C HAPTER 4 C ONFIGURATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1

Planning the Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Typical Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Small Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Larger Cluster. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Complex Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2 4-3 4-4 4-4
Configuration Tool Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6

turboclusteradmin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6 tlcsconfig . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7
Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
Agents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10 Service Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-12
Servers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16
Servers Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Forwarding Mechanisms . . . . . . . . . . . . . . . . . . . . . . Direct Forwarding . . . . . . . . . . . . . . . . . . . . . . . . . . Tunneling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . NAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Server Groups Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16 4-18 4-19 4-19 4-20 4-20
Advanced Traffic Managers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-23

Advanced Traffic Manager Systems. . . . . . . . . . . . . . . . . . . . . . 4-24 Advanced Traffic Manager Settings . . . . . . . . . . . . . . . . . . . . . 4-24
iii
Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-27 Global Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-30

Security Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-30 Network Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-32 NAT Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-33
C HAPTER 5 C ONFIGURING C LUSTER N ODES . . . . . . . . . . . . . . . . . . . . . . 5-1

Configuring a Linux or UNIX Cluster Node . . . . . . . . . . . . . . . . . . . . . . 5-3
Tunneling Cluster Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
Configuring a Windows NT Cluster Node . . . . . . . . . . . . . . . . . . . . . . . . 5-7 Configuring a Windows 2000 Cluster Node . . . . . . . . . . . . . . . . . . . . . 5-11 Configuring Cluster Nodes on Other Systems. . . . . . . . . . . . . . . . . . . . 5-16
C HAPTER 6 C ONFIGURATION F ILE . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1

The clusterserver.conf File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2 Global Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3
Security Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3 Network Mask Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-4 NAT Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-4
Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
UserCheck Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5 Defining Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
Servers and ServerPool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8

Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8 ServerPool Section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8
Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10
AtmPool Section. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10 VirtualHost Section. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-12
C HAPTER 7 ADMINISTRATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1

Administrative Tools. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
Tuning the Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2 Kernel Table Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-3 Time Settings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4
iv
Synchronization Tools. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-6

tlcs_content_sync . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-6 tlcs_config_sync . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9
Cluster Management Console (CMC) . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12 Troubleshooting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-18

Log Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Daemon Startup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using /proc/net/cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . /proc/net/cluster/config . . . . . . . . . . . . . . . . . . . . . /proc/net/cluster/connections . . . . . . . . . . . . . . . . . /proc/net/cluster/debug . . . . . . . . . . . . . . . . . . . . . /proc/net/cluster/nat . . . . . . . . . . . . . . . . . . . . . . . /proc/net/cluster/servers. . . . . . . . . . . . . . . . . . . . . /proc/net/cluster/services . . . . . . . . . . . . . . . . . . . . /proc/net/cluster/stat. . . . . . . . . . . . . . . . . . . . . . . /proc/net/cluster/timeout . . . . . . . . . . . . . . . . . . . . Common Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Synchronization Tools Fail . . . . . . . . . . . . . . . . . . . . Verifying That the Cluster is Working . . . . . . . . . . . . . Determining Which ATM is the Primary. . . . . . . . . . . . Cluster Generates a Lot of Extra Traffic. . . . . . . . . . . . 7-18 7-19 7-22 7-23 7-23 7-24 7-25 7-25 7-26 7-27 7-27 7-28 7-28 7-29 7-30 7-30
C HAPTER 8 C LUSTER S ERVER A RCHITECTURE . . . . . . . . . . . . . . . . . . . . . 8-1

SpeedLink Kernel Module. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
Kernel Patch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2 ip_cs Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2 Compiling the Kernel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-4
Cluster Server Daemon (clusterserverd) . . . . . . . . . . . . . . . . . . . . . . . . 8-7 Application Stability Agents (ASAs) . . . . . . . . . . . . . . . . . . . . . . . . . . 8-9 Synchronization Tools. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-12 Cluster Management Console (CMC) . . . . . . . . . . . . . . . . . . . . . . . . . . 8-14 Putting All the Pieces Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-16
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-17
G LOSSARY I NDEX
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I-1
vi
P REFACE
Thank you for purchasing TurboLinux Cluster Server 6. We realize that you have many choices in selecting your clustering solutions. We have worked hard to make our software powerful, flexible, and easy to use. We are dedicated to offering the highest performance at the lowest cost with TurboLinux Cluster Server and all our products. This manual provides instructions for installing, configuring, and using TurboLinux Cluster Server 6. It can also be used as a reference guide for the more advanced features of the product. The manual will also explain what clustering is and why you might want to create a cluster.
About TurboLinux
TurboLinux, long the Linux leader in the Pacific Rim, is taking the world by storm. We have been working with Linux since 1993. We decided to offer our own distribution in 1997 with both English and Japanese language versions. We now offer TurboLinux Workstation and Server distributions in English, French, German, Italian, Spanish, Portuguese, Chinese, Japanese, and
vii
Russian. For the latest information on our fast-growing company, please visit our web site at http://www.turbolinux.com. TurboLinux is also the leader in enterprise-class Linux solutions. TurboLinux Cluster Server is just one of the many products that can be used in large enterprise environments, as well as in smaller companies that need the flexibility to grow. Our success and your satisfaction with TurboLinux are all made possible through the magic of the Open Source movement and the original creator of Linux, Linus Torvalds. We want to thank Linus and the thousands of developers around the world who contribute to making the magic possible.
TurboLinux Cluster Server Contents

Unlike previous versions, TurboLinux Cluster Server 6 runs on top of an existing operating system. Therefore, we have included a copy of TurboLinux Server 6.0 using the 6.0.5 release in the box. To install the product, you should install TurboLinux Server, unless you already have an existing TurboLinux Server or Red Hat Linux system. If you have an earlier version of the TurboLinux Server distribution, you should upgrade to the TurboLinux Server release included in the box. The TurboLinux Cluster Server 6 product includes the following materials: TurboLinux Cluster Server 6 Install CD TurboLinux Server 6.0 Install CD Using 6.0.5 Release Set of floppy diskettes labeled Boot and Extra Hardware. These can be used to install TurboLinux Server This manual, the TurboLinux Cluster Server 6 User Guide TurboLinux Server User Guide
viii
Registration card, including the serial number License agreement (in the TurboLinux Cluster Server 6 User Guide) Helpful Hints for Cluster Server, containing important information that was made available after the printing of this manual
Registration
You will be unable to fully utilize the Cluster Server product until you register it. The registration card included in the box contains a unique serial number. You must use this serial number to register the product and receive a license file. To register, browse to http://www.turbolinux.com/register/tlcs6. There you will be asked to enter your serial number, as well as some information about yourself and your company. The registration process will return a license file, which must be placed in the /etc/clusterserver/.licenses directory.
Support
TurboLinux provides 60 days of email installation support at no charge once you have registered your purchase at the web site. With our clustering products, we also offer 60 days of phone support at no additional charge. This support will help you get the product installed and operational. Additional support options are available, at hourly and daily rates. You may also find valuable information in the support section of our web site, at http://www.turbolinux.com/support.
ix
Contacting Us
We value your feedback. While every measure is taken to ensure the accuracy of our documentation, you may find some mistakes or oversights. Please let us know when you find something that you feel should be corrected, or if there is an important part of our product that you feel could be better explained. Please send us your input on any aspect of our products and supporting documentation. We listen to our customers. Email your suggestions to feedback@turbolinux.com.
Prerequisites
This manual assumes that you understand the basics of the Linux operating system and TCP/IP networking. You should be comfortable using the Linux or UNIX command line to perform routine system administration tasks. You will need root access to the systems within the cluster, and should be familiar with the responsibilities that come with having root access. You should also be familiar with IP addresses, network interfaces, subnets, subnet masks, port numbers, and daemons.
Typographic Conventions
This manual uses the following conventions: Monospace indicates utilities, commands, programs, and text examples that need to be entered exactly as shown. File names and directory paths are shown in Arial font.
Italics indicate CD and book titles, and emphasize words. Menu items and buttons are enclosed in single quotes. Command lines will start with a dollar sign ($) prompt, or a hash symbol (#) prompt if root access is required. They will appear in the following format:
$ ls -lAtr pictures # less /var/log/messages
xi
xii
Chapter 1
I NTRODUCTION
This chapter will introduce you to the TurboLinux Cluster Server 6 product. We will examine what the product is and how you can use it effectively to enhance the performance and reliability of your network and the services it provides. We will introduce you to the product, describing what it does and who the target audience is. Next well explain the benefits of using TurboLinux Cluster Server as compared to stand-alone systems and other clustering products. Well take a look at the improvements that have been made to this version of the product compared to version 4.0, the previous release. Finally, well review the software and hardware requirements for running Cluster Server 6.
1-1
Introduction
What Is Cluster Server?

TurboLinux Cluster Server is an enterprise-class solution that allows you to leverage your existing network resources to create scalable and reliable services. With it, you can significantly improve quality-of-service levels for virtually every TCP/IP network service, including web, email, news, and FTP. Cluster Server provides the architectural framework that will allow your network to effortlessly grow to meet new demands. Cluster Server implements load balancing and fail-over support of network services. Load balancing allows the services to run on multiple systems. The cluster will distribute client connections among the servers that make up the cluster. Fail-over allows the service to run on a single server. If that server should fail, another server within the cluster will take over for it. You can think of Cluster Server as similar to RAID. Whereas RAID uses an array of disks, Cluster Server uses an array of servers. Both provide the same features: enhanced speed, reliability, redundancy, and scalability. Cluster Server distributes the workload among several servers instead of concentrating all the work on one large server. However, the cluster will appear as a single machine to clients accessing it.
Target Audience
TurboLinux Cluster Server is targeted at medium to large companies who want to implement the high availability or scalability features at a modest price. Internet Service Providers will find the product useful to provide a higher level of uptime as well as scalability that allows them to add servers to the cluster to improve performance. Large enterprises can use the product to deliver standards-compliant services to large numbers of clients, either internally or on the Internet. Medium-sized companies can use the software to leverage existing computer systems as the companys needs grow.
1-2 TurboLinux Cluster Server 6 User Guide
What Is Cluster Server?
An administrator implementing Cluster Server should be familiar with Linux or UNIX and have a good understanding of TCP/IP networking. While clustering is a fairly simple concept, the implementation details can be rather complex. Troubleshooting any problems that arise will require not only understanding of the concepts behind TCP/IP, but also experience with the real-world problems that can arise. TurboLinux Cluster Server is not a Beowulf cluster, and is not intended to compete with Beowulf. It is not used to cluster CPU-bound processes, but instead focuses on network-based services. If you need a cluster to perform intensive processing tasks, you should consider TurboLinux EnFuzion. (See the EnFuzion web site at http://www.turbolinux.com/products/enf/.)
1 -3
Introduction
Why Use Cluster Server?

Cluster Server provides a cost-effective way to leverage your existing systems to create scalable network services. If it is important that your network remain available as often as possible, Cluster Server may be for you. If you need to provide services that are accessed more frequently than one server can handle, Cluster Server can help by creating a virtual server to handle the additional load. There are several hardware solutions available that perform the same function as TurboLinux Cluster Server. These closed boxes tend to be very expensive and less flexible. By using a Linux-based system, you have finer control of the cluster. You also have the option of running other services on the cluster manager, and can have the cluster manager double as a cluster node. Cluster Server also allows redundancy of the traffic manager itself, so you do not have a single point of failure like many of the hardware-based solutions. Cluster Server is a high-performance solution. The traffic management takes place at a very low level within the kernel. While all incoming traffic must come through the traffic manager, outbound traffic can go from the cluster node directly out to the client. Because most TCP/IP services have larger replies than requests, this is an important optimization. In addition to forwarding traffic, Cluster Server monitors the health and availability of the network resources. It continuously samples all server nodes, verifying that the applications are running properly. This is accomplished through the use of intuitive application polling agents. In addition, each backup traffic manager repeatedly queries the master traffic manager in order to verify that the cluster itself is functional.
1-4
Why Use Cluster Server?
What Services Can Be Clustered?

Many typical network services can be clustered with the Cluster Server product. The main requirement is that the service must be able to be run on more than one machine at a time. Just about any TCP/IP service will work. The following services are commonly used with Cluster Server: Web sites (HTTP, HTTPS) FTP Email (SMTP, POP3, and IMAP) News (NNTP) DNS LDAP
TurboLinux Cluster Server should generally not be used to cluster database servers that are write-intensive. There is no built-in locking mechanism between cluster nodes, so if more than one cluster node is writing to the same database, data could become corrupted. If you need to cluster a database, you do have a few options. If you use the cluster to read the database, and another single system to write to the database, everything should work fine. Another method is to use a two-tier model, with web servers within the cluster accessing a database server behind the cluster.
1 -5
Introduction
Whats New In This Release

This release of TurboLinux Cluster Server differs substantially from the previous version. Many features have been added, and the architecture of the system has changed. Even the name has changed from TurboCluster Server to TurboLinux Cluster Server. This section will outline all the user-visible changes that were made between the previous version (4.0) and this version. The primary changes are: Decoupling from the operating system New names for some parts Technical improvements Added security Cluster Management Console Usability enhancements Licensing changes
Separate Product
The previous version of this product was integrated into its own Linux distribution. This version has been decoupled from the operating system and is packaged as a separate product. Thus, it now requires a Linux distribution to have already been installed. It is recommended that you use TurboLinux Server 6.0 using the 6.0.5 release or later. You can also use Red Hat Linux 6.2. There are several advantages to having the clustering product distributed separately from the Linux distribution. First, it is easier to upgrade the operating system or the Cluster Server product separately. It is also easier to troubleshoot problems, because they can be isolated as either problems with the clustering software or the underlying operating system. Finally, you have the option to install the product on different Linux distributions, providing
1-6
you with more flexibility. If you have another software package that will only run on certain versions of Linux, you may now be able to use Cluster Server on that system as well.
New Installer
Since the previous version of the product was only available bound to its own Linux distribution, it was installed along with the operating system. With the new stand-alone version, a new installation tool has been created to install the various pieces. The installation program will guide you through the process. It is a menu-based program with an easy-to-use interface. The installation program will be covered in detail in chapter 3.
Runs on Red Hat or TurboLinux

Because the product is no longer bound to the operating system, it has been made to work under Red Hat Linux as well as TurboLinux Server. You will need to run TurboLinux Server 6.0 using the 6.0.5 release or later, or Red Hat Linux 6.2. No other Linux distributions are currently supported.
New Names
The name of the product has been changed from TurboCluster Server to TurboLinux Cluster Server. This is partly to distinguish the fact that it is now a separate product from the operating system. Due to this name change, many
1 -7
Introduction
of the components have also been renamed since version 4.0. Here is a table of some of these name changes. Table 1.1 Changed Component Names TURBOCLUSTER SERVER 4.0 NAME
turboclusterd turbocluster_sync tl_sync /etc/turbocluster.conf /var/log/turboclusterd.log TCSWAT
TURBOLINUX CLUSTER SERVER 6 NAME

clusterserverd tlcs_config_sync tlcs_content_sync /etc/clusterserver/clusterserver.conf /var/log/clusterserverd.log CMC (replacement for TCSWAT)
Technical Improvements
Version 6 has several technical improvements over the previous version. These include: NAT forwarding method Fail-over support Ability to specify different intervals for server and application checks More Application Stability Agents (ASAs)
NAT Support
In addition to the previously supported forwarding methods, Cluster Server 6 allows you to use Network Address Translation (NAT). So you now have three choices: direct forwarding, tunneling, or NAT. NAT is a technology normally used to hide a private network behind a firewall connected to the Internet. It allows traffic coming from and going to the private network to appear as if it is coming from one system. NAT simplifies
configuration, because you do not need to make any special changes to the cluster nodes themselves, except for setting the default gateway. It also provides some added security, because the cluster nodes cannot be accessed directly from the outside. The downside is that NAT has slightly reduced performance because all outbound traffic must go through the NAT box. The NAT system used in Cluster Server is implemented in accordance with RFC 1631, the Internet standard describing NAT.
Stateless Fail-over Support

In addition to load balancing, TurboLinux Cluster Server now also allows you to implement fail-over services. Whereas load balancing has two or more systems providing the same service at once, fail-over will use only one server at a time. Only if that server goes down will any of the other servers listed for that service be forwarded any network traffic.
Delay Settings Separated

Cluster Server has two different checks that it performs on cluster nodes. First it checks to see if the server responds to a network ping. Then it runs an Application Stability Agent (ASA) to determine if the specific services required are responding. In the previous version, the intervals for these two types of checks were tied together. Version 6 allows you to specify a different interval for each.
More Application Stability Agents

We have included more Application Stability Agents (ASAs) in this version. These include agents to connect to various enterprise-level databases, such as Oracle and DB2. The full list of ASAs is: DB2Agent dnsAgent
1 -9
Introduction
ftpAgent genericAgent httpAgent httpsAgent http10Agent imapAgent nntpAgent oracleAgent popAgent smtpAgent
Added Security
Several security features have been added to ensure the integrity of the system and to restrict access to the cluster. These include restricting access to the system and the use of Secure Shell (SSH) to transfer data between cluster nodes. In addition, the CMC program uses SSL-encrypted HTTPS, whereas the TCSWAT program that it replaces used regular unencrypted HTTP.
Security Settings
You can now specify systems to deny or allow access to the remote configuration capabilities of the cluster. These are similar to the TCP wrappers settings configured in the /etc/hosts.allow and /etc/hosts.deny files. You can specify individual hosts or ranges of IP addresses. These settings will be covered in more detail in the configuration chapters.
Synchronization Tools
The synchronization tools now use SSH to securely transfer data. This includes the transfer of both configuration information and content.
1-10
F-Secure SSH version 1.3.7 is installed with the Cluster Server package. If you have any other version of SSH on your systems, you should remove it to ensure full compatibility.
Cluster Management Console

A new web-based management system has been created, called Cluster Management Console, or CMC. This tool replaces the TCSWAT program from the previous version. The new tool has more functionality and provides more information about the cluster. CMC is used to monitor the current performance of your cluster, and can be used to dynamically modify the clusters settings. One of the most powerful features of CMC is the Traffic Monitor. It generates a real-time graph of the clusters performance. Log files can be displayed in CMC, and you can also look at the online documentation, including man pages. You can also stop and restart the Cluster Server daemon from the CMC web page. CMC will be covered in more detail in chapter 7.
Enhanced Usability
Several features have been added to increase usability. These include: Changes to the configuration tools Simplified configuration file syntax Improved formatting in log files
Configuration Tools
The configuration tools have been updated to be easier to use. Some of the terms used have been simplified, as have some of the menus. The tools have
TurboLinux Cluster Server 6 User Guide 1-11
Introduction
been made more user-friendly. The addition of the web-based Cluster Management Console also improves usability of the software.
Configuration File Format

The syntax of several options has been made more clear. Example configuration files are provided. While the format of the configuration file is pretty straight-forward, you should use the configuration tools when possible. The format of the configuration file has been changed from the format used in the 4.0 version, but it is pretty simple to convert an existing file to work with 6. Simply edit your /etc/clusterserver/clusterserver.conf file and remove the port numbers and in the AddServer lines. For more information on the configuration file format, see chapter 6.
Error Logs
The format of the error log files has been made more readable. Many of the messages have been clarified, and where possible they have been shortened to fit within 80 columns. This should help you when troubleshooting a problem with the cluster.
Licensing
The program now features license activation codes to enable the program. This allows more flexibility in pricing structures and allows us to provide customers with evaluation copies that time-out after a certain period of time. With the activation code system, if you are using a demo and decide to purchase a full license, you can simply copy new license files to the server and will not have to re-install the product. License files are cumulative. If you purchase a license for 2 ATMs and 2 nodes, and another license for 2 ATMs and 10 nodes, you will be able to use
1-12
up to 4 ATMs and 12 nodes. However, note that a system acting as both an ATM and a cluster node requires both an ATM license and a node license.
Registration
To use the product, you will need to register it. To register, browse to the registration web site at http://www.turbolinux.com/register/tlcs6. There you will be asked to enter the serial number that was provided in the box, as well as some information about yourself and your company. The registration process will return a license file, which must be placed in the /etc/clusterserver/.licenses directory.
1-13
Introduction
Requirements
TurboLinux Cluster Server is used to combine the resources of several computers. The requirements for each of these computers varies according to its function within the cluster. The two main functions are advanced traffic manager (ATM) and cluster node. Cluster nodes are simply systems that provide network services. The traffic manager is the machine that receives all incoming packets and forwards them to the cluster nodes. You will also have backup traffic managers, which will become active only if the primary ATM fails. A system may be configured to function as both a traffic manager and a cluster node at the same time.
Software
All traffic managers must have TurboLinux Cluster Server installed and running. Cluster nodes that are not traffic managers are not required to run the Cluster Server product. They can run any operating system, including Linux, UNIX, Windows NT, and Windows 2000. However, it will simplify cluster management if all the systems are running the same operating system and the Cluster Server software. To run Cluster Server you will need to have a Linux server running either TurboLinux Server or Red Hat Linux. (Note that the previous version of Cluster Server was integrated with TurboLinux Server; this version requires you to install TurboLinux Server prior to installing Cluster Server.) If you run TurboLinux Server, you must have version 6.0 using 6.0.5 release or later. For Red Hat systems, you must be running version 6.2. The product may be able to run on other Linux systems, but due to quality assurance issues, we can only provide support for the distributions mentioned here. TurboLinux Server 6.0 using 6.0.5 release included in the TurboLinux Cluster Server package. If you are running an older version of TurboLinux Server, or
1-14
Requirements
TurboCluster Server 4.0, please upgrade your operating system using the provided software. In addition to the Cluster Server management software, you will need to have software providing the services that are to be clustered. For example, if you are creating a web server cluster, each node in the cluster must be running its own web server. This software is not included with the Cluster Server product, but many network services are included with most operating systems. For example, TurboLinux Server and virtually every other Linux distribution comes with Apache web server.
Hardware
While Cluster Server can be run on modest hardware, such as a Pentium 100 with 32 MB of RAM, the product is designed to provide high performance. We suggest that you use hardware that fits these high performance needs. The hardware specifications for a traffic manager are similar to that of a network router. Choose hardware that is reliable and efficient. The important factors that you will want to focus on are network interface speed, memory, and CPU speed. Today that would mean at least a 100-Mbs Ethernet card, 256 MB of RAM, and a 700-MHz processor. (TurboLinux Cluster Server is only available for Intel-compatible architectures.) Disk space is less critical, unless you are running other services on the machine as well. Be sure to factor in any other software that will be running on the machine. The Cluster Server software itself will take up approximately 40 MB of disk space. Additional space will be required for log files and other administrative tasks. If an Advanced Traffic Manager is supporting NAT cluster nodes, then the ATM should have two network cards. One network card will be used to accept incoming client requests. The other will be used to connect to the NAT private network.
Introduction
The hardware requirements for cluster nodes are the same as if the systems were running stand-alone. The primary concern will be what services are running on the node. There are no additional requirements beyond the hardware recommendations of the operating system and the applications that will be running on the node. In order to provide the highest amount of uptime, you will want to employ as much hardware redundancy as possible. You should obviously use UPSes to ensure that the cluster will remain running in the event of a power failure. You may also want to consider redundant power supplies in each system. To ensure constant data access, you can use a RAID hard drive array. Drive mirroring and RAID 5 can provide redundancy, and hot-swappable hard drives will allow you to replace faulty components. Dont forget to perform routine system backups; redundant hardware cant prevent software catastrophes. A CD-ROM drive is required to install the product. The CD-ROM does not necessarily need to be installed in the server; you may mount the CD-ROM on a different server and access it via NFS or some other method. You will also need a connection to the Internet to download updates and to register the product.
Infrastructure
To run a cluster of network services, you will obviously need to have a stable network. If possible, it is recommended that you have all the cluster nodes on a single subnet, and that this subnet be separate from the rest of the network. This allows the cluster to run at maximum performance, while isolating any problems from the rest of the network. For very high-traffic clusters, you may saturate the bandwidth of a single subnet; in that instance you might have to consider multiple subnets.
1-16
Requirements
While putting all the nodes on a single subnet or LAN is recommended for maximum performance, it is by no means required. You have the flexibility to locate your nodes anywhere, especially when using the tunneling method. However, all the ATMs must be on the same subnet. This is because the ATMs will all need to be able to take on the virtual IP address of the cluster itself. This can only be done on the subnet that would normally contain that IP address. If you are looking to create a high availability web site, you should consider redundant Internet routers on the network. If one of the routers goes down, you can still access the cluster from the outside. For maximum redundancy, the routers should go through separate Internet Service Providers. The high availability of your cluster wont matter much if you become disconnected from the Internet. It is highly recommend that you have a DNS server running to map domain names into IP addresses. Reverse DNS lookups must be working properly as well, resolving IP addresses back into domain names. Like all servers, the systems within the cluster should have static IP addresses, not DHCP-assigned addresses.
1-17
Introduction
1-18
Chapter 2
C LUSTERING C ONCEPTS
This chapter will cover some of the basic concepts that will be required in order to understand how TurboLinux Cluster Server works. You will need to understand these concepts in order to make the most of the product. It will also help you to understand your options when configuring a cluster. We will look at the following topics: What is a cluster? Components that make up a cluster The various types of clusters How a cluster works How to manage a cluster Methods of sharing data between systems
2 -1
Clustering Concepts
What Is a Cluster?
A cluster is a group of individual computer systems that can be made to appear as one computer system. While that definition may sound simple, there are several other similar technologies. The differences between the technologies can be quite subtle. Computer clustering has been around in various forms since the 1980s, originating on the Digital VAX platform. The VMS operating system and VAX hardware combined to provide clustered services. These VAX clusters were able to share hardware resources, such as disk space, and were able to provide computing resources to multiple users. This section looks at what it means to be a cluster. Then it provides an overview of some of the related parallel processing technologies in order to draw some distinctions.
What Makes a Cluster a Cluster?

Clustering is just one form of parallel computing. One of the key points that distinguishes clustering from other related technologies is the ability to view the cluster as either a single entity or a collection of stand-alone systems. For example, a cluster of web servers can appear as one large web server, but at the same time, individual systems within the cluster can be accessed as individual systems, if desired. Because each system in the cluster is a separate computer, each has its own hardware, operating system, and software. Clusters can be either homogeneous, with all the systems running the same software on similar hardware. They can also be heterogeneous, with systems within the cluster running different operating systems on various hardware.
2-2
What Is a Cluster?
Related Technologies
Clustering falls within a continuum of parallel processing techniques. The primary distinctions are based on the level at which resources are shared or duplicated. At the lowest level, a system will have multiple processors on a single motherboard, and share everything else. At the other end of the spectrum, distributed processing employs multiple computers, but the system is generally not viewed as a single entity. Some parallel processing methods are (from tightest binding to loosest): SMP NUMA MPP Clustering Distributed processing
Each of these processes are explained in this section, except for clustering, which we have already covered.
SMP
Multi-processor systems today are generally of the symmetric type. This means that no one processor is any more important than the others, and all resources are equally available to all the processors. Systems of this type are called symmetric multi-processing, or SMP. A single computer has multiple CPUs but a single shared memory space and shared I/O facilities. The idea behind SMP is to transparently break down a computing problem into concurrent processes and allow these to execute on separate processors within the same machine. The emphasis here is on transparency. The same program can run time-sliced on a single processor machine, and the development tools need not even be aware of the underlying parallelism.
2 -3
Clustering Concepts
On an SMP machine, the operating system itself is responsible for dividing up the individual processes making up an application among the available CPUs. SMP machines are best used with operating systems and programs that use threading or light-weight processes. Windows NT is heavily thread-based, and Linux processes are fairly light-weight, so both scale fairly well on SMP hardware. SMP systems with two or four processors are fairly simple to build. Anything beyond that becomes rather difficult, because the processors all need to be able to access all the I/O and memory resources. Beyond four processors, these shared resources start to become a bottleneck, and adding more CPUs provides diminishing returns.
NUMA
SMP computers use a memory sharing scheme in which each processor has the same level of access to all the physical memory in the computer. Such a scheme is known as uniform memory access, or UMA. NUMA (non-uniform memory access) is a more complex technique which allows several processors in a multi-processor computer to share local memory in a more efficient manner than in simple SMP. Each CPU has direct fast access to a single memory area but can access other memory areas on the system with less immediate access. The basic idea of NUMA is to give certain processors an advantage in accessing a given range of physical memory. You can think of a NUMA machine as a sort of intermediate step between simple SMP machines and massively parallel systems. Access to any part of the memory is possible on a NUMA system; it just may take more time to access some memory addresses than others. However, the time to access the non-local memory will still be faster than accessing disk or network I/O. The system bus on a NUMA machine is quite complicated. It is often implemented as a mesh, with many connections to the bus. Coherency is also
What Is a Cluster?
a major issue. You may see the term ccNUMA, which indicates that the system maintains cache coherency. When a CPU is accessing memory, the cache internal to all the other processors must be checked to make sure that they have not modified the data that is being retrieved. NUMA systems try to optimize the main issue with parallel computing: interprocessor communication. In clusters and massively parallel systems, the overhead of communicating between processors is quite high, because the communication must travel across a network of some sort. NUMA uses a highspeed memory bus to communicate via the shared memory. While the speed of accessing non-local memory is not as high as that of a local memory access, it is much higher than communicating over the network. NUMA machines scale very well to a large number of processors -- thus they can sometimes rival the performance of massively parallel systems for calculation throughput. The downside is that, as you might imagine, the design of these machines involves extremely complex algorithms based on nano-split second timings and arbitration schemes. Thus they tend to be rather expensive machines. However, they have a great advantage -- from the perspective of the application software -- all the complex memory arbitration among processors is invisible. Massively parallel systems are blinding fast but almost require a per-problem configuration of the machine to take advantage of the speed. NUMA trades off some efficiency for simplicity of development tools and transparency of resources.
MPP
Massively parallel processing (MPP) is the heavyweight of the parallel computing world. In the MPP model, each node consists of a separate processor with its own dedicated resources. The idea of an MPP system is to break a computing problem down into parts that can be separately computed more or less independently of each other. Likewise, the architecture of the system has units that are fairly independent. Massively parallel systems are
TurboLinux Cluster Server 6 User Guide 2 -5
Clustering Concepts
usually used for high-end compute-intensive operations. For example, the current record holder as the worlds fastest computer is an MPP system used to create a mathematical model to simulate a nuclear blast. MPP is very closely related to clustering, but each node in an MPP system does not usually have full I/O capabilities. Thus each node in an MPP system may not be a viable stand-alone computer. An MPP system is usually larger than a typical cluster, but projects such as Beowulf are definitely blurring the distinctions. One of the problems with MPP is that programs must be written specifically for parallel systems. (This is also a problem with some types of clusters, including Beowulf.) There are two common APIs that are used: PVM and MPI. These APIs concentrate on breaking down a problem into chunks that can be computed in parallel. Thus, if the problem to be solved cannot be broken down in this way, an MPP system will not be of much help.
Distributed Processing
Distributed processing is probably the least well-defined of all the terms we have covered here. Distributed processing basically means that parts of the work to be done are done in different places. The most common example of distributed processing is the client/server architecture. The server has a specific job to perform, while the client performs another portion of the task, generally the task of displaying the information to the user. A distributed system is more loosely coupled than a cluster. In fact, it is usually difficult to see any coupling at all. There generally isnt any single entity that would be managed as a whole. With distributed processing, nodes retain their individual identity, while cluster nodes are usually anonymous. In a distributed processing system, you would say, give me data X from server Y. In a cluster, you would say, give me data X from the cluster.
2-6
Components of a Cluster
Components of a Cluster
There are two primary types of systems that make up a cluster: nodes and managers. The cluster nodes are the systems that provide the processing resources. The cluster manager or managers provide the logic that binds the nodes together to provide the appearance of a single system.
Cluster Nodes
Cluster nodes do the actual work of the cluster. Generally, they must be configured to take part in the cluster. They must also run the application software that is to be clustered. Depending upon the type of cluster, this application software may either be specially created to run on a cluster, or it may be standard software designed for a stand-alone system. TurboLinux Cluster Server and TurboLinux EnFuzion both allow the use of software written for stand-alone systems. Configuring the software to be used within the cluster is usually pretty straight-forward. We will sometimes refer to cluster nodes simply as nodes, servers, or server nodes.
Cluster Manager
The cluster manager divides the work amongst all the nodes. In most clusters, there is only one cluster manager. Some clusters are completely symmetric and do not have any cluster manager, but these are more rare today. They require complex arbitration algorithms and are more difficult to set up. In TurboLinux Cluster Server, the cluster manager is referred to as the Advanced Traffic Manager, or ATM. Cluster Server provides fail-over for the ATM so that there is no single point of failure. If the primary ATM goes down, a backup ATM will be able to fill in and take its place.
Clustering Concepts
Note that a cluster manager may also work as a cluster node. Just because a system is dividing the work does not mean that it cannot do any of the work itself. However, larger clusters tend to dedicate one or more machines to the role of cluster manager, because the task of dividing the work may take more computational power. It also makes it a bit easier to manage the cluster if the two roles are isolated.
2-8
Types of Clusters
Types of Clusters
As you saw in the previous section, the definition of a cluster is pretty loose. So loose in fact, that there is some confusion about how differing technologies can all be referred to as clusters. The fact is that clusters can be implemented for several different reasons. The most common reasons to create clusters are to pool CPU resources, balance a workload among several machines (load balancing), create high system availability, or provide a backup system in case the primary system fails (fail-over). These represent different types of clusters, although there is quite a bit of overlap. TurboLinux Cluster Server can be used to implement high availability, load balancing, and fail-over. It does not provide shared processing in the usual sense of the term. Instead, it provides load balancing of network services. Each server receives incoming network service requests, processes the requests, and sends the reply back to the client.
Shared Processing
When you hear the term Linux clustering, the first thing you probably think of is the Beowulf project. Beowulf is a clustering system that combines the processing power of several systems to provide a system that has a large amount of processing power. It was designed for scientific and CPU-intensive purposes. Programs must be specially written to conform to an API that allows them to have their work distributed across systems. You can get more information on Beowulf at http://www.beowulf.org/. Cluster Server does not provide this type of clustering. Another package that can be used to provide shared processing is EnFuzion. This TurboLinux product has the advantage that programs do not have to be re-written in
2 -9
Clustering Concepts
order to be used on the system. Instead, it is more of a task-based processing system. You can find more information about EnFuzion at its web site: http:// www.turbolinux.com/products/enf/.
Load Balancing
Load balancing is similar to shared processing, but there is no need for communication between the nodes. With load balancing, each node processes the requests it has been given by the cluster manager. The cluster manager will distribute the requests in some manner that attempts to distribute the workload evenly among all the systems.
Fail-over
Fail-over is similar to load balancing. However, instead of requests being distributed among all the cluster nodes, one system processes all the requests. Only when that system goes down will one of the other systems in the cluster take over.
High Availability
While it would be desirable to have all computers working all the time, the reality is that computers do sometimes go down. In some situations this is merely a nuisance, but in others it can be devastating. Therefore computer companies have devised methods of increasing the availability of systems. High availability is a method by which system resources are kept available as often as possible. Clustering provides a convenient way to do this. Instead of paying exorbitant costs for hardware redundancy, multiple systems can be clustered together to provide the needed resources. If one of the systems fails, the others can take over the workload.
2-10
Types of Clusters
High availability can be implemented with either hardware or software. Hardware systems are usually more expensive, but software solutions are generally not cheap either. The more reliability you require, the more you will end up paying. Availability is often measured in percentage of uptime. A typical server may be up 99% of the time, whereas a system designed for high availability may be up 99.99% of the time. This is often referred to as four nines availability. High availability can be achieved using either load balancing or fail-over.
2-11
Clustering Concepts
How a Cluster Works

The cluster manager is the core of the cluster. It makes the determination of how work is to be divided among the cluster nodes. The cluster manager divides up the workload and sends a piece of the workload to each cluster node. The cluster node then processes that piece of work. It either sends the result back to the cluster manager, or it sends the result directly to the client that requested the result.
Traffic Management
For the service-oriented clustering that TurboLinux Cluster Server implements, the workload management is called traffic management. This is because the work to do is to respond to incoming network service requests. The cluster manager must direct network traffic amongst all the cluster nodes. In this way, it acts much like a traffic cop. The traffic scheduling algorithm used by TurboLinux Cluster Server is called modified weighted round-robin. This mechanism tries to ensure that traffic is distributed evenly among all the nodes in the cluster, proportional to the amount of workload that each cluster can handle. Each server is assigned a weight to specify its performance relative to the other systems. The scheduling algorithm is further enhanced to support client persistency. When this feature (also called the sticky bit) is enabled, a specific client will be bound to a particular server within the cluster. Some services such as SSLenabled services require authentication each time a new client connects to the server. Without persistency, each time the client connects to a different server within the cluster, the user is prompted to re-enter their password.
2-12
How a Cluster Works
Cluster Server provides three different ways to forward traffic from the cluster manager to the nodes. These are: Direct forwarding Tunneling NAT
Direct Forwarding
Direct forwarding can be used when the ATM and the cluster node are attached to the same network segment or subnet. Packets forwarded using this method are sent directly to the MAC address of the cluster node. The IP packet is not modified at all; the cluster node will see it exactly as it arrived at the ATM. This is the preferred method, because it is the fastest and has the least overhead. The direct forwarding method also has the advantage that outbound traffic (responses being returned to the client) does not need to be sent through the ATM; reply packets are sent directly out to their destination.
Tunneling
If a cluster node is not located on the same segment as the ATMs, you can use the tunneling forwarding mechanism. Tunneling is a way to encapsulate IP packets within other network traffic. It is used to make a virtual direct connection between two systems. With this point-to-point connection, you can be sure that the packet will arrive on the cluster node via the virtual connection. The tunneling method only works with Linux and UNIX systems. It uses the IP-IP kernel module to create the point-to-point connection between the traffic manager and the cluster node. The kernel in use on the cluster node must be configured to have IP tunneling support. The kernel supplied with TurboLinux Cluster Server has this support built in, and the Cluster Server
2-13
Clustering Concepts
daemon can automatically configure both ends of the link for you. You can also set up the tunnel interfaces yourself, establishing the point-to-point connection by hand. The encapsulation process introduces some overhead that will reduce performance somewhat as opposed to the direct forwarding method. Like the direct forwarding method, outbound packets do not need to be sent through the ATM; they will be sent directly from the cluster node to the client.
NOTE
The IP tunneling used in Cluster Server is not encrypted, so it is possible for others to intercept any packets traveling from the traffic manager to the nodes. If you need to add nodes that are outside your LAN, you should implement a Virtual Private Network (VPN) in order to secure data transmission.
NAT
NAT is an abbreviation for Network Address Translation. It is often used to hide a private network behind a firewall connected to the Internet. Defined in RFC 1631, NAT was designed to help mitigate the rapid depletion of the IP address space. The NAT box sits between the private network and the public network. It modifies outbound packets from the private network to make them appear to have come from the NAT box itself. When packets are sent to the NAT box, it determines which system on the internal network the packet should go to. It normally does this by keeping a table of connections that have been initiated. For each connection made by a client on the private side, the table directs replies to be sent to that client. The version of NAT used by the ipchains package on Linux is sometimes called IP masquerading. If the operation of NAT sounds familiar, thats because it works much like a cluster traffic manager. Although NAT is normally used to hide client systems,
How a Cluster Works
it is used to hide servers when used in a cluster. This difference is important, because it changes the way the connection table is used. In TurboLinux Cluster Server, the NAT method uses the same connection table that is used by the other two traffic forwarding methods. NAT simplifies configuration, because you do not need to make any special configuration changes to the cluster nodes themselves. All you have to do is make sure that the cluster nodes are on the internal subnet, and have their default gateway set to the NAT gateway address defined in the cluster configuration file. NAT also provides some added security, because the cluster nodes cannot be accessed directly from the outside. The downside is that NAT has slightly reduced performance, because all outbound traffic must go through the NAT box and the address translation process. NAT cannot be used with some network services. For example, FTP cannot be used with NAT because it uses two separate TCP connections on different ports. Other services cannot be used if they include IP addresses or port numbers within the high-level portion of the protocol. See RFC 1631 for more details.
2-15
Clustering Concepts
Cluster Management
Managing a cluster is a bit more complicated than just managing all the systems in the cluster. You must maintain each server as well as the system as a whole. Cluster management concentrates mainly on the cluster manager. Thats where all the interesting functionality is implemented. Cluster management primarily involves monitoring the performance of the cluster. You need to monitor each system as well as the whole cluster. If an individual system is overloaded, you can adjust the cluster configuration so that it doles out less work to that system; or there may be some configuration issue with that particular server. You should also monitor the performance of the cluster as a whole. If all the cluster nodes are heavily loaded, you may want to add an additional node or two to scale up the performance. Another important aspect of managing a cluster is making sure all the systems are running the same software and using the same content. TurboLinux Cluster Server comes with some synchronization tools to help you replicate content, so that all the servers are consistent.
2-16
Shared Data Storage
Shared Data Storage

In order for two or more systems to provide the same access to the same data, they must have some way to share that data. This is actually a much more difficult thing to do than would appear at first glance. If the data changes frequently, there must be some way to keep all the systems synchronized. This section looks at some software and hardware solutions that can be used to share data.
Software
The easiest shared storage mechanisms are done through software. Unfortunately, the hardware solutions are more powerful and robust, but in many instances you will be able to use a simple software method to share data.
Synchronization
The most basic way of sharing data is by copying the data in question to each server. Of course, this will only work if the data is changed infrequently, and always by someone with administrative access to all the servers in the cluster. TurboLinux Cluster Server comes with two synchronization tools. One is used to synchronize the configuration of the servers. The other is used to synchronize content. These tools can be run directly or accessed through the turboclusteradmin program. They will be covered in detail in chapter 7. If you can use the synchronization tools to maintain data consistency, you will probably find them to be the easiest solution. They provide you with data redundancy without the need for any complex administration. There are other replication methods available for data. One of the more common replication systems coming into use is the Lightweight Directory Access Protocol (LDAP). With LDAP, you can keep a database that is
2-17
Clustering Concepts
replicated across several systems. This provides a database system with redundancy and reliability, and is relatively easy to set up. LDAP is not a general-purpose database, and does not implement SQL. It is intended as a directory of network information and is object-based. However, you may find that it can be adapted to fit your needs.
Distributed File Systems

If your data changes too frequently to do manual synchronization, you should consider using a distributed file system. Your options here include NFS, AFS, DFS, Coda, Intermezzo, and GFS. UNIX and Linux systems typically use NFS to share data over the network. NFS is a well-known system and is easy to configure as a server or as a client. However, NFS has many problems. It does not have very good security and has no provisions for replicating the data to multiple systems. Thus, if you use NFS, you will most likely still have a single point of failure, which may be one of the reasons you wanted to create a cluster in the first place. Several newer distributed file systems have been developed to overcome the shortcomings with NFS, but none of them have become significant enough yet to replace NFS. One alternative that has much in common with NFS while replacing its broken authentication mechanism is the Andrew File System (AFS). AFS is an outgrowth of the Andrew Project at Carnegie Mellon University in Pittsburgh. AFS is licensed commercial software. The most important aspect of AFS is its secure authentication mechanism, based on the Kerberos protocol. AFS has a number of other performance, usage, and administration enhancements that make it preferable to NFS, even in secured areas. Closely related to AFS is Transarcs Distributed File System (DFS). Both are available commercially from Transarc. DFS is an enterprise-level shared storage solution with sophisticated replication and load balancing
2-18
Shared Data Storage
capabilities. A key design goal in DFS is transparency across domains and networks within an enterprise, allowing for easy centralized administration. The Coda file system is an Open Source distributed file system that now comes with the Linux kernel. Coda is an attempt to create a system much like AFS, with some more modern features as well. It attempts to fix some of the availability problems by providing disconnected operation, server side replication, continued operation during partial network failures, and scalability and bandwidth adaptation features. Intermezzo is another Open Source distributed file system. One of the advantages of Intermezzo is that it sits in a layer above the native file system, allowing you to use any native file system to store the data. It is more aware of modern computing environments and equipment capabilities than Coda. Like Coda, it stresses high availability, large scale replication, and disconnected networks. Intermezzo is still in the beta stages of development at the time of this writing. You can check it out at http://www.inter-mezzo.org/. One of the best distributed file system solutions is the Global File System (GFS). This solution requires hardware support in addition to the file system software. The hard drives must be directly attached to all the systems participating in the file system (i.e. all the nodes in the cluster). This can be done using either double-ended SCSI or fibre-channel.
Hardware
Most high-end shared storage systems are hardware based. The two primary technologies used are Storage Area Networks (SAN) and Network Attached Storage (NAS). Solutions can also be implemented using fibre-channel and double-ended SCSI chains.
2-19
Clustering Concepts
Storage Area Networks

A Storage Area Network (SAN) is a highly fault tolerant, distributed network in itself dedicated to the purpose of providing absolutely reliable data serving operations. Conceptually, a SAN is a layer which sits between application servers and the physical storage devices, which themselves may be NAS devices, database servers, traditional file servers, or near-line and archival storage devices. The software associated with the SAN makes all this back-end storage transparently available and provides centralized administration for it. The main distinguishing feature of a SAN is that it runs as an entirely separate network, usually employing a proprietary or storage-based networking technology. Most SANs these days are moving towards the use of fibre-channel. It should be clear that implementing a SAN is a non-trivial undertaking. Administering a SAN will likely require dedicated support personnel. Therefore SANs will most likely only be found in large enterprise environments.
Network Attached Storage

A NAS device is basically an old fashioned file server turned into a closed system. Every last clock cycle in a NAS device is dedicated to pumping data back and forth from disk to network. This can be very useful in freeing up application servers (such as mail servers, web servers, or database servers) from the overhead associated with file operations. Another way to think of a NAS device is as a hard drive with an Ethernet card and some file serving software thrown on. The advantage of a NAS box over a file server is that the NAS device is self-contained and needs less administration. Another key aspect is that a NAS box should be platform independent. As an all-purpose storage device, a NAS box should be able to transparently serve Windows and UNIX clients alike.
2-20
Shared Data Storage
High Speed Drive Interfaces

Fail-over clustering would not be practical without some way for the redundant servers to access remote storage devices without taking a large performance hit, as would occur if these devices were simply living on the local network. Two common solutions to this problem are double-ended SCSI and fibre-channel. Double-ended SCSI, also known as differential SCSI, exploits a redundancy in the design of SCSI to allow longer SCSI cables and thus make practical high speed outboard storage devices. On a single-ended SCSI cable, every other signal line is actually grounded. Double-ended SCSI uses these redundant ground lines to carry the same signal as the adjacent signal line, with the voltage inverted. The net effect is a signal with twice the strength and thus a much longer potential cable length, up to several feet, without signal loss. Double-ended SCSI suffices when the computers using the external device are more or less adjacent. Fibre-channel interfaces actually use fiber optic cables to carry the encoded SCSI signals via laser light, in much the same way that high speed network interfaces do. These have essentially unlimited local range (up to 6 miles) at high bandwidth and are a key technology in implementing SANs. Of course they are quite expensive in comparison to strictly local interfaces.
2-21
Clustering Concepts
2-22
Chapter 3
I NSTALLATION
This chapter will show how to install TurboLinux Cluster Server. The installation program is pretty simple and will guide you through the process. Once you have installed the product, you must configure it before it can be used in a cluster. Configuration will be covered in the next chapter. In this chapter we will discuss: Installation overview Installing Cluster Server Post-installation Troubleshooting installation issues
Be sure to perform a complete system backup before attempting to install TurboLinux Cluster Server. Like any software installation, there is a small possibility that something could go wrong and corrupt data on the system.
NOTE
3-1
Installation
Installation Overview
TurboLinux Cluster Server must be installed on every primary and backup ATM within the cluster. Although it does not need to be installed on every cluster node, we recommend that you install the software on every system in the cluster. Running Cluster Server on all the nodes will greatly simplify the amount of configuration and maintenance work you will have to do. You will not have to configure the systems individually if they are running Cluster Server, because the daemon will automatically perform the configuration for you. In addition, the content on systems running Cluster Server can be easily synchronized. Without Cluster Server on the nodes, you will likely have to manually synchronize any content to ensure that the cluster remains consistent. Cluster Server is provided on a CD-ROM. If you do not have a CD-ROM drive on each system in the cluster, you can mount the CD on one system and export it using NFS or some other shared file system. Then mount the network share on the other systems to perform the installation. Once you have the CD-ROM mounted, either locally or from a network share, you can change to the directory containing the software and start the installation program. The program will guide you through the process step by step. In most instances you will be able to choose the defaults and press ENTER to continue on to the next step. When the installation is complete, the program will prompt you to reboot. Make sure that you do not have any other applications with unsaved data running on any other consoles. Press ENTER to reboot. The system will shut down cleanly and reboot.
3-2
Installing Cluster Server

Installing TurboLinux Cluster Server is simple if you follow these steps and allow the installation program to guide you through the process. As with any software installation, you will need to be logged in as root to perform these steps. 1. Mount the CD-ROM:
# mount /mnt/cdrom
2. Change to the directory that the CD-ROM is mounted on:

# cd /mnt/cdrom
3. Read any related documentation and release notes, especially the README and RELEASE.NOTES files. (You can also read these files from within the TLCS-install program -- they are accessible via the main menu.) 4. Start the installation program.
# ./TLCS-install
The installation program first determines what Linux distribution it is running under. The currently supported distributions are TurboLinux Server and Red Hat Linux. If the installation program is unable to detect a supported system, it will exit. You can tell the installer which distribution you have by specifying redhat or turbolinux at the command prompt:
# ./TLCS-install turbolinux
There is a test mode available via the --test or -t option, which will not actually install anything, but will instead validate that all the prerequisites exist in order to install successfully. There is also help available with -help or -h, which gives you the syntax and options available.
3 -3
Installation
5. The welcome screen will appear. Press ENTER or click OK to continue.
Figure 3.1 Installation Welcome Screen 6. Read the entire license when it appears before you continue with the installation. You can use the cursor keys to scroll through the text. Once you've read the license, you can click I agree to continue. If you choose
3-4
not to agree with the license, clicking Exit will exit the installation program and return you to the prompt.
Figure 3.2 License Agreement 7. After you agree to the licensing terms, the program will attempt to determine what distribution of Linux you are running. If it is successful, it
3 -5
Installation
will display the name of the distribution along with the kernel version, as shown in the figure below.
Figure 3.3 Detected Kernel Version and Distribution Click OK or press ENTER to continue. 8. This brings you to the installation menu. Your choices here are the guided install, installation of the modified kernel, installation of the libraries and
3-6
utilities, and LILO configuration. You can also access the documentation files from this menu. The menu is pictured below.
Figure 3.4 Installation Menu You should choose Guided Install, which will walk you through the process and install all the necessary pieces. The other options are primarily used to install portions of the product at a later time, or if something goes wrong. The guided install just takes you through each section in turn. 9. Starting the guided installation will begin by warning you that the kernel will need to be replaced. Installing a new kernel could potentially render your system inoperable. (The original kernel will still be available -- just choose linux at the LILO prompt.) Make sure that you have backed up any important data before proceeding. Click Yes to continue, or No if you need to exit and back up the system.
3 -7
Installation
10. At the next screen, choose the kernel you would like to install. The program will do its best to choose kernels that are newer than the one you are running but have a similar configuration.
Figure 3.5 Choosing the Kernel to Install Unless you have a really good reason, you should choose the newest version listed. If there is no suitable kernel, check the TurboLinux web site to see if there is one available for download that will fit your needs. Otherwise you will have to compile and install a custom kernel. This procedure will be covered in chapter 8.
NOTE
If you are running a 2.0 kernel, you should upgrade to a 2.2 series kernel before installing TurboLinux Cluster Server. Upgrading from 2.0 to 2.2 is a major undertaking, and you should be comfortable with those changes before you install Cluster Server. If you are running a 2.4 kernel, you will need to check the TurboLinux Cluster Server web page to see if there is an acceptable kernel available for download.
3-8
Once you have chosen the appropriate kernel, click Proceed to continue the installation. 11. Next you can choose which pieces of the kernel to install. Unless you are running low on disk space, accept the default, which will include all the extra pieces.
Figure 3.6 Kernel Packages You will definitely want to include the base kernel package and the extra kernel utilities. The kernel sources are required if you want to rebuild the kernel at a later time. The header files are required if you want to build any software on the system. You can probably uncheck the support for PCMCIA and iBCS. PCMCIA is a hardware interface mostly used with notebook computers. It is unlikely that you will need PCMCIA support on a server. The iBCS module allows you to run programs that conform to the Intel Binary Compatibility Standard. It allows you to run portable binaries that were written for SCO and other Intel-based UNIX systems. If you dont have any such programs, it is not required.
3 -9
Installation
Click Proceed once youve selected the kernel packages to install. The kernel and additional modules will be installed. This may take a minute or two. 12. After the kernel has been installed, the installer will present you with the administrative tools available. Accept the default, installing all of the listed packages.
Figure 3.7 Package Installation Menu These packages provide the functionality of the Cluster Server as well as several administration tools. Here is a brief overview of what they do: The Cluster Management Console is a web-based tool that allows you to monitor and modify the cluster. It will be covered in chapter 7 of this manual. The Cluster Server daemon is a key component of the Cluster Server software. Do not uncheck it unless you are certain that it has already been installed. The Cluster Agents (also called ASAs) allow you to monitor different services on the cluster nodes. They will be discussed in chapter 8. You
3-10
should install the cluster agents so that the cluster daemon can determine when a service on a cluster node becomes unavailable. The TLCS Administration tools include the menu-driven configuration programs. Unless you want to do all of your configuration by hand, be sure to install these. Once you've selected the packages, click Install, and the specified packages will be installed. Note that some additional packages, which are required by the selected packages, will be installed as well. The installation process may take a couple minutes. 13. Next you will be asked to enter some information in order to generate a secure key for the CMC web-based management system. Click OK to proceed to the data entry screen. 14. The screen will ask you to enter various information about your organization. Enter the information as requested. Click the Generate button, and an SSL authentication certificate will be generated for you. Click OK to proceed.
Figure 3.8 Information for Generating SSL Certificate
3-11
Installation
15. The next step is to configure LILO. LILO is the Linux Loader, a boot program that loads the kernel and gets the system started. Since we added a new kernel, we need to tell LILO how to load it.
Figure 3.9 LILO Setup Click on the Proceed button. The LILO configuration program will make the necessary configuration changes. These changes will add the ClusterServer image to the list of bootable operating systems, and make it the default system to load. You will be able to boot previous images by typing the name of the image at the LILO prompt. Typically, the original image will be labeled linux. Press TAB at the LILO prompt to get a list of images.
3-12
16. Once LILO has done its work, the installation process is complete. Click on the Finish button.
Figure 3.10 Finished Installation 17. Unless you skipped the kernel installation and LILO configuration steps, you will be prompted to reboot the system so the new kernel can be loaded. Make sure that you do not have any other applications with unsaved data running on any other consoles. Click Reboot or press ENTER to reboot your system. If you did not install the kernel or modify LILO, you will be returned to the shell prompt.
3-13
Installation
Post-Installation
The last step of the installation program requires you to reboot the system. This step is required, because Cluster Server requires a modified kernel to implement some of the functionality. The new kernel can only be started by rebooting. Pressing ENTER when prompted will shut the system down cleanly and reboot. When the system comes back up, the LILO prompt will give you a new selection. (To see all the choices, press TAB when the LILO prompt is shown.) In addition to the linux image, there will be a selection labeled ClusterServer. The linux image loads the version of the kernel that had been in use prior to the installation. The ClusterServer image loads the new kernel, which has the modifications that are required to run TurboLinux Cluster Server. If you do not choose an image to load, the ClusterServer image will be loaded by default. This will allow the Cluster Server product to run on the system. After you have rebooted with the newly-installed Cluster Server kernel, you will need to configure the cluster. Configuration is covered in the next chapter. The clustering software cannot be used until it is configured. Configuration entails two primary steps: designing the cluster and entering the details via the turboclusteradmin program.
3-14
Troubleshooting Installation Issues

The installation program is pretty straight-forward and does a good job of detecting your system settings and making the appropriate decisions. However, there are a few problems that you may run into. In this section, we will look at some of those problems.
Unable to Find Installation Files

The first problem that you may run into is the installation program not being able to find the files it requires to complete the installation. This will happen if you try to run the installation program without changing to the directory containing the installation program. To resolve the problem, simply change to the directory containing the installation CD. For instance, if the Cluster Server installation CD is mounted on /mnt/cdrom, you need to perform the following commands:
# cd /mnt/cdrom # ./TLCS-install
Undetectable Distribution
The Cluster Server installation program may be unable to determine what version of Linux you are running. This may happen if you have changed some
3-15
Installation
of the system files on the computer. You will get a message stating that the program was unable to detect your distribution:
Figure 3.11 Undetectable Distribution If this happens, you will need to specify the name of your distribution at the command line:
# ./TLCS-install turbolinux
or
# ./TLCS-install redhat
Installing on an Unsupported Distribution

Although it is not supported by TurboLinux, it is possible to install Cluster Server on other Linux systems besides TurboLinux Server and Red Hat. You basically have two choices on how to go about the installation:
3-16
1. Convince the installation program that you are running either Red Hat or TurboLinux. Specify redhat or turbolinux at the command line when you start the installation program. While this will force the installation program to run, it wont guarantee that the packages will all install or work successfully. 2. Install the packages manually. Locate the RPM packages on the CD and install each one by hand. You can find them in the usr/support and usr/tlcs directories. Use the rpm command to install each package. If you use this method, you will also have to run the usr/TLCS-install/TLCS-GenCert script on the CD-ROM to generate an SSL certificate for CMC. Either way, you will probably have to build a custom kernel. This is covered in chapter 8.It is highly recommended that you use one of the supported Linux distributions instead of attempting to coerce Cluster Server to run on a different system. If you dont have a system on which to install a different distribution, consider purchasing a new machine to put it on. With the low cost of hardware and the Linux operating system, it will be well worth it to use a supported configuration.
NOTE
Remember that TurboLinux cannot offer any technical assistance for distributions other than those that are officially supported.
3-17
Installation
3-18
Chapter 4
C ONFIGURATION
This chapter introduces the TurboLinux Cluster Server configuration tools and provides instructions on how you can configure your cluster. Due to the flexibility of Cluster Server and the different needs that it can address, there is no standard or default configuration. Each cluster will be configured differently, depending upon how it is used and the resources available to build it. This chapter will teach you how to use the configuration tools and provide the detailed knowledge necessary to design and configure a working cluster. The topics covered in this chapter are: Planning the design Configuration tool overview Services Servers Advanced Traffic Managers Clusters Global Settings
4-1
Configuration
Planning the Design

It is very important to plan the design of your cluster before you begin configuring it. You will need to decide which systems will be your ATMs and which will be your nodes. You may also need to decide which services you want to run on the cluster, although that decision has probably already been made before the purchase of the Cluster Server software. One helpful tool that may assist you in designing your cluster is to draw a diagram of the cluster. Typically you put ATMs at the top of the diagram, with nodes hanging down from the ATMs. This represents the fact that traffic must flow through the primary ATM before reaching the cluster nodes. You may want to include host names and IP addresses in your cluster diagrams to make it clear which server is which. (A single system can be both an ATM and a node. Its best to put these dual-purpose systems at the top of the diagram when trying to conceptualize the cluster.) It is recommended that you start out with a simple cluster and work your way up to a more complex configuration, if that is what you require. A smaller cluster is easier to troubleshoot, and you can more easily debug any problems that come up. After you have set up the small cluster and tested it, you can incrementally add ATMs and nodes until you reach your desired configuration. Cluster Server provides you with this flexibility due to its scalability features.
Typical Scenarios
Most clusters will probably be used as web servers. Web traffic has increased substantially in the past several years, and many sites need added reliability, availability, and scalability. While web servers are a prime candidate for clustering, other services can benefit just as well. The design of a cluster is
4-2
Planning the Design
based more on performance considerations and hardware availability than the particular service that will be running on the cluster. This chapter will walk you through the steps required to configure a cluster. The cluster used as an example is of medium complexity, covering all the basic configuration options. It will not be a typical configuration, because we need to cover every available option.
Small Cluster
The smallest cluster possible consists of only two systems. One system will act as the primary ATM and as a cluster node. The other system will act as a node and probably as a backup ATM. By having each system work as both an ATM and a node, you make the most effective use of the resources available. In the example below, we have configured one system as the ATM and another as a node. Note that the node can send reply traffic without going through the ATM.
C lien t In tern et
eth 0
1 .2 .3 .9 (V irtu al IP A d d ress)
eth 0 :cs0 1 .2 .3 .1 0 0
ATM
N ode 1 (D irect)
Figure 4.1 Simple Cluster with One ATM and 1 Node
4 -3
Configuration
Larger Cluster
Larger clusters will dedicate one or two systems as ATMs, with all other systems acting as dedicated cluster nodes. The primary ATM will handle all the traffic forwarding and nothing else. The backup ATM will sit idle unless it detects that the primary ATM has gone down. The cluster nodes will handle only the services. Below is an example of a larger diagram. Note that Node 4 must use tunneling, because it is not on the same subnet as the ATM.
Internet
Client
Primary ATM
Backup ATM
Node 1 (Direct)
Node 2 (Direct)
Node 3 (Tunnel)
Node 4 (Tunnel)
Figure 4.2 Larger Cluster with 2 Dedicated ATMs and 4 Nodes
Complex Cluster
You can come up with some pretty complicated cluster designs. TurboLinux Cluster Server allows you to define multiple cluster addresses and put different sets of cluster nodes in each virtual cluster address. You can even run multiple clusters that share nodes but use different sets of ATMs.
Planning the Design
It is recommended that you read through this manual before designing a complex cluster. You should also start by configuring a simple cluster to get a feel for your options. You dont want to start out with something too difficult and get in over your head.
4 -5
Configuration
Configuration Tool Overview

There are two ways to configure your cluster -- editing the configuration file by hand or using the configuration tools. It is recommended that you use the configuration tools to start with, until you are familiar with all the available options. The configuration tools provide a menu-driven interface that is easy to use and does a good job of explaining what each parameter is used for. The configuration tools take the information that you enter and create a configuration file. They can also parse the file to display the current settings. This chapter covers the configuration tools. The next chapter will cover the format of the configuration file. There are two different ways to get into the cluster configuration tool. The direct route is to use the tlcsconfig program. It will take you directly to the configuration menu. The other way is through the turboclusteradmin program. This tool allows you to access the tlcsconfig program as well as the synchronization tools and some documentation files. Most people will probably use the turboclusteradmin program because it allows you to manage more aspects of the cluster.
turboclusteradmin
The turboclusteradmin program is the primary configuration tool used with TurboLinux Cluster Server. It lets you access three important tools: tlcsconfig, tlcs_content_sync, and tlcs_config_sync. These tools can also be accessed directly from the command line. The turboclusteradmin program provides an integrated configuration tool that
4-6
lets you access these other tools, as well as some of the documentation files that come with the product.
Figure 4.3 turboclusteradmin Main Menu
tlcsconfig
The tlcsconfig program is where the actual configuration takes place. As we stated above, you can run it directly from the command line or select Cluster Server Configuration from the main menu in turboclusteradmin. Once you get into the TurboLinux Cluster Server configuration utility, you
4 -7
Configuration
will be presented with a menu of various subsystems that you can configure. These are shown in the picture below.
Figure 4.4 tlcsconfig Main Menu There are quite a few configuration settings that you will need to go through. The settings are divided into various menus in order to place related items together. You will see the following choices: Clustered Services Servers Configuration Advanced Traffic Managers Virtual Servers Global Settings
We will cover each of these menu choices in the remainder of this chapter. When entering IP addresses and domain names in the configuration program, the addresses will need to be fully resolvable, in both forward and reverse directions. There are two ways to accomplish this -- you can either
make sure that the addresses are configured in your DNS server, or you can enter the addresses in the /etc/hosts file. You should also make sure that 127.0.0.1 is not associated with the hostname of the system. The only name associated with 127.0.0.1 in the /etc/hosts file should be localhost. Here is a sample /etc/hosts file for a system named atm1.turbolinux.usa with an IP address of 192.168.0.1:
127.0.0.1 192.168.0.1 192.168.0.2 192.168.0.3 192.168.0.4 192.168.0.100 localhost atm1.turbolinux.usa atm2.turbolinux.usa node1.turbolinux.usa node2.turbolinux.usa atm1 atm2 node1 node2
cluster.turbolinux.usa cluster
After you have completed the configuration of your cluster, you should use the tlcs_config_sync program to copy the configuration file to other systems within the cluster. You will also need to configure any cluster nodes that are not running Cluster Server software.
4 -9
Configuration
Services
The first set of options that you will want to configure are the network services that will be running on your cluster. To do so, choose the Clustered Services menu item from the main menu of the configuration tool. From here, you will have two pieces to configure: service agents and the services themselves. Well take a look at each of these in this section.
Agents
An Application Stability Agent (ASA) monitors the health of a particular service on the cluster nodes. To set up an agent, follow these steps: 1. From the tlcsconfig main menu, choose Clustered Services. 2. Choose Application Stability Agents from the next menu. Here you will see listed all the currently configured stability agents.
Figure 4.5 Application Stability Agents
4-10
Services
3. From this menu, you can add a new ASA or edit an existing one. You can also remove an agent. 4. To add an ASA, click Add.
Figure 4.6 Adding an Application Stability Agent 5. Enter the following information as requested. i. The Application Stability Agent name is just a name that will be used later to refer to the ASA. It is best to use the name of the service or something similar. ii. The Check with field specifies the program that will be used to perform the ASA service check. Be sure to include the full path to the program. iii. The Event triggered when down and Event triggered when up fields allow you to have a program run when the ASA determines that the service has gone down or come back up. These are not used in most situations, so you can leave them blank. If you do use them, be sure to specify the full path.
4-11
Configuration
One instance in which you would use a down script is if you have some way of restarting the service. The down script can perform whatever is necessary to try to revive the service. Several programs have been supplied with TurboLinux Cluster Server to perform the ASA checks for you. These programs have names ending with Agent. See chapter 8 for a complete list of these agent programs. That chapter also contains extended information about the ASA programs and up/down scripts, including the command-line arguments that they are called with. When you have entered the required information, click OK to return to the list of ASAs. 4. When you are finished entering all the Application Stability Agents that you will need, click Done.
NOTE
There are three pre-defined ASAs that do not need to be configured in the Application Stability Agents menu. These are http, connect, and none. The http ASA checks HTTP 1.1 servers. The connect ASA tests a simple TCP connection. The none setting skips the ASA check altogether; the ATM assumes that the service is always up. These three ASAs may be sufficient for your cluster.
Service Settings
TurboLinux Cluster Server is a service-oriented cluster. You will need to configure each service so the cluster knows which port numbers to listen on and how to handle incoming connections. To set up a service, follow these steps: 1. From the tlcsconfig main menu, choose Clustered Services.
4-12
Services
2. Choose Service Settings from the Clustered Services menu. Here you will see listed all the currently configured services, along with some information about each one.
Figure 4.7 Service Settings The name of each service will be shown, along with the port number and whether it is a TCP or UDP port. There may be a flag indicating that the service is configured to be persistent (sticky) or to fail-over instead of load balance. Finally, the name of the ASA used by the service will be shown. 3. From this menu, you can add a new service or edit an existing one. You can also remove an existing service.
4-13
Configuration
4. To add a service, click Add.
Figure 4.8 Adding a Service 5. Enter the following information as requested. i. In the Service name field, enter the name by which you want to refer to the service. It is best if you use a standard name, such as http or ftp. ii. In the Port number field, enter the port number that the service runs on. If you are not sure what port number a given service should run on, check the /etc/services file. Note that services do not have to run on their default ports. For instance, some web servers run on port 8080, while the default for HTTP is port 80. iii. Choose whether the service uses the TCP or UDP Protocol. Most services use TCP. One notable exception that uses UDP is DNS. This information can also be found in /etc/services. iv. Choose a Stability Agent from the list. If you do not see the ASA you are looking for try scrolling through the list. Also make sure that the ASA was defined per the directions in the previous section.
Services
v. For most services, you will want to Load Balance the workload between all the boxes configured to support the service. If you would like to implement fail-over instead of load balancing, check the Failover box. vi. The final choice you have to make is whether the service should Allow Session Persistency. If this option is enabled, new connections coming from clients that already have existing connections will be sent to the same server. This is useful with services using SSL, such as Secure HTTP (HTTPS), because they would otherwise require the client to send authentication information to each server that they ended up accessing. When you have filled in the service settings, click OK. 7. When you have added all the services that you want to run on the cluster, click Done.
4-15
Configuration
Servers
After you have determined what services will be running on the cluster, you can configure the servers that will be running those services. These servers are the cluster nodes that will process the service requests. Server configuration is divided into two parts: Servers Configuration Server Groups Configuration
Servers Configuration
The first thing youll need to do is create the list of servers that will be used. Heres how: 1. From the tlcsconfig main menu, choose Servers Configuration.
4-16
Servers
2. Choose Servers Configuration again from the next menu. This will show a list of all the cluster nodes that are to be used in the cluster.
Figure 4.9 Servers List Each line in the list has the name of the node, which forwarding method it is using, and a flag indicating if it should be checked periodically using ping. 3. From this list, you can add a cluster node, edit an existing one, or remove a node. 4. To add a cluster node server, click Add. 5. There are three fields to fill in, plus a flag. i. The Server name field is used to give the node a label, so it can be referred to later. It is best to use the hostname of the node. ii. Enter the IP address or fully qualified domain name of the cluster node on the line labeled Full server name or IP. As we noted above, the address must resolve forward and backwards, even if you enter it by IP address. It is probably best to specify the address by name.
4-17
Configuration
iii. Select the Forward method. Your choices are direct, tunnel, and nat. Well cover each of these in more detail below. iv. If, for some reason, you dont want the ATM to check the node periodically to see if it is alive, uncheck the ping to see if alive box. You should normally leave this checked so that the ATM can remove the node from the cluster if it goes down.
Figure 4.10 Server Settings When you have entered the information about the server node, click OK. 5. After you have added information about all the cluster nodes, click Done. Dont forget that a server node can also act as a primary or backup ATM.
Forwarding Mechanisms
The three traffic forwarding mechanisms are direct forwarding, tunneling, and NAT. Each server will have one of these three forwarding methods. The choice of forwarding method is determined by several factors, including the
Servers
location of the system, the operating system, and the amount of configuration you want to perform on the node. The primary ATM actually uses a subclass of direct forwarding called local, no matter which forwarding mechanism it was configured to use. This is because the incoming packets are delivered locally; they will not have to be actually forwarded anywhere if they are destined for the ATM system itself. Except for NAT, cluster nodes must be configured so that they can be used in the cluster. The particulars depend upon which forwarding method is used and what operating system the node runs. For complete information, see chapter 5.
Direct Forwarding
Direct forwarding is the default forwarding mechanism. It has the least overhead, but may require some modification of some settings on the cluster node. When an incoming packet is received from a client, the primary ATM forwards the packet to the cluster node. With direct forwarding, the packet is forwarded just like any other packet would be transferred. When the cluster node server responds to the client request, it sends the response directly to the client. The return traffic does not need to go through the ATM. Cluster nodes using direct forwarding must be located on the same LAN segment an subnet as the traffic manager. Direct forwarding should work with nodes running just about any operating system.
Tunneling
If a cluster node is on a different LAN than the ATM, you should use the tunnel forwarding method. The packet will be encapsulated and sent through a tunnel point-to-point connection between the ATM and the node. The
4-19
Configuration
tunneling method only works with cluster nodes running Linux or UNIX variants that support IP-IP tunneling. The tunneling method incurs a small amount of overhead due to the encapsulation process. Like direct forwarding, responses are sent directly to the client and do not need to go through the ATM.
NAT
NAT stands for Network Address Translation. This method allows you to use systems as cluster nodes without having to do any special configuration on the nodes. This means that almost any system can be used as a cluster node when using NAT. However, extra attention must be given to configuring the ATMs. NAT is the least efficient of the three forwarding mechanisms. There is a larger amount of overhead required to translate the client addresses. Also, return traffic must be routed through the ATM. This can become a serious bottleneck, because service response packets are often much larger than the request packets. For more information on NAT forwarding, see the NAT Settings section at the end of this chapter.
Server Groups Configuration

Server groups provide a way to collect several server nodes into a group in order to add them to the cluster. This is also where you define the services that each server node will perform. Every server does not have to provide every service; you can selectively choose which servers run which services, and even configure some servers to take a larger share of the workload. Server groups are sometimes also referred to as server pools. To set up a server group, follow these steps: 1. From the tlcsconfig main menu, choose Servers Configuration.
Servers
2. Choose Server Groups Configuration from the next menu. This will show a list of the server pools that have been defined. Initially, there will be no server groups, so you will have to create one. 3. Click Add to create a new server group. 4. This will pull up a new form, as shown below.
Figure 4.11 Server Group Settings 5. Skip down to the data entry fields -- well add servers to the pool after we define the server pool settings. 6. First give the server group a name, in the Server pool name box. 7. The Check server section has two parameters that you can set: the Frequency and the Timeout. Both are defined in seconds. The Frequency specifies how often the server nodes should be pinged to see if they are alive. The Timeout value tells how long to wait for a response before assuming that the server node is down. The Timeout must be shorter than the Frequency. You should accept the default values. 8. The Check service section is similar to the Check server section. It also has Frequency and Timeout settings. These settings tell the ATM how
4-21
Configuration
often to run the ASA, and how long to wait for a response before marking the service on the individual node as being down. Again, make sure the Timeout is shorter than the Frequency. 9. You should generally accept the default values for the server and service checks. You may want to fine tune them later to optimize the performance of the cluster. See chapter 7 for more information on tuning the cluster. 10. Now that we have defined the parameters for the server group, we need to add some servers to the pool. 11. Go back to the top of the form and click Add. 12. Choose a server from the list and click OK. The list should contain all the server nodes that you configured in the Servers Configuration menu. Scroll through the list if you dont see the node you are looking for. 13. Another window will pop up, allowing you to select which services the node will run. This list will appear on the left. To the right are buttons to Add, Edit, and Remove services from this particular server. 14. Click Add to add a service for this server. 15. Choose a service from the list. The list scrolls if there are too many to fit in the window. 16. Set the Weight of this server when performing this service. Normally you will leave this at 1 for all servers. However, if you want some servers to receive a larger workload than others, you should give the more powerful servers a higher weight. 17. Click OK to add the service to the server. 18. Add any additional services for the server node you are editing, then click OK. 19. Click Add again to add other servers to the pool. Click OK when you are finished configuring the server group.
4-22
Advanced Traffic Managers

While the server nodes are important to the operation of the cluster, the ATMs are critical. The primary ATM does all the work of forwarding incoming traffic to the cluster nodes. Backup ATMs stand ready to take over if the primary ATM is unable to fulfill its duties. The Advanced Traffic Managers menu allows you to configure the systems that will act as primary and backup ATMs. Note that you cannot necessarily determine which one will be the primary ATM and which will be the backups. The first ATM system to come up will generally become the primary, and all the other ATMs will become backups. If two ATMs come up at the same time, the one listed first in the configuration file will become the primary ATM. There are two parts to configuring ATMs: configuring the systems that are to be ATMs, and configuring the settings.
Figure 4.12 Advanced Traffic Managers Menu
4-23
Configuration
Advanced Traffic Manager Systems

The Advanced Traffic Manager Systems menu lets you define the computers that you want to act as ATMs. The list of traffic managers works like the other lists in the configuration program. There are Add, Edit, and Remove buttons. To add a system as an ATM, click Add, then enter the IP address or full hostname of the system.
Figure 4.13 Advanced Traffic Manager List Dont forget that one system can work as both an ATM and a cluster node at the same time. The list of ATMs and list of server nodes are completely independent of each other.
Advanced Traffic Manager Settings

Selecting the Advanced Traffic Manager Settings menu allows you to set some parameters for the ATMs in the cluster. Well touch on these settings in
4-24
this section, but you can generally accept the defaults when configuring your cluster. For more information on tuning these settings, consult chapter 7. At the top of the screen, you will see all the ATMs listed. This ATM pool is similar to the server pools, except that there is currently no way to create more than one ATM pool -- all ATMs are automatically added to a single pool.
Figure 4.14 Advanced Traffic Manager Settings Following the list of ATMs you will see the ATM settings. Here is what each of these setting mean: 1. The Send ARP delay (frequency) setting tells the primary ATM how often to send out an ARP broadcast. The ARP broadcast lets other systems on the network know that the primary ATM is associated with the virtual IP address of the cluster. 2. The HeartBeat Frequency is the number of seconds between heartbeat broadcasts. The primary ATM generates heartbeats to let the backup ATMs know that it is working.
4-25
Configuration
3. The backup ATMs listen for the heartbeat broadcasts from the primary ATM. If they miss Max. missed heartbeats in a row, the backup ATMs assume that the primary is down. 4. The Number of services, Number of servers, and Number of connections values specify the size of the kernel tables that should be used. The defaults should be okay, unless you are expecting a lot of traffic or have a large number of servers in your cluster. Make sure that these values are larger than the maximum that will be needed for the server to run. The Number of services setting is just like it sounds -- how many different services are defined, as listed in the Service Settings list in the configuration program. The Number of servers is actually the number of server nodes times the number of services that each server handles. The Number of connections value is the most critical. If the number of concurrent active connections ever exceeds this number, all further connection attempts will fail until the number of connections falls below this value. In most instances, the default settings should work for you. For help in determining the optimum values for these parameters, see chapter 7. 5. The Connection timeout setting tells the ATM how long to leave a connection in the connection table after the connection has closed or has not had any packets sent. It is given in seconds. The default value of 300 should be okay for almost all services. You may want to fine tune it if you are only running one or two services that are expected to have shorter connection times. Again, chapter 7 will help you optimize the cluster settings.
4-26
Clusters
Clusters
Once you have configured the services that the cluster will run and the systems that will serve various roles in the cluster, you are ready to set up the cluster itself. Each cluster is defined by an IP address that it manages and redirects traffic for. While most TurboLinux Cluster Server installations will have only one of these virtual IP addresses, it is also possible to manage multiple cluster addresses. The clusters IP address is also called the virtual server address, because it appears to the outside to be a single server, while actually consisting of several systems. It is considered virtual because the IP address used is in addition to the actual IP addresses of each system in the cluster. To set up your cluster, select Virtual Servers from the tlcsconfig main menu. Then continue with the following procedures.
4-27
Configuration
1. The Virtual Servers menu lists all the virtual servers that have been configured. Here you also have the familiar Add, Edit, and Remove buttons.
Figure 4.15 Virtual Servers List 2. To add a virtual IP address for the cluster, click Add. 3. Here you enter the information required to configure the virtual server. The information requested is as follows: i. Enter the IP address or hostname for the virtual IP address in the Virtual hostname or IP field. This is the IP address that will be used when accessing the cluster from the outside. Like all addresses, it must resolve both backward and forward. ii. Enter your email address in the Send e-mail alerts to field. Alerts will be sent whenever an ATM, server node, or service goes down, or if some other fatal event occurs. If you leave this field blank, no email messages will be sent, and you wont know when something goes wrong with the cluster, unless you check the log files.
4-28
Clusters
iii. Choose the Server pool name from the list. This is the group of servers that you defined earlier in the Server Groups section.
Figure 4.16 Virtual Server Settings Click OK when you have entered the appropriate information for the cluster. 4. Click Done after you have configured all the virtual IP addresses for the clusters. (Normally you will just have the one virtual server.) You have now successfully configured the cluster. Next, you must define a few parameters that are global to all clusters.
4-29
Configuration
Global Settings
There are several settings that are global to all clusters running on the ATM. These parameters can be set in the Global Settings section of the configuration program. The following are the global settings that you can configure, as listed in the Global Settings menu: Security Settings Network Settings NAT Settings
Security Settings
The security settings determine which machines can access the remote administration capabilities of the cluster. They work much like the TCP wrappers configured in /etc/hosts.allow and /etc/hosts.deny. Normally, you will want to restrict this ability to the local network, particularly the system from which you plan to do administration. To restrict access to administration of the cluster, follow these steps: 1. From the tlcsconfig main menu, choose Global Settings.
4-30
Global Settings
2. Choose Security Settings from the Global Settings menu. This will show a list of the security rules in force.
Figure 4.17 Global Security Rules 3. Click Add to create a new rule. 4. Enter the IP address of the host or the network that you want to allow or restrict. 5. Enter the network mask. For a single host, enter 255.255.255.255. If you want to allow or deny a whole network, enter the subnet mask for that network. 6. Check allow if you want this host or network to be allowed to administer the ATM. Check deny if you want to prohibit it from accessing the administrative functions. Click OK when you have entered the settings you desire. 7. You can select each rule and use the Up and Down buttons to rearrange the rules. 8. Click Done after you have configured all the security rules you want.
4-31
Configuration
You should always set an Allow rule for the address 127.0.0.1 (localhost) and a Deny rule for every other address. This allows the CMC daemon to change parameters, but nobody else. Thus, your security rules list should look like this:
127.0.0.1,255.255.255.255,allowed 0.0.0.0,0.0.0.0,denied
Network Settings
The global Network Settings menu has only one parameter that you can modify. This is the subnet mask. This pertains to the physical subnet that contains all the ATMs. In most cases this will be 255.255.255.0, but some networks may vary. Check with your network administrator, or simply look at the subnet mask of one of the ATM systems.
Figure 4.18 Network Mask
4-32
Global Settings
NAT Settings
One of the forwarding methods you can use to forward traffic to the cluster nodes is Network Address Translation (NAT). One of the advantages of NAT is that you do not have to modify any settings on the node itself. However, you do need to set some things up on the ATM. This is primarily done in the NAT Settings section of the Global Settings menu. When using NAT forwarding, your ATM machines should have two network cards installed. One network card will be (indirectly) connected to the Internet. The other will be attached to an internal subnet. This internal subnet can only be accessed from the outside through the Network Address Translation of the primary ATM. There are only three NAT settings, but understanding how they work can be a little tricky. 1. The NAT Subnet should be set to a network address range that does not exist on your network. This range of addresses will be used to perform the address translation. Incoming client connections will appear to come from this address range after they have been translated by NAT and sent to NAT-forwarded cluster nodes. It is suggested that you use 10.0.0.0 as the subnet address, unless you have systems that already exist within that address range. This is one of the address ranges that has been reserved for private use. The other reserved network addresses are 172.16.0.0 and 192.168.0.0. Just make sure that there are no actual systems configured with those addresses.
NOTE
Whatever address range you choose for the NAT Subnet, do not use addresses from that range anywhere on your network. The addresses are used internally by NAT. If you use the addresses for actual network resources, you will not be able to access those resources.
4-33
Configuration
2. The NAT Subnet Mask corresponds with the NAT Subnet setting. However, because the NAT subnet is used to map client connections and is not used for any physical systems, the subnet mask is really telling the ATM how many client connections to be prepared for. For an average cluster, you should use 255.255.0.0, which will allow for over 65,000 connections. If you have higher traffic demands, you can increase the number of connections by decreasing the number of bits in the NAT subnet mask. The maximum value you can use is 255.0.0.0, which will provide 16 million connections. (This is for 10.0.0.0; for 172.16.0.0, the maximum is 255.240.0.0, and for 192.168.0.0 it is 255.255.0.0.) However, each potential connection will require several bytes of memory in the kernel, so you may not be able to reserve 16 million entries unless you have a lot of system memory. A more realistic value would be 255.240.0.0, which will give you more than 1 million virtual connections. 3. When using NAT, packets returning to the client must return through the ATM. This is done by setting the default gateway on the NAT-forwarded nodes. They should be set to an address on the ATM. We could use the IP address of one of the network cards on the primary ATM, but that wouldnt work if the ATM went down and a backup ATM had to take over. Therefore, we need to assign a virtual IP address for this gateway, in addition to the virtual IP address for the cluster itself. While the clusters virtual address is used for client access, the NAT gateway virtual address is used to send packets back from the cluster nodes. Because the nodes are on a different physical subnet than the clusters virtual IP address, they cannot use that address as the default gateway. Hence, we must assign another virtual IP address for the NAT gateway. When implementing NAT, the ATM should have two network cards: one attached to the internal network of NAT nodes, and one interface that is accessible to clients on the outside. To choose the NAT gateway address, select an IP address from the internal subnet that is not in use. If you are replacing an existing router, use the internal IP address that the router had. That way, you will not have to reconfigure the NAT-forwarded nodes at all.
Global Settings
Once you have determined an address that you can use, enter it into the Gateway to NAT Subnet field in the form.
Figure 4.19 NAT Settings If you have no nodes that use the NAT forwarding method, you can accept the default values. The NAT subnet and subnet mask will be 255.255.255.255 and the gateway will be set to 0.0.0.0. Figure 4.20 gives an example of a cluster using NAT. The virtual IP address of the cluster is 1.2.3.100, and the NAT Gateway has been set to 192.168.0.100. Note that eth0 and eth1 could be on opposite sides; the daemon will assign the NAT Gateway to whichever interface is already on the internal NAT subnet.
4-35
Configuration
Client Internet
eth0
1.2.3.9
eth0:cs0 1.2.3.100 (Virtual IP Address)
ATM
eth1
192.168.0.9
eth1:cs0 192.168.0.100 (NAT Gateway)
Node 1 (NAT)
IP Address = 192.168.0.1 Default Gateway = 192.168.0.100
Node 2 (NAT)
IP Address = 192.168.0.2 Default Gateway = 192.168.0.100
Figure 4.20 Example of NAT Forwarding
WARNING
If you configure NAT incorrectly, it can cause a large amount of spurious network traffic. It can also cause the ATM to be unable to reach some network resources.
While configuring NAT on the ATM can be somewhat complicated, it eliminates the need to do any special configuration of the cluster nodes. You can simply replace the default gateway (router) that the nodes were using with the ATM. Just configure the ATM to use the default gateways old IP address as the ATMs NAT gateway address. If you are building a cluster from scratch, just enter the NAT gateway address when the systems ask you to configure their default gateway.
4-36
Global Settings
If you want to get really fancy, you can even have the IP addresses and default gateway settings of the NAT nodes assigned via DHCP. Just set up a DHCP server on the NAT subnet, handing out IP addresses to the nodes. If the nodes have different roles, you will have to make sure that the systems always get the same IP address. If all the cluster nodes handle the same services, and all the addresses that are handed out are configured in the cluster configuration file, it doesnt even matter which IP address each system has.
4-37
Configuration
4-38
Chapter 5
C ONFIGURING C LUSTER N ODES
While the ATMs in your cluster must be Linux boxes running the Cluster Server daemon, TurboLinux Cluster Server allows you to use virtually any system as a cluster node. As long as the system can provide a TCP/IP service, it should work. However, using a Linux system as a cluster node will provide you with much more flexibility and will also simplify cluster management. There are three different forwarding methods that can be used to forward packets from the traffic manager to the cluster node. The NAT method is the simplest method -- you do not need any special configuration of the cluster nodes at all, beyond setting the correct IP address, network mask, and default gateway. Direct forwarding and tunneling do require some special setup of the cluster nodes. Since NAT does not require any special setup, this chapter will only cover configuring cluster nodes using direct forwarding and tunneling. Tunneling can only be used with Linux and UNIX cluster nodes, so the sections on Windows NT, Windows 2000, and other systems will only cover configuration using the direct routing method. In addition to the steps covered in this chapter, you will also need to set up the particular network services that you intend to cluster. We will not cover that in this manual, as it varies depending on which services you are
5-1
Configuring Cluster Nodes
clustering and what products you are using. However, the configuration is not any different than it would be if you were to run that service in a stand-alone non-clustered environment. Just remember to configure it to accept service requests on the clusters virtual IP address. We will cover each of these operating systems: Linux and UNIX Window NT Windows 2000 Other Systems
5-2
Configuring a Linux or UNIX Cluster Node

There are two different ways to configure a Linux system as a cluster node: manual and automatic. The easiest way is to install and run Cluster Server on the machine, which will automatically configure the system as a server node. Using this method, you can easily synchronize the configuration with the rest of the cluster. The cluster server daemon will perform all the configuration that is required on the cluster node. This method is the only way that you can have a system act as both an ATM (primary or backup) and a cluster node. Because the TurboLinux Cluster Server daemon only runs on TurboLinux and Red Hat distributions of Linux, you will have to use the manual configuration method for any other distribution. You will also have to set up any UNIX servers by hand. In order to get a cluster node to respond to requests addressed to the clusters virtual IP address, you will need to set up a network interface alias. To do this, you will have to have IP Aliasing configured into the kernel. Most Linux distributions will already have this compiled into the default kernel. An alias is simply an additional IP address attached to the same network interface. What we want to do here is to trick the network interface into responding when it receives a packet forwarded from the primary ATM. Since the ATM does not modify the destination address of packets as they are forwarded to cluster nodes, we must set the alias IP address of the cluster node to the IP address of the cluster itself. (This is also referred to as the virtual IP address of the cluster.)
5 -3
To create the alias, use the ifconfig program. Lets assume that your real IP address is 10.0.0.3, and the virtual IP address of the cluster is 10.0.0.99. If you display the settings for the eth0 network interface, it will look something like this:
# ifconfig eth0 eth0 Link encap:Ethernet HWaddr 00:AB:CD:12:12:3F
IPaddr:10.0.0.3 Bcast:10.0.0.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:608 errors:0 dropped:0 overruns:0 frame:0 TX packets:992 errors:1 dropped:0 overruns:0 carrier:0 collisions:15 txqueuelen:100 Interrupt:10 Base address:0x210
To add the alias, use the following command:

# ifconfig eth0:1 10.0.0.99 up
The :1 tacked on to the end of the eth0 is the syntax that UNIX and Linux use to denote alias addresses on the same physical network card. You can use any set of up to four characters in the alias portion. The characters have no real meaning. Creating an alias has one problem: when another system on the network wants to send a packet to the IP address 10.0.0.99 on the same subnet, it sends an ARP broadcast to determine which computer has that IP address. The machine with that IP address is supposed to answer back with its IP address and corresponding MAC (hardware) address. But if all the nodes in the cluster have the same IP address, they are all going to answer this broadcast ARP message. So we have to tell all of the systems except for the primary ATM not to reply to those ARP requests. We want all traffic destined for the cluster to go through the primary ATM first.
5-4
Part of the solution to this is to create the alias on the loopback interface instead of the Ethernet interface. The loopback interface is a network interface that has no hardware or physical network associated with it. So instead of creating the alias on eth0:1, you would add the alias to the loopback interface (lo) using the following command:
# ifconfig lo:1 10.0.0.99 netmask 255.255.255.255 up
Next you have to turn off ARP replies on the interface. How you accomplish that depends upon which Linux kernel version you are using. On UNIX systems and Linux 2.0 kernels, you can supply the -arp option to the ifconfig command when you bring up the interface. (Note that some UNIX and Linux systems may use a slightly different syntax, such as using noarp instead of -arp.) So in our example, we would use this command to configure the interface:
# ifconfig lo:1 10.0.0.99 netmask 255.255.255.255 -arp
NOTE
When you use this method, make sure that the NOARP flag shows up when you display the configuration using the ifconfig command. Otherwise ARP replies will still be sent, and the cluster will not work properly.
Unfortunately, this method does not work in any Linux kernels more recent than the 2.0 series. For systems running kernel 2.2.14 and higher the -arp option does not work. Instead, you will have to use the /proc filesystem to turn off ARP replies. To do this, echo a 1 to the hidden file in /proc/sys/net/ipv4/conf/all and the hidden file for the interface you are using. Here is an example that will turn off ARP replies on the loopback interface:
# echo 1 > /proc/sys/net/ipv4/conf/all/hidden # echo 1 > /proc/sys/net/ipv4/conf/lo/hidden
5 -5
Tunneling Cluster Nodes

The tunnel forwarding method can be used only with cluster nodes running a UNIX or Linux operating system. The kernel on the node will need to be configured with IP-IP tunneling. On a Linux kernel using modules, the module may need to be loaded, using the command:
# insmod ipip
Once the IP-IP support has been enabled, you can bring up the IP-IP tunnel interface, which is named tunl0. Simply use the ifconfig command, specifying the virtual IP address of the cluster:
# ifconfig tunl0 10.0.0.99 netmask 255.255.255.255 up
Either add the -arp option if it works, or write to the hidden files in /proc:
# echo 1 > /proc/sys/net/ipv4/conf/all/hidden # echo 1 > /proc/sys/net/ipv4/conf/tunl0/hidden
Once this tunnel interface is set up and the ATM is configured to forward traffic to the node using the tunneling method, the cluster node is ready to go.
5-6
Configuring a Windows NT Cluster Node

To configure a Windows NT 4.0 system as a cluster node, you will need to install the Microsoft Loopback Device. To do this, you will need to open the Network control panel. This can be done by going to the Start menu, and selecting Control Panel from the Settings menu. Then open the Network control panel and click the Adapters tab. Click Add and select the MS Loopback Adapter. (See figure 5.1.) Click OK to accept the default frame type and tell NT where your installation files are located.
Figure 5.1 Adding the Loopback Adapter in Windows NT Click Close to have the new adapter added. This will bring up the TCP/IP properties page. Select the MS Loopback Adapter from the drop-down list.
5 -7
Set the IP address to the virtual IP address of the cluster. Set the default gateway as well. The subnet mask needs to be set to 255.255.255.255, but the dialog will not accept that value. So set it to 255.255.255.128 instead. We'll have to change it to 255.255.255.255 in the registry. Click OK on the Properties dialog. When asked to reboot, click No.
Figure 5.2 Configuring the Loopback Adapter in Windows NT Now we have to go into the registry to set the subnet address correctly. From the Start menu, select Run, type in REGEDIT, and click OK. On the left-
5-8
hand side, go to the key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ NDISLoop2\Parameters\TCPIP. Double-click the SubnetMask entry on the right. Change the 128 to 255; do this using the right side, deleting the 128 and entering 255. Be careful that you don't change anything else. Click OK and then close the registry editor. Reboot to ensure that the changes take place.
Figure 5.3 Editing the Registry in Windows NT Once you've rebooted, check the subnet mask in the Loopback adapter's IP settings. It should be 255.255.255.255. If it is not, go back into the registry editor again. Click Cancel to get out of the settings dialog. Clicking OK won't work -- it will say that the subnet mask is invalid. That's why we had to use the registry editor to set it. The Windows NT system is now ready to be used as a cluster node. Of course, you'll have to configure the system to run the appropriate services and set it
5 -9
up to be used in the configuration of the Advanced Traffic Manager. Notice that you do not have to disable ARP replies, as the loopback interface does not reply to ARP broadcasts, because it does not expect any broadcasts.
NOTE
These directions are for the direct forwarding method. If you are using the NAT forwarding method, you only need to set the default gateway on the Windows NT system. Set it to the address that you specified for the NAT Gateway setting on the ATM. Of course, you'll also have to set the IP address of the system to be on the private side of the network.
5-10
Configuring a Windows 2000 Cluster Node

To configure a Windows 2000 system as a cluster node, you will need to install the Microsoft Loopback Device. To do this, you will need to open the Add/Remove Hardware control panel. This can be done by going to the Start menu and selecting Control Panel from the Settings menu. Then choose Add/Remove Hardware. When the Add/Remove Hardware Wizard comes up, click Next to get to the next screen. Select Add/Troubleshoot a device and click Next. The system will attempt to find any new Plug and Play hardware that has been added. Since the Loopback device is only a software device, it won't be detected. Select Add a new device and click Next. When asked if you want to search for new hardware, select No, I want to select the hardware from a list and click Next. Select Network adapters and click Next. On the left-hand side, select Microsoft, and on the right side select Microsoft Loopback Adapter.
5-11
Figure 5.4 Adding the Loopback Adapter in Windows 2000 Click Next until the driver finishes installing, then click Finish to close the wizard. There will now be an additional Local Area Connection listed in the Network and Dial-up Connections control panel. Figuring out which one is the Loopback adapter can be a bit tricky. Usually it will be the highest numbered interface. To be sure, go to the Start menu and select Network and Dial-up Connections from the Settings menu. If you hold the mouse pointer over each Local Area Connection, the driver will be listed in a pop-up window. Select the one that has Microsoft Loopback Adapter listed. This will pull up the status window for that adapter. Click on the Properties button in the status window. If the NetBEUI protocol is listed, remove it; it conflicts with the settings that are required. Select the Internet Protocol (TCP/IP) line and click Properties.
Set the IP address to the virtual IP address of the cluster. Set the default gateway as well. The subnet mask needs to be set to 255.255.255.255, but the dialog will not accept that value. So set it to 255.255.255.128 instead.
Figure 5.5 Configuring the Loopback Adapter in Windows 2000 We'll have to change it to 255.255.255.255 in the registry. Click OK on the two Properties dialogs and Close on the status window. Now we have to go into the registry to set the subnet address correctly. From the Start menu, select Run, type in REGEDIT, and click OK. On the lefthand side, go to the key HKEY_LOCAL_MACHINE\System\CurrentControlSet\ Services\Tcpip\Parameters\Interfaces. There will be several interfaces listed. Look through them all to find the one with the IP address that you set, i.e. the
5-13
cluster's virtual IP address. When you find the correct interface entry, doubleclick on the SubnetMask entry on the right.
Figure 5.6 Editing the Registry in Windows 2000 Change the 128 to 255; do this using the right side, deleting the 1, 2, and 8 digits and replacing them with 2, 5, and 5. Be careful to make sure that you don't change anything else. Click OK and then close the registry editor. Reboot to ensure that the changes take place. Once you've rebooted, check the subnet mask in the Loopback adapter's IP settings. It should be 255.255.255.255. If it is not, go back into the registry editor again. Click Cancel to get out of the settings dialog. Clicking OK won't work -- it will say that the subnet mask is invalid. That's why we had to use the registry editor to set it. The Windows 2000 system is now ready to be used as a cluster node. Of course, you'll have to configure the system to run the appropriate services and set it up to be used in the configuration of the Advanced Traffic Manager.
5-14
NOTE
These directions are for the direct forwarding method. If you're using NAT, you only need to set the default gateway on the Windows 2000 system to the address you specified as the NAT Gateway. Of course, you'll also have to set the IP address of the system to be on the private side of the network.
5-15
Configuring Cluster Nodes on Other Systems

To configure another type of system as a cluster node, there are only a few things youll need to configure. These include the service that is to be clustered, an IP alias, and turning off ARP replies. First, you will need to configure the server to provide the network service that you are trying to cluster. This may involve letting the service know that requests will be coming in on a different IP address. Other than that, the configuration should be similar to that of a stand-alone server. Next add an IP alias with the virtual IP address of the cluster. With most UNIX-like systems, you will be able to do this with some form of the ifconfig command. Check the documentation for specifics. As you saw in the Windows NT and Windows 2000 configurations, you may also be able to use the loopback adapter device as your alias. Finally, make sure that the system does not reply to ARP requests. This will most likely be the tricky part. Check your documentation. If you use the ifconfig command, it will probably have some switch to turn off ARP replies. If you cannot turn off ARP replies, try creating an alias on the loopback interface. To test whether ARP replies are turned off, try pinging the alias address from another machine. If the ping does not go through, then you have successfully turned off ARP replies. If you are unable to configure the system with an IP address alias, or cannot turn off ARP replies, you will be unable to use the direct forwarding method. Use the NAT mechanism instead. Set the IP address of the cluster node to an address within the range of the private side of the NAT network.
5-16
Chapter 6
C ONFIGURATION F ILE
This chapter documents the Cluster Server configuration file. We will take a look at each section of the configuration file and the syntax of each option. TurboLinux recommends that you use the configuration tools tlcsconfig or turboclusteradmin to modify the configuration file instead of editing it by hand. Not only do the tools provide an easy-to-use interface, but they also perform consistency checks on the configuration file to make sure that the syntax is valid. Also be aware that the configuration programs may overwrite any changes that you make manually. The configuration file format for Cluster Server 6 is nearly backwardscompatible with the previous 4.0 version. To convert a 4.0 configuration file to run on Cluster Server 6, simply remove the port number specifications and colons on the AddServer lines.
6-1
Configuration File
The clusterserver.conf File

The main configuration file is clusterserver.conf. All the TurboLinux Cluster Server configuration files are stored in the /etc/clusterserver directory, which also contains some sample configuration files, the SSL certificate used by CMC, and a subdirectory with your license files. The license files are stored in /etc/clusterserver/.licenses and are encoded with a variety of information. The license files can restrict the number of ATMs, nodes, and services. They can also limit the number of concurrent connections and the range of available IP addresses. The license can also be set to expire on a certain date. You can add license files after the product has been installed. This allows you to purchase a license after you have tried an evaluation copy, or to increase the number of allowed nodes and ATMs. You can have multiple license files in the directory -- they will work cumulatively, and obsolete license files will be ignored. The configuration file is similar to the Apache configuration file. There is a global section that applies to the software in general and all clusters. Then there are separate sections to set up specific services, servers, and clusters. The configuration file is in flat ASCII text, and the format is fairly free-form. Spacing does not matter, although several sections require that all the settings for a particular object be on a single line. Lines beginning with a hash character (#) are ignored as comment lines. Keywords are not case-sensitive, but you must specify the sections in the order shown in this chapter. The term section in this chapter refers to sections in the configuration file that describe the various functions. Examples of sections are: The NAT section describes how to set the parameters that will be used if you use the NAT traffic forwarding method The Servers section lists each server node in the cluster The ATMPool section defines the Advanced Traffic Managers that will be used to manage the cluster, and settings that pertain to the cluster itself
Global Settings
Global Settings
There are several settings that are not contained within any section of the configuration file, as well as a couple of sections that apply to all objects. These include the security settings, NAT settings, and network mask.
Security Settings
The security settings allow you to restrict management access to certain machines.
DenyHost badguy.hackers.usa/255.255.255.0 AllowHost partner.business.usa
The addresses can be given as IP addresses or domain names. You can specify a network mask as well, in order to include an entire subnet. If you do not give a subnet, only the single IP address listed will be allowed or denied. You may have multiple AllowHost and DenyHost lines. The lines will be processed in order until a match is found. These settings are configured in the Global Settings | Security Settings section of turboclusteradmin. You should set use the following settings to secure your cluster:
AllowHost 127.0.0.1 DenyHost 0.0.0.0/0.0.0.0
This will allow only the CMC daemon on the local host to access the administration port of the clusterserverd daemon.
6 -3
Configuration File
Network Mask Setting

The NetworkMask parameter is used to specify the network mask of the cluster itself. When the primary ATM assumes the IP address of the cluster, it will use this as its subnet mask.
NetworkMask 255.255.255.0
The default value is 255.255.255.0, which is the network mask for a Class C network. If you are unsure what your network mask should be, ask the network administrator at your site, or look at the network mask setting of a system on the subnet.
NAT Settings
The NAT section describes how to set the parameters that will be used if you use the NAT traffic forwarding method.
NAT Subnet 10.0.0.0 255.255.0.0 Gateway 192.168.0.100 EndNAT
The Subnet line gives the address range that will be used in the NAT translation process. It should specify a range of addresses that are not used anywhere on your network. Most sites will choose to use 10.0.0.0 here. The mask specified on the same line determines how many clients can be translated. Use 255.255.0.0 for an avarage site, and 255.240.0.0 for a larger site. The Gateway parameter specifies a virtual address that will be configured on the internal side of the ATM, and will be used as the default gateway by the NAT-using nodes. For more detailed information on how to choose these NAT parameters, consult the NAT Settings section in chapter 4.
Services
Services
A cluster manages different services. UserCheck sections define the Application Stability Agents (ASAs) that will be used, and the Services section lists all the services and their settings.
UserCheck Settings
You can use the Application Stability Agents described in this section to ensure that services are still running. These are the same items that are configured in Clustered Services | Application Stability Agents in the turboclusteradmin program.
UserCheck ftp Up Down /usr/local/bin/ftp-up /usr/local/bin/ftp-down
Check /usr/bin/ftpAgent EndUserCheck
The ASA name is given in the UserCheck line. Usually, youll just use the name of the service. The name you use will be referenced in the Service lines in the following Services section. The program (or script) name for ASA check is given in the Check line. Be sure to give the full path name. This program should perform a test transaction on the service and return a 1 if the transaction fails. The ASA will be called with the cluster nodes IP address, port number, and socket type (1 for TCP, 2 for UDP). The Up and Down scripts run when a service goes down or comes back up and are called with the same parameters as the ASA. In most cases you will not
6 -5
Configuration File
need to specify the Up and Down scripts. The Down script is most useful when used to try to bring back up a service that has gone down. There are three predefined ASAs that do not need to be defined in the configuration file: The http ASA checks a web site using the HTTP 1.1 protocol. If you need to check an HTTP 1.0 web site, use the external http10Agent program. The connect agent performs a TCP connection, but does not process any transaction. The none agent does not do any checking at all; it always succeeds. When an ASA is called, it will be called with several parameters, as in the following example:
/usr/bin/ftpAgent 192.168.0.7 23 1
In this example, the ASA is checking TCP port 23 on the cluster node with an IP address of 192.168.0.7.
Defining Services
The Services section defines all the network services that you will run on the cluster. There will be one line for each service.
Services Service ftp Service nntp EndServices tcp:21 tcp:119 ftp none sticky
The first parameter after the keyword Service is the name of the service. Then comes the port number, prefixed by either tcp or udp. Following that is the name of the Application Stability Agent, as given in the UserCheck section. Finally, optional flags are listed. The currently available flags are sticky and failover. The sticky flag indicates that this service should
Services
maintain persistent connections, connecting each client to the same server for every connection. The failover flag indicates that the service is not to use load balancing. One server will get all the traffic instead, unless it goes down. Then all traffic will be sent to the next server in the server pool.
6 -7
Configuration File
Servers and ServerPool

In the configuration file, Servers and ServerPool define the server nodes that the cluster will manage, including IP addresses, the forwarding method, and port numbers.
Servers
Each server node in the cluster is listed in the Servers section. Each line begins with the Server keyword and the name of the server. Then comes the IP address or fully qualified domain name of the server. Following this is the forwarding mechanism to use: direct, tunnel, or nat. You may also specify a noping flag, which will tell the daemon not to ping the server periodically to see if it is up. Here is an example of what the Servers section may look like.
Servers Server node1 Server node2 Server node3 EndServers node1.turbolinux.usa 192.168.0.102 systemX.turbolinux.usa direct tunnel noping nat
ServerPool Section
The ServerPool section corresponds to the Servers Configuration | Server Groups Configuration section of the configuration tool. It lets you combine several cluster nodes into a more manageable unit. The name of the server pool is given in the ServerPool line.
ServerPool servergroup1
6-8
Servers and ServerPool
AddServer node1 AddServer node2
ftp/1 ftp/3 nntp/1
CheckServerFrequency 120 CheckServerTimeout CheckPortFrequency CheckPortTimeout EndServerPool 10 240 120
The AddServer lines add cluster nodes to the pool, along with the services they will support, as well as their weighting. The settings are given in the following format:
servicename/weight
where: servicename weight is the name of the entry within the Services section is the weight that you want to give to that server in comparison to other servers. A server with a higher weight will have more packets delivered to it than lower-weighted servers.
In addition to the list of server nodes and provided services, this section is also used to specify the frequency of stability checks. The CheckServer settings specify how often to ping each server. The CheckPort settings tell how often to perform the ASA checks. The Frequency settings specify how far apart to perform the checks. The Timeout values indicate how long to wait for a reply from the nodes before assuming that the check failed. All settings are given in seconds. Be aware that the Frequency values must be greater than the Timeout values.
Configuration File
Clusters
Clusters are defined in two separate sections, AtmPool and VirtualHost. These sections roughly correspond to the Advanced Traffic Managers and Virtual Servers menus in the tlcsconfig program.
AtmPool Section
The AtmPool section defines the ATMs that will be used to manage the cluster, as well as settings that pertain to the cluster itself. The section begins with the AtmPool keyword and the name of the pool.
AtmPool atmgroup1 AddAtm AddAtm SendArpDelay MaxLostHeartbeats HeartbeatDelay NumConnections NumServers NumServices ConnectionTimeout EndAtmPool atm1.turbolinux.usa 192.168.1.10 20 3 1 10000 1000 100 600
NOTE
Only one ATM pool is currently supported.
ATMs are added with an AddAtm line, simply specifying the IP address or fully qualified domain name.
6-10
Clusters
In addition to the ATMs themselves, there are several other settings: SendArpDelay Tells how often to send out an ARP broadcast announcing the IP address of the cluster. (Given in seconds.) Specifies how often the primary ATM should send out a heartbeat broadcast so that backup ATMs know it is still alive. (Given in seconds.) Tells how many heartbeats a backup ATM can miss before it assumes that the primary ATM has gone down and initiates the process to promote the backup to be the primary ATM. Gives the amount of time to keep an inactive connection open and in the cluster daemons tables.
HeartbeatDelay
MaxLostHeartbeats
ConnectionTimeout
You can tune the daemon for a specific maximum number of servers, services, and connections that the cluster will support. Use the NumServers, NumServices, and NumConnections keywords to specify these values. The defaults for these are 1,000 servers, 100 services, and 10,000 connections. It is highly recommended that you configure these to match the needs of your cluster. You can find information on tuning these settings in chapter 7.
NOTE
If the number of connections to your cluster exceeds the maximum specified in the configuration file, all new connections will be dropped. The kernel module creates a table containing only the number of entries that you specify in the configuration file. Be sure to set this value higher than the maximum expected number of concurrent connections.
6-11
Configuration File
VirtualHost Section
The VirtualHost section combines an ATM pool, a server pool, and an IP address. It also allows you to specify the email address of the administrator. The section will look something like this:
VirtualHost 192.168.1.2 AddAtmPool router1
AddServerPool servergroup1 MailTo EndVirtualHost cluster-admin@mybusiness.org
The IP address is included with the header of the section. This is the virtual IP address you will use to connect to the cluster itself. Anything sent to this IP address will be forwarded to the cluster nodes. The address can be given in dotted decimal or as a domain name. If you use a domain name, make sure it resolves to the appropriate IP address. Also make sure that no other machine has that IP address. The AddAtmPool specifies the ATMs that are to be used in this cluster. It simply refers to the ATM pool specified in a previous section. Similarly, the AddServerPool specifies the same thing for cluster nodes. The MailTo parameter is used to send email messages when something goes wrong with the cluster. If you do not specify an email address, no email messages will be sent. Either way, messages will still be logged to the /var/log/clusterserverd.log file. Messages will be sent whenever an ATM, server node, or service goes down. A message will also be sent if there is some other fatal condition, such as the SpeedLink module crashing.
NOTE
You can define multiple virtual IP addresses by using multiple VirtualHost sections, but they will have to share the same ATM pool.
6-12
Chapter 7
ADMINISTRATION
Once you have gotten your cluster configured, you will need to maintain it. This chapter will introduce you to the tools that you can use to monitor the function and performance of your cluster. These tools can also be use to modify the function of the cluster and troubleshoot any problems that may arise. This chapter will focus on the following topics: Administrative tools Synchronization programs Cluster Management Console (CMC) Troubleshooting
7-1
Administration
Administrative Tools
There are several tools that you can use to maintain your cluster. Weve already covered some of the basics in chapter 4, where we introduced the turboclusteradmin and tlcsconfig programs. These are the primary tools used to configure your cluster and make changes to it. Most of the changes you will make to the configuration of the cluster will be adding new servers or services. This works pretty much the same way as the initial configuration. The main difference is that it is easier to add new servers and services after you have gotten the cluster running. Thats one reason we recommend starting out with a simple configuration and adding more servers and services. It is easier to get a simple system working than a complex one. Another reason you might want to change your cluster settings is to tune the cluster to optimize performance.
Tuning the Cluster

Once you have successfully gotten your cluster running, it is a good idea to tune it. There are several parameters that you can modify to optimize the performance of the cluster. The main settings that you can tune are the kernel table sizes and the time settings. If you want to really optimize the performance of the cluster, you can use advanced network monitoring tools, such as network analyzers. You can determine how much overhead network traffic the ATM generates. This overhead includes heartbeat broadcasts, server pings, and ASA service checks. You can then fine-tune the frequency of the system checks. Just remember that when you increase the time between checks you also increase the time that services may be unavailable if one of the servers goes down.
7-2
Kernel Table Sizes

The first group of settings that you should modify are the sizes of the tables that the kernel module uses. The SpeedLink module has tables for servers, services, and connections. You should set these large enough to cover the maximum usage that your cluster will receive, but not too much larger. It should be pretty simple to figure out the optimal number of servers and services. The number of services is pretty self-explanatory. Each named service will need one entry in the table. These services are defined in the Service Settings menu within tlcsconfig, with each service having its own Service line in the configuration file. The number of servers is actually the number of server/service pairs. So if you have a cluster handling FTP and HTTP with 10 nodes, you would set the size of the servers table to 20. You may want to specify a table size slightly larger than what you currently need, to make it easier to add nodes later. You can always modify the settings when you add nodes, but you may forget. The maximum number of connections is quite a bit harder to figure out. You definitely dont want to set it too low, because if the connections table fills up additional incoming connections will not be serviced -- they will simply be ignored. The best way to determine the optimum size of the connections table is to monitor the use of the cluster. Ideally, you want the connections table to be slightly larger than expected maximum number of connections that you will ever have at one time. A good rule of thumb is to take the largest observed number of connections and double it. This should cover just about any situation, unless your cluster suddenly receives a lot of extra traffic. If your ATM is servicing NAT cluster nodes, you will need to double the number of connections. This is because the ATM creates a virtual connection from the client to the NAT-translated address, and a real connection from the NAT-translated address to the cluster node.
Administration
These settings are all configured in the Advanced Traffic Manager Settings menu in the cluster configuration tool. They are defined in the AtmPool section of the configuration file, and are named NumServices, NumServers, and NumConnections.
Time Settings
There are several time settings that you can use to fine-tune your cluster. The ones that pertain to the cluster as a whole are defined in the same place that the table sizes are defined. These include the connection timeout values and the heartbeat frequency. The other time settings are used to define the frequency of system and service checks. The connection timeout value specifies how long to maintain an entry in the connections table for connections that are idle, with no communication occurring. It pertains to all services running on the cluster. Some recommended settings are: HTTP FTP Telnet 30-60 seconds 15-30 seconds 300 seconds
If you are running more than one of these services, choose the longest one to use for the cluster. Another set of parameters you can set are the frequency of service and server checks. Increasing these values reduces network traffic overhead, but increases the amount of time that a server may be down before it is removed from the cluster. You will basically have to decide where to make this tradeoff. The network overhead really is not that great on a 100 Mbps network, unless you have a lot of cluster nodes. So you should probably stick with fairly frequent checks.
7-4
To change the frequencies of the system checks, go into the Server Groups Configuration menu in the tlcsconfig program. There you will find the Frequency settings for server checks and service checks. There are also Timeout values, which indicate how long to wait for a response before assuming that the server or service is down. These settings are called CheckServerFrequency and CheckPortFrequency in the ServerPool section of the configuration file. The Frequency settings must always be longer than the corresponding Timeout values. Otherwise you would send out another ping before receiving an answer back from the first one. If you got an answer back, you wouldnt know if it was an answer to the first ping or the second.
7 -5
Administration
Cluster Server 6 comes with two different synchronization tools: one to synchronize the configuration, and one to synchronize content. These programs are named tlcs_config_sync and tlcs_content_sync. They will help you to maintain consistency among the servers in the cluster. The synchronization tools require that an SSH daemon be running on all the servers that are to receive the updated content or configuration. They will work with any system on which you have installed Cluster Server, as SSH is included. They will not work with Windows NT or other non-Linux systems unless you install a version of SSH yourself. For tlcs_config_sync, this is not a problem, because you will only need the configuration file on systems running the Cluster Server daemon. For content, you will have to synchronize those systems without SSH by hand. To avoid warning messages, be sure to remove any systems without SSH from the list of servers to be synchronized.
tlcs_content_sync
The tlcs_content_sync utility is used to ensure that all of your cluster nodes contain the same content. The program can be started from the command prompt, or it can be started from the turboclusteradmin program. To synchronize your content, simply follow these steps: 1. Start the utility from the command line:
# tlcs_content_sync
You can also start it from the turboclusteradmin program by selecting the Content Synchronization Tool menu item. Both will take you to the same place. 2. A screen will appear with a list of directories and a list of servers. You can use the TAB key to move between the two lists. There may be more entries
7-6
than appear on the screen -- you can use the up and down cursor keys to scroll through the lists.
Figure 7.1 tlcs_content_sync Main Menu 3. The list of directories details the content that you want to synchronize. Everything within each directory listed will be copied, including all subdirectories. Typically, you will have directories such as /home/httpd/html and /home/ftp listed, but any directory that exists on the source system can be copied. 4. To add or remove a directory from the list, click on the Edit Dir button at the bottom of the screen. This will bring up a new screen, allowing you to add, remove, or edit the directory entries. Click Done when you have selected the appropriate directories that you want to be synchronized. 5. To change which servers will receive the synchronized content, click on the Edit Servers button. This will bring up the list of systems along with buttons to add, remove, and edit the server names. Hit Done when you have the proper list of servers that you want to synchronize.
7 -7
Administration
6. Initially all servers that were added as cluster nodes in the configuration program will be listed. If any of these nodes are not running the SSH daemon, you will need to remove them from the list or install SSH. If you leave a server listed that does not have the SSH daemon installed and running, you will get a warning message when you start the synchronization process. The warning message will tell you that the connection was refused by the server, and that synchronization could not be completed due to errors. 7. Once you have the correct directories and servers shown, hit the Start button.
Figure 7.2 Content Synchronization in Progress The program will go through the list of servers, contacting each one, sending the contents of all the directories. 8. If this is the first time that you have synchronized with a particular server, you will be prompted for the root password on that system. The underlying SSH tools require this step the first time, but will remember the password for reuse later.
7-8
9. If any errors occur, you will be prompted with a message telling you what went wrong. Most likely any error will be caused by a system that is down or one that is not running the SSH daemon that comes with Cluster Server. You can remove the server so that it will not give any warning messages in the future. 10. When the synchronization process has completed, you will be told whether the operation was a success or if there was some difficulty connecting to some of the servers. Click on the Close button to exit the utility. If there were any errors, you will want to resolve them and try again.
tlcs_config_sync
The tlcs_config_sync utility is used to synchronize the Cluster Server configuration files. Like the tlcs_config_sync program, it can be run from the command line or started from turboclusteradmin. To synchronize your configuration, simply follow these steps: 1. Start the utility from the command line:
# tlcs_config_sync
You can also start it from the turboclusteradmin program by selecting the Configuration Synchronization Tool menu item. Both will take you to the same place.
7 -9
Administration
2. A list of servers will appear. Initially all servers that were added as nodes or ATMs in the configuration program will be listed.
Figure 7.3 tlcs_config_sync Main Menu 3. You will need to remove any servers listed that do not have TurboLinux Cluster Server installed. To do this, highlight the particular server and hit the Remove button. The system will be immediately removed from the list. If you leave a server listed that does not have Cluster Server installed, you will get a warning message when you start the synchronization process. The warning message will tell you that the connection was refused by the server, and that synchronization could not be completed due to errors. 4. Add any servers that are missing from the list. To do this, click on the Add button and type the name of the server in the box provided. Click OK and the server will be added to the list.
7-10
5. Once you have the correct list of servers shown, click on the Start button.
Figure 7.4 Configuration Synchronization in Progress The program will go through the list of servers, contacting each one, sending the updated configuration file. 6. If this is the first time that you have synchronized with a particular server, you will be prompted for the root password on that system. 7. If any errors occur, you will be prompted with a message telling you what went wrong. Most likely any error will be caused by a system that is down or one that is not running the SSH daemon that comes with Cluster Server. You can remove the server from the list so that it will not give any warning messages in the future. 8. When the synchronization process has completed, you will be told whether the operation was a success or if there was some difficulty connecting to some of the servers. Click on Close to exit the utility. If there were any errors, you will want to resolve them and try again.
7-11
Administration
Cluster Management Console (CMC)

Cluster Management Console is a web-based tool that you can use to monitor the performance of your cluster. It can also be used to view and modify many of the parameters controlling the function of the cluster. It even allows you to make some changes to the cluster dynamically, while the cluster is running. You can also view the Cluster Server documentation through CMC, and submit bug reports. To run CMC, start a web browser and connect to the appropriate URL. CMC uses the secure HTTP protocol (HTTPS) and runs on port 910. Connecting to CMC requires a web browser that supports SSL. Current versions of Netscape and Internet Explorer will work well. If you are connecting to CMC using a system that is not a member of the cluster, you can connect using either the address of the cluster itself or the address of an individual ATM. If you use the clusters virtual IP address, you will end up on the system that is currently the primary ATM. If you are browsing from a system within the cluster, you will have to use the address of an individual ATM. The URL you use to connect to CMC should look something like this:
https://atm1.turbolinux.usa:910
NOTE
If you have trouble connecting to CMC, you may have forgotten the S in https or the port 910 specification.
The first time you connect to the CMC page, your browser will pop up some dialog boxes showing you the SSL site certificate. This may include a warning that the certificate has not been signed by a certificate authority. You can safely ignore this warning, as you can trust that the information you send will only be seen by your own systems. You may also get a warning that the certificate does not contain the correct site name. This can happen if the
ATM has multiple domain names or if you typed incorrect information in when you installed the software. Accept the certificate and follow the prompts that the browser gives you. Each time you connect to CMC, you will be prompted for a login name and password. You should use the user ID tlcsadmin. This account was created when Cluster Server was installed. It will initially have the same password as the root account on the ATM to which you are connecting. Type in the password, and you will be connected to CMC. (You can also log in under other user IDs that exist on the ATM, but you will not be able to make any modifications.) When you first log in, you will be presented with the CMC home page. There are icons along the top that allow you to navigate to various other pages. The icons may vary, depending on what user ID you logged on with.
Figure 7.5 CMC Home Page
7-13
Administration
This page has links to Cluster Server documentation as well as the other pages of the CMC administration program. These pages are: Home Status The Home page has links to man pages and other documentation. The Status page is where you can monitor the cluster. It has info on cluster processes, kernel modules, network interfaces, the /proc/net/cluster files, and log files. It also allows you to modify some of the settings and to stop and restart the daemon. This page lets you view and modify the
/etc/clusterserver/clusterserver.conf
Edit
configuration file. Report The Report page lets you generate an email message containing a large amount of information pertaining to the configuration of your cluster. You can choose who to send the info to and which pieces of information to include. This is useful for reporting bugs to TurboLinux support staff. This page is used to view the license files on your system. The View page is similar to the Edit page. It shows the configuration file, but does not allow you to edit it.
Licenses View
7-14
If you logged in as tlcsadmin, you will see all of the pages except for the View page. If logged in as a different user, you will be able to access the View page, but not the Edit, Report, or Licenses pages. The Status page is where you will spend most of your time in CMC. There are three main sections on the page. The top section shows the output of several utilities: ps, lsmod, and ifconfig. The ps output lets you see whether the Cluster Server daemons are running. The lsmod command lists the kernel modules that have been loaded; the ip_cs module should be listed. The ifconfig program shows the configuration of the network interfaces on the system. You can look through its output to make sure that the proper aliases have been configured. The top section also has buttons that allow you to start and stop the clusterserverd daemon on the active primary ATM.
. Figure 7.6 CMC Status Page
7-15
Administration
The next section shows the output of the /proc/net/cluster files. These provide information about the running cluster. The interesting pieces of information are the Statistics and the Connections. These will change as the cluster is accessed. You can use the Autorefresh button at the top of the screen to have the information periodically updated. The /proc/net/cluster files will be covered in more detail later in this chapter. The last section of the Status page displays the output of the three log files used by Cluster Server. You can look through these logs to determine if there have been any irregularities. However, you do not have access to any of the UNIX filter programs, so you may find it easier to look at the log files from the command line than through the CMC browser interface. The Edit screen allows you to make modifications to the configuration file. After you have made changes, click on the Commit button. This will send the new configuration to all the cluster machines (as long as they are running SSH). If you have not synchronized with any of the servers in the cluster before, you will need to use the form at the bottom to type in the passwords for each server. It is recommended that you use the command-line tlcs_config_sync tool (or run it via turboclusteradmin) before using the CMC synchronization process, so that the system will prompt you for missing passwords instead of requiring you to type them in before running the synchronization process.
WARNING
Do not hit the browsers Refresh or Reload button when using CMC. They may re-send button-press events, causing CMC to run the actions associated with those buttons multiple times.
Another screen called Traffic Monitor is available from the Status page. Traffic Monitor is a Java applet, so your browser must have Java support enabled. The applet shows a graphical representation of the traffic flowing though the ATM. The statistics can be shown by cluster, server, or service.
7-16
Here is a view of what Traffic Monitor looks like monitoring several cluster nodes:
Figure 7.7 Traffic Monitor
7-17
Administration
Troubleshooting
Troubleshooting is an important part of maintaining your cluster. In this section well cover various tools that you can use to troubleshoot problems. These include the log files and the files in the /proc/net/cluster directory. Well also take a look at the daemon startup sequence, which you can use as a reference to determine where things may be going wrong. Finally, well discuss some common problems and the steps that you can take to resolve them. The first important point to remember when troubleshooting is that you must test the cluster from a system outside the cluster. Cluster nodes and ATMs cannot be used to test the cluster, because the aliases they create will cause each local system to respond to service requests locally. The client does not have to be on a separate subnet; it just needs to be a system that is not a member of the cluster.
WARNING
You must test the cluster by accessing it from client systems that are not a part of the cluster. Testing your cluster from a system that is a part of the cluster may lead you to believe that the cluster is working when it is not, or that it is not working when it is. Due to the way Cluster Server is implemented, systems within the cluster will usually process traffic destined for the cluster themselves, without the traffic having ever been looked at and processed by the ATM.
Log Files
Cluster Server writes information to several log files as it works. These log files are stored in /var/log, along with all the other system log files. Poring through log files can be rather tedious, but it can be a powerful tool for locating
7-18
Troubleshooting
trouble areas. One thing that will help you to recognize problems is to observe the log files when the system is operating normally. This will give you a baseline reference, and you will be able to identify irregularities more easily. The primary log file is clusterserverd.log. It contains most of the output from the clusterserverd daemon. We cover some of the output generated in this file in the Daemon Startup section below. The file also contains information about all the server pings and ASA service checks. If any servers or services go down, that information will be listed in this file. The kernmsg file is a standard log file used by the syslog daemon to log kernel messages. The SpeedLink kernel module sends its output to this file, just like any other part of the kernel. If you turn on debugging, the kernel module will generate more output to be sent to this file. This extra information will list each packet that comes in from a client to the ATM and which cluster node it gets forwarded to. We will show you how to turn debugging on in the section on /proc/net/cluster/debug below. The CMC daemon logs some information into the cmc.log file. This file mainly gives information about connections that browsers make to the CMC daemon. This includes SSL password and key exchanges as well as action buttons that are pressed, such as starting and stopping the ATM.
Daemon Startup
The clusterserverd daemon has a well-defined startup procedure that you can monitor to see where things might be failing. You can observe the progress of the daemon, and determine where it has diverged from the normal startup process. You can use the following command to observe the output as it is generated:
# tail -f /var/log/clusterserverd.log
7-19
Administration
If you view the /var/log/clusterserverd.log file as the cluster daemon starts up, you will see something similar to the following sequence: 1. The daemon will start up and issue the message:
Starting Advanced Traffic Manager daemon
2. Version information will then be printed, including the build date. 3. The daemon will display the name of the system it is running on and the IP address:
Running on atm1.turbolinux.usa (192.168.0.1)
4. The configuration file name will be listed. The file used will normally be /etc/clusterserver/clusterserver.conf. 5. The configuration file will be read and parsed. 6. Any invalid lines in the configuration file will be listed, along with the problem with the line. 7. If parsing fails, the daemon will display the following message:
Bad Cluster Server configuration file! Going to idle mode
and not perform any further processing. If you edit the configuration file to correct the error, you can send a HUP signal to the daemon to have it re-read the configuration file and continue the startup process. Use the following command to signal it to re-read the file:
# killall -HUP clusterserverd
8. The clusters broadcast address and network mask will be displayed. 9. The ip_cs module will be loaded if it is not already running. 10. Any stale network interface aliases that exist that were created by Cluster Server (ones that have :cs0 as the alias part of their name) will be taken down. 11. If it is listed in as an ATM in the configuration file, the system will be configured to start out as a backup ATM. 12. If the system was configure as a backup ATM in the previous step, it will attempt to locate a primary ATM.
Troubleshooting
13. If the system is a backup ATM and no primary ATM is found, it will begin the election process. The election process selects the backup ATM that appears highest in the configuration file and currently running and promotes it to primary ATM. 14. The new interface aliases will be configured. If the system is the primary ATM, an alias of the Ethernet card (usually eth0:cs0) will be configured with the clusters virtual IP address. If the system is a direct forwarding node, an alias (lo:cs0) will be created on the loopback interface with the virtual IP address of the cluster. It will also write a 1 to /proc/sys/net/ipv4/conf/all/hidden and /proc/sys/net/ipv4/ conf/lo/hidden in order to squelch ARP replies. If the system is a tunneled node, the tunl interface will be brought up and an alias (tunl0:cs0) with the clusters virtual IP address will be created. Bringing up the tunnel interface will load the kernel IP-IP module. The daemon will also write a 1 to /proc/sys/net/ipv4/conf/all/hidden and /proc/sys/net/ipv4/conf/tunl/hidden to make the tunnel interface ignore ARP requests. If the system is a node using NAT forwarding, no changes will be made to the network interfaces. If the system is the primary ATM and has nodes using the NAT method, the NAT gateway address will be created as an alias on the Ethernet card. This could be eth0 or eth1, depending upon which real IP address is in the same subnet as the gateway address. This alias will be named something like eth0:natg. 15. If the system is the primary ATM, the server and service checks will be started. Each cluster node will be checked, unless configured with the noping option. ASAs will be run for each service on each node. If a server or service is found to be inactive, it will be marked as down and temporarily removed from the kernel tables.
7-21
Administration
16. The daemon will wait until it gets a signal to shut down. If it is the primary ATM, it will continue performing the service and server checks, until it receives the shutdown signal. 17. If it gets a shutdown signal, the daemon will clean up and exit. If the system is configured to be a cluster node, the IP aliases will be left as-is. If the system was acting as an ATM only, the aliases will be removed.
Using /proc/net/cluster
In addition to forwarding cluster traffic and maintaining several internal tables, the SpeedLink kernel module creates a directory in /proc that it uses to provide information and allow dynamic configuration. This /proc/net/cluster directory can be helpful when troubleshooting problems with the cluster. Values written to the files in the /proc/net/cluster directory can directly change the values of variables in the kernel module. These files are the means by which the Cluster Server daemon communicates with the kernel module. Under most circumstances, you should allow the daemon to handle modifying these parameters. However, it is important to know what the parameters mean, so you can read the current values. You can also use these files to help debug problems.
WARNING
Writing incorrect values to the files in /proc/net/cluster can cause your system to crash. You should allow the Cluster Server and CMC daemons modify these files. Only modify them by hand if absolutely necessary.
CMC allows you to look at most of these files and modify a few of the parameters. They are on the Status page in CMC, listed under the Internal Module Status heading. CMC does a good job of indicating what each piece of information in the files means. We will cover the meaning and usage of each of the files in /proc/net/cluster:
7-22
Troubleshooting
config connections debug nat servers services stat timeout
/proc/net/cluster/config
The /proc/net/cluster/config file holds the sizes of the 3 main data structures: the number of services, servers, and client connections, respectively. You can dynamically change these settings by writing to the file. For example, to change the table sizes to 25 services, 10 servers, and 5000 connections, use the following command:
# echo 25 10 5000 > /proc/net/cluster/config
You can verify that the changes took effect by reading the file again:
# cat /proc/net/cluster/config 25 10 5000
WARNING
If you write to this file, the SpeedLink module will be reset, causing all active connections to be dropped.
/proc/net/cluster/connections
The connections file contains a table of client/server pairings. This can be used to display current active connections, as well as persistent connections. Each connection is listed on a single line. Each line in the file has the following format:
Administration
prot client:port cluster:port timeout node:port packets
prot client:port cluster:port timeout node:port
The protocol, either tcp or udp. Source IP address and port number of the client system. Virtual IP address of the cluster and the port number of the service. Number of seconds until the connection times out. Cluster node IP address and port number that the packet was forwarded to. Number of packets forwarded.
packets
The following example shows an HTTP (port 80) connection from a client system at 1.2.3.4 connecting to a cluster with IP address 192.168.0.100. The packets are being forwarded to the cluster node at 192.168.0.4.
tcp 1.2.3.4:9645 192.168.0.100:80 98 192.168.0.4:80 113
Note that NAT connections will have two lines: one for the incoming connection and one for the connection between the ATM and the cluster node. This is a side-effect of the way that RFC 1631 specifies that NAT should be implemented. The connection between the ATM and the cluster node will show an address chosen from the NAT subnet as the source address.
/proc/net/cluster/debug
The debug file lets you determine whether to log additional debugging information or not. Normally this will be set to 0, meaning that only the normal logging information will be output. If you set this to a 1, additional information will be logged. To do this, issue the following command:
7-24
Troubleshooting
# echo 1 > /proc/net/cluster/debug
The additional logging information comes from the ip_cs SpeedLink kernel module, and is written to the /var/log/kernmsg file. The additional information shows new connections to the virtual server and shows which node the traffic gets forwarded to. Activating these extra log messages can create a substantial impact on the performance of the ATM, so you should use it only when debugging problems with the cluster.
/proc/net/cluster/nat
The /proc/net/cluster/nat file contains the configuration settings associated with NAT forwarding. This is the same as the NAT Subnet setting in the configuration file and the turboclusteradmin tool, with one minor difference. While the configuration file uses an IP address and subnet mask, the nat file specifies the IP address and the number of bits in the subnet mask. So if your configuration file looks like this:
NAT Subnet 10.0.0.0 255.255.0.0 EndNAT
the nat file will look like this:

10.0.0.0 16
Note that the NAT Gateway setting does not appear in this file, because that setting is only used by the clusterserverd daemon. The kernel does not need to concern itself with the NAT Gateway.
/proc/net/cluster/servers
The servers file contains a line of information about each service running on each server node in the cluster. This is the same information that is contained
7-25
Administration
in the Servers section of the configuration tool. Each line has the following format:
prot node:port cluster:port up weight method packets
prot node:port cluster:port up
The protocol, either tcp or udp. Cluster node IP address and port number for the service. Virtual IP address of the cluster and the port number of the service. Either up or down depending upon whether the server and service are running or not. A number indicating the weight of this server. A higher number means that the server will receive proportionally more traffic. The forwarding method that is used on the server. Can also be local indicating that the node is also the primary ATM. Number of packets that have been forwarded to the service on this server.
weight
method
packets
/proc/net/cluster/services
The services file contains the virtual IP addresses and port numbers for all of the services that the cluster handles. Each line has the following format:
prot cluster:port up persistence packets
7-26
Troubleshooting
prot cluster:port up persistence
The protocol, either tcp or udp. Cluster virtual IP address and port number for the service. Either up or down. This will be 1 if the service is set to be persistent or sticky. Otherwise it will be set to 0. Number of packets that have been forwarded to the service on the cluster.
packets
/proc/net/cluster/stat
The stat file contains some statistics pertaining to the operation of the cluster. These numbers are updated in real time, allowing you to watch as the traffic manager directs packets to various nodes. Writing to this file has no effect. The six values displayed are as follows: Number of services configured Number of server nodes currently in the cluster, times number of services each server handles. (Same as the number of lines in the servers file.) Current number of active connections Total number of packets received by the cluster Number of dropped packets Number of new connections
/proc/net/cluster/timeout
The timeout file allows the timeout time to be changed. If a connection does not receive any traffic in the given amount of time, the connection will be assumed to be stale and will be closed. You can change the timeout value by writing a number to this file:
Administration
# echo 100 > /proc/net/cluster/timeout
However, any value written to this file will not be remembered if the cluster daemon is restarted. To permanently change the connection timeout value, change it in the Advanced Traffic Manager Settings menu in the cluster configuration tool.
Common Problems
In this section we will list several common problems and provide you with some hints that may help you resolve them. Be sure to also check the RELEASE.NOTES file for more information. It will contain information that was made available more recently than this manual. You can access the release notes and other documentation through the CMC home page or turboclusteradmin.
Synchronization Tools Fail

There are several requirements for the synchronization tools. The server receiving the content must be running sshd, the Secure Shell daemon. Most likely, Windows NT and Windows 2000 systems will not have SSH, so they will not be able to participate in synchronization. Other systems may not have SSH installed either. Any system that has had Cluster Server installed should have the sshd daemon installed and running. This will be more of an issue with tlcs_content_sync, because you any system that is receiving configuration information will be running Cluster Server and should therefore have SSH installed. One thing you may need to check is your /etc/hosts.allow files on all the cluster nodes. They will need to have incoming SSH traffic enabled. The following line will accomplish that:
sshd : ALL
7-28
Troubleshooting
You can also limit SSH connections to just the systems within the cluster or your LAN if you desire. You can eliminate warning message in the synchronization tools by removing any servers that do not have SSH from the list of servers to be synchronized. Just be sure to always synchronize their content by hand.
Verifying That the Cluster is Working

To verify that the cluster is working, you can simply monitor its activity by using CMC or looking at the /proc/net/cluster files directly. The connections and stat files will probably be the most helpful. When generating traffic to test the cluster, always make client connections from systems that do not reside in the cluster. If you try to connect to a clustered service from a node within the cluster, it will not go through the ATM. This is because the traffic is being sent to the virtual IP address of the cluster, but we have convinced the cluster node to accept traffic being sent to that address. Since the traffic is not going through the ATM, it is not subject to the forwarding procedures that make the cluster work. To verify that the cluster will properly handle an ATM or cluster node going down, simply take that system off-line. The easiest way to do this is to remove the network cable from the system. It is a good idea to test your the reliability features of your cluster as soon as you get it configured the way you want it. You dont want to find out that the cluster is misconfigured when something really does go wrong with a system. If you are testing by disabling a cluster node, you should see in the ATMs log file that the pings and ASAs have failed and that the system is being taken out of the cluster. Any open connections that were made with that system will be dropped. Service should otherwise continue as usual. If you have disabled the primary ATM, the backup ATMs should notice this and elect one to be promoted to primary ATM. Within several seconds,
Administration
normal service should have resumed. Any connections that were active when the ATM went down may be lost, but new connections should be made initiated without any problems.
Determining Which ATM is the Primary

The first system that comes up and is listed in the list of ATMs for the cluster becomes the primary ATM. All the other systems listed as ATMs will become backup ATMs. So if you want a particular system to be the primary ATM, make sure it is the first ATM to have the clusterserverd daemon brought up. Note that if the original primary ATM goes down and comes back up, it will not be promoted to primary ATM. The current primary ATM will always remain as primary unless it goes down. The best method of determining which system is the primary ATM is probably to use CMC on the virtual IP address. This will always end up connecting to the primary ATM. The name of the ATM will be printed directly below the row of icons. You can look at the log files on each system to determine what role the system has taken. Another way to determine if a given system is the primary ATM is to look at the output of ifconfig. If the network alias (:cs0) is created on a real network interface (such as eth0) then the system is the primary ATM. Other systems will have the alias on the loopback or tunnel interface.
Cluster Generates a Lot of Extra Traffic

If your NAT settings have been misconfigured, you may notice a large amount of extra traffic on the network. Double-check to see that your NAT settings are correct. If all your NAT systems are working properly, there should not be any spurious traffic.
7-30
Chapter 8
C LUSTER S ERVER A RCHITECTURE
The TurboLinux Cluster Server product is made up of several different components. These components work with each other to implement the clustering features: load balancing, fail-over, and high availability. Some of the pieces run in kernel space, some run as daemons, and some are run from the command line. In this chapter, we will look at each of the pieces that make up the product and discuss how they fit together. The information in this chapter is not critical to getting the cluster to run, but it will help you understand how things work, so you can more effectively troubleshoot any problems if they should arise. The topics that will be covered in this chapter are: SpeedLink kernel module Cluster Server daemon (clusterserverd) Application Stability Agents (ASAs) Synchronization tools Cluster Management Console (CMC) How the pieces fit together
8-1
Cluster Server Architecture
SpeedLink Kernel Module

The heart of Cluster Server is the SpeedLink kernel module. It intercepts incoming IP traffic and makes all the forwarding decisions. There are two pieces to the SpeedLink module: the kernel patch and the module itself.
Kernel Patch
In order for the SpeedLink module to be able to look at incoming packets, a small patch to the Linux kernel must be applied. What this patch does is to allow a kernel module to look at packets very low in the TCP/IP protocol stack. The module can allow packets to pass up through the TCP/IP stack untouched, or it may decide to forward them to another machine. You can look at the kernel patch, which is included on the installation CD. It is named TLCS-ip-cs.patch and is located in the kernel source directories (kernel/TurboLinux/2.2.16-0.4 and kernel/RedHat/2.2.16-3). The patch is in flat ASCII text, so you can view it with less or any other text viewer. The kernel patch is quite small. Its primary task is to create an interface that the SpeedLink module can plug into to do its job. It does this in the lower levels of the IP protocol stack so the packet can be redirected early in the process. Most of the changes in the patch are dedicated to implementing the NAT forwarding method.
ip_cs Module
The SpeedLink module itself is named ip_cs. This module is where the real work gets done. This module plugs into the TCP/IP stack at the location designated by the kernel patch. It then looks at every incoming network packet and determines whether it is directed toward the cluster. It does this by looking at the destination address of the packet. If the destination address
matches the virtual IP address of the cluster, the packet is sent through the ip_cs module for more processing. If the destination address does not match, the packet is sent back to the TCP/IP stack for normal processing. When the ip_cs module gets a packet addressed to the cluster, it has to do some processing to figure out where to forward the packet. It does this by consulting several tables. There are three main tables: services, servers, and current connections. If you recall earlier, the size of these tables can be modified in the configuration. The first thing that happens is that the packet is checked to see if it belongs to an existing connection. If it does, then the packet is sent to the same server node as all the rest of the packets in that connection. This allows a client and server to establish a session. The session is sort of like a conversation. The client initiates the session, and then the client and server exchange packets until one of them decides that the conversation is over. Once the session is closed, the information stays in the current connections table for a short time, in case the same client wants to initiate another similar conversation. This time period is defined in /proc/net/cluster/timeout, in seconds. It is also defined by the ConnectionTimeout setting in the configuration file and in the Advanced Traffic Manager Settings menu in tlcsconfig. If a packet does not belong to an existing session, it is checked to see if it is being sent to a service port that the cluster is handling. If it is, the module looks at the servers list to determine which cluster node will be able to handle the service request. The servers are generally selected in a round-robin fashion, but the weight associated with each server and whether the server is listed as able to handle the given service is also taken into account. The ip_cs module also handles the /proc/net/cluster directory. This directory is used as a way to access the tables the module uses to make its decisions. Most of the files in the directory can be written to as well as read. Reading the files provides you with information, such as what connections currently exist.
8 -3
Writing to the files allows you to configure the servers and services that the module can use when making its decisions. The /proc/net/cluster files were covered in detail in chapter 7. You can find the source code to the SpeedLink module on the CD, in the kernel/speedlink/src directory. The code is fairly complex and consists of several source files. Well show you how to compile it, along with the modified kernel, in the next section.
Compiling the Kernel

You can use the kernel patch and the source code to the ip_cs module to compile the kernel yourself. If you cannot find a suitable kernel in the installation program, you can choose the Custom Kernel selection and follow these procedures to build your own.
WARNING
Building a kernel is not a simple process. Do not attempt to build a Cluster Server kernel as your first kernel build. Get some experience building kernels for non-production boxes first. The process outlined here is intended to be used by system administrators who are experienced compiling the kernel. For more detailed information, consult the Kernel-HOWTO.
One very important point to remember is that you want to keep the kernel configuration as close as possible to the way the vendor shipped it. However, you will also need to ensure that the kernel has all the options compiled in that are required by Cluster Server. Important options that need to be enabled include IP Aliasing and IP-IP tunneling. Most distributions will already come with these options compiled in. The following steps will allow you to build a kernel that has the Cluster Server patch applied, as well as the SpeedLink ip_cs module. These procedures are
8-4
using kernel version 2.2.16-17 as an example. You will have to adjust some of the commands if you are using a different kernel version. 1. Install the kernel sources from your distribution vendor.
# rpm -i kernel-source-2.2.16-17.i386.rpm
You can find the kernel sources on the source CD that came with the distribution, or you can download them from the vendors web site. 2. Apply the SpeedLink kernel patch.
# cd /usr/src/linux # patch -p1 -s < /mnt/cdrom/kernel/RedHat/2.2.16-3/\ TLCS-ip-cs.patch
NOTE
The patch will probably work even if the kernel version numbers do not match exactly. If you dont get any error messages, then the patch applied successfully. If the patch does not apply cleanly, you will have to make the changes by hand, or see if there is an updated patch file on the TurboLinux Cluster Server web site.
3. Compile kernel as usual.

# make clean # make menuconfig # make dep # make bzImage # make modules
4. Install the kernel.

# cp arch/i386/boot/bzImage /boot/vmlinuz-2.2.16-17 # /sbin/lilo # make modules_install
5. Copy the SpeedLink code to the hard drive so you can compile it.
# cp -a /mnt/cdrom/kernel/speedlink/src /usr/src/ip_cs
6. Compile the ip_cs module.

# cd /usr/src/ip_cs # make
7. Install the ip_cs.o module that you just built.

# cp ip_cs.o /lib/modules/2.2.16-17/ipv4/
8. Reboot and select the new kernel from your boot menu. 9. Verify that the new kernel is working properly. Be sure to check the log files and look for anything out of the ordinary. 10. Check to see if you can load the ip_cs module. You can load it by hand with the following command:
# modprobe ip_cs
If it loads with no errors, verify that it shows up in the list of loaded modules:
# lsmod Module ip_cs ppp Size Used by
26532 0 (unused) 19021 0 (autoclean) (unused)
11. Start the Cluster Server daemon:

# /etc/rc.d/init.d/clusterserverd start
If all went well, you will now be running Cluster Server on your new custombuilt kernel.
8-6
Cluster Server Daemon (clusterserverd)
Cluster Server Daemon (clusterserverd)

If the SpeedLink module is the heart of TurboLinux Cluster Server, then the clusterserverd daemon is the central nervous system. The Cluster Server daemon is the second most important part of the whole system. It configures and controls the ip_cs module, primarily through the /proc/net/cluster interface. It handles just about everything required for operation of the traffic manager except for the actual traffic forwarding. When the daemon starts, it reads the configuration file and decides what function the system is to perform: cluster node, ATM, or both. If the system is an ATM, the daemon always starts by making the system a backup ATM. It then determines whether there is a primary ATM up already. If there is no primary ATM, the backup ATMs will select a backup ATM to be promoted to primary ATM. The primary ATM is selected by choosing the first listed ATM in the configuration file that is up. The daemon also handles setting up the network interfaces. You saw this in the troubleshooting section of the last chapter. When the daemon starts up, it can configure the system as a cluster node using direct forwarding, tunneling, or NAT. It sets up the aliased interfaces with the proper IP addresses and options. Thats why we recommended that you use Cluster Server on all the nodes in the cluster, so you dont have to configure any of the nodes by hand. The daemon also performs all the network checks. These include the heartbeat broadcasts that the primary ATM generates to let the backup ATMs know that it is working. And, of course, it is the daemon on the backup ATMs that listens for these broadcasts and promotes the backup to primary if it does not hear from the primary. In addition, the server pings and ASA service checks are performed by the daemon. For the service checks, it uses the external ASA programs defined in the configuration file. If a server or service
8 -7
fails, the daemon temporarily removes it from the list maintained by the kernel module. The clusterserverd daemon uses two ports for communication between systems in the cluster. UDP port 17100 is used for heartbeat broadcasts. The primary ATM sends out the broadcasts on this port, and the backup ATMs listen for the broadcasts to make sure that the primary is still functioning. (This is one reason all ATMs must be on the same subnet -- broadcasts are not routed out of the subnet where they originate.) The daemon uses TCP port 17101 as an administrative port. It is primarily used as a communication channel between clusterserverd and the CMC daemon. You should use the Security Settings in the configuration program to allow only local access on this port. You can change the port numbers that the daemon uses by adding entries to the /etc/services file. The clusterserver service is the UDP heartbeat, and defaults to 17100. The clusterserveradm service is used for the administration channel, with 17101 as the default. If you want to run two separate clusters on the same subnet, you will have to change these port numbers on one of the clusters. Otherwise the clusters will confuse each other with their heartbeat broadcasts. If you change these values, be sure to do it for every ATM in the cluster, or you will have systems listening on the wrong ports.
WARNING
Make sure that you have set the Security Settings to allow access to localhost (127.0.0.1/255.255.255.255) and to deny everyone else (0.0.0.0/0.0.0.0). Otherwise, unauthorized persons will be able to change the configuration of your cluster.
8-8
Application Stability Agents (ASAs)

Application Stability Agents are programs or scripts that perform a simple service check. They run on the ATM and connect to the service ports on each server node in the cluster. They perform a simple transaction in order to ensure that the service is running properly. Without these checks, the ATM would send service requests to cluster nodes even when the nodes are unable to respond. Some other clustering solutions do not have full-featured agents. Instead, they just perform a ping and check to see that the service port is able to make a connection. The advantage of the Application Stability Agent is that it not only ensures that the server is able to answer a port connection, but it also verifies that the service attached to that port is able to answer the request. ASAs run on the primary ATM. They are defined in the UserCheck sections of the configuration file, and the Services section of the tlcsconfig program. They are called periodically, as specified by the Check Service Frequency setting in the Server Groups section of the configuration program. If an ASA does not get an answer within the number of seconds specified by Check Service Timeout, the service is assumed to be down. (These settings are called CheckPortFrequency and CheckPortTimeout in the configuration file.) When an ATM calls an ASA, it passes several arguments to the agent. The first argument is the IP address of the cluster node to check. The second argument is the port number to check. The final argument specifies whether the service runs on a UDP port or a TCP port. If it is a TCP service, the final argument is a 1; if it is UDP, a 2 is passed. The agent will use the information it is given to connect to the service on the specified node. If the ASA finds that the service is up, it must return a 0, otherwise it must return a 1.
8 -9
When a service is found to be down, that service is temporarily removed from the table in the ip_cs module. In addition, another script may be executed when the service goes down. This script is called Event triggered when down in the tlcsconfig program, and is labeled Down within a UserCheck section in the configuration file. There is a corresponding script that gets called when a service that was down comes back up. These up and down scripts are called with the same arguments as the ASAs themselves. The Down script is helpful in that it allows you to make an attempt to bring the service back up. Remember that the Down script will be run on the ATM, but will be passed the name of the server node that had the service go down. Therefore, the script will have to use the server name to contact the server by some other mechanism and try to bring the service up. Using SSH to run commands on the remote service may be helpful when developing such a script. Another possible use of the Up and Down scripts would be to I/O fencing for fail-overs. I/O fencing is used to ensure that only one of the systems ever access shared resources at a given time. You can monitor ASA checks in the /var/log/clusterserverd.log file. The checks are prefixed by info 021. When a service comes up, an info 007 message is logged, giving the server name and the port number. When the service goes down, an info 009 message is logged. Here is an example log, showing just a few ASA checks:
08/06 10:12:01 info 021 Checking cluster1.tl.usa service node1.tl.usa:80 (pid 143) 08/06 10:12:05 info 007 Service node1.tl.usa:80 is up 08/06 10:21:41 info 009 Service node1.tl.usa:80 is down
The following Application Stability Agents come with Cluster Server: DB2Agent dnsAgent ftpAgent
genericAgent httpAgent httpsAgent http10Agent imapAgent nntpAgent oracleAgent popAgent smtpAgent
These agents are built as executables, but they could have been written as shell scripts or Perl scripts just as easily. You can find out more about each of these by reading the man page entry, which is also available on the CMC home page.
8-11
The synchronization tools provide coherency among the nodes in the cluster. We talked about how they are used in the previous chapter. In this section, we will explain how they are implemented. The synchronization tools use Secure Shell, or SSH, to transfer files between systems. The source system is always the system that you are running the utility on. The other systems listed will receive the content. Note that there is no turning back -- once you have committed to sending changes, whatever is on the source system gets transferred to the others. The actual file transfers are performed by the scp program. Like all the programs in the SSH suite, you will be required to type in a password the first time you connect to each system. Use the root password for each system. SSH will remember the connection settings, so you will not have to type in the password after the first successful connection. The password is stored in encrypted form, so you do not need to worry about the password being stored in clear text on the system. SSH does its work securely at all times. The configuration synchronization program (tlcs_config_sync) is just a special case of content synchronization. It copies the files in the /etc/clusterserver directory so that all the systems are configured in exactly the same way. It also copies the license files, so that all the systems will be able to run. The version of SSH that is installed along with TurboLinux Cluster Server is F-Secure SSH 1.3.7. If you have a version of SSH on your other cluster nodes, you can include them in the synchronization process, even if they are not running the Cluster Server software. However, there are some known incompatibilities between different versions of SSH, so you may need to switch versions if you have any difficulties. Systems that do not have SSH
8-12
installed should be removed from the list of servers in the synchronization programs. You will need to synchronize content on those systems by hand. If you have more demanding content synchronization needs, you may want to look into a more robust solution. The rsync system is quite good at synchronizing data between several systems, and can figure out what needs to be synchronized from multiple sources. A secure version called ssync has also been implemented. As was mentioned in chapter 2, the ultimate solution is to deploy a distributed file system to maintain consistent content between all the nodes, or to use some hardware shared storage solution.
8-13

The Cluster Management Console provides you with a web interface to monitor the performance of the cluster and to make dynamic changes to the configuration. We showed you how to use CMC to help administer your cluster in the previous chapter. In this section, we will explain how it works. CMC runs as a daemon on each ATM system in the cluster. It is started with an initialization script, /etc/rc.d/init.d/cmcd. The daemon process is called cmc, which will show up if you do a ps x listing. (Actually, it will show up twice -one process listens for incoming connections, and the other monitors traffic on the ATM.) The CMC daemon listens on TCP port 910 for incoming secure HTTP connections. When a browser initiates a connection on port 910, the CMC daemon requests that the client authenticates itself. Communication is secured using the SSL protocol, so the password will be encrypted before being sent across the network. Information passed between the browser and the ATM will also be encrypted, so none of the information displayed or entered in CMC can be intercepted. The browser will pop up a user name and password request. The tlcsadmin user account should be used when connecting to CMC. You can also use the root account, but may have less functionality when doing so. The tlcsadmin account is created during installation, and will initially have the same password as the root user. The first time you connect to CMC, the browser may also pop up an alert telling you about the security certificate on the CMC system. The SSL certificate was generated when you installed TurboLinux Cluster Server. You can have your certificate validated by a commercial certificate authority such as VeriSign or Thawte. However, the process of obtaining such a certificate is rather involved and is not necessary. If you do obtain such a certificate, store it in the cmc-cert-key.pem file in the /etc/clusterserver directory.
8-14
CMC gathers much of its information from the kernel module via the /proc/net/cluster files. All of the Internal Module Status sections on the Status page of CMC come directly from the /proc files. They are formatted nicely, and even allow you to dynamically change some of the settings. Note that if you change these settings dynamically, the changes will not persist when the ATM is restarted. To change settings permanently, you will need to make the changes to the configuration file. The other information on the Status page is gathered using command line utilities and log files. The commands and file names are listed with each section. CMC allows you to stop and start the clusterserverd daemon on the ATM. Be aware that if you stop or restart the primary ATM, the backup ATMs may take over the primary ATM responsibilities, so the system you are looking at may no longer be the primary ATM after you restart it. As was discussed earlier, the CMC daemon communicates with the Cluster Server daemon via TCP port 17101. This communication channel is used to allow CMC to control some aspects of the clusterserverd daemon.
8-15
Putting All the Pieces Together

The clusterserverd daemon is the ringmaster that gets things going. It reads the configuration file to determine what needs to be done. It makes sure the network interfaces are configured correctly. It loads the kernel module if necessary, and configures it, using the /proc/net/cluster files. The daemon is started via the /etc/rc.d/init.d/clusterserverd startup script. Like other startup scripts, you can specify the start, stop, and restart options to bring the service up or down. The script is normally called at boot time using the Sys V init scripts process. Note that some of the configuration settings are used by the daemon itself, and some are used to configure the SpeedLink kernel module. For example, the ATM settings include ARP settings, delay settings, maximum number of entries in the tables, and the connection timeout value. The table sizes and connection timeout are used to configure the kernel module, while the rest of the settings are used by the daemon itself. The kernel module does most of the traffic management work, deciding where to send each packet, and forwarding the packets. But it is the daemon that tells the SpeedLink module how to do its job. The daemon also monitors the servers and services and removes them from the kernel tables if they are found to be down or unresponsive. CMC is provided as an additional tool to monitor the performance of the cluster. It does this by actively watching the /proc/net/cluster files and charting the progress. However, it can also be used to modify the daemon settings as well as the kernel module settings. It can tell the module to start, restart, or stop. It performs most of these tasks by communicating its wishes to the clusterserverd daemon. CMC actually runs as a daemon itself, named cmc. It has its own startup script, in /etc/rc.d/init.d/cmcd.
8-16
Putting All the Pieces Together
The synchronization tools allow you to keep nodes in the cluster consistent. To run an effective cluster, you must maintain consistency. This includes not only the content, but the configuration settings as well.
Conclusion
By combining these various elements, the Cluster Server system is able to accomplish its goals. Multiple computers can be combined to provide enhanced speed, reliability, and scalability. The workload is distributed among several servers instead of concentrating all the work on one large server. The system can be set up to ensure that there is no single point of failure, greatly increasing the availability of your services. Learning how everything fits together takes some time and effort, but in the end you will have a reliable and scalable system. After you have figured out how everything works, you will find that managing the cluster will come quite naturally. The combination of the technologies involved will help you meet your objectives while reducing costs and administration time.
8-17
8-18
G LOSSARY
This glossary lists acronyms and terms and their definitions used in TurboLinux manuals. TurboLinux gratefully acknowledges the following sources for content included in the Glossary entries. All rights are reserved by the providers of the source content. Content from http://www.whatis.com, Copyright 1996-2000 TechTarget.com, Inc. Content from the Internet Software Consortium, 2000 Internet Software Consortium. The Single UNIX Specification, Version 2, Copyright 1997 The Open Group. The GNU C Library, Copyright 1999 by The Free Software Foundation Webopedia, Copyright 2000 internet.com Corp. OReilly Network, Copyright 2000 O'Reilly and Associates, Inc. Stelias Computing, Copyright 1999 Stelias Computing Inc. IBM web site, (C) Copyright IBM Corporation 1999, 2000. All rights reserved. Connected: An Internet Encyclopedia, its editor, and contributors Operating System Concepts (Fifth Edition) Silberschatz & Galvin
G-1
Developer's Resources Copyright 1999 Emmett Dixson Linux.com website, 1999, 2000 Linux.com Transaction Processing Performance Council documentation Contributors for content available at Linux and other Open Source web sites.
G-2
A
Advanced Traffic Manager (ATM)
The traffic manager for a TurboLinux Cluster Server cluster. It routes traffic destined for the cluster to individual cluster nodes. It makes the determination where each packet should go. The ATM is able to intelligently determine whether each node within its cluster is still available. It continuously probes each system to verify that not only is the system still healthy, but that the application is still healthy, as well. The Advanced Traffic Manager is also able to recognize the capabilities of each individual node and distribute the incoming traffic to the system that's best able to handle the request. See also cluster, node. Related Link(s): http://www.turbolinux.com/products/tcs/
agent
See Application Stability Agent.
Apache
Apache is a freely available Web server that is distributed under an Open Source license. The Apache httpd server is a powerful, flexible, HTTP/1.1 compliant web server that implements the latest protocols. It is highly configurable and extensible with third-party modules. Apache provides full source code and comes with an unrestrictive
G-3
license. It runs on Windows NT/9x, Netware 5.x, OS/2, and most versions of Unix, as well as several other operating systems. Related Link(s): http://www.apache.org/
API
An API (application program interface) is the specific method prescribed by a computer operating system or by another application program by which a programmer writing an application program can make requests of the operating system or another application. An API can be contrasted with a graphical user interface or a command interface (both of which are direct user interfaces) as interfaces to an operating system or a program. Related Link(s): http://www.whatis.com/api.htm
Application Stability Agent (ASA)

Program that is used to determine whether a particular service on a cluster node is active. The agent usually performs a simple transaction in addition to verifying that a connection can be established.
ARP
Address Resolution Protocol. ARP resolves IP addresses into hardware (MAC) addresses. Once a common encapsulation mechanism has been selected for Ethernet, hosts must still convert a 32-bit IP address into a 48-bit Ethernet address. The Address Resolution Protocol (ARP), documented in RFC 826, is used to do this. It has also been adapted for other media, such as FDDI.
G-4 TurboLinux Cluster Server 6 User Guide
ARP works by broadcasting a packet to all hosts attached to an Ethernet. The packet contains the IP address the sender is interested in communicating with. Most hosts ignore the packet. The target machine, recognizing that the IP address in the packet matches its own, returns an answer. Hosts typically keep a cache of ARP responses, based on the assumption that IP-to-hardware address mapping rarely change. Related Link(s): http://webopedia.internet.com/TERM/A/ARP.html
ASA
See Application Stability Agent.
ATM
See Advanced Traffic Manager.
B
backup ATM
System that stands prepared to take over for the primary ATM if it goes down. The backup ATM is basically a fail-over system of the primary ATM.
Bash
Bash is a Unix command interpreter (shell). It is an implementation of the Posix 1003.2 shell standard, and resembles the Korn and System V shells.
G-5
Bash contains a number of enhancements over those shells, both for interactive use and shell programming. Features geared toward interactive use include command line editing, command history, job control, aliases, and prompt expansion. Programming features include additional variable expansions, shell arithmetic, and a number of variables and options to control shell behavior. Bash was originally written by Brian Fox of the Free Software Foundation. The current developer and maintainer is Chet Ramey of Case Western Reserve University. The latest version is 2.04, first made available on Friday, 17 March 2000. Related Link(s): ftp://ftp.cwru.edu/pub/bash/FAQ
Beowulf
A clustering technology used to implement processing clusters on Linux systems. Beowulf is not a product per se, but a collection of technologies. Note that TurboLinux Cluster Server and Beowulf do not implement the same type of clustering. Beowulf is used for CPU-intensive tasks, whereas Cluster Server is used for serviceoriented tasks. Beowulf is an approach to creating a supercomputer made up of a cluster of standard PCs running Linux. The PCs are usually connected via Ethernet and run programs created for parallel processing. A server node feeds data to the rest of the cluster for processing, and serves as an administration system. Related Link(s): http://www.beowulf.org/
G-6
BIND
BIND (Berkeley Internet Name Domain) is an implementation of the Domain Name System (DNS) protocols and provides an openly redistributable reference implementation of the major components of the Domain Name System, including: A Domain Name System server (named) A Domain Name System resolver library Tools for verifying the proper operation of the DNS server Related Link(s): http://www.isc.org/products/BIND/docs/bind8.2_highlights.html
BIOS (Basic Input/Output System)

Accurately called the ROM BIOS, this is a set of read-only memory (ROM) chips which are programmed to take over operation of the computer when it starts. The BIOS coordinates all the messages printed when a PC is first switched on. It gets its configuration data about the number of hard drives, what size floppy disk is present etc. from the CMOS configuration chip.
BOOTP
The booting protocol (BOOTP) allows a client machine to discover its own IP address, the address of a server host, and the name of a file to be loaded into memory and executed. Further information is in RFC 951. Related Link(s): http://www.cis.ohio-state.edu/hypertext/information/rfc.html
G-7
BSD
BSD (originally: Berkeley Software Distribution) refers to the particular version of the UNIX operating system that was developed at and distributed from the University of California at Berkeley. BSD is customarily preceded by a number indicating the particular distribution level of the BSD system (for example, 4.3 BSD). BSD UNIX has been popular and many commercial implementations of UNIX systems are based on or include some BSD code. Related Link(s): http://www.ee.byu.edu/unix-faq/subsubsection3_8_3_2.html
C
client/server, client-server
A common form of distributed system in which software is split between server tasks and client tasks. A client sends requests to a server, according to some protocol, asking for information or action, and the server responds. This is analogous to a customer (client) who sends an order (request) on an order form to a supplier (server) who dispatches the goods and an invoice (response). The order form and invoice are part of the protocol used to communicate in this case. There may be either one centralized server or several distributed ones. This model allows clients and servers to be placed independently on nodes in a network, possibly on different hardware and operating systems appropriate to their function, e.g. fast server/cheap client.
Examples are the name-server/name-resolver relationship in DNS, the file-server/file-client relationship in NFS, and the screen server/client application split in the X Window System. Related Link(s): http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi? query=client%2Fserver&action=Search
cluster, clustering
A cluster is any collection of more than one computers that can be accessed independently but also as a unit. Clustering technology lets users harness multiple servers together to make one high performance server. This technology was originally created by Digital Equipment Corp. Clustering is used for parallel processing, for load balancing and for fault tolerance. Clustering is a popular strategy for implementing parallel processing applications because it enables companies to leverage the investment already made in PCs and workstations. In addition, it's relatively easy to add new CPUs simply by adding a new PC to the network. Related Link(s): http://www.linuxyes.com/en/scenter/cluster.html http://metalab.unc.edu/mdw/HOWTO/Parallel-Processing-HOWTO3.html#ss3.1

A web-based administration program that will let you monitor and modify the cluster from a web browser.
G-9
cluster manager
See Advanced Traffic Manager.
cluster node
A computer within a cluster that does actual processing of service requests. The cluster manager distributes the workload among the cluster nodes. From outside the cluster, clients do not care which cluster node will process their request and are usually unable to determine which node they hit. In TurboLinux Cluster Server, the nodes perform the network services that the cluster supports.
clusterserverd
The daemon that works to implement TurboLinux Cluster Server. On the primary ATM, it works in conjunction with the SpeedLink kernel module to route incoming traffic to the appropriate cluster nodes. On backup ATMs, it monitors the primary ATM, ready to fail-over if the primary ATM fails. On cluster nodes, it sets up the network interfaces in preparation to run as a node.
connection
A connection is a communication session between two network hosts. One host (the client) initiates a conversation with another system (the server). The connection is like a conversation, with the client and server sending data to each other. The connection can be terminated by either side.
CMC
See Cluster Management Console.
G-10
CPU (Central Processing Unit)

Commonly called a microprocessor, this is the part of a computer which controls all the other parts. The job of the CPU is to coordinate information flow between the memory and other hardware components such as hard drives, network and other peripheral cards. TurboLinux runs on Intel CPUs and those compatible with them (such as AMD, Cyrix and others) and also PowerPC CPUs from Motorola and IBM.
D
daemon
In Unix terminology, a daemon is customarily used to denote server programs. Daemons are memory-resident programs executed only when they receive a request from another program. Server programs such as FTP and TELNET are generally implemented as daemons. Program names of most daemons end in the letter d to indicate that they are daemons. Unix systems run many daemons, chiefly to handle requests for services from other hosts on a network. Most of these are now started as required by a single real daemon, inetd, rather than running continuously. Examples are cron (local timed command execution), rshd (remote command execution), rlogind and telnetd (remote login), ftpd, nfsd (file transfer), lpd (printing). Related Link(s): http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?daemon
G-11
datagram
A datagram is, to quote the Internet's Request for Comments 1594, a self-contained, independent entity of data carrying sufficient information to be routed from the source to the destination computer without reliance on earlier exchanges between this source and destination computer and the transporting network. The term has been generally replaced by the term packet. Datagrams or packets are the message units that the Internet Protocol deals with and that the Internet transports. A datagram or packet needs to be self-contained without reliance on earlier exchanges because there is no connection of fixed duration between the two communicating points as there is, for example, in most voice telephone conversations. (This kind of protocol is referred to as connectionless.) Related Link(s): http://www.whatis.com/datagram.htm
DHCP
The Dynamic Host Configuration Protocol (DHCP) provides a framework for passing configuration information to hosts on a TCP/IP network. DHCP is based on the Bootstrap Protocol (BOOTP), adding the capability of automatic allocation of reusable network addresses and additional configuration options. DHCP captures the behavior of BOOTP relay agents, and DHCP participants can interoperate with BOOTP participants. DHCP allows a server to dynamically distribute IP addressing and configuration information to clients.
G-12
DHCP consists of two components: a protocol for delivering hostspecific configuration parameters from a DHCP server to a host and a mechanism for allocation of network addresses to hosts. DHCP is built on a client-server model, where designated DHCP server hosts allocate network addresses and deliver configuration parameters to dynamically configured hosts. Configuration parameters and other control information are carried in tagged data items that are stored in the options field of the DHCP message. The data items themselves are also called options. Further information is in RFCs 1533, 1534, and 951. Related Link(s): http://www.cis.ohio-state.edu/hypertext/information/rfc.html
direct forwarding
Forwarding method where the ATM sends the packet directly to the MAC address of the cluster node. The ATM and node must reside on the same subnet. Reply traffic goes directly from the cluster node to the client, without having to travel through the ATM.
distributed systems
A collection of (probably heterogeneous) automata whose distribution is transparent to the user so that the system appears as one local machine. This is in contrast to a network, where the user is aware that there are several machines, and their location, storage replication, load balancing and functionality is not transparent. Distributed systems usually use some kind of clientserver organization.
G-13
DNS (Domain Name Service)

Domain naming, and its most visible component, the Domain Name Service (DNS), is critical to the operation of the Internet. DNS is a general-purpose distributed, replicated, data query service used on Internet for translating fully-qualified domain names (FQDNs) such as fred.test.com.us into numerical IP addresses such as 192.168.23.10. DNS can be configured to use a sequence of name servers, based on the domains in the name being looked for, until a match is found. Under TurboLinux, DNS can be queried interactively using the command nslookup. See also Name Server.
domain name
Domain names usually refer to Internet domains, which form the basis of the common Internet naming scheme. For example, www.cnn.com is a domain name, and cnn.com is a domain. Related Link(s): http://www.freesoft.org/CIE/Topics/10.htm
E
encapsulation
Encapsulation, closely related to the concept of Protocol Layering, refers to the practice of enclosing data using one protocol within messages of another protocol. To make use of encapsulation, the encapsulating protocol must be open-ended, allowing for arbitrary data to placed in its messages.
G-14
Another protocol can then be used to define the format of that data. Related Link(s): http://www.freesoft.org/CIE/Topics/18.htm
Ethernet
Ethernet is a type of local area network and was originally developed by Xerox, Intel, and Digital Equipment Corporation in the late 1970s, with specifications first released in 1980. Ethernet was first designed to transport data at rates up to 10 million bits per second over coaxial cable; the original standard defined the cabling, connectors, and other characteristics for the transmission of data, voice, and video over local area networks at 10 Mbps. Recent improvements have increased the speed to 100 Mbps.
Ext2fs (Second Extended Filesystem)

This is the Linux native filesystem. It is fast and reliable and used by all Linux distributions. Also called a Linux partition. The EXT2 file system uses fragments to prevent too much wasted space through the incomplete use of blocks. The fragment size in Linux must be the physical block size multiplied by a power of two. Files in Linux are, therefore, a sequence of blocks followed by a small sequence of consecutive fragments. Fragments may be grouped into a block if there are enough fragments used at the end of a file to create an entire block. Likewise, when a file decreases in size, the last block may be broken into fragments. Related Link(s): http://e2fsprogs.sourceforge.net/ext2.html
G-15
event monitoring
Event monitoring is gathering information about events that occur during the running of an application. Event monitoring is useful for detecting deadlocks, overflow events, transaction completion, and application disconnections. An event can be a file system running out of space, processor utilization going too high, or anything that can be detected or is measurable. Within a cluster, an event manager monitors how the resources in the cluster are working and informs a parallel or distributed program that events of interest have happened. The event manager, does not, however, respond to those events.
F
fail-over
Method of fault tolerance that has 2 or more systems running in parallel doing some sort of processing. Normally the primary system will process all requests. If the primary system goes down, the backup system will take over processing. The term can also refer to the process of the backup system taking over for the primary system when it has failed. Compare with load balancing. The two terms are similar, but failover implies that only one system will process requests at any one time, whereas load balancing normally has all the systems processing requests in parallel.
G-16
firewall
The firewall features in Linux are fully configurable to allow or deny any type of service from or to any address on the Internet. Undesirable sites can be blocked from incoming or outgoing connections, and internal systems can be protected from outside attack. If you're using the reserved addresses on your LAN, Linux will perform Network Address Translation (NAT) to allow connections to the Internet.
FQDN (Fully Qualified Domain Name)

The full name of a computer on the Internet, consisting of its local hostname and its domain name. For example, smoke is a hostname and smoke.com.test.us is a FQDN. A FQDN should be sufficient to determine a unique Internet address for any host on the Internet. The same naming scheme is also used for some hosts which are not on the Internet, but share the same name-space for electronic mail addressing.
FTP (File Transfer Protocol)

A client-server protocol which allows a user on one computer to transfer files to and from another computer over a TCP/IP network. Also the client program the user executes to transfer files. TurboLinux includes a friendly FTP client called ncftp. Anonymous FTP - An interactive service provided by many Internet hosts allowing any user to transfer documents, files, programs, and other archived data using File Transfer Protocol. A good example is the TurboLinux ftp site at ftp.turbolinux.com. The user logs in using the special username ftp or anonymous and his e-mail address as password. He then has access to a special directory hierarchy containing the publicly accessible files.
TurboLinux Cluster Server 6 User Guide G-17
G
gateway
A gateway is another name for a router. The default gateway is the router that traffic will be routed through if the destination address does not exist on the same subnet. See router.
General Public License

See GNU General Public License.
GNU
GNU is The Free Software Foundation's project to provide a freely distributable replacement for Unix. It stands for GNUs Not UNIX, a recursive acronym. A large amount of GNU software is shipped with TurboLinux, and nearly all the software is under the GNU General Public License. Related Link(s): http://www.gnu.org/
GNU General Public License

The General Public License (GPL) from the GNU project ensures that a software user has freedom to share, modify, and exchange free software. The GNU GPL has comprehensive guidelines and terms for anyone who develops software and feels that it can be of use to the public at large. Related Link(s): http://www.gnu.org/copyleft/gpl.html
G-18
GPL
See GNU General Public License.
H
heartbeat, heartbeat monitoring
Heartbeat monitoring consists of system services that maintain constant communication between all the nodes in a cluster. Heartbeat monitoring ensures that each node is active; a heartbeat message is sent every few seconds from every node in the cluster to its upstream neighbor. When the heartbeat for a node fails, the condition is reported so the cluster can automatically fail over resilient resources to a backup node. Heartbeat monitoring also attempts to reestablish communications in the event of a failure and reports unrecoverable failures to the rest of the cluster.
heterogeneous
Heterogeneous, which is the characteristic of containing dissimilar constituents, is commonly used in information technology to describe a product as able to contain or be part of a heterogeneous network, consisting of different manufacturers' products that can interoperate. Heterogeneous networks are made possible by standard hardware and software interfaces used in common by different products, thus allowing them to communicate with each other. The Internet is an example of a heterogeneous network.
G-19
high availability
A system that maintains availability of a service despite hardware or software faults. This is usually done by implementing redundancy of hardware and software. High availability is often measured in percentage of time that the system is up, such as 99.99% uptime. High Availability (HA) means access to data and applications whenever needed and with an acceptable level of performance. A high availability situation is when all of a network's resources are available for the maximum amount of time. Theoretically, the availability percentage can never be 100% but clustering attempts to bring this percentage of time as close to 100% as possible. HA deals with the service aspect of the system as an unbroken whole and as perceived by its end users. In this context, reliability (of hardware and software components) and performance (responsetime/throughput, transactions per minute, etc.) are parts of system availability. Availability can also be expressed as MTTF/(MTTF+MTTR), where: MTTF (mean-time-to-failure) is the average time that a system runs (without failing) after it has been set up or repaired. MTTR (mean-time-to-repair) is the average time needed to repair (or restore) a failed system. See also single point of failure. Related Link(s): http://metalab.unc.edu/pub/Linux/ALPHA/linux-ha/High-Avail ability-HOWTO.html
G-20
host
The term host can be used in several contexts with slightly different meanings in each: On the Internet, the term host means any computer that has full two-way access to other computers on the Internet. A host has a specific host address that, together with the network number, forms its unique Internet Protocol (IP) address. In large mainframe computer environments, a host is a mainframe computer. A host can also indicate a device or program that provides services to some smaller or less capable device or program.
HTML
Hypertext Markup Language (HTML) is the set of markup symbols or codes inserted in a file intended for display on a World Wide Web browser. The markup tells the Web browser how to display a Web page's words and images for the user. The individual markup codes are referred to as elements (but many people also refer to them as tags). The current version of HTML is HTML 4. HTML is a standard recommended by the World Wide Web Consortium (W3C) and adhered to by the major browsers, Microsoft's Internet Explorer and Netscape's Navigator, which also provide some additional non-standard codes. Related Link(s): http://www.w3.org/MarkUp/
HTTP
The Hypertext Transfer Protocol (HTTP) is an application-level protocol with the speed necessary for distributed, collaborative, hypermedia information systems. It is a generic, stateless, object-
G-21
oriented protocol which can be used for many tasks, such as name servers and distributed object management systems, through extension of its request methods (commands). A feature of HTTP is the typing of data representation, allowing systems to be built independently of the data being transferred. HTTP has been in use by the World-Wide Web global information initiative since 1990. You can find more information in rfc1945. Related Link(s): http://www.w3.org/Protocols/
I
ICMP
Internet Control Message Protocol (ICMP) is a message control and error-reporting protocol between a host server and a gateway to the Internet. ICMP uses Internet Protocol (IP) datagrams, but the messages are processed by the IP software and are not directly apparent to the application user. Related Link(s): http://www.whatis.com/icmp.htm
IETF
The Internet Engineering Task Force (IETF) is a large open international community of network designers, operators, vendors, and researchers concerned with the evolution of the Internet architecture and the smooth operation of the Internet. Related Link(s): http://www.ietf.org/
inetd
inetd is one of the most popular super server programs. By default, TurboLinux installs inetd and sets it as the system program at boot time. Instead of using inetd, you can use xinetd, an expanded function version of inetd. Related Link(s): http://www.delorie.com/gnu/docs/glibc/libc_227.html
interface
A boundary across which two systems communicate. An interface might be a hardware connector used to link to other devices, or it might be a convention used to allow communication between two software systems. Often there is some intermediate component between the two systems which connects their interfaces together. For example, two EIA-232 interfaces connected via a serial cable.
Intermezzo
InterMezzo is a distributed file system which lets systems replicate directory trees. Systems make modifications locally and propagate updates to peers when these are available. If networks or peers are down, the system continues to function and the modifications are reintegrated when the system is back up. Applications of this file system span from replicating entire systems, to making home directories transparently available on mobile computers. Related Link(s): http://inter-mezzo.org/ http://linux-ha.org/PhaseII/WhitePapers/braam/intermezzo/ opc99_html/
G-23
IP (Internet Protocol)
The network layer for the TCP/IP protocol suite. If it is the Internet, it's TCP/IP. On TurboLinux, TCP/IP configuration is largely done using TurboNetCfg.
IP Address
An IP address is a 32-bit number that uniquely identifies an Internet host. Related Link(s): http://www.whatis.com/ipaddres.htm http://www.3com.com/nsc/501302.html
ip_cs
Name of the SpeedLink kernel module. It inserts itself into the TCP/IP stack to implement TurboLinux Cluster Server traffic management.
IRQ (Interrupt Request)

Technically speaking this is a feature of the CPU which causes it to suspend normal operation and start doing something else. The something else is usually to do with peripherals such as network cards, sound cards, serial communications and so on. When you move the mouse an interrupt is generated. The reason a TurboLinux user might be interested in this is because each PC device needs to have a unique IRQ number, and there are only so many to choose from. An interrupt conflict is what happens when two devices are designed by the manufacturer to use the same
G-24
interrupt. The more hardware you have in your PC the more likely you are to have to solve this problem.
K
kernel
By definition, a kernel is the essential part of Unix or other operating systems such as Linux; it is responsible for resource allocation, low-level hardware interfaces, security etc. A synonym is nucleus. A kernel can be contrasted with a shell, the outermost part of an operating system that interacts with user commands. Kernel and shell are terms used more frequently in UNIX. Typically, a kernel (or any comparable center of an operating system) includes an interrupt handler that handles all requests or completed I/O operations that compete for the kernel's services, a scheduler that determines which programs share the kernel's processing time in what order, and a supervisor that actually gives use of the computer to each process when it is scheduled. A kernel may also include a manager of the operating system's address spaces in memory or storage, sharing these among all components and other users of the kernel's services. A kernel's services are requested by other parts of the operating system or by applications through a specified set of program interfaces sometimes known as system calls. Because the code that makes up the kernel is needed continuously, it is usually loaded into computer storage in an area
G-25
that is protected so that it will not be overlaid with other less frequently used parts of the operating system. A microkernel is an approach to operating system design emphasizing small modules that implement the basic features of the system kernel and can be flexibly configured. See also Linux kernel. Related Link(s): http://www.uwsg.iu.edu/hypermail/linux/kernel/
L
latency
Latency has different meanings in different contexts. In a network, latency, a synonym for delay, is an expression of how much time it takes for a packet of data to get from one designated point to another. In some usages (for example, AT&T), latency is measured by sending a packet that is returned to the sender and the round-trip time is considered the latency. The latency assumption seems to be that data should be transmitted instantly between one point and another (that is, with no delay at all). The contributors to network latency include: Propagation: This is simply the time it takes for a packet to travel between one place and another at the speed of light. Transmission: The medium itself (whether fiber optic cable, wireless, or some other) introduces some delay. The size of the packet introduces delay in a round trip since a larger packet will take longer to receive and return than a short one.
G-26
Router and other processing: Each gateway node takes time to examine and possibly change the header in a packet (for example, changing the hop count in the time-to-live field). Other computer and storage delays: Within networks at each end of the journey, a packet may be subject to storage and hard disk access delays at intermediate devices such as switches and bridges. (In backbone statistics, however, this kind of latency is probably not considered.) In a computer system, latency is often used to mean any delay or waiting that increases real or perceived response time beyond the response time desired. Specific contributors to computer latency include mismatches in data speed between the microprocessor and input/output devices and inadequate data buffering. Within a computer, latency can be removed or hidden by such techniques as pre-fetching (anticipating the need for data input requests) and multithreading, or using parallelism across multiple execution threads.
LDAP
LDAP is a specification for a client-server protocol to retrieve and manage directory information. It was originally intended as a means for clients on PCs to access X.500 directories, but can also be used with stand-alone and other kinds of directory servers. The first implementation of LDAP was developed at the University of Michigan. The earlier version of the protocol, 2, is supported in software from University of Michigan. Version 2 was published as RFC 1777 and RFC 1778. LDAP does not require the upper layers of OSI stack; it is a simpler protocol to implement (especially in clients), and LDAP is
G-27
under IETF change control and so can more easily evolve to meet Internet requirements. An LDAP directory is organized in a simple tree hierarchy consisting of the following levels: The root directory (the starting place or the source of the tree), which branches out to Countries, each of which branches out to Organizations, which branch out to Organizational units (divisions, departments, and so forth), which branches out to (includes an entry for) Individuals (which includes people, files, and shared resources such as printers) Related Link(s): http://www.ietf.org/rfc/rfc1777.txt?number=1777 http://www.mozilla.org/directory/standards.html
LILO
Although opinions vary, LILO is certainly the most popular boot loader for Linux. It resides on your hard drive, and at boot time it presents you with a boot prompt, where you can choose an operating system to boot, choose a particular Linux kernel to load, and pass special parameters to the Linux kernel when it is loaded. LILO is fast, flexible, and independent, since it does not require any other operating system to be present. This makes it the loader of choice for Linux-only systems. LILO is a kernel boot loader. It can be used as your main boot manager for a system, because it is able to load linux, OS/2, win98, NT, and many other popular OSes. However, it does have one limitation that can be particularly aggravating: It can't boot
G-28
an OS from a partition that is located beyond the 1024th cylinder of the drive. For people with large drives, this can be a problem. The most likely cause of this problem is trying to install linux on the same HD as another OS. If you created your linux partition towards the end of a large drive, Linux won't boot. Related Link(s): http://www.control-escape.com/bootload.html
Linux
Linux is an operating system that was initially created as a hobby by a young student, Linus Torvalds, at the University of Helsinki in Finland. Linus had an interest in Minix, a small UNIX system, and decided to develop a system that exceeded the Minix standards. Version 1.0 of the Linux Kernel was released in 1994. The current full-featured version is 2.2 (released January 25, 1999), and development continues. Linux is developed under the GNU General Public License and its source code is freely available to everyone. This however, doesn't mean that Linux and it's assorted distributions are free -companies and developers may charge money for it as long as the source code remains available. Linux may be used for a wide variety of purposes including networking, software development, and as an end-user platform. Linux is often considered an excellent, low-cost alternative to other more expensive operating systems.
G-29
The central nervous system of Linux is the kernel, the operating system code which runs the whole computer. See kernel. Related Link(s): http://www.linux.org/info/index.html http://hudsucker.easystreet.com/
Linux kernel
The Linux kernel itself is a single monolithic binary. This improves performance since there are no context switches needed for operating system functions or I/O requests. There is, however, modularity built into the Linux kernel, and the kernel can load (unload) modules into itself either at run time, or when the module becomes needed. Modules run in a privileged kernel mode on the system and (like the kernel) have full access to system hardware. Modules are useful to allow binary additions to the kernel (if a piece of hardware is proprietary, a driver may be written to allow the use of this hardware to be included as a module in the kernel without violating the GNU License). Also, if one is writing a driver it can be useful to be able to load and unload the driver without having to reboot the system for testing purposes. See also kernel. Related Link(s): http://web1.linuxhq.com/guides/LKMPG/mpg.html http://web1.linuxhq.com/guides/NAG/node43.html
Linux Virtual Servers

The Linux Virtual Server is a highly scalable and highly available server built on a cluster of real servers, with the load balancer
running on the Linux operating system. The architecture of the cluster is transparent to end users. End users only see a single virtual server. See also Virtual Server. Related Link(s): http://www.linuxvirtualserver.org/
load balancing
Distributing processing and communications activity evenly across a computer network so that no single device is overwhelmed is known as load balancing. Load balancing is especially important for networks where it's difficult to predict the number of requests that will be issued to a server. Busy Web sites typically employ two or more Web servers in a load balancing scheme. If one server starts to get swamped, requests are forwarded to another server with more capacity. Network load balancing serves to balance incoming IP traffic among multi-node clusters.
M
masquerading
IP Masquerading is a form of TCP/IP network address translation, or NAT. See also NAT.
mount
The mount command attaches a named filesystem to the file system hierarchy at the pathname location directory, which must
already exist. If a directory has any contents prior to the mount operation, these remain hidden until the filesystem is once again unmounted. If a filesystem is of the form host:pathname, it is assumed to be an NFS file system (type nfs). The umount command unmounts a currently mounted file system, which can be specified either as a directory or a filesystem.
mount and umount maintain a table of mounted file systems in
/etc/mtab, described in fstab(5). If invoked without an argument, mount displays the contents of this table. If invoked with either a filesystem or directory only, mount searches the file /etc/fstab for a matching entry, and mounts the file system indicated in that entry on the indicated directory.
mount also allows the creation of new, virtual file systems using
loopback mounts. Loopback file systems provide access to existing files using alternate pathnames. Once a virtual file system is created, other file systems can be mounted within it without affecting the original file system. File systems that are subsequently mounted onto the original file system, however, are visible to the virtual file system, unless or until the corresponding mount point in the virtual file system is covered by a file system mounted there. Related Link(s): http://linuxnewbies.editthispage.com/tips/20000118 http://anguilla.u.arizona.edu/doc_link/en_US/a_doc_lib/cmds/ aixcmds3/mount.htm http://uw7doc.sco.com/NET_nfs/nfsT.mount_cmd.html
G-32
MySQL
MySQL is an Open Source relational database management system. The SQL part of MySQL stands for Structured Query Language, the most common standardized language used to access databases. MySQL is also a client/server system that consists of a multithreaded SQL server that supports different back ends, several different client programs and libraries, administrative tools and a programming interface. Related Link(s): http://www.mysql.com/
N
name server
A name server (also called domain server or DNS server) is a computer which knows how to turn a human-readable FQDN into a machine-readable IP address such as 111.222.333.4444. It uses the host name obtained from the IP address or the inverse IP address search to find host names using a distributed database function.
NAS
NAS (Network-Attached Storage) is disk storage that is set up with its own network address rather than being attached to a computer that is serving applications to a network's workstation users. By removing storage access and its management from the department server, both application programming and files can
G-33
be served faster because they are not competing for the same processor resources. The network-attached storage device is attached to a local area network (typically, an Ethernet network) and assigned an Internet Protocol (IP) address. File requests are mapped by the main server to the NAS file server. Network-attached storage consists of hard disk storage, including multi-disk RAID systems, and software for configuring and mapping file locations to the network-attached device. Network-attached storage can be a step toward and included as part of a more sophisticated storage system known as a storage-area network (SAN). NAS software can usually handle a number of network protocols, including Microsoft's IPX and NetBEUI, Novell's Netware IPX, and Sun Microsystems' NFS. Configuration, including the setting of user access priorities, is usually possible using a Web browser.
NAT
NAT (Network Address Translation) is the translation of an Internet Protocol address (IP address) used within one network to a different IP address known within another network. One network is designated the inside network and the other is the outside. Typically, a company maps its local inside network addresses to one or more global outside IP addresses and unmaps the global IP addresses on incoming packets back into local IP addresses. This helps ensure security since each outgoing or incoming request must go through a translation process that also offers the opportunity to qualify or authenticate the request or match it to a previous request. NAT also conserves on the number of global IP addresses that a company needs and it lets the
G-34
company use a single IP address in its communication with the world. NAT is included as part of a router and is often part of a corporate firewall. Network administrators create a NAT table that does the global-to-local and local-to-global IP address mapping. NAT can also be used in conjunction with policy routing. NAT can be statically defined or it can be set up to dynamically translate from and to a pool of IP addresses. Cisco's version of NAT lets an administrator create tables that map: A local IP address to one global IP address statically A local IP address to any of a rotating pool of global IP addresses that a company may have A local IP address plus a particular TCP port to a global IP address or one in a pool of them A global IP address to any of a pool of local IP addresses on a round-robin basis NAT is described in general terms in RFC 1631. NAT reduces the need for a large amount of publicly known IP addresses by creating a separation between publicly known and privately known IP addresses. Related Link(s): http://www.cis.ohio-state.edu/rfc/rfc1631.txt
Network Interface Card

A network interface card (NIC) is a computer circuit board or card that is installed in a computer so that it can be connected to a network. Personal computers and workstations on local area networks (LANs) typically contain a network interface card specifically designed for the LAN transmission technology, such as
G-35
Ethernet or Token Ring. Network interface cards provide a dedicated, full-time connection to a network.
NFS (Network File System)

A protocol developed by Sun Microsystems which allows a computer to access files over a network as if they were on its local disks. Unix systems typically use NFS to share files between one another. NFS is stateless, which means that an NFS server can be rebooted without the clients necessarily losing data.
NIC
See Network Interface Card.
NIS (Network Information Service)

Sun Microsystems' client-server protocol for distributing system configuration data such as user names and passwords and computer names. NIS is somewhat like the NetWare Network Directory Service (NDS), Microsoft Domains or the Lightweight Directory Access Protocol (LDAP) and there are gateways available between NIS and all of these directory schemes.
node
In a network, a node is a connection point, either a redistribution point or an end point for data transmissions. In general, a node has programmed or engineered capability to recognize and process or forward transmissions to other nodes. In clustering, each system within the cluster is often referred to as a node, a cluster node, or a server node. See also cluster node.
G-36
NTP
The Network Time Protocol (NTP) is a family of programs that are used to adjust the system clock on your computer and keep it synchronized with external sources of time. Time data is requested from outside sources (radio clock, network timeservers) and delivered to clients within your domain. It is designed to provide accuracy in the microsecond to millisecond range with hardware available in the mid 1990s.
O
OpenLDAP
OpenLDAP is an open source implementation of the Lightweight Directory Access Protocol (LDAP). The suite includes: slapd - stand-alone LDAP server slurpd - stand-alone LDAP replication server Libraries implementing the LDAP protocol Utilities, tools, and sample clients.
Open Source
Open Source is a certification mark owned by the Open Source Initiative (OSI). Developers of software that is intended to be freely shared and possibly improved and redistributed by others can use the Open Source trademark if their distribution terms conform to the OSI's Open Source Definition. To summarize, the Definition model of distribution terms require that:
G-37
The software being distributed must be redistributed to anyone else without any restriction The source code must be made available (so that the receiving party will be able to improve or modify it) The license can require improved versions of the software to carry a different name or version from the original software
P
packet
A packet is the unit of data that is routed between an origin and a destination on the Internet or any other packet-switched network. Packet and datagram are similar in meaning. A protocol similar to TCP, the User Datagram Protocol (UDP), uses the term datagram. Related Link(s): http://www.whatis.com/packet.htm
patch
A patch is file that collects changes to other files. It is often used to make small modifications to a source code tree. The Linux kernel is available in patches that allow you to upgrade from one version to the next.
PCMCIA (Personal Computer Memory Card International Association)

An international trade association and the standards they have developed for devices that can be plugged into notebook computers such as modems and external hard disk drives. A PCMCIA card is about the size of a credit card. Since 1995 these
cards have also been known as PC Cards. It is possible to plug PCMCIA cards into desktop computers, communication racks and other equipment when fitted with a suitable chassis. TurboLinux automatically detects PCMCIA cards as they are swapped in and out of a running system.
persistency
Within the context of clustering, persistency allows a client to always connect to the same server within the cluster. While it usually does not matter which server a client access within the cluster, some application services maintain state on the server. In order for the client to be able to access the cluster multiple times, the cluster must maintain that state between connections. The way this is done is by flagging the particular service and ensuring that clients will always connect with the same sever in the cluster.
ping
The ping program is a basic Internet utility that lets you verify that a particular Internet address exists and can accept requests. Ping is used diagnostically to ensure that a host computer you are trying to reach is actually operating. If, for example, a user can't ping a host, then the user will be unable to use the File Transfer Protocol (FTP) to send files to that host. Ping can also be used with a host that is operating to see how long it takes to get a response back. Using ping, you can learn the number form of the IP address from the symbolic domain name.
G-39
Loosely, ping means to get the attention of or to check for the presence of another party online. Ping operates by sending a packet to a designated address and waiting for a response. Related Link(s): http://www.FreeSoft.org/CIE/Topics/53.htm
PostgreSQL
PostgreSQL is a sophisticated Object-Relational DBMS, supporting almost all SQL constructs, including subselects, transactions, and user-defined types and functions. It is the most advanced open-source database available anywhere. Related Link(s): http://www.postgresql.org/
PPP (Point to Point Protocol)

The Internet standard for transmitting network layer datagrams (e.g. IP packets) over serial point-to-point links. If you are using a modem then PPP is the only protocol you should consider. Once PPP is running the only difference between it and a TCP/IP link over ethernet is speed. TurboPPPCfg allows simple configuration of PPP modem links.
PPTP
A protocol or set of communication rules called Point-to-Point Tunneling Protocol (PPTP) has been proposed that would make it possible to create a virtual private network through tunnels over the Internet. This would mean that companies would no longer need their own leased lines for wide-area communication but could securely use the public networks. See also Tunneling. PPTP, sponsored by Microsoft and other companies, and Layer 2 Forwarding, proposed by Cisco Systems, are among the main
proposals for a new Internet Engineering Task Force (IETF) standard. With PPTP, which is an extension of the Internet's Pointto-Point Protocol (PPP), any user of a PC with PPP client support will be able to use an independent service provider (ISP) to connect securely to a server elsewhere in the user's company. Related Link(s): http://www.protocols.com//pbook/ppp2.htm#PPTP
primary ATM
The ATM that is currently in charge of routing traffic. See Advanced Traffic Manager and backup ATM.
protocol
A set of formal rules describing how to transmit data, especially across a network. Low level protocols define the electrical and physical standards to be observed, bit- and byte-ordering and the transmission and error detection and correction of the bit stream. High level protocols deal with the data formatting, including the syntax of messages, the terminal to computer dialogue, character sets, sequencing of messages etc. Many protocols are defined by RFCs or by OSI. See also Handshaking. Related Link(s): http://www.protocols.com/pbook/index.htm http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?protocol
G-41
protocol layering
Protocol layering is a common technique used to simplify networking designs by dividing them into functional layers, and assigning protocols to perform each layer's task. For example, it is common to separate the functions of data delivery and connection management into separate layers, and therefore separate protocols. Protocol layering produces simple protocols, each with a few well-defined tasks. These protocols can then be assembled into a useful whole. Individual protocols can also be removed or replaced as needed for particular applications.
proxy
An intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients. Requests are serviced internally or by passing them, with possible translation, on to other servers. A proxy must interpret and, if necessary, rewrite a request message before forwarding it. Proxies are often used as client-side portals through network firewalls and as helper applications for handling requests via protocols not implemented by the user agent.
R
RADIUS
RADIUS (Remote Authentication Dial-In User Service) is a client/server protocol and software that enables remote access servers to communicate with a central server to authenticate dialin users and authorize their access to the requested system or
G-42
service. RADIUS allows a company to maintain user profiles in a central database that all remote servers can share. It provides better security, allowing a company to set up a policy that can be applied at a single administered network point. Having a central service also means that it's easier to track usage for billing and for keeping network statistics. Created by Livingston (now owned by Lucent), RADIUS is a de facto industry standard used by Ascend and other network product companies and is a proposed IETF standard. Related Link(s): http://www.livingston.com/marketing/whitepapers/ radius_paper.html
RAID
RAID (redundant array of independent disks) is a way of storing the same data in different places (thus, redundantly) on multiple hard disks. By placing data on multiple disks, I/O operations can overlap in a balanced way, improving performance. Since multiple disks increases the mean time between failure (MTBF), storing data redundantly also increases fault-tolerance. A RAID appears to the operating system to be a single logical hard disk. RAID employs the technique of striping, which involves partitioning each drive's storage space into units ranging from a sector (512 bytes) up to several megabytes. The stripes of all the disks are interleaved and addressed in order. In a single-user system where large records, such as medical or other scientific images, are stored, the stripes are typically set up to be small (perhaps 512 bytes) so that a single record spans all
G-43
disks and can be accessed quickly by reading all disks at the same time. In a multiuser system, better performance requires establishing a stripe wide enough to hold the typical or maximum size record. This allows overlapped disk I/O across drives. There are several levels and types of RAID. Related Link(s): http://www.whatis.com/raid.htm http://www.uni-mainz.de/~neuffer/scsi/what_is_raid.html http://linas.org/linux/raid.html http://linas.org/linux/Software-RAID/Software-RAID.html
rcp
rcp stands for remote copy and allows you to transfer files to and from another system over the network. It works like a copy command, where you specify a source and a destination, except that the source or destination of the copy can be another system. Unlike FTP, it is totally non-interactive and does not require you to log in or specify a password for the other system. It can also copy multiple files and recursively copy entire directory trees. The other system must be running a remote shell daemon (rshd) that supports rcp.
rexecd
rexec stands for remote exec daemon. It is a program that services the rexec command and originated on Unix systems. It listens for connections coming from an rexec command (over TCP/IP) and when it receives a connection, it validates access, then executes the specified program. Unlike the remote shell
G-44
daemon, the rexec daemon requires that the client specify a valid password before access is granted, so it is more secure than rshd.
RFC
The Internet Request For Comments (RFC) documents are the written definitions of the protocols and policies of the Internet. You can view the RFC documents at one of the following web sites. Related Link(s): http://www.cis.ohio-state.edu/hypertext/information/rfc.html http://www.ietf.org/rfc.html http://www.rfc-editor.org
router
A router is a device that chooses different paths for network packets, based on the addressing of the IP frame it is handling. Different routes connect to different networks. The router will have more than one address as each route is part of a different network. A router creates or maintains a table of the available routes and their conditions and uses this information along with distance and cost algorithms to determine the best route for a given packet. Typically, a packet may travel through a number of network points with routers before arriving at its destination. A router does not propagate Ethernet broadcasts, because the router is a Network Level device, and Ethernet is a Data Link Level protocol. Therefore, an Internet host must use its routing protocols to select an appropriate router, that can be reached via Ethernet ARPs (Address Resolution Protocol). After ARPing for the IP address of the router, the packet (targeted at some other
G-45
Destination Address) is transmitted to the Ethernet address of the router.
routing
Routing is a method of path selection. Routing assumes that addresses have been assigned to facilitate data delivery. In particular, routing assumes that addresses convey at least partial information about where an Internet host is located. This permits routers to forward packets without having to rely either on broadcasting or a complete listing of all possible destinations. At the IP level, routing is used almost exclusively, primarily because the Internet was designed to construct large networks in which heavy broadcasting or huge routing tables are infeasible. Routing can be static or dynamic. Static routing is performed using a pre-configured routing table which remains in effect indefinitely, unless it is changed manually by the user. This is the most basic form of routing, and it usually requires that all machines have statically configured addresses, and definitely requires that all machines remain on their respective networks. Otherwise, the user must manually alter the routing tables on one or more machines to reflect the change in network topology or addressing. Usually at least one static entry exists for the network interface, and is normally created automatically when the interface is configured. Dynamic routing uses special routing information protocols to automatically update the routing table with routes known by peer routers. These protocols are grouped according to whether they are Interior Gateway Protocols (IGPs) or Exterior Gateway Protocols. Interior gateway protocols are used to distribute
routing information inside of an Autonomous System (AS). An AS is a set of routers inside the domain administered by one authority. See RFC 1716 for more information on IP router operations. Related Link(s): http://www.ietf.org/rfc/rfc1716.txt?number=1716
RPM
RPM is the RPM Package Manager. It is an open packaging system available for anyone to use. It allows users to take source code for new software and package it into source and binary form such that binaries can be easily installed and tracked and source can be rebuilt easily. It also maintains a database of all packages and their files that can be used for verifying packages and querying for information about files and/or packages. RPM is quite flexible and easy to use, though it provides the base for a very extensive system. It is also completely open and available, though we would appreciate bug reports and fixes. Permission is granted to use and distribute RPM royalty free under the GPL. Related Link(s): http://www.rpm.org/
rsh
rsh stands for remote shell and allows you to execute a noninteractive program on another system. The remote program's standard output and standard error output will be shown on your screen. The other system must be running a remote shell daemon (rshd) to handle the incoming rsh command. The rsh command does not require you to enter a password for the other system.
rshd
rshd stands for remote shell daemon. It is a program that services the rsh command and originated on Unix systems. It listens for connections coming from an rsh command (over TCP/IP) and when it receives a connection, it validates access, then executes the specified program. The remote shell daemon also handles servicing the rcp command. The remote shell daemon does not require the client to supply a password; it grants or denies access based on host equivalence; that is, a user on one system is equivalent to a user on another system and no password is necessary. Because of this, the remote shell daemon should only be used on networks where users are generally trusted and convenience is more important than security.
S
Samba
Samba is an open source software suite that provides seamless file and print services to Server Message Block (SMB)/CIFS clients. Samba is freely available under the GNU General Public License. The source code is available to the public with versions available for free UNIX ports such as Linux and FreeBSD as well as for commercial ports such for Solaris and HP-UX UNIX systems. The Samba suite of programs gives a TurboLinux system the ability to speak the Server Message Block (SMB) protocol. SMB is the protocol used to implement file sharing and printer services between computers running OS/2, Windows NT, Windows 95, and Windows for Workgroups. Recent benchmark tests results
G-48
show Samba substantially outperforms standard Windows NT based systems. The stable, most recent version as of March, 1999, is 2.0.3. Related Link(s): http://us4.samba.org/samba/samba.html
SAN
See Storage Area Network.
SCSI (Small Computer Systems Interface)

SCSI stands for Small Computer Systems Interface. It's a standard for connecting peripherals to your computer via a standard hardware interface, which uses standard SCSI commands. The SCSI standard can be divided into SCSI (SCSI1) and SCSI2 (SCSI wide and SCSI wide and fast) and now SCSI-3 which is made up of at least 14 separate standards documents. SCSI2 is the most popular version of the SCSI command specification and allows for scanners, hard disk drives, CD-ROM players, tapes and many other devices. SCSI-3 resolves many long time gray areas and adds much new functionality and performance improvements. It also adds new types of SCSI busses like fibre channel which uses a 4 pin copper connection or a pair of glass fibre optic cables instead of the familiar ribbon cable connection. SCSI can connect up to seven devices to a single controller (or host adaptor) on the computer's bus, using a cable that goes from one device to the next. The cable can be up to 6 meters
G-49
long. A common problem with SCSI hardware is incorrect termination. Related Link(s): http://www.scsifaq.org/scsifaq.html
server
In general, a server is a computer program that provides services to other computer programs in the same or other computers. The computer that a server program runs in is also frequently referred to as a server (though it may contain a number of server and client programs). In the client/server programming model, a server is a program that awaits and fulfills requests from client programs in the same or other computers. A given application in a computer may function as a client with requests for services from other programs and a server of requests from other programs. Specific to the Web, a Web server is the computer program (housed in a computer) that serves requested HTML pages or files.
shared storage
Within the context of clustering, shared storage means shared resources; RAID (redundant array of independent disks) is, for example, an example of shared storage. Shared-storage clustering allows for a shorter failover time--on the order of 5 to 15 seconds. The primary disadvantage of sharedstorage clustering is that the two computers must be physically next to each other. They must be within the maximum distance allowed by SCSI.
G-50
shell
A shell is the outermost part of an operating system that interacts with user commands. The shell is the layer of programming that understands and executes the commands a user enters. In some systems, the shell is called a command interpreter. A shell usually implies an interface with a command syntax (think of the DOS operating system and its C:\> prompts and user commands such as dir and edit). As the outer layer of an operating system, a shell can be contrasted with the kernel, the operating system's inmost layer or core of services. Kernel and shell are terms used more frequently in UNIX. All shells provide for piping and redirection of information 'streams' as well as 'glob' expansions (file wildcards) and running a utility program (command). Each shell also has its own syntax. You can type echo $SHELL to find out what shell youre using. The shells that are normally available for Linux/Unix are: bash Bourne' again shell, the most frequent shell on linux. It makes a history of the commands and enables you to edit them. Berkeley C-shell; it does not enable you to edit the command line. Korn shell (improved Bourne); It is a well-known shell under UNIX(tm) systems. Almquist 'lite' shell
csh ksh
ash
G-51
zsh tcsh
Z-shell (kitchen sink version) is the most recent. An improved version of C-shell
Related Link(s): http://www.linux.com/tuneup/database.phtml/Shells/ http://developer.ecorp.net/Unix_and_Linux/Shell_Scripts/
single point of failure

A single point of failure is a single element of hardware or software, which, if it fails, brings down the entire computer system. Single points of failure are to be avoided at all costs when high availability is desired. See also high availability.
SMB (Server Message Block)

The Server Message Block protocol was invented by Xerox, worked on by 3Com, and eventually taken over by Microsoft, and is now best known as Microsoft Windows Networking. SMB is a protocol for sharing files, printers, serial ports, and communications abstractions such as named pipes and mail slots between computers. SMB is a client server, request-response protocol, the native file-sharing protocol in the Microsoft Windows 95, Windows NT, and OS/2 operating systems. CIFS is an enhanced version of SMB. TurboLinux includes client (smbmount and smbfs) and server (samba) implementations of SMB which allow TurboLinux to participate as a first-class Windows citizen. The Microsoft Windows machines never know the difference as TurboLinux machines access and share files and printers using these protocols. The
TurboFSCfg tool allows smbmount connections to be made in this way. The link below takes you to a site with more information about CIFS, since CIFS is an enhanced version of SMB. Related Link(s): http://msdn.microsoft.com/workshop/networking/cifs/default.asp
SMP
Symmetric Multiprocessors (SMP) provides fast performance by making multiple CPUs available to complete individual processes simultaneously (multiprocessing). Unlike asymmetrical processing, any idle processor can be assigned any task, and additional CPUs can be added to improve performance and handle increased loads. A variety of specialized operating systems and hardware arrangements are available to support SMP. Specific applications can benefit from SMP if the code allows multithreading. SMP uses a single operating system and shares common memory and disk input/output resources. UNIX, Linux, and Windows NT support SMP. Related Link(s): http://www.linuxdoc.org/HOWTO/SMP-HOWTO.html
SpeedLink
The heart of TurboLinux Cluster Server. It wedges into the kernels TCP/IP stack, and looks at every packet coming into the system and determines if the packet is destined for the cluster. If the destination IP address is the same as the virtual IP address of
G-53
the cluster and the port number is one that the cluster has registered, then the packet is immediately forwarded to one of the cluster nodes. The module maintains several tables that it uses to make the determination of which packets to send to which cluster node. Most of these tables can be accessed via the /proc/net/cluster directory. The speed of Cluster Server is due to the low level at which SpeedLink intercepts incoming packets.
Squid
Squid is a proxy caching server for web clients, supporting FTP, gopher, and HTTP data objects. Squid keeps meta data and especially hot objects cached in RAM, caches DNS lookups, supports non-blocking DNS lookups, and implements negative caching of failed requests. Squid supports SSL, extensive access controls, and full request logging. By using the lightweight Internet Cache Protocol, Squid caches can be hierarchically linked to other Squid-based proxy servers for streamlined caching of pages. Squid consists of a main server program squid, a Domain Name System (DNS) lookup program (dnsserver), some optional programs for rewriting requests and performing authentication, and some management and client tools. Related Link(s): http://www.squid-cache.org/Doc/FAQ/FAQ-1.html#ss1.1
SSH
SSH (Secure Shell) is a program to log into another computer over a network, to execute commands in a remote machine, and to move files from one machine to another. It provides strong
G-54
authentication and secure communications over insecure channels. It is intended as a replacement for rlogin, rsh, and rcp. SSH protects the user from illicit network snooping (packet sniffing), whereby unencrypted passwords and text can be read by unscrupulous persons. SSH is most useful for logging into a UNIX computer from an Windows computer or from another UNIX computer, where the traditional 'telnet' and 'rlogin' programs would not provide password and session encryption. SSH contains a suite of three utilities: slogin, ssh, and scp. These utilities are secure versions of the earlier UNIX utilities, rlogin, rsh, and rcp. OpenSSH is a FREE version of the SSH suite of network connectivity tools and also provides a myriad of secure tunnelling capabilities. Related Link(s): http://www.ssh.org/index.html http://www.openssh.com/
SSL
Secure Sockets Layer (SSL) protocol is a security protocol that provides communications privacy over the Internet. The SSL program layer was created by Netscape for managing the security of message transmissions in a network. The protocol allows client/server applications to communicate in a way that is designed to prevent eavesdropping, tampering, or message forgery. The Secure Sockets Layer protocol layer may be placed between a reliable connection-oriented network layer protocol (e.g.
G-55
TCP/IP) and the application protocol layer (e.g. HTTP). SSL provides for secure communication between client and server by allowing mutual authentication, the use of digital signatures for integrity, and encryption for privacy. The protocol is designed to support a range of choices for specific algorithms used for cryptography, digests, and signatures. This allows algorithm selection for specific servers to be made based on legal, export or other concerns, and also enables the protocol to take advantage of new algorithms. Choices are negotiated between client and server at the start of establishing a protocol session. Related Link(s): http://www.freesoft.org/CIE/Topics/121.htm http://sitesearch.netscape.com/products/security/technology/ index.html http://www.modssl.org/
stateless
An application or service that does not maintain state. That is, it can process requests individually, without any dependence on previous requests. TurboLinux Cluster Server works best with stateless services. If the service is not stateless, you should set the persistence (sticky) flag so that client requests from a single client will always be routed to the same server. Stateless and stateful are adjectives that describe whether a computer or computer program is designed to note and remember one or more preceding events in a given sequence of interactions with a user, another computer or program, a device, or other outside element.
G-56
Stateful means the computer or program keeps track of the state of interaction, usually by setting values in a storage field designated for that purpose. Stateless means there is no record of previous interactions and each interaction request has to be handled based entirely on information that comes with it. Stateful and stateless are derived from the usage of state as a set of conditions at a moment in time.
Storage Area Network (SAN)

A Storage Area Network (SAN) is a high-speed network (or subnetwork) that interconnects different kinds of data storage devices with associated data servers on behalf of a larger network of users. Typically, a storage area network is part of the overall network of computing resources for an enterprise. A storage area network is usually clustered in close proximity to other computing resources such as IBM S/390 mainframes, but may also extend to remote locations for backup and archival storage, using wide area network carrier technologies such as ATM (asynchronous transfer mode) or SONET (Synchronous Optical Network). SAN is based on a systematic approach to data storage management pioneered by IBM in the S/390 environment almost 30 years ago. Now SANs are rapidly being integrated into distributed network environments using Fibre Channel technology. SANs support disk mirroring, backup and restore, archival and retrieval of archived data, data migration from one storage device to another, and the sharing of data among different servers in a
G-57
network. SANs can incorporate subnetworks with networkattached storage (NAS) systems. Related Link(s): http://www.storage.ibm.com/ibmsan/basics.htm
superuser
A superuser has advanced privileges on a system and access to anything any other system user has. Typically, the system administrator has superuser privileges and can create new accounts, change passwords, and perform other administrative tasks.
T
TCP
TCP (Transmission Control Protocol) is a method (protocol) used along with the Internet Protocol (IP) to send data in the form of message units between computers over the Internet. While IP takes care of handling the actual delivery of the data, TCP takes care of keeping track of the individual units of data (called packets) that a message is divided into for efficient routing through the Internet. Related Link(s): http://www.whatis.com/tcp.htm
TCP/IP
TCP/IP is a suite of protocols that defines the format of data packets sent across the Internet, and is the communications standard for data transmission between different platforms.
G-58
The TCP/IP family consists of many protocols on different layers. The lowest level is the IP protocol, which establishes the means by which hosts can contact each other. Above IP are the UDP and TCP protocols. UDP allows for connectionless communication. TCP creates a connection between two systems. Higher level application protocols include email (SMTP, POP, IMAP), file transfer (FTP), remote login (Telnet) and web access (HTTP).
TFTP
Trivial File Transfer Protocol (TFTP) is a very simple protocol used to transfer files. Each non-terminal packet is acknowledged separately. It has been implemented on top of the Internet User Datagram protocol (UDP), so it may be used to move files between machines on different networks implementing UDP. TFTP can also be implemented on top of other datagram protocols. It is designed to be small and easy to implement. Therefore, it lacks most of the features of a regular FTP. The only thing it can do is read and write files (or mail) from/to a remote server. It cannot list directories, and currently has no provisions for user authentication. In common with other Internet protocols, it passes 8 bit bytes of data. Related Link(s): http://www.ietf.org/rfc/rfc1350.txt?number=1350
TCP wrappers
TCP wrappers are used to restrict access to network services. They handle the access management controls for running the target
G-59
server programs. TCP wrappers are implemented by inserting the tcpd program in the inetd.conf file for each service for which you want to enable access restrictions. When tcpd starts up, it reads the service permission file /etc/hosts.allow and the deny-permission file /etc/hosts.deny. Related Link(s): ftp://ftp.porcupine.org/pub/security/tcp_wrappers_7.6.BLURB
tunneling, tunnel
Relative to the Internet, tunneling is using the Internet as part of a private secure network. The "tunnel" is the particular path that a given company message or file might travel through the Internet. A tunnel is also an intermediary program which acts as a blind relay between two connections. Once active, a tunnel is not considered a party to the HTTP communication, though the tunnel may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connections are closed. Tunnels are used when a portal is necessary and the intermediary cannot, or should not, interpret the relayed communication. Related Link(s): http://linas.org/linux/iptunnel.html
U
UID
The UID is the user ID number or username of the person who owns a process.
G-60
UNIX
An interactive, time-sharing operating system invented in 1969 by Ken Thompson after Bell Labs left the Multics project, originally so he could play games on his scavenged computer. Dennis Ritchie, the inventor of C, is considered a coauthor of the system. The turning point in Unix's history came when it was reimplemented almost entirely in C during 1972 - 1974, making it the first source-portable OS. Unix subsequently underwent mutations and expansions at the hands of many different people, resulting in a uniquely flexible and developer-friendly environment. By 1991, Unix had become the most widely used multi-user general-purpose operating system in the world. Unix is now offered by many manufacturers and is the subject of an international standardization effort with the Unix trademark being owned by X/Open. Unix or Unix-like operating systems include OSF, Version 7, BSD, USG Unix, Xenix, Ultrix, Linux, and GNU. "UNIX" is a trademark, and it is a name and not an acronym. "Unix" is often used interchangeably. Since the OS is case-sensitive and exists in many different versions, it is fitting that its name should reflect this. Related Link(s): http://www.msoe.edu/~taylor/4ltrwrd/Uoverview.html http://wwwhost.cc.utexas.edu/cc/services/unix/index.html
G-61
V
virtual IP address
The IP address of a cluster. It is virtual because it represents a logical entity instead of a physical node.
virtual server
Another name for a cluster. It acts as one server, but is really made up of several cluster nodes working as one.
VPN
Virtual Private Networks (VPN) typically use the Internet as the transport backbone to establish secure links with business partners, extend communications to regional and isolated offices, and significantly decrease the cost of communications for an increasingly mobile workforce. VPNs serve as private network overlays on public IP network infrastructures such as the Internet. Related Link(s): http://www.wolfenet.com/~jhardin/ip_masq_vpn.html http://www.internetwk.com/VPN/default.html
W
WAN
A Wide Area Network (WAN) is used to connect together LANs that are geographically placed far apart. There are many
G-62
technologies capable of performing this task; ATM (asynchronous transfer mode) is one of these.
web server
The web server is a program offered in a hypertext environment. The program that serves requested HTML pages or files. A Web client is the requesting program associated with the user. The Web browser in your computer is a client that requests HTML files from Web servers. A web server is also known as the WWW (World Wide Web) server and the HTTP server. In the case of HTTP, the name derives from the home pages or web pages that files are written in the HTML language; the protocol used to create and edit files. If you want to obtain web pages, the web server program is an essential program for the server side.
weighting
Method by which you can assign more work to one system than another. For example, suppose system A is assigned a weight of 1, system B has a weight of 2, and system C has a weight of 3. The total of all the weights is 6. So system A would get 1/6 of the work, system B would get 2/6, and system C would get 3/6 of the work.
G-63
G-64
INDEX
Symbols
/etc/clusterserver 6-2, 8-12, 8-14 /etc/hosts 4-9 /etc/hosts.allow 1-10, 4-30, 7-28 /etc/hosts.deny 1-10, 4-30 /etc/services 4-14, 8-8 /proc 5-5, 7-22, 8-15 /proc/net/cluster 7-14, 7-16, 7-18, 7-22, 7-29, 8-3, 8-4, 8-7, 8-15, 8-16 /var/log 7-18
7-2, 7-3, 7-10, 7-12, 7-13, 7-16, 7-18, 7-20, 7-22, 7-24, 7-25, 7-29, 7-30, 8-7, 8-9, 8-10, 8-14, 8-16 AtmPool 6-10, 7-4 Availability 1-4, 1-17, 2-10, 2-11, 4-2
B
Backup 1-16, 3-1 Backup ATM 1-14, 3-2, 4-3, 4-4, 4-18, 4-23, 4-25, 4-34, 6-11, 7-20, 7-21, 7-29, 7-30, 8-7, 8-8, 8-15 Bandwidth 1-16, 2-19 Beowulf 1-3, 2-6, 2-9 Broadcast 4-25, 4-26, 5-4, 5-10, 6-11, 7-2, 7-20, 8-7, 8-8 Bugs Reporting 7-14
A
Access Restricting 1-10, 4-30, 4-31, 6-3, 8-8 AddAtm 6-10 AddAtmPool 6-12 AddServer 1-12, 6-1, 6-9 AddServerPool 6-12 Administrator 1-3, 6-12, 8-4 Advanced Traffic Manager (ATM) 2-7, 4-23, 4-24, 4-25 AFS 2-18, 2-19 Agent 1-4, 4-10, 4-12 Alias 5-3, 5-4, 5-16, 7-15, 7-18, 7-20, 7-21, 7-22, 7-30, 8-4, 8-7 AllowHost 6-3 Andrew File System (AFS) 2-18 Apache 1-15, 6-2 Application 1-4, 1-16, 2-7 Application Stability Agent (ASA) 1-8, 1-9, 4-10, 4-11, 4-12, 6-5, 6-6, 8-9, 8-10 ARP 4-25, 5-4, 5-5, 5-10, 5-16, 6-11, 7-21, 8-16 ASA 3-10, 4-10, 4-11, 4-12, 4-13, 4-14, 4-22, 6-5, 6-6, 7-2, 7-19, 7-21, 7-29, 8-7, 8-9, 8-10 ATM 1-14, 1-15, 1-17, 2-7, 2-13, 2-14, 4-2, 4-3, 4-4, 4-18, 4-19, 4-20, 4-21, 4-23, 4-24, 4-25, 4-28, 4-30, 4-31, 4-32, 4-33, 4-34, 4-36, 5-3, 6-4, 6-10, 6-11, 6-12,
C
ccNUMA 2-5 Certificate 3-11, 3-17, 6-2, 7-12, 7-13, 8-14 Check 6-5 CheckPortFrequency 6-9, 7-5, 8-9 CheckPortTimeout 6-9, 8-9 CheckServerFrequency 6-9, 7-5 CheckServerTimeout 6-9 Client 1-2, 1-4, 2-6, 2-12, 2-13, 2-14, 2-20, 4-15, 4-19, 4-20, 4-33, 4-34, 7-3, 7-18, 7-24, 7-29, 8-3 Client/server 2-6 Cluster Management Console (CMC) 1-6, 1-11, 1-12, 3-10, 7-12, 8-14 Cluster manager 1-4, 2-7, 2-8, 2-10, 2-12, 2-13, 2-16 Cluster node 1-4, 1-5, 1-9, 1-10, 1-14, 1-16, 2-6, 2-7, 2-8, 2-12, 2-13, 2-14, 2-15, 2-16, 3-2, 4-2, 4-3, 4-4, 4-9, 4-10, 4-16, 4-17, 4-19, 4-20, 4-24, 4-33, 4-34, 5-1, 5-3, 5-6, 5-9, 5-11, 5-14, 5-16, 6-6, 6-8, 6-12, 7-3, 7-6, 7-18, 7-21, 7-22, 7-24, 7-26, 7-29, 8-3, 8-7, 8-9, 8-12
I-1
clusterserver.conf 1-8, 1-12, 6-2, 7-14, 7-20 clusterserveradm 8-8 clusterserverd 1-8, 6-3, 7-15, 7-19, 7-25, 7-30, 8-6, 8-7, 8-8, 8-15, 8-16 clusterserverd.log 1-8, 6-12, 7-19, 7-20, 8-10 CMC 1-8, 1-11, 3-11, 3-17, 4-32, 6-2, 6-3, 7-12, 7-13, 7-15, 7-16, 7-19, 7-22, 7-28, 7-29, 7-30, 8-8, 8-11, 8-14, 8-15, 8-16 cmc 8-14, 8-16 cmc.log 7-19 cmcd 8-14, 8-16 Coda 2-18, 2-19 Configuration 1-9, 1-10, 1-11, 2-15, 2-16, 2-17, 3-2, 3-11, 4-1, 4-6, 4-7, 4-8, 4-9, 4-10, 4-16, 4-17, 4-19, 4-26, 4-36, 5-1, 5-2, 5-16, 6-1, 7-2, 7-6, 7-9, 7-10, 7-14, 7-16, 7-22, 7-28, 8-16 Configuration file 1-11, 1-12, 4-6, 4-9, 4-23, 4-37, 6-1, 6-2, 6-8, 6-11, 7-3, 7-5, 7-9, 7-11, 7-14, 7-16, 7-20, 7-25, 8-7, 8-9, 8-10, 8-15, 8-16 Connection 1-2, 2-14, 2-15, 4-15, 4-26, 4-33, 4-34, 6-2, 6-7, 6-10, 6-11, 7-3, 7-4, 7-16, 7-23, 7-24, 7-25, 7-27, 7-28, 7-29, 7-30, 8-3, 8-16 ConnectionTimeout 6-10, 6-11, 8-3
Direct forwarding 1-8, 2-13, 2-14, 4-18, 4-19, 4-20, 5-1, 5-10, 5-15, 5-16, 7-21, 8-7 Distributed file system 2-18, 2-19, 8-13 Distributed File System (DFS) 2-18 Distributed processing 2-3, 2-6 Distribution 1-6, 1-7, 1-15, 3-3, 3-5, 3-6, 3-16, 3-17, 5-3, 8-4 DNS 1-5, 1-17, 4-14 Documentation 1-11, 3-7, 4-6, 4-7, 7-12, 7-14, 7-28 Down 6-5, 6-6, 8-10
E
Email 1-2, 1-5, 7-14 EnFuzion 1-3, 2-7, 2-9, 2-10 Enterprise 1-2, 1-9, 2-20
F
Fail-over 1-2, 1-8, 1-9, 2-9, 2-10, 2-11, 4-13, 4-15, 6-7, 8-1, 8-10 Fibre-channel 2-19, 2-20, 2-21 Firewall 1-8, 2-14 Flexibility 1-7, 1-17, 4-1, 4-2, 5-1 FTP 1-2, 1-5, 2-15, 7-3, 7-4, 7-7
G
Gateway 6-4 Default 1-9, 2-15, 4-34, 4-36, 4-37, 5-1, 5-8, 5-10, 5-13, 5-15, 6-4 GFS 2-18, 2-19 Global File System (GFS) 2-19 Global settings 4-30, 4-31, 6-3
D
Daemon 1-11, 3-2, 3-10, 3-11, 4-32, 5-3, 6-3, 6-8, 6-11, 7-6, 7-14, 7-18, 7-19, 7-20, 7-21, 7-22, 7-30, 8-1, 8-6, 8-7, 8-8, 8-14, 8-15, 8-16 Database 1-5, 1-9, 2-17, 2-18, 2-20 DB2 1-9 Default gateway 1-9, 2-15, 4-34, 4-36, 4-37, 5-1, 5-8, 5-10, 5-13, 5-15, 6-4 DenyHost 6-3 DFS 2-18, 2-19 DHCP 1-17, 4-37
H
Hardware 1-15, 1-16, 2-2, 2-11, 2-19, 3-17, 4-3, 8-13 Heartbeat 4-25, 4-26, 6-11, 7-2, 8-7, 8-8 HeartbeatDelay 6-10, 6-11 Heterogeneous 2-2 High availability 1-2, 1-17, 2-9, 2-10, 2-11, 8-1 Homogeneous 2-2 hosts.allow 1-10, 4-30, 7-28
I-2
hosts.deny 1-10, 4-30 HTTP 1-5, 4-12, 4-14, 4-15, 6-6, 7-3, 7-4, 7-7, 7-12, 7-24, 8-14 HTTPS 1-5, 4-15, 7-12
I
iBCS 3-9 ifconfig 5-4, 5-5, 5-6, 5-16, 7-15, 7-30 IMAP 1-5 insmod 5-6 Installation 1-7, 3-1, 3-2, 3-3, 3-5, 3-6, 3-7, 3-9, 3-11, 3-13, 3-14, 3-15, 3-16, 3-17 Intermezzo 2-18, 2-19 Internet 1-2, 1-8, 1-9, 1-16, 1-17, 2-14, 4-33 Internet Service Providers 1-2, 1-17 IP address 1-10, 1-17, 2-14, 2-15, 4-2, 4-8, 4-17, 4-24, 4-25, 4-27, 4-28, 4-31, 4-34, 4-36, 4-37, 5-1, 5-4, 5-8, 5-10, 5-13, 5-15, 5-16, 6-3, 6-4, 6-5, 6-6, 6-8, 6-10, 6-11, 612, 7-20, 7-21, 7-24, 7-25, 7-26, 8-7, 8-9 ip_cs 7-15, 7-20, 7-25, 8-2, 8-3, 8-4, 8-6, 8-7, 8-10 IP-IP 2-13, 4-20, 5-6, 7-21, 8-4 ipip 5-6
LDAP 1-5, 2-17, 2-18 Licensing 1-6, 1-12 Lightweight Directory Access Protocol (LDAP) 2-17 LILO 3-7, 3-12, 3-13, 3-14 Linux 1-3, 1-4, 1-6, 1-7, 1-14, 1-15, 2-4, 2-9, 2-13, 2-14, 2-18, 2-19, 3-3, 3-5, 3-15, 3-16, 3-17, 4-20, 5-1, 5-3, 5-4, 5-6, 8-2 Load balancing 1-2, 1-9, 2-9, 2-10, 2-11, 4-13, 4-15, 6-7, 8-1 localhost 4-9, 4-32, 8-8 Log file 1-11, 1-12, 1-15, 7-14, 7-16, 7-18, 7-19, 7-30, 8-6, 8-10, 8-15 Loopback interface 5-5, 5-16, 7-21, 7-30 lsmod 7-15, 8-6
M
MAC address 2-13, 5-4 MailTo 6-12 man page 1-11, 7-14, 8-11 Masquerading 2-14 MaxLostHeartbeats 6-10, 6-11 MPI 2-6 MPP 2-3, 2-5, 2-6
N
NAS 2-19, 2-20 NAT 1-8, 1-9, 1-15, 2-13, 2-14, 2-15, 4-18, 4-19, 4-20, 4-33, 4-34, 4-36, 4-37, 5-1, 5-10, 5-15, 5-16, 6-3, 6-4, 7-3, 7-21, 7-24, 7-25, 7-30, 8-2, 8-7 NAT gateway 2-15, 4-35, 4-36, 6-4, 7-21, 7-25 NAT subnet 4-33, 4-34, 4-35, 6-4, 7-24, 7-25 NetBEUI 5-12 Network Address Translation (NAT) 1-8, 2-14, 4-20, 4-33 Network Attached Storage (NAS) 2-19, 2-20 Network card 1-15, 4-33, 4-34, 5-4 NetworkMask 6-4 News 1-2, 1-5 NFS 1-16, 2-18, 3-2
J
Java 7-16
K
Kerberos 2-18 Kernel 1-4, 2-13, 2-19, 3-6, 3-7, 3-8, 3-9, 3-10, 3-12, 3-13, 3-14, 4-26, 4-34, 5-3, 5-5, 5-6, 7-2, 7-3, 7-14, 7-15, 7-19, 7-21, 7-22, 7-25, 8-1, 8-2, 8-4, 8-6 Compiling 8-4 Custom 3-17, 8-4 Kernel module 2-13, 8-2
L
LAN 1-17, 4-19
I-3
NNTP 1-5 noping 6-8, 7-21 NUMA 2-3, 2-4, 2-5 NumConnections 6-10, 6-11, 7-4 NumServers 6-10, 6-11, 7-4 NumServices 6-10, 6-11, 7-4
O
Operating system 1-6, 1-7, 1-14, 1-15, 1-16, 2-2, 2-4, 4-19 Oracle 1-9
REGEDIT 5-8, 5-13 Registration 1-13, 1-16 Registry 5-8, 5-9, 5-13, 5-14 Reliability 1-1, 1-2, 4-2 Requirements 1-14, 1-16, 7-28 RFC 1631 1-9, 2-14, 7-24 Round-robin 2-12, 8-3 Router 1-15, 1-17, 4-34, 4-36 rsync 8-13
S
SAN 2-19, 2-20, 2-21 Scalability 1-2, 2-19, 4-2 scp 8-12 SCSI 2-19, 2-21 Secure Shell 1-10, 7-28, 8-12 Security 1-6, 1-9, 1-10, 2-15, 2-18, 4-30, 4-31, 6-3, 8-8, 8-12, 8-14 SendArpDelay 6-10, 6-11 Serial number 0-ix, 1-13 Server 6-8 Server group 4-20, 4-21, 4-22, 4-29, 6-8 Server pool 4-20, 4-21, 4-22, 4-25, 4-29, 6-7, 6-8, 6-12 ServerPool 6-8, 7-5 Servers 6-8, 7-26 Service 1-5, 6-6 Services 4-10, 6-5, 6-6, 6-9 Shared data storage 2-17, 2-19 Shared memory 2-5 Single point of failure 1-4 SMP 2-3, 2-4 SMTP 1-5 Source code 8-4, 8-5 SpeedLink 6-12, 7-3, 7-19, 7-22, 7-23, 7-25, 8-2, 8-4, 8-5, 8-7, 8-16 SQL 2-18 SSH 1-10, 1-11, 7-6, 7-8, 7-9, 7-11, 7-16, 7-28, 7-29, 8-10, 8-12 sshd 7-28 SSL 2-12, 3-11, 3-17, 4-15, 6-2, 7-12, 7-19, 8-14 Sticky 2-12, 4-13, 6-6, 7-27 Storage Area Network (SAN) 2-19, 2-20 Subnet 1-16, 1-17, 2-13, 4-19, 4-31, 4-32, 4-33, 4-34, 5-4, 5-8, 5-9, 5-13,
P
Parallel processing 2-2, 2-3 Patch 8-2, 8-4, 8-5 PCMCIA 3-9 Performance 1-1, 1-2, 1-4, 1-9, 1-11, 1-15, 1-16, 1-17, 2-14, 2-15, 2-16, 2-21, 4-3, 4-22, 7-1, 7-2, 7-12, 8-14, 8-16 Persistency 2-12, 4-13, 4-15, 6-7, 7-27 Point-to-point 2-13, 2-14, 4-19 POP 1-5 Port number 1-12, 2-15, 4-12, 4-13, 4-14, 6-1, 6-5, 6-8, 7-24, 7-26, 7-27, 8-8, 8-9 Primary ATM 1-14, 3-2, 4-2, 4-3, 4-4, 4-18, 4-19, 4-23, 4-25, 4-26, 4-33, 4-34, 5-3, 5-4, 6-4, 6-11, 7-12, 7-15, 7-20, 7-21, 7-22, 7-26, 7-29, 7-30, 8-7, 8-8, 8-9, 8-15 Private network 1-15, 2-14 ps 7-15, 8-14 PVM 2-6
Q
Quality-of-service 1-2
R
RAID 1-2, 1-16 Reboot 3-2, 3-13, 3-14, 5-8, 5-9, 8-6 Red Hat Linux 1-6, 1-7, 1-14, 3-3, 3-16, 5-3 Redundancy 1-2, 1-4, 1-17, 2-10, 2-17 I-4 TurboLinux Cluster Server 6 User Guide
5-14, 6-3, 6-4, 7-18, 7-21, 7-25, 8-8 Synchronization 1-10, 2-16, 2-17, 2-18, 3-2, 4-6, 5-3, 7-6, 7-8, 7-9, 7-10, 7-11, 7-16, 7-28, 7-29, 8-12, 8-13, 8-17
UNIX 1-3, 1-14, 2-13, 2-18, 2-20, 3-9, 4-20, 5-3, 5-4, 5-6, 5-16, 7-16 Up 6-5, 6-6 UPS 1-16 Uptime 1-2, 1-16, 2-11 UserCheck 6-5, 6-6, 8-9, 8-10
T
TCP 6-5, 8-9, 8-14, 8-15 TCP wrappers 1-10, 4-30 TCP/IP 1-2, 1-3, 1-4, 1-5, 8-2 TCSWAT 1-8, 1-11 Telnet 7-4 Threads 2-4 tl_sync 1-8 tlcs_config_sync 1-8, 4-6, 4-9, 7-6, 7-9, 7-16, 8-12 tlcs_content_sync 1-8, 4-6, 7-6, 7-7, 7-28 tlcsadmin 7-13, 7-15, 8-14 tlcsconfig 4-6, 4-7, 4-8, 4-10, 4-12, 4-16, 4-20, 4-27, 4-30, 6-1, 6-10, 7-2, 7-3, 7-5, 8-3, 8-9, 8-10 TLCS-GenCert 3-17 TLCS-install 3-3, 3-15, 3-16 Traffic management 2-12 Traffic manager 1-4, 1-14, 1-15, 4-19 Traffic Monitor 1-11, 7-16 Transarc 2-18 Troubleshooting 1-3, 1-6, 1-12, 3-15, 4-2, 7-1, 7-18, 8-1, 8-7 Tunneling 1-8, 1-17, 2-13, 4-18, 4-19, 4-20, 5-1, 5-6, 7-21, 7-30, 8-4, 8-7 turbocluster.conf 1-8 turbocluster_sync 1-8 turboclusteradmin 2-17, 3-14, 4-6, 4-7, 6-1, 6-3, 6-5, 7-2, 7-6, 7-9, 7-16, 7-25, 7-28 turboclusterd 1-8 turboclusterd.log 1-8 TurboLinux Server 1-6, 1-7, 1-14, 1-15, 3-3, 3-16, 5-3
V
VAX 2-2 Virtual connection 2-13 Virtual IP address 1-17, 4-27, 4-28, 4-29, 4-34, 4-35, 5-2, 5-3, 5-4, 5-6, 5-8, 5-13, 5-14, 5-16, 6-12, 7-12, 7-21, 7-24, 7-26, 7-27, 7-29, 7-30, 8-3 Virtual Private Network 2-14 Virtual server 1-4, 4-27, 4-28, 4-29 VirtualHost 6-10, 6-12 VMS 2-2 VPN 2-14
W
Web 1-2, 1-5, 1-11, 1-12, 1-15 Web server 1-5, 1-15, 2-20, 4-2, 4-14 Web site 1-17, 6-6, 8-5 Weight 4-22, 7-26, 8-3 Windows 2000 1-14, 5-11, 5-12, 5-13, 5-14, 5-16, 7-28 Windows NT 1-14, 2-4, 5-7, 5-9, 5-10, 5-16, 7-6, 7-28
U
UDP 6-5, 8-8, 8-9 UMA 2-4
I-5
I-6

Turbo Linux Cluster Server Users Guide

Transféré par

Informations du document

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

Turbo Linux Cluster Server Users Guide

Transféré par

Droits d'auteur :

Formats disponibles

019.453.655.

744.335.695.7 451.009.658.007 122.342.981.161 549.326.784.677 738.309.304.390

547.383.211.231 679.345.245.667 767.234.679.565

C HAPTER 1 I NTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1

Why Use Cluster Server? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4

Whats New In This Release . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6

TurboLinux Cluster Server 6 User Guide

1-11 1-11 1-12 1-12 1-12 1-13

C HAPTER 2 C LUSTERING C ONCEPTS . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1

Components of a Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7

Types of Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9

How a Cluster Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12

Cluster Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16 Shared Data Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17

TurboLinux Cluster Server 6 User Guide

2-19 2-20 2-20 2-21

C HAPTER 3 I NSTALLATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1

C HAPTER 4 C ONFIGURATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1

Configuration Tool Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6

Advanced Traffic Managers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-23

TurboLinux Cluster Server 6 User Guide

Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-27 Global Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-30

C HAPTER 5 C ONFIGURING C LUSTER N ODES . . . . . . . . . . . . . . . . . . . . . . 5-1

C HAPTER 6 C ONFIGURATION F ILE . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1

Servers and ServerPool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8

C HAPTER 7 ADMINISTRATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1

TurboLinux Cluster Server 6 User Guide

Synchronization Tools. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-6

Cluster Management Console (CMC) . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12 Troubleshooting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-18

C HAPTER 8 C LUSTER S ERVER A RCHITECTURE . . . . . . . . . . . . . . . . . . . . . 8-1

TurboLinux Cluster Server 6 User Guide

TurboLinux Cluster Server 6 User Guide

TurboLinux Cluster Server 6 User Guide

TurboLinux Cluster Server Contents

TurboLinux Cluster Server 6 User Guide

TurboLinux Cluster Server 6 User Guide

TurboLinux Cluster Server 6 User Guide

TurboLinux Cluster Server 6 User Guide

TurboLinux Cluster Server 6 User Guide

TurboLinux Cluster Server 6 User Guide

What Is Cluster Server?

What Is Cluster Server?

TurboLinux Cluster Server 6 User Guide

Why Use Cluster Server?

TurboLinux Cluster Server 6 User Guide

Why Use Cluster Server?

What Services Can Be Clustered?

TurboLinux Cluster Server 6 User Guide

Whats New In This Release

TurboLinux Cluster Server 6 User Guide

Whats New In This Release

Runs on Red Hat or TurboLinux

TurboLinux Cluster Server 6 User Guide

TURBOLINUX CLUSTER SERVER 6 NAME

Whats New In This Release

Stateless Fail-over Support

Delay Settings Separated

More Application Stability Agents

TurboLinux Cluster Server 6 User Guide

TurboLinux Cluster Server 6 User Guide

Whats New In This Release

Cluster Management Console

Configuration File Format

TurboLinux Cluster Server 6 User Guide

Whats New In This Release