
GEC Engineering Training 2010

for
Advanced SCADA engineering

Redundant system
using HAC

Venue: YEF-NLS
Date: Mar. 1, 2010 (Mon) - Mar. 5, 2010 (Fri)
Conducted by: YHQ GEC ETC Support
Copyright Yokogawa Electric Corporation


Intro
Requirements
How it works
Configuration pitfalls


Intro, what is a HAC system

HAC stands for High Availability Computing
Software redundant FAST/TOOLS system
Keeps databases of backup system up to date
Switchover by starting FAST/TOOLS on backup system
Can be used on top of fault tolerant hardware


Intro, fault tolerant hardware

Fault tolerant hardware:
Protects against single hardware failures
No protection against software failures
No protection against database corruption
Nearly no takeover time
Master and backup system very close together (same building)
Upgrading Windows or FAST/TOOLS requires a system down period


Intro, HAC system

A HAC system:
Protects against single hardware failures
Medium protection against software failures
Some protection against database corruption
Takeover time in the order of seconds (after error detection)
Master and backup system can be in different buildings/cities
Upgrading Windows or patching FAST/TOOLS mostly needs only a switchover


Requirements

Requirements for a HAC system are:
Similar operating system, e.g. both Windows
Reliable connection between the systems (direct cable or redundant network)
  Preferably use a separate network card for the inter-server communication
Enough bandwidth on this connection
Equipment reachable from both systems:
  Use e.g. TCP/IP to serial converters to connect serial lines


How it works
[Diagram: System 1 and System 2]

Explanation of normal operation:

1. Boot system 1 and start FAST/TOOLS
   Will become online
2. Boot system 2 and start FAST/TOOLS
   Will start BUS/FAST and some HAC software
   Will detect that other system is online
   Will synchronize all its databases
3. When synchronization is ready
   System 2 will become standby
   All database changes will be copied immediately
4. When system 1 disappears
   System 2 will start the remaining FAST/TOOLS and will become online
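
The four steps above can be read as a small state machine per server. The sketch below is only an illustration of that reading, not HAC code: the mode names and the function next_mode are assumptions made up for this example.

```python
from enum import Enum


class Mode(Enum):
    # Mode names are illustrative labels, not HAC identifiers.
    OFFLINE = "offline"
    SYNCHRONIZING = "synchronizing"
    STANDBY = "standby"
    ONLINE = "online"


def next_mode(mode: Mode, partner_online: bool, sync_done: bool = False) -> Mode:
    """Return the next mode of one server, following the four steps above."""
    if mode is Mode.OFFLINE:
        # Steps 1 and 2: the first system goes online, the second synchronizes.
        return Mode.SYNCHRONIZING if partner_online else Mode.ONLINE
    if mode is Mode.SYNCHRONIZING:
        if not partner_online:
            # Partner lost while synchronizing: this system has to take over.
            return Mode.ONLINE
        # Step 3: become standby once all databases are synchronized.
        return Mode.STANDBY if sync_done else Mode.SYNCHRONIZING
    if mode is Mode.STANDBY and not partner_online:
        # Step 4: start the rest of FAST/TOOLS and become online.
        return Mode.ONLINE
    return mode
```

Calling next_mode(Mode.STANDBY, partner_online=False) gives Mode.ONLINE, which corresponds to step 4.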


How it works

HAC contains the following functions:
File copy function
File caching
Monitor own system
Report to and monitor other system
User interface
Debug logging


How it works, file copy

File copy function:
Needed to keep backup disk up to date
Done by HACMIR process
Directories to be copied can be selected
Possible to exclude files
In standby mode:
  For ISAM database files: each changed record is sent immediately to backup
  For other files: complete file is copied after the file is closed
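
The selection rules above (selected directories, excluded files, record-level copy for ISAM files, whole-file copy for the rest) can be sketched as a small classifier. This is not how HACMIR is implemented. The directory names, exclude patterns and ISAM file suffixes below are hypothetical values chosen for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical configuration values, for illustration only.
MIRROR_DIRS = ["dat", "his"]
EXCLUDE_PATTERNS = ["*.tmp", "*.bak"]
ISAM_SUFFIXES = (".dat", ".idx")  # assumed ISAM file extensions


def copy_strategy(directory: str, filename: str) -> str:
    """Classify a changed file as 'skip', 'record' or 'whole-file'."""
    if directory not in MIRROR_DIRS:
        return "skip"  # directory is not selected for mirroring
    lowered = filename.lower()
    if any(fnmatchcase(lowered, pattern) for pattern in EXCLUDE_PATTERNS):
        return "skip"  # file is explicitly excluded
    if lowered.endswith(ISAM_SUFFIXES):
        return "record"  # send each changed record immediately
    return "whole-file"  # copy the complete file after it is closed
```

With these assumed settings, copy_strategy("dat", "item.dat") returns "record" and copy_strategy("dat", "notes.txt") returns "whole-file".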


How it works, file copy

During synchronizing:
  First a .zip of all related files on the backup system is created
  Then per file the difference between the systems is checked:
    Missing and different files are copied on bucket basis
    Key file is re-created on backup system
When last file is synchronized:
  Mode changes to standby
  .zip files are removed
When primary system fails during synchronization:
  .zip files are restored (will take extra time)
  Rest of FAST/TOOLS is started
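
As a rough illustration of copying "on bucket basis", the sketch below compares two versions of a file in fixed-size buckets and lists the buckets that would have to be sent. The deck does not describe how HACMIR detects differences, so the hashing approach, the bucket size and the function names are all assumptions. A backup file that is longer than the primary one is not handled.

```python
import hashlib
from typing import List, Optional

BUCKET_SIZE = 4096  # assumed bucket size in bytes, not taken from HAC


def bucket_digests(data: bytes, size: int = BUCKET_SIZE) -> List[str]:
    """Split data into fixed-size buckets and hash each bucket."""
    return [
        hashlib.sha256(data[offset:offset + size]).hexdigest()
        for offset in range(0, len(data), size)
    ]


def buckets_to_copy(
    primary: bytes, backup: Optional[bytes], size: int = BUCKET_SIZE
) -> List[int]:
    """Return the indexes of the buckets that must be sent to the backup."""
    source = bucket_digests(primary, size)
    if backup is None:
        # File is missing on the backup system: copy every bucket.
        return list(range(len(source)))
    target = bucket_digests(backup, size)
    return [
        index
        for index, digest in enumerate(source)
        if index >= len(target) or digest != target[index]
    ]
```

For two 20-byte files that differ only in their second half, buckets_to_copy(primary, backup, size=10) returns [1].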

How it works, file caching

To speed up the starting of FAST/TOOLS:
On standby system every 5 minutes all important files are read into cache (since 9.01)
Done by HACCACHE process
Simply reads the files specified in its set-up file from the first to the last byte
When the system has little memory available the first files read will be pushed out of cache again
  Specify most important files at the end of the set-up file!
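
The caching idea is simple enough to sketch: reading a file from start to end pulls it into the operating system's file cache. The function below is a minimal illustration, not the HACCACHE implementation. The name warm_cache and the chunk size are assumptions. Files are read in the order given, so the most important ones go last, as the slide advises.

```python
from pathlib import Path
from typing import Iterable


def warm_cache(paths: Iterable[str], chunk_size: int = 1 << 20) -> int:
    """Read each file from first to last byte; return total bytes read."""
    total = 0
    for name in paths:
        path = Path(name)
        if not path.is_file():
            continue  # skip missing entries instead of aborting the pass
        with path.open("rb") as handle:
            # Reading is the whole point: the data itself is discarded.
            while chunk := handle.read(chunk_size):
                total += len(chunk)
    return total
```

A caller would repeat this pass on a timer (every 5 minutes according to the slide) while the system is in standby.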


How it works, monitor own system

Monitor own system
Done by process HACREC
Monitors:
  Presence of processes
  CPU load of processes
  Queue size of processes
  Responsiveness of some specific processes
On detection of a defect FAST/TOOLS health will be considered BAD
During FAST/TOOLS startup checks are not done for a period of time
  Allows for peaks and bursts during initialization

How it works, monitor own system

Presence of processes
The list of running processes is compared to the list of processes which should run at standby or active state
Regularly checks if all processes in current mode are running
When a process is missing, FAST/TOOLS health is considered BAD
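
The presence check amounts to a set difference between the expected and the running processes. The sketch below shows that idea only. The process names are borrowed from this deck, but which process belongs to which mode is an assumption made for the example.

```python
from typing import Dict, Iterable, Set

# Hypothetical expected-process lists per mode (grouping is assumed).
EXPECTED: Dict[str, Set[str]] = {
    "standby": {"HACMIR", "HACREC", "HACWDG", "HACCACHE"},
    "active": {"HACMIR", "HACREC", "HACWDG", "ITM", "ALM", "HIS"},
}


def missing_processes(mode: str, running: Iterable[str]) -> Set[str]:
    """Return the processes that should run in this mode but do not."""
    return EXPECTED[mode] - set(running)


def presence_health(mode: str, running: Iterable[str]) -> str:
    """Health is BAD as soon as one expected process is missing."""
    return "BAD" if missing_processes(mode, running) else "OK"
```

For example, an active system without HIS in its process list is reported as BAD, which is the situation the exercise at the end of this course provokes.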


How it works, monitor own system

CPU load of processes
Per process a CPU load limit and a maximum time during which the CPU load is allowed to be above that limit can be configured
When the CPU load is higher than the limit for longer than that time period, FAST/TOOLS health is considered BAD


How it works, monitor own system

Queue usage of processes
Per process a queue filling limit and a maximum time during which the queue filling is allowed to be above that limit can be configured
When the queue filling is higher than the limit for longer than that time period, FAST/TOOLS health is considered BAD
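
The CPU load check and the queue filling check share one pattern: a limit plus a maximum time above that limit. The class below is a minimal sketch of that pattern under the assumption that samples arrive with a timestamp in seconds. The class name and its parameters are illustrative and do not correspond to HACREC settings.

```python
from typing import Optional


class SustainedLimit:
    """Reports BAD only when a value stays above its limit for too long."""

    def __init__(self, limit: float, max_seconds: float) -> None:
        self.limit = limit
        self.max_seconds = max_seconds
        self._above_since: Optional[float] = None

    def update(self, value: float, now: float) -> str:
        """Feed one sample (CPU load or queue filling) taken at time now."""
        if value <= self.limit:
            self._above_since = None  # back under the limit: reset the timer
            return "OK"
        if self._above_since is None:
            self._above_since = now  # first sample above the limit
        exceeded = now - self._above_since > self.max_seconds
        return "BAD" if exceeded else "OK"
```

A short peak therefore stays OK, and only a load that remains above the limit longer than max_seconds turns the health BAD.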


How it works, monitor own system

Responsiveness of processes
Some processes have a built-in "I am alive" mechanism
When the alive flag is missing during the configured time period, FAST/TOOLS health is BAD
The configuration of this mechanism for a process which does not support it will be ignored
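
The "I am alive" check is a classic heartbeat timeout. The sketch below illustrates it with plain dictionaries. The function name, the data layout and the example process names are assumptions. Configuration entries for processes outside the supported list are ignored, as described above.

```python
from typing import Dict, Iterable, Set


def unresponsive(
    last_alive: Dict[str, float],
    timeouts: Dict[str, float],
    supported: Iterable[str],
    now: float,
) -> Set[str]:
    """Return the processes whose alive flag is older than their timeout."""
    late: Set[str] = set()
    for name in supported:
        if name not in timeouts:
            continue  # no responsiveness check configured for this process
        seen = last_alive.get(name)
        if seen is None or now - seen > timeouts[name]:
            late.add(name)
    return late
```

Any non-empty result would mean FAST/TOOLS health BAD in the terms of this slide.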


How it works, monitor own system

In a redundant system, FAST/TOOLS health BAD will always cause a switchover
  Configured actions are ignored
If not running in a HAC configuration, HACREC can be used to recover
  Action to take on FAST/TOOLS health BAD can be specified:
    ALERT     Print only alert message on UMH
    PROCESS   Restart the process
    TOOL      Restart the tool
    REBOOT    Re-boot the system
    SHUTDOWN  Shut down the system
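
The rule above (switchover always wins in a redundant system, otherwise the configured action applies) can be written down in a few lines. This is an illustrative sketch only. The function name and the "SWITCHOVER" return value are assumptions, while the five action names come from the list above.

```python
ACTIONS = {"ALERT", "PROCESS", "TOOL", "REBOOT", "SHUTDOWN"}


def action_on_bad_health(redundant: bool, configured: str) -> str:
    """Pick the reaction to FAST/TOOLS health BAD."""
    if redundant:
        # In a HAC configuration the configured action is ignored.
        return "SWITCHOVER"
    if configured not in ACTIONS:
        raise ValueError(f"unknown action: {configured}")
    return configured
```

So action_on_bad_health(True, "ALERT") yields "SWITCHOVER", while the same call with redundant=False yields "ALERT".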


How it works, report to and monitor other system

Reporting own status to partner system and examining the status of the partner system is done by HACWDG
  Switchover will be activated when own status is BAD and other system is standby and OK
  When partner system is not OK, a BAD system will continue running with its available functionality
  By default a failing system will re-boot
HACWDG also distributes the system status to the HAC HMIs
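
The switchover condition described above is a three-way AND. The sketch below spells it out. The function names and the string values for mode, health and decision are assumptions for this example and are not HACWDG identifiers.

```python
def should_switch_over(own_health: str, partner_mode: str, partner_health: str) -> bool:
    """True only when this system is BAD and the partner is a healthy standby."""
    return (
        own_health == "BAD"
        and partner_mode == "standby"
        and partner_health == "OK"
    )


def watchdog_decision(own_health: str, partner_mode: str, partner_health: str) -> str:
    """Summarize what the online system does with its own health status."""
    if own_health != "BAD":
        return "keep-running"
    if should_switch_over(own_health, partner_mode, partner_health):
        return "switchover"
    # Partner cannot take over: keep the remaining functionality available.
    return "continue-degraded"
```

The last branch reflects the slide's point that a BAD system keeps running when its partner is not OK.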

How it works, user interface

The HAC HMI process can be used to monitor and control the HAC functions
  Monitor current mode, status and health of both servers
  Manually change mode of the servers
  View monitor logging output
Can run on any of the servers and on all connected workstations and front-end systems
  Can also run on non-Windows systems


How it works, debug logging

The following debug logging can be generated:
File transfer logging
  All traffic between the HACMIR processes is logged
  Each system has its own logging
  Files tls\lst\hacmir*.log contain this logging
Monitor logging
  All important changes and detections are logged
  Each system has its own logging
  Files tls\lst\hac_log_*.txt contain this logging
  File tls\lst\hac_log_0.txt contains the number of the current logging file
State change logging
  System startup etc. is logged in tls\lst\hacw_*.log

Configuration pitfalls

CPU load of processes
Temporarily high CPU load for a process can be normal, e.g. for:
  RPTGEN during report generation
  UMH/UMHLOG/LGHUMH during error burst
  OPCEXE when EQP connections go up and down
  ITM on debugger request "save all items to disk"


Configuration pitfalls

Queue usage of processes
Temporary message queuing for a process can be normal, e.g. for:
  ALM during alarm burst
  LGHUMH during error burst
  OPCEXE when EQP connections go up and down


Configuration pitfalls

Responsiveness of processes
Some processes set this alive trigger only once per minute (e.g. ALM)
Some processes can have unexpectedly long processing periods (e.g. ITM on debugger request "save all items to disk")


Configuration pitfalls

FAST/TOOLS startup limit
When FAST/TOOLS does not start up within 5 minutes the system will be stopped
  This parameter is not configurable
A FAST/TOOLS startup time of 5.1 minutes will result in a forever switching system!
  So optimize FAST/TOOLS startup if long start-up times are expected
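
The arithmetic behind the "forever switching" warning is worth making explicit: if neither system can finish starting within the fixed limit, every switchover ends in another stop. The helper below is a small sketch for checking a measured startup time against that limit. The function names are assumptions. The 5 minute value is the one stated above.

```python
STARTUP_LIMIT_SECONDS = 5 * 60  # fixed limit stated above, not configurable


def startup_margin(startup_seconds: float) -> float:
    """Seconds left before the startup limit is hit (negative means too slow)."""
    return STARTUP_LIMIT_SECONDS - startup_seconds


def will_switch_forever(startup_seconds: float) -> bool:
    """True when startup takes longer than the limit on every attempt."""
    return startup_margin(startup_seconds) < 0
```

A measured startup of 5.1 minutes (306 seconds) gives a negative margin, so will_switch_forever(306) is True, while a 4 minute startup leaves a 60 second margin.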


Configuration aspects

Key file repair time during synchronize
Repairing a big key file (GB) can take minutes
During this time the slave system will not respond
The file will be synchronized correctly, however the master system can get a timeout, causing synchronization to be restarted
Increase the data to master parameter when this happens too often
  hac.sup, tab mirror (general)


Configuration pitfalls

Backup during synchronization
Creating a .zip file of the his directory can take a lot of time and/or the .zip file becomes bigger than the limit of 2GB
It is possible to back up only the *000.* files or *000000.* files
  Long term history should usually be configured to be stored on a separate disk anyway
Be sure to change both scripts hacm_create_backup and hacm_restore_backup
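
Before editing the two scripts it can help to estimate what a narrower file pattern would select and whether the result stays under the 2GB limit. The sketch below does that for a dictionary of file sizes. It does not touch hacm_create_backup or hacm_restore_backup, and the file names used in the usage note are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

ZIP_LIMIT_BYTES = 2 * 1024**3  # the 2GB limit mentioned above


def select_for_backup(sizes: Dict[str, int], pattern: str = "*000.*") -> List[str]:
    """Return the file names matching the backup pattern, sorted by name."""
    return sorted(name for name in sizes if fnmatchcase(name, pattern))


def fits_in_zip(sizes: Dict[str, int], names: Iterable[str]) -> bool:
    """Worst-case check that assumes the .zip does not compress at all."""
    return sum(sizes[name] for name in names) < ZIP_LIMIT_BYTES
```

With hypothetical files hist000.dat, hist001.dat and grp000000.dat, the pattern "*000.*" selects hist000.dat and grp000000.dat, while "*000000.*" selects only grp000000.dat.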


Configuration pitfalls

Known problems
Do not use * definitions in process name specifications
  This can result in BAD health ghost messages
Do not define mirror directories more than once
  Exclude file commands for directories do not work
HACMIR for Windows is case insensitive while web HMI browser is case sensitive
  HACMIR can change the case of a display name when it is copied to the backup system (to lower case)
  When web HMI client is not running on the server it needs the browser and cannot find the display
  Recommend using a separate HMI server for HAC configurations
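
One way to spot displays that are at risk from the case change described above is to list every display name that would change, or collide with another name, once it is lower-cased. The sketch below is only such a check. The function name and the display file names in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List


def case_risks(display_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group display names that change or collide when lower-cased."""
    groups: Dict[str, List[str]] = {}
    for name in display_names:
        groups.setdefault(name.lower(), []).append(name)
    return {
        lowered: names
        for lowered, names in groups.items()
        if len(names) > 1 or names[0] != lowered
    }
```

For the names Overview.xml, pump_1.xml and Pump_1.xml the result flags overview.xml (the name would change) and pump_1.xml (two names collapse into one).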

Exercise

Exercise: Create a HAC configuration
  VMware system HAC1 contains a filled FAST/TOOLS system, must become the primary system
    IP address = 192.168.0.x
  VMware system HAC2 contains an empty FAST/TOOLS system, must become the secondary system
    IP address = 192.168.0.y
  When running in redundant mode kill process HIS on the master: Switchover should occur


Thank you for your attention

Commitment means building the future to last