Software reliability
probability that a software component will produce an incorrect output
software does not wear out
software can continue to operate after a bad result
Operator reliability
probability that the system user makes an error
Failure Probabilities
If a system has two independent components, A and B, and its operation depends on both, the system fails when either one fails: P(S) = P(A) + P(B) - P(A)P(B), which is approximately P(A) + P(B) when both failure probabilities are small
If a component is replicated n times and the system fails only when all replicas fail at once, the probability of failure is P(S) = P(A)^n
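The two cases above can be checked numerically. The following is a minimal sketch that assumes the component failures are independent; the function names and the example probabilities are illustrative and do not come from the notes. It also shows how close the simple sum P(A) + P(B) is to the exact value when the probabilities are small.

```python
from math import prod


def series_failure_probability(failure_probs):
    """Failure probability of a system that needs every independent component to work."""
    # The system survives only if every component survives.
    return 1.0 - prod(1.0 - p for p in failure_probs)


def replicated_failure_probability(p, n):
    """Failure probability when n independent replicas must all fail at once."""
    return p ** n


# Hypothetical component failure probabilities.
p_a, p_b = 0.001, 0.002

exact = series_failure_probability([p_a, p_b])
approx = p_a + p_b                      # simple sum, valid for small probabilities
replicated = replicated_failure_probability(p_a, 3)

print(f"exact series failure:  {exact:.6f}")
print(f"sum approximation:     {approx:.6f}")
print(f"three replicas of A:   {replicated:.2e}")
```

The sum overestimates the exact value by P(A)P(B), which is negligible here but not when the individual probabilities are large.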
Time Units
Raw Execution Time
for non-stop systems
Calendar Time
for systems with regular usage patterns
Number of Transactions
for demand-type transaction systems
Availability
Measures the fraction of time the system is actually available for use
Takes repair and restart times into account
Relevant for non-stop, continuously running systems (e.g. traffic signal)
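As a rough illustration of availability as a fraction of time, the sketch below divides uptime by the total observation period, counting repair and restart time as downtime. The function name and the 30-day figures are assumptions made for the example, not values from the notes.

```python
def availability(uptime_hours, downtime_hours):
    """Fraction of the observation period in which the system was usable."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observation period must be positive")
    return uptime_hours / total


# Hypothetical 30-day window (720 hours) with 3 hours lost to repair and restart.
observed = availability(uptime_hours=717.0, downtime_hours=3.0)
print(f"availability: {observed:.4f}")
```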
Failure Classification
Transient - only occurs with certain inputs
Permanent - occurs on all inputs
Recoverable - system can recover without operator help
Unrecoverable - operator has to help
Non-corrupting - failure does not corrupt system state or data
Corrupting - system state or data are altered
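One way to record these classes in code is sketched below. It reads the six classes as three independent pairs (occurrence, recovery, effect), which is an interpretation rather than something the notes state, and all type and field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Occurrence(Enum):
    TRANSIENT = "transient"            # only occurs with certain inputs
    PERMANENT = "permanent"            # occurs on all inputs


class Recovery(Enum):
    RECOVERABLE = "recoverable"        # system recovers without operator help
    UNRECOVERABLE = "unrecoverable"    # operator has to help


class Effect(Enum):
    NON_CORRUPTING = "non-corrupting"  # state and data are left intact
    CORRUPTING = "corrupting"          # state or data are altered


@dataclass(frozen=True)
class FailureClass:
    occurrence: Occurrence
    recovery: Recovery
    effect: Effect

    def label(self):
        """Human-readable label built from the three axes."""
        axes = (self.occurrence, self.recovery, self.effect)
        return ", ".join(axis.value for axis in axes)


# Hypothetical classification of a single failure.
example = FailureClass(Occurrence.PERMANENT, Recovery.UNRECOVERABLE, Effect.NON_CORRUPTING)
print(example.label())
```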
Examples
Failure Class | Example | Metric
Permanent, non-corrupting | ATM fails to operate with any card; must restart to correct | ROCOF = 0.0001, time unit = days
... | ... undamaged card | POFOD = 0.0001, time unit = transactions
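The two metrics in the table are simple ratios: ROCOF (rate of occurrence of failures) counts failures per unit of time, and POFOD (probability of failure on demand) counts failed demands per demand. The sketch below shows both calculations; the observation counts are hypothetical and were chosen only so that the results match the 0.0001 values in the table.

```python
def rocof(failure_count, elapsed_time_units):
    """Rate of occurrence of failures: failures per unit of time."""
    if elapsed_time_units <= 0:
        raise ValueError("elapsed time must be positive")
    return failure_count / elapsed_time_units


def pofod(failed_demands, total_demands):
    """Probability of failure on demand: failed demands per demand made."""
    if total_demands <= 0:
        raise ValueError("at least one demand is required")
    return failed_demands / total_demands


# Hypothetical observations.
atm_rocof = rocof(failure_count=1, elapsed_time_units=10_000)     # time unit = days
atm_pofod = pofod(failed_demands=2, total_demands=20_000)         # time unit = transactions

print(f"ROCOF: {atm_rocof} failures per day")
print(f"POFOD: {atm_pofod} failures per transaction")
```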
Specification Validation
It is impossible to empirically validate high reliability specifications
No database corruption really means a POFOD of less than 1 in 200 million
If each transaction takes 1 second to verify, simulation of one day's transactions takes 3.5 days
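The arithmetic behind this claim can be worked through directly. The sketch below derives the daily transaction volume implied by the 3.5-day figure and then estimates how long it would take to verify 200 million transactions at 1 second each. Treating 200 million verified transactions as the minimum needed is an assumption made for illustration; a statistically sound demonstration would need more.

```python
SECONDS_PER_DAY = 24 * 60 * 60

verify_seconds_per_transaction = 1.0   # from the notes
days_to_verify_one_day = 3.5           # from the notes

# Daily volume implied by the two figures above.
transactions_per_day = round(
    days_to_verify_one_day * SECONDS_PER_DAY / verify_seconds_per_transaction
)

# Assumed minimum: verify as many transactions as the "1 in 200 million" target.
target_transactions = 200_000_000
days_needed = target_transactions * verify_seconds_per_transaction / SECONDS_PER_DAY
years_needed = days_needed / 365

print(f"implied transactions per day: {transactions_per_day}")
print(f"verification time: {days_needed:.0f} days (about {years_needed:.1f} years)")
```

Under these assumptions the verification run alone takes more than six years, which is why such a specification cannot be validated by testing.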
Safety Specification
Each safety specification should be specified separately
These requirements should be based on hazard and risk analysis
Safety requirements usually apply to the system as a whole rather than to individual components
System safety is an emergent system property
Safety Processes
Hazard and risk analysis
assess the hazards and risks associated with the system
Safety validation
check overall system safety
Hazard decomposition
seek to discover potential root causes for each hazard
Fault-tree Analysis
Hazard analysis method that starts with an identified fault and works backwards to the cause of the fault
Can be used at all stages of hazard analysis
It is a top-down technique that may be combined with bottom-up hazard analysis techniques, which start with system failures that lead to hazards
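The top-down idea can be sketched as a small tree structure: the identified fault is the root, intermediate events are combined with AND/OR gates, and the leaves are candidate root causes. This is a minimal illustration only; the node type, the gate names, and the example events are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    gate: str = "LEAF"                    # "AND", "OR", or "LEAF"
    children: list = field(default_factory=list)


def root_causes(node):
    """Work backwards from a fault and collect the leaf events beneath it."""
    if node.gate == "LEAF":
        return [node.name]
    causes = []
    for child in node.children:
        causes.extend(root_causes(child))
    return causes


def occurs(node, active_events):
    """Return True if the fault at this node occurs given the active leaf events."""
    if node.gate == "LEAF":
        return node.name in active_events
    results = [occurs(child, active_events) for child in node.children]
    return all(results) if node.gate == "AND" else any(results)


# Hypothetical tree: the top fault occurs if the sensor is wrong,
# or if the timer fails while the watchdog is disabled.
top = Node("incorrect output delivered", "OR", [
    Node("sensor reading wrong"),
    Node("timing failure undetected", "AND", [
        Node("timer fault"),
        Node("watchdog disabled"),
    ]),
])

print(root_causes(top))
print(occurs(top, {"timer fault"}))
print(occurs(top, {"timer fault", "watchdog disabled"}))
```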
Risk Assessment
Assess the hazard severity, hazard probability, and accident probability
Outcome of risk assessment is a statement of acceptability
Intolerable (must never occur)
ALARP (as low as reasonably practical given cost and schedule constraints)
Acceptable (consequences are acceptable and no extra cost should be incurred to reduce the risk further)
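A simple way to picture the three regions is a risk score compared against two boundaries. The sketch below is purely illustrative: the 1 to 5 scales, the product used as a score, and both threshold values are assumptions, not part of the notes or of any standard.

```python
def classify_risk(severity, likelihood, intolerable_from=15, acceptable_up_to=4):
    """Place a hazard in one of the three acceptability regions.

    severity and likelihood are assumed to be ratings from 1 (low) to 5 (high);
    the thresholds are hypothetical boundaries between the regions.
    """
    for rating in (severity, likelihood):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be between 1 and 5")
    score = severity * likelihood
    if score >= intolerable_from:
        return "intolerable"
    if score <= acceptable_up_to:
        return "acceptable"
    return "ALARP"


print(classify_risk(severity=5, likelihood=4))
print(classify_risk(severity=3, likelihood=3))
print(classify_risk(severity=2, likelihood=2))
```

Moving either threshold changes which hazards fall into which region.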
Risk Acceptability
Determined by human, social, and political considerations
In most societies, the boundaries between regions are pushed upwards with time (meaning risk becomes less acceptable)
Risk assessment is always subjective (what is acceptable to one person is ALARP to another)
Risk Reduction
System should be specified so that hazards do not arise or do not result in an accident
Hazard avoidance
system is designed so that the hazard can never arise during normal operation
Damage limitation
system is designed to minimize accident consequences
Security Specification
Similar to safety specification
not possible to specify quantitatively
usually stated in "system shall not" terms rather than "system shall" terms
Differences
no well-defined security life cycle yet
security deals with generic threats rather than system-specific hazards
Threat assignment
identified threats are related to assets so that each asset has a list of associated threats
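Threat assignment amounts to building a mapping from each asset to its threats. A minimal sketch follows; the function name, the assets, and the threats are hypothetical examples.

```python
from collections import defaultdict


def assign_threats(threat_asset_pairs):
    """Group identified threats by the asset they apply to."""
    threats_by_asset = defaultdict(list)
    for threat, asset in threat_asset_pairs:
        # Skip repeats so each threat appears once per asset.
        if threat not in threats_by_asset[asset]:
            threats_by_asset[asset].append(threat)
    return dict(threats_by_asset)


# Hypothetical (threat, asset) pairs from a threat identification step.
identified = [
    ("unauthorized access", "customer database"),
    ("data tampering", "customer database"),
    ("denial of service", "web server"),
    ("unauthorized access", "web server"),
    ("data tampering", "customer database"),
]

for asset, threats in assign_threats(identified).items():
    print(f"{asset}: {', '.join(threats)}")
```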