1. Defect Prevention
Root-Cause Based Defect Prevention Activities:
Human Misconception or Lack of Knowledge - (root cause)
Education/training - (prevention)
Education/Training
People are the most important factor in software development and support, so educating and training them helps prevent errors. Education and training should cover:
- Domain-specific and product-specific knowledge
- Software development and support knowledge
- Software process knowledge
- Software methodology, technology, and tool knowledge
Improved software tools, such as a development platform that provides code frameworks and a dynamic syntax checker/editor, also help prevent errors.
2. Defect Reduction
No error prevention activity is 100% effective, so the faults that slip through must be removed to reduce defects. The two main fault removal activities are:
- Inspection/Review
- Testing
Inspections
Less formal inspections include walkthroughs and reviews. The most successful is the formal Fagan Inspection methodology, which consists of multiple steps conducted by a team of inspectors:
- Identification and assignment of the moderator and inspectors
- Preparation for the inspection by the moderator and by the inspectors (their preparations differ)
- Actual reading/inspection of the material by the inspectors, possibly with the author in attendance
- Recording by the moderator of the faults found, the severity assigned, and the fix target and schedule
- Tracking of the fault fixes and completion of the inspection report by the moderator
Testing
Testing involves:
- Development of test cases
- Execution of the test cases
- Observing the software behavior
- Recording and reporting the results
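The four testing activities above can be sketched in a few lines of Python. This is only an illustration: the function under test (`absolute_value`), the test cases, and the `CaseResult` record are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


def absolute_value(x):
    """Hypothetical function under test (assumption, for illustration only)."""
    return x if x >= 0 else -x


@dataclass
class CaseResult:
    name: str
    expected: object
    observed: object
    passed: bool


# Development of test cases: (name, input, expected output)
TEST_CASES = [
    ("positive", 5, 5),
    ("negative", -3, 3),
    ("zero", 0, 0),
]


def run_tests(func, cases):
    """Execute each test case and observe the resulting behavior."""
    results = []
    for name, arg, expected in cases:
        try:
            observed = func(arg)
        except Exception as exc:
            # A crash is also an observed behavior; keep it for the report.
            observed = exc
        results.append(CaseResult(name, expected, observed, observed == expected))
    return results


def report(results):
    """Record and report the results as a one-line summary."""
    failed = [r.name for r in results if not r.passed]
    return f"{len(results) - len(failed)}/{len(results)} passed; failed: {failed}"
```

Calling `report(run_tests(absolute_value, TEST_CASES))` gives a summary that can be logged or attached to a defect report.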
My additions
Testing Completion:
- Based on coverage of test cases and test scenarios
- Based on pre-set reliability or quality goals
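Either stopping criterion can be expressed as a simple check. The sketch below is a minimal illustration; the function names, the 95% coverage goal, and the failure-rate goal of 0.01 failures per hour are assumptions chosen for the example, not values from these notes.

```python
def coverage_reached(executed, planned, coverage_goal=0.95):
    """Coverage-based criterion: enough of the planned test cases were run.

    The 0.95 default is an assumed goal for illustration.
    """
    if planned == 0:
        return False
    return executed / planned >= coverage_goal


def reliability_reached(failures, hours, max_failure_rate=0.01):
    """Reliability-based criterion: observed failure rate is within the goal.

    The 0.01 failures/hour default is an assumed goal for illustration.
    """
    if hours <= 0:
        return False
    return failures / hours <= max_failure_rate
```

A project would pick one criterion, or require both, before declaring testing complete.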
3. Defect Containment
In some systems, such as those in the medical, nuclear, or aerospace industries, even a small number of defects that escape prevention and reduction can be extremely harmful. We need to contain the resulting failures by reducing the damage they cause.
Fault Tolerance
Fault tolerance originated in hardware systems with spare parts and back-up capabilities. In software we use the same concept:
- Recovery blocks: a section of the code or software is re-executed after a failure occurs and after some parameters are reset.
- N-Version Programming: several programs with the same functionality are executed in parallel, so that the failure of one program can be localized and possibly bypassed.
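Both techniques can be sketched in Python. This is a simplified illustration under stated assumptions: the function names are hypothetical, the recovery block follows the classical form with a checkpoint, alternates, and an acceptance test, and the N versions are run one after another rather than truly in parallel, with a majority vote over their (hashable) outputs.

```python
import copy
from collections import Counter


def recovery_block(state, alternates, acceptance_test):
    """Try each alternate on a fresh copy of the checkpointed state.

    The state is reset before every (re-)execution; the first result
    that passes the acceptance test is returned.
    """
    checkpoint = copy.deepcopy(state)
    for attempt in alternates:
        working = copy.deepcopy(checkpoint)  # reset parameters
        try:
            result = attempt(working)
        except Exception:
            continue  # failure detected: fall through to the next alternate
        if acceptance_test(result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def n_version(versions, arg):
    """Run independently written versions and return the majority output.

    Versions run sequentially here for simplicity (assumption); a version
    that crashes or disagrees is outvoted and thereby bypassed.
    """
    outputs = []
    for version in versions:
        try:
            outputs.append(version(arg))
        except Exception:
            pass  # a crashed version contributes no vote
    if not outputs:
        raise RuntimeError("no version produced a result")
    value, votes = Counter(outputs).most_common(1)[0]
    if votes <= len(versions) // 2:
        raise RuntimeError("no majority agreement among versions")
    return value
```

For example, `n_version([v1, v2, v3], x)` still returns the correct value when one of three versions is faulty, as long as the other two agree.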
Safety Critical
Safety is a key characteristic of medical, nuclear, industrial, transportation and many other types of systems. Thus we view it from a containment perspective also:
After Hazard Prevention and Hazard Reduction of errors and faults, we need to be able to separate out or lock out the defective parts at containment time. Hazard Control and Damage Control are post-accident activities that prevent the damage from spreading further and causing more than the original harm.
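The idea of locking out a defective part can be illustrated with a small registry that refuses further calls to a component once it has failed. This is a hypothetical sketch only; `LockoutRegistry`, `ComponentLockedOut`, and the component names are invented for this example and do not come from any safety standard or library.

```python
class ComponentLockedOut(Exception):
    """Raised when a locked-out component is invoked (hypothetical name)."""


class LockoutRegistry:
    """Tracks components that have been isolated after a failure."""

    def __init__(self):
        self._locked = set()

    def is_available(self, component):
        return component not in self._locked

    def lock_out(self, component):
        self._locked.add(component)

    def call(self, component, func, *args):
        """Run func on behalf of component; lock the component out on failure."""
        if component in self._locked:
            raise ComponentLockedOut(component)
        try:
            return func(*args)
        except Exception:
            # Containment: isolate the defective part, then propagate the
            # failure so the caller can start damage control.
            self._locked.add(component)
            raise
```

After the first failure of, say, a `"pump"` component, every later `call("pump", ...)` is rejected immediately, while other components keep working.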