IEEE Recommended Practice on

Software Reliability

IEEE Reliability Society

Sponsored by the
Standards Committee

IEEE Std 1633™-2016
(Revision of IEEE Std 1633-2008)

IEEE
3 Park Avenue
New York, NY 10016-5997
USA

IEEE Std 1633™-2016
(Revision of IEEE Std 1633-2008)

IEEE Recommended Practice on


Software Reliability

Sponsor

Standards Committee
of the
IEEE Reliability Society

Approved 22 September 2016

IEEE-SA Standards Board

Abstract: The methods for assessing and predicting the reliability of software, based on a life-
cycle approach to software reliability engineering (SRE), are prescribed in this recommended
practice. It provides information necessary for the application of software reliability (SR)
measurement to a project, lays a foundation for building consistent methods, and establishes the
basic principle for collecting the data needed to assess and predict the reliability of software. The
recommended practice prescribes how any user can participate in SR assessments and
predictions.

Keywords: IEEE 1633™, software failure modes, software reliability

The Institute of Electrical and Electronics Engineers, Inc.


3 Park Avenue, New York, NY 10016-5997, USA

Copyright © 2017 by The Institute of Electrical and Electronics Engineers, Inc.


All rights reserved. Published 18 January 2017. Printed in the United States of America.

IEEE is a registered trademark in the U.S. Patent & Trademark Office, owned by The Institute of Electrical and Electronics
Engineers, Incorporated.

Capability Maturity Model Integrated and CMMI are registered in the U.S. Patent and Trademark Office by Carnegie Mellon University.

COCOMO is a registered trademark of Barry W. Boehm.

Excel is a registered trademark of Microsoft Corporation in the United States and/or other countries.

Java is a trademark of Sun Microsystems, Inc. in the United States and other countries.

Price is a registered trademark of Price Systems, L.L.C.

217Plus is a trademark of Quanterion Solutions Incorporated.

PDF: ISBN 978-1-5044-3648-9 STD22370


Print: ISBN 978-1-5044-3649-6 STDPD22370

IEEE prohibits discrimination, harassment, and bullying.


For more information, visit http://www.ieee.org/web/aboutus/whatis/policies/p9-26.html.
No part of this publication may be reproduced in any form, in an electronic retrieval system or otherwise, without the prior written permission
of the publisher.

Important Notices and Disclaimers Concerning IEEE Standards Documents
IEEE documents are made available for use subject to important notices and legal disclaimers. These
notices and disclaimers, or a reference to this page, appear in all standards and may be found under the
heading “Important Notices and Disclaimers Concerning IEEE Standards Documents.” They can also be
obtained on request from IEEE or viewed at http://standards.ieee.org/IPR/disclaimers.html.

Notice and Disclaimer of Liability Concerning the Use of IEEE Standards


Documents
IEEE Standards documents (standards, recommended practices, and guides), both full-use and trial-use, are
developed within IEEE Societies and the Standards Coordinating Committees of the IEEE Standards
Association (“IEEE-SA”) Standards Board. IEEE (“the Institute”) develops its standards through a
consensus development process, approved by the American National Standards Institute (“ANSI”), which
brings together volunteers representing varied viewpoints and interests to achieve the final product. IEEE
Standards are documents developed through scientific, academic, and industry-based technical working
groups. Volunteers in IEEE working groups are not necessarily members of the Institute and participate
without compensation from IEEE. While IEEE administers the process and establishes rules to promote
fairness in the consensus development process, IEEE does not independently evaluate, test, or verify the
accuracy of any of the information or the soundness of any judgments contained in its standards.

IEEE Standards do not guarantee or ensure safety, security, health, or environmental protection, or ensure
against interference with or from other devices or networks. Implementers and users of IEEE Standards
documents are responsible for determining and complying with all appropriate safety, security,
environmental, health, and interference protection practices and all applicable laws and regulations.

IEEE does not warrant or represent the accuracy or content of the material contained in its standards, and
expressly disclaims all warranties (express, implied and statutory) not included in this or any other
document relating to the standard, including, but not limited to, the warranties of: merchantability; fitness
for a particular purpose; non-infringement; and quality, accuracy, effectiveness, currency, or completeness
of material. In addition, IEEE disclaims any and all conditions relating to: results; and workmanlike effort.
IEEE standards documents are supplied “AS IS” and “WITH ALL FAULTS.”

Use of an IEEE standard is wholly voluntary. The existence of an IEEE standard does not imply that there
are no other ways to produce, test, measure, purchase, market, or provide other goods and services related
to the scope of the IEEE standard. Furthermore, the viewpoint expressed at the time a standard is approved
and issued is subject to change brought about through developments in the state of the art and comments
received from users of the standard.
In publishing and making its standards available, IEEE is not suggesting or rendering professional or other
services for, or on behalf of, any person or entity nor is IEEE undertaking to perform any duty owed by any
other person or entity to another. Any person utilizing any IEEE Standards document should rely upon his
or her own independent judgment in the exercise of reasonable care in any given circumstances or, as
appropriate, seek the advice of a competent professional in determining the appropriateness of a given
IEEE standard.
IN NO EVENT SHALL IEEE BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO:
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE PUBLICATION, USE OF, OR RELIANCE
UPON ANY STANDARD, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE AND
REGARDLESS OF WHETHER SUCH DAMAGE WAS FORESEEABLE.

Translations
The IEEE consensus development process involves the review of documents in English only. In the event
that an IEEE standard is translated, only the English version published by IEEE should be considered the
approved IEEE standard.

Official statements
A statement, written or oral, that is not processed in accordance with the IEEE-SA Standards Board
Operations Manual shall not be considered or inferred to be the official position of IEEE or any of its
committees and shall not be considered to be, or be relied upon as, a formal position of IEEE. At lectures,
symposia, seminars, or educational courses, an individual presenting information on IEEE standards shall
make it clear that his or her views should be considered the personal views of that individual rather than the
formal position of IEEE.

Comments on standards
Comments for revision of IEEE Standards documents are welcome from any interested party, regardless of
membership affiliation with IEEE. However, IEEE does not provide consulting information or advice
pertaining to IEEE Standards documents. Suggestions for changes in documents should be in the form of a
proposed change of text, together with appropriate supporting comments. Since IEEE standards represent a
consensus of concerned interests, it is important that any responses to comments and questions also receive
the concurrence of a balance of interests. For this reason, IEEE and the members of its societies and
Standards Coordinating Committees are not able to provide an instant response to comments or questions
except in those cases where the matter has previously been addressed. For the same reason, IEEE does not
respond to interpretation requests. Any person who would like to participate in revisions to an IEEE
standard is welcome to join the relevant IEEE working group.
Comments on standards should be submitted to the following address:

Secretary, IEEE-SA Standards Board


445 Hoes Lane
Piscataway, NJ 08854 USA

Laws and regulations

Users of IEEE Standards documents should consult all applicable laws and regulations. Compliance with
the provisions of any IEEE Standards document does not imply compliance to any applicable regulatory
requirements. Implementers of the standard are responsible for observing or referring to the applicable
regulatory requirements. IEEE does not, by the publication of its standards, intend to urge action that is not
in compliance with applicable laws, and these documents may not be construed as doing so.

Copyrights
IEEE draft and approved standards are copyrighted by IEEE under U.S. and international copyright laws.
They are made available by IEEE and are adopted for a wide variety of both public and private uses. These
include both use, by reference, in laws and regulations, and use in private self-regulation, standardization,
and the promotion of engineering practices and methods. By making these documents available for use and
adoption by public authorities and private users, IEEE does not waive any rights in copyright to the
documents.

Photocopies
Subject to payment of the appropriate fee, IEEE will grant users a limited, non-exclusive license to
photocopy portions of any individual standard for company or organizational internal use or individual,
non-commercial use only. To arrange for payment of licensing fees, please contact Copyright Clearance
Center, Customer Service, 222 Rosewood Drive, Danvers, MA 01923 USA; +1 978 750 8400. Permission
to photocopy portions of any individual standard for educational classroom use can also be obtained
through the Copyright Clearance Center.

Updating of IEEE Standards documents


Users of IEEE Standards documents should be aware that these documents may be superseded at any time
by the issuance of new editions or may be amended from time to time through the issuance of amendments,
corrigenda, or errata. An official IEEE document at any point in time consists of the current edition of the
document together with any amendments, corrigenda, or errata then in effect.

Every IEEE standard is subjected to review at least every ten years. When a document is more than ten
years old and has not undergone a revision process, it is reasonable to conclude that its contents, although
still of some value, do not wholly reflect the present state of the art. Users are cautioned to check to
determine that they have the latest edition of any IEEE standard.

In order to determine whether a given document is the current edition and whether it has been amended
through the issuance of amendments, corrigenda, or errata, visit IEEE Xplore at
http://ieeexplore.ieee.org/ or contact IEEE at the address listed previously. For more information about the
IEEE-SA or IEEE’s standards development process, visit the IEEE-SA Website at http://standards.ieee.org.

Errata
Errata, if any, for all IEEE standards can be accessed on the IEEE-SA Website at the following URL:
http://standards.ieee.org/findstds/errata/index.html. Users are encouraged to check this URL for errata
periodically.

Patents
Attention is called to the possibility that implementation of this standard may require use of subject matter
covered by patent rights. By publication of this standard, no position is taken by the IEEE with respect to
the existence or validity of any patent rights in connection therewith. If a patent holder or patent applicant
has filed a statement of assurance via an Accepted Letter of Assurance, then the statement is listed on the
IEEE-SA Website at http://standards.ieee.org/about/sasb/patcom/patents.html. Letters of Assurance may
indicate whether the Submitter is willing or unwilling to grant licenses under patent rights without
compensation or under reasonable rates, with reasonable terms and conditions that are demonstrably free of
any unfair discrimination to applicants desiring to obtain such licenses.

Essential Patent Claims may exist for which a Letter of Assurance has not been received. The IEEE is not
responsible for identifying Essential Patent Claims for which a license may be required, for conducting
inquiries into the legal validity or scope of Patents Claims, or determining whether any licensing terms or
conditions provided in connection with submission of a Letter of Assurance, if any, or in any licensing
agreements are reasonable or non-discriminatory. Users of this standard are expressly advised that
determination of the validity of any patent rights, and the risk of infringement of such rights, is entirely
their own responsibility. Further information may be obtained from the IEEE Standards Association.

Participants
At the time this IEEE recommended practice was completed, the IEEE 1633 Working Group had the
following membership:

Ann Marie Neufelder, Chair


Martha Wetherholt, Vice Chair
Debra Haehn, Secretary
Lou Gullo, Sponsor Chair

Jacob Axman Nathan Herbert Allen Nikora


Bakul Banerjee Claire Jones Mark Ofori-kyei
David Bernreuther Burdette Joyner Robert Raygan
Nematollah Bidokhti Ahlia T. Kitwana Ying Shi
Robert Binder Peter Lakey Marty Shooman
Sonya Davis Ming Li Mark Sims
Mary Ann DeCicco Andy Long Michael Siok
Lance Fiondella Debra Greenhalgh Lubas Shane Smith
Willie Fitzpatrick Andrew Mack George Stark
Kevin Frye Franklin Marotta Kishor Trivedi
Loren Garroway Kevin Mattos Thierry Wandji
Richard E. Gibbs III Brian McQuillan Martin Wayne
Michael Grottke Rajesh Murthy Yuan Wei
Darwin Heiser Harry White

The following members of the individual balloting committee voted on this recommended practice.
Balloters may have voted for approval, disapproval, or abstention.

Johann Amsenga Debra Haehn Ann Marie Neufelder


Bakul Banerjee Jon Hagar Michael Newman
Pieter Botman Werner Hoelzl Mark Ofori-Kyei
Bill Brown Bernard Homes Howard Penrose
Keith Chow Noriyuki Ikeuchi Iulian Profir
Paul Croll Cheryl Jones Stephen Schwarm
Sonya Davis Piotr Karocki Jeremy Smith
Mary Ann DeCicco Chad Kiger Thomas Starai
Neal Dowling Ahlia Kitwana Eugene Stoudenmire
Richard Doyle Edward McCall Walter Struppler
Lance Fiondella Jeffrey Moore Eric Thibodeau
Debra Greenhalgh Rajesh Murthy Martha Wetherholt
Randall Groves Andrew Nack Paul Work
Louis Gullo Daidi Zhong

When the IEEE-SA Standards Board approved this recommended practice on 22 September 2016, it had
the following membership:

Jean-Philippe Faure, Chair


Ted Burse, Vice Chair
John D. Kulick, Past Chair
Konstantinos Karachalios, Secretary

Chuck Adams Gary Hoffman Mehmet Ulema


Masayuki Ariyoshi Michael Janezic Yingli Wen
Stephen Dukes Joseph L. Koepfinger* Howard Wolfman
Jianbin Fan Hung Ling Don Wright
Ronald W. Hotchkiss Kevin Lu Yu Yuan
J. Travis Griffith Gary Robinson Daidi Zhong
Annette D. Reilly

*Member Emeritus

Introduction

This introduction is not part of IEEE Std 1633-2016, IEEE Recommended Practice on Software Reliability.

Software is, from a materials viewpoint, both malleable and ductile. This means there are multiple ways to
introduce failures, intentional and unintentional. Fixing a software defect can itself introduce a new defect.
In many cases the failures that result from software defects are both predictable and avoidable, but they still
occur because of the following:

a) Lack of available calendar time/resources to find all of the defects that can result in failures
b) Exceedingly complex event driven systems that are difficult to conceptualize and therefore
implement and test
c) Organizational culture that neglects to support sufficient rigor, skills, or methods required to find
the defects
d) Technical decisions that result in incorrect architecture or design decisions that cannot support the
stakeholders’ specifications
e) Insufficient project or risk management that leads to schedule delays and, in turn, less time for
reliability testing
f) Operations—contract issues and interoperability problems due to poor specifications and stakeholder
communications
Even a small number of software failures can lead to monetary catastrophes such as a cancelled project.
Hardware (HW) failures can be random, due to wear-out, or the result of a systematic design flaw.
Reliability, maintainability, and availability (RMA) practices are used to prevent and deal with hardware
failures. Software failures may result from systematic flaws in the requirements, design, code, or interfaces.
Hence, a software failure does not call for an RMA response but instead for a corrective action to the
existing installation. Software failures can be common cause failures in that the same failure mode can
cause multiple failures in more than one part of the software.

Software reliability engineering (SRE) is an established discipline that can help organizations improve the
reliability of their products and processes. It is important for an organization to have a process discipline if
it is to produce high-reliability software. SRE consists of specific practices and recommendations, each of
which has a context within the software engineering life cycle. A specific practice may be implemented or
used in a particular stage of the life cycle or used across several stages. Figure 1 shows how the focus of
SRE shifts as a project progresses from inception to release. The size of each bubble in the figure
corresponds to how heavily a particular SRE practice is exercised during each phase of development or
operation. For example, in software engineering projects, the failure modes and effects analysis (FMEA) is
typically performed early in the life cycle.

Figure 1 —SRE focus by stage
The scope of this recommended practice is to address software reliability (SR). It does not specifically
address systems reliability, software safety, or software security. However, it does recognize that safety and
security requirements are part of the initial risk assessment. The recommended practice only briefly
addresses software quality. This recommended practice provides a common baseline for discussion and
prescribes methods for assessing and predicting the reliability of software. The recommended practice is
intended to be used in support of designing, developing, and testing software and to provide a foundation
on which practitioners can build consistent methods for assessing the reliability of software. It is intended
to meet the needs of software practitioners and users who are confronted with varying terminology for
reliability measurement and a plethora of models and data collection methods. This recommended practice
contains information necessary for the application of SR measurement to a project. This includes SR
activities throughout the software life cycle (SLC) starting at requirements generation by identifying the
application, specifying and analyzing requirements, and continuing into the implementation.

This standard includes guidance on the following:

 Common terminology
 Assessment of software reliability risks that pertain to the software or project
 Software failure mode analyses that can help to identify and reduce the types of defects most likely
to result in a system failure
 Models for predicting software reliability early in development
 Models for estimating software reliability in testing and operation
 Test coverage and test selection
 Data collection procedures to support SR estimation and prediction
 Determining when to release a software system, or to stop testing the software and implement
corrections
 Identifying elements in a software system that are leading candidates for redesign to improve
reliability

Revisions to the document and notes

This document is a revision of IEEE Std 1633-2008. The revision includes the following changes:

 Addition of models that can be used early in development, before testing, to predict software
reliability
 Addition of failure modes analysis
 Revision of the software reliability growth models so as to be more practical
 Addition of practical guidance for selecting the best models
 Addition of techniques for applying SRE in incremental development
 Addition of methods to assess the SRE related risks

Structure of the recommended practice

This recommended practice contains six clauses and seven annexes as follows:

 Clause 1 provides the overview.


 Clause 2 provides the normative references.
 Clause 3 provides the definitions, acronyms, and abbreviations
 Clause 4 provides the roles, approaches, and concepts related to SRE.
 Clause 5 provides the SRE procedures.
 Clause 6 provides the predictive and estimation SRE models.
 Annex A contains software failure modes effects analysis (SFMEA) templates.
 Annex B provides methods for predicting EKSLOC (Effective 1000 Source Lines of Code), which
is necessary for the reliability predictions as well as for models that predict software defect density
and defects prior to the testing phase.
 Annex C provides additional software reliability growth models and provides the results of a survey
of most used software reliability growth models.
 Annex D provides the estimated cost of the SRE tasks.
 Annex E contains a list of tools that pertain to SRE tasks.
 Annex F contains examples.
 Annex G contains an informative Bibliography.

Copyrights and Permissions


Permissions have been granted as follows: 1

Content appearing in Clause 4 Roles, approach, concepts, including all subclauses; 5.4.3 Measure test
coverage, 5.5 Support release decision; adapted with permission of Robert V. Binder, Beware of Greeks
bearing data, 2014.

1
Every effort has been made to secure permission to reprint borrowed material contained in this document. If omissions have been
made, please bring them to our attention.

Content and tables appearing in 5.1.1.1, 5.1.1.3, 5.1.3.4, 5.1.3.5, 5.3.1, 5.3.2, 5.3.3, 5.3.5.2, 5.3.5.3, 5.3.8,
5.4.7, 6.1, 6.2.1.1, 6.2.1.2, 6.2.2.1, B.2.1, B.2.3, B.3, Table 12 “Keywords associated with common root
causes for defects,” Annex D, F.3, F.4.4, F.4.5, Table 48 “Average defect densities by application type
(EKSLOC),” Table 45 “Factors in determining root cause inaccuracies” reprinted with permission of Ann
Marie Neufelder, Softrel, LLC “Software Reliability Toolkit” © 2015.

Content and tables appearing in 5.4.5, 6.3.2, 6.3.3, F.3, F.6 reprinted with permission of Ann Marie
Neufelder, Softrel, LLC “Advanced Software Reliability” © 2015.

Figure 27 “SFMEA process,” 5.2.2, F.4.3, and all tables in Annex A reprinted with permission from Ann
Marie Neufelder, Softrel, LLC “Effective Application of Software Failure Modes Effects Analysis” ©
2014.

Table 8 “Relationship between risks and outcome” reprinted with permission of Softrel, LLC. “Four things
that are almost guaranteed to reduce the reliability of a software intensive system,” Huntsville Society of
Reliability Engineers RAMS VII Conference © 2014.

A portion of 5.5.1 has been reprinted with permission from Lockheed Martin Corporation article entitled
“Determine Release Stability” © 2015 Lockheed Martin Corporation. All rights reserved.

A portion of 5.5.4 has been reprinted with permission from Lockheed Martin Corporation article entitled
“Perform a Reliability Demonstration Test (RDT)” © 2015 Lockheed Martin Corporation. All rights
reserved.

Table 47 Shortcut Model Survey and Table F.9 Example of the Shortcut Model Survey reprinted with
permission of Softrel, LLC “A Practical Toolkit for Predicting Software Reliability” © 2006.

Table 49 reprinted with permission from Capers Jones, “Software Industry Blindfolds: Invalid Metrics and
Inaccurate Metrics,” Namcook Analytics, 2005.

“Elevator Example” in F.4.2 reprinted with permission from Peter B. Lakey, Operational Profile Testing
© 2016.

Contents

1. Overview ...................................................................................................................................................14
1.1 Scope ..................................................................................................................................................14
1.2 Purpose ...............................................................................................................................................14

2. Normative references.................................................................................................................................14

3. Definitions, acronyms, and abbreviations .................................................................................................15


3.1 Definitions ..........................................................................................................................................15
3.2 Acronyms and abbreviations ..............................................................................................................18

4. Role, approach, and concepts ....................................................................................................................21


4.1 What is software reliability engineering? ...........................................................................................21
4.2 Strategy ...............................................................................................................................................21
4.3 Project-specific SRE tailoring ............................................................................................................23
4.4 Life-cycle considerations for SRE ......................................................................................................30

5. Software reliability procedure ...................................................................................................................34


5.1 Plan for software reliability ................................................................................................................34
5.2 Develop failure modes model .............................................................................................................56
5.3 Apply software reliability during development ..................................................................................67
5.4 Apply software reliability during testing ..........................................................................................110
5.5 Support release decision ...................................................................................................................136
5.6 Apply software reliability in operation .............................................................................................142

6. Software reliability models ......................................................................................................................148


6.1 Overview ..........................................................................................................................................148
6.2 Models that can be used before testing .............................................................................................149
6.3 Models that can be used during and after testing ..............................................................................159

Annex A (informative) Software failure modes effects analysis templates .................................................168


A.1 Templates for preparing the software failure modes effects analysis (SFMEA) .............................168
A.2 Templates for analyzing the failure modes and root causes.............................................................170
A.3 Template for consequences ..............................................................................................................174
A.4 Template for mitigation ...................................................................................................................175

Annex B (informative) Methods for predicting software reliability during development ...........................176
B.1 Methods for predicting code size .....................................................................................................176
B.2 Additional models for predicting defect density or defects..............................................................180
B.3 Factors that have been correlated to fielded defects.........................................................................186

Annex C (informative) Additional information on software reliability models used during testing ...........189
C.1 Models that can be used when the fault rate is peaking ...................................................................189
C.2 Models that can be used when the fault rate is decreasing ...............................................................189
C.3 Models that can be used with increasing and then decreasing fault rate ..........................................194
C.4 Models that can be used regardless of the fault rate trend ...............................................................196
C.5 Models that estimate remaining defects ...........................................................................................197
C.6 Results of the IEEE survey ..............................................................................................................198

Annex D (informative) Estimated relative cost of SRE tasks......................................................................200

Annex E (informative) Software reliability engineering related tools .........................................................203

Annex F (informative) Examples ................................................................................................................205
F.1 Examples from 5.1 ...........................................................................................................................205
F.2 Examples from 5.2 ...........................................................................................................................209
F.3 Examples from 5.3 ...........................................................................................................................220
F.4 Examples from 5.4 ...........................................................................................................................230
F.5 Examples from 5.5 ...........................................................................................................................251
F.6 Examples from 5.6 ...........................................................................................................................252

Annex G (informative) Bibliography ..........................................................................................................254

IEEE Recommended Practice on
Software Reliability

1. Overview

1.1 Scope

This recommended practice defines the software reliability engineering (SRE) processes, prediction
models, growth models, tools, and practices of an organization. This document and its models and tools are
useful to any development organization to identify the methods, equations, and criteria for quantitatively
assessing the reliability of a software or firmware subsystem or product. Organizations that acquire
software subsystems or products developed with consideration to this recommended practice will benefit by
knowing the reliability of the software prior to acquisition. This document does not seek to certify either
the software or firmware or the processes employed for developing the software or firmware.

1.2 Purpose

The purpose for assessing the reliability of a software or firmware subsystem or product is to determine
whether the software has met an established reliability objective and facilitate improvement of product
reliability. The document defines the recommended practices for predicting software reliability (SR) early
in development so as to facilitate planning, sensitivity analysis and trade-offs. This document also defines
the recommended practices for estimating SR during test and operation so as to establish whether the
software or firmware meets an established objective for reliability.

2. Normative references
The following referenced documents are indispensable for the application of this document (i.e., they must
be understood and used, so each referenced document is cited in text and its relationship to this document is
explained). For dated references, only the edition cited applies. For undated references, the latest edition of
the referenced document (including any amendments or corrigenda) applies.

IEEE Std 12207™-2008, ISO/IEC/IEEE Standard for Systems and Software Engineering—Software Life
Cycle Processes. 1, 2

1
The IEEE standards or products referred to in this clause are trademarks of The Institute of Electrical and Electronics Engineers, Inc.
2
IEEE publications are available from The Institute of Electrical and Electronics Engineers (http://standards.ieee.org/).


3. Definitions, acronyms, and abbreviations


For the purposes of this document, the following terms and definitions apply. The IEEE Standards
Dictionary Online should be consulted for terms not defined in this clause. 3

3.1 Definitions

agile (software development): (A) IEEE definition: software development approach based on iterative
development, frequent inspection and adaptation, and incremental deliveries, in which requirements and
solutions evolve through collaboration in cross-functional teams and through continuous stakeholder
feedback. (B) Agile Consortium definition: software development approach based on iterative
development, frequent inspection and adaptation, and incremental deliveries, in which requirements and
solutions evolve through collaboration in cross-functional teams and through continuous stakeholder
feedback.

assessment: (A) An action for applying specific documented criteria to a specific software module,
package, or product for the purpose of determining acceptance or release of the software module, package,
or product. (ISO/IEC/IEEE 24765™-2010) (B) Determining what action to take for software that fails to
meet goals (e.g., intensify inspection, intensify testing, redesign software, and revise process).
NOTE—The formulation of test strategies is also part of assessment. Test strategy formulation involves the
determination of priority, duration, and completion date of testing, allocation of personnel, and allocation of computer
resources to testing. 4
calendar time: Chronological time, including time during which a computer may not be running.

clock time: Elapsed wall-clock time from the start of program execution to the end of program execution.

defect: (A) A problem that, if not corrected, could cause an application to either fail or to produce incorrect
results. (ISO/IEC/IEEE 24765-2010)
NOTE—For the purposes of this standard, defects are the result of errors that are manifest in the system requirements,
software requirements, interfaces, architecture, detailed design, or code. A defect may result in one or more failures. It
is also possible that a defect may never result in a fault if the operational profile is such that the code containing the
defect is never executed.

defect pileup: The condition in which residual defects in the software are not removed and, over time, their
number increases to the point of adversely affecting the reliability and schedule of software releases.

error: A human action that produced an incorrect result, such as software containing a fault. Examples
include omission or misinterpretation of user requirements in a software specification, incorrect translation,
or omission of a requirement in the design specification. (ISO/IEC/IEEE 24765-2010)
NOTE—For the purposes of this standard, an error can also include incorrect software interfaces, software
architecture, design, or code.
evolutionary development: Developing and delivering software in iterative drops to the customer with a
concentration on developing the most important or least understood requirements first and, once those are
working and approved by the customer, improving the requirements, design, and testing to increase the
functionality to meet the stakeholders’ desired functionality and performance. This is especially useful for
development of new products. See also: incremental development.
NOTE—See Larman [B51]. 5

3
IEEE Standards Dictionary Online is available at: http://ieeexplore.ieee.org/.
4
Notes in text, tables, and figures of a standard are given for information only and do not contain requirements needed to implement
this standard.
5
The numbers in brackets correspond to those of the bibliography in Annex G.


execution time: (A) The amount of actual or central processor time used in executing a program. (B) The
period of time during which a program is executing.
NOTE—Processor time is usually less than elapsed time because the processor may be idle (for example, awaiting
needed computer resources) or employed on other tasks during the execution of a program.
failure: (A) The inability of a system or system component to perform a required function within specified
limits. (B) The termination of the ability of a product to perform a required function or its inability to
perform within previously specified limits. (ISO/IEC/IEEE 24765-2010) (C) A departure of program
operation from program requirements. A failure may be produced when a fault is encountered and a loss of
the expected service results.
NOTE 1—A failure may be produced when a fault is encountered and a loss of the expected service to the user results.

NOTE 2—There may not be a one-to-one relationship between faults and failures. This can happen if the system has
been designed to be fault tolerant. It can also happen if a fault does not result in a failure either because it is not severe
enough to result in a failure or does not manifest into a failure due to the system not achieving that operational or
environmental state that would trigger it.
failure intensity: Total failures observed over total operational hours experienced.
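
A minimal worked example of this ratio, with hypothetical counts that are not taken from this recommended
practice:

```latex
% Illustrative calculation only; the counts are hypothetical.
\text{failure intensity} = \frac{\text{total failures observed}}{\text{total operational hours}}
                         = \frac{12\ \text{failures}}{3000\ \text{h}} = 0.004\ \text{failures per hour}
```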

failure rate: (A) The ratio of the number of failures of a given category or severity to a given period of
time; for example, failures per second of execution time, failures per month. Syn: failure intensity. (B) The
ratio of the number of failures to a given unit of measure, such as failures per unit of time, failures per
number of transactions, failures per number of computer runs.

failure severity: A rating system for the impact of every recognized credible software failure mode.

fault: (A) A defect in the code that can be the cause of one or more failures. (B) A manifestation of an error
in the software. (ISO/IEC/IEEE 24765-2010)
NOTE—There may not necessarily be a one-to-one relationship between faults and failures if the system has been
designed to be fault tolerant or if a fault is not severe enough to result in a failure.
fault tolerance: (A) The survival attribute of a system that allows it to deliver the required service after
faults have manifested themselves within the system. (B) The ability of a system or a component to
continue normal operation despite the presence of hardware or software faults. (ISO/IEC/IEEE 24765-
2010)

firmware: The combination of a hardware device, software, and data that are incorporated into the
hardware device.
NOTE—For compliance with this standard, firmware is treated as software in a programmable device. (Adapted from
IEEE Std 610.12™-1990)

function point: (A) A unit of measurement to express the amount of business functionality an information
system (as a product) provides to a user. (B) A unit that expresses the size of an application or of a project
(ISO/IEC/IEEE 24765-2010) Function points measure software size. The functional user requirements of
the software are identified, and each one is categorized into one of five types: outputs, inquiries, inputs,
internal files, and external interfaces.

incremental development: A software development technique in which requirements definition, design,
implementation, and testing occur in an overlapping, iterative (rather than sequential) manner, resulting in
incremental completion of the overall software product. Contrast: waterfall model.

integration: The process of combining software elements, hardware elements, or both into an overall
system.


life-cycle model: A framework containing the processes, activities, and tasks involved in the development,
operation, and maintenance of a software product, spanning the life of the system from the definition of its
requirements to the termination of its use. (ISO/IEC/IEEE 24765-2010) Contrast: software development
cycle.

maximum likelihood estimation: A form of parameter estimation in which selected parameters maximize
the probability that observed data could have occurred.
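
As an informal illustration of this concept (not part of this recommended practice), the following Python
sketch computes the maximum likelihood estimate of a constant failure rate from hypothetical inter-failure
times, under the assumption that the times between failures are exponentially distributed; the data values
are illustrative only.

```python
import math

# Hypothetical inter-failure times in operating hours (illustrative only).
inter_failure_hours = [120.0, 95.0, 210.0, 60.0, 155.0]

# For exponentially distributed inter-failure times, the likelihood is
# L(lam) = prod(lam * exp(-lam * t_i)); the value of lam that maximizes it
# has the closed form n / sum(t_i).
n = len(inter_failure_hours)
total_hours = sum(inter_failure_hours)
lam_mle = n / total_hours

# Log-likelihood at the estimate, for reference.
log_likelihood = n * math.log(lam_mle) - lam_mle * total_hours

print(f"MLE of the failure rate: {lam_mle:.5f} failures/hour")
print(f"Log-likelihood at the MLE: {log_likelihood:.3f}")
```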

module: (A) A program unit that is discrete and identifiable with respect to compiling, combining with
other units, and loading; for example, input to or output from an assembler, compiler, linkage editor, or
executive routine. (B) A logically separable part of a program.

operational: (A) Pertaining to the status given a software product once it has entered the operation and
maintenance phase. (B) Pertaining to a system or component that is ready for use in its intended
environment.

reliability growth: (A) The amount the software reliability improves from operational usage and other
stresses. (B) The improvement in reliability that results from correction of faults.

requirement reliability risk: The probability that requirements changes will decrease reliability.

scrum: (A) An iterative and incremental agile software development methodology for managing product
development. (B) The iterative project management framework used in agile development, in which a team
agrees on development items from a requirements backlog and produces them within a short duration of a
few weeks.

sprint: (A) The basic unit of development in agile/scrum. The sprint is restricted to a specific duration. The
duration is fixed in advance for each sprint and is normally between one week and one month, with two
weeks being the most common. (B) The short time frame, in which a set of software features is developed,
leading to a working product that can be demonstrated to stakeholders.

software development cycle: (A) The period of time that begins with the decision to develop a software
product and ends when the software is delivered. (ISO/IEC/IEEE 24765-2010) For the development part of
the software life-cycle processes this would include practices for planning, creating, testing, and deploying
a software system.

software engineering: (A) The application of a systematic, disciplined, quantifiable approach to the
development, operation, and maintenance of software; that is, the application of engineering to software. (B)
The systematic application of scientific and technological knowledge, methods, and experience to the
design, implementation, testing, and documentation of software. (ISO/IEC/IEEE 24765-2010)

software quality: (A) The totality of features and characteristics of a software product that bear on its
ability to satisfy given needs, such as conforming to specifications. (B) The degree to which software
possesses a desired combination of attributes. (C) The degree to which a customer or user perceives that
software meets the user’s composite expectations. (D) The composite characteristics of software that
determine the degree to which the software in use will meet the expectations of the customer. (E)
Capability of the software product to satisfy stated and implied needs when used under specified
conditions. (ISO/IEC/IEEE 24765-2010)

software reliability (SR): (A) The probability that software will not cause the failure of a system for a
specified time under specified conditions. (B) The ability of a program to perform a required function under
stated conditions for a stated period of time.
NOTE—For definition (A), the probability is a function of the inputs to and use of the system, as well as a function of
the existence of defects in the software. The inputs to the system determine whether existing defects, if any, are
encountered.
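
As an illustration of definition (A) under a commonly used simplifying assumption (not a requirement of
this definition), if the software exhibits a constant failure rate λ over the interval of interest, the probability
of failure-free operation for a duration t is

```latex
% Constant failure-rate assumption; illustrative only.
R(t) = e^{-\lambda t}, \qquad \text{e.g., } \lambda = 0.001\ \text{failures/h}
\;\Rightarrow\; R(100\ \text{h}) = e^{-0.1} \approx 0.905
```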


software reliability engineering (SRE): (A) The application of statistical techniques to data collected
during system development and operation to specify, estimate, or assess the reliability of software-based
systems. (B) The application of software reliability best practices to enhance software reliability
characteristics of software being developed and integrated into a system.

software reliability estimation: The application of statistical techniques to observed failure data collected
during system testing and operation to assess the reliability of the software.

software reliability model: A mathematical expression that specifies the general form of the software
failure process as a function of factors such as fault introduction, fault removal, and the operational
environment.

software reliability prediction: A forecast or assessment of the reliability of the software based on
parameters associated with the software product and its development environment.
NOTE—Reliability predictions are a measure of the probability that the software will perform without failure over a
specific interval, under specified conditions.

3.2 Acronyms and abbreviations

API Application Programmers Interface

ASIC Application Specific Integrated Circuit

BIOS Basic Input Output System

BOM bill of materials

CASRE computer-aided software reliability estimates

CIL critical items list

CMMI® Capability Maturity Model Integration® 6

COTS commercial-off-the-shelf software

CR capture/recapture

CSCI computer software configuration item

CTMC Continuous Time Markov Chain

DD defect density

DDL dynamically linked libraries

DDN defect days number

DLOC defects per line of code

EFSM extended finite state machine

EKSLOC Effective 1000 Source Lines of Code

6
Capability Maturity Model Integrated and CMMI are registered in the U.S. Patent and Trademark Office by Carnegie Mellon
University. This information is given for the convenience of users of this standard and does not constitute an endorsement by the
IEEE of these products. Equivalent products may be used if they can be shown to lead to the same results.


FD fault days

FDIR fault/failure detection, isolation, and recovery

FDSC failure definition and scoring criteria

FI fault injection

FMEA failure modes and effects analysis

FOSS free open source software

FPGA field programmable gate array

FTA fault tree analysis

FW firmware

GFS government furnished software

GO Goel-Okumoto

HW hardware

IDD interface design document

IR intermediate representations

JM Jelinski-Moranda

JVM Java™ Virtual Machine 7

kB kilobytes

KSLOC 1000 source lines of code

LCM life-cycle model

LRU line replaceable unit

MCDC multiple condition decision coverage

MCUM Markov Chain Usage Model

MTBCF mean time between critical failure

MTBEFF mean time between essential function failure

MTBF mean time between failure

MTBSA mean time between system abort

MTSWR mean time to software restore

MTTR mean time to repair

7
Java is a trademark of Sun Microsystems, Inc. in the United States and other countries. This information is given for the
convenience of users of this standard and does not constitute an endorsement by the IEEE of these products. Equivalent products
may be used if they can be shown to lead to the same results.


OP operational profile

OS operating system

PHA preliminary hazards analysis

PIE propagation, infection, execution

RBD reliability block diagram

RCA root cause analysis

RCM reliability criticality metric

RDT Reliability Demonstration Test

RePs reliability prediction system

RFP request for proposal

RG reliability growth

RMA reliability maintainability availability

RPN risk priority number

RT requirements traceability

SCODE software code

SDD software design document

SDP software development plan

SFMEA software failure modes effects analysis

SFTA software fault tree analysis

SLOC source lines of code

SR software reliability

SRE software reliability engineering

SRGM Software Reliability Growth Model

SRM Software Reliability Model

SRPP software reliability program plan

SRS software requirements specification

SUT system under test

SW software

SWDPMH software defects per million hours

SyRs system requirements specification


TNI trouble not identified

UI user interface

UPM Unified Process Model

WG Working Group

4. Role, approach, and concepts

4.1 What is software reliability engineering?

IEEE Std 610.12-1990 defines reliability as: “The ability of a system or component to perform its required
functions under stated conditions for a specified period of time.” Software reliability engineering (SRE)
supports cost-effective development of high-reliability software.

While reliability is a critical element of software quality, it is not the only one. SRE does not directly
address modeling or analysis of other important quality attributes, including usability, security, or
maintainability. There are many proven practices and technologies that address the full spectrum of
software quality as well as specific aspects. SRE should be applied in concert with them.

SRE can include analyses to prevent or remove software defects that lead to failures. SRE can also provide
unambiguous and actionable information about likely operational reliability throughout the software
development cycle. Just as the accuracy of weather forecasts is generally better for the very near term, SRE
predictions are more accurate as the software development nears completion.

4.2 Strategy

SRE allows developers to have confidence that a software system will meet or exceed its quantitative and
qualitative reliability requirements. SRE uses a data-driven strategy to achieve this, which includes but is
not limited to the following process areas, analyses, and metrics:

a) Planning—Determine the scope of the software reliability (SR) program based on the availability
of resources and the needs of the project. Identify the software line replaceable units (LRUs) that
will be included in the SRE.
b) Analyzing—Assess the software-related risks and perform the failure mode analyses early to
develop requirements and design that support more reliable software.
c) Prediction—Predict SR in accordance with the system operational profile (OP) and perform the
sensitivity analysis early to identify appropriate goals for the SR and to identify weaknesses in the
planned development activities that could affect the reliability of the software. Define requirements
for acceptable failure rate, residual defects, defect backlog, availability, or other quantitative
measures, taking into account the criticality and the interface and architectural dependencies.
d) Testing—Conduct testing that is representative of field usage while also covering the requirements,
design, and code so that the software is tested as per its OP. Develop test suites that achieve
representative sampling of the operations, modes, and stresses. Count the number of defects
discovered (each event is a fault) and corrected, and monitor the discovery rate. As needed, revise
the predictions. Estimate the system’s operational reliability from the frequency and trend of
observed failures using a reliability growth model or failure intensity graph (a minimal sketch of
such a fit follows this list). Determine whether the estimated number of residual defects is acceptable.


e) Release—Use SRE to support a release decision. Analyze trends in faults and failures to support
software milestone transition decisions. Report the release readiness of the software based
on its estimated operational reliability and estimated number of residual defects.
f) Operation—Monitor and model the number and rate of software failures so as to improve upon
future SR analyses.
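For illustration, the following minimal Python sketch is informative only; the weekly failure counts, parameter ranges, and the choice of a simple exponential growth curve mu(t) = a*(1 - exp(-b*t)) are assumptions, not requirements of this recommended practice. It shows one way to trend observed failures during testing and estimate the residual defect content; the detailed model selection and estimation procedures are given in 5.4.5, 6.3, and Annex C.

import math

weekly_failures = [9, 7, 6, 4, 3, 2, 2, 1]   # hypothetical failure counts per test week
cumulative = [sum(weekly_failures[:i + 1]) for i in range(len(weekly_failures))]
weeks = range(1, len(cumulative) + 1)

def sse(a, b):
    # Sum of squared errors between the model a*(1 - exp(-b*t)) and the observed cumulative failures.
    return sum((a * (1 - math.exp(-b * t)) - c) ** 2 for t, c in zip(weeks, cumulative))

# Coarse grid search: a is the expected total defect content, b the per-week detection rate.
a_hat, b_hat = min(((a, b / 100) for a in range(35, 80) for b in range(1, 100)),
                   key=lambda p: sse(*p))
residual = a_hat - cumulative[-1]
print("estimated total defects:", a_hat,
      " detection rate per week:", b_hat,
      " estimated residual defects:", residual)

A decreasing failure intensity (here, the declining weekly counts) is a precondition for using such a growth model; if the trend is flat or increasing, the release is not yet exhibiting reliability growth.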
Each process area is presented in Clause 5. The SR procedures shown in Clause 5 can be applied regardless
of the life-cycle model. The tasks are performed whenever the associated development activity is
performed unless specified otherwise. The clauses of this document are aligned with the six activities
shown in Figure 2.

Figure 2 —Summary of the SRE activities


All life-cycle models consist of a sequence of general activities, known by many names, including phase,
increment, and sprint. For simplicity, they are called stages. As a project progresses through its stages, its
assumptions, plans, and commitments change. Therefore, the Software Reliability Program Plan (SRPP)
and its activities should be reconsidered and revised or elaborated to maintain alignment and completeness.
These transition points occur least frequently in a phased project, more frequently in an incremental project,
and most frequently in an evolutionary project. At each stage transition, it is critical to consider what has
changed and what these changes imply for all aspects of the reliability plan. Figure 3 is a stage transition
checklist.


a) System functions and features typically are added, revised, or dropped. Are the operations and
usage assumptions of the OP the same? If not, have the OP and the reliability test plan been revised
accordingly?
b) Has the reliability test suite been maintained so that there is no loss of coverage owing to changes
in the system under test?
c) If any of the following have been changed, have the related models, plans, or test design been
changed accordingly?
1) Reliability requirements
2) Identified risks
3) Identified failure modes and effects
4) Critical operating modes
5) Security, performance, vulnerability, and other related risks
6) Component dependencies
7) Components sourced from a different supplier
d) How do the assumed risk factors (e.g., code size, code complexity, number of interfaces, or team
capability) compare to actual risk metrics in the present and upcoming increment/sprint?
e) For completed components with adequate reliability testing, how does the actual failure data
compare with predicted failure intensity?
f) If component testing was limited or blocked, in what future increment or sprint will it be resolved?
g) Does the reliability model appear to be accurately tracking the observed failure discovery rate?

Figure 3 —Stage transition checklist


The questions should be expanded to include any project-specific factors that have a direct bearing on the
outcome of the SRE Plan.

4.3 Project-specific SRE tailoring

The purpose of SRE is to assure that released software meets its reliability requirements using both data-
driven quantitative modeling and analysis as well as qualitative analyses. All of the practices in this
recommended practice support that goal. Some of them are essential, some are typically useful, and some
are project specific. SRE tailoring is the process of deciding which recommended practices to apply to a
specific project. Following the tailoring process defined in IEEE Std 12207™-2008, Annex A, is
recommended. 8 Taking these general conditions into consideration, the practitioner should answer the
following questions in Figure 4.

8 Information on references can be found in Clause 2.


a) Which SRE activities are to be included in my project?


b) Which SRE activities are relevant for my particular role?
c) For projects that follow an incremental or evolutionary software development model, to what extent
will partial failure data be collected and analyzed before the release candidate is ready? Each of the
tasks in Clause 5 discusses application of that SRE task to incremental or evolutionary software
development models when applicable.

Figure 4 —SRE tailoring checklist

Table 1 lists the recommended practices defined in Clause 5 of this recommended practice, indicating the
tailorability of each, as follows:

 Essential activities should be performed to achieve the basic goals of SRE. If an essential activity
is omitted, the risk of an incorrect prediction and inadequate operational reliability is significant.
 Typical activities are usually performed to support essential activities and should be included
unless there is a good reason to omit them. For example, in a system with just a single software
component, multi-component allocation is not necessary.
 Project specific activities are usually performed only when a specific condition warrants their
inclusion. For example, in a non-critical system, developing and testing with a separate OP of
critical operations is not needed.
Table 1 also illustrates the SRE activities based on the role of the personnel who will typically perform or
assist with the activity. Organizations vary with regard to engineering roles, hence the following are the
typical roles and responsibilities.

 Reliability engineers—These engineers typically have a background in hardware reliability but not
necessarily a background in software engineering or SR. Their role is typically to do predictions
and merge the predictions into the reliability block diagram (RBD) to yield a system reliability
prediction. They also perform allocations that include both software and hardware.
 Software management—These engineers manage the day-to-day development activities, which
also include scheduling of personnel. Software management uses SR predictions to predict the
effort required to maintain the software, schedule the spacing of the releases so as to avoid “defect
pileup,” schedule the testing and corrective action resources necessary to determine that the
software meets the required objective, and perform a sensitivity analysis to determine the
development practices that are most and least related to reliable software.
 Software quality assurance or testing—These engineers are responsible for audits and
assessments of the software deliverables as well as testing of the software. They are typically the
persons who use the SR growth models during testing to determine how many more defects are
remaining and the test effort required to discover them.
 Acquisitions—Acquisitions personnel can be either commercial or government employees. They
are purchasing or acquiring software components or an entire system containing software. They
may use SR assessments to select or evaluate subcontractors, contractors, or suppliers. They may
also use the predictions to establish a system reliability objective to be used for contractual
purposes. They also use the SR models to monitor the progress of the software and system
reliability.
Reliability engineering and software engineering historically have not had defined relationships. Hence, a
SR liaison may be defined to coordinate the efforts of the software organization with the reliability
engineering organization. The SR liaison may be a reliability engineer who is knowledgeable about


software or a software engineer who is knowledgeable about reliability engineering or a system engineer
who is knowledgeable about both.

Typically the SR work is performed at the beginning of a program and at every milestone during the
program. It is important that the reliability engineers schedule their efforts to work with all involved parties
in preparation for those milestones. Depending on the life-cycle model (LCM), the milestones can vary,
hence in 4.4, the SRE effort is illustrated for several different software LCMs.

Figure 5 summarizes the roles of the software managers, reliability engineers, software quality
engineers, and acquisitions personnel. Clause 5 describes the data flow in more detail. Figure 5 also
illustrates some of the key data that are input to the SRE models as well as the key metrics that
are retrieved from the SRE models and data.

Figure 5 —Relationship of SRE stakeholders


Table 1 illustrates the tasks typically performed by each role. The key to the table is as follows:

Role
RE—Reliability engineer or systems engineer
SQA—Software quality assurance or testing
SM—Software management
ACQ—Acquisitions personnel who are purchasing or acquiring software from other organizations
Tailoring
E—Essential
T—Typical
PS—Project specific

Activity for each role


L—This person leads the activity
X—This person provides inputs to the activity
C—This person consumes the results of the activity
R—This person reviews the results of the activity


Table 1 —Essential, typical, and optional SRE activities by role

Columns (left to right): Reference; Topic; Tailoring; RE; SM; SQA; ACQ; Related standards
5.1 Planning for software reliability—These are SRE activities that need to be performed prior to using any SRE
models or analyses

5.1.1 Characterize the software system – E L X R Software and Systems Engineering


Identify the software and firmware C Vocabulary (SEVOCAB):
LRUs, bill of materials (BOMs), http://pascal.computer.org/sev_display/index.
OP, and impact of software on the action
system design. IEEE Std 12207-2008
IEC 61014:2003 [B29]
5.1.2 Define failures and criticality— E C C C L or R This step may be implemented within the
Every product and system has a measurement process in 6.3.7 of
unique definition of failure IEEE Std 12207-2008
criticality that should be defined in
advance so that the software is
designed, tested, and measured
correctly.
5.1.3 Perform a reliability risk T L X X L IEEE Std 730™-2014 [B32] Tables C.8 and
assessment—Safety, security, or or C.16 (see the definition of integrity level)
vulnerability, product maturity, L R IEEE Std 15026™-3-2013 [B34]
project and schedule risks can have
an impact on the required
reliability.
5.1.4 Assess the data collection T X L X R IEEE Std 730™-2014 [B32] Tables 23 and
system—The analyses and models 31, and C 3.6.2
will require software defect,
product and process related data.
5.1.5 Review available tools needed for T L X X R
SR
5.1.6 Develop a software reliability E L X X R
plan—Identify what SR activities X
are relevant for a particular
software release.
5.2 Develop failure modes model IEC 61014:2003 [B29]
5.2.1 Perform software defect root cause PS C L X R PS
analysis (RCA)—Identify the most
common type of software defect so
that practices can be put in place to
identify and remove those
categories of defects prior to
deployment.
5.2.2 Perform software failure modes PS L C X L PS
effects analysis (SFMEA)—This C or
bottom-up analysis is similar to a R
hardware (HW) FMEA except for
the software failure modes and root
causes and the software viewpoints.
5.2.3 Include software in the system fault PS L C X L PS
tree analysis (FTA)—The FTA is C or
useful for identifying single and R
multiple point failures from a top-
down viewpoint.

5.3 Apply SR during development—All of these activities can be IEC 61014:2003 [B29]
performed well before the code is written or tested. IEEE P24748-5 [B30]
5.3.1 Identify/obtain the initial system E L C C L
reliability objective—The required or or
or desired MTBF, failure rate, R R
availability, reliability for the
entire system.
5.3.2 Perform a SR assessment and E L X X L
prediction—Predict the MTBF, C or
failure rate, availability, reliability, R
and confidence bounds for each
software LRU.
5.3.3 Sanity check the prediction— T L L
Compare the prediction to
established ranges for similar
software LRUs
5.3.4 Merge the predictions into the E L R
overall system predictions—
Various methods exist to combine
the software and hardware
predictions into one system
prediction.
5.3.5 Determine an appropriate overall E L X X X
SR requirement —Now that the C R
system prediction is complete,
revisit the initial reliability
objective and modify as needed.
5.3.6 Plan the reliability growth— E L X X R
Compute the software predictions C C C
during and after a specific level of
reliability growth testing.
5.3.7 Perform a sensitivity analysis— PS X L
Identify the practices, processes, C
techniques, risks that are most
sensitive to the predictions.
Perform trade-offs.
5.3.8 Allocate the required reliability to T L X R See NOTE 2.
the software LRUs—The software C
and hardware components are
allocated a portion of the system
objective based on the predicted
values for each LRU.
5.3.9 Employ SR metrics for transition T X L X R Software and Systems Engineering
to testing—These metrics C C Vocabulary (SEVOCAB):
determine if the software is stable http://pascal.computer.org/sev_display/index.
enough to be tested. action

5.4 Apply SR during testing—These SRE activities are used once the ISO/IEC 29119-1:2013, Concepts and
software LRUs have been integrated. Definitions (published September 2013);
ISO/IEC 29119-2: Test Processes (published
September 2013); ISO/IEC 29119-3: Test
Documentation (published September 2013);
ISO/IEC 29119-4: Test Techniques (at DIS
stage, anticipating publication in late 2014)

5.4.1 Develop a reliability test suite T L X X R


5.4.2 Increase test effectiveness via PS C L
software fault insertion—This is C
when identified failure modes are
instrumented such that the software
identifies and recovers from them
as required.
5.4.3 Measure test coverage—Measures T C X X R
the software coverage from both a C C
structural and functional
standpoint.
5.4.4 Collect fault and failure data—This E L X R
data is collected during testing and
is required for using the SR metrics
and models. This subclause
includes two very simple metrics
that help determine the best
reliability growth model.
5.4.5 Select reliability growth models— E L X C R
The best models are those that fit
the observed fault trends.
5.4.6 Apply SR metrics—These metrics T C L X R
are used in conjunction with the SR X
models.
5.4.7 Determine the accuracy of the E L L R
prediction and reliability growth
models—The prediction models in
5.3.2 and the SRG models in 5.4.5
can be validated against the actual
observed MTBF and failure rates
to determine which model(s) are
the most accurate for this particular
software release.
5.4.8 Revisit the defect RCA—The PS C L X R IEEE Std 730™-2014 [B32] Tables C.6 and
defect RCA can be updated once C.16
testing is complete and used to
improve the defect removal of the
next software release.

5.5 Support release decision—SR results can be used to determine if IEEE Std 1012-2012 [B33]
the software is ready for release. If it is not, the models can
determine how much more testing time and defects are required to
be removed to be ready for release.
5.5.1 Determine release stability E C L C R IEEE Std 1012-2012 [B33], 7.2.2,
particularly 7.2.2.3.6. See NOTE 3.
5.5.2 Forecast additional test duration— T C L C
If the required reliability is not
met, determine how many more
testing hours are required to meet
it.
5.5.3 Forecast remaining defects and E C L C
effort required to correct them—If
the required reliability is not met,
determine how many more defects
and staffing is required to meet it.
5.5.4 Perform a Reliability PS L R
Demonstration Test (RDT)—
Statistically determine whether a
specific reliability objective is met.
5.6 Apply SR in operation—All of these activities are employed once the software is deployed.
5.6.1 Employ SR metrics to monitor T X X X
operational SR—It is important to C
monitor operational reliability to
know how the reliability in the
field compares to the estimated and
predicted.
5.6.2 Compare operational reliability to T L R
predicted reliability.
5.6.3 Assess changes to previous T X X X
characterizations or analyses— C
Adjust the reliability models inputs
and assumptions accordingly to
improve the next prediction.
5.6.4 Archive operational data— T X X X X
Organize the data for use for future C
predictions.
6.2, Annex B Methods for predicting defect density and fault profile. E X L R
6.3, Annex C SR growth models E C L R
NOTE 1—Within the framework of the standards harmonized with the IEEE Std 12207-2008, the concept of reliability is
often bundled with the concept of integrity and expanded in the discussion of integrity levels.
NOTE 2—This step may be implemented within the software requirements analysis process in 7.1.2 of IEEE Std 12207-2008.
NOTE 3—This step may be implemented within the decision management and measurement processes in 6.3.3 and 6.3.7,
respectively, of IEEE Std 12207-2008. See also details for verification and validation processes in IEEE Std 1012-2012 [B33].


Annex D contains a relative ranking of the culture change, effort, calendar time, and automation of each of
the preceding tasks. Several of the preceding tasks can be merged with and sometimes even replace an
already existing software development or reliability engineering task.

4.4 Life-cycle considerations for SRE

The general SRE activities of planning, analysis, testing, and acceptance may be applied to any type of
software product and in any kind of LCM. Although the underlying practices are independent of a process
model, considerations for application of SRE practices vary among these processes.

IEEE Std 12207-2008 recognizes three basic software LCMs, as follows:

a) Phased: Often called “waterfall” development, in a phased project all requirements are defined
first. Then design, programming, and testing follow in sequence, with backtracking as necessary.
US DoD standard 2167-A provides an example.
b) Incremental: Most requirements are defined first, then partitioned into subsets. Requirement
subsets are designed, programmed, and tested, typically in a sequence of subsets ("increments").
Later increments typically extend or revise the requirements and artifacts of earlier increments. The
Rational Unified Process is a widely used incremental process (see [B74]).
c) Evolutionary: Often called “agile” development, an evolutionary project begins with a general
notion of a solution. This is divided into subsets to be explored and implemented in short cycles,
typically spanning about one month. In each such cycle, usage examples are identified and
expressed as test conditions, which are prerequisite for programming. Other work products
antecedent to delivered software (e.g., line-item requirements, design documentation) are eschewed
in favor of direct interaction with system users or customers. Programming and unit testing is
conducted to implement the usage examples selected for a cycle. There are many variations.
In practice, the preceding incremental and evolutionary LCMs are adapted and refined to meet practical
considerations.

Being data-driven and working from high-level models to actual implementation measurements, most SRE
practices require the results of one or more activities as an input to another. For example, defect predictions
require certain codebase metrics and system-scope reliability predictions require either an actual or
assumed system structure. As incremental and evolutionary processes produce a series of partial
antecedents and implementation, SRE activities should be adjusted accordingly.

The charts in the following subclause show how the software development processes defined in
IEEE Std 12207-2008 may be achieved under phased (waterfall), incremental, and evolutionary (agile)
development strategies. The IEEE 12207 processes are depicted as horizontal lines. The non-dimensional
arrow of time points left to right. The solid area above a line indicates the relative effort allocated to that
process. Roughly, the total colored area indicates the total effort expended on the process. The implicit
vertical axis for each process is the percent of the process’ total effort. Upward and downward ramps
suggest increasing and decreasing work. These charts are intended to suggest how SRE practices align with
these basic process patterns. The charts are notional and suggestive. They are not intended to be
prescriptive. They show that with the necessary changes, SRE may be applied in a wide range of LCMs.

All of the procedures in this document can be used whether there is a waterfall life-cycle model or an
incremental life-cycle model. Table 2 summarizes how the procedures are applied for each.


Table 2 —How to apply software reliability for an incremental life cycle


5.3.7
 Phased development life cycle: Sensitivity analysis using prediction models can identify how to make one release less risky.
 Incremental and evolutionary development life cycle: Sensitivity analysis using prediction models can identify whether moving riskier software features to an earlier increment improves the overall reliability.
5.3.9
 Phased development life cycle: Metrics are used to determine whether a particular increment should transition to the next phase.
 Incremental and evolutionary development life cycle: Metrics are employed for each increment and also summed for the entire operational release.
5.5
 Phased development life cycle: Decision making on when to release the software is based on the results of one increment.
 Incremental and evolutionary development life cycle: Decision making on when to release the software is based on the reliability results of the final increment.
5.2.3 step b, 5.4.6, 5.5.3
 Phased development life cycle: Defect pileup is measured by combining the predicted defects from this release with the actual defect profiles from prior releases as well as future releases.
 Incremental and evolutionary development life cycle: Defect pileup is measured by merging the predicted defects from each increment with the actual defect profiles from prior releases as well as future releases.
5.3.2.3, 6.2, and Annex B
 Phased development life cycle: Prediction models can be used as soon as the concept is defined.
 Incremental and evolutionary development life cycle: Prediction models can be used on the entire release, on each internal or external release, or both.
5.4, 6.3, and Annex C
 Phased development life cycle: Reliability growth models can be used as soon as the software is integrated with other software and with the hardware.
 Incremental and evolutionary development life cycle: Reliability growth models can be used for each increment. The results of each increment can be merged for a final reliability growth estimation.
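As an informative illustration of the defect pileup entry in Table 2, the following minimal Python sketch merges the predicted defects of each increment with an actual backlog carried over from a prior release to project the open-defect backlog. All counts, names, and the per-increment correction capacity are hypothetical assumptions.

predicted_defects_per_increment = {"increment 1": 30, "increment 2": 45, "increment 3": 25}  # predictions (5.3.2)
carried_backlog = 18            # actual open defects carried over from the prior release
correction_capacity = 20        # defects the team can correct during each increment

open_defects = carried_backlog
for name, predicted in predicted_defects_per_increment.items():
    open_defects += predicted                          # defects expected to surface in this increment
    open_defects = max(0, open_defects - correction_capacity)
    print(name, "projected open-defect backlog:", open_defects)
# A backlog that grows from increment to increment indicates pileup; spacing the
# releases further apart or adding corrective-action staffing (see 5.5.3) are typical responses.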

Figure 6 shows that the processes in a phased life cycle are sequential (a phase may not begin until its
antecedent phase is substantially complete) but not mutually exclusive. The long tails indicate that work to
accommodate revisions and debugging typically continues for the duration of a project, in most of the
processes. The Integration segments assume that partial integration testing is performed as some slice of the
system is complete.

This follows common practical use of the phased model, not the naïve assumption of strictly sequenced
phases.

Figure 6 —SRE practices in a phased life cycle


Figure 7 is a mapping of the Unified Process Model (UPM) work flows onto the IEEE 12207 processes
(see Jacobson et al. [B36]). The UPM is probably the most widely recognized variant of incremental life


cycle. The height and ramps of solid areas closely follow Jacobson et al. Where the UPM defines
“implementation” and “test” workflows, these are interpreted as the IEEE 12207 processes “construction,”
“integration,” and “qualification testing.”

An incremental project seeks to identify most of the requirements early on. A provisional design is
produced and revised in parallel. The requirements and architecture that drive SRE planning, analysis, and
prediction are typically substantially complete after the elaboration phase. The system is then constructed in
increments, typically with their own integration and end-to-end testing. Revision and elaboration to all
work products are expected in the construction phase. As a result, SRE predictions should be revised as
new information becomes available. Figure 7 suggests how the focus of SRE activities varies as an
incremental project progresses.

Inception—This is a short phase in which the following things are established: justification for the project;
project scope and boundary conditions; use cases and key requirements that will drive the design trade-offs;
one or more candidate architectures; identified risks; preliminary project schedule and cost estimate.

Elaboration—In this phase the majority of requirements are captured; use cases, conceptual, and package
diagrams are created.

Construction—The largest phase in which system features are implemented in a series of short, timed
iterations.

Transition—In this phase the system is deployed to end users.

Figure 7 —SRE in an incremental development project


Evolutionary development, generally known as agile, is organized as a sequence of projects to produce
small working subsets of a system in a fixed time frame, typically about four weeks. There are many
variations of evolutionary development defined by software methodologists. Organizations and project
teams have also devised many hybrid process models. In scrum, each subproject is called a “sprint.”
Typically, the first sprint focuses on overall planning, producing a high-level understanding of project scope.
Subsequent sprints start with elicitation of a small requirements subset, followed by programming and
testing for only that subset. Integration typically occurs throughout a sprint.


As a result, a system is constructed in many small steps. These steps can produce the software metrics and
failure data that drive SRE practices, but until the final sprint, information needed for SRE planning,
modeling, and analysis is necessarily incomplete. Refer to Figure 8.

Some evolutionary approaches call for one or more planning sprints and one or more transitional sprints
(Ambler and Lines [B2]). A key difference of these sprints is that they are not necessarily intended to
produce working software. Planning sprints may be used to initiate SRE planning and analysis. Transitional
sprints may be used to apply SRE testing and release evaluation to a release candidate in its entirety. For
evolutionary projects, the recommended practice is therefore:

 Include one or more planning sprints at the start of a project to establish a minimal system-scope
baseline of SRE planning and analysis artifacts.
 Include one or more reliability sprints at the end of a project to conduct system-scope testing on a
completed release candidate so that meaningful reliability predictions may be developed and
evaluated. Construction and integration should be limited to making and verifying corrective action
fixes, followed by rerunning the system-scope reliability test suite.

Figure 8 —SRE in an evolutionary process


Many software projects may be characterized as maintenance; they are intended to change an existing
codebase for perfective, corrective, or adaptive purposes. Even if the reliability requirements of the
resulting system are unchanged, the system should be retested with a reliability test suite. During
maintenance, SRE is also useful for determining warranties and the number of software
engineers who are needed to support the field. Refer to Figure 9.


Figure 9 —SRE in maintenance

5. Software reliability procedure

5.1 Plan for software reliability

Prior to implementing SRE, some planning tasks should be performed to increase the
effectiveness of the SRE activities. Figure 10 illustrates the inputs, outputs, and flow of the SRE
planning tasks.

Figure 10 —Plan for software reliability


Table 3 illustrates the benefits and applicability for incremental development models. Note that all of the
planning tasks are applicable for incremental or evolutionary LCMs.

Table 3 —SRE planning tasks


5.1.1 Characterize the software system
 Benefits: Determines which system components are appropriate for SRE, how the software will be used operationally, and how the software impacts the overall system.
 Applicable for incremental development: Yes—Characterize all LRUs that will be developed in all increments.
5.1.2 Define failures and criticality
 Benefits: Determines the specific types of failures that impact reliability for this particular system.
 Applicable for incremental development: This task is not dependent on the software life cycle.
5.1.3 Perform a reliability risk assessment
 Benefits: Identifies risks such as safety, security, product maturity, size, and reliability growth that can affect both the reliability and the required SRE tasks.
 Applicable for incremental development: These tasks should be performed at the first increment and revisited whenever any plans change.
5.1.4 Assess the data collection system
 Benefits: Identifies any refinements needed to the data collection system to support SRE.
5.1.5 Review available tools needed for SR
 Benefits: Identifies any tools needed for SRE.
5.1.6 Develop an SRPP
 Benefits: Identifies which SRE activities will be implemented and when.

5.1.1 Characterize the software system

Before the SR metrics and models can be used, the following things need to be identified:

a) Software line replaceable units (LRUs) in the system


b) Software bill of materials (BOM)
c) Operational profile (OP)
d) Relationship of the software components to the system design

5.1.1.1 Identify the software LRUs in the system

This task is essential and is a prerequisite for performing a SFMEA (5.2.2), performing a reliability
assessment and prediction (5.3.2), and allocating the required SR to all software LRUs (5.3.8). The
software manager(s) have the primary responsibility for identifying the software LRUs and making them
available to the reliability engineers and other stakeholders. The reliability engineers need to have the list of
software LRUs prior to performing any predictions. The list of software LRUs will determine the scope of
the SRE effort, which ultimately affects the SRPP in 5.1.6. If the software LRUs are all developed by
different organizations, more effort is required for the SRE activities than if all LRUs are developed by the
same organization.

A software LRU is the lowest level of architecture for which the software can be compiled and object code
generated. Software configuration items are commonly referred to as a computer software configuration item
(CSCI). However, a CSCI may be composed of more than one LRU. Hence the term LRU will be used in
this document. While hardware components are replaced with identical hardware components, software
components are updated whenever software defects are corrected. The lowest replacement unit for software
will be either a dynamically linked library (DLL) or an executable. Even in a small system there will
usually be more than one software LRU. In medium or large systems there may be dozens of software LRUs.
If the software has been designed cohesively each LRU can fail independently of the others. For example,


if the GPS in a car is not functioning properly due to software, one should still be able to drive the car, use
the stereo, retract the roof, use the rear camera when in reverse, etc. It is a common practice to predict the
reliability of all of the software LRUs combined. However, this is not a recommended practice unless there
is only one replaceable unit, which is rarely the case with large complex software intensive systems. The
practitioner should identify the software LRUs so that they can be properly added to the RBD or other
system reliability model.

To identify the software LRUs one should look at the entire system and identify all components and then
identify which components are applicable for a SR prediction. This includes looking at all associated
hardware units on which software may reside, including central processing units (CPUs), application
specific integrated circuits (ASICs), or field programmable gate arrays (FPGAs).

From the hardware point of view there is either an ASIC or an FPGA. FPGAs are applicable for SR
because they are programmable. However, not all of them need that level of scrutiny or a SR prediction. Take for
instance, the Basic Input Output System (BIOS), a configurable firmware-level component. The BIOS is
typically a bootstrap component that generally either works or does not work upon startup. Because it
can usually be verified as working or not working during testing, it may not be applicable for
inclusion in SR from a cost/benefit perspective. There may also be sensor firmware at this level or other
firmware that interfaces directly with the FPGA or ASIC. Firmware is the combination of a hardware
device such as an integrated circuit and computer instructions and data that reside as read only software on
that device. While software may be updated on a regular basis over the life of a system, the firmware is
typically updated less often over the life of the device. Above the firmware level is the operating system
(OS).

It is possible that there are different OSs installed on different processing units. Above the OS level is the
application level for the processing unit and above that are the applications that make the system under
consideration do what it is meant to do. Government furnished software (GFS), commercial-off-the-
shelf software (COTS), and free open source software (FOSS) are examples of applications.

The advent of rapid application development methodologies such as agile, DevOps, etc., has led to
increased dependency on third-party software in the application development process and with it an
increase in the use of FOSS. Not only can the third-party software have defects in it, but systems can fail
because of mismatched interfaces to the third-party software. For example, the wrong driver might be used
with a particular device. Or there could be incompatible Application Programmer Interfaces (APIs). Or
worse there could be multiple versions of the same application in a larger system [e.g., Java Virtual
Machine (JVM)], and all run simultaneously in support of different system features.

Glueware is the software that connects the COTS, GFS, or FOSS with the rest of the software system. An
adapter is the portion of the glueware that handles the actual interface. However, the glueware may have
additional functionality over and above the adapter. If there is no COTS or GFS then there is not a need for
glueware. In nearly any system there will be newly developed application software even if the system is
largely composed of COTS, GFS, or FOSS. Software development organizations often underestimate both
the newly developed software and the glueware when the system is composed of a significant number of
COTS, GFS, or FOSS components.

Middleware allows different components to communicate, distributes data, and provides for a common
operating environment. Middleware is generally purchased commercially. Middleware is not needed for
systems with a small number of components or computers. There could be several software configuration
items, each serving a different purpose such as interfacing to hardware, managing data, or interfacing to the
user. At the COTS or middleware layer there could be one or more databases. There could also be a user
interface at the application software or COTS layer.

The software architecture diagram(s) should describe all of the software configuration items and their
interaction to each other. Any software configuration item that is an independent program or executable
should be considered to be an LRU. Figure 11 contains the checklist for identifying software LRUs.


a) Software predictions are conducted at the program or executable level. The predictive models in
5.3.2 are not performed on the modules or units of code because the lowest replaceable unit is the
application, executable, or DLL. Usually each LRU has a one-to-one relationship with a CSCI.
However, in some cases CSCIs are composed of more than one software LRU. Hence, a listing of
CSCIs does not always provide a listing of LRUs.
b) Identify all in-house developed applications, executables, and DLLs, and add to list of software
LRUs.
c) Identify all COTS software and add them to the list of software LRUs. Examples include OSs and
middleware as well as others. Do not include COTS software that is not going to be deployed with
the system (e.g., development tools).
d) Identify all GFS and add them to the list of software LRUs. Do not include GFS that will not be
deployed with the system.
e) Identify all FOSS and add them to the list of software LRUs. Do not include FOSS that is not going
to be deployed with the system (e.g., development tools).
f) Identify all glueware and middleware and add them to the list of software LRUs.
g) BIOS is a simple bootstrap that is probably not relevant for a reliability prediction.
h) FPGAs are applicable for SR but may already be part of hardware predictions.
i) For each LRU listed in steps b) through f), establish the following items (see the record sketch after Figure 11):
 Names of each executable
 Appropriate development organization
 Size of each LRU (see Annex A)
 Expected duty cycle of each software configuration item (see 5.3.2.3 Step 3)

j) Construct the BOM as discussed in the next subclause.

Figure 11 —Checklist for identifying software LRUs
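A minimal Python sketch of the LRU inventory gathered in step i) is shown below; the field names, LRU names, and values are hypothetical and would normally be tailored to the project's configuration management data rather than taken from this recommended practice.

from dataclasses import dataclass

@dataclass
class SoftwareLRU:
    name: str              # executable, DLL, or application name
    lru_class: str         # e.g., in-house, COTS, FOSS, GFS, glueware
    organization: str      # development organization responsible for the LRU
    size_ksloc: float      # size estimate (see Annex A)
    duty_cycle: float      # expected fraction of operating time the LRU executes

lru_list = [
    SoftwareLRU("navigation_app", "in-house", "Supplier A", 45.0, 0.90),
    SoftwareLRU("real_time_os",   "COTS",     "Vendor B",  120.0, 1.00),
    SoftwareLRU("gps_glueware",   "glueware", "Supplier A",  6.5, 0.90),
]
# Each entry is modeled separately in the RBD or other system reliability model
# rather than being lumped into a single software block (see 5.1.1.1).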

5.1.1.2 Construct a software bill of materials (BOM)

Constructing a software BOM is a project-specific task. Any system that comprises several software LRUs,
particularly systems with COTS, FOSS, firmware, or drivers, should have a software BOM.
Today's software development methodologies are progressing rapidly to match the required innovations in
time to market of services and applications. New development approaches such as agile have forced
developers and companies to become more dependent on COTS and FOSS in their development process.

It is typical for organizations to identify a final product software image as a single element on the main
product BOM regardless of how many software LRUs comprise the software product. The rationale is that
the current software development life cycle can manage all dependencies adequately. Unfortunately this
approach can cause problems when hardware and software come together during the product manufacturing
and final system assembly if there are multiple software LRUs. Therefore, there should be a process where
hardware and software BOM development and components are managed and tracked. This capability
allows for identifying any key dependencies between the hardware and software components and flagging any
reliability issues. This capability goes well beyond software configuration management as it addresses the
configuration of the software as a whole and not just the configuration of each of the individual software
LRUs.


By connecting the application(s), COTS, FOSS, drivers, and firmware with the suppliers through a
software BOM as part of the overall software image and aligning it to hardware components in the overall
product BOM, any reliability issues with COTS, FOSS, driver, firmware, and hardware can be narrowed
down to the right source. This capability provides the companies with the ability to improve their
management of software and hardware reliability and quality. See F.1.1 for an example.

Figure 12 contains a checklist for constructing the BOM; a minimal record sketch follows the checklist.

a) Define the part numbering system.


1) Identify the software LRU classes (COTS, FOSS, company owned, mutual agreement, license
based, etc.)
2) Determine a set of unique codes for each software LRU class.
b) Verify that there is a configuration management tool / database available.
c) List all the software LRUs in the product as per 5.1.1.1
d) Assign a class and a unique BOM ID to each software LRU.
e) Complete tables similar to Table F.1.
f) Identify the relationships between software LRUs and hardware LRUs.
g) Include all associated documentation or links to the BOM.

Figure 12 —Checklist for constructing bill of materials
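The following minimal Python sketch illustrates steps a), d), and f) of Figure 12: each software LRU receives a class code and a unique BOM ID and is linked to the hardware LRUs on which it is deployed, so that field issues can be traced to the right source. The part-numbering scheme, IDs, suppliers, and hardware names are hypothetical assumptions, not a prescribed format.

CLASS_CODES = {"in-house": "IH", "COTS": "CO", "FOSS": "FO", "GFS": "GF", "glueware": "GL"}

def make_bom_id(lru_class, serial):
    # Unique BOM ID built from the LRU class code and a serial number (steps a and d of Figure 12).
    return "SW-%s-%04d" % (CLASS_CODES[lru_class], serial)

software_bom = [
    {"bom_id": make_bom_id("in-house", 1), "lru": "navigation_app", "supplier": "Supplier A",
     "version": "3.2.1", "hw_lrus": ["CPU-BOARD-01"]},
    {"bom_id": make_bom_id("COTS", 2), "lru": "real_time_os", "supplier": "Vendor B",
     "version": "11.4", "hw_lrus": ["CPU-BOARD-01", "IO-BOARD-02"]},
]

def software_on_hardware(hw_lru):
    # Trace which software LRUs are deployed on a given hardware LRU (step f of Figure 12).
    return [entry["bom_id"] for entry in software_bom if hw_lru in entry["hw_lrus"]]

print(software_on_hardware("CPU-BOARD-01"))   # ['SW-IH-0001', 'SW-CO-0002']

In practice these records would live in the configuration management tool or database identified in step b), with links to the associated documentation per step g).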

5.1.1.3 Characterize the operational environment

This is an essential task that is a prerequisite for developing a reliability test suite (5.4.1), using the SR
growth models (see 5.4.5), and measuring test coverage (5.4.3). This subclause is a highly recommended
prerequisite for using the SRE predictive models (see 5.3.2.3 Step 3).
Software does not fail due to wear out or other physical deterioration. It also does not necessarily fail
as a function of calendar time. The discovery rate of software failures and the deployed latent defects
are directly related to how much the software is used and the manner in which it is exercised during
testing and operation.
A profile is a set of disjoint alternatives in which the sum of the probabilities of each alternative is
equal to 1. An OP for software yields the most likely usage of the software in advance of development
and testing. Without an established OP, it is possible that the software development and testing might
focus on less frequent features and modes, which could then result in unexpected field support and
reliability issues once deployed operationally. Additionally, the SR growth models discussed in 5.4.5
are accurate if and only if the software is being stressed as per its OP.
Figure 13 contains the steps for characterizing the OP. See F.1.2 for a complete example of an OP; a minimal calculation sketch also follows Figure 13.


a) Locate the customer profile. A customer is any group or organization that is acquiring the system.
The customer can be internal or external to the development organization. Identify the percentage
of usage by each customer group.
b) Identify the user profile. The user is a person or group of people who are using the system for a
specific goal. Identify the percentage of usage by each user group within each customer group.
c) Define the system mode profile. A system mode profile is a set of operations that are grouped based
on the state or behavior of the system. Identify the percentage of usage by each system mode in
each user group and customer group.
d) Determine the functional profile. A function in this context is a set of tasks or operations that can
be performed by the system. A functional profile identifies the usage of each of these tasks based
on the system mode, user, and customer. Identify the percentage of usage by each function in each
system mode by each user group and customer group.
e) Compute the OP by multiplying each of the previous percentages. See F.1.2 for a complete
example.
f) Subclause 5.4.1.1 illustrates how to use the OP to develop a reliability test suite

Figure 13 —Steps for characterizing the operational profile (see Musa [B60])
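A minimal Python sketch of step e) follows. The customer, user, mode, and function percentages are hypothetical, and the profiles are treated as independent for simplicity; in practice each lower-level profile is conditioned on the groups above it, as in the complete example in F.1.2.

from itertools import product

customers = {"fleet operator": 0.7, "private owner": 0.3}      # step a)
users     = {"driver": 0.8, "maintainer": 0.2}                 # step b)
modes     = {"normal": 0.9, "degraded": 0.1}                   # step c)
functions = {"navigate": 0.6, "log diagnostics": 0.4}          # step d)

# Step e): multiply the percentages to obtain the probability of each occurrence.
operational_profile = {
    (c, u, m, f): pc * pu * pm * pf
    for (c, pc), (u, pu), (m, pm), (f, pf)
    in product(customers.items(), users.items(), modes.items(), functions.items())
}

# A profile is a set of disjoint alternatives whose probabilities sum to 1.
assert abs(sum(operational_profile.values()) - 1.0) < 1e-9

# The highest-probability occurrences drive test emphasis (see 5.4.1.1).
for occurrence, p in sorted(operational_profile.items(), key=lambda kv: -kv[1])[:3]:
    print(occurrence, round(p, 3))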

5.1.1.4 Identify the impact of software design on software and system design

This task is essential for developing and testing the software. It is a prerequisite for developing a reliability
test suite (5.4.1).

Software design can impact systems design and vice versa. Hence, an overall system understanding is
needed to assess the best design approaches to build reliability into that system. Fault and failure analyses
(see 5.2) are needed at the functional and design level to determine the weak areas and then assess and
choose the potential trade-offs for a more robust design looking at hardware, software, and combined
solutions. As a result of these analyses, additional software is often added to the system to perform fault
detection, isolation and response (FDIR), but the software cannot perform efficient FDIR if the hardware is
not instrumented or designed for it as shown in the example in F.1.3. Figure 14 is the checklist for
identifying the impact of the software design on the system and system design.

5.1.2 Define failures and criticality

This is an essential task. This task is typically executed as a joint effort among acquisitions, software
management, reliability engineering, and software quality assurance. In some cases, however, the
acquisitions organization may define the failures and criticality in the statement of work.

This task is a prerequisite for performing a software failure modes effects analysis (SFMEA) (see 5.2.2),
collecting failure and defect data (see 5.4.4), applying SR metrics during testing (see 5.4.6), selecting and
using the SR growth models (see 5.4.5), and deciding whether to deploy the software (see 5.5).

If a defect is encountered under the right conditions when the product is put to use, it may cause the
product to fail to meet the user’s legitimate need. Serious consequences may follow from a user
experiencing a software failure; for example, a defect may compromise business reputation, public safety,
business economic viability, business or user security, and/or the environment.
[ISO/IEC/IEEE 29119-1:2013]


a) Determine where the defects probably lie, the weak areas of the software system, and the best way
to mitigate them. Two techniques generally used by hardware and adapted for software are the
failure modes and effects analyses (FMEA) in 5.2.2 and fault tree analyses in 5.2.3.
b) Understand the modes and states of operation, environments and conditions the system should
operate under in order to target what should be protected and to what extent.
1) What should be done by hardware and what is best done by software depends on what needs
to be detected, what can be detected, at what level it needs to be detected, as well as
what the appropriate response is.
2) Overdesigning either the hardware or the software may have the opposite effect on the
system’s reliability.
3) Software may need to detect and recover from failures. Hardware needs to have sufficient
monitoring and control at the appropriate places for software to provide closed loop response
to hardware failures.
4) Use Table 4 as a guide
c) Find the right balance between the hardware and software solutions. Overly complex software may
be unreliable while software that fails to detect system failures may also be unreliable.

Figure 14 —Checklist for identifying impact of software on system design

Table 4 —Reliability considerations


Columns: Reliability consideration; Applicable methods
Software
Protecting data: Storing data in multiple places; cyclic redundancy check (CRC); compare before use; self-checking of software data integrity.
Protecting system and reporting: Send commands/messages with CRC and handshaking for correctness. Check commands/data before use at destination; check message size, content, stage of operation, etc.
Protection from hardware failures: Sensors, effectors, data lines.
Ensuring available memory: Perform background memory checks for memory viability.
Protecting system states: Use multiple bits (8, 16, and 32) to indicate any change in critical functions or for sensor and effector identification—then, prior to activation, check address and instruction sequence against separately stored data and check if activation is "legal" under current conditions.
Protection from inadvertent operations, out-of-sequence commands or data, and environmental effects:
 Self-checking of software events and data integrity.
 Watchdog timers that monitor the software's operation and trigger a reset or alternate system.
 Fault/failure detection, isolation, and recovery (FDIR):
  Detection—how the SW knows one or more faults or failures have occurred
  Isolation—how the SW knows what failed and when
  Response—what the SW should do about it
  Recovery—(human interaction, auto-reset, retry rate, revert to backup, etc.)
   Reduced operations
   (Partial) reset
   Restart
   Reload
   Restore

Hardware
Protection against inadvertent software command: Hardware interlocks.
Available memory: Shield memory areas.
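As an informative example of the command/data protection methods listed in Table 4, the following minimal Python sketch appends a CRC-32 to a command message and checks the CRC and declared length at the destination before the command is used. The frame layout, field sizes, and values are hypothetical assumptions, not a required format.

import struct
import zlib

def pack_command(cmd_id, payload):
    # Assumed frame layout: command id (2 bytes), payload length (2 bytes), payload, CRC-32.
    body = struct.pack(">HH", cmd_id, len(payload)) + payload
    return body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)

def check_command(frame):
    # Check the command before use at the destination: CRC and declared length must both match.
    if len(frame) < 8:
        return False
    body, received_crc = frame[:-4], struct.unpack(">I", frame[-4:])[0]
    if zlib.crc32(body) & 0xFFFFFFFF != received_crc:
        return False
    _, declared_len = struct.unpack(">HH", body[:4])
    return declared_len == len(body) - 4

frame = pack_command(0x0102, b"\x01\x02\x03")
assert check_command(frame)                              # intact frame is accepted
corrupted = frame[:4] + b"\xff" + frame[5:]              # single corrupted payload byte
assert not check_command(corrupted)                      # corrupted frame is rejected

A watchdog timer or handshaking protocol would typically complement such a check so that missing or stalled commands, not just corrupted ones, are also detected.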

It is a common, but incorrect, myth that compiling the code will catch most of the software defects.
However, compiling is the act of producing the executable from the implemented code; it can itself
introduce defects (build issues are quite common). Compiling has no knowledge of what the
expected system behavior is—it only knows how to build the executable. Even the newer static and
dynamic analyzers that detect bugs in code cannot identify whether the requirements are
complete, the design is sufficient, or the interfaces are correct.

There are two categories that partition all software defects (i.e., each software defect belongs to exactly one
of the two categories) (Grottke, Trivedi [B23]):

 Mandelbug—A defect whose activation and/or error propagation are complex. This is the case if
the propagation of the error generated by the defect involves several error states or several
subsystems before resulting in a failure, causing a time lag between the fault activation and the
failure occurrence. Another source of complexity in fault activation and error propagation is the
influence of indirect factors, such as interactions of the software application with its system-
internal environment (hardware, operating system, other applications), the timing of inputs and
operations (relative to each other), and the sequencing of inputs and operations. As a consequence,
the behavior of a Mandelbug may appear chaotic or even nondeterministic, because the same set of
input data seems to make it cause a failure at some times, but not at others. Race conditions are
classic examples of problems caused by Mandelbugs.
 Bohrbug—A repeatable defect; one that manifests reliably under a possibly unknown but well-
defined set of conditions, because its fault activation and error propagation lack complexity. These
include defects due to incorrect software requirements, software architecture, software detailed
design, and code. Failures from these defects result in a predictable system failure when the inputs
are such that the faulty code (which may be due to faulty requirements or faulty design) is executed
when a particular juncture of the software code is reached. They are repeatable if one knows the
particular conditions that resulted in the failure, with 100% probability of the failure occurrence,
and they can be fixed by changing and/or rewriting the code. If the faulty code is due to faulty
specification, the specifications can be fixed and then the design and code. If the faulty code is due
to a faulty design, that can also be fixed. Bohrbugs are very similar to quality hardware failures;
once found, they can be fixed and they do not have any residual failure rate left in the system. At
the component and system level testing, they contribute to the infant mortality part of the bathtub
curve. These software defects can be found with high level of confidence at subsystem level testing
and prior to the total system integration.
Due to their elusive nature, Mandelbugs are typically difficult to find in and remove from the code. The
following (overlapping) classes of defects are subtypes of Mandelbugs:

 Heisenbug—This kind of defect seems to disappear or alter its behavior when one attempts to
probe or isolate it. For example, failures that are caused by improper initialization may go away as
soon as a debugger is turned on, because many debuggers initialize unused memory to default
values. (See Grottke, Trivedi [B23].)
 Aging-related defect (Grottke, Matias, Trivedi [B25])—A defect that is able to cause an increasing
failure rate and/or a degrading performance while the software is running continuously. Usually,
this is due to the accumulation of internal error states. An example is a memory leak: as more and


more memory areas are claimed but—erroneously—never released during execution of the
software, there is a growing risk of a failure because of insufficient free memory. Special
techniques are used during subsystem-level testing to rigorously exercise the memory management
system and debug the software subsystem. The aging-related defects left in the code in the
operational phase can be dealt with by so-called software rejuvenation techniques (Huang et al.
[B28]); a minimal rejuvenation sketch follows this list.
 Software-hardware interface defects—Software can fail due to the lack of a robust interface with the hardware. Failures can occur when hardware temporarily operates outside its specification while interfacing with software, pushing both outside their operational ranges. Consequently, the system can fail, or its performance is substantially degraded, but no specific hardware failure can be identified as the root cause. The usual result is trouble not identified (TNI), which is very similar to a tolerance stackup. Tolerance stackups are used in engineering to describe events in which each element of the system is within specification, but their interaction causes the overall system performance to fall in the failure region. To reduce the occurrence of such failures when software and hardware interface, attention needs to be given to the software requirements at the subsystem level.
Each project should agree upon failure definition and scoring criteria (FDSC), as a means of standardizing
1) which behaviors are to be categorized as reliability failures, and 2) how to assign severity scores to those
failures. The scoring criteria for software failures are project and product specific and the definition of the
scoring criteria usually involves the customer or customers. Some criteria might not, for example,
recognize degradation of a function as a failure. Further, the relative severity of a failure might depend on
identifying the criticality of a failed function or component.

Recognizing the relative consequence of a given system failure is one half of any risk-based approach to
development or appraisal efforts. If the frequencies of various failures (the other half of the risk equation)
can be assigned quantitative values, then it would help to also assign quantitative consequence values. This
is often accomplished by estimating a monetary value, or cost, of failure. The values are likely to be
influenced by the type of the underlying defect. For example, there has been evidence indicating that
failures caused by Bohrbugs tend to be more severe than those caused by Mandelbugs (Grottke, Nikora,
Trivedi [B21]). Also, the complexity involved in the fault activation of Mandelbugs suggests that these
defects may result in a lower failure rate.

Project-specific definitions for failures should be identified. These definitions are usually negotiated by the
customers, testers, developers, and users. These definitions should be agreed upon well before the
beginning of testing. There are often commonalities in the definitions among similar products (i.e., most
people agree that a software defect that stops all processing is a failure). The important consideration is that
the definitions be consistent over the life of the project. There are a number of considerations relating to the
interpretation of these definitions. The analyst should determine the answers to the questions found in
Figure 15:

a) Identify the system level failures that can be caused by software. The SFTA (see 5.2.3) and the
SFMEA (see 5.2.2) can be used to brainstorm this.
b) For each of the preceding system failures, determine the relative criticality (i.e., catastrophic, critical, moderate, or negligible).
c) Determine whether faults caused by defects that will not be removed are counted.
d) Determine how to count defects that result in more than one failure (e.g., one pattern failure represents several failure occurrences of the same failure type).
e) Determine what a failure is in a fault-tolerant system.
f) Generate the failure definition and scoring criteria and verify that the failure reporting and defect
tracking systems use this scoring criteria.

Figure 15 —Checklist for identifying a failure definition and scoring criteria
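As a rough, non-normative illustration of how agreed scoring criteria can be applied mechanically by a failure reporting system, the following sketch encodes a hypothetical FDSC. The severity categories mirror item b) of the checklist; the rule order, field names, and conditions are assumptions that a real project would negotiate with its customers.

from enum import Enum

class Severity(Enum):
    CATASTROPHIC = 1
    CRITICAL = 2
    MODERATE = 3
    NEGLIGIBLE = 4

def score_failure(report):
    # Hypothetical project-specific rules; a real FDSC is negotiated with the
    # customer and recorded with the failure reporting and defect tracking systems.
    if report.get("affects_safety_critical_function"):
        return Severity.CATASTROPHIC
    if report.get("stops_all_processing"):
        return Severity.CRITICAL
    if report.get("degrades_function"):
        return Severity.MODERATE
    return Severity.NEGLIGIBLE

# Example: a failure that stops all processing scores as CRITICAL under these
# assumed rules; a pattern failure seen several times still maps to one defect
# record, but each occurrence is counted, as decided in item d) above.
print(score_failure({"stops_all_processing": True}))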
5.1.3 Perform a reliability risk assessment

This is an essential task and is a prerequisite for determining the SR plan (see 5.1.6), determining an
initial system reliability objective (see 5.3.1), and performing a SR assessment and prediction (see
5.3.2). The software management is typically the lead for this task. While this task is typically
performed by the organization responsible for developing the software, it can also be performed by the
organization acquiring the software.

Reliability risk assessment can start as early as concept planning and project start up with the
assessment of the development environment and continue through requirements development, design,
reviews, verifications, changes, and testing.

From a product standpoint, the risks that can affect the reliability of the software include, but are not limited to, safety considerations, security and vulnerability considerations, and the current maturity of the product. Even though safety, security, and vulnerability are not within the scope of this document, they can be risks with regard to SR. There are times when trade-offs are required between reliability and safety, security, and vulnerability; for example, sometimes the best way to protect against a vulnerability is for the software to stop processing. Software products that are not mature can be a reliability risk, but very mature products can be risky as well if they are too difficult to maintain or are at risk of obsolescence. A checklist for creating a reliability risk assessment is found in Figure 16.

a) Identify product related risks such as safety, security, vulnerability, project maturity. See 5.1.3.1,
5.1.3.2, and 5.1.3.3.
b) Identify project and schedule related risks such as grossly underestimated size prediction, grossly
overestimated reliability growth, defect pileup. See 5.1.3.4.
c) Identify if there are too many risks for one release (brand new product, hardware, technology,
processes, or people). See 5.1.3.5.
d) Generate a listing of risks that affect SR from steps a) to c).
e) Consider these risks when determining the system reliability objectives.
f) Consider these risks when performing the SR assessment and prediction (see 5.3.2), which will ultimately be used to establish the reliability growth needed to achieve the prediction and the reliability objective.

Figure 16 —Checklist for making a reliability risk assessment

5.1.3.1 Assess safety risks

It is very hard to have safe software if it is not reliable. With that being said, reliability and safety are
both system design parameters that usually need to be balanced against each other. Table 5 shows that
the same analyses can be used for both safety and reliability when one adjusts the focus accordingly.
Table 5—Safety and reliability viewpoints

Hazards analysis
  Reliability point of view: A hazard does not necessarily have to be safety related or safety critical. It can be related to a loss of mission, loss of revenue, etc.
  Safety point of view: Hazards are safety related or safety critical.
Failure definition and scoring
  Reliability point of view: Failures may have financial, security, vulnerability impact in addition to safety.
  Safety point of view: Safety related and safety critical failures are separately categorized.
Failure modes analysis
  Reliability point of view: Any failure mode in any part of the code that is visible to the system or end user, impacts availability, or requires a maintenance action.
  Safety point of view: Safety-related or safety critical failure modes in the safety-related or safety critical code.
Reliability predictions
  Reliability point of view: MTBF or failure rate is defined as per the failure definition and scoring criteria, which means that a failure might not be safety related.
  Safety point of view: The mean time between safety-related events can be derived from the probability of a particular safety-related event.
Software design
  Reliability point of view: Simple code can be more reliable than complex code.
  Safety point of view: Error handling of system and software faults will introduce more complexity.

Since safety and reliability are both system design parameters, they can sometimes conflict with each
other just as performance and reliability requirements can conflict. Software safety and reliability can
conflict in that a simple straightforward software design is often the most reliable as more complex
code can be less deterministic. However, from a critical function perspective, multiple software and/or hardware backups might be a design solution chosen to try to assure that a function will work despite
multiple failures. System reliability often relies on similar or dissimilar redundancy. When looking at a
more autonomous system, which needs to operate in the face of faults and failures, without human
interaction, it can be difficult to make the design trades. A design can quickly become extremely
complex. The question is: has the design introduced more potential failure modes than it fixed? There is
a possibility that in making the system more reliable the software becomes less reliable and more
complex. The reliability practitioner needs to consider this possibility.

System safety is focused on what leads to a hazard. Software safety delves into how software can
contribute to those hazards and can provide “must work,” “must not work” design solutions. SR
practitioners should inform the designers of potential defects in the software and software process
itself. Then the organization can provide a balance in design trade-offs, as well as assure sufficient
testing to find any defects in the software/system safety design. The checklist for assessing the safety
risks is in Figure 17:

a) Perform a preliminary hazards analysis on functionality as well as known types of hardware systems and subsystems. A system's potential hazards need to be known, along with which subsystems, hardware, firmware, and/or software might contribute to a catastrophic or critical failure that could cause loss of or severe damage to life, mission, facility, business, or the environment. As the system becomes better defined, the hazard analysis is also refined. Understand that the hazards analysis is also applicable for identifying hazards that are reliability related as well as safety related.
b) The failure definition and scoring criteria should have safety-related classifications to distinguish
safety-related failures.
c) Once the critical functions of software are identified, the next step is to verify that the software is
designed to handle them. Software not only can contribute to a system failure, it can also monitor, control, and/or mitigate known system and hardware potential hazards. Software is safety critical if it performs any of these functions. It can also be considered critical if it is used to provide data on which critical decisions are made. The software code that is safety related needs to be identified as such.
d) Identify the “must-work” functions such as firing multiple pyro devices simultaneously within a
rocket stage transition, as well as the “must-not-work” situation such as not inadvertently firing
those same pyros at any other time. A software FMEA (see 5.2.2) of the critical events, inputs, and outputs of the function can be performed to provide ideas for the design solution space to increase both reliability and safety.
e) Once the software has been designed to mitigate the identified critical software or system element, perform a reliability analysis to demonstrate whether the chosen design is reliable from a systems perspective. The software contributions to the system hazards, and their mitigations and controls, need to be cross-checked with the reliability critical items list and the software functions that may impact those items. Quantitative SR predictions can also be adapted to predict and monitor the mean time between safety-related failures.
f) Make design trades between SR and safety and consider the following:
 Similar and dissimilar redundancies
 Voting or monitoring logic to switch and knowing when to switch
 Error and status reporting logic
 Multiple sensors and effectors
 How many system conditions need to be known, and how current they need to be, for a backup to work
 Whether the storage/memory is trustworthy
 When and where to employ watchdog timers
 Common failure modes in both the software and hardware and knowing how the software
can monitor those modes

Figure 17 —Checklist for assessing safety risks

5.1.3.2 Assess security and vulnerability risks

As with safety considerations, the security of a system depends partly on the reliability of the software. One
of the sources of vulnerability is software that is not coded according to accepted coding standards.
Improving the reliability of the software may help to make it less vulnerable, but it is not sufficient for
making it secure. Security issues may be analyzed with techniques such as top-down FTA and bottom-up
FMEA.

Security risks relate to system failure modes in which the confidentiality, integrity, or availability of the
system or its information is compromised. Vulnerabilities are defects (introduced in requirements, design,
or implementation) that could be exploited to initiate security failures.

Many security risks can be addressed through the same practices that reduce the risk of other types of
failure. Additional attention may be appropriate to address unique causes of vulnerabilities. These measures
may include security-aware requirements, design, and implementation practices as well as specific
appraisal techniques (such as penetration testing) to detect vulnerabilities not otherwise identified.

As with any engineering trade-off, the extent of security and of security assurance has to be balanced against other system attributes, such as usability or efficiency, that might be diminished by security enhancements. There are now several static and dynamic code analyzers that search for sets of vulnerabilities, and other tools that help determine the pedigree of COTS software and open source software. Security is now a great concern in many software projects, from medical devices to cars to factory operations.
5.1.3.3 Assess product maturity

Software code that is newly developed will be generally riskier than software that is reused. On the other
hand, software products that have been fielded for many years and are nearing obsolescence are generally
riskier than those that are not. Any software product that is not "throw away" is subject to obsolescence. This is because the environment around the software, such as the OS, continually changes. The software can and will become obsolete if it is not kept up to date with its supporting environment. Software can also become problematic after several years if the software code is not maintained properly. For example, "copy and paste" code is typical in aging systems. "Copy and paste" code is the opposite of object-oriented code in that nearly identical code is copied and pasted one or more times, which ultimately can be problematic. As part of the planning process, the practitioner will need to identify the
maturity of each of the software components in the system. Table 6 shows how product maturity is
considered quantitatively in the SR modeling:

Table 6 —How product maturity affects the SRE practices

Predict defect density (6.2): This defect density prediction model takes the maturity of the product into consideration.
Predict effective size (B.1.1, B.1.2, B.1.3): When a software LRU has been fielded in operation for a few years but is not yet obsolete, that can have an impact on computing its effective size, which is used to predict its reliability.

5.1.3.4 Assess project and schedule risks

There are certain development practices that have been associated with more reliable software as discussed
in 5.3.2 and 6.2. However, there are also certain factors that can negatively affect the reliability of the
software. Table 7 provides insight into what risk to check for and why.

Table 7 —Checklist for assessing project and schedule risks

Determine if the size of the software is grossly underestimated: Software size determines the schedule and the reliability prediction.
Determine if reliability growth is grossly overestimated: Reliability growth is how long the software version is tested in a real environment without adding any new features.
Determine if there is defect pileup: Defect pileup is what happens when releases are spaced too close together. A particular release may be at risk from a previous release and may be a risk to a subsequent release.

5.1.3.4.1 Size is grossly underestimated

The size estimations for the software are a primary factor that drives the schedule. So, if the size
estimations are grossly underestimated then so will be the schedule. If the schedule is insufficient then the
reliability growth will be affected as discussed next. It is a chain reaction that begins with a few faulty
assumptions. There are several ways that the size of a software system can be grossly underestimated, as
follows:

a) Reused components really are not reusable, or the work required to reuse them is as much as developing new code
b) Size estimates are based on old history that does not take new technology into consideration

Following are some indicators that reused components really are not reusable:

 Reused code is written in a different language, development environment, or operating system
 Reused code is written for different target hardware
 Reused code is more than a decade old

Size estimates are often based on past history. Unfortunately, if the size history is not recent, that can lead to gross underestimates of size, because software has been increasing in size steadily since the 1960s. If size estimates are based on past projects that are more than a few years old, this should be identified as a risk. (See US GAO [B92].)

5.1.3.4.2 Reliability growth is grossly overestimated

There are at least two reasons why SR growth can be grossly overestimated, as follows:
a) Immovable deadlines—When the deadline for the completion of the software or system is immovable, any schedule slippage will probably be compensated for with shortened reliability growth cycles.
b) Reliability growth plans neglect to consider that when new features are added to the software
baseline the reliability growth resets. Refer to Figure 18.
Reliability growth has a significant impact on the reliability of the software: if the reliability growth is estimated incorrectly, then so are the reliability predictions and estimations. When reliability growth is cut short, it can and will cause defect pileup, as discussed in the next subclause. Figure 18 is an example of the defect pileup that can occur when reliability growth is grossly overestimated.

Figure 18 —Expected versus actual reliability growth with feature drops
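The reset effect sketched in Figure 18 can be illustrated with simple arithmetic. The exponential form and all of the numbers below are assumptions chosen only to show the mechanism; they are not values prescribed by this recommended practice.

import math

def remaining_defects(n0, b, weeks):
    # Simple exponential growth assumption: latent defects remaining after a
    # period of growth testing with no new feature content (illustrative only).
    return n0 * math.exp(-b * weeks)

# Plan: 12 uninterrupted weeks of reliability growth on 200 latent defects.
planned = remaining_defects(200, b=0.15, weeks=12)

# Actual: a feature drop at week 6 adds 80 new latent defects, so growth
# effectively restarts on a larger population for the remaining 6 weeks.
after_six_weeks = remaining_defects(200, b=0.15, weeks=6)
actual = remaining_defects(after_six_weeks + 80, b=0.15, weeks=6)

# The residual defect count with the feature drop is roughly twice the planned
# value, which is the gap between the two curves in Figure 18.
print(round(planned, 1), round(actual, 1))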


Defect pileup occurs when defects generated in a particular release are not fixed in that release. Over several releases, unfixed defects accumulate and snowball to the point that the software is unstable and future software releases are jeopardized. Defect pileup usually causes future release schedules to slip because the software engineers working on release X+1 or X+2 are distracted with unexpected and unplanned field support for release X. Defect pileup is one of the leading causes of schedule slippage (see Neufelder [B68]). Figure 19 is an example of predicted defect pileup (the prediction was done via the methods in 5.3.2.3). In this example there is no predicted defect pileup between the first three releases. However, between the last three releases the pileup is both noticeable and increasing.
Figure 19 —Example of defect pileup


Defect pileup often goes undetected until problematic because SR predictions are performed on only one
release at a time without regard for the previous and next releases. In 5.3.2.3 Step 2, a method to predict
defect pileup is presented. In 5.5.3 the steps for computing the actual defect pileup are presented. As part of
the risk assessment, it should be determined whether the symptoms of defect pileup are likely to affect this
software release:

a) Software engineers are spending a considerable amount of unplanned time supporting prior
releases.
b) There are many defect reports from previous releases that have not yet been scheduled or corrected.
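The accumulation mechanism behind defect pileup can be shown with a minimal backlog sketch using assumed numbers; the actual prediction method is the one described in 5.3.2.3, not this arithmetic.

def backlog_over_releases(new_defects_per_release, fix_capacity_per_release):
    # Carry unfixed defects forward from release to release (illustrative only).
    backlog = 0
    history = []
    for new_defects in new_defects_per_release:
        backlog = max(0, backlog + new_defects - fix_capacity_per_release)
        history.append(backlog)
    return history

# Five closely spaced releases, each introducing 120 defects while the team can
# correct only 100 per release: the carried-over backlog grows every release.
print(backlog_over_releases([120, 120, 120, 120, 120], 100))  # [20, 40, 60, 80, 100]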

5.1.3.5 Assess whether there are too many risks for one software release

An inherent risk is a risk that is often difficult to avoid. Some of these include the following:

 This is a brand new software product or technology (if the software is at version 1 that is indicative
of a new software product)
 Any specialized target hardware has not been developed yet
 Brand new processes or procedures
 Brand new personnel or personnel who are new to the product or technology or company
Research (Neufelder [B65]) has shown a correlation between the number of risks on a software
project/release and the outcome of that release. The outcome of each project was known to be either
1) successful, or 2) distressed, or 3) neither. The third category is referred to as “mediocre.”

 A successful project is defined as having a defect removal efficiency (DRE) of at least 75% at
deployment.
 A distressed project is defined as having ≤ 40% defect removal at deployment.
NOTE 1—Other research by Jones [B9] found that DRE was much higher for successful projects.
NOTE 2—Two independent bodies of research found that the maximum DRE observed was 99.9%. See Jones [B40]
and Neufelder [B68].
The DRE is simply the percentage of the total defects found over the life of that particular software release that were found prior to deployment. DRE is also referred to as defect purity (Tian [B91], [B90]). Table 8 shows that the projects with none of the risks shown in this subclause were most likely to be successful. In the referenced study, there were no successful releases with more than two of these risks. See Neufelder [B65].
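Because DRE is simple arithmetic, it can be restated in a few lines of code. The defect counts below are assumed values used only to show the calculation against the 75% and 40% levels quoted above.

def defect_removal_efficiency(pre_deployment_defects, post_deployment_defects):
    # Percentage of the defects found over the life of a release that were
    # found prior to deployment.
    total = pre_deployment_defects + post_deployment_defects
    if total == 0:
        return None  # no defects observed over the life of the release; DRE is undefined
    return 100.0 * pre_deployment_defects / total

# Assumed counts: 300 defects found before deployment and 25 reported afterwards
# give DRE = 300/325, roughly 92%, above the 75% level associated with a
# successful release and well above the 40% level of a distressed one.
print(round(defect_removal_efficiency(300, 25), 1))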
Table 8 —Relationship between risks and outcome

None of these risks: successful release 78%, mediocre release 27%, distressed release 0%
Exactly one of these risks: successful release 11%, mediocre release 64%, distressed release 50%
Exactly two of these risks: successful release 11%, mediocre release 6%, distressed release 30%
Exactly three of these risks: successful release 0%, mediocre release 0%, distressed release 10%
Four or more of these risks: successful release 0%, mediocre release 3%, distressed release 10%

The following inherent risks generally cannot be changed within one software release/version. However, the risks can often be mitigated by making several concurrent releases, each of which has only one or two risks applicable to it. In summary, as part of the risk assessment, the practitioner should analyze whether a particular release has too many risky objectives and whether or not these risks can be mitigated.

5.1.3.5.1 This is a brand new software release/product or technology

The first major release of a particular product, or of a product that is based on technology that is new to the particular organization, is more risky from a reliability standpoint than subsequent releases. This is because during the first release there are many unknowns that are difficult to quantify. This can lead to schedule slippage, which can then lead to the other risks previously shown, such as insufficient reliability growth and defect pileup.

5.1.3.5.2 The right people are not available to develop or test the software

Several major studies (Neufelder [B68], SAIC [B77]) that have correlated SR to certain key indicators have
found a strong relationship between the experience of the software engineers and the reliability of the
software. When there is high turnover or when there are software engineers who do not have the industry or
domain experience to develop and test the software, there is a risk.

5.1.3.5.3 The target hardware has not been developed yet

When the target system hardware (over and above a computer) is evolving during the software
development process, this can be a risk to the SR. This is because the software cannot be fully tested until
the hardware system design is stable. This is also because any design changes in the hardware also result in
design changes to the software.

5.1.3.5.4 Technology risks

The world around a software system can evolve faster than the software organization can keep up with it. The OS, interfacing hardware, drivers, and third-party software evolve over time. If the software is not kept up to date with the technology of its environment, the software can become prematurely obsolete.
During the planning phase, all technologies should be identified and analyzed to determine any potential
risks due to brand new technology or due to aging technology.

5.1.4 Assess the data collection system

This is a typical task and is performed jointly by the software quality assurance and software management.
The acquisitions organization can review the data collection system. Prior to using any of the SR models or
analyses, the data collection system should be assessed to verify that it supports the selected SR tasks. In
setting up a reliability program, the following goals should be achieved:
 Clearly defined data collection objectives.
 Collection of appropriate data (i.e., data geared to making reliability assessments and predictions).
See 5.3.2.1 and 5.4.4.
 Prompt review of the collected data to verify that it meets the objectives.
A process should be established addressing each of the steps in Figure 20:

a) Establish the objectives for the data collection (which metrics are appropriate?)
b) Identify the data that needs to be collected from the steps in 5.3.2.1 and 5.4.4.
c) Set up a plan for the data collection process.
d) Use applicable tools to collect the data.
e) Evaluate the data as the process continues.
f) Provide feedback to all parties.

Figure 20 —Checklist for assessing the data collection system


Data collection can be viewed in two ways. If the company or organization has an established software engineering database, then reliability information is an important component of the data. Generally this will be summary information identifying the characteristics of the project, the type of reliability model employed, the verification results, and the reliability estimates at various milestones during project development. The second aspect of data collection pertains to verification of the software during software development. The phase of development, the software model being used, and the type of verification to be performed should be recorded, along with a complete list of the defects detected during verification, the nature of each defect, the time of occurrence, information on the correction of the defect, and the severity of the failure.
Details of SR estimates based on the verification results should be recorded. Comparisons should be made
of the present SR estimates with estimates made during earlier stages in the development process. Any
changes in the development process or verification process based on the results of the current reliability
estimates should be recorded.

It is often difficult to gather SR test data if no provision was made in the project plan at the outset. Often
one is faced with accumulating the data that exists. It is common in many projects to use α-testing (in-
house system testing of mostly complete project) or β-testing (outside testing by trusted potential users of
the mostly complete project). These test results, which are generally well documented in notebooks, often
yield adequate reliability test data.

One should first scrub the data to eliminate the following:

 Hardware failures (cases where equipment repair or replacement was required to continue testing)
 Operator error
 Failures induced by test equipment
 New feature or changed feature requests

The operating time between failures and the nature of each failure (new failure or repeated occurrence of a previously identified failure) should be investigated and recorded. All failure occurrences, whether unique failures or repeated occurrences, should be counted.
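A minimal sketch of the scrubbing step is shown below. The record layout and field names are hypothetical placeholders for whatever the defect tracking system actually exports; the excluded categories are the ones listed above.

EXCLUDED_CAUSES = {
    "hardware failure",     # equipment repair or replacement was required
    "operator error",
    "test equipment",       # failure induced by the test equipment
    "feature request",      # new or changed feature, not a reliability failure
}

def scrub(records):
    # Keep only software reliability relevant records, preserving the operating
    # time between failures and the repeated-occurrence flag for later analysis.
    return [
        {
            "id": record["id"],
            "operating_time_h": record["operating_time_h"],
            "is_repeat": record.get("is_repeat", False),
        }
        for record in records
        if record.get("cause", "").lower() not in EXCLUDED_CAUSES
    ]

# Assumed example data: only the software-caused record survives the scrub.
raw = [
    {"id": 1, "cause": "software", "operating_time_h": 52.0},
    {"id": 2, "cause": "operator error", "operating_time_h": 3.5},
]
print(scrub(raw))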

The following are useful tips for planning for the SR data collection:
 Include a flag in the problem report that indicates whether the problem is related to reliability.
 Include a counter in each problem report to be incremented every time that same defect causes a failure (both of these fields appear in the sketch below).
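The two tips above can be reflected directly in the problem report record. The field names below are hypothetical; the point is simply that the reliability flag and the occurrence counter exist from the outset so the data does not have to be reconstructed later.

def new_problem_report(report_id, description):
    # Hypothetical problem report fields; real defect tracking systems differ.
    return {
        "id": report_id,
        "description": description,
        "reliability_related": False,  # the flag recommended in the first tip
        "occurrence_count": 1,         # the counter recommended in the second tip
    }

def record_repeat_occurrence(report):
    # Call whenever the same underlying defect causes another failure.
    report["occurrence_count"] += 1
    return report

report = new_problem_report(101, "watchdog reset during mode change")
report["reliability_related"] = True
record_repeat_occurrence(report)
print(report["occurrence_count"])  # 2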

5.1.5 Review available tools needed for SRE

This is a typical SRE task. Reviewing the available SRE tools is a joint effort, mainly because each of the tools automates tasks performed by different stakeholders. The reliability engineers will be mostly
interested in the tools needed for SR prediction and failure mode analysis while the software quality
assurance and test engineer will be interested in the tools required for SR growth modeling during testing.
The software manager(s) should be involved in reviewing the tools for reliability predictions and
assessments and sensitivity analysis.

The automation considerations for the SRE tasks are described in Clause 5. It is not the goal of this
document to prescribe particular commercial tools but rather to indicate the features needed, whether these
tools exist in industry, and any refinements that need to be made to existing tools. Table 9 lists generic
types of tools that should be used and their automation capabilities. A list of available tools can be found in
Annex C. That list does not represent an endorsement of any particular tool.

Table 9 —Tools needed for software reliability

5.1 Planning for software reliability: This is typically a manual activity.
5.2 Develop failure modes model
5.2.1 Perform software defect RCA: The ability to export software defect reports from the software or system defect tracking system, sort them by root cause, and generate a Pareto chart of most common root causes.
5.2.2 Perform SFMEA: A spreadsheet that automates the templates for each viewpoint and for each failure mode and root cause.
5.2.3 Include software in the system FTA: The same tool used for the system FTA should be used for the SFTA.
5.3 Apply SR during development
5.3.1 Identify/obtain the initial system reliability objective: Usually a manual effort.
5.3.2 Perform a SR assessment and prediction: Each of the predictive models is available as a spreadsheet or high-level software.
5.3.3 Sanity check the early prediction: Usually a manual effort.
5.3.4 Merge the predictions into the overall system predictions: The same tool that is used for the hardware predictions can be used for software as long as it allows software components to be added.
5.3.5 Determine an appropriate overall SR objective: Usually a manual effort.
5.3.6 Plan the reliability growth: The same tools used to plan the growth of the hardware can be used for the software.
5.3.7 Perform a sensitivity analysis: This is only applicable for survey-based prediction models. The tools that automate the SR models often have a sensitivity analysis built in.
5.3.8 Allocate the required reliability: The same tool that is used for the hardware predictions can be used for software as long as it allows software components to be added.
5.3.9 Employ SR metrics for transition to testing: These tools are either part of the development environment or they run in the background during testing. The tools should be able to merge the coverage from one run to another. They should be able to output statement and branch coverage as a minimum.
5.4 Apply SR during testing
5.4.1 Develop a reliability test suite: Test generation tools and test execution tools are needed. The test generation tools help the user develop the test cases based on state models, timing diagrams, etc. Test execution tools will run the tests and collect the results.
5.4.2 Increase test effectiveness via software fault insertion: The user needs to apply a compiler that compiles programs in high-level languages such as C and C++ to a low-level intermediate representation (IR). The IR preserves type information from the source level, but at the same time, represents the detailed control and data flow of the program.
5.4.3 Measure test coverage: During development, white box test coverage tools are used by developers. The white box test coverage tools usually run with the debugger or development environment, although they can also run in the background. The white box testing tools can be specific to the language. During a software systems test, black box testing is performed. There are test tools that run in the background during a black box test and measure test coverage. The test coverage tools should be able to merge the coverage results from one run to the next.
5.4.4 Collect fault and failure data: Various commercial defect reporting systems exist.
5.4.5 Select reliability growth models: These metrics depend on software defect and failure data. They will either require imported data from the software defect tracking system or they will be part of the software defect tracking system.
5.4.6 Apply SR metrics: The simple models can be automated via a spreadsheet. The more complicated models require a macro enabled spreadsheet, a mathematical modelling tool, or a commercially available tool. Note that all tools should receive a listing of observed faults per day as input.
5.4.7 Determine the accuracy of the prediction and reliability growth models: A few of the commercially available tools provide this accuracy comparison. It can also be automated in a spreadsheet.
5.4.8 Revisit the defect RCA: The ability to export defect reports from the defect tracking system, sort them by root cause, and generate a Pareto chart.
5.5 Support release decision
5.5.1 Determine release stability: Usually a manual activity.
5.5.2 Forecast additional test duration and 5.5.3 Forecast remaining defects and effort required to correct them: A few of the commercially available tools provide this feature. It can also be automated in a spreadsheet.
5.5.4 Perform an RDT: Any tool used for the system RDT can be used for the software RDT.

5.6 Apply SR in operation
5.6.1 Employ SRE metrics to monitor field SR: The failure and defect reporting databases need to be used to track issues from the field by version and product.
5.6.2 Compare operational reliability to predicted reliability: Usually a manual activity.
5.6.3 Assess changes to previous characterizations or analyses
5.6.4 Archive operational data

5.1.6 Develop a Software Reliability Program Plan (SRPP)

This is an essential task. The inputs to this task are all of the other tasks in the planning clause of the
document as well as identification of the appropriate SRE tasks from 4.3. This task can be executed by the
acquisitions personnel and included as part of the contract. However, it is typically performed by the
development organization and delivered to the acquisitions organization for approval. The reliability
engineer typically produces this document with inputs from software management and software quality
assurance.

Implementation of robust reliability engineering efforts, starting in the requirements phase and continuing throughout the Design, Development, Integration, Test, Deployment, and Operations and Support phases, should take into account this document and other related IEEE documents, applicable ISO 9000 standards, as well as other industry standards. Software engineers coordinate development of the Reliability Engineering Program Plan with reliability engineering, system engineering, test and evaluation, safety, logistics, and program managers so that reliability engineering is executed in compliance with applicable policies and processes, is coordinated across disciplines, and meets reliability objectives.

Reliability engineering processes and tasks defined in the Reliability Engineering program plan should be
integrated within the Systems and Software Engineering processes and documented in the program’s
Systems Engineering Plan, Software Engineering Plan, Test and Evaluation Master Plan and Life-Cycle
Sustainment Plan. Reliability engineering should be assessed during system and software engineering
technical reviews, Test and Evaluation, and Program Management Reviews as requested.

The Software Reliability Program Plan (SRPP) incorporates guidance from the Reliability Engineering Program Plan and guidance to invoke SRE tasks, tailored to the program, during all program lifecycle phases to demonstrate confidence in achieving the program's reliability requirements during integration, developmental, and operational testing. The period of performance addressed by the SRPP extends through all lifecycle phases. The SRPP represents the primary technical planning document for implementing SRE process practices and principles throughout the program's lifecycle. The objectives of the SRPP are to:

 Define the method to manage Reliability engineering for the program that includes software, a list
of expected deliverables, and the schedule of activities for all efforts.
 Establish processes to manage the Reliability requirements for systems and subsystems and
software.
 Demonstrate that software reliability is achieved through modeling, simulation, or analysis for each subsystem or system.
 Demonstrate that software reliability is achieved by each system and subsystem to meet operational reliability requirements.

The SRPP identifies the overarching SRE task activities expected to be performed in order to increase the software reliability. The SRPP is the primary program management tool used to plan, monitor, and control program SRE tasks. The Reliability Engineering Plan defines the key stakeholders of the reliability engineering efforts. The SRPP identifies the scheduling of SRE tasks relative to the lifecycle schedule so that SRE functions are an integral part of the requirements, design, development, and integration processes and so that SRE activities are coordinated efficiently with other project disciplines, such as systems engineering, software engineering, requirements, design, development, integration, test and evaluation, logistics planning, safety engineering, and human systems interface.

The following procedures are defined for three possible audiences. The first audience is an acquisitions person whose organization is acquiring but not developing a software system; this person will be defining deliverables for software and system contractors. The second audience is the contractor who is providing software and system deliverables and has contractual obligations for software reliability. The third audience is an organization that is not under contract to perform software reliability but wishes to establish an SRPP to increase the confidence that the software meets internally identified reliability requirements. A checklist for creating an SRPP for each of the three audiences is found in Figure 21, Figure 22, and Figure 23.

a) The SRPP should be developed during or prior to the requirements phase. Organize a group of
subject matter experts from software management, software quality, reliability engineering,
systems engineering, safety engineering, logistics, program management, etc., and review 4.3 of
this document. Review the results of the other planning tasks in 5.1.
b) Select the SRE activities that apply to the program based on the criticality of the software and the
resources available. The essential tasks are typically those tasks that can be combined with existing
development practices to reduce cost.
c) Develop the contract RFP statement of work tasks and deliverables. Verify that reliability engineering tasks, deliverables, and requirements are included in the contract to be awarded (if applicable). The reliability engineering team conducts predictions and allocations on systems and subsystems in accordance with the life-cycle strategy to validate the reliability thresholds and objectives, which can be placed into the contract specifications prior to release of the request for proposals.
d) Specify that the development organization perform the SRE tasks identified in step a), prepare an
SRPP, provide status against it at every milestone review, and update it throughout the life cycle.

Figure 21 —Checklist for creating an SRPP—for acquisitions personnel

a) Review the SRE requirements and the required SRE tasks for the program.
b) Create an SRPP that includes the required SRE tasks.
c) Arrange the SRPP in order of planning, development, deployment decision, and operation.
d) Add to each section the specified SRE tasks.
e) Identify who will perform each of the tasks.
f) Identify the cross-functional relationships and teams required to support the SRE. The SRE
practitioner should also be identified as a team member with regard to the software
engineering activities. Communication paths between the SRE practitioner and software
management should be clearly identified.
g) Invite all stakeholders to review and approve the plan.
h) Plan updates to the SRPP on a continuous basis and review at every major milestone to
verify that all projects support activities are properly managing reliability engineering.
i) Update the SRE status for any planned system engineering technical reviews, program reviews, or major milestones to verify adequate consideration of the interfaces and dependencies among these acquisition program activities. Updates are made to provide more detail in the Reliability Engineering Plan in the form of reliability growth plans, analysis, and reporting growth curve(s).
j) Update the SRPP during the development phase. Changes may include updates to the reliability growth planning, analysis, and reporting, identifying the systems and software with demonstrated low reliability. Test and evaluation activities are revised, and more detail on the plans for reliability testing is provided. Updates should provide more detail of reliability engineering status at system and software engineering reviews to verify adequate consideration of the interfaces and dependencies among these acquisition program activities. The reliability engineering team provides more reliability growth detail in the Reliability Engineering Plan in the form of reliability growth plans, analysis, and reporting growth curve(s) that show reliability growth progress on systems and software that have been predicted or demonstrated to have low reliability.
k) Update the SRPP during Operation and Support. During this phase the Reliability
engineering team’s level of engagement with the system engineering and software
engineering processes is dependent on several factors that determine the Reliability
engineering activities required. Systems/subsystems upgrades that are determined to have no
observed decrease in reliability require a less comprehensive Reliability engineering
program, which may consist of the initial Reliability engineering prediction.
Systems/subsystems with new design or implemented in a new environment and/or have
been determined to cause degradation of reliability or reliability growth require a
comprehensive Reliability engineering plan, activities and products. The Reliability
engineering team may need to revisit reliability growth for systems upgrades and provides a
strategy to address new requirements or new functions that are being integrated into the
design and details on how they will be tested and validated.
l) Anytime modifications are made to the SRPP, the program should prepare and document
any Reliability inputs to the system engineering plan, software engineering plan and test and
evaluation plans.

Figure 22 —Checklist for creating an SRPP—for those organizations under contract

The checklist for the organizations that are not developing software under contract is shown in Figure 23.

a) Execute all of the steps in the checklist for acquisitions personnel (Figure 21) except for step c).
b) Execute all of the steps in the checklist for organizations under contract (Figure 22) except for step a).

Figure 23 —Checklist for creating an SRPP—for those organizations not under contract

5.2 Develop failure modes model

There are a variety of software analyses related to outsourcing, requirements, design, code, and testing.
While nearly all analyses have some impact on the reliability of the software, there are some analyses
that are directly related to defects and therefore to SR:

 Software defect root cause analysis (RCA)—What causes most of the defects?
 Software failure modes effects analysis (SFMEA)—What kind of effect will relevant failure modes have on the system?
 Software fault tree analysis (SFTA)—How can a system hazard be caused by software?
Figure 24 illustrates the flow of data between the failure modes analyses. All analyses can be and are employed to drive the development and test strategies discussed later in 5.4.1, 5.4.2, and 5.4.3. The
defect RCA can be conducted regardless of whether there is a waterfall or incremental LCM. Early in
software development, defect reports from a prior release can be used for the analysis. The SFMEA is a
bottom-up analysis that starts at the failure modes and works up to the system events that could be
caused by software failures. The SFMEA can be repeated in each increment if there is an incremental
development process. The SFTA is a system analysis that can be conducted at any time during
development or test regardless of whether there is a waterfall or incremental development.

Figure 24 —Develop failure modes model

Table 10 illustrates the purpose of each of the preceding analyses as well as the applicability for
incremental development. The failure modes analyses are not affected by the development model. They
are performed whenever a particular artifact is available during development, regardless of whether
there is a waterfall development or an incremental development. With an incremental development, the
failure modes analyses can be and usually are revisited in each increment. The analyses can be used
together to improve effectiveness. The software defect RCA, for example, can be used whenever there is not a budget for the SFMEA. It can also be used prior to the SFMEA to increase the confidence that the SFMEA focuses on the key failure modes.

Table 10 —SRE tasks related to developing a failure modes model

5.2.1 Perform software defect RCA
  Purpose/benefit: Supports effective and targeted defect reduction.
  Applicability to incremental development: The defect reports can be sampled at any time from any sprint increment or all sprint increments combined.
5.2.2 Perform SFMEA
  Purpose/benefit: Identifies whether and to what degree particular failure modes affect the system.
  Applicability to incremental development: Performed whenever a particular artifact or portion of an artifact is available. Can be performed in one increment or several increments depending on the assessed risk of the outputs of that increment.
5.2.3 Include software in the system FTA
  Purpose/benefit: Used to identify the sources of one fault that could be caused by software, hardware, or a combination thereof.
  Applicability to incremental development: Not affected by the LCM. This analysis is performed whenever there is a system FTA.

The SFMEA and SFTA can be used together when there is a brand new system with unknown failure
modes and unknown system hazards. In that case, the SFMEA and the SFTA can be performed so that
they meet in the middle. That means that the analyses are performed until the SFTA is identifying
software failure modes that were not captured on the SFMEA and the SFMEA captures system level
hazards that are not known. Table 11 provides a checklist for choosing which and when to use the
preceding analyses.

Table 11 —When and how to perform the failure mode analyses


Defect
Criteria for selection SFMEA SFTA
RCA
It is not known which of the hundreds of possible software failure modes are X X, do the
most likely for this software. defect
RCA first
The budget dictates that only the most common failure modes be analyzed in X X, do the
the SFMEA. defect
RCA first
The budget dictates that the SFMEA focus on the most common cause failure X
modes that can be found during a requirements or design review.
There is a need to identify single point failures as well as common cause failures X
(failures caused by the same failure mode but in multiple locations of the code).
The product is mature but the code is suspect. X

There is a desire to make requirements, design, and code reviews more effective X
by combining them with the failure mode analyses.
The software requirements specification does not describe very well how the X X
software should handle negative behavior or hazardous events.

The technology or the product is brand new. System level hazards are not X X
completely understood.
There are a small number of well-known top-level hazards, but it is unclear how X
or if the software can cause those
There is a need to identify failures that are due to a combination of events. X
A serious but intermittent event has occurred and it is urgent that the failure X
mode(s) associated with it be identified.

5.2.1 Perform a defect root cause analysis (RCA)

This task is a recommended but not required prerequisite for the SFMEA (task 5.2.2). This task is also
recommended if the goal of SRE is to make improvements in the development activities that are
causing specific types of defects to occur more often than others. The lead for this task is usually the software quality assurance organization because this organization typically has access to the failure
data. This task does require cooperation and inputs from the software development engineers. This task
is most efficient when the software defect and failure reporting system has been defined such that
software root causes are required input for closing software problem reports.

All software defects are ultimately caused by mistakes made during the development activities. The
causes can be anything from workmanship-type mistakes to poor requirements to misunderstanding of
the operational constraints. Many software defects are caused by humans who are not actually writing
the code. For example, if the specification is erroneous then the code will be erroneous. If the interface
design is erroneous, then the interface code will be erroneous. The purpose of the defect RCA, as well as of the other failure mode analyses, is to understand the development activity or activities that are introducing most of the software defects.

The sources of the defects can be and usually are unique for each organization and product. Hence, a
defect RCA on one software project does not necessarily provide value for another project. The steps to perform a defect RCA are given in Figure 25, while Table 12 provides keywords commonly associated with common root causes of defects.

a) If the software LRU(s) are not yet in the post-integration testing phase, collect defect reports from requirements and design reviews, inspections, and walkthroughs. As a point of reference, look at the defect reports from a recent, similar version of the software. If the software is in the testing phase, collect the defect reports generated during testing.
b) Review each defect reported. In some cases the software engineer may have recorded the
root cause for the defect. Otherwise use Table 12 to identify the most likely root cause based
on the keywords in the defect report.
c) Count up the number of defect reports in each of the categories in Table 12.
d) Generate a Pareto diagram that shows the most common root causes from left to right (a scripted sketch of steps b) through d) follows Figure 25).

Figure 25 —Checklist for performing a defect root cause analysis
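Steps b) through d) of Figure 25 can be approximated with a short script when the engineers have not recorded root causes explicitly. The category names echo Table 12, but the keyword lists and defect report texts below are illustrative assumptions, and a root cause recorded by the software engineer always takes precedence over a keyword match.

from collections import Counter

# Illustrative keyword map distilled from Table 12; not a normative list.
ROOT_CAUSE_KEYWORDS = {
    "functionality": ("specification", "required", "functionality"),
    "timing": ("timing", "synchronization", "timeout", "race"),
    "sequence/logic/state": ("sequence", "order", "logic", "state"),
    "data": ("corrupt", "overflow", "underflow", "unit of measure"),
    "exception handling": ("exception", "recovery", "retry", "try/catch"),
    "interfaces": ("interface", "parameter"),
    "memory": ("memory", "leak", "free"),
}

def classify(report_text):
    # Return the first category whose keywords appear in the defect report.
    text = report_text.lower()
    for category, keywords in ROOT_CAUSE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "unclassified"

def pareto(reports):
    # Count reports per root cause, most common first (Pareto ordering).
    return Counter(classify(report) for report in reports).most_common()

reports = [
    "timeout waiting for sensor response",
    "uncaught exception on malformed telemetry input",
    "session memory never freed after logout",
]
print(pareto(reports))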

Table 12 contains some keywords that are associated with common software defect root causes. The
software engineer who corrects a defect knows what the root cause(s) is/are in order to correct it.
Hence, if the software engineers record the root cause for each corrected defect, then the reliability
practitioner does not need to search for common keywords such as those listed in Table 12. Notice that
the keywords are listed in order of the phase that the defect is introduced, which includes requirements,
design, and code. A requirements defect is a defect in which the required functionality is not

It is possible that the design and code execute the wrong requirement perfectly; in that
case, the software is still faulty because it does not address the requirements. It is also possible that the
requirements are captured and understood correctly but the design does not support them.
Finally, it is possible that the requirements and design are correct but the code has not been
implemented to meet either or both.

The last column of Table 12 shows the recommended resolutions if the particular root cause happens to
be the most common. For example, if functionality is the most common root cause, then the
recommended resolutions for functionality are appropriate.

Example: Several defect reports are collected and analyzed. The keywords from Table 12 are searched,
parsed, and tabulated. The results are shown in Figure 26. In this example, exception handling is the
most common root cause. This is a design-related issue. The design document templates should be
reviewed to verify that exception handling is covered in the design. The design review templates should
also be reviewed to verify that exception handling is inspected as part of the design review. Note that
Figure 26 is only an example; it should not be construed that all systems will have these root
causes.

[Figure 26 is a Pareto bar chart of defect root causes; the vertical axis shows the defect count (0 to 30) and the bars are ordered from the most common root cause to the least common.]
Figure 26 —Example defect root cause analysis


Table 12 —Keywords associated with common root causes for defects a
Failure mode Keyword Recommended resolutions
Requirements
Functionality Specification, required, Employ more rigorous requirements related reviews. Use the
functionality, desired, functional SFMEA prior to designing and coding. See 5.2.2.2.
requirements
Design
Timing Timing, synchronization, Employ timing diagrams during design, include timing related tests
slow, fast, timeout, race in the reliability test suite in 5.4.1.4.
condition
Sequence/logic/ Sequence, order, logic,  Use state transition tables and diagrams, and logic
state state, or persistent use of diagrams during design. Verify that the test plans cover all
if, else, or otherwise. state transitions as per 5.4.1.3.
 Employ unit testing procedures that require a specific level
of coverage. See 5.3.9.2.
 Employ the interface SFMEA during architectural design
and the detailed SFMEA during detailed design and
coding. See 5.2.2.2.
Data Corrupt, output, input, Use data flow diagrams and interface design diagrams during
unit of measure, results, design. Specify the data types, formats, unit of measure; default,
overflow, underflow minimum and maximum values. Employ the detailed SFMEA
during design and code reviews. See 5.2.2.2.
Exception Exception, detection, Employ a more rigorous review of the requirements, top-level
handling error, recovery, fault, design, interface design to verify that hardware and software failures
failure, retry, hardware, are adequately handled.
fault, try/catch
Interfaces Interfaces, parameters Employ a system wide interface design specification. Review the
design for each subsystem with regards to the interface contracts.
Employ an interface SFMEA. See 5.2.2.2.
Coding
Memory Memory, resources, free, Review the code specifically for memory allocation issues and/or
allocate, deallocate employ automated tools that search for common memory leaks.
Algorithm Algorithm, formula, Verify that all algorithms are documented in a detailed design
divide, multiply, etc. document; are coded to conform to that design; and are unit tested
by the developer. Employ a detailed SFMEA that focuses on what
can go wrong with the algorithms. See 5.2.2.2.
a Reprinted with permission of A.M. Neufelder [B64].

5.2.2 Perform a software failure modes effects analysis

This task is recommended if any of the items in Table 11 indicate that it should be performed. The SFMEA
can be labor intensive. However, if properly tailored, planned, and executed, its cost can be
outweighed by the savings from reduced defects and from avoiding rework of the requirements, design, and code
late in development. The lead role for the SFMEA depends on the viewpoint. The functional, interface,
and usability SFMEAs can be led by reliability engineering and supported by the appropriate software
engineers and designers. The detailed, vulnerability, and serviceability SFMEAs should be led by the
software designers and software engineers but monitored or facilitated by someone who is knowledgeable
of the FMEA process, such as a reliability engineer. The production SFMEA is typically led by software
management.

The goal of the SFMEA is to identify key failure modes in the requirements, design, and code and to
describe the appropriate method by which each defect can be isolated and mitigated. The SFMEA can be
performed in different phases of product development. For example, for ease of execution it can be
incorporated as part of the software code review, where the reviewers discuss any applicable failure modes
for the software. The main element that enables a design team to have a successful and fruitful SFMEA


is a failure mode taxonomy. Without this taxonomy, it is difficult to bring any consistency into
the SFMEA process.

Traditionally, the failure modes and effects analysis is a reliability analysis technique used to assess the
risk of a system failing during operation. For software, it can also be used to identify software defects that
can lead to system or subsystem failures that would be difficult to trigger during testing; unexpected data or
unexpected software behavior are examples. The SFMEA identifies key software failure modes for data and software
actions and analyzes the effects of abnormalities on the system and other components. It can be used as a
systems approach to analyzing the software's response to hardware failures and the effect on the hardware
of anomalous software actions by identification of:

 Hidden failure modes, system interactions, and dependencies


 Unanticipated failure modes
 Unstated assumptions
 Requirements/design inconsistencies

The SFMEA is applicable to any type of software or firmware application and to any type of
development LCM. The reason is that there is a core set of failure modes that applies to all application
types and all development models. In other words, the core set of failure modes applies to any software
regardless of whether it is developed incrementally. For example, all software and firmware systems
have logic and data; therefore, all software and firmware systems are susceptible to faulty logic and faulty
data.

The FMEA process identified in existing FMEA standards is applicable to software (MIL-HDBK-338B
[B54], MIL-STD 1629A [B56], SAE [B87]). However, most practitioners are not able to easily apply the
FMEA to software because these references lack the failure modes and the viewpoints needed for the
software analysis. This subclause provides the information needed to apply the FMEA to software. The
steps for performing a SFMEA are as follows (Neufelder [B64]):

The process for conducting an SFMEA is shown in Figure 27.

Prepare the SFMEA—The effectiveness of the SFMEA is highly dependent on up-front preparation and
planning. It is important to make sure that the SFMEA is focused on the most critical aspects of the
software or firmware and the most likely failure modes. It is also important that all participants in the
SFMEA have a common understanding of how to complete the SFMEA. In this step the analysts identify
where the SFMEA applies, set some ground rules for the SFMEA, identify applicable viewpoints, identify
the riskiest parts of the software, identify and gather documentation required for the analysis, identify
personnel resources needed for the analyses, identify the likelihood and severity thresholds for mitigating
risks, and define the template and tools to be used for the SFMEA.

Analyze software failure modes and root causes—One of the most critical aspects of the SFMEA is
determining the failure modes that are most applicable for a particular system. Overlooking one failure
mode could result in missing an entire class of software defects. Sometimes the simplest failure modes
are involved in the most serious failure events. To reduce the possibility that applicable failure modes
and root causes are overlooked, common software failure modes and root causes for each of the eight
software viewpoints are analyzed.

Identify consequences—Once the lists of failure modes and root causes are complete, the next step is
identifying the effects on the software itself (local), subsystem, and system. If there is a user interface, the
effects on the user will also be identified. Software engineers are often able to identify the local effects.
However, connecting the local events to the subsystem and system effects often requires creative thinking
and system level expertise. Consequently, this step usually needs to be completed by more than one person.


Once the effects are identified, the compensating provisions, preventive measures, severity, and
likelihood are assessed.

NOTE—SFMEA process reprinted with permission from Softrel, LLC “Effective Application of Software Failure Modes Effects
Analysis” © 2014.

Figure 27 —SFMEA process


Mitigate—Once the consequences are known, the mitigation can include a corrective action to either
reduce the effect or eliminate the failure or a compensating provision to avoid the failure. Additionally, a
check should be made of the consequences of the chosen mitigation strategy to assure that additional
problems do not arise from that implementation. Also, consider that the best solution may require hardware
changes.
Generate a critical items list (CIL)—The output of the SFMEA is the list of critical items and their
associated mitigations. This list should be used as input to several other SR and engineering tasks such as
developing the test suite to test the failure modes as part of the stress case coverage testing. The checklist
for preparing a SFMEA is shown in Figure 28.

a) Prepare the SFMEA. See 5.2.2.1.


b) Analyze failure modes and root causes. See 5.2.2.2.
c) Analyze consequences of the software failure modes and root causes. See 5.2.2.3.
d) Mitigate the consequences as per 5.2.2.4 and generate a CIL. See 5.2.2.5.
e) Understand the differences between a SFMEA and a hardware FMEA. See 5.2.2.6.
f) Refer to F.2.1 for an example.
g) Use the SFMEA results to drive the development and testing activities (see 5.4.1.5).

Figure 28 —Checklist for performing a SFMEA

5.2.2.1 Prepare the SFMEA

The steps for preparing the SFMEA are shown in Table 13.


Table 13 —Checklist for preparing a SFMEA


SFMEA task Steps
1. Identify where a. Identify all parts of the system.
the SFMEA b. If there is a preliminary hazards analysis (PHA) and/or a risk management matrix, retrieve it.
applies c. Rank each software LRU based on its impact on safety.
d. Rank each software LRU based on its mission criticality.
e. Identify a numerical ranking of each component based on its combined impact on mission and
safety.
2. Identify the a. Identify all parts of the system that existed prior to this version of the software.
riskiest parts of b. Identify which of the existing components will undergo a major revision or rewrite.
the software c. Identify all software LRUs that will be reused without modifications.
d. For all of the software LRUs that existed previously, rank each of them based on how many
defects were fielded in that component.
e. Identify a risk rating of low, medium and high for each software LRU based on steps a)
through d).
f. Revisit the criticality impact assessed in step f). For each software LRU multiply that
criticality ranking by the development risk ranking. Rank the items by the highest combined
safety and mission criticality combined with highest development risk.
g. Adjust the weighting for each of the risks based on program and organization policies. For
example, the above shows equal weighting for safety, mission, and development risk. Based on
the needs of the system, these weights may need adjustment.
h. Document the combined risk ranking of the components and the justification for the ranking.
3. Select Review Table A.1. For each of the software LRUs that has been determined to be in scope, identify
applicable which viewpoint(s) apply.
viewpoints
4. Identify and Review Table A.2. Each of the viewpoints requires certain information or artifacts. Make sure to
gather acquire the artifacts prior to starting the SFMEA.
documentation
and artifacts
5. Identify the Usually the SFMEA cannot be performed by just one individual. Some parts of the SFMEA can be
people needed for completed by individuals while others are a group effort. Experienced software engineers, systems
the SFMEA engineers, and software assurance engineers are usually required. Table A.2 summarizes the
artifacts needed for 7 of the 8 viewpoints. The production SFMEA generally does not require any
artifacts. Table A.3 summarizes the personnel required for the analyses. Use this table to identify
the people needed for the analysis.
6. Decide a. Review the list of components and the estimated SFMEA times
selection scheme b. Identify how each applicable viewpoint can be pruned.
c. Revisit the time estimates for the SFMEA preparation based on the revised scope
d. Compare the estimated time for the revised scope to the budget.
e. Revisit steps 2 through 4 until the scope and time estimates are acceptable.
7. Define ground a. Identify and agree on the ground rules that will be taken when doing this SFMEA with respect
rules to human error, interface chains, network availability, speed and throughput. Refer to
Table A.4.
b. Document the ground rules for the SFMEA.
c. Make sure that all SFMEA participants are aware of the ground rules.
8. Define severity a. Locate the FDSC document for this project.
and likelihood b. Identify which failures are applicable for software.
c. Tag each of the failures with the appropriate criticality levels
d. Record the failure scoring definitions and associated criticality levels in the SFMEA
documentation as they will be needed later to assess criticality
e. Locate the risk matrix that is being used for this project. This risk matrix will be used for
determining which SFMEA items are placed on the CIL and mitigated.
9. Select template The SFMEA has a similar template to the hardware FMEA. However, the template requires some
tailoring to accommodate the eight SFMEA viewpoints.

5.2.2.2 Analyze failure modes and root causes

Once the SFMEA has been planned and prepared, the next step is to construct the failure modes and root
causes section of the SFMEA table as per Figure 29.


a) Research past failure modes and root causes (Beizer [B4], Common Weakness Enumeration [B10],
Neumann [B69]) from similar systems developed in the past. Use the defect RCA from 5.2.1 if
available.
b) Brainstorm additional failure modes and root causes that pertain to this software system
c) If the selected SFMEA is a functional SFMEA, copy the software requirements that are in scope for
the SFMEA into the Table A.5. Select the failure modes that are applicable for this requirement.
Brainstorm the root causes for the applicable failure modes.
d) If the selected SFMEA is an interface SFMEA, copy the software interfaces that are in scope for
the SFMEA into the Table A.6. Select the failure modes that are applicable for this interface.
Brainstorm the root causes for the applicable failure modes.
e) If the selected SFMEA is a detailed SFMEA, copy the template in Table A.7. Identify the
applicable functions selected for the analysis. Inventory the selected functions and determine which
of the failure modes are applicable for this detailed design. Functionality, data, and exception
handling are almost always applicable, while sequences, algorithms, memory management, and
input/output are not necessarily applicable to every function. Brainstorm the root causes for the
applicable failure modes.
f) If the selected SFMEA is a maintenance SFMEA, copy in all of the corrective actions,
implemented since the last baseline, into the template in Table A.8. For each corrective action,
inventory the selected function and determine which of the failure modes is applicable for this
detailed design and which is affected by the corrective action. Brainstorm the root causes for the
applicable failure modes.
g) If the selected SFMEA is a usability SFMEA, copy all of the use cases into Table A.9. For each
use case, identify the applicable failure modes. Brainstorm the root causes for the applicable failure
modes for each use case.
h) If the selected SFMEA is a serviceability SFMEA, collect the installation scripts for the software.
Identify the applicable failure modes. Brainstorm the root causes for the applicable failure modes.
Use Table A.10.
i) If the selected SFMEA is a vulnerability SFMEA, the steps are similar to the detailed SFMEA.
However, the focus is on the vulnerability design and coding issues. Identify the failure modes that
apply to the design or code under analysis. For each applicable failure mode, identify the common
weakness enumeration that pertains to each failure mode. Use Table A.11.
j) If the selected SFMEA is a production SFMEA, use Table A.12. This is the only viewpoint that is
process related rather than product related. It focuses on why the organization is
deficient at detecting software failure modes prior to deployment. Every failure mode will have at
least one associated product-related cause and at least one production-related cause.

Figure 29 —Checklist for analyzing failure modes and root causes

5.2.2.3 Identify consequences

For each row of the SFMEA table identify the consequences as per Figure 30.


a) Continue using the template that was selected in the previous step 5.2.2.2. Proceed to the
consequences section of that figure. See Table A.13.
b) Identify the local effect of each failure mode and root cause on the software LRU itself. Usually the
software engineering subject matter experts can identify this.
c) Identify the effect of each failure mode on the subsystem. Usually the engineers most experienced
with the subsystem or system can identify this.
d) Identify the effect on the system. Usually the engineers most experienced with the subsystem or
system can identify this.
e) Identify any preventive measures for this effect. Examples of preventive measures are increasing
bandwidth or memory.
f) Identify the severity and likelihood of each failure mode. The risk priority number (RPN) is
calculated from the severity and likelihood ratings. The RPN of a particular failure mode is
compared against the RPN matrix to determine which failure modes are to be mitigated, should be
mitigated, etc.

NOTE—A failure modes effects and criticality analysis (FMECA) is a FMEA that has a quantitative assignment for the
criticality such as a ranking from 1 to 10 so that the probability of the failure mode can be computed in addition to the
RPN.

Figure 30 —Checklist for identifying SFMEA consequences
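
As a hedged illustration of step f), the Python sketch below computes a risk priority number as the product of the severity and likelihood ratings and flags the rows whose RPN meets an assumed mitigation threshold; the 1-to-5 scales, the threshold, and the example rows are illustrative assumptions rather than values defined by this recommended practice.

# Illustrative SFMEA rows: (failure mode, severity 1-5, likelihood 1-5).
rows = [
    ("Faulty timing: command issued after window closes", 5, 2),
    ("Data: unit-of-measure mismatch on interface message", 4, 3),
    ("Exception handling: hardware fault not detected", 5, 4),
]

RPN_THRESHOLD = 12  # assumed project-specific threshold for mitigation

for failure_mode, severity, likelihood in rows:
    rpn = severity * likelihood          # risk priority number
    mitigate = rpn >= RPN_THRESHOLD      # candidate for the critical items list
    print(f"RPN={rpn:2d}  mitigate={mitigate}  {failure_mode}")

The flagged rows would be carried forward to the critical items list described in 5.2.2.5.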

5.2.2.4 Mitigate

This section of the SFMEA identifies the applicable corrective actions and compensating provisions. If
there are corrective actions or compensating provisions, then the RPN is revised. Corrective actions include
changes to the requirements, design, code, test plan, user’s manuals, installation guides, etc. Compensating
provisions are applicable when an action other than a corrective action can mitigate the failure. For
example, in some cases an end user can mitigate a software failure if they are aware of it soon enough to
avoid it. Figure 31 provides a checklist for mitigation.

a) Continue from 5.2.2.3. See Table A.14.


b) Identify any applicable corrective actions for each failure mode and root cause. For software these
include changing the requirements, changing the design, changing the code, developing a specific
test case, changing the user manual, etc.
c) Identify any compensating provision for each failure mode, root cause, effect. A compensating
provision may include involvement of the user to avoid or downgrade the event, hardware
interlocks, etc.
d) Complete the RPN in light of any corrective action that has been made. If no corrective action is
made then the RPN for mitigation is the same as the initial RPN.

Figure 31 —Checklist for SFMEA mitigation

5.2.2.5 Generate a critical items list (CIL)

The critical items list (CIL) is generated by ranking the final SFMEA so that only the highest RPN items are placed into
the CIL. The software CIL is then merged with the hardware CIL to establish the system CIL. A
complete example of a SFMEA can be found in F.2.1.


5.2.2.6 Understand the differences between a hardware and software failure modes effects
analysis

The differences between a SFMEA and a hardware FMEA are summarized in Table 14:

Table 14 —Differences between SFMEA and other FMEA


FMEA consideration Explanation
Definition of For hardware, likelihood is related to the part failure rate. For software, likelihood is related to
likelihood how likely this particular failure mode in this particular function will transpire in operation. The
likelihood is based on the following:
 How obvious is the failure mode and local effect? If the failure mode in a particular
function is so obvious that it is visible under any set of inputs in any type of testing then
it is not likely to transpire in operation because it would most likely be removed prior to that time.
 Is the particular function or code in the critical path or a function that is occasionally
executed?
 Is the specific failure mode in a particular function explicitly covered in any test
procedure?
 How many times in the past has this particular failure mode been problematic for this type of system?
Computation of a Failure rate is computed at the LRU level of the software—the SFMEA is performed at a lower
component failure level than LRU.
rate
Viewpoints The software has seven possible design viewpoints and one process viewpoint. The SFMEA
template varies depending on the viewpoint.
Failure modes Software will not wear out. The failure modes are specific to software.
Redundancy is an Software redundancy can be expensive and does not necessarily mitigate all failure modes. For
effective mitigation example, if the requirements are faulty, redundant software may result in two software systems
that both have faulty requirements. Redundant hardware will generally not mitigate any failure
mode that is not related to timing or memory.

5.2.3 Include software in the system fault tree analysis (FTA)

This SRE task is recommended when any of the items in Table 11 indicates applicability or whenever there
is a system FTA being conducted and the system is software intensive. This task should be a joint effort
between reliability engineering and software management. FTAs, traditionally used for system hazard
and/or safety analyses, provide a top-down look as follows:

 Identification of critical fault paths and design weaknesses in the software


 Identification of the best place to build in fault tolerance of software
 Identification of critical failure modes that cannot be designed out

If the software is part of a hardware/software system, it should not be analyzed in a vacuum. This is
because many failures are related to the interfaces and interactions between software and hardware.
Software failure events are added to the system level tree and analyzed with the same fault tree connectors
and diagramming that is used for hardware. The software fault tree is integrated with the system fault tree
as opposed to being a standalone fault tree. If the system is software only, then the system FTA is
equivalent to a software FTA.

The part of the FTA (Vesely et al. [B93]) that is unique to software is brainstorming the software-related
failure modes. Table 15 can be used to facilitate such brainstorming (Neufelder [B64], SAE [B87]).
Figure 32 is the checklist for including software in a system FTA.


Table 15 —Software failure events


Software failure mode Description
Faulty sequencing Operations happen in the wrong order or state transitions are incorrect.
Faulty timing Operations start too early or too late or take too long. The right event happens in the right order but at the wrong time.
Faulty output The output of the system or software is incorrect. Outputs can also be behavioral.
Undesired outputs The output or actions of the system are technically correct but not what is desired.
Faulty functionality The system does the wrong thing or does the right thing incorrectly.
Faulty error detection The software fails to detect an error in itself, hardware, or environment.
False alarms The software is overly sensitive and detects an error that does not exist.

a) Verify that knowledgeable engineers are involved in the construction of the system FTA.
b) Identify system level events that have been caused by software in the past.
c) For each failure event on the system FTA, brainstorm how that failure mode could be caused by
one of the software failure events in Table 15.
d) Brainstorm system level events that can be caused by software but not hardware.
e) Add each viable software failure event to the tree just as the events due to hardware are added to
the tree.
f) Use the appropriate connector such as AND, OR, Exclusive OR to connect the failure event(s) to
the tree.
g) At each level of the fault tree, repeat the preceding steps.
h) When the fault tree has reached the lowest-level failure modes, the failure modes are reviewed as a
whole and investigated or tested to determine whether they have caused or can cause the top-level event. If the
event has already happened and the FTA is being used to isolate the root causes, then each of the
failure modes at the bottom of the tree is investigated in ranked order of likelihood to isolate
the event at the top of the tree. If the analysis is being performed prior to any field events, then
each of the failure modes at the bottom of the tree is further investigated to determine appropriate
mitigation and to verify that the test plans include testing of the failure mode.

Figure 32 —Checklist for including software in a system FTA
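
To illustrate step f) numerically, the following sketch evaluates a small fault tree in which basic events, including software failure events of the kinds listed in Table 15, are combined with AND and OR gates under an assumed independence of events; the probabilities and the tree structure are hypothetical.

from functools import reduce

def or_gate(probs):
    """P(at least one event occurs), assuming independent events."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)

def and_gate(probs):
    """P(all events occur), assuming independent events."""
    return reduce(lambda acc, p: acc * p, probs, 1.0)

# Hypothetical probabilities per mission for basic events.
p_faulty_timing  = 1e-4   # software: faulty timing
p_faulty_output  = 5e-5   # software: faulty output
p_sensor_hw_fail = 2e-4   # hardware: sensor failure
p_false_alarm    = 3e-4   # software: false alarm

# Illustrative top event: (timing OR output fault) AND sensor failure,
# OR a false alarm that forces an abort on its own.
intermediate = and_gate([or_gate([p_faulty_timing, p_faulty_output]), p_sensor_hw_fail])
p_top = or_gate([intermediate, p_false_alarm])
print(f"P(top event) ~= {p_top:.2e}")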

5.3 Apply software reliability during development

“Development” within the context of this document includes the tasks relating to software requirements,
software architecture and design, software detailed design and coding, implementation, and software unit
testing. The prediction models described in this subclause can be used as early as the proposal or concept
phase because they do not require defect or testing data that is not available until later in the development
cycle. The models can be used when there is an incremental LCM, and they can also be applied to COTS
and FOSS software LRUs as well as to firmware.

The SRE activities for the development phase are illustrated in Figure 33. The identification of the system
reliability objective is typically done first either by an acquisitions person or by marketing. The SR is
assessed and predicted as a parallel activity with the hardware reliability prediction activities. Once the
predictions are complete they can and should be sanity checked against typical SR values. The predictions
may be updated depending on the results of the sanity check. If within the scope of the SRE plan, the


assessment results are analyzed for sensitivity. This analysis identifies both the strengths and the gaps in the
development activities that will ultimately affect the system prediction. Once the SR predictions are
finalized they are merged with the hardware predictions into the system reliability model or RBD. After
that the total SR (from all software LRUs) that is needed to meet the system objective is determined. The
SR growth needed to meet the system objective is then determined based on schedule and resources
available. The sensitivity analysis may be revisited to determine how to optimize the reliability growth of
the predictions. It may be necessary to update the system objective if it cannot be met with the current
schedule. For some projects there may be many software LRUs developed by several different
organizations. In that case, it may be necessary to allocate the top-level software requirement down to each
of the LRUs so that it can be tracked by the appropriate software engineering teams. Within each software
LRU it may be necessary to perform a sensitivity analysis in order to meet the particular LRU allocation. In
parallel with all of the SRE modeling, the software is being tested by each software engineer and the white box
test coverage is monitored. When development for the increment or the entire system is nearing
completion, the reliability growth requirements and the measured white box test coverage are analyzed to
determine whether to transition to the system level testing phase.

Figure 33 —Predict software reliability during development


This subclause will cover how to perform the following SRE activities. The inputs and outputs of each
activity follow. Each of these activities is applicable when there is an incremental or evolutionary LCM. In
5.3.2.4 instructions are provided for using the prediction models when there is not a waterfall LCM in
place. All of the tasks illustrated in Table 16 can be employed during requirements, architecture design,
detailed design, coding, and unit testing regardless of the LCM employed.


Table 16 —Summary of SRE tasks that are used during development


Reference | Purpose/benefits | Applicability for incremental development
5.3.1 Determine/obtain the system reliability objectives in terms of reliability, availability, MTBF | Identifies the reliability goal for the entire system. | Not affected by LCM since it occurs before development.
5.3.2 Perform a software reliability assessment and prediction | The first step in quantifying SR. | Yes. See 5.3.2.4.
5.3.3 Sanity check the early prediction | Indicates whether or not the prediction is reasonable. | Yes. See 5.3.2.4.
5.3.4 Merge the software predictions into the overall system predictions | Combines the software with the hardware predictions. | This applies to all LRUs in all increments.
5.3.5 Determine an appropriate overall SR requirement | Determines the top-level SR required to meet the system objective. | Not affected by development cycle or LCM.
5.3.6 Plan the SR growth | Increases the confidence that the prediction can be met for the software given the current schedule. | Not affected by development cycle or LCM.
5.3.7 Perform a sensitivity analysis | Identifies gaps, strengths, and development practices that are least/most sensitive to reliability. | This is performed in an early increment or sprint.
5.3.8 Allocate the required reliability to the software LRUs | If there are several software organizations, this step may be necessary for tracking. | Any LRU developed in any increment is subject to an allocation.
5.3.9 Employ SR metrics for transition to system testing | Identifies whether the code is stable enough to progress to verification testing. | Performed at each increment or sprint as well as the final increment.

5.3.1 Identify/obtain initial system reliability objective in terms of reliability, availability, MTBF

This is an essential task and is a prerequisite for several other tasks. A system reliability objective is a
reliability figure of merit for the entire system including hardware and software. It is usually initially
developed during the concept or proposal phase of the project. The initial system reliability objective may
be and often is refined once the reliability predictions for the system and the engineering efforts begin. The
system reliability objective may be determined by the acquisitions organization, systems engineering, or
marketing depending on the type of system being developed. The reliability objective may be and often is
specified in a contract and hence will also be called a system reliability specification.

The potential issues with the system reliability objective with regards to SRE that are addressed by this
subclause are as follows:

a) Software is often not reflected or considered in the objective.


b) Software is considered in the objective but not communicated to the engineering and reliability
teams.
c) The system objective does not specify when (i.e., such as the milestone) the software and hardware
should meet the objective.
d) The definition of failure is not defined in an FDSC and not communicated with the objective.
e) The objective may not be the most suitable figure of merit for the SR.


The checklist shown in Figure 34 is recommended to provide for a system reliability figure of merit that
reflects the software part of the system, is clearly defined and clearly communicated. Table 17 lists some
general guidance for identifying the appropriate reliability figures of merit.

a) Use Table 17 to determine which reliability figures of merit are most applicable.
b) Verify that the specification clearly indicates that the requirement applies to both hardware and
software.
c) Verify that the objective considers the impact of software failures. Review any similar past
systems and determine the actual reliability figures of merit and the impact of the software on
those systems prior to establishing the specification. Remember that systems rarely lose
functionality over time. If a past system had system failures due to software, a future system will
likely have more because more of its functions will be performed by software.
d) Specify particular milestones for when the objective is to be met. SR will grow when there is
reliability growth testing and no new features are added. Once new features are added, the
reliability growth resets. Consider this when establishing the milestones for the objective. Some
typical milestones are as follows:
 End of engineering test
 End of acceptance test
 Average of first year of usage
e) Verify that the initial system reliability objective clearly defines the word failure. An FDSC is
one of the best ways to do this. See 5.1.2. Make sure that the FDSC includes examples of
expected failures due to software and assigns an appropriate criticality to those failures.
f) Derive the quantitative objective itself based on the recommendations in Figure 35, Figure 36, and Figure 37.

Figure 34 —Checklist for identifying/obtaining the initial system reliability objective

Table 17 —Specification figures of merit for reliability


Figure of merit When it is most applicable
MTBF/MTBCF/MTBSA/MTBEFF/failure rate Most applications
Reliability (as a fraction between 0 and 1) Mission type systems that have a defined mission time such as
vehicles or aircraft.
Availability (as a fraction between 0 and 1) Systems that run continually (satellites, networks, etc.)

There are three possibilities for deriving a quantitative objective. First, the objective can be derived from a
predecessor system. See Figure 35 for how to derive the objective in that case. If there is no predecessor
system then there is another alternative for deriving the objective. For instructions for this case see
Figure 36. If the system is mass deployed there is a third means of deriving the reliability objective. See
Figure 37 for instructions for how to derive a quantitative objective for mass deployed systems.


a) Identify the actual figure of merit from field data. For example, if the selected figure of merit is
MTBSA then calculate the actual MTBSA for a predecessor system. Separate the software and
hardware related failures in the data.
b) Determine how long it has been since the software was developed for this predecessor. Multiply
the number of software failures by at least 110% for each year since that predecessor software
has been deployed. This accounts for the fact that systems on average increase in size by 10%
each year. (See US GAO [B92].)
c) Determine how much the hardware has changed. Keep in mind that some hardware may be
replaced by software. Adjust the historical hardware failure count by this percentage. The result
is an adjusted system failure count. Compute the relative difference between the adjusted
system failure count and the historical system failure count.
d) Determine the MTBSA objective by adjusting the result of step c) by the historical MTBSA.
Determine the MTBF, MTBEFF, and MTBCF similarly.
e) Determine the typical mission time for the new system. Use the MTBCF objective determined
in step d) and the typical mission time to determine the required reliability for the new system.
f) Determine the typical MTTR for the hardware and the MTSWR for the software. (See 5.3.2.3
Step 5.) Average the MTTR and MTSWR based on the percentage of HW and SW based on the
adjusted failures expected from each. This is the average repair and restore time. The
availability objective is computed using the average restore time and the predicted MTBCF
from step d).

Figure 35 —Derive the quantitative objective when system has at least one predecessor
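
A minimal numeric sketch of steps b) through f) of Figure 35 follows. It treats MTBSA as the single governing figure of merit for simplicity and uses hypothetical failure counts, growth and change factors, mission time, and repair/restore times; it applies the 10 percent-per-year software growth adjustment, converts the adjusted counts to an MTBSA objective, and combines the MTTR and MTSWR into mission reliability and availability estimates.

import math

# Hypothetical predecessor field data over a 10 000 h observation period.
observed_hours   = 10_000.0
sw_failures_hist = 20          # software-caused system aborts
hw_failures_hist = 30          # hardware-caused system aborts
years_since_dev  = 3           # years since the predecessor software was developed
hw_change_factor = 0.9         # hardware failure count expected to shrink by 10%

# Step b): grow software failures by at least 10% per year of elapsed time.
sw_failures_adj = sw_failures_hist * (1.10 ** years_since_dev)

# Step c): adjust hardware failures for hardware content changes.
hw_failures_adj = hw_failures_hist * hw_change_factor
total_failures_adj = sw_failures_adj + hw_failures_adj

# Step d): adjusted MTBSA objective.
mtbsa_objective = observed_hours / total_failures_adj

# Step e): mission reliability, assuming an exponential time-to-failure model.
mission_time = 24.0            # hours, hypothetical
reliability = math.exp(-mission_time / mtbsa_objective)

# Step f): availability from a failure-weighted average restore time.
mttr_hw, mtswr_sw = 2.0, 0.5   # hours, hypothetical
w_sw = sw_failures_adj / total_failures_adj
avg_restore = w_sw * mtswr_sw + (1.0 - w_sw) * mttr_hw
availability = mtbsa_objective / (mtbsa_objective + avg_restore)

print(f"MTBSA objective ~ {mtbsa_objective:.0f} h")
print(f"Mission reliability ~ {reliability:.3f}")
print(f"Availability ~ {availability:.5f}")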

a) Use one of the models in 5.3.2.3, 6.2, and Annex B to predict the SR.
b) Use an industry accepted model to predict the hardware reliability.
c) Combine the predictions as per 5.3.4.
d) The objective MTBF, MTBEFF, MTBCF, MTBSA, or failure rate is established based on the
achievable failure rates of the hardware and software.
e) Determine the typical mission time for the new system. Use the MTBCF objective determined
in step d) and the typical mission time to determine the required reliability for the new system.
f) Determine the typical MTTR for the hardware and the MTSWR for the software. (See 5.3.2.3
Step 5.) Average the MTTR and MTSWR based on the percentage of HW and SW based on the
adjusted failures expected from each. This is the average repair and restore time. The
availability objective is computed using the average restore time and the predicted MTBCF
from step d).

Figure 36 —Derive the quantitative objective if the system has no predecessor


a) Determine the maximum acceptable number of maintenance actions that require a field service
engineer to perform.
b) Determine the expected average duty cycle of the system
c) Using established methods such as a Reliability Demonstration Test (RDT), determine the
maximum failure rate that will support the maximum number of maintenance actions from step
a).
d) Invert the result of step c) to determine the objective MTBF. Adjust it by the expected
percentage of failures that will result in a system abort to yield the MTBSA objective. A system
abort means that the software is no longer performing its function. Adjust the objective MTBF
by the expected percentage of failures that will be critical to yield the MTBCF objective. Adjust
the objective MTBF by the expected number of failures that will be in essential functions to
yield the MTBEFF objective.
e) Determine the typical mission time for the new system. Use the MTBCF objective determined
in step d) and the typical mission time to determine the required reliability for the new system.
f) Determine the typical MTTR for the hardware and the MTSWR for the software. (See 5.3.2.3
Step 5.) Average the MTTR and MTSWR based on the percentage of HW and SW based on the
adjusted failures expected from each. This is the average repair and restore time. The
availability objective is computed using the average restore time and the predicted MTBCF
from step d).

Figure 37 —Derive the quantitative objective if the system is mass deployed
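
As a rough numeric illustration of steps a) through d) of Figure 37, the sketch below works backward from an assumed maximum number of maintenance actions per year and an assumed fleet duty cycle to an MTBF objective, and then derates it to MTBSA and MTBCF objectives; every value is a hypothetical placeholder, and in practice an RDT as referenced in step c) would set the supportable failure rate.

# Hypothetical fleet and support assumptions.
units_deployed            = 500
duty_cycle_hours_per_year = 2_000.0        # per unit
max_maintenance_actions   = 100            # fleet-wide per year (step a)

fleet_hours = units_deployed * duty_cycle_hours_per_year

# Steps c)/d): maximum supportable failure rate and the implied MTBF objective.
max_failure_rate = max_maintenance_actions / fleet_hours   # failures per hour
mtbf_objective = 1.0 / max_failure_rate

# Step d): derate by the expected fraction of failures that are aborts or critical.
fraction_aborts   = 0.30   # assumed fraction of failures causing a system abort
fraction_critical = 0.10   # assumed fraction of failures that are critical
mtbsa_objective = mtbf_objective / fraction_aborts
mtbcf_objective = mtbf_objective / fraction_critical

print(f"MTBF objective  ~ {mtbf_objective:,.0f} h")
print(f"MTBSA objective ~ {mtbsa_objective:,.0f} h")
print(f"MTBCF objective ~ {mtbcf_objective:,.0f} h")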

5.3.2 Perform a SR assessment and prediction

This is an essential task that is a prerequisite for tasks 5.3.3 through 5.3.8. This task can also be used to
help define the reliability objective. See 5.3.1. The lead role for initiating the SR assessment is typically the
reliability engineer or software quality engineer. The assessment requires that software engineers provide
inputs. The results of those inputs are provided to the reliability engineer who will perform the SRE
calculations.

In a SR assessment, the practices employed on a software project are assessed to predict either the risk
level of the software or a SR figure of merit such as defect density. All SR assessments involve some form of
survey or questionnaire. The survey is completed and scored, and the resulting score determines
the defect density prediction that is then used to predict the reliability figures of merit. The survey is also
used for sensitivity analysis. One can determine what the predicted defect density would be, for example,
by instituting a particular change in the planned development practices. This predicted change can then be
compared with the cost of implementing that change and the cost reduction of having fewer defects as
predicted by the sensitivity analysis.

SR assessments also allow a reliability engineer to sanity check the predicted SR results against an
actual range of values. Assessments are also useful for establishing the SR or risk level of a software vendor.
Figure 38 contains the checklist for performing a SR assessment and prediction.


a) Collect data about the project and product such as size and software LRUs. See 5.3.2.1.
b) Select a model to predict software defects or defect density. See 5.3.2.2.
c) Once the model is selected as per 5.3.2.2, proceed to 5.3.2.3, and then to 6.2 or Annex B for
instructions on how to use the selected models.
d) If the software is being developed with an incremental or evolutionary LCM, apply the models as
per 5.3.2.4.
e) If the software is being developed by more than one organization, use the assessment to qualify a
subcontractor, COTS, or FOSS vendor as per 5.3.2.5.

Figure 38 —Checklist for performing a software reliability assessment and prediction

5.3.2.1 Collect data about product and project

The prediction models for SR assessment require three types of data as follows:

a) Software project related data


b) Size data
c) Software LRU data

Project data contains information to identify and characterize each system and allows users to
categorize projects based on application type, development methodology, and operational environment.
Typically a Software Development Plan (SDP) has most of the information needed for the predictions. Size
data is necessary for predicting the defects; smaller systems will usually have fewer defects than larger
systems. The practitioner also needs to know other characteristics of each LRU, such as who is developing
it. Figure 39 is a checklist for collecting the data required for the SR prediction and assessment.

a) Collect the following project information (from the SDP) including:


1) The name of each life-cycle activity (e.g., requirements definition, design, code, test,
operations)
2) The start and end dates for each life-cycle activity
3) The effort spent (in staff months) during each life-cycle activity
4) The average staff size and development team experience
5) The number of different organizations developing software for the project
b) Collect the size data as per B.1.1, B.1.2, and B.1.3. Collect the following for each software LRU
that has been determined to be in scope as per 5.1.1.1.
1) The name of each component
2) Software size in terms of executable source lines of code that specifies the following:
 Number of new lines of code
 Number of reused but not modified lines of code
 Number of reused and modified lines of code. Of these, how many lines are changed due to a
redesign versus a change to only the code (but not the design).


 Number of auto-generated lines of code


 Source language used
c) For each LRU identify the following:
1) Responsible development organization (if there is more than 1 organization developing
software)
2) Average and peak computer resource utilization of this component
3) If there is incremental development, in which increment is this component scheduled to be complete?
4) The particular hardware component associated with this component

Figure 39 —Checklist for collecting product, project and size data
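
The size data gathered in step b) is typically rolled up into a single effective size figure before being multiplied by a predicted defect density (see B.1). The sketch below shows one common way of doing this; the adaptation weights are illustrative assumptions for this example and are not the factors defined in Annex B.

def effective_ksloc(new, modified, reused, autogen,
                    w_modified=0.5, w_reused=0.05, w_autogen=0.1):
    """Roll raw SLOC counts into an effective size in KSLOC.

    The weights express the assumption that modified, reused, and
    auto-generated code contribute proportionally fewer new defects
    than newly written code; they are illustrative only.
    """
    effective_sloc = (new
                      + w_modified * modified
                      + w_reused * reused
                      + w_autogen * autogen)
    return effective_sloc / 1000.0

# Hypothetical LRU size data (SLOC).
size = effective_ksloc(new=40_000, modified=10_000, reused=60_000, autogen=5_000)
print(f"Effective size ~ {size:.1f} KSLOC")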

5.3.2.2 Select a model to predict software reliability early in development

There are several models available for predicting SR before the code is complete. Each model
has different inputs and hence will usually produce different outputs. The process for selecting the best
prediction model is based first on eliminating models that require information that is not available or
that require more inputs than the schedule or budget can accommodate. Once the models that cannot be
used are eliminated, the remaining models are assessed to determine which ones are used most in a
particular industry and which ones are the most current with technology. Table 18 is a summary of the
prediction models.

Key:

Number of inputs—Some models have only one input while others have many inputs. Generally speaking,
the more inputs to the model, the more accurate the prediction. However, more inputs also means that more
effort is required to use the model.

Predicted output—The models predict either defect density or defects. Later in 5.3.2.3 Steps 2 and 3, it
will be shown how to convert defects/defect density to failure rate.

Industry supported—All early prediction models are based on empirical data from real historical projects.
Ideally the empirical data is from software systems that are similar to the software system that is
undergoing the prediction. As an example, if one is predicting the defect density of an airborne system, one
would want to use a model that has empirical data from this type of system.

Effort required to use the model—This is directly related to the number of inputs and the ease of
acquiring the data required for the model.

Relative accuracy—The model accuracy is a function of the number of inputs for the model, how similar
the historical data is to the system under analysis, how many sets of data comprised the model, and how
current the historical data is.


Table 18 —Summary of software reliability prediction models

Model | Number of inputs | Predicted output | Industry supported | Effort required to use the model | Relative accuracy | Year developed/last updated | Reference
Industry tables, Neufelder [B68], SAIC [B77] | 1 | Defect density | Several | Quick | Varies | 1992, 2015 | 6.2.1.2
CMMI tables, Neufelder [B68] | 1 | Defect density | Any | Quick | Increases with CMMI level | 1997, 2012 | 6.2.1.3
Shortcut model, Neufelder [B63] | 23 | Defect density | Any | Moderate | Medium | 1993, 2012 | 6.2.1.1
Full-scale model, Neufelder [B67] | 94 to 299 | Defect density | Any | Detailed | Medium-high | 1993, 2012 | B.2.1
Metric based models, Smidts [B85] | Varies | Defects | Any | Varies | Varies | NA | B.2.2
Historical data | A minimum of 2 | Defect density | Any | Moderate | High | NA | B.2.3
Rayleigh model, Putnam [B71] | 3 | Defects | Any | Moderate | Medium | NA | B.2.4
RADC TR-92-52, SAIC [B77] | 43 to 222 | Defect density | Aircraft | Detailed | Obsolete | 1978, 1992 | B.2.5
Neufelder model, Quanterion [B72] | | Defect density | Any | Detailed | Medium to high | 2015 | B.2.6

Year developed/updated—Ideally, the model should be as current with modern technology as possible.
The Rome Laboratory Model is an example of a model that is partially outdated and specific to one
industry. When the Rome Laboratory Model was developed, object-oriented development was just
emerging, Ada was the default programming language, and waterfall development was the standard. The
parts of the model that are not outdated have been adapted and used as a framework for other models such
as the Shortcut and Full-scale models. Parts of the Rome Laboratory Model can be used to calibrate
historical defect data that has been collected either internally or from any of the lookup tables shown in
Table 18. There are several industry tables that map the industry type or CMMI level to defect density.
Several of these lookup tables are outdated. See 6.2.1 for the most current tables.

Figure 40 is a checklist for selecting a SR prediction model based on the preceding characteristics.


a) Review Table 18.


b) Eliminate models that cannot be used due to insufficient resources to collect the number of inputs
required for the model.
c) Eliminate models that cannot be used due to unavailable data.
d) If more than one model remains, determine a trade-off between accuracy and effort required to use
the model. If one is interested in a quick ballpark prediction, then eliminate the models that require
more resources. If one is interested in the most accurate prediction possible, then eliminate the
models with fewer inputs with the understanding that this will require more effort to use the
model(s).
e) If any models are still remaining, eliminate the models with the oldest data.
f) Proceed to 5.3.2.3 to execute the model(s).

Figure 40 —Checklist for selecting a model to predict reliability early in development
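
The elimination steps in Figure 40 can be thought of as a simple filter over the rows of Table 18. The sketch below encodes a few of those rows as records and screens them against a project's available data, input budget, and effort budget; the effort ranking and the data-availability flags are simplifying assumptions.

# A few rows of Table 18, encoded for screening (abbreviated and illustrative).
MODELS = [
    {"name": "Industry tables",  "inputs": 1,   "effort": "quick",
     "accuracy": "varies",       "data_available": True},
    {"name": "Shortcut model",   "inputs": 23,  "effort": "moderate",
     "accuracy": "medium",       "data_available": True},
    {"name": "Full-scale model", "inputs": 299, "effort": "detailed",
     "accuracy": "medium-high",  "data_available": False},
]

EFFORT_RANK = {"quick": 1, "moderate": 2, "detailed": 3}

def screen(models, max_inputs, max_effort):
    """Steps b) through d): drop models the project cannot feed or afford."""
    return [m for m in models
            if m["data_available"]
            and m["inputs"] <= max_inputs
            and EFFORT_RANK[m["effort"]] <= EFFORT_RANK[max_effort]]

for m in screen(MODELS, max_inputs=50, max_effort="moderate"):
    print(m["name"], "-", m["accuracy"])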

5.3.2.3 Use the selected software reliability prediction model

Software reliability prediction models are used during the requirements and design phases for the following
purposes:

 During the requirements phase, system and SR prediction models verify that the set reliability
requirements are achievable prior to defining contractual requirements or prior to entering into the
design phase of the system life cycle.
 During the design phase system and SR prediction models verify that the preliminary and critical
designs being proposed can achieve and meet the reliability requirements prior to commencing
system production.
The process for predicting the reliability figures of merit early in development is shown in Figure 41.

Figure 41 —Process for predicting reliability figures of merit early in development


The steps for predicting SR early in development are shown in Table 19.

Table 19 —Prediction steps


Step Reference
Step 1. Predict defects that are expected to be found in either testing or field operation. 5.3.2.3 Step 1, 6.2.1,
B.1, and B.2.
Step 2. Predict when the defects will become observed faults over time. 5.3.2.3 Step 2, 6.2.2
Step 3. Predict the failure rate of the software by dividing the fault profile by the 5.3.2.3 Step 3
estimated duty cycle of the software for each month of operation.
Step 4. If reliability is an applicable metric then predict the reliability from the predicted 5.3.2.3 Step 4
failure rate and mission time.
Step 5. If availability is an applicable metric then predict the availability from the 5.3.2.3 Step 5
predicted failure rate and MTSWR.
Step 6. Update the models whenever the predicted size changes or whenever there is a
change in personnel, processes, techniques, tools, etc.

Step 1—Predict total defects

The steps for predicting defects via defect density are shown in Figure 42.

a) Predict testing defect density using the model(s) selected in Table 18. The specific instructions
for the defect density models are found in 6.2.1 and B.2.
b) Predict operational defect density using the model(s) selected in Table 18. The specific
instructions for the defect density models are found in 6.2.1 and B.2.
c) Predict the effective size as discussed in B.1.
d) Multiply the result of step a) by the result of step c) to yield the predicted number of testing
defects.
e) Multiply the result of step b) by the result of step c) to yield the predicted number of defects found
in operation, post production release.

Figure 42 —Step 1: Predict defects via defect density
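
A worked arithmetic sketch of steps a) through e) follows. The defect densities would come from the model selected in 5.3.2.2 and the effective size from B.1; the values used here are assumptions for illustration only.

# Assumed outputs of the selected defect density model (defects per KSLOC).
testing_defect_density     = 4.0
operational_defect_density = 1.5

effective_size_ksloc = 55.0    # from step c), see B.1 (assumed value)

predicted_testing_defects     = testing_defect_density * effective_size_ksloc
predicted_operational_defects = operational_defect_density * effective_size_ksloc

print(f"Predicted testing defects:     {predicted_testing_defects:.0f}")
print(f"Predicted operational defects: {predicted_operational_defects:.0f}")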

Step 2—Predict the defect distribution

In order to convert a defect prediction to a failure rate prediction, one needs to predict when the defects will
be discovered over usage time. Recall that a discovered defect is essentially a fault. One of the approaches
in Table 20 is used to determine when the defects will manifest as faults. All of the methods in
Table 20 can be and are used for forecasting reliability growth; these models are used to forecast when the
predicted defects will be discovered.


Table 20 —Methods to forecast reliability growth/predict when faults will occur

Exponential model (see SAIC [B77], Musa [B59]):
    Assumptions: MTBF increases exponentially up until the estimated growth period.
    Data collection required: Minimal. See 6.2.2.1.
Duane model (see [B14]):
    Assumptions: MTBF flattens out over test time. MTBF increases much more slowly than the exponential model.
    Data collection required: Shape parameters from similar software projects.
AMSAA PM2 (see [B3]):
    Assumptions: MTBF flattens out over test time. MTBF increases much more slowly than the exponential model. Provides for a conservative and slightly less aggressive reliability growth curve.
    Data collection required: Shape parameters from similar software projects. See 6.2.2.2.

Both Duane and AMSAA PM2 have MTBF curves that will tend to flatten out over test time. The MTBF
will still increase, but much more slowly than with the exponential model. This is illustrated in Figure 43.

[Figure: Comparison of the Exponential model versus Duane/AMSAA PM2 — predicted MTBF in hours plotted against test time for both models.]

Figure 43 —Trend with the Exponential model and AMSAA PM2 Model
The shape parameters for the Exponential model are defined based on the type of system. See 6.2.2.1 for
procedures. The Duane and AMSAA PM2 models can be used as predictors if one has historical data from
similar programs to derive the shape parameters. See 6.2.2.2 for procedures.

Figure 44 is an example of a fault profile that results from Step 2 using the Exponential model. The
following fault profile will be used to predict the failure rate in Step 3. Note that even though faults are
measured as integers, fractional values are used for the predicted faults to allow for more accuracy for the
other metrics in Steps 3 to 5.


Figure 44 —Step 2: Example fault profile from the Exponential model
Step 3—Predict failure rate and MTBF

In the previous step, the defects that are expected to be found over usage time were predicted. However, to
predict a point in time failure rate or a failure rate profile, one needs to establish how many hours the
software will be operating for each monthly interval. The duty cycle is how much the software is operating
on a daily or monthly basis. Some software configuration items may operate continuously while others may
operate infrequently. The goal of this step is not to identify the duty cycle of every individual function but
rather to identify the duty cycle of each software LRU that will appear on the RBD. The checklist of things
to consider when predicting the duty cycle is shown in Table 21:

Table 21 —Checklist for estimating duty cycle

Consideration: Once deployed, will the typical system, and therefore the software, be running continuously?
    Examples: Refrigerators, missile detection systems, etc.
Consideration: Once deployed, will the typical system, and therefore the software, be operating as a function of the work week?
    Examples: Industrial lighting systems, security systems, commercial vehicles, etc.
Consideration: Once deployed, will the typical system, and therefore the software, be operating as a function of a particular mission?
    Examples: Dishwashers, aircraft, spacecraft, military vehicles, etc.

Example #1—Duty cycle for a system related to work hours: A manufacturer of commercial lighting
systems knows that the typical customer will have office hours spanning from 7 am to 6 pm Monday
through Friday. The predicted duty cycle is therefore 55 h per week or 232 h per month.

Example #2—Duty cycle for a mission-oriented system: A manufacturer of dishwashers knows that the
average customer for this particular model will run the dishwasher once every day. It is also known that the
dishwasher cycle is 1 h long. Hence the duty cycle estimate is 1 h per day or 30 h per month.

Figure 45 is the checklist for predicting the failure rate, MTBF, MTBSA, and MTBEFF profiles.

a) Consider one “typical” deployed system as opposed to a system that is in development. For example,
hundreds of military vehicles may be deployed but the goal is to estimate the typical duty cycle of one of
those vehicles.
b) Compute the duty cycle as per Table 21.
c) The failure rate is computed by dividing the fault profile predicted in the preceding step by the duty cycle of the
software for each month of operation.
 Predicted λ (month i) = Faults predicted for that month / Ti
 Predicted MTBF(month i) = Ti/ Faults predicted for that month


d) Identify, either from industry averages or from past historical data, the fraction of faults expected to result
in a system abort, essential function failure, or critical failure. Adjust the estimated MTBF by
dividing by these fractions.
 Predicted MTBCF (month i) = MTBF(month i)/fraction of faults expected to be critical
 Predicted MTBSA (month i) = MTBF(month i)/fraction of faults expected to result in a system abort
 Predicted MTBEFF (month i) = MTBF(month i)/fraction of faults expected to result in an essential
function failure.
Where: Ti = operational duty cycle during one month for one instance of the system
e) Since this is an exponential model, the predicted MTBFi is simply the inverse of the predicted failure
rate λi

Figure 45 —Step 3: Predict the failure rate, MTBF, MTBSA, MTBEFF profile
The failure rate predictions take into account defects of every severity level since the defect density
prediction models take into consideration defects of every severity level. However, the model does assume
that every defect is significant enough to ultimately be corrected. The mean time between failures (MTBF)
is a prediction of any failure that is noticeable and ultimately needs to be removed. Those failures can range
from catastrophic to noticeable. Some failures may result in a system abort (the entire system is down for
some period of time). Some failures may result in an essential function failure (a loss of a required system
function but the system is still operating). If one wishes to predict the mean time between critical failure
(MTBCF), mean time between system abort (MTBSA), or mean time between essential function failure
(MTBEFF), one simply adjusts the predicted MTBF by the percentage of total faults that typically result in
those three categories of failure.
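The following Python sketch illustrates the Step 3 arithmetic for a single software LRU: the predicted faults for each month (from Step 2) are divided by that month's duty cycle Ti to obtain a failure rate, the MTBF is its inverse, and MTBCF, MTBSA, and MTBEFF are obtained by dividing the MTBF by assumed fractions of faults in each category. The fault profile, duty cycle, and fractions shown are placeholders.

# Hypothetical sketch of Step 3 for one software LRU.
# faults_per_month comes from the Step 2 fault profile; duty_cycle_hours is Ti,
# the operational hours per month for one typical deployed system.
faults_per_month = [3.2, 2.5, 1.9, 1.4, 1.1]   # predicted faults per month (assumed)
duty_cycle_hours = 232.0                        # Ti, hours of operation per month (assumed)

# Fractions of faults expected to fall in each failure category (assumed).
frac_critical = 0.08
frac_system_abort = 0.05
frac_essential_function = 0.20

for month, faults in enumerate(faults_per_month, start=1):
    failure_rate = faults / duty_cycle_hours      # predicted lambda(month i)
    mtbf = duty_cycle_hours / faults              # predicted MTBF(month i) = 1/lambda
    mtbcf = mtbf / frac_critical                  # mean time between critical failures
    mtbsa = mtbf / frac_system_abort              # mean time between system aborts
    mtbeff = mtbf / frac_essential_function       # mean time between essential function failures
    print(f"Month {month}: lambda={failure_rate:.4f}/h  MTBF={mtbf:.0f} h  "
          f"MTBCF={mtbcf:.0f} h  MTBSA={mtbsa:.0f} h  MTBEFF={mtbeff:.0f} h")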

Step 4—Predict reliability

Reliability is the probability of success over some specified mission time. Once the failure rate is predicted,
the reliability can be predicted by using Equation (1):

Reliability (mission time) = e^(–mission time/MTBCFi)   (1)

where mission time is the amount of usage time that corresponds to one mission of the software.

For example, if the system is a dishwasher, a mission would be running the dishwasher one time under
specific conditions. If the system is an aircraft, a mission would be one typical flight under specified
conditions.

The MTBCF is the array of values predicted in Step 3. Note that the MTBCF is typically used in the
reliability estimates because typically only the critical defects affect reliability. The practitioner should
verify that the adjustment factor used to predict MTBCF is based on the percentage of faults that affect
reliability. Since the predicted MTBCF is an array of values over the reliability growth period then the
reliability prediction is also an array of values over the reliability growth period. Figure 46 is the checklist
for predicting reliability.


a) Determine the expected mission time for the software. The mission time is not the same as the
duty cycle. The mission time of an aircraft for example would be the average number of hours of
an average flight while the duty cycle would be the total number of flight hours per interval of
time such as a month.
b) Solve for the predicted reliability of that mission time as follows. Note that this is an array of
values over the growth period of the software, since the failure rate prediction is itself an array of
values over the growth period, as shown in the following formula:
Predicted reliability (mission time) = e^(–mission time/MTBCFi)

Figure 46 —Step 4: Predict reliability
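A short Python sketch of Step 4 under the same assumptions: the predicted MTBCF array from Step 3 is converted into an array of mission reliabilities using Equation (1). The MTBCF values and mission time are placeholders.

import math

# Hypothetical sketch of Step 4. mtbcf_by_month would come from Step 3;
# mission_time_hours is the assumed length of one mission.
mtbcf_by_month = [900.0, 1160.0, 1520.0, 2070.0, 2640.0]   # hours (assumed)
mission_time_hours = 1.5                                    # hours (assumed)

reliability_by_month = [math.exp(-mission_time_hours / mtbcf) for mtbcf in mtbcf_by_month]
for month, r in enumerate(reliability_by_month, start=1):
    print(f"Month {month}: predicted mission reliability = {r:.5f}")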

Step 5—Predict Availability

Availability can be computed for the software just as it is for any other component. Once the MTBF is
known, the availability for a continuous system can be predicted as shown in Equation (2). Note that this
formula is a limiting form. The actual availability over a short period may be different.

Availability = MTBCFi/ (MTBCFi + MTSWR) (2)

where MTBCFi = array of values computed in Step 3.

Since software does not wear out, MTTR does not apply. However, mean time to software restore
(MTSWR) does apply. MTSWR is computed as a weighted average of the possible restore activities as
shown in Table 22. (See Neufelder [B67].)

Table 22 —Software restore activities

Restart: The application is restarted without rebooting the computer.
    How to compute the time: The amount of time between when the failure occurs and when the software is exactly where it was before the restart.
Reboot: Both the hardware and software are restarted.
    How to compute the time: The amount of time between when the failure occurs and when the software is exactly where it was before the restart.
Downgrade: The software needs to be downgraded to a prior release because a problem cannot be avoided.
    How to compute the time: The amount of time between when the failure occurs and when the downgrade is complete and the software is performing the function that previously failed.
Reinstall: The software installation has become corrupted so it is reinstalled.
    How to compute the time: The amount of time between when the failure occurs and when the reinstallation is complete and the software is performing the function that previously failed.
Workaround: The software failure cannot be avoided by any of the previous means; however, there is an alternative means for the user to complete the function that failed.
    How to compute the time: The amount of time between when the failure occurs and when the end user has completed the workaround.
Corrective action: The software failure cannot be avoided by any of the previous means. The defect is corrected and a new version is deployed and installed.
    How to compute the time: The amount of time between when the failure occurs and when the new software release is installed and the software is performing the function that previously failed.


Since the MTBF is an array of values over the reliability growth period then the availability prediction is
also an array of values over the reliability growth period. Figure 47 is the checklist for predicting
availability.

a) Predict the MTSWR using Table 22.


b) Solve for the predicted availability as follows. Note that this is an array of values over the
growth period of the software, since the failure rate prediction is itself an array of values over the
growth period, as shown in the following formula:
Availabilityi = MTBFi / (MTBFi + MTSWR)

Figure 47 —Step 5: Predict availability
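The Python sketch below illustrates Step 5: MTSWR is computed as a weighted average of the restore activities in Table 22 (the restore times and weights shown are assumed), and the availability is then computed for each month of the predicted MTBF array.

# Hypothetical sketch of Step 5. Restore times (hours) and the fraction of
# failures resolved by each Table 22 activity are assumed values.
restore_activities = {
    "restart":           (0.05, 0.50),   # (time in hours, fraction of failures)
    "reboot":            (0.15, 0.20),
    "workaround":        (1.00, 0.20),
    "corrective action": (40.0, 0.10),
}
mtswr = sum(time * fraction for time, fraction in restore_activities.values())

mtbf_by_month = [72.0, 93.0, 122.0, 166.0, 211.0]   # predicted MTBF array from Step 3 (assumed)
for month, mtbf in enumerate(mtbf_by_month, start=1):
    availability = mtbf / (mtbf + mtswr)
    print(f"Month {month}: MTSWR={mtswr:.2f} h, predicted availability = {availability:.4f}")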


For a complete example of a SR prediction see F.3.2.

5.3.2.4 Apply software reliability models with incremental development

If the software is being developed in more than one increment, that will impact how the prediction models
are used. Table 23 lists the considerations pertaining to the incremental or evolutionary models and
guidance for how to apply the prediction models.

Table 23 —Applying the prediction models to evolutionary and incremental development

Consideration: If there is an incremental LCM, what do the increments consist of? Will the requirements be evolving in each of the increments, or will the requirements be defined up front and the design, code, and test activities evolve over several increments?
    How to apply the predictions: If the requirements are not evolving in each increment, then update the size estimates at the end of each increment and establish one prediction that is updated at each increment. See the following option #1. If the requirements are evolving in each increment, then do a separate prediction for each increment and then overlay each increment. See the following option #2.
Consideration: How many internal releases (releases that are made available to development and test engineers but not to customers) will be made prior to the final release?
    How to apply the predictions: Update the size estimations whenever there is an internal release.
Consideration: How many external releases (releases that are made available to at least one customer) will be made prior to the final release?
    How to apply the predictions: Each external release is subject to its own prediction.
Consideration: What features are planned for each increment?
    How to apply the predictions: Predict the estimated size of those features in that increment and then use the prediction models in 5.3.2.3 and 6.2.

The prediction models have one thing in common. They predict the reliability figures of merit for a
particular software release. When there will be incremental development, one has two choices for
predicting the figures of merit as shown in Figure 48.


a) If the requirements are defined at the beginning of the increment and subsequent increments are
design/code/test then
1) Predict the estimated size of the features in a particular increment.
2) For each incremental release, predict the number of defects for each incremental release using
the methods in 5.3.2.3 Steps 1 and 2.
3) Add the predicted defects together from step 2).
4) Estimate the duty cycle of a typical operational system as per 5.3.2.3 Step 3.
5) Divide the predicted defects by the predicted duty cycle for an operational system as per
5.3.2.3 Step 3 to yield a predicted failure rate for the final operational release.
b) If the requirements are evolving with each increment then
1) Predict the estimated size of the features in a particular increment.
2) Predict the number of defects for each incremental release using the methods in 5.3.2.3
Steps 1 and 2.
3) Predict when those defects will manifest into faults for each increment and plot this over
calendar time. Each of the fault profiles will generally overlap.
4) Determine the overall fault profile by combining the overlapped defect profiles for each
month of testing or operation.
5) Estimate the duty cycle of a typical operational system as per 5.3.2.3 Step 3.
6) Divide the total predicted defects by the predicted duty cycle for an operational system as per
5.3.2.3 Step 3 to yield a predicted failure rate for the final operational release.
c) Update the size predictions whenever there is an internal release.
d) Update the overall predictions whenever there is an external release.

Figure 48 —Checklist for performing a software reliability assessment and prediction

Refer to F.3.3 for examples of each of the preceding.
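The following Python sketch illustrates option b) of Figure 48 for two assumed increments: each increment's predicted fault profile is offset by its start month, the overlapping profiles are summed into one overall profile, and the combined profile is divided by the duty cycle. The profiles, start months, and duty cycle are placeholders.

# Hypothetical sketch of combining overlapping increment fault profiles (Figure 48, option b).
increment_profiles = {
    # increment name: (start month on the calendar, predicted faults per month)
    "increment 1": (0, [2.0, 1.6, 1.2, 0.9, 0.7]),
    "increment 2": (2, [1.5, 1.2, 0.9, 0.7, 0.5]),
}
duty_cycle_hours = 232.0   # Ti, operational hours per month (assumed)

horizon = max(start + len(profile) for start, profile in increment_profiles.values())
overall = [0.0] * horizon
for start, profile in increment_profiles.values():
    for i, faults in enumerate(profile):
        overall[start + i] += faults          # overlay the overlapping profiles

for month, faults in enumerate(overall, start=1):
    print(f"Month {month}: combined faults={faults:.2f}, "
          f"predicted failure rate={faults / duty_cycle_hours:.4f}/h")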

5.3.2.5 Use the assessment to qualify a subcontractor, COTS, or FOSS Vendor

This task is applicable whenever the list of LRUs that are applicable for a SR prediction (from 5.1.1.1)
includes an LRU developed by a third party. There are three types of software vendors, as follows:

a) Subcontractor—This is an organization that is developing new software under a contractual relationship.
Note that this does not apply to individual software contractors but rather to an entire subcontractor
organization.
b) COTS—Commercial-off-the-shelf software—This software is purchased off the shelf. Usually the
source code is not supplied by the vendor. The vendor does not usually have any contractual
relationship with the development organization.
c) FOSS—Free open source software—This type of software usually has the source code available.
However, its quality is often unknown because a variety of individuals work independently of each
other, and the development methods and processes are unknown or often nonexistent.
All of the prediction models shown in the previous subclause can be applied to subcontractors, COTS, and
FOSS vendors. From a prediction standpoint, the major difference between vendor-supplied software and
COTS software is the availability of the code for size estimates. A vendor generally has a working


agreement with the organization developing the system. That working agreement may provide for exchange
of a software development plan, which typically describes the software development practices that are used
to predict defect density as well as the size estimates that are needed to convert the defect density to
defects, which is then converted to failure rate. If so, any of the models shown in 6.2 are feasible. Each
vendor should have a separate prediction. If the vendor is supplying multiple software configuration items
then there should be a separate prediction for each item. The practitioner should request the software
development plan and the size estimates in 1000 source lines of code (KSLOC) well in advance to provide for a
timely delivery of data.

The types of software vendors are shown in Table 24.

Table 24 —Types of software vendors

COTS:
    Feasible prediction models: Lookup tables.
    Ability to assess development practices: The development practices of the COTS vendor are often unknown and/or difficult to obtain.
    Ability to assess the size of the software in KSLOC: Generally none. However, the size can be estimated from the executable size using the methods shown in B.1.2.
FOSS:
    Feasible prediction models: Lookup tables.
    Ability to assess development practices: The development practices of the FOSS are often unknown and/or difficult to obtain.
    Ability to assess the size of the software in KSLOC: Generally the source code is available and hence the KSLOC can be estimated via the methods in B.1.1 for high-level software and B.1.3 for firmware.
Subcontractor:
    Feasible prediction models: Any.
    Ability to assess development practices: Generally the contract provides for a software development plan as well as other software deliverables.
    Ability to assess the size of the software in KSLOC: Generally the source code is available and hence the KSLOC can be estimated via the methods in B.1.1 for high-level software and B.1.3 for firmware.

Figure 49 is a procedure for qualifying a vendor as well as predicting the reliability of the vendor supplied
software LRUs. The procedure can be tailored based on the criticality and risk of the particular vendor
supplied LRU.

COTS LRUs

COTS are an essential ingredient in enterprise and embedded systems. Examples of COTS include
operating systems and middleware. Establishing a reliability model requires a commitment from the vendor
to provide reliability metrics. COTS manufacturers typically do not have a vendor relationship with the
development organization because the software is purchased off the shelf. In some cases the COTS
manufacturer may be willing to provide reliability data for a specific environment and COTS configuration.
The following instructions assume that such data is not available.

The difficulty with including COTS in a prediction is that generally the practitioner does not have access to
the following:

a) Effective size of the software


b) Development practices employed to develop it
c) Testing defects, defect rates, or defect profiles

The difficulty is in predicting the number of defects that will be attributed to the COTS component. Once
the defects are predicted, the reliability figures of merit can be predicted using 6.2, Annex B, and 5.3.2.3 Steps 3
to 5. There are three approaches for predicting the number of defects from a COTS software LRU. An
example of the first approach can be found in F.3.2 step 1. The checklist for assessing SR of COTS LRUs
is shown in Figure 50.


a) Identify all subcontractors, COTS, and FOSS vendors. Include only the vendors that are supplying
software that will be deployed with the system. For example, do not include vendors that are
supplying development tools. Assess as much of the following as possible for each organization:
1) Management—Identify whether the vendor has any process in place to set the SR goals and
SR plans.
2) Organizational structure and development life cycle—Identify whether the roles and
responsibilities are clearly defined and understood and whether there is an approved product
development methodology.
3) Quality system—It is important to determine whether there is a culture of continuous process and
product improvement in place and whether there is a closed feedback loop that takes the lessons learned
and builds them back into the development process. Also assess how strong the training program is,
since it is essential to make sure all members of the design team have similar exposure to the
product development culture. Is there a defect tracking and review process in place?
4) Software development—The goal is to gain confidence that the team follows the software
development process. Determine if there is standardization around tools, RCA, defect
prediction and reduction, overall software robustness, and finally coding standards.
5) Testing—Investigate the vendor's testing strategy and the types of testing being performed, such as unit,
integration, system, and solution testing. Is there a SR demonstration process in place,
how was the target failure rate determined, and what types of issues were found during the system
software testing? Are all the issues being categorized properly in terms of severity? What is
the timeline to close the issues, and what is their definition of closed?
6) Software development processes—Verify that the processes employed match the defined
processes and that those processes are sufficient for delivering reliable software.
b) If the results of step a) indicate deficiencies in management, organization structure and LCM,
quality system, software development, testing and processes, then proceed to steps c) to e) as
applicable and then proceed to the sensitivity analysis in 5.3.7 to determine the overall impact of
the vendor risks on the system prediction. If the vendor supplied software is relatively small in
comparison to the rest of the system it is possible that the vendor is acceptable even with risks. On
the other hand, if the vendor is supplying a relatively large amount of effective code, then
deficiencies may significantly impact the system reliability and hence may require selection of an
alternative vendor.
c) Identify all subcontractor developed components. Apply 5.3.2.2 and 5.3.2.3 to assess each of the
subcontractors and establish a reliability prediction for each subcontractor supplied LRU.
d) Identify all COTS LRUs and vendors. Assess each COTS vendor to establish a reliability
prediction for each COTS LRU.
e) Identify all FOSS LRUs and vendors. Assess each FOSS vendor to establish a reliability
prediction for each FOSS LRU.
f) The resulting assessments from these predictions are merged into the system model just like the
other software LRUs.

Figure 49 —Checklist to assess the qualification of a subcontractor, COTS, or FOSS vendor


a) Assuming that the vendor development practices are unknown, use the industry or CMMI level
defect density prediction models. Assuming that the vendor’s code size in either effective
1000 source lines of code (EKSLOC) or function points is unknown, estimate the effective size via
the size of the executable using the procedure in B.1. Multiply the predicted defect density by the
predicted EKSLOC to yield the total predicted defects.
OR
b) Predict the defects based on past history with this COTS (i.e., how many defects were found in this
COTS component on a previous similar system?)
OR
c) If the particular COTS has been operational for at least 3 years and there are many installed sites
and the original manufacturer is still in business, it may be feasible to assume that the impact of the
COTS component is negligible.

Figure 50 —Checklist for predicting software reliability on COTS components
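As a minimal sketch of option a) in Figure 50, the Python snippet below multiplies an assumed industry-average defect density (which would come from the lookup models in 6.2) by an effective size assumed to have already been estimated from the executable via B.1. Both values are placeholders.

# Hypothetical sketch of Figure 50, option a): COTS defect prediction when the
# vendor's practices and source size are unknown. Values are placeholders.
industry_defect_density = 0.6      # defects per EKSLOC (assumed lookup value)
cots_effective_size_eksloc = 40.0  # EKSLOC estimated from the executable size (assumed, per B.1)

predicted_cots_defects = industry_defect_density * cots_effective_size_eksloc
print(f"Predicted COTS defects: {predicted_cots_defects:.1f}")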

FOSS LRUs

The primary difference between COTS and FOSS is the vendor’s business model. The COTS vendor
assumes all risks with the product as part of the cost while the FOSS vendor does not. FOSS presents a
unique challenge to organizations. There is no standard LCM or a technical solution that considers quality
of service (or attributes or non-functional requirements).

From a prediction standpoint, the major difference between open-sourced software and COTS is an
understanding of the development practices employed on the software, since the software may be written
by several different people with differing development practices. On the other hand, the size of
the software is often known for open-sourced software. The steps are shown in Figure 51.

a) Has the FOSS component ever been used for a previous system? If so, compute the actual number
of defects (even if it is zero) from that previous system, use that for the prediction and do not
proceed to the next step. Otherwise proceed to step b).
b) If the particular FOSS has been operational for at least 3 years and there are many installed sites
with no known software failures, it may be feasible to assume that the impact of the FOSS
component is negligible. Otherwise go to step c).
c) Assuming that the FOSS development practices are unknown, use the industry level lookup model
to defect density since the application type is probably the only known characteristic.
d) Count the number of KSLOC directly from the code using the methods in B.1.1.
e) Multiply the result of step d) by 0.1 if the FOSS software has been deployed for at least 3 years.
f) Estimate the total number of installed sites for the FOSS component. Multiply this by the appropriate
entry from Table B.5.
g) Multiply the result of step f) by the predicted defect density in step c).

Figure 51 —Checklist for predicting software reliability on FOSS components


5.3.3 Sanity check the prediction

This task is recommended if the SR practitioner does not have knowledge of past SR figures of merit to
sanity check the results of 5.3.2. The leader for this task is typically the reliability engineer. The
acquisitions organization may also sanity check the predictions as part of their monitoring and assessment
activity.

Despite the fact that SR is more than 50 years old, hardware reliability prediction has been employed in
industry much longer than SR prediction. That means that engineers have more resources and knowledge
available for sanity checking hardware predictions than for sanity checking software predictions. When a
reliability engineer performs a prediction on a software LRU, the engineer may experience some level of
angst if they do not have some actual reliability numbers to compare it to. If one makes a simple
mathematical mistake, is there a way to detect it? If the software has 5 million lines of code, what is a
reasonable prediction?

This subclause presents some typical reliability estimates based on how large the software is and how long it has
been deployed. If the practitioner has actual field data as discussed in 5.6, then that field data can be used
for the sanity check instead of Table 25. Be advised that Table 25 is based on the number of full-time
software engineers working on a particular release. Differences in productivity and product maturity and
inherent product risks can affect the actual MTBF. One should use Table 25 as a relative guideline for
sanity checking. Larger projects will have more faults than smaller projects when everything else is equal.
The number of software engineers working on the software is often a good indicator of the size of the
software.

The MTBF at initial deployment is the MTBF at the point in time at which the development organization
has completed all tests and is deploying the software to the field for the first time. The MTBF after 1 year
of deployment is the MTBF after 1 full year of reliability growth on real operational systems with real end
users and no new feature additions. Table 25 applies to each software LRU. A system may be composed of
several software LRUs so one should apply the sanity checking based on the number of people working on
each software LRU. (See Neufelder [B68].)


Table 25 —Typical MTBF in hours for various software sized systems

Columns: worst case / average / best case MTBF at initial deployment, followed by worst case / average / best case MTBF after 1 year of 24/7 operation with no new feature drops. Size ranges are in software people years and include only those writing code.

One-person project: 70 / 2600 / 18 500 at initial deployment; 750 / 7500 / 52 000 after 1 year
Very small (2–9 software people years): 14 / 550 / 3700 at initial deployment; 150 / 1500 / 10 500 after 1 year
Small to medium (10–49 software people years): 2 / 100 / 625 at initial deployment; 25 / 250 / 1750 after 1 year
Medium (50–99 software people years): 1 / 35 / 250 at initial deployment; 10 / 100 / 700 after 1 year
Large (100–149 people years): 1 / 25 / 150 at initial deployment; 6 / 60 / 425 after 1 year
Very large (200 or more people years): very small / 15 / 100 at initial deployment; 4 / 40 / 275 after 1 year

NOTE—The people years apply only to those people writing the code. Worst case = deficient development practices or many project risks, slow growth after deployment. Average case = average development practices with 1 or 2 major risks, average growth after deployment. Best case = superior development practices and no inherent risks, fast growth after deployment. All columns are shown with the average number of people in that group. Note that 1 year of operation means that the software is running continually.

Table 26 is a guideline for the percentage of software failures that will result in a reliability- or availability-related failure.

Table 26 —Typical severity percentages

Percentage of defects that affect availability and need to be corrected immediately: from almost 0 to 20% (a)
Percentage of defects that affect availability but do not need an immediate corrective action: from almost 0 to 65% (b)

a According to Neufelder [B68], the average for severity 1 defects = 8%. According to Jones [B42], the average = 1%.
b According to Neufelder [B68], the average for severity 2 defects = 33%. According to Jones [B42], the average = 15%.

The checklist for sanity checking the prediction is in Figure 52.


For each software LRU in the system:

a) Predict the MTBF for each LRU for the first month of deployment and for 12 months of
operational usage.
b) Identify the approximate number of full-time software engineers who are developing the code for
that LRU. Do not include management, test engineers, or SQA engineers.
c) Locate the appropriate row in Table 25 and identify the best, worst, and average case for the MTBF
immediately upon deployment and after 12 months of usage.
d) Compare the predicted values from step a) to the values from step c).
e) If the predicted values are not within the associated best to worst case range, revisit 5.3.2 and verify
that the size, assessment inputs, and reliability growth inputs are valid. Revisit all computations to
verify that the correct units of measure have been used. For example, make sure that size estimates
in terms of SLOC have been properly converted to KSLOC.
f) Compare Table 26 to the estimates for the percentage of faults that impact availability as per Step 3
of 5.3.2.3. If the percentages are not in range, revisit Step 3 of 5.3.2.3 and then recalculate the
reliability and availability predictions in steps d) and e).

Figure 52 —Checklist for sanity checking the early prediction
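The Python sketch below automates steps c) to e) of Figure 52 for the initial-deployment columns of Table 25; the predicted MTBF and team size being checked are placeholders, and the worst case entry for the largest size range is treated as unspecified because Table 25 lists it only as "very small".

# Hypothetical sketch of the Figure 52 sanity check against Table 25
# (worst/average/best case MTBF in hours at initial deployment).
table_25_initial_mtbf = [
    # (min people years, max people years, worst, average, best)
    (1,   1,     70,   2600, 18500),
    (2,   9,     14,    550,  3700),
    (10,  49,     2,    100,   625),
    (50,  99,     1,     35,   250),
    (100, 149,    1,     25,   150),
    (200, 10**6, None,   15,   100),   # worst case listed only as "very small"
]

def sanity_check(predicted_mtbf_hours, software_people_years):
    """Return True if the predicted MTBF falls in the Table 25 worst-to-best range."""
    for low, high, worst, average, best in table_25_initial_mtbf:
        if low <= software_people_years <= high:
            lower_bound = worst if worst is not None else 0.0
            in_range = lower_bound <= predicted_mtbf_hours <= best
            print(f"Predicted {predicted_mtbf_hours} h vs range [{lower_bound}, {best}] h "
                  f"(average {average} h): {'OK' if in_range else 'revisit 5.3.2 inputs'}")
            return in_range
    raise ValueError("No Table 25 row covers this team size")

sanity_check(predicted_mtbf_hours=80.0, software_people_years=25)   # placeholder LRU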

5.3.4 Merge the software reliability predictions into the overall system prediction

This is an essential task if the system is comprised of elements other than software. This task is typically
performed by the reliability engineer. The acquisitions organization will typically review the result of this
task. Once the SR predictions are completed, they are merged with the hardware reliability predictions into
the system reliability model, which is usually a reliability block diagram (RBD). The RBD may have
redundancy, voting algorithms, or dependencies between hardware and software, which are described in
Table 27.

Table 27 —Block diagram configurations


Configuration System reliability calculation
No redundancy If the hardware and software modes of failure are independent, then the system reliability
can be treated as the product of the hardware and SR, and a separate model can be made for
the hardware and software.
Voting algorithms The software is in series with the hardware it supports. The redundant configuration is in
series with a voting algorithm.
Hybrid reliability Complex systems that are comprised of hardware, electronics, and software that are not
model independent require a hybrid reliability model. For the purposes of reliability modeling
hardware and electronics are modeled as hardware with their own individual failure rates.

The steps for merging software predictions into the overall system predictions are shown in Figure 53.


a) Obtain the RBD for the hardware and/or system as well as the hardware reliability predictions.
b) Obtain the SR predictions from 5.3.2.
c) Add software LRUs to the RBD as per the following instructions.
d) Compute the system reliability as per 5.3.4.
e) Keep the RBD up to date throughout the life cycle and particularly whenever the size predictions or
the predictions from 5.3.2 change.

Figure 53 —Checklist for merging software reliability predictions into overall system prediction

5.3.4.1 No redundancy

Consider the following example. A railroad boxcar will be automatically identified by scanning its serial
number (written in bar code form) as the car rolls past a major station on a railroad system. Software
compares the number read with a database for match, no match, or partial match. A simplified hardware
graph for the system is given in Figure 54, and the hardware reliability, R(HW), in Equation (3):

R(HW) = RS × RC × RD × RP   (3)

where

RS = predicted reliability of the scanner
RC = predicted reliability of the computer
RD = predicted reliability of the disk storage
RP = predicted reliability of the printer

The hardware and software models are shown in Figure 54 and Figure 55. The reliability equation of the
software is shown in Equation (4):

R(SW) = RSD × RDB × RDS × RCA × RPD (4)

where

RSD = predicted reliability of the scanner decoding
RDB = predicted reliability of the database lookup
RDS = predicted reliability of the data storage
RCA = predicted reliability of the comparison algorithm
RPD = predicted reliability of the printer driver

The system reliability R (SYSTEM) is given as shown in Equation (5):

R(SYSTEM) = R(HW) × R(SW)   (5)


[Figure: series blocks — Scanner → Computer → Disk Storage → Printer]

Figure 54 —Hardware model of a railroad boxcar identification system

[Figure: series blocks — Scanner Decoding → Database Lookup → Data Storage → Comparison Algorithm → Printer Driver]

Figure 55 —Software model of a railroad boxcar identification system
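A small Python sketch of the series calculation in Equations (3) through (5) for the boxcar identification example; all component reliabilities are placeholder values for a single mission.

# Hypothetical sketch of Equations (3)-(5) for the boxcar identification example.
# All component reliabilities are placeholders for one mission.
hw = {"scanner": 0.999, "computer": 0.998, "disk storage": 0.997, "printer": 0.995}
sw = {"scanner decoding": 0.9995, "database lookup": 0.9990, "data storage": 0.9992,
      "comparison algorithm": 0.9990, "printer driver": 0.9985}

def series_reliability(components):
    """Reliability of components in series is the product of their reliabilities."""
    result = 1.0
    for r in components.values():
        result *= r
    return result

r_hw = series_reliability(hw)    # Equation (3)
r_sw = series_reliability(sw)    # Equation (4)
r_system = r_hw * r_sw           # Equation (5)
print(f"R(HW)={r_hw:.5f}  R(SW)={r_sw:.5f}  R(SYSTEM)={r_system:.5f}")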

5.3.4.2 Voting algorithms

In a more complex case, the hardware and software are not independent and a more complex model is
needed. For example, consider a fault tolerant computer system with hardware failure probabilities C1, C2,
C3, the same software on each computer, with failure probabilities SW1″, SW2″, SW3″, respectively, and a
majority voter V (Lakey, Neufelder [B46]). In the example, the voting algorithm V would compare the
outputs of SW1″, SW2″, and SW3″ and declare a “correct” output based on two or three out of three outputs
being equal. If none of the outputs is equal, the default would be used, i.e., SW1″. Refer to Figure 56.
Assume that some of the failures are dependent; for instance a hardware failure in C1 causes a software
failure in SW2″, or a software failure in C1 causes a failure in SW2″. Some software failures [SW″ in
Equation (6)] are independent because this software is common to all computers. Therefore, failures in SW″
are not dependent on failures occurring in the non-common parts of the software. This is shown in Figure
56 and Equation (6) as SW″ in series with the parallel components.

R = [(C1 × SW1″ + C2 × SW2″ + C3 × SW3″) × Voting × SW′]   (6)

[Figure: three parallel channels (C1–SW1″, C2–SW2″, C3–SW3″) feeding a voting algorithm in series with the common software]
Figure 56 —Reliability graph for a fault tolerant computer system

5.3.4.3 Hybrid reliability model

Examples of hybrid reliability models are found in medical, military, and commercial subsystems, systems,
system of systems, and enterprise systems. For example, a navigation subsystem reliability model with
designed redundancy of hardware or electronics (GPSHW1, GPSHW2, ComputerHW3, ComputerHW4,
DataStoreHW1, DataStoreHW2) and software (GPS SW1, GPS SW2, GUIchartingSW1,
GUIchartingSW2) probabilities is shown in Figure 57.

Reliability of redundant systems is given by Equation (7):


Rs = 1 − Qs = 1 − (Q1 × Q2 × ... × Qn) = 1 − [(1 − R1)(1 − R2)...(1 − Rn)] = 1 − Π(i=1 to n) (1 − Ri)   (7)

where Qi is the unreliability of each redundant hardware, electronics, or software element within the system:

Rsystem = [1 – (1 – RGPSHW1)(1 – RGPSHW2)] × [1 – (1 – RGPSSW1)(1 – RGPSSW2)] ×


[1 – (1 – RComputerHW1)(1 – RComputerHW2)] × [1 – (1 – RGUIChartingSW1)(1 – RGUIChartingSW2)] ×
[1 – (1 – RDataStoreHW1)(1 – RDataStoreHW2)]

Figure 57 —Reliability block diagram for navigation subsystem (with designed redundancy)
Another example is a navigation subsystem reliability model without designed redundancy, in which the hardware,
electronics (GPSHW1, ComputerHW3, DataStoreHW1), and software (GPS SW1, GUIchartingSW1)
probabilities are in series, as shown in Figure 58. In a series configuration, the reliability of the system is the
probability that GPS HW succeeds and GPS SW and Computer HW and GUI Charting SW and Data Store
HW and Data Store SW succeeds and all of the other components (hardware, electronics, and software) in
the system succeed. In fact, all n components have to succeed for the system to succeed. A failure of any
component results in failure for the entire system.

In the case of independent components the reliability of the system is then given by Equation (8):

Rs = Π(i=1 to n) P(Xi)   (8)

Rsystem = RGPS_HW × RGPS_SW × RComputerHW × RGUIChartingSW × RDataStoreHW × RDataStoreSW

Figure 58 —Reliability block diagram for navigation subsystem without designed redundancy
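The Python sketch below implements the parallel and series relationships of Equations (7) and (8) and applies them to the navigation subsystem with designed redundancy shown in Figure 57. The component reliabilities are placeholders.

# Hypothetical sketch of Equations (7) and (8) applied to the navigation subsystem
# with designed redundancy (Figure 57). All component reliabilities are placeholders.
def parallel(*reliabilities):
    """Equation (7): Rs = 1 - product of (1 - Ri) for redundant components."""
    q = 1.0
    for r in reliabilities:
        q *= (1.0 - r)
    return 1.0 - q

def series(*reliabilities):
    """Equation (8): Rs = product of component reliabilities in series."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result

r_gps_hw, r_gps_sw = 0.995, 0.990             # placeholder reliabilities
r_computer_hw, r_gui_sw = 0.998, 0.985
r_datastore_hw = 0.997

r_system = series(
    parallel(r_gps_hw, r_gps_hw),             # redundant GPS hardware
    parallel(r_gps_sw, r_gps_sw),             # redundant GPS software
    parallel(r_computer_hw, r_computer_hw),   # redundant computers
    parallel(r_gui_sw, r_gui_sw),             # redundant GUI charting software
    parallel(r_datastore_hw, r_datastore_hw)  # redundant data store hardware
)
print(f"Navigation subsystem reliability (with redundancy): {r_system:.6f}")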

5.3.5 Determine an appropriate overall software reliability requirement

This is an essential task. Typically the acquisitions organization provides a system level reliability
requirement. However, if the system is mostly or entirely software, the acquisitions organization has the
primary responsibility of performing this task. Otherwise, the reliability engineer is primarily responsible
for this task. The software manager(s) and software quality assurance and test engineer should review the
SR requirement as they will be responsible for developing and testing to that requirement.

The steps for determining an appropriate reliability requirement are shown in Figure 59:


a) Obtain the system reliability objective as per 5.3.1.


b) If failure rate, MTBF, or MTBCF is a chosen figure of merit, use the procedures in 5.3.5.1.
c) If “reliability” is a chosen figure of merit, use the procedures in 5.3.5.1 and 5.3.5.2.
d) If “availability” is a chosen figure of merit, use the procedures in 5.3.5.1 and 5.3.5.3.

Figure 59 —Checklist for determining an appropriate reliability requirement

5.3.5.1 Establish software MTBF/MTBCF objective

This subclause determines what portion of an MTBF or MTBCF objective is applicable for the software
LRUs. It is assumed that the practitioner has the system MTBF objective. There are several methods to
determine what portion of the system MTBF or MTBCF is applicable to software, thereby providing a SR
objective for all software LRUs. The methods, benefits, and disadvantages are shown in Table 28.

Table 28 —Summary of allocation methods

Allocation by $ (Lakey, Neufelder [B46])
    Description: Each component's allocation is based on how much R&D funding is required to develop that component.
    Benefits/disadvantages: Easy to compute. Reduces the possibility that software will be allocated zero percent of the system objective.
    Formula: a) Count up $ planned for software development. b) Count up $ planned for hardware development. c) Add a) and b) to yield the total cost planned. d) Divide a/c; that is the portion allocated to all software LRUs. e) Divide b/c; that is the portion allocated to all hardware components.
Allocation by equal apportionment (Lakey, Neufelder [B46])
    Description: Assumes that for a series of "n" subsystems, each is allocated the same reliability requirement goal.
    Formula: a) Count up the number of software CSCIs. b) Count up the number of hardware HWCIs. c) Add a) and b) to yield the total number of configuration items. d) Each HWCI and CSCI gets 1/c as an allocation.
Past failure contribution (Lakey, Neufelder [B46])
    Description: Assumes that the failure rate of a component is relatively equal to what it was on a past similar system.
    Benefits/disadvantages: If data is available from the field for a similar system, this is easy to compute. However, it does not take into consideration that systems can change with respect to how much software is in them.
    Formula: a) Determine the past failure rate of only the software. b) Determine the past failure rate of only the hardware. c) Add together a) and b). d) The software receives an allocation that is equal to the result of a/c.
Allocation by achievable failure rate (Lakey, Neufelder [B46])
    Description: Uses the predicted failure rate of each component to determine its allocation.
    Benefits/disadvantages: More accurate than the preceding items because it takes into consideration each configuration item's relative contribution to the system.
    Formula: a) Perform a reliability prediction on every hardware and software LRU in the system. b) Add all predictions together. c) Each configuration item receives an allocation that is equal to a/b.


ARINC apportionment techniques (Von Alven [B95])
    Description: Assumes series subsystems with constant failure rates, such that any subsystem failure causes system failure and that subsystem mission time is equal to system mission time.
    Benefits/disadvantages: Requires an analysis of each subsystem's OP.
    Formula: See reference.
Feasibility of objectives technique (Eng. Des. [B15])
    Description: Subsystem allocation factors are computed as a function of Delphi numerical ratings of system intricacy, state of the art, performance time, and environmental conditions.
    Benefits/disadvantages: Method of allocating reliability without repair for mechanical-electrical systems.
    Formula: See reference.
Minimization of effort algorithm (MIL-HDBK-338B [B54])
    Description: Assumes a system comprised of n subsystems in series. The reliability of each subsystem is measured at the present stage of development, and reliability is apportioned such that greater reliability improvement is demanded of the lower reliability subsystems.
    Benefits/disadvantages: Considers minimization of the total effort expended to meet system reliability requirements.
    Formula: See reference.

The steps for determining an appropriate software MTBF/MTBCF from a system MTBF/MTBCF are
shown in Figure 60.

a) Choose a method from Table 28.


b) Identify the system MTBF or MTBCF, which is usually established by a customer, marketing, or
systems engineering as per 5.3.1.
c) Use the method chosen from step a) to identify the percentage of total failures expected to be due to
the software.
d) Apply that percentage to the system objective as follows:
Objective software MTBF = objective system MTBF / (fraction of failures expected to be due to software)
Objective software MTBCF = objective system MTBCF / (fraction of critical failures expected to be due to software)
e) The system reliability objective can now be met by the software by meeting the objective software
MTBF.

Figure 60 —Checklist for determining an appropriate MTBF/MTBCF objective
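As a numerical illustration of step d) in Figure 60, the Python snippet below divides an assumed system MTBF and MTBCF objective by the assumed fractions of failures allocated to software. All values are placeholders.

# Hypothetical illustration of Figure 60 step d). All values are placeholders.
objective_system_mtbf = 500.0       # system MTBF objective in hours (assumed, from 5.3.1)
objective_system_mtbcf = 2000.0     # system MTBCF objective in hours (assumed)
fraction_failures_software = 0.30   # portion of all failures allocated to software (assumed, per Table 28 method)
fraction_critical_failures_software = 0.25   # portion of critical failures allocated to software (assumed)

objective_software_mtbf = objective_system_mtbf / fraction_failures_software
objective_software_mtbcf = objective_system_mtbcf / fraction_critical_failures_software
print(f"Objective software MTBF:  {objective_software_mtbf:.0f} h")
print(f"Objective software MTBCF: {objective_software_mtbcf:.0f} h")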

5.3.5.2 Establish a software reliability objective

The steps for establishing a SR objective are shown in Figure 61.


a) Identify the system reliability objective from 5.3.1.


b) Determine what the typical mission time is for this system. Mission time and duty cycle are
different metrics. Reliability is the probability of the system operating for the mission time.
c) Determine from the result of step b) what the required system MTBF is by solving:
Reliability objective = e^(–mission time/required system MTBF)
d) Execute the steps in 5.3.5.1 and determine the objective software MTBF/MTBCF.
e) The system reliability objective can now be met by the software by meeting the objective software
MTBF from step d).

Figure 61 —Checklist for establishing a software reliability objective
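The Python snippet below illustrates steps b) through d) of Figure 61 by solving the formula in step c) for the required system MTBF. The reliability objective and mission time are placeholders.

import math

# Hypothetical illustration of Figure 61 steps b) and c). Values are placeholders.
reliability_objective = 0.98    # required probability of mission success (assumed)
mission_time_hours = 10.0       # typical mission length (assumed)

# Reliability objective = exp(-mission time / required system MTBF), solved for MTBF.
required_system_mtbf = -mission_time_hours / math.log(reliability_objective)
print(f"Required system MTBF: {required_system_mtbf:.0f} h")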

Figure 62 provides an example breakdown of a system reliability requirement and its supporting documentation.

Figure 62 —Reliability specification example


where

“Mission” is defined in the specification


“Failure” is defined in the FDSC

5.3.5.3 Establish an availability objective

The steps for establishing a software availability objective are shown in Figure 63.


a) Identify the system availability objective


b) Determine what the typical/average MTTR is for the hardware.
c) Determine what the typical MTSWR is for the software as per 5.3.2.3 Step 5.
d) Execute the steps in 5.3.5.1 and determine the percentage of the total system that pertains to the
software
e) Compute the weighted average of the hardware MTTR prediction and the software MTSWR by
applying the percentage from step d).
f) From steps a) and b), determine what the system MTBF objective is to meet the availability
objective:
Availability of system = MTBFsystem / (MTBFsystem + weighted average of MTTR and MTSWR)

g) Apply that percentage to the calculated objective failure rate as follows:


h) Objective software failure rate = % failures expected in software/Objective MTBF
i) Objective software MTBF = 1/ objective software failure rate
j) The system availability objective can now be met by the software by meeting the objective
software MTBF from step h).

Figure 63 —Checklist for determining an appropriate availability objective
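The Python sketch below walks through Figure 63 with placeholder values: the weighted restore time is computed from an assumed hardware MTTR, software MTSWR, and software failure fraction, the availability formula in step f) is solved for the required system MTBF, and the software MTBF objective is then allocated as in steps g) through i).

# Hypothetical illustration of Figure 63. All values are placeholders.
availability_objective = 0.999       # system availability objective (assumed)
hardware_mttr_hours = 2.0            # typical hardware MTTR (assumed)
software_mtswr_hours = 0.5           # software MTSWR from 5.3.2.3 Step 5 (assumed)
fraction_failures_software = 0.40    # portion of failures attributed to software (assumed)

# Weighted average restore time, step e).
restore_time = (fraction_failures_software * software_mtswr_hours
                + (1.0 - fraction_failures_software) * hardware_mttr_hours)

# Availability = MTBF / (MTBF + restore time), solved for the required system MTBF, step f).
required_system_mtbf = availability_objective * restore_time / (1.0 - availability_objective)

# Steps g) to i): allocate the system failure rate to the software.
objective_software_failure_rate = fraction_failures_software / required_system_mtbf
objective_software_mtbf = 1.0 / objective_software_failure_rate
print(f"Required system MTBF:    {required_system_mtbf:.0f} h")
print(f"Objective software MTBF: {objective_software_mtbf:.0f} h")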

Figure 64 provides an example of a breakdown of a system availability requirement and its supporting documentation.

Figure 64 —Availability specifications example


where

“Failure” is defined in the FDSC

Mean time to repair (MTTR) applies to hardware. Mean time to software restore (MTSWR) applies to
software. The appropriate restore activities for software include reboot, reload, workaround, and restart. It
can also include correcting the underlying defect and updating the software. In that case, the new software


is not identical to the existing software. The weighted average of the MTTR and MTSWR times comprises
the system restore time, which is at the top of the diagram.

5.3.5.4 Verify that the software objective is feasible

It is not uncommon for a system specification to be applied to the software such that the overall software
objective cannot be met by any SR prediction model. This can happen when:

 The software LRUs are assumed to be a negligible part of the system.


 The software is not included in the original system reliability specification.
 The software allocation is based on a past system that had relatively less software.
 It is assumed that the software will have more reliability growth than what is actually achievable.

The preceding issues can be avoided by employing the steps shown in 5.3.1 to identify the initial objective
and then to revisit as the software and system design evolve.

5.3.6 Plan the reliability growth

This is an essential task. The primary responsibility for performing the task is software management. The
reliability engineer, however, uses the reliability growth to predict the SR. Reliability growth is one of the
most sensitive parameters in the SRE prediction models. Hence, the reliability engineer should not make
any assumptions about it without software management input.

Complex software intensive systems should have a separate SR Growth Plan in addition to a Reliability
Growth Plan. The plan should consist of feasible and executable steps that can be easily verified and
controlled. It should also provide a closed-loop management mechanism that relies on objective
measurements. The steps for planning the SR growth are shown in Figure 65.

a) Obtain the SR predictions from 5.3.2.3.


b) Obtain the overall system schedule as well as the software development schedule.
c) Reliability growth takes place whenever the software is operating on the target hardware or
equipment in an operational manner without any new feature drops AND the software defects that
affect reliability are removed. As long as any updates are to correct defects or perfect the existing
code, then there can be reliability growth. Identify when the software will have future feature
releases. The time between feature releases represents the maximum calendar time over which reliability
growth can transpire.
d) Review the predictions established in the steps in 5.3.2.3 and identify the MTBF, MTBCF,
availability, and reliability that correspond with the point in time in which the next feature release
will be made. These figures of merit are now considered to be the best case scenario for that period
of time. The average of these figures of merit is the average case scenario for that period of time.
e) If the results of step d) do not meet the reliability goal established in 5.3.1, then either the reliability
growth needs to be improved upon so that there is more duty cycle, or a sensitivity analysis is
needed to determine how the objective can be met with the currently scheduled growth period as
per 5.3.7.
f) If the goal is to increase reliability, it is important that different inputs be exercised and tested as
discussed in 5.4.1. Specifically:

 Verify that there will be sufficient code coverage at the subsystem level prior to integration
with the hardware. See 5.3.9.2 on code coverage.


 Verify that integration tests will be performed on all developmental subsystems (in prioritized
fashion) to verify the functionality of the subsystem and to confirm reliability and robustness
of subsystems in the presence of noise factors.
 Verify that memory management will be exercised in the presence of noise factors, including
human factors, in order to verify lack of the memory leaks and other memory related defects
in the software code.
 Verify that software will be operated for extended periods of time without a reboot or restart.
 Verify that SR testing at the system level is an integral part of the testing for early prototypes
as well as the production systems. Given that the operational environment provides potential
important root causes of failures, functionality should be tested in all potential scenarios and
environment. Extend the duty cycle of subsystems during system level testing to verify the
robustness of the software in accordance with the system OP.
g) Verify that there will not be any new feature drops during the reliability growth phase as new
features “reset” the reliability growth. If a feature drop is scheduled during a reliability growth
period, communicate the effect on the reliability growth to management.
h) Verify whether defects identified from prior software releases will impact the reliability growth due
to defect pileup that spills into this release. See 5.1.3.4 and 5.3.2.3 Step 2.
i) Whenever there is a schedule slip or change for software, verify that the reliability growth plans are
still valid and adjust the duty cycle as needed to compensate for the shortened reliability growth
period.

Figure 65 —Checklist for planning the reliability growth

5.3.7 Perform a sensitivity analysis

This is a project specific task. It is most relevant when the software is in an early development phase;
however, if the software has already been developed, this task can still provide value for future development
releases. The sensitivity analysis is performed whenever it is apparent that the overall system reliability
objective is not likely to be met, or when it has been decided that the number of defects deployed to the
field needs to be reduced in order to reduce maintenance labor or defect pileup. The sensitivity analysis for the
software LRUs is performed by the software management. However, the sensitivity analysis at the system
level is performed by the reliability engineer since it requires knowledge of the overall system RBD.

In a sensitivity analysis, each of the inputs to the SR predictions is analyzed at its best and worst case.
Changes to the software, the organization structure, the development practices, or the spacing between
releases can have a positive impact on the reliability. Some changes may be costly and time-consuming
while others may be simple and relatively inexpensive.

Sensitivity analysis is conducted on each software LRU and then at the system level. The methods in
5.3.7.1 can be executed as early as the completion of the SR predictions in 5.3.2.3. The methods in 5.3.7.2
can be executed once the SR predictions for each LRU are placed on the RBD as per 5.3.4. See F.3.5 for an
example.

5.3.7.1 Perform a sensitivity analysis of each software LRU

SR prediction is highly sensitive to the following factors. For each factor, the practitioner should consider
how this factor can be reasonably and realistically adjusted so as to have a positive effect on the predicted
MTBF.

a) Effective size—The effective size is a function of the new and modified code. As discussed in B.1,
new code is 100% effective while modified code is less effective and reused and COTS code is the
least effective. Effective, in this sense, means “subject to newly introduced defects.” Typically,
individual engineers do not have the authority to decrease the functionality of a system (which in
turn decreases the code size, which in turn decreases the total defects, which in turn increases the
MTBF). However, the SRE practitioner can assess trade-offs and improvement scenarios for
reducing the effective size. One of the most effective ways to reduce EKSLOC is to “code a little
and test a little” instead of writing all of the code in one massive release. Refer to the example in
F.3.3. This example contrasts what happens if all of the code is written in one release versus spread
across three sequential releases. The resulting MTBF for the three small releases is substantially
better than for one big release, despite the fact that the total code written is the same and the
deadline for finishing the code is the same.
b) Reliability growth—This is operational time (in an operational environment) with no new feature
drops. As discussed earlier in this document, reliability growth is not unbounded for software. Once
there is a feature drop, the reliability figures of merit reset as a function of the increased feature
size. If one predicts the defect profile and the defect pileup for multiple releases as per the
document, one can determine if the releases are too close together. That is the first step in planning
for adequate reliability growth at the beginning of the project while there is still time to do so.
c) Inherent product risks discussed in 5.1.3—Research shows that the possibility of a failed project
increases with the number of these risks that are present on a particular software release.
Identifying those risks up front as early as possible is the best way to avoid a failed project. Several
of the risks cannot be avoided; however, it is often possible to spread the risks across different
releases. For example, an organization made the decision to design new hardware components, new
software environment, a new graphical user interface (GUI), and new mathematical models all in
one release. This many risks in one release is historically linked to a failed project. Instead of
having four major risks in one release, they can split the risks up into a few smaller releases.
Identifying the inherent risks before they cause a failed project is one purpose of the sensitivity
analysis. Identifying the risks in 5.1.3 does not necessarily mean that the project will be successful,
but avoiding a failure is the first step towards improvement.
d) Defect density reduction—According to the research conducted as per B.2.1, the difference
between the best observed defect density and the worst is a factor of approximately 1000. This
range makes defect density a key improvement area. Subclause 6.2.1.1 contains the short and quick
models for reliability engineers. The 22 parameters of the shortcut model are parameters that a
reliability engineer or someone in acquisitions usually has access to. Those 22 parameters do
provide some limited sensitivity analysis, mainly because they contain several of the risks in 5.1.3.
Several of the models in B.2 have very detailed sensitivity analysis capabilities. The key parameters
of those detailed models are summarized in B.3. Subclause B.3 shows which parameters appear in
the largest number of SR assessment models from B.2. The table in B.3 is a starting point for anyone
to understand what development characteristics have already been correlated to fewer defects. The
detailed models in B.2 provide the trade-off scenario capabilities for minimizing the defect density.
Table 29 shows the four key sensitive SRE prediction parameters. With regard to the predicted MTBF, it
illustrates the mathematical relationship, which is also shown in 5.3.2.3. The third column illustrates how
each parameter can be adjusted so as to improve the MTBF without reducing features or capability. One
easy way to improve MTBF is to reduce features or to test for a very long time; this table instead provides
improvements that do not require a reduction in features or an infeasible amount of testing time. The final
column indicates how each parameter can be predicted inaccurately. If the parameters are assessed regularly
during development, the predictive models are better able to reflect the true development characteristics.
The relationship between each of these parameters such as size, reliability growth, product risks and defect
density, and reliability is shown in the formulas in 5.3.2.3 Steps 1 through 5. The sensitivity analysis at the
LRU level identifies which development practices have the potential to reduce the defects predicted to
occur in operation.
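The arithmetic behind these what-if comparisons can be sketched in a few lines. The following Python fragment is a minimal illustration of the inverse-linear relationships summarized in Table 29; the defect density, effective size, and duty cycle values, and the simple proportional model itself, are illustrative assumptions rather than the detailed formulas of 5.3.2.3.

# Minimal what-if sensitivity sketch. The inverse-linear behavior mirrors Table 29;
# the numbers are illustrative only and stand in for the formulas in 5.3.2.3.

def predicted_defects(defect_density, effective_ksloc):
    # Defects predicted to occur in operation = defect density x effective size
    return defect_density * effective_ksloc

def predicted_mtbf(defects, duty_cycle_hours):
    # Simplified stand-in: MTBF is inversely proportional to the predicted defects
    return duty_cycle_hours / defects

baseline     = predicted_mtbf(predicted_defects(0.5, 120.0), 2000.0)   # 0.5 defects/EKSLOC, 120 EKSLOC
half_size    = predicted_mtbf(predicted_defects(0.5, 60.0), 2000.0)    # "code a little, test a little"
half_density = predicted_mtbf(predicted_defects(0.25, 120.0), 2000.0)  # improved development practices

print(f"baseline MTBF       : {baseline:6.1f} h")
print(f"half effective size : {half_size:6.1f} h (inverse linear, 2x improvement)")
print(f"half defect density : {half_density:6.1f} h (inverse linear, 2x improvement)")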

Table 29 —Sensitivity analysis recommendations

Parameter: Effective size
  Sensitivity: Inverse linear—Cutting effective size in half doubles MTBF.
  How this parameter can be adjusted to improve the MTBF without reducing features or capability:
    Incremental testing—Code a little, test a little can reduce effective size by as much as 50% for a longer or bigger project.
    Using more proven COTS whenever possible.
    Development practices that avoid “copy and paste” techniques.
    Keep size estimates up to date until code is complete.
  Common mistakes made when measuring this parameter:
    Code that is thought to be reusable is not.
    Size increases unexpectedly and impacts the SR prediction.
    COTS supplied code does not perform functions as planned so new code has to be written.
    Scope of software changes more often than planned.

Parameter: Reliability growth
  Sensitivity: Exponential—Adding feature drops when the reliability is scheduled to grow can mean missing the prediction by more than a linear amount.
  How this parameter can be adjusted to improve the MTBF without reducing features or capability:
    Make sure that the program schedule for reliability growth reflects the expected feature drops.
    Track progress against schedule on a regular and consistent basis.
  Common mistakes made when measuring this parameter:
    Faulty assumptions that SR grows indefinitely.
    If the schedule for the software slips but not the hardware, the reliability growth of the software will be impacted.

Parameter: Inherent product risks
  Sensitivity: Linear
  How this parameter can be adjusted to improve the MTBF without reducing features or capability:
    Split the risks up into smaller releases instead of having one big release with many risks.
  Common mistakes made when measuring this parameter: See 5.1.3.

Parameter: Defect density—Development practices, personnel, process, domain experience, etc.
  Sensitivity: Inverse linear—Reducing defect density by half doubles MTBF.
  How this parameter can be adjusted to improve the MTBF without reducing features or capability:
    Transitioning from one assessment level to another. A few examples from B.2 and B.3 include:
      Increasing domain expertise.
      Tracking progress against schedule on a regular and consistent basis.
      Identifying problematic code and mitigating it before the project is late.
  Common mistakes made when measuring this parameter:
    Key people may leave during the project.
    Processes can change mid-stream.
    The preceding is why the assessment should be updated at each major milestone.

The steps for conducting a sensitivity analysis at the software LRU level are shown in Figure 66.

a) Obtain the SR predictions from the steps in 5.3.2.3.


b) Review Table 29.
1) Identify whether incremental development can reduce the effective size.
2) Identify whether there is significant copy and paste code that inflates the code size.
3) Identify software LRUs that are candidates for COTS, FOSS, or other reuse that would reduce
the effective size.
4) Identify COTS, FOSS, or other vendor supplied components that have a significant
contribution to the overall system prediction. Examples of these are vendor supplied
components that have a relatively large predicted effective size and a relatively large
predicted defect density. It may be necessary to select an alternative vendor in the event that a
particular vendor-supplied LRU is significantly contributing to the overall system failure rate
prediction.
c) Determine the following “what if” scenarios for effective size reduction:
1) Determine “what if” scenarios for size by adjusting the size of the LRUs so as to push less
urgent features into a future increment. Recompute the predictions for the current and future
releases using the assumption and compare to the original prediction. Sometimes pushing
functionality into a future release can result in “defect pileup.” Verify that the “what if”
scenario does not result in defect pileup for future releases.
2) Determine “what if” scenarios for size by adjusting the effective size for reduction of any
copy and paste code. Note that removing copy and paste code may impact the design and the
schedule. Recompute the predictions with the new sizes and compare to the original
prediction.
3) Determine “what if” scenarios for size by computing the effective size of any applicable
COTS, FOSS, or other reused code. Recompute the predictions using these size estimates and
compare to the original prediction.
d) Determine “what if” scenarios for the reliability growth estimates. Remember that reliability
growth “resets” whenever new features are added to the software code. So, stretching the calendar
time for the reliability growth is probably not feasible. It may be feasible to add additional shifts or
equipment. Also, remember that making corrective actions improves reliability growth if the
corrective action is implemented properly. So, one scenario for reliability improvement is to
alternate between new feature releases and corrective action only releases. Recompute the
predictions with the new assumptions and compare to the original prediction.
e) Identify the inherent risks that cannot be changed for this product/release. Identify which of these
might be lifted in future releases. For example, if this is the first version of software for brand new
hardware, the next release of software will not be new and eventually the hardware will not be new
either. It may be possible to push less important software features to a future release that has fewer
inherent risks. If applicable, recompute the predictions with the new assumptions and compare to
the original prediction.
f) Employ a survey based assessment for predicting software defect density. Several of the SR
prediction models discussed in 5.3.2.3, 6.2, and B.2 were developed for the purpose of supporting
sensitivity analysis. All SR prediction models that employ surveys can be used for sensitivity
analysis as follows:
1) Answer the survey based on what is defined in the Software Development Plan (SDP)
2) Identify all survey questions that were not answered affirmatively. These are gaps.

3) Identify which gaps can be addressed and which are inherent risks. The gaps that are inherent
risks cannot be changed and are removed from the sensitivity analysis.
4) Starting from the list of gaps that are resolvable, change one gap at a time to be affirmative
and review the result of the revised or “what if” prediction.
5) Note the relative decrease in the defect density of resolving that particular gap.
6) When all gaps have been analyzed rank each gap in order of biggest to smallest decrease in
defect density.
7) Review the gaps at the top of the list and determine any prerequisites that should be in place
to resolve that gap. Prerequisites can include other development practices, hiring personnel,
buying tools, training, etc. Remove items from the list that have prerequisites that cannot be
addressed feasibly on this project. Examples of gaps that reduce the predicted defect density
include but are not limited to:
 Maximizing the end user domain experience of the software development staff
Having complete requirements
 Monitoring progress against schedule
 Reviewing the requirements, design, code and test plan
 Using appropriate tools in an appropriate manner
 Proper planning
 Proper change management and defect tracking
 Efficient techniques for developing requirements, design, code, and test cases
 Avoiding obsolete technologies
 Use of incremental development methods
g) Review all of the scenarios from steps c) to f). Estimate the relative cost of each scenario on the
list.
h) Rank the results of step g) based from highest effectiveness and lowest cost to lowest effectiveness
and highest cost.
i) Discuss the results of step h) with appropriate software management. Ideally the reliability engineer
is a key member of the software engineering team and this is specified in the SRPP.

Figure 66 —Checklist for performing a sensitivity analysis

5.3.7.2 Perform sensitivity analysis of the software LRUs effect on system reliability

Once the software has been analyzed from the software LRU perspective the next step is to analyze it from
a system perspective. The practitioner should review how the software fits on the system RBD. The
software LRUs should be designed to align cohesively with the subsystem hardware that they support.
Assume that the system is an automobile. One would expect to find GPS software, security software, rear
camera software, software to control a convertible or retractable top, etc. What one would not expect to
find is that all of this software is packaged in one large executable. If that is the case then it is possible that
a failure in the GPS software for example could affect the entire automobile instead of just the GPS.
Figure 67 shows one large software LRU supporting multiple hardware subsystems (on left) compared to
subsystems that each have their own associated software (right).

[Figure 67 diagram: on the left, one large software LRU supports Subsystem X, Subsystem Y, and
Subsystem Z; on the right, Subsystem X, Subsystem Y, and Subsystem Z each have their own software
LRU.]

NOTE—Copyright Softrel, LLC reused with permission of Ann Marie Neufelder.

Figure 67 —One large software configuration item supporting multiple hardware components
versus one software configuration item for each hardware component
The practitioner should also look for SR predictions that are unusually large compared to the value of the
software. For example, if the least critical software LRU has the lowest predicted MTBF, one should
consider whether it needs to be in the system at all or possibly if that component can be purchased
commercially. Table 30 summarizes the sensitivity of the software LRUs with respect to the system
reliability.

Table 30 —Sensitivity of system prediction

Parameter: Cohesive separation of LRUs by function and hardware supported
  Sensitivity to reliability: Linear
  How this can affect system reliability: One software LRU supports multiple independent hardware HWCIs, or LRUs perform multiple unrelated functions.
  How this can be adjusted or monitored to improve reliability: Re-architect the LRUs so that they are functionally cohesive and support the hardware architecture.

Parameter: A few LRUs are affecting the overall prediction
  Sensitivity to reliability: Linear
  How this can affect system reliability: Less critical software functions are bigger in size than more critical software functions.
  How this can be adjusted or monitored to improve reliability: Decide whether the less important functions can be removed or replaced with commercially available options.

The procedure for analyzing sensitivity at the system level is in Figure 68.

a) Review the overall RBD from step 5.3.4, which includes both the hardware and software LRUs.
b) Identify any software LRUs that are supporting more than one hardware configuration item.
c) Determine a “what if” scenario in which any offending LRUs from step b) are redesigned to be
cohesive. An offending LRU is one that performs multiple unrelated features or functions or
supports multiple unrelated hardware items. Remember that this will result in a redesign that could
affect both schedule and cost.
d) Rank each of the software LRUs in order of importance.
e) Rank each of the software LRUs in order of decreasing effective size.
f) Identify any components that are relatively large in effective size but relatively less important.
g) Identify any vendor supplied LRUs that are relatively large in predicted effective size and predicted
defect density. These LRUs may require an alternative vendor.
h) Determine “what if” scenarios by assuming that the offending software LRUs from step f) are
either removed or replaced with commercially available LRUs. Recompute the predictions for the
current and future releases using the assumption and compare to the original prediction.
i) Review all of the scenarios from the preceding steps. Estimate the relative cost of each scenario on
the list.

j) Rank the result of step i) based from highest effectiveness and lowest cost to lowest effectiveness
and highest cost
k) Discuss the results of step j), and in particular, the scenario with the highest effectiveness and
lowest cost, with appropriate software management.

Figure 68 —Sensitivity of system prediction

5.3.8 Allocate the required reliability to the software LRUs

This is a typical SRE task. The primary responsibility for this task is the reliability engineer. In 5.3.4 the
overall SR objective is predicted. The software organization(s) can collectively work towards that overall
SR objective. However, if there are many software LRUs, and/or if there are multiple software
organizations, and/or if each software LRU is a dramatically different size, then it is recommended to
allocate the required SR objective down to the individual software LRUs.

The allocation of system reliability involves solving the basic inequality as follows:

f(R1, R2, ..., Rn) ≥ R*

where

R i is the allocated reliability parameter for the ith subsystem
R* is the system reliability requirement parameter
f is the functional relationship between subsystem and system reliability

For a simple series system in which the Rs represent the probability of survival for t hours, use Equation (9):

R1(t) × R2(t) × ... × Rn(t) ≥ R*(t) (9)
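As a worked illustration of Equation (9), the short sketch below apportions a system requirement equally across n series subsystems and checks the inequality; the requirement value, the subsystem count, and the equal-apportionment rule are illustrative assumptions and not requirements of this recommended practice.

# Worked illustration of Equation (9) for a simple series system.
# Equal apportionment is only one possible allocation rule; the values are illustrative.
import math

R_star = 0.9950                      # system reliability requirement for mission time t
n = 3                                # number of series subsystems (hardware and software LRUs)

R_i = R_star ** (1.0 / n)            # equal apportionment: each subsystem receives the nth root
series_product = R_i ** n            # R1(t) x R2(t) x ... x Rn(t)
meets = series_product >= R_star or math.isclose(series_product, R_star, rel_tol=1e-12)

print(f"allocated subsystem reliability: {R_i:.6f}")
print(f"series product: {series_product:.6f}, meets {R_star}? {meets}")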

As discussed in the previous subclauses, there are several reliability figures of merit. Software reliability
parameters may be probabilities, or software failure rates, or MTBF. These figures of merit could become
SR requirements, specified in requirements documents. They could also be used for SR predictions, or
demonstrated SR, or for SR allocations, to name a few forms or uses.

The allocated reliability for a simple subsystem of demonstrated high reliability should be greater than for a
complex subsystem whose observed reliability has been historically low. The allocation process is
approximate. The reliability parameters allocated to the subsystems are used as guidelines to determine
design feasibility. If the allocated reliability for a specific subsystem cannot be achieved with the current
plan, then the system design is to be modified and the reliability allocations redistributed. This procedure is
repeated until an allocation is achieved that satisfies the system level requirement, within all constraints,
and results in subsystems that can be designed within the state of the art. In the event that it is found that,
even with reallocation, some of the individual subsystem requirements cannot be met within the current
state of the art, the engineer can use one or more of the strategies outlined in 5.3.7.

The allocation process can, in turn, be performed at each of the lower levels of the system, subsystem
hierarchy, for hardware and electronic equipment and components and software modules. SR allocations
are derived and decomposed from requirements from the top level of the hierarchy down to the lowest level
where software specifications are written. These allocations are apportioned to the blocks of the model. The
software model is updated with SR predictions based on engineering analysis of the design to initially
validate the model. The SR predictions are then updated with actual test and field SR data to further
validate the model. Where software requirements do not include reliability, a reliability analysis and
allocation report may be prepared that shows how SR estimates and goals are set for the elements of the
software design hierarchy.

Functional complexity, effective software size, and hardware counts are some considerations in a typical
allocation process. In some cases the process is iterative, requiring several attempts before all of the
requirements are satisfied by the allocation. Meeting the requirements may call for alternative design
modifications. In other cases, when requirements cannot be satisfied because components would be needed
with unattainable levels of reliability, trade-off discussions with the customer may be required. The
hardware and software LRU allocations can be derived from the top-level specification, or the top-level
specification can be defined by the achievable failure rates of each of the system components. The former
is a top-down approach while the latter is a bottom-up approach, as shown in Figure 69.

[Figure 69 diagram: on the left (top-down allocation), a system reliability requirement of 0.9950 flows
through a reliability allocation model to component reliability requirements of 0.9999, 0.9567, ..., and
0.9789; on the right (bottom-up derivation), component reliability predictions of 0.9900, 0.9500, ..., and
0.9999 flow through a reliability allocation model to a system reliability prediction of 0.9800.]

Figure 69 —Top-down versus bottom-up allocation


Top-down allocation of reliability requirements starts with the top-level system specifications to assure that
once quantitative system specifications have been determined, they are allocated or apportioned to lower
levels of subsystems, components, and modules. During the system engineering process the top-level
system requirements are allocated to the subsystem levels. Then the subsystem allocation is allocated to the
software configuration items or LRUs.

Alternatively, SR allocations may not begin with SR requirements, so that the requirements at the system
level could be derived from the SR allocations. This type of SR allocation analysis is a bottom-up SR
allocation. Figure 70 is the checklist for allocating the required SR to the software LRUs.

a) Select one of the top-down allocation approaches in 5.3.8.1.


b) Complete the bottom-up allocation as per 5.3.8.2.
c) Iterate as needed until there is an allocation for each software LRU that meets the system objective
but is also achievable for each software LRU.

Figure 70 —Checklist for allocating the required software reliability to the software LRUs

5.3.8.1 Top-down allocation

Once the system specification is defined there are two top-down alternatives to allocate that requirement to
the software LRUs. The first approach is to allocate from the system specification to the subsystems and
then directly to each component in the system without regard for whether it is hardware or software. There
is a tendency in industry to allocate the system specification to the hardware and software separately,
largely because the hardware and software engineers are often in separate engineering groups.
However, for systems that are composed of several subsystems containing both hardware and software it
may be preferred to allocate directly from the system to each of the subsystems and then to each
component. In this way the software as a whole does not have an allocation; rather, each of the software
LRUs has an allocation. This approach is shown in Figure 71.

Figure 71 —Allocation from system to subsystem to LRU


The second approach is to allocate the system specification to the hardware as a whole and to the software
as a whole. The software LRU allocations would then be derived from the allocation to the software as a
whole, and similarly the hardware LRU allocations would then be derived from the allocation to the
hardware as a whole. This allocation method is less preferred than the preceding method because the
software and hardware engineers may not interact, and therefore the predictions may not be kept as up to
date as they would be with the preceding method. Additionally, with the second method the software as a
whole may end up with a proportionately smaller portion of the allocation than it should. This approach is
shown in Figure 72.

Figure 72 —Allocation from system to HW or SW and then to HW or SW LRU
For either of the preceding allocation methods, the allocations at the software LRU level may be developed
from the effective size of each software LRU such that the software LRUs with the most new code are
weighted the heaviest. There are also other software design metrics besides effective KSLOC that are
valuable for SR analysis and that can be used for allocations; for example, the expected duty cycle or the
total size of each software LRU can be used to determine the allocation. See F.3.4 for an example of each.
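A minimal sketch of a size-weighted allocation is shown below; the LRU names, sizes, and the overall software failure-rate budget are illustrative assumptions.

# Sketch of weighting a software failure-rate allocation by effective size (EKSLOC).
# The LRUs with the most new or modified code receive the largest share of the budget.

effective_ksloc = {"navigation": 40.0, "user_interface": 25.0, "logging": 5.0}   # illustrative sizes
software_budget = 4.0e-4             # overall software failure-rate allocation, failures per hour

total_size = sum(effective_ksloc.values())
allocation = {lru: software_budget * size / total_size
              for lru, size in effective_ksloc.items()}

for lru, rate in allocation.items():
    print(f"{lru:15s}: {rate:.2e} failures/h")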

5.3.8.2 Bottom-up allocation

The bottom-up allocation process involves predicting the reliability figures of merit for each software and
hardware LRU in the system. These predictions are then combined via a system level RBD to yield a
system level reliability figure of merit. The bottom-up allocation allows for each component to be allocated
a relatively accurate portion of the system specification. In the event that the system level prediction is not
acceptable for the mission or customer, each component of the system is provided with a reliability goal
that is proportional to its contribution to the system. The bottom-up allocation can also be used to derive the
system level specification if none exists.
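A minimal sketch of this bottom-up process is shown below; the LRU names, the predicted failure rates, and the proportional-scaling rule are illustrative assumptions rather than values or methods prescribed by this recommended practice.

# Bottom-up sketch: combine per-LRU failure-rate predictions in series, then, if the
# system objective is missed, give each LRU a goal proportional to its contribution.

predicted = {"LRU_A": 1.0e-4, "LRU_B": 2.5e-4, "LRU_C": 0.5e-4}   # failures per hour (illustrative)
system_rate = sum(predicted.values())                             # series RBD: failure rates add
system_objective = 3.0e-4                                         # required system failure rate

if system_rate > system_objective:
    scale = system_objective / system_rate
    allocations = {lru: rate * scale for lru, rate in predicted.items()}
else:
    allocations = dict(predicted)          # the prediction already supports the objective

for lru, goal in allocations.items():
    print(f"{lru}: predicted {predicted[lru]:.2e}/h, allocated goal {goal:.2e}/h")
print(f"system: predicted {system_rate:.2e}/h versus objective {system_objective:.2e}/h")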

5.3.9 Employ software reliability metrics for transition to software system testing

This is a typical SRE task. The primary responsibility for this task is the software management and
software organization. The software quality assurance and test personnel are also involved in measuring the
traceability between test cases and requirements.

The reliability predictions discussed in 5.3.2 through 5.3.8 are established for the purposes of long-term
planning of reliability growth, maintenance staffing, and resources. There are other metrics that are useful
for determining when to transition to software system level testing. Table 31 summarizes the metrics that
can be used to transition to the testing activity. The following metrics can be used regardless of whether
there is a waterfall, incremental, or evolutionary LCM. Employing the following metrics reduces the
number of blocked tests that are encountered by software system testers. Blocked tests generally waste both
calendar and work hours.

Table 31 —Metrics that can be used during requirements, design, and construction

Software metric: Requirements traceability
  Definition: Degree to which the requirements have been met by the architecture, code, and test cases.
  Typical goal prior to transition to testing: All requirements for this increment can be traced to the requirements, the architecture, and the test cases. See 5.3.9.1.

Software metric: Structural coverage
  Definition: Degree to which the lines of code, paths, and data have been tested.
  Typical goal prior to transition to testing: See 5.3.9.2.

5.3.9.1 Requirements traceability

Traceability is defined as the degree to which the requirements have been met by the architecture, code, and
tests (IEEE Std 610™-1990 [B31]). The requirements traceability (RT) metrics aid in identifying
requirements that are either missing from, or in addition to, the original requirements. RT is defined as
shown in Equation (10):

RT = (R1 / R2) × 100 (10)

where

RT is the value of the requirements traceability measure
R1 is the number of requirements met by the architecture, code, or unit test cases
R2 is the number of total requirements
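For example, the short fragment below computes RT per Equation (10); the requirement identifiers are placeholders for a project's own numbering scheme.

# Requirements traceability per Equation (10): RT = (R1 / R2) x 100.

total_requirements  = {"REQ-001", "REQ-002", "REQ-003", "REQ-004", "REQ-005"}
traced_requirements = {"REQ-001", "REQ-002", "REQ-004"}   # met by architecture, code, or unit tests

RT = 100.0 * len(traced_requirements & total_requirements) / len(total_requirements)
print(f"RT = {RT:.0f}%")   # 60%, so two of the five requirements are not yet traced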

5.3.9.2 Structural coverage

During implementation, the software engineers should be testing their own code prior to integration with
other software, software testing, system integration, and systems testing. Structural coverage is the degree
to which the design and code have been covered via structural or clear box testing. Structural or clear box
testing is done from the software engineer’s point of view. That means that the tester needs to have full
view of the code in order to do the testing. Structural coverage testing is usually performed with the help of
automated tools that identify the test cases and calculate the coverage results when the tests are run.

Software engineers perform a certain amount of testing prior to software testing and system testing. Their
testing can be more effective if they know how much control and data they are covering with their tests.
The following coverage metrics assist in demonstrating that the code is tested adequately prior to testing
from a system or end-user point of view. The defects found during structural testing can be difficult to
trigger or discover during operational systems testing. The defects found during structural testing are
often far less expensive to fix during development than in later phases of testing.

The structure of the software can be tested by control flow coverage and boundary value coverage. There
are five approaches to control flow testing that range in effort and time required to test. As discussed in
the following, statement coverage is the bare minimum needed to cover the control flow while multiple
condition decision coverage (MCDC) (Hayhurst et al. [B27]) covers every combination of the control flow
and conditions. Flight control systems, for example, are required to have MCDC coverage for FAA
certification (RTCA [B73]). Boundary value coverage covers the range of inputs such that there is a
minimal set of inputs that covers the possible data spaces. All of the following tests can be identified and
executed manually; however, on even medium-sized software projects, an automated tool will be a
necessity for achieving the coverage in a reasonable amount of time. Table 32 illustrates the types of code
coverage, when they are used, and the output of the coverage metric.

Table 32 —Code coverage metrics

Control flow coverage

Type of test: Statement coverage
  Purpose/benefits: To execute each line of code at least once during testing.
  Coverage output: Percentage of the lines of executable code in the application that are executed by the tests.

Type of test: Decision coverage (McCabe [B53])
  Purpose/benefits: While statement coverage verifies that lines of code are tested, decision coverage provides for coverage of logic.
  Coverage output: Percentage of branches in logic tested, including if-then and if-then-else statements, case statements, loops, etc.

Type of test: Condition coverage
  Purpose/benefits: Each Boolean subexpression has been tested as both true and false.
  Coverage output: Percentage of Boolean subexpressions that have been tested as both true and false.

Type of test: Condition/decision coverage
  Purpose/benefits: Each Boolean subexpression has been tested as both true and false AND each path has been covered.
  Coverage output: Percentage of Boolean subexpressions that have been tested as both true and false, and percentage of statements covered.

Type of test: Modified condition decision coverage (Hayhurst et al. [B27], RTCA [B73])
  Purpose/benefits: MC/DC requires executing at least once every: exit/entry point, decision branch, outcome of each condition in a decision, and condition in a decision that independently affects each decision branch.
  Coverage output: All test cases that cover decisions and conditions. Percentage of Boolean subexpressions that have been tested as both true and false, and percentage of statements covered.

Boundary value coverage

Type of test: Boundary value coverage (Beizer [B4], Binder [B6])
  Purpose/benefits: Boundary value coverage tests often find overflow and underflow defects as well as typos with <, ≤ and >, ≥. Data-driven testing exercises the system to provide assurance of its behavior across a range and combination of input data, attempting to choose as few test values as possible by dividing the data space into equivalence classes and selecting one representative from each class (with the expectation that the elements of this class are “equivalent” in terms of their ability to detect failures).
  Coverage output: Percentage of identified data points tested.

Example of condition coverage:

Ex:
if (A and B) or C then
……
Else
…..
Possible test cases:
A=True, B = True, C = False
A = False, B = False, C = True

Note that the preceding covers the conditions but not the paths, since the else branch is never executed.

Example of condition decision coverage:

Using the previous example some possible test cases for condition decision coverage are shown as follows.
This is also the minimum number of test cases to test both the conditions and the decisions.

A=True, B = True, C = True


A = False, B = False, C = False

Example of multiple condition decision coverage:

Using the preceding example there are eight required test cases to test every combination of decisions and
conditions.

A B C
True True True
True True False
True False True
False True True
True False False
False True False
False False True
False False False
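The following sketch simply encodes the example predicate and prints the branch taken for every combination of conditions; it is intended only to show how the two condition/decision vectors above compare with exhaustive enumeration of all eight combinations.

# Enumerates the example predicate (A and B) or C for the coverage discussion above.
from itertools import product

def decision(a, b, c):
    return (a and b) or c

condition_decision_vectors = [(True, True, True), (False, False, False)]   # minimal set from the text
all_combinations = list(product([True, False], repeat=3))                  # the eight-case table

for a, b, c in all_combinations:
    branch = "then-branch" if decision(a, b, c) else "else-branch"
    print(a, b, c, "->", branch)
print(len(condition_decision_vectors), "cases for condition/decision coverage;",
      len(all_combinations), "cases to cover every combination")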

Example of boundary value coverage:

X is defined as an integer value


If X > 10 then
Call A;
Else
Call B;

Data spaces are therefore:


Data space #1– All values > 10
Data space #2 –All values ≤ 10

Test cases for each data space are therefore:


Data space #1 – X=11, X=HV, X=HV+1
Data space #2 – X=10, X=-HV, X=-HV-1
Where: HV is the largest integer value supported
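A minimal sketch of these boundary value test cases is shown below; a 32-bit limit is assumed for HV purely for illustration (Python integers do not overflow, so the HV + 1 and -HV - 1 probes only matter when the system under test uses fixed-width integers).

# Boundary value sketch for the "If X > 10" example above.

HV = 2**31 - 1                       # assumed largest supported integer; use the platform's actual limit

def branch(x):
    return "A" if x > 10 else "B"

data_space_1 = [11, HV, HV + 1]      # all values > 10; HV + 1 probes overflow handling
data_space_2 = [10, -HV, -HV - 1]    # all values <= 10; -HV - 1 probes underflow handling

for x in data_space_1 + data_space_2:
    print(f"x = {x}: calls {branch(x)}")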

Which type of coverage metric is appropriate? That depends on the criticality of the code, any
contractual or regulatory requirements, and the available time. As can be seen in the preceding example,
decision/condition coverage requires two test cases for each branch in logic. Therefore it does not require
more testing than statement coverage from a test execution standpoint; the additional effort is in
identifying the tests. However, if one has an automated tool to identify the test cases, then the labor of
employing decision coverage testing over path testing is similar. MCDC does require more test cases than
the others. If the software is safety critical and/or regulated, MCDC may be necessary. Figure 73 is the
checklist for measuring code coverage.

a) Decide which type of control flow testing is feasible as well as the desired percentage coverage.
b) Decide whether to implement the boundary value coverage tests.
c) Verify that the automated tools needed for the control flow testing and boundary value coverage
testing are available. Code-based adequacy criteria require appropriate instrumentation of the
application under test. These tools insert recording probes into the code, which in turn provide the
basis for coverage analyzers to assess which and how many entities of the flow graph have been
exercised and provide a measure against the selected test coverage criterion, such as statement,
branch, or decision coverage.
d) Verify that all software engineers know how to use the tool.
e) Verify that there are clear procedures in place to verify that all code is tested as per the decisions
made in steps a) and b).
f) Measure the test coverage.
g) Make a decision about transitioning to the next phase of testing based on the coverage results.
h) This task may be revisited in 5.4.3.
i) Once code coverage is measured in this subclause and black box coverage is measured in 5.4.1,
make a decision concerning the coverage in 5.5.

Figure 73 —Checklist for measuring code coverage

5.4 Apply software reliability during testing

This subclause discusses how to apply SR during software system level testing and beyond. It is not within
the scope of this document to discuss how to do software and systems testing. The inputs and outputs for
each of the SRE tasks employed during software and system testing are shown in Figure 74.

Developing a reliability test suite based on the OP is the first task. Test coverage is continually measured
throughout testing based on the white box test results, the black box test results, and, if applicable, the
SFMEA report from 5.2.2. The test effectiveness can be increased via fault insertion, which relies on the
failure modes defined in 5.2.2.

During system level testing, faults and failures are recorded as well as the amount of usage time
experienced by the software. This data is used by all of the SR growth models. The best SR model is the
one that fits the failure rate trend observed as per 5.4.4.

Other metrics are also applied during testing that support the results of the reliability growth models. The
results of the predictions from 5.3.2.3 and the reliability growth models from 5.4.5 are verified for accuracy
prior to making a release decision.

The decision to release the software is based on the results of the metrics, the reliability objective from step
5.3.1, the results of step 5.4.7 and the measured test coverage from step 5.4.3.

Figure 74 —Apply software reliability during testing


All SRE tasks in this subclause are applicable for incremental or evolutionary LCMs. See 4.4 for more
information. Whenever there is a testing activity the following tasks can be implemented. When there is an
incremental model, the reliability models can be applied to each testing increment as discussed in 5.4.5.6.
Table 33 shows each of the SRE testing activities, purpose and benefits, and applicability for incremental
development.

Table 33 —SRE tasks performed during software and systems testing phase

SRE testing activity: 5.4.1 Develop a reliability test suite
  Purpose/benefits: Develops a test suite that can increase reliability.
  Applicability for incremental development: Not affected by development cycle or LCM.

SRE testing activity: 5.4.2 Increase test effectiveness via software fault insertion
  Purpose/benefits: Increased test coverage, particularly of the ability of the software to identify and recover from failures.
  Applicability for incremental development: Not affected by the LCM.

SRE testing activity: 5.4.3 Measure test coverage
  Purpose/benefits: Measures the percentage of the code and the requirements that have been tested.
  Applicability for incremental development: Test coverage is accumulated from increment to increment.

SRE testing activity: 5.4.4 Collect fault and failure data
  Purpose/benefits: This is a required prerequisite for all other SRE testing activities.
  Applicability for incremental development: Can be collected at each increment as well as totaled for all increments.

SRE testing activity: 5.4.5 Select reliability growth models
  Purpose/benefits: Estimate the current and future failure rate, MTBF, reliability, and availability that can be merged into the system reliability model in order to determine whether a reliability objective is met.
  Applicability for incremental development: See 5.4.5.6 for more information.

SRE testing activity: 5.4.6 Apply SR metrics
  Purpose/benefits: In order to determine whether the software should be released, additional metrics over and above the reliability estimations from step 5.4.5 should be used.
  Applicability for incremental development: Can be performed during each increment and at the final increment.

SRE testing activity: 5.4.7 Determine the accuracy of the predictive and reliability growth models
  Purpose/benefits: The models employed during development and test should be validated against actual failure, fault, and defect rates to allow the models to be updated as required based on actual results.

SRE testing activity: 5.4.8 Revisit the defect RCA
  Purpose/benefits: In 5.2.1 the defect RCA is performed on historical data from similar systems. This is updated with the root causes found during testing so as to drive the focus during testing and improve future analyses.

5.4.1 Develop a reliability test suite

This is a typical but highly recommended SRE task. The individuals planning the reliability test suite
include software quality and testing, software management, and reliability engineering. The individuals
who will perform the tests themselves include the software engineers and software quality and test
personnel as well as the reliability engineers.

The reliability test suite differs slightly from the traditional test suite in that the goal is to measure the
reliability growth in an operational profile. In theory, SR testing is relatively straightforward. In practice,
the concept is far from trivial. To implement SR testing effectively the software test team will need to have
the inputs summarized in Table 34 for the five types of software system level tests. There are three types of black box
tests: operational profile, requirements based, and model based. There are two types of stress case tests:
timing and performance, and failure modes testing. It is assumed that the clear box testing is performed by
the developers as per 5.3.9.2. The inputs to the reliability test suite are shown in Table 34.

Table 34 —Components of a reliability test suite

Black box testing

Type of test: Operational profile testing
  Inputs: The operational profile (OP) from 5.1.1.3.
  Purpose/benefits: Exercises the system under test (SUT) according to how it will be used operationally.

Type of test: Requirements-based testing
  Inputs: The software requirements.
  Purpose/benefits: Exercises the SUT to provide assurance that it satisfies its requirements as specified.

Type of test: Model-based testing
  Inputs: Test models may be derived from requirements and design documentation.
  Purpose/benefits: Exercises the SUT to evaluate compliance with a test model that represents critical aspects, including permitted/excluded operation sequences as well as input/output domains and combinations. This may reveal many kinds of defects, including dead or unreachable states.

Stress case testing

Type of test: Timing and performance
  Inputs: Timing and scheduling diagrams that are usually found in the design documentation; performance requirements; the effects of software on the system design from 5.1.1.4.
  Purpose/benefits: Exercises the SUT to evaluate compliance with requirements for real-time deadlines and resource utilization.

Type of test: Failure modes
  Inputs: The identification of failure modes from 5.2.2.
  Purpose/benefits: Exercises the conditions that are associated with the identified failure modes. This is the only test that verifies that the software works properly in a degraded environment.

The checklist for developing a reliability test suite is shown in Figure 75.

a) Develop an OP as per 5.1.1.3 and proceed to 5.4.1.1 to identify how to incorporate that OP into the
reliability test suite.
b) Identify all of the software requirements and proceed to 5.4.1.2 to identify how to incorporate the
requirements into the reliability test suite.
c) Identify or construct a finite state machine for the software. Proceed to 5.4.1.3 to identify how to
include model based testing into the reliability test suite.
d) Locate timing diagrams, performance requirements. Proceed to 5.4.1.4 to identify how to
incorporate the timing and performance into the reliability test suite.
e) Locate the specific software failure modes that resulted from 5.2.2. Proceed to 5.4.1.5 to identify
how to incorporate the failure modes into the reliability test suite. Then proceed to 5.4.2 to identify
how to increase the effectiveness of this test via fault insertion.

Figure 75 —Checklist for developing a reliability test suite

5.4.1.1 Operational profile (OP) testing

OP testing should be performed in conjunction with the requirements testing and the model-based testing.
Instructions are provided in 5.1.1.3 for characterizing the OP. In this task, the practitioner develops a test
suite that uses that OP. Recall that the profile is defined by the customer types, user types, system modes,
and software functions. The software test suite should be constructed so that the particular software
functions are exercised the most in the system mode used by most of the users at most of the customers.
This may seem obvious, but far too often the software test suite applies equal attention to every function
regardless of its
probability of being used. See F.4.1 for an example. By using the OP, the developers and testers can be
prepared to develop and test as similarly as possible to the actual end users.
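As a simple illustration, the sketch below spreads a fixed system-level test budget across software functions in proportion to their OP probabilities; the function names, probabilities, and budget are illustrative assumptions, not part of any particular OP.

# Allocating test cases in proportion to the operational profile (see 5.1.1.3).

operational_profile = {"enter_order": 0.55, "query_status": 0.30,
                       "generate_report": 0.10, "admin_setup": 0.05}
test_budget = 200                    # total system-level test cases planned for this release

allocation = {func: round(p * test_budget) for func, p in operational_profile.items()}
for func, count in allocation.items():
    print(f"{func:16s}: {count:3d} test cases")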

5.4.1.2 Requirements testing

Requirements testing is when every software requirement is explicitly tested. Since software requirements
tend to be at a high level, this type of testing is typically combined with other tests such as OP testing. The
goal is to not only cover the OP but to make sure that every written requirement is also verified.

5.4.1.3 Model-based testing

Model-based testing uses test models derived from requirements, specifications, and expectations. Test
models use many different representations; state machines are supported in most methods and tools. The
entire procedure is illustrated with a concrete example in F.4.2. The steps for developing a model-based
reliability test suite are summarized in Figure 76.

a) Document a software specification as a state machine (Binder [B6]).


b) Identify all inputs from all actors interfacing with the system and all outputs from the system.
c) Enumerate all possible values that could be received from input source.
d) Identify all operating variables for the system under test (SUT). These are externally observable
parameters that impact 1) when an input is possible, and/or 2) the required system response based
on the variable value.
e) Specify behavior for each value variant of each input. For each input value variant specify the
following:
1) Operating variable constraints (variable values have to be true for input to occur)
2) Nominal required response (response is always observed)
3) All alternative responses based on unique operating variable values (response MAY occur)
4) All operating variable updates resulting from the stimulus and response
f) Define a compact model specification format following the preceding specification procedure.
1) The resulting state machine needs to be stored in a test generation tool so that tests may be
automatically generated from the tool.
2) The practitioner identifies a commercial tool that is capable of reading the finite state machine
produced using the preceding algorithm, or develops a test generation capability by taking
random walks through the finite state machine state space.
g) Produce a test automation capability in support of model based testing.
1) Each action is an identifier; the identifier needs to have a corresponding test function
implemented in a tool that will be used to execute test sequences against the SUT. These test
functions include scripts that specify the specific input values that are sent to the SUT and the
output variables that are evaluated to verify that the system responds correctly to each
stimulus applied in a sequence.
2) A test interface is created between the test executive and the SUT. This may include adapters
that contain the native message protocols used to communicate with the SUT. These adapters
convert information from a test description language into messages that precisely reflect or
simulate the data that the SUT is required to process by design.
3) Auto-generated test cases can be automatically executed against the SUT.
NOTE—An input variant is one that invokes a different response from the system based on the value of the input.

Figure 76 —Checklist for developing a model based test suite
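A minimal sketch of step f)2), generating test sequences by taking random walks through a finite state machine, is shown below; the two-state machine, the stimulus names, and the expected responses are illustrative assumptions rather than a real SUT specification.

# Random-walk test generation over a simple finite state machine.
import random

# state -> {stimulus: (next_state, expected_response)}
fsm = {
    "IDLE":    {"start": ("RUNNING", "ack_start"), "status": ("IDLE", "report_idle")},
    "RUNNING": {"stop": ("IDLE", "ack_stop"), "status": ("RUNNING", "report_running")},
}

def random_walk(length, seed=0):
    # Returns a test sequence of (state, stimulus, expected_response) tuples.
    rng = random.Random(seed)
    state, sequence = "IDLE", []
    for _ in range(length):
        stimulus = rng.choice(list(fsm[state]))
        next_state, response = fsm[state][stimulus]
        sequence.append((state, stimulus, response))
        state = next_state
    return sequence

for step in random_walk(6):
    print(step)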

5.4.1.4 Timing and performance testing

The inputs to the timing and performance testing are timing diagrams, scheduling diagrams, performance
requirements, and the results of 5.1.1.4. Timing diagrams visualize how data items change over time, and in
particular the concurrency, overlaps, and sequencing.

The purpose of testing for timing considerations is to identify the following faults (Binder [B5]):

 Deadlock: A stalemate that occurs when two elements in a process are each waiting for the other to
respond.
 Livelock: Similar to a deadlock except that states of the processes involved in the live lock
constantly change with regard to one another, none progressing.
 Race conditions: The result of an operation is dependent on the sequence or timing of other
uncontrollable events. Specifically it is when two or more threads are attempting to access the same
data at the same time. This can happen when there are incorrect priorities for the tasks or processes
or resources are not locked.
Scheduling diagrams visualize the performance of the software. The purpose of testing for performance is
to identify faults related to any of the following (Laplante [B50], Mars [B52], Reinder [B75]):

 CPU utilization: Percentage of time the CPU is executing does not exceed a predefined threshold
such as 70%.
 Throughput: The number of processes completed per time unit is as required.
 Turnaround time: Interval from time of process submission to completion time meets the
performance requirements.
 Waiting time: Sum of periods spend waiting is not excessive.
 Response time: Time from submission of request until time first response is produced is not
excessive.
 Fairness: Each thread received equal CPU time and hence no threads starve.
It is not within the scope of this document to discuss the details of timing and performance testing.
However, it is important that the practitioner verify that timing and performance testing is being performed.
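A minimal sketch of one such check, a response-time assertion, is shown below; the 50 ms limit and the request stub are illustrative assumptions, and a real test would exercise the actual SUT interface under representative load.

# Response-time check against an assumed performance requirement.
import time

RESPONSE_TIME_LIMIT_S = 0.050        # illustrative 50 ms requirement

def submit_request():
    # Stand-in for sending a request to the system under test
    time.sleep(0.01)

start = time.perf_counter()
submit_request()
elapsed = time.perf_counter() - start
assert elapsed <= RESPONSE_TIME_LIMIT_S, f"response took {elapsed * 1000:.1f} ms"
print(f"response time {elapsed * 1000:.1f} ms is within the {RESPONSE_TIME_LIMIT_S * 1000:.0f} ms limit")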

5.4.1.5 Failure modes testing

Failure modes testing and timing and performance testing are the only two tests that cover the stress points
of the software. While the timing and performance tests cover the stress in terms of loading, throughput,
and timing, the failure modes tests cover stress testing that is of a functional nature. The requirements and
OP define what the software is expected to do. But they rarely define what the software should not or
cannot do. The SFMEA focuses on the failure space. The result of the SFMEA is a list of mitigations that
usually require a change to the code, and sometimes require a change to the design and the software
requirements. It is commonly and incorrectly believed that the SFMEA yields results that would have been
found during testing. The reason why this reasoning is usually faulty is that “could have” and “would have”
are two very different things. While the mitigations that result from the SFMEA usually “could have” been
found in testing, they generally are not because the testing usually focuses on the success space and not the
failure space.

The checklist in 5.1.1.4 represents a summary of the things that should be included in a failure modes
testing suite. For each of the failure events listed below, verify that the software can detect, isolate, and
apply the applicable “R”—recovery, reduced operations, reset, partial reset, restart, reload, or repair.

Failures that should be identified by the software include but are not limited to hardware failures,
inadvertent operations, out-of-sequence commands, synchronization and interface failures, valid but wrong
inputs, invalid but accepted inputs, environment effects, etc. See the example in F.4.3.

5.4.2 Increase test effectiveness via fault insertion

This is a project specific task because it typically requires specialized tools. The software failure
modes identified in 5.2 are the inputs to this task. The software organization and management have the
lead responsibility for this task.

Software fault injection (FI) is the process of systematically perturbing the program’s execution
according to a fault model and observing the effects. The intent is to determine the error resiliency of the
software program under faulty conditions. The user needs to apply a compiler that compiles programs in
high-level languages such as C and C++ to a low-level IR (intermediate representation). The IR
preserves type information from the source level, but at the same time, represents the detailed control
and data flow of the program.

The faults inserted represent off-nominal conditions from a real operational environment and are
introduced randomly. Fault insertion differs from fault seeding, in which the source code is modified so as
to be defective for the purposes of estimating the total number of defects in the software. Fault insertion is
when an image of the software is manipulated during runtime so that specific types of faults can be
triggered. Fault seeding is not recommended under any circumstances, while fault insertion is.

Since fault insertion typically needs to be performed repetitively, it is desirable to avoid going through
the compile cycle every time a new fault is to be injected. Therefore, the ideal tool should insert special
functions into the IR code of the application, and defer the actual injection to runtime.

The FI infrastructure should provide both fault injection and tracing of the execution after injection.
This will allow the user to inject as many faulty conditions as one wants without recompiling the
application. Further, it is possible to defer the decision of the specific kind of faulty condition to inject
at runtime, depending on the runtime state of the application.

One of the main challenges in FI is to trace the propagation of the faulty condition in the program, and
to map it back to the source code. This is essential for understanding the weaknesses in the program’s
fault handling mechanisms and improving them. The FI infrastructure should allow users to trace the
propagation of the faulty condition after its injection as the program executes, and also to identify
specific kinds of targets or sinks in the program that should be compared with the fault-free run, to
determine whether they have been corrupted by the fault.

In summary, FI infrastructures enable the users to

 Inject faulty conditions into a wide range of applications.


 Control at runtime the kinds of faulty conditions that should be injected and where to inject them
in the program. This determination can be made by examining the program’s runtime state, and by
considering source-level characteristics of the program’s code.
 Trace the propagation of a faulty condition, after injecting it, and examine the state of selected
program elements in relation to their fault-free state.
As an example, corrupted data may be introduced randomly so as to verify that the software and system can
handle this failure mode. There are different methods in which faulty conditions can be inserted but the
most effective method is during runtime where software is in operation and faulty conditions are inserted in
a random manner. This method requires a set of predefined failure modes and tools that can automatically
and randomly insert the faulty condition during execution.
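
As an illustration only, the following minimal Python sketch shows one way such a runtime fault injector could be structured. The failure modes, probabilities, and the wrapped sensor function are hypothetical; in practice the failure mode list comes from 5.2 and the injection and tracing are performed by the instrumented FI infrastructure described above.

    import random

    # Hypothetical failure modes from a predefined list (see 5.2); each entry
    # perturbs the nominal return value of the wrapped function.
    FAILURE_MODES = {
        "corrupted_data":  lambda value: value[::-1] if isinstance(value, str) else None,
        "dropped_message": lambda value: None,
    }

    def inject_faults(probability, trace):
        """Wrap a function so that, at runtime, a randomly chosen faulty condition
        is occasionally returned in place of the nominal result."""
        def decorator(func):
            def wrapper(*args, **kwargs):
                result = func(*args, **kwargs)
                if random.random() < probability:
                    mode = random.choice(list(FAILURE_MODES))
                    # Record the injection so its propagation can be traced afterward.
                    trace.append((func.__name__, mode))
                    return FAILURE_MODES[mode](result)
                return result
            return wrapper
        return decorator

    # Hypothetical usage: exercise a wrapped function and inspect what was injected.
    trace_log = []

    @inject_faults(probability=0.10, trace=trace_log)
    def read_sensor():
        return "temp=21.5"

    for _ in range(50):
        read_sensor()
    print(trace_log)   # which faulty conditions were injected, and into which function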


The method of categorizing faults is an important aspect of fault tolerance and management. The potential
sources of defects are software requirements, architecture, and software code. The failure data regarding the
most common defects can be obtained from field or customer reports.

Defects can be classified via a number of taxonomies. For example, there are defects related to timing,
sequencing, data, error handling, and functionality. Software failure modes are derived from the software
defect classifications. For example, “faulty timing” is a software failure mode. A race condition is one of
the many root causes of faulty timing. Examples of failure modes and root causes can be found in 5.2.2.2.

Figure 77 shows the fault insertion test process, which starts with identification of the various failure modes that the
software is susceptible to and ends with evaluation of the test results.

a) Collect customer field failure data and brainstorm failure modes applicable to the software. The
analyses in 5.2.1 can be used for this step.
b) Develop a software taxonomy for this system.
c) Identify which failure modes are most critical and applicable for each function of code (some
functions may have some failure modes that are applicable while other functions may have other
applicable failure modes).
d) Plan how to insert the faulty condition that causes that failure mode.
e) Insert the faulty condition by instrumenting the system.
f) Capture how the system behaves when this faulty condition is inserted.
g) Evaluate the test results and make modifications to the requirements, design, or code as applicable.

Figure 77 —Checklist for performing software fault insertion

5.4.3 Measure test coverage

This is a typical but highly recommended SRE task. The software quality assurance and test organization
and the software organization have the primary responsibility for this task since some of the test coverage is
measured during development test and some is measured during a system level test.

Test coverage is a function of both black box coverage and structural coverage. Black box coverage is an
indication of the coverage of the requirements, the model, and the OP. Statistical testing such as the RDT
are considered black box tests because they are conducted without knowledge of the detailed design or
code.

Structural (or “clear box”) testing provides assurance based on visibility into, and analysis of, the actual
detailed design code. The goal is usually to execute every line of code and/or decision in the code at least
once. For even more coverage the goal might be to execute not only every line of code and decision but to
also exercise a large spectrum of inputs for each line of code or decision. Clear box coverage is a
measurement of the extent to which certain software units are executed, typically during testing of an
instrumented implementation. For example, if every statement in a program is executed at least once, 100%
statement coverage is achieved. Many coverage models have been proposed, used, and evaluated.

Does high code coverage equate to high reliability? Can high reliability be achieved without code
coverage? Can code coverage be used to improve the accuracy of reliability estimates? In summary, the
answers are “no,” “no,” and “maybe.”

The relationship between code coverage and SR has been studied since the early 1990s. (See Comparative
Study [B1].) Motivated by reliability estimates that were often overly optimistic, several studies evaluated
the addition of code coverage measurements to reliability growth models. Not surprisingly, increased


coverage is correlated with increased reliability as well as reduced estimate error. These studies
conclusively established that reliability estimates based on low-coverage test runs often overestimate
operational reliability.

A few things may be said with certainty. A software defect cannot be revealed unless faulty code is
executed (“covered”) and the related data is in a state that triggers an incorrect effect. If this occurs during
testing, the observable defect that caused the failure may be corrected and/or influence a reliability
estimate. If a defect escapes the development and testing process and then produces a field failure, a
prerelease reliability estimate based on that testing will over-estimate operational reliability, other things
being equal.

Escapes may result even if faulty code is covered. Although it is assumed that executing faulty code during
test is necessary to reveal a defect in that code, it is not sufficient. Coverage (of any kind) cannot be
sufficient because the data state size related to faulty code is typically astronomically large and is itself
typically the result of complex environmental configurations and execution traces.

Observed reliability, both during and post test, is determined by a system’s usage (operational profile). If a
certain pattern of usage does not cause faulty code to be executed, it will not fail given that usage pattern.
In general, simply executing a statement, path, or any other software unit is never sufficient to trigger a
failure. One important consideration is that testing longer does not necessarily increase test coverage
because it is possible to test the same inputs over and over again. With lower code coverage, the risk
increases that an undetected defect will escape and be triggered in field operation, resulting in worse than
estimated reliability.

In addition to these limitations, code coverage is not a guarantee of adequate testing or failure-free
operation since:

 It cannot be produced for an omitted component, feature, or function.


 It can be meaningless for code components that make use of dynamic run-time binding (functional
programming or polymorphic object-oriented methods) or data-driven strategies.
 It can be achieved while easily missing feature interactions that lead to failures.
 It can be achieved with a very limited subset of possible environments, platforms, or
configurations.
 It is typically unrelated to security “flaws.”
 It can be achieved without triggering stress-related failures.
Once the test coverage measure is identified, the tests are run and coverage is measured. The actual defects
discovered (faults) during testing are accumulated. Then the coverage and defect and fault discovery
findings are documented. The checklist for measuring test coverage is shown in Figure 78.


a) Identify the black box and structural coverage goals based on the criticality of the software,
availability of tools, and budget.
b) Software engineers should have already executed code coverage tests and metrics as per 5.3.9.2 and
5.4.1.
c) If the desired or required code coverage was not achieved during development test, it may need to
be measured during integration and systems testing. There are automated testing tools that measure
code coverage as a background task while the software is running.
d) It is essential that whenever changes are made to the code after the coverage has been measured
that the changed modules be retested. Otherwise, the coverage of the changed modules is
essentially zero since the change invalidates the previous test results as well as coverage measures.
e) Whenever a test case fails, the developers should identify and correct the defect and then retest to
confirm 1) the defect has been properly corrected, and 2) no new defects were introduced (through
regression testing). In terms of incomplete coverage, there are also coverage gaps arising from a lack
of understanding of how the system will behave under conditions that have not been tested.
f) Record which gaps were remediated and why, which ones were not and why, the rationale for the
remediated and non-remediated gaps, and whether the risk level was accepted or mitigated.
g) Go to 5.5 and make a decision concerning the adequacy of the code coverage.

Figure 78 —Checklist for measuring test coverage
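
Structural coverage is normally measured by an off-the-shelf coverage tool running as a background task, but the underlying idea can be sketched in a few lines. The following Python fragment is illustrative only; the traced unit and test inputs are hypothetical, and a real tool would also handle branch and decision coverage.

    import sys
    import inspect

    def statement_coverage(func, test_inputs):
        """Return a rough statement coverage percentage for one function:
        the share of its source lines executed by the supplied test inputs."""
        source_lines, first_line = inspect.getsourcelines(func)
        all_lines = set(range(first_line, first_line + len(source_lines)))
        executed = set()

        def tracer(frame, event, arg):
            if event == "line" and frame.f_code is func.__code__:
                executed.add(frame.f_lineno)
            return tracer

        sys.settrace(tracer)
        try:
            for args in test_inputs:
                func(*args)
        finally:
            sys.settrace(None)
        return 100.0 * len(executed) / len(all_lines)

    # Hypothetical unit under test: the "negative" branch is never exercised below.
    def classify(x):
        if x < 0:
            return "negative"
        return "non-negative"

    print(statement_coverage(classify, [(1,), (2,)]))   # well below 100%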

5.4.4 Collect fault and failure data

This is an essential task because the SR growth models require fault and failure data as inputs. The software
quality assurance and test person typically has the lead role in collecting the fault and failure data.
However, the software engineers themselves are contributing to the fault and failure database.

During the testing activity, SR growth models are used to determine the current and forecasted SR figures
of merit. These models require two inputs: the date of the failure and the test or usage hours expended each
day. The date of the failure can be derived from the failure report.

Failure reports—During testing, problem reports are generated when anomalies occur. Generally, when
multiple failures occur because of the same defect, it is common practice to generate a single failure report
associated with the unique underlying defect. In order to understand the relationship between faults and
failures, one should either generate a failure report for each instance of the failure or record each instance
of the failure in the report.

Before software system testing begins determine where the database of software problem reports exists
within the organization. If there are multiple organizations, it is entirely possible that there is more than one
database. Once the database(s) are located, filter the data to ignore the following reports:

 Defects discovered in earlier phases of development (that have been closed)


 Defects discovered during requirements, design, or code reviews (that have been closed)
 Failure reports that are duplicates of other failure reports
 Failures caused by hardware or humans
 Any new feature requests
 Any failure report that has not been reviewed and validated
 Defects in the user documentation


Useful tips for managing the SR data:

 Include a flag in the problem report that indicates that the problem is related to reliability.
 Include a counter in each problem report to be incremented every time that same defect causes a
failure.
The software problem report typically has several items of information on it. These are the items that are
needed for the models.

 The date of the failure


 The severity of the failure (e.g., critical, minor) or whether the failure affects reliability
Every time there is a failure reported in testing, add it to the data set. So, it is a good idea to use automated
methods to retrieve the failure reports and import them into the tool that is being used to estimate SR
growth.

Once the data is filtered, organize it. In 5.4.5.5 two types of data formats are discussed—inter-failure data
and time to failures. In the first case, it is not known exactly when the failure occurred, only that it occurred
during some time period. So, if all of the failure reports are logged at the end of the day, one would know
how many failures were detected per day but not when those failures occurred during the day. In the second
case, if the testing was to be automated such that the actual number of test hours between each failure event
is captured that would be time-to-failure data.

Time—Since software does not fail due to wear out, its failure rate is not necessarily a function of calendar
time. It is important to measure the actual time that the software is executing to provide for data that is
normalized properly. Failure to normalize can dramatically reduce the accuracy of the models, particularly
if the software is not executing the same amount every day. For example, if one day there are 20 people
testing 8 h a day and the next day no one is testing that is an example of when execution time is much more
accurate than calendar time. Ideally, time for SR measurements is measured in CPU time. However, it is
often not possible to measure or collect CPU time during testing and particularly during operation. In that
case, the execution time can be approximated by estimating the number of operational test hours for every
individual who is using the software for its intended purpose (Musa et al. [B59]). Figure 79 summarizes the
steps to collecting failure data.

Once the data is collected, certain primitives should be computed from that data. These primitives are
useful for selecting the SR growth models and are also needed as inputs to the SR growth models. The
checklist for determining data primitives is shown in Figure 80.

a) Locate the software defect and failure reporting databases in the organization/project.
b) Export the software reports that are classified as defects (do not include any new feature
recommendations or hardware reports).
c) For each report identify the date and severity of the failure.
d) Identify which unit of measure is available—CPU time, usage hours, etc.
e) Organize the data—If the time of the failure is known then one can compute the time between
failures or failures in a period of time.
1) If the time to failure models are being used one will need at least one of the following:
 The number of CPU hours or operational hours since the last failure. Note that wall clock
hours is usually not a normalized measure unless the testing effort per day remains constant.
 The number of runs or test cases executed since the last failure.
 Test labor hours since the last failure.


2) If the inter-failure data models are being used, one will need at least one of the following for
each day of testing and for every computer used for testing:
 The number of CPU hours or operational hours per time interval such as a day or week.
 The number of labor hours spent on testing this software during each interval.
f) Compute the failure data primitives as discussed in the next subclause.

Figure 79 —Checklist for collecting fault and failure data during testing

a) Compute the non-cumulative number of faults observed in each time period—f.


b) Compute the non-cumulative number of testing hours in each time period—x.
c) Compute the cumulative number of faults observed through each point in time—n.
d) Compute the cumulative number of hours (either CPU or operational hours)—t.
e) Compute the cumulative faults discovered/cumulative test hours at any point in time—n/t.
f) Plot the fault rate array on the x-axis and the cumulative number of faults array on the y-axis.
Determine the shape and slope of the trend. Proceed to 5.4.5 to select the best model.
g) OR plot the noncumulative faults on the y-axis and the test intervals on the x-axis. Determine
whether one or more peaks are present. Use the plot to select the best model in 5.4.5.

Figure 80 —Checklist for determining data primitives
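
The primitives in Figure 80 amount to simple accumulation. A minimal Python sketch follows; the sample values reproduce the first four rows of the daily data used in Example #1 (Table 35).

    # Daily observations: (faults observed that day, test hours expended that day).
    # These values reproduce the first four rows of Table 35.
    daily = [(1, 8), (1, 16), (3, 16), (2, 8)]

    n = t = 0
    primitives = []
    for f, x in daily:
        n += f                                   # cumulative faults observed to date
        t += x                                   # cumulative test (usage) hours to date
        primitives.append((f, x, n, t, round(n / t, 3)))

    for row in primitives:
        print(row)
    # Prints (1, 8, 1, 8, 0.125), (1, 16, 2, 24, 0.083), (3, 16, 5, 40, 0.125),
    # (2, 8, 7, 48, 0.146), matching the n/t column of Table 35. The n/t and f
    # arrays are then plotted as described in steps f) and g).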

Example #1:

The data set in Table 35 is an example of when it is known how many failures have occurred during some
period of time, but the time between the failures is not known. In the following example, the failures
observed per day of testing are recorded. The number of faults and test hours expended during each day is
recorded as shown following. On the first day of testing April 28 one fault was observed in 8 h of testing.
On May 3, the next fault was found. There had been 16 h of testing since April 28. On May 5, the third,
fourth, and fifth faults were observed, and there had been 16 h of testing since the second fault was
observed, and so on. The n column is simply the sum of all f’s up to that point in time. The t column is
simply the sum of all x’s up to that point in time. In the last column, n/t is computed. Note that the
granularity of the data is in terms of days. If a larger interval such as weeks is used, the data will have 20%
of the granularity. The more granular the data, the more accurate the estimates that will result from the
methods discussed in 5.4.5.


Table 35 —Example of primitive data


Day    f (faults per day)    x (usage hours per day)    n (cumulative faults)    t (cumulative usage hours)    n/t
4-28 1 8 1 8 0.125
5-3 1 16 2 24 0.083
5-5 3 16 5 40 0.125
5-6 2 8 7 48 0.146
5-10 2 16 9 64 0.141
5-12 2 16 11 80 0.138
5-13 1 8 12 88 0.136
5-25 2 64 14 152 0.092
5-26 9 8 23 160 0.144
5-27 1 8 24 168 0.143
6-1 1 24 25 192 0.13
6-2 5 8 30 200 0.15
6-3 2 8 32 208 0.154
6-4 2 8 34 216 0.157
6-5 2 8 36 224 0.161
6-6 4 8 40 232 0.172
6-7 1 8 41 240 0.171
6-17 2 64 43 304 0.141
6-23 1 40 44 344 0.128
6-25 1 16 45 360 0.125
7-6 1 56 46 416 0.111
7-23 2 40 48 456 0.105
7-26 1 8 49 464 0.106
7-30 1 32 50 496 0.101
8-3 1 16 51 512 0.1
8-5 2 8 53 520 0.102
8-11 3 32 56 552 0.101
8-24 2 64 58 616 0.094
8-30 1 24 59 640 0.092

The cumulative fault rate (n/t) is plotted against the cumulative faults (n) as shown in Figure 81.

Figure 81 —Increasing and decreasing fault rate


As seen from Figure 81 and the table data, the cumulative fault rate is increasing until June 6. From
June 6 to the current point in time at August 30 the trend is decreasing. This information will come in
handy when selecting the reliability growth models.

Alternatively, one can determine if the fault rate is increasing or decreasing by plotting the non-cumulative
faults versus the operational time as shown in Figure 82. One can see from the following graph that the
observed faults peaked on May 26 and started to decrease after that time.

Figure 82 —Same data plotted differently


It is not unusual for the cumulative fault rate to be increasing early in testing. However, if it is consistently
increasing up to the deployment date, that is usually indicative of unstable software, provided the fault data
and time data are collected properly as per the previous subclause. If the rate increases after decreasing,
that is usually an indicator that either new features have been added during testing or the test coverage has
been accelerated.

Example #2:

Figure 83 is a plot from a real software system. Initially, the trend appeared logarithmic. Then the fault rate
increased for a period of time. Then the trend started to decrease and appeared to be linear. The overall
trend, however, is a decreasing fault rate. Any of the models discussed in 5.4.5 are at least applicable to
this data.

Figure 83 —Various fault rate trends


Example #3

Figure 84 is another real software fault data set that shows the fault rate is steadily increasing. If the testing
phase has just begun, this is not unusual. However, if the testing phase is nearly complete, this trend could be
an indication that the software is not stable. Most of the reliability growth models will not produce a result
in this case. Figure 84 is also an example of what should not be expected at the point in time in
which the software is about to deploy. In 5.4.5 notice that there are only two models that can be used for
this data set. On the other hand, all of the models are applicable for the example in Figure 83.

Figure 84 —Increasing fault rate trend

5.4.5 Select reliability growth models

This is an essential task. Typically the software quality assurance and test engineer or the reliability
engineer will select the best reliability growth (RG) models.

For several decades it has been observed that the distribution of software faults over the life of a particular
development version typically resembles a bell-shaped (or Rayleigh) curve as shown in Figure 85. (See
Putnam [B71].)

a) Faults are steadily increasing. This is very typical for the early part of testing.
b) Faults are peaking (statistically this happens when about 39% of total faults are observed, see
Putnam [B71]). There could be one peak or several peaks if there is incremental development.
c) Faults are steadily decreasing. If there are no new features and the software is tested and defects that
cause faults are removed, eventually the fault occurrences will decrease.
d) Faults are happening relatively infrequently until there are no new observations.


(Figure 85 plots the non-cumulative defects discovered on the y-axis against the normalized usage period on
the x-axis, with the increasing, peaking, decreasing, and stabilizing stages labeled.)

Figure 85 —Rayleigh fault profile


During the testing phase, the software may be in any one or several of the previous four stages. Since
virtually all of the software reliability growth (SRG) models are dependent on assumptions about the fault
discovery rate, the practitioner cannot select the SR growth model until the non-cumulative faults have been
plotted against normalized operational time to determine which models are applicable for the current
stage. Therefore the practitioner should be familiar with a cross section of the SRG models and be equipped
to use the model that best reflects the data collected. Table 36 summarizes several SR growth models.

Table 36 —Summary of software reliability growth models


Model name    Inherent defect count    Effort required    Can be used when exact time of failure unknown
Increasing fault rate
Weibull (Kenny [B44]) Finite/not fixed 3 Yes
Peak
Shooman Constant Defect Removal Rate Finite/fixed 1 Yes
Model (Shooman, Richeson [B80])
Decreasing fault rate
Shooman Constant Defect Removal Rate Finite/fixed 1 Yes
Model (Shooman, Richeson [B80])
Linearly decreasing
General exponential models including: Finite/fixed 2 Yes
 Goel-Okumoto
(Goel, Okumoto [B19])
 Musa Basic Model
(Musa et al. [B59])
 Jelinski-Moranda (Jelinski,
Moranda [B37])
Shooman Linearly Decreasing Model Finite/fixed 1 Yes
(Shooman [B82])
Duane (Duane [B14]) Infinite 2 No


Nonlinearly decreasing
Musa-Okumoto (logarithmic) Infinite 1 Yes
(Musa, Okumoto [B58])
Shooman Exponentially Decreasing Model Finite/fixed 3 Yes
(Shooman [B82])
Log-logistic (Gokhale, Trivedi [B20]) Finite/fixed 3 Yes
Geometric (Moranda [B57]) Infinite 3 No
Increasing and then decreasing
Yamada (delayed) Infinite 3 Yes
S-shaped (Yamada et al. [B97])
Weibull (Kenny [B44]) Finite/not fixed 3 Yes

Key for effort required to use the models:


1) Very simple parameter estimation
2) Easy to use if simple parameter estimation is used; otherwise not so easy
3) Parameter estimation necessitates automation

Figure 86 illustrates the steps for identifying the models that are applicable. The instructions for using each
of the models can be found in either 6.3 or Annex C. The model(s) are selected first based on their
assumptions concerning failure rate. Once the models that apply to the current failure rate trend are
identified, the model assumptions concerning defect count, defect detection rate per defect, and effort
required to use the model are analyzed next. The process for selecting the appropriate models is as much a
process of elimination as a process for selection in that the models that cannot be used on a particular data
set are eliminated first. The remaining models are used and the model that is trending the best against the
data is identified in 5.4.7.

Annex C and 6.3 discuss the steps for using the preceding SR models. The models are presented in order of
effort, with those requiring the least effort to use first and those requiring the most effort last. Examples are
provided for the models in 6.3.

a) Review the fault data primitives as per 5.4.4. Recall that the primitives show whether the rate of
faults is increasing or decreasing.
b) Review 5.4.5.1. If the rate of faults is steadily increasing then remove all models that do not have
this assumption. If the rate of faults is increasing and then decreasing, then either model the most
recent data or employ a model that can handle both trends. Otherwise if the trend is decreasing
most of the models are applicable.
c) If the fault trend is decreasing, review 5.4.5.2. If the fault rate trend is nonlinear (i.e., logarithmic)
that is an indication that the defects are not equally probable to result in faults. In that case,
eliminate models with that assumption.
d) Review 5.4.5.3. If there is reason to believe that the inherent defect content is changing over time
then eliminate models with this assumption. The inherent defect count might increase if the
software engineers are injecting new defects while correcting the discovered defects. Refer to 5.4.6
concerning the corrective action effectiveness.
e) Review 5.4.5.4. Eliminate models that require more effort or automation than is feasible for this
project.
f) Review 5.4.5.5. Eliminate any models that require knowledge of the time of each failure if this
information is not available (most of the time, this is the case).


g) If more than one model is feasible, use the model that requires the least effort or use all applicable
models.
h) It is a good idea to be prepared to use more than one model. During testing the trend and
assumptions can change. The cost of using the models is largely in collecting the data. If the
models are automated it is generally not more expensive to use more than one model.
i) The model that is currently trending the best can be identified as per 5.4.7.
j) The models that are the easiest to use are described in 6.3, while the others that require automation
are described in Annex C. Use the model(s) identified in the preceding steps and proceed to either
6.3 or Annex C as appropriate.
k) If the software is being developed incrementally, refer to 5.4.5.6.

Figure 86 —Checklist for selecting a reliability growth model
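
The elimination logic of Figure 86 can also be expressed programmatically. The following Python sketch is illustrative only: the model attributes are a simplified encoding of Table 36, and the observed trend and project constraints in the usage example are hypothetical.

    # Candidate models encoded from Table 36:
    # (name, fault rate trend assumption, inherent defect count, effort 1-3,
    #  usable when the exact time of failure is unknown)
    MODELS = [
        ("Weibull", "increasing then decreasing", "finite/not fixed", 3, True),
        ("Shooman Constant Defect Removal Rate", "decreasing", "finite/fixed", 1, True),
        ("General exponential (Goel-Okumoto, Musa Basic, Jelinski-Moranda)",
         "linearly decreasing", "finite/fixed", 2, True),
        ("Shooman Linearly Decreasing", "linearly decreasing", "finite/fixed", 1, True),
        ("Duane", "linearly decreasing", "infinite", 2, False),
        ("Musa-Okumoto (logarithmic)", "nonlinearly decreasing", "infinite", 1, True),
        ("Shooman Exponentially Decreasing", "nonlinearly decreasing", "finite/fixed", 3, True),
        ("Log-logistic", "nonlinearly decreasing", "finite/fixed", 3, True),
        ("Geometric", "nonlinearly decreasing", "infinite", 3, False),
        ("Yamada delayed S-shaped", "increasing then decreasing", "infinite", 3, True),
    ]

    def candidate_models(observed_trend, exact_times_available, max_effort, corrections_perfect):
        """Eliminate models whose assumptions conflict with the observed data (Figure 86)."""
        keep = []
        for name, trend, defect_count, effort, ok_without_exact_times in MODELS:
            if trend != observed_trend:
                continue   # wrong fault rate assumption [steps b) and c)]
            if not corrections_perfect and defect_count == "finite/fixed":
                continue   # corrections are injecting new defects [step d)]
            if effort > max_effort:
                continue   # more effort or automation than is feasible [step e)]
            if not exact_times_available and not ok_without_exact_times:
                continue   # model needs the exact time of each failure [step f)]
            keep.append(name)
        return keep

    # Hypothetical situation: linearly decreasing trend, only daily fault counts,
    # modest effort available, and corrective actions are not introducing new defects.
    print(candidate_models("linearly decreasing", exact_times_available=False,
                           max_effort=2, corrections_perfect=True))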

5.4.5.1 Fault rate

The fault rate during testing can be any of the following. Generally, the models will assume at least one of
the following failure rates:

a) Increasing
b) Peaking
c) Decreasing
d) First increasing and then decreasing
During the early part of testing it is very common for the fault rate to increase. Once the software stabilizes
it may then begin to decrease. If new features are added to the code during testing, it may/will increase
again. Prior to selecting any SR growth model, the practitioner should graph the number of fault
occurrences over time as per 5.4.4. If the most recent trend is not decreasing it is too early to use most of
the SR growth models. If the most recent trend is decreasing, all of the models may be applicable.

If the trend is initially increasing and then decreasing, the practitioner has two options, as follows:

a) The simplest option is to remove the portion of the data that is initially increasing and use the
appropriate SR growth model(s) on the most recent data.
b) The models that apply to increasing and then decreasing trends, such as the S-shaped models or
the Weibull model, can be considered.

5.4.5.2 Linear versus nonlinear decreasing

This is applicable if the fault rate is decreasing. Review the fault rate trend from 5.4.4. If it appears to be
linear then consider the models that assume a linearly decreasing fault rate. The linearly decreasing fault
rate is typical when the software is operational. During testing the fault rate may be nonlinear if some
defects are resulting in faults before others. For example, the more obvious defects may be discovered early
in testing and the less obvious later in testing. If it is not obvious whether the trend is linear or nonlinear,
then consider both types of models.

5.4.5.3 Inherent defect content

Several of the models predict reliability by first predicting how many defects exist in the software.


a) Finite and fixed—This means that the total number of defects that exist in the software that are both
known and unknown is countable and does not increase over time. This means that defects will not
be generated when other defects are corrected.
b) Finite and not fixed—This is similar to Item a) except that it is possible that defects are introduced
while correcting other defects.
c) Infinite—These models assume that the inherent defect count is not measurable. These models will
instead measure the rate of defect detection.
The practitioner should research how many corrective actions have introduced new defects in the software.
If this percentage is relatively high then the models that assume a finite and fixed defect count may not be
suitable.

5.4.5.4 Effort required to use the model

The effort required to use the model depends on the following:

a) The number of parameters that need to be estimated.


b) The complexity of the parameter estimation selected. For some models, a simple graph can be used
to estimate the parameters. For other models, the parameter estimation needs to be automated.
c) Whether the model, and in particular the parameter estimations, requires automation.
If the practitioner has an automated tool the work required to use the model is typically related to collecting
the data. Otherwise, the practitioner may want to focus on the models that do not require automation.

5.4.5.5 Model data formats

In 5.4.4 the examples illustrated fault count data. It was known what day the fault occurred but it was not
known exactly when the fault occurred within that day. This is called failure count data, and it is generally
not difficult to collect as long as the analyst knows how many people are testing each day or week and the
date of each software failure discovery is recorded.

Some of the models require a data format that is more granular than the failure count data. Data formats
that are relatively more difficult to come by include the following:

Failure time—The time that each software failure occurred is recorded. This data requires that the software
test harness is intelligent enough to detect a software failure without human intervention and record the
time of that failure.

Inter-failure time—The time in between each software failure is recorded. This can be derived from the
failure time data.

Failure time data assumes that testing commences at some initial time t0 = 0 and proceeds until n failures
have been observed at time tn. Thus, the time of the ith failure is ti. This is one of the data formats used by
failure counting models. An example of failure times in test hours follows:

T = <3, 33, 146, 227, 342, 351, 353, 444, 556, 571, 709, 759>

There are n = 12 failures. Here t1 = 3, t2 = 33, tn = 759. An alternative representation of failure time data is
inter-failure time data. As the name suggests, inter-failure time data represents the time between the (i–1)th
and ith failure. Example: The same data shown previously is expressed as inter-failure time data as follows:

X = <3, 30, 113, 81, 115, 9, 2, 91, 112, 15, 138, 50>


Example: The same data is shown as failure count data. There are 10 people testing 8 h per day, so each
interval represents 80 h of testing, as follows:

T = <1, 2, 3, 4, 5, 6, 7, 8, 9, 10> F = <2, 1, 1, 0, 3, 1, 1, 1, 1, 1>

During the first 80 h, 2 failures occurred. During the next 80 h, 1 failure occurred, etc.

While failure times and inter-failure times can be converted from one format to the other, disaggregating
failure count data to obtain failure times or inter-failure times may require the assumption that all of the
failures during an interval were spaced equally. Thus, an organization wishing to apply a model requiring
inter-failure or failure time data should collect this data directly or make the simplifying assumption of
equally spaced defect discovery.
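
A minimal Python sketch of these conversions, using the failure time data from the example above, follows; the 80 h interval corresponds to the 10 testers working 8 h per day.

    # Failure times in cumulative test hours, from the example above.
    failure_times = [3, 33, 146, 227, 342, 351, 353, 444, 556, 571, 709, 759]

    # Inter-failure times: difference between consecutive failure times.
    inter_failure = [t2 - t1 for t1, t2 in zip([0] + failure_times, failure_times)]
    print(inter_failure)   # [3, 30, 113, 81, 115, 9, 2, 91, 112, 15, 138, 50]

    # Failure count data: failures per 80 h interval (10 testers at 8 h per day).
    interval, n_intervals = 80, 10
    counts = [0] * n_intervals
    for t in failure_times:
        counts[min((t - 1) // interval, n_intervals - 1)] += 1
    print(counts)          # [2, 1, 1, 0, 3, 1, 1, 1, 1, 1]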

5.4.5.6 Apply software reliability growth models with an incremental or evolutionary life
cycle model

If the software is being developed in more than one increment, that will impact how the reliability growth
models are employed. Table 37 illustrates the considerations pertaining to the incremental or evolutionary
models and guidance on how to apply the reliability growth models.

The steps for applying the SR growth models when there is an incremental or evolutionary LCM are
summarized in Figure 87.

Table 37 —Applying reliability growth models for incremental or evolutionary life cycle
Considerations with incremental or evolutionary development, and how to apply the reliability growth models:

 If there is an incremental LCM, what do the increments consist of? Will the requirements be evolving in
each of the increments, or will the requirements be defined up front and the design, code, and test activities
evolve over several increments? How to apply: Regardless of whether the requirements are evolving in each
increment, do a separate estimation for each increment and then overlay the estimations for each increment.
 How many internal releases will be made prior to the final release? How to apply: The user will need to
use the models on each internal increment as well as the final release.
 How many external releases will be made prior to the final release? How to apply: Each external release
is subject to its own reliability growth estimation.

a) Identify the increments of testing.


b) For each increment, apply the steps in Figure 88.
c) Overlay the results of each increment in order of testing time starting with the first increment on
the left side.
d) The user may need to use different models in each increment as well as during each increment so
it is a good idea to have several automated.
e) At the final increment, generate a final SRG estimate.

Figure 87 —Steps to apply SRG models for incremental or evolutionary life cycle

Refer to F.4.4 for an example.


5.4.6 Apply software reliability metrics

This task is either typical or essential depending on the metrics. See Table 38 for the primary role and
priority of each metric.

The SR growth models presented in 5.4.5, 6.3, and Annex C forecast one or more SR figures of merit based
on known failure/usage time data. As with any statistical model, the model can only forecast what is
observed. The accuracy of the SR models depends on both the black and white box coverage during testing.
If the OP is not being exercised and/or if the requirements of the software are not being verified, that
may/will cause the SR models to be optimistic since the models cannot measure what they do not know.
The practitioner should use the test coverage and requirements coverage in conjunction with the SR models
to verify that the assumptions of the models are being met.

The accuracy of the SR models presented in 5.4.5 also depends on whether and to what extent the defect
corrections introduce new defects. In most cases the models assume that the defects are removed without
introducing new ones. For legacy software systems, the practitioner should measure the corrective action
effectiveness. If the SR models indicate for example, an estimated number of defects of 200 and the
corrective action effectiveness is 10%, then the practitioner should assume that the estimate of 200 is
optimistic.

The practitioner may want to compare the actual testing defect density to the predicted defect density. If the
actual testing defect density is below or above the expected bounds for the testing defect density that may
be an indication that the software has many more defects than estimated.

Additionally the SR models may indicate that a particular reliability objective has been achieved, however,
it is possible that if the software is shipped when the requirement is met, there will be an excess of
backlogged defects. The organization may decide to continue testing beyond the point of meeting the
reliability requirement to reduce defect pileup in the next software release. If the incoming defect rate is
bigger than the fix rate that is a sign of impending defect pileup regardless of whether the required
reliability objective has been met.

Hence, the SR models should be augmented with additional supporting metrics that combined can aid the
practitioner in determining when the software is ready for the next phase of testing or ready for
deployment. The metrics in Table 38 can be used during testing regardless of whether there is a waterfall,
incremental, or evolutionary LCM. The metrics fall into the four basic categories previously discussed: 1)
coverage of both the requirements and the software structure, 2) code stability, 3) process stability, and 4)
ability to correct defects in a timely fashion. (See IEEE Std 730™-2014 [B32], Smidts [B84], [B85].)


Table 38 —Overview of software metrics used in SRE during testing


Each metric is listed with its definition, priority, primary responsibility, and purpose.

Ability to adequately test the functionality and structure
 Test coverage: Percentage of black box and structural test cases executed during testing. Priority:
Essential. Primary responsibility: Software quality assurance and testing for black box testing, and software
engineering for structural testing. Purpose: See 5.4.3.
 Requirements traceability: Degree to which the requirements have been met by the architecture, code,
and test cases. Priority: Essential. Primary responsibility: Software quality assurance and testing. Purpose:
All requirements for this increment can be traced to the completed test cases.

Code stability
 Corrective action effectiveness: Percentage of corrective actions that are not adequate. Priority: Essential.
Primary responsibility: Software quality assurance and testing. Purpose: All corrective actions are adequate
the first time addressed.

Process stability
 Defect density (defects per KSLOC): Defects per 1000 source lines of code. Priority: Typical. Primary
responsibility: Quality assurance and testing. Purpose: The size of the code is within the range of what was
predicted; the defects found during testing do not exceed the predicted testing defect density.

Ability to correct the defects that are occurring in a timely fashion
 Number of backlogged defects: The number of software defects waiting to be fixed. Priority: Essential.
Primary responsibility: Software quality assurance and testing. Purpose: There should be few to no
backlogged defects of a significant severity.
 Incoming defect rate: The number of incoming defects per unit of time (i.e., hours, days, weeks, or
months). Priority: Typical. Primary responsibility: Software quality assurance and testing. Purpose: The
defect fix rate should be no less than the incoming defect rate.
 Defect fix rate: The rate at which software defects are fixed, measured as the number of defects fixed per
unit of time.
 Defect days number (DDN): The number of days that defects remain in the software system from
introduction to removal. Purpose: The defects are being removed during testing in a timely fashion.

Measure requirements traceability during testing


During testing, the requirements traceability metric measures the number of requirements that are covered
from the completed test cases. This metric is in addition to the requirements traceability discussed in 5.3.9,
which measures the number of requirements addressed in the architecture, code and test cases.
Requirements traceability during testing is measured as shown in Equation (11):
RT = (R1 / R2) × 100%                                                                              (11)

where
RT is the value of the measure requirements traceability,
R1 is the number of requirements met by the completed test cases
R2 is the number of original requirements


Measure corrective action effectiveness


When the underlying defect associated with one or more faults is corrected, there is a possibility that:

 The corrective action is not adequate.


 The corrective action introduces new defects that did not exist prior to the corrective action.
The “fix effectiveness” is the percentage of corrective actions that are adequate. The “regeneration rate” is
the rate at which new defects are created by correcting another defect. It is possible that a corrective action
is adequate but generates one or more new defects. It is also possible that a corrective action is inadequate
and generates one or more new defects. If the fix effectiveness and the regeneration rate are noticeably high
this is generally due at least one of the following:

 Very old software code


 Code that has been modified by many different people over time
 Poor corrective action processes and techniques
 Poor source control
If the regeneration rate is relatively high (> 5%) this should be considered when selecting the reliability
growth model as per 5.4.5. Several of the models assume that the corrective action is perfect. Those models
should either be avoided or adjusted to account for the increasing defect count. One simple approach to
adjust the model is to include an additional X% to the inherent defects predicted by the model where
X = the fix effectiveness rate + regeneration rate.
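
For example, a minimal sketch of this adjustment (the model output and adjustment fraction shown are hypothetical):

    estimated_inherent_defects = 200   # hypothetical output of the selected SR growth model
    x = 0.07                           # hypothetical adjustment fraction per 5.4.6
    adjusted_defects = estimated_inherent_defects * (1 + x)
    print(adjusted_defects)            # 214.0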

Measure defect density and KSLOC during testing

During the requirements, design and coding phases the defect density can be predicted as per 5.3.2.3.
During testing, the actual testing defect density can be measured directly and can be used to validate the
accuracy of the predicted defect density as per 5.4.7. Defect density as measured during testing is given as
shown in Equation (12):


DD = ( D1 + D2 + … + DI ) / KSLOC                                                                  (12)

where

Di is the number of unique defects detected during the ith day of testing
I is the total number of days of testing
KSLOC is the number of source lines of code (LOC) in thousands
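
A short sketch combining Equation (11) and Equation (12) (the counts and code size are hypothetical):

    # Requirements traceability, Equation (11) -- hypothetical counts
    r1 = 185                      # requirements met by the completed test cases
    r2 = 200                      # original requirements
    rt = (r1 / r2) * 100          # 92.5%

    # Testing defect density, Equation (12) -- hypothetical daily defect counts
    defects_per_day = [1, 1, 3, 2, 2]   # unique defects detected each day of testing
    ksloc = 25.0                        # source lines of code, in thousands
    dd = sum(defects_per_day) / ksloc   # 0.36 defects per KSLOC

    print(f"RT = {rt:.1f}%, DD = {dd:.2f} defects/KSLOC")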

The checklist for applying SR metrics during testing is shown in Figure 88.

a) Review the requirements traceability and test coverage during testing. Both should be approaching
100%. If they are not, the SR growth models will likely not be accurate.

b) Review the corrective action effectiveness. If the percentage of corrective actions to the software
that are not effective exceeds 5%, an RCA should be conducted to determine the cause.

c) If this metric is selected, review the actual testing defect density. Compare it to the predicted testing
defect density as per 5.3.2.3 and 6.2.1. If the testing defect density is unusually high it may be an
indication of either better test coverage than expected or a bigger product than expected. Investigate
both before assuming one or the other.


d) Review the number of backlogged defects. There should not be any high-priority backlogged
defects when transitioning to the next phase of testing or deployment regardless of whether the
reliability objective has been met.
e) If these metrics are selected, review the incoming defect rate against the defect fix rate. The rate of
defect fixing should be on par with the rate of incoming defects. This is another indicator of defect
pileup and impending schedule slippage.
f) If these metrics are selected, review the defect days number (DDN) and the number of backlogged
defects. If either number is very high, it could be an indication of defect pileup and impending
schedule slippage.

Figure 88 —Checklist for applying software reliability metrics during testing

5.4.7 Determine the accuracy of predictive and reliability growth models

This is an essential task. The reliability engineer has the primary responsibility for 5.4.7.1 and the software
quality assurance and test engineer has the primary responsibility for 5.4.7.2. The purpose of this task is to
adjust the allocation of resources for additional reliability growth as necessary.

SR models can be used prior to testing (prediction models from 5.3.2.3) as well as during testing (reliability
growth models from 5.4.5). These models should be validated with actual failure data during testing and at
the end of testing to verify the accuracy of the selected models. The accuracy can be verified at any
major milestone during testing and particularly prior to making any release decisions as per 5.5.

5.4.7.1 Determine the accuracy of prediction models used early in development

Several models can be applied early in development to predict SR. However, those models usually predict
reliability at the point in time of deployment. As a result, the accuracy of the early prediction models
cannot be validated until the software has been fielded for some period of time. Nevertheless, there are
methods to validate prediction models during testing that establish a relationship between the volume of
deployed defects and the volume of testing defects. Table 39 illustrates some typical ratios between testing
and fielded defect density.9
Distressed projects had different ratios from testing to fielded defects largely because of their approach to
testing.

Table 39 —Typical testing defect density


Predicted outcome of project    Average 3-year fielded defect density (defects per normalized EKSLOC)    Average system testing defect density (defects per normalized EKSLOC)    Average integration testing defect density (defects per normalized EKSLOC)
Successful 0.0269 to 0.111 0.056 to 3.062 0.129 to 0.503
Mediocre 0.111 to 0.647 3.062 to 0.996 0.503 to 1.981
Distressed 0.647 and up 7.551 and up 1.981 to 13.995

The steps for determining the accuracy of predicted defect density are shown in Figure 89.

9 Copyright Ann Marie Neufelder, Softrel, LLC. Reprinted with permission of author.


a) Determine the percentage of code that is currently complete and has been tested so far for this
release.
b) Multiply the result of a) by the total normalized effective KSLOC (EKSLOC) predicted as per
B.1. This is the total normalized EKSLOC developed so far.
c) Collect how many unique defects have been reported so far on this code.
d) If availability of hardware or other resources is blocking test cases, divide the result of c) by the
percentage of blocked test cases.
e) Compute the result of step d) over the result of step b). This is the current testing defect density.
f) Predict the fielded defect density using one of the methods in 6.2.
g) Review Table 39. Using the fielded defect density predicted in f), identify the range of testing
defect densities for this project.
h) Compare the result of e) to the range identified in g). If the actual testing defect density falls within
the upper and lower bounds, then the prediction is validated.

Figure 89 —Checklist for validating the prediction models

See F.4.5.1 for an example.
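
The arithmetic behind Figure 89 can also be illustrated with a short sketch. All values below are hypothetical except the bounds, which are taken from the "Successful" system testing column of Table 39; in this example no test cases are blocked, so step d) is not needed.

    # Figure 89, steps a) through h), with hypothetical values
    pct_code_complete_and_tested = 0.60              # step a)
    total_normalized_eksloc = 100.0                  # total normalized EKSLOC predicted per B.1
    eksloc_so_far = pct_code_complete_and_tested * total_normalized_eksloc   # step b): 60 EKSLOC

    unique_defects_so_far = 45                       # step c); no blocked test cases, so step d) is skipped
    testing_defect_density = unique_defects_so_far / eksloc_so_far           # step e): 0.75 defects/EKSLOC

    # Steps f) and g): the predicted fielded defect density selects a row of Table 39,
    # which gives the expected system testing defect density bounds for that outcome.
    lower_bound, upper_bound = 0.056, 3.062          # "Successful" system testing range from Table 39

    # Step h): the prediction is considered validated if the actual testing defect
    # density falls within the expected bounds.
    print(lower_bound <= testing_defect_density <= upper_bound)              # True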

5.4.7.2 Determine the accuracy of software reliability growth model

The accuracy of the SR growth models depends on the following:

 The fault and usage time data collected in 5.4.4 is complete.


 The fault and usage time data collected in 5.4.4 is as granular as possible.
 The sheer volume of data collected in 5.4.4.
 The model selected represents the true fault profile (increasing, peaking, decreasing, stabilizing).
 The degree to which corrective actions introduce new defects.
 The degree to which the software is tested as per its OP, the requirements are verified, the structure
of the code is covered.
The first step in determining the model accuracy is to assess each of the preceding as shown in Figure 90.

Once it has been established that the data and estimations are as accurate as possible, one of the simplest
ways to determine the accuracy of any SR growth model is to compare the estimation of the MTBF at any
point in time to the actual next time to failure. For example, if the model is estimating that the current
MTBF is 20 h, one can validate the estimate when the next failure occurs. Figure 91 contains the
procedures for determining the relative accuracy of the SR growth models.


a) Review the fault and usage time data in 5.4.4. Is the usage time per time period normalized? If not,
the results are very unlikely to be accurate. Proceed back to 5.4.4, normalize the time data and
recompute all results. Otherwise proceed to step b).
NOTE—Examples of usage time that is not normalized include calendar time and test cases run.
b) Review the granularity of the data. Is the granularity in terms of faults per hour or day? If so, the
data is relatively granular. If the data is in terms of faults per week or month, the data is not
granular. Investigate whether the faults detected during each normalized usage hour or day during
testing is available. If the answer is no, proceed to step c) but understand that the models may not
be accurate with low granularity, particularly if there are fewer than 30 intervals of testing.
c) Review the sheer volume of data. If there are less than 30 observed faults or less than 30 time
intervals, the estimations will have fairly wide confidence bands. The practitioner should compute
the confidence bounds as per the instructions for the models. These bands are wider when there are
fewer data points.
d) Revisit the beginning of 5.4.5. Verify the actual observed fault trend to date and determine whether
it is increasing, peaking, decreasing or stabilizing. Verify that the models selected are
recommended for that particular fault trend. If the model is not recommended for the actual fault
trend or it is not known what the fault trend is, stop and revisit 5.4.5. Otherwise proceed to step e).
e) Review the corrective action effectiveness. If this is more than 5% the results of the models could
be affected. Adjust the model results by the corresponding percentage and proceed to step f).
f) Review the requirements coverage and structural coverage. If it is less than 100% the model results
may be optimistic. Compute the actual coverage as per 5.4.3 and adjust the estimates accordingly.
So, if there is 70% coverage, assume that the estimated total inherent defects are 30% higher than
estimated. Proceed to the next checklist in Figure 91.

Figure 90 —Pre-checklist for determining the accuracy of the reliability growth models

a) Retrospectively compute the relative error of each of the models. Compare the estimated results
with the actual results. So, if the model estimated that the failure rate would be a particular number
in 100 h of operation then that estimate should be compared against the actual failure rate once
100 h of operation has passed. Relative error = (actual – estimated)/estimated.
b) Rank the models based on the lowest relative error. Note that the model with the closest fit may
have changed during testing as the data trend changes.
c) The relative error can be tracked on a regular basis to establish which of the models (if any) are
providing the most accurate trend. The model that is currently trending the most accurately should
be used in making any deployment decisions.

Figure 91 —Checklist for determining the accuracy of the reliability growth models

It is important that a model be continuously rechecked for accuracy, even after selection and application, to
verify that the fit to the observed failure history is still satisfactory. In the event that a degraded model fit is
experienced, alternate candidate models should be evaluated. See F.4.5.2 for an example.
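The relative error comparison in Figure 91 can be scripted. The following Python sketch is illustrative only and is not part of this recommended practice; the model names and MTBF values are hypothetical.

def relative_error(actual, estimated):
    # Relative error = (actual - estimated) / estimated, as defined in Figure 91 step a).
    return (actual - estimated) / estimated

# Estimated current MTBF (hours) from each candidate SR growth model (hypothetical values).
estimates = {"Model A": 20.0, "Model B": 24.0, "Model C": 17.5}

# Observed MTBF (hours) determined when the next failure actually occurred.
observed_mtbf = 22.0

# Rank the models by the smallest absolute relative error (Figure 91 step b).
ranked = sorted(estimates.items(),
                key=lambda item: abs(relative_error(observed_mtbf, item[1])))

for name, estimate in ranked:
    error = relative_error(observed_mtbf, estimate)
    print(f"{name}: estimated {estimate:.1f} h, relative error {error:+.1%}")

# The model currently trending the most accurately (smallest relative error)
# would be favored when making deployment decisions (Figure 91 step c).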

5.4.8 Revisit the defect root cause analysis

During testing the defect RCA discussed in 5.2.1 should be revisited now that there is testing defect and
failure data. The reliability growth models discussed in 5.4.5 estimate the volume of faults but do not


predict the types of defects, which are needed to identify the development activity that is associated with the
most software defects. The defect RCA identifies how these defects could have been prevented or
identified earlier. For example, if the most common software defect is related to faulty logic, the software
engineering organization should consider incorporating checklists for typical logic-related defects or
possibly automated tools that aid in the logic design.

5.5 Support release decision

A release decision is made prior to any external software release. All of the tasks that support the release
decision are applicable for incremental or evolutionary development because these tasks are performed prior to
the customer release. Table 40 lists the considerations for making a release decision.

Table 40 —Support release decision


Support release decision activity | Purpose/benefits | Applicability for incremental development
5.5.1 Determine release stability | Determine whether the software is ready for release | All of these tasks are performed at each external release
5.5.2 Forecast additional test duration; 5.5.3 Forecast remaining defects and effort required to correct them | If not, forecast how many testing hours would be needed to reach the objectives or the number of defects and effort required to correct them | All of these tasks are performed at each external release
5.5.4 Perform an RDT | Determines statistically if the system objective has been met | All of these tasks are performed at each external release

Most of the inputs are from SRE tasks performed during development and test. The inputs to the release
decision include as a minimum the test coverage from 5.4.3, the relative accuracy of the predictions and
estimations as well as the estimates themselves, the forecasted remaining defects and effort required to
correct them, forecasted defect pileup, and the release stability as a function of the required SR objective. If
the SRE plan includes these activities, the inputs also include the SFMEA report from 5.2.2, the accept/reject
decision from the Reliability Demonstration Test (RDT) as per 5.5.4, and the forecasted additional test
duration from 5.5.2. Figure 92 illustrates the activities that support the release decision, whether the activities are
essential, typical, or project specific, and the inputs and outputs of each activity.

Figure 92 —Support release decision


At release time some decisions need to be made. The major decision is whether or when to release the
software into operation or into the next phase of the system qualification. Releasing the software too early
can result in defect pileup and/or missed reliability objectives and can also cause the next software release
to be late if software engineering staff is interrupted by field reports. The modeling, analysis, and testing
practices of SRE are intended to produce information that supports acceptance, the process of making a
release decision. The most accurate prediction of operational reliability and the lowest risk of defect escapes
therefore require adequate code coverage, a count of discovered defects approximately equal to the
predicted number of defects, and execution of a test suite that stresses the system. The criteria for
acceptance are shown in Table 41 and are based on the following observations.

 Without minimal code coverage, there is no evidence about the reliability of uncovered code.
 Without testing that stresses a system under test (SUT) with extreme scenarios interleaved with
representative usage, there is no evidence of robustness.
 Without comparing actual revealed defects to a latent defect estimate, the risk of releasing a
product too early increases.
 Without testing long enough to discover the predicted number of defects, it is likely that latent
defects exist, even if all code has been covered.
 Without testing that achieves a representative sample of field usage, one cannot have confidence
that reliability observed in test is a meaningful indicator of operational reliability.

Table 41 —Criteria for acceptance


Criteria | Reference
Adequate defect removal
The current fault rate of the software is not increasing | 5.4.4
If a selected task, the results of any SFMEA indicate that there are no unresolved critical items. | 5.2.2
The estimated remaining defects do not preclude meeting the reliability goal and/or do not require excessive resources for maintenance. | 5.5.2, 5.5.3
The estimated remaining escaped defects are not going to result in an unacceptable “defect pileup.” | 5.3.2.3 Step 2, 5.4.6
Reliability estimation confidence
The relative accuracy of the estimations from 5.4.7 indicates confidence in the SR growth measurements. | 5.4.7
Release stability—reliability goal has been met or exceeded | 5.5.1
If a selected task, the RDT indicates “accept” | 5.5.4
Adequate code coverage
Recommended: 100% branch/decision coverage with minimum and maximum termination of loops. (Loops are executed zero times, one time, and the maximum number of times.) | 5.3.9.2, 5.4.3
Adequate black box coverage
An operational profile (OP) is developed and validated, which includes separate profiles for any critical operational mode, and representative test suites are executed. Reliability estimates are meaningless without a validated OP, hence “None” is indicated for confidence in an estimate without one. | 5.4.1.1
Requirements are covered with 100% coverage. | 5.4.1.2
Every modeled state and transition has been executed at least once. | 5.4.1.3
Adequate stress case coverage
The extremes of variation in loading and resource availability/utilization are identified and validated, target configurations are identified that achieve at least pairwise coverage of platform option configurations, and abuse cases are added to each OP. The mitigations that result from the SFMEA are tested. Test suites are executed that interleave these variations with all OPs and modes. Stress case testing cannot produce meaningful failure data unless it is conducted within the scope of an OP, hence it is a “DC” (Don’t Care) criterion without it. | 5.4.1.4, 5.4.1.5, 5.4.2


Figure 93 shows the recommended workflow to develop and use acceptance parameters. Prior to
acceptance, the OP coverage, code coverage, and stress point coverage should have already been measured.
The reliability growth models can be used to determine whether the required reliability has been met
(release stability), as well as to estimate the remaining defects and defect pileup.

Figure 93 —Deployment acceptance process


As practical exigencies may not allow achievement of all acceptance criteria, SRE users should consider
the trade-off between increased risk of field failures and omission of recommended practices. Table 42
presents a generic analysis of the trade-offs of these practices and their consequences for reliability estimate
accuracy and the risk of latent defects. If none of these acceptance criteria are met, there is no confidence
in the reliability prediction and therefore there is a risk of frequent failures of unpredictable severity.
Conversely, if all acceptance criteria are met, one may have high confidence in the reliability prediction
and will have reduced the risk of field failures. (See Binder [B5].)


Table 42 —Acceptance decision factors and confidence


Defect removal | Reliability estimation | Code coverage | Black box coverage | Stress case coverage | Risk of acceptance
No None No No NA Very high
Yes None No No NA High
No None Yes No NA High
No None No Yes NA High
Yes None Yes No NA High
No Low No Yes No High
Yes None Yes No NA Moderate
No Low No Yes Yes Moderate
Yes Moderate No Yes No Moderate
No Moderate Yes Yes No Moderate
Yes High No Yes Yes Low
No High Yes Yes Yes Low
Yes High Yes Yes No Low
Yes Very high Yes Yes Yes Very low

5.5.1 Determine release stability

This is an essential task that is typically performed by the software quality assurance and test engineer. The
results of this task are supplied to the reliability engineers.

During software development, the reliability of software will typically improve as the development process
progresses, that is, as capabilities or features are successfully developed and defects are removed from the
product. One measure of that overall software product quality is release stability. Release stability is the
degree to which the fully developed, released, and installed executing software is free of critical defects,
faults that when encountered will cause the software or computer system in which the software is installed
to stop execution, freeze critical interfaces until some action is taken, automatically restart, require an
operator initiated reset to restore expected operation, or otherwise prevent access to or execution of a
required critical capability of the system. The stability measure is an MTBF metric, a customer-oriented
metric.

A typical measurement approach for assessing release stability during software development is to apply an
SRGM, either based on historical performance or on a current estimate, and to project the current project
stability measure toward the targeted release date. By counting and accumulating critical faults during the
development process, the software developer can model the arrival rate of these defects and project forward
the likely defect counts based on the SRGM. Periodic assessments of the model fit to actual
data may be necessary. Alternatively, for example, critical faults may be counted during an RDT and a
forecast of stability calculated using time between failure measurements as per 5.4.5. Through this software
data and metric study or review, the developer can estimate when the software reaches the required
reliability to release for further test or release for customer use. Oftentimes however, software release
decisions are made based on a number of factors (e.g., cost, schedule, quality); stability is only one of those
factors. Armed with this information and other in-process software development metrics as needed, the
software team has the methods and tools to assess software stability during development to make software
release decisions (or alternatively, to continue the test, analyze, and fix process).

The date at which a given reliability goal can be achieved is obtainable from the SR modeling process
illustrated in Figure 94. As achievement of the reliability target approaches, the adherence of the actual data
to the model predictions should be reviewed and the model corrected, if necessary. Figure 95 discusses
stability of the software for release.


NOTE—Reprinted with permission from Lockheed Martin Corporation article entitled “Determine Release Stability”
© Lockheed Martin Corporation. All rights reserved.

Figure 94 —Example SR measurement application

a) Examine the estimated current failure rate from 5.4.5, 6.3, and Annex C.
b) Compare it to the required failure rate from 5.3.1.
c) Has it been met? If not, use the procedures in 5.5.2 to estimate when it will be met.

Figure 95 —Determine release stability

5.5.2 Forecast additional test duration

This is a typical task that is typically performed by the software quality assurance and test engineer. The
results of this task are supplied to the reliability engineers.

Additional test duration should be predicted if the initial and objective failure rates and the parameters of
the model are known. By the end of testing the software should be stable enough to use an exponential
model for forecasting additional test duration. See Equation (13).

∆t = (N0/λ0) × ln(λ0/λf) (13)

where

∆t is the number of test hours required to meet the objective


N0 is the estimated inherent defects
λ0 is the initial failure rate (the actual very first observed failure rate from the first day of testing)
λf is the objective or desired failure rate

Once the ∆t is computed, it should be divided by the number of work hours per day or week to determine
how many more days or weeks of testing are required to meet the objective.
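The following Python sketch illustrates Equation (13) and the conversion to calendar time described above. It is illustrative only; the defect count, failure rates, and work schedule are hypothetical.

import math

N0 = 120.0         # estimated inherent defects (hypothetical)
lambda_0 = 0.5     # initial failure rate in failures/h, from the first day of testing (hypothetical)
lambda_f = 0.05    # objective (desired) failure rate in failures/h (hypothetical)

# Equation (13): additional test hours required to meet the objective
delta_t = (N0 / lambda_0) * math.log(lambda_0 / lambda_f)

work_hours_per_week = 40.0   # assumed single-shift test schedule
print(f"Additional test hours required: {delta_t:.0f} h")
print(f"Approximate weeks of testing: {delta_t / work_hours_per_week:.1f}")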

5.5.3 Forecast remaining defects and effort required to correct them

This is an essential task that is typically performed by the software quality assurance and test engineer. The
results of this task are supplied to software management.


Additional defects required to be fixed to meet the objective can be solved for by using Equation (14):

∆n = (N0/λ0) × (λp – λf) (14)

where

∆n is the remaining number of defects assuming an exponential model


λ0 is the initial failure rate (the very first failure rate data point from first day of testing)
λf is the objective (desired) failure rate
λp is the present failure rate

Once the estimated number of defects that need to be corrected to meet the reliability objective is calculated,
it can be used to estimate the number of engineers that need to be staffed to detect and correct them. For
example, if a typical defect requires 4 h of testing labor to detect and verify the corrective action and requires
4 h of development labor to isolate and correct, then the number of people needed to test and perform
corrective action can be determined.

The preceding indicates the risk of releasing this version when there are not sufficient people to correct the
defects. However, there is also the risk of releasing the software when there will be defects to correct in
future releases as well. The defect pileup should also be estimated as per the instructions in 5.4.6.

An alternate method to estimate defects is the capture/recapture method described in C.5.

5.5.4 Perform a Reliability Demonstration Test

This is a project specific task that is typically performed by the reliability engineer in conjunction with an
existing hardware Reliability Demonstration Test.

A Reliability Demonstration Test (RDT) (MIL-HDBK 781A [B55]) uses statistics to determine whether a
system meets a specific reliability objective during the final testing. RDT has been used for hardware
demonstration for several decades. It is most useful for demonstrating the system reliability that includes
the software. MIL-HDBK 781A [B55] contains instructions for performing an RDT. The RDT can be
applied to software by simply tracking both the hardware and software failures against the system
reliability objective.

Software Reliability Engineering [B88] and The Handbook of Software Reliability Engineering [B89]
discuss RDT for assessing the reliability of software systems. Once all software development testing has
been accomplished (i.e., required capabilities developed, integrated, and verified, and defects corrected)
and the RDT system has been prepared, then RDT can begin. The method is to establish an OP for the RDT,
select tests randomly, conduct the tests, and not fix discovered defects during testing. As test time is
accumulated and defects are discovered, these are plotted on a chart similar to Figure 96. Progress
during RDT is noted on the chart until either an “accept” or a “reject” status is achieved.

There are additional methods for conducting an RDT and obtaining a useable MTBF result. MIL-HDBK-
781A [B55] discusses a variety of methods and associated mathematics. One in particular includes a time-
terminated acceptance test where RDT is conducted until a particular amount of test time has been
accumulated. Based on the accumulated test time and number of faults discovered, an estimate of the MTBF
is computed. Note that repairing the discovered faults after the RDT does not guarantee a higher quality
software product without performing another RDT since the software release is a new (different) product
due to the change in the code.
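For a time-terminated acceptance test, a point estimate and lower confidence bound on MTBF can be computed with the common chi-square approach sketched below in Python. This sketch is illustrative only, uses hypothetical numbers, and is not a substitute for the procedures in MIL-HDBK-781A [B55].

from scipy.stats import chi2

total_test_hours = 1000.0   # accumulated test time when the test was terminated (hypothetical)
failures = 4                # failures observed, hardware and software combined (hypothetical)
confidence = 0.90           # one-sided confidence level

mtbf_point = total_test_hours / failures
# Time-terminated test: the lower bound uses 2r + 2 degrees of freedom.
mtbf_lower = 2.0 * total_test_hours / chi2.ppf(confidence, 2 * failures + 2)

print(f"MTBF point estimate: {mtbf_point:.0f} h")
print(f"{confidence:.0%} lower confidence bound on MTBF: {mtbf_lower:.0f} h")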


NOTE—Reprinted with permission from Lockheed Martin Corporation article entitled “Perform a Reliability
Demonstration Test” © Lockheed Martin Corporation. All rights reserved.

Figure 96 —Reliability Demonstration Test


When used, RDT provides a practical method and measure for assessing the reliability of the proposed
software product release.

The checklist for performing an RDT is shown in Figure 97.

a) Perform all developer, software, and system tests. The RDT should not be conducted until all tests
are run and all defects that impact reliability are removed.
b) Select the alpha (consumer's risk of accepting software that does not meet the objective) and beta
(producer's risk of rejecting software that is acceptable) for the test and identify the objective in
terms of failure rate.
c) Start the test by using the software in an operational environment. If there is an RDT planned for
the hardware, the software is simply part of that test in that the failures due to hardware and
software are tracked.
d) Whenever a failure is discovered it is plotted regardless of whether it is due to the software or
hardware.
e) When the trend reaches the “reject” range, the test ends and the objective is determined to not be
met with beta confidence.
f) When the trend reaches the “accept” range, the test ends and the objective is determined to be met
with beta confidence.
g) Otherwise, if the trend is in the “continue testing” range, then the test continues.
h) If a failure due to software is encountered, the underlying defect has to be fixed to continue
testing; the test also ends with a reject status.

Figure 97 —Checklist for performing Reliability Demonstration Test

5.6 Apply software reliability in operation

During operation the SRE tasks shown in Figure 98 can and should be executed. The inputs and outputs for
each task described in the figure are evaluated in support of the release decision. None of the tasks in this
subclause are affected by the LCM since these activities take place once the software is deployed.


Figure 98 —Apply software reliability in operation


Table 43 shows the purpose and benefit of each of the SRE activities used in operation.

Table 43 —SRE Tasks employed during operation


SRE activity in operation | Purpose/benefit
5.6.1 Employ SRE metrics to monitor operational software reliability | Determines whether the predicted and estimated operational reliability is accurate
5.6.2 Compare operational reliability to predicted and estimated reliability | Provides a means to calibrate future predictions and estimations
5.6.3 Assess changes to previous characterizations or analyses | Allows for improvements to be made in development, analyses, and testing as well as reliability prediction
5.6.4 Archive operational data | Saves valuable information for the rest of the organization and future releases and products

5.6.1 Employ software reliability metrics to monitor operation

This is a typical task that is typically performed by the software quality assurance and test engineer. The
results of this task are supplied to the reliability engineers and software management.

Table 44 lists the metrics that can be used in operation regardless of whether there is a waterfall,
incremental, or evolutionary LCM. These metrics are categorized by the following:

 Indication of accuracy in the size, defect density, and reliability growth assumptions
 Ability to support the fielded software without impacting the future releases
 Indication of software release success
 Ability to compare multiple similar products


Table 44 —Software reliability metrics used to monitor the operational reliability


Software reliability metrics | Definition | Recommended priority | Typical goal
Size, defect density, and reliability growth accuracy
Defect density and KSLOC | The actual fielded defect density is computed using only fielded defects and the actual size in KSLOC. | If the practitioner is using the predictor models this is highly recommended. | The actual defect density is not different from expected.
Immediate customer found defects | Number of severity 1 and 2 defects found by the customer within the first 6 months. | Typical | If this number is substantial, that is an indication of low black box or structural coverage.
Ability to support the field
Number of backlogged defects | The number of customer submitted backlogged defects (defects from prior releases). | Typical | The backlogged defects are not growing in volume and hence are not causing defect pileup.
Incoming defect rate; Defect fix rate | The number of customer submitted defects; the rate of fixing customer submitted defects. | Highly recommended | The incoming defect rate is not greater than the defect fix rate.
Indication of customer satisfaction
SW adoption rate | Rate at which customers are downloading new software. | Project specific | High rates are usually indicative of a successful release.
Ability to compare multiple releases or products
SWDPMH | Software defects per million hours. | Project specific | Goal is based on the complexity and size of the product and release. See 5.6.1.

5.6.1.1 Software defects per million hours (SWDPMH)

Recall that software failure rate is a function of the software’s duty cycle and the size of the software. For
that reason, software failure rates are difficult to compare across different types of systems. Therefore, the
defect density metric is used since it is a normalized metric. However, if one would like to compare
systems that have a similar duty cycle and size, the SWDPMH metric (Smidts et al. [B85]) can be used as
long as the systems under comparison are relatively the same size (in lines of code) and operate relatively
the same amount with similar install base sizes.

This metric is useful for systems that will be mass deployed as a million hours of usage is required. If the
system is not mass deployed the unit of measure can be simply converted to hours instead of millions of
hours. The metric shows the rate at which software defects are occurring per cumulative million hours of
user system/product usage. Each defect experienced by the user is counted in calculating the SWDPMH,
regardless of its severity. There will be a lag between the time software is released and the
start of calculating SWDPMH. This time is required to build an acceptable install base. The size of the
install base of any software release could be different, which creates a challenge for organizations to
compare products against each other; this is the reason why SWDPMH is normalized. As long as the
products that are being compared belong to the same family, SWDPMH can be used to determine the
superiority of one product over others.

One key benefit of SWDPMH is the ability to measure the reliability of the software by various released
versions. It is important to identify best practices that helped a version be more successful than the previous
versions. The calculation methodology is straightforward and utilizes the number of defects reported by the


customers, which includes all severities (1, 2, 3, etc.), and the number of units installed. There are approximately
730 h in a month. So assuming that the software is running continually [see Equation (15)]:

SWDPMH = [(sum of defects in the last 3 months) × 1 million] / [(sum of installed units for the last 3 months) × 730] × correction factor (15)

“Correction factor” = adjustment for percentage of customers who do not have the associated version. A
correction factor of 1.02 implies 2% of customer found defects did not have an associated version.
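The following Python sketch illustrates Equation (15). It is illustrative only; the defect count, install base, and correction factor are hypothetical.

defects_last_3_months = 18              # customer-found defects of all severities (hypothetical)
installed_units_last_3_months = 2500    # sum of installed units over the same 3 months (hypothetical)
hours_per_month = 730.0
correction_factor = 1.02                # 2% of customer-found defects had no associated version

# Equation (15)
swdpmh = (defects_last_3_months * 1_000_000
          / (installed_units_last_3_months * hours_per_month)
          * correction_factor)

print(f"SWDPMH = {swdpmh:.1f} defects per million operating hours")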

Figure 99 is the checklist for using SR metrics in operation.

a) Compute the actual defect density and KSLOC for the software in operation as per Table 44. Verify
that the actual defect density is within range of the predicted defect densities from 5.3.2.3 Step 1. If not,
analyze the predicted versus actual size, defect density, or reliability growth to determine which of
these is not within the predicted range. It is possible that the actual defect density is different from the
predicted defect density because the KSLOC is much bigger than expected or the reliability growth is
much smaller than expected, for example.
b) Compute the immediate customer found defects as per Table 44. If the customer is finding a
substantial number of severity 1 and 2 defects in the first 6 months of operation, the test coverage
metrics in 5.4.3 and 5.3.9.2 should be revisited.
c) Compute the metrics related to field support. If any of these metrics indicate that the field issues are
not getting fixed as fast as they are occurring then the maintenance schedule should be revisited. This
can cause defect pileup for future releases.
d) If there are multiple customers, compute the software adoption rate. If the rate is high, that is usually
an indication of a successful release.
e) Compute the SWDPMH across several fielded software products. Compare similar products to
determine typical rates that can then be used in the prediction and estimation models for the next
release.

Figure 99 —Checklist for using software reliability metrics in operation

5.6.2 Compare operational reliability to predicted reliability

This is a typical task that is typically performed by the reliability engineer. During operation, the reliability
of the software can be measured by collecting the actual failure reports from end users and customers. The
failure data should be filtered and organized similarly to testing data as discussed in 5.4.4. The actual
operational failure rate is simply the failures recorded during some interval of time divided by the actual
operating hours during that time interval. The reliability and availability can also be computed from the
operational data similarly to how it is computed during testing.
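The following Python sketch illustrates these operational figures of merit. It is illustrative only; the failure count, operating hours, and mean restore time are hypothetical, and availability is computed with the usual MTBF/(MTBF + mean downtime) relationship.

failures_in_interval = 3
operating_hours_in_interval = 4380.0   # e.g., six months of continuous operation (hypothetical)
mean_downtime_hours = 2.0              # assumed mean time to restore service per failure

failure_rate = failures_in_interval / operating_hours_in_interval
mtbf = 1.0 / failure_rate
availability = mtbf / (mtbf + mean_downtime_hours)

print(f"Operational failure rate: {failure_rate:.6f} failures/h")
print(f"Operational MTBF: {mtbf:.0f} h")
print(f"Operational availability: {availability:.5f}")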

Following a software product release for customer use, the customer or user of the software product may
track observed critical faults during operational use and may make software stability performance
measurements. This software performance information could be useful in sharing with the development
organization [and other customer(s)] providing useful validation feedback on the stability measure and the
software product and/or as input to a follow-on product upgrade or development.

The operational reliability figures of merit are straightforward to compute. The important part is monitoring
these metrics and archiving them in such a way that the actual field SR can be used to calibrate the
prediction models used during development as well as the estimation models used during testing. During
operation the actual number of defects discovered should be compared against the predicted and estimated.
If there are significant differences, the root causes for inaccuracy should be investigated as per Table 45.


Table 45 —Factors in determining root cause inaccuracies

Potential root cause for inaccuracy | Pertains to reliability prediction models | Pertains to growth models | Effect on accuracy | Rationale | Recommendation
The software is much bigger than predicted | Yes | No | Linear | The more code there is, the more defects there will be. | Use the difference in size to establish confidence bounds for future predictions.
Insufficient test coverage | Yes | Yes | Nonlinear | SR growth models can underestimate defects if the test coverage during testing is relatively low. | Measure test coverage as per 5.4.3.
Optimistic assumptions about reliability growth | Yes | Yes | Nonlinear | If the software is late or if new features are introduced before the SR has grown, this can affect the actual reliability. | See 5.3.2.3 Steps 2 and 3 concerning defect pileup that can result from overestimating reliability growth.
The model assumes better/worse development practices than actually employed | Yes | No | Linear | It is not uncommon for planned practices to be abandoned during development. | The defect density models should be kept up to date during development. The development practices should be closely monitored to verify that the model is capturing the true development capabilities.
The predictive model chosen does not have many inputs | Yes | N/A | Unpredictable | The models with more inputs are usually more accurate than the models with fewer inputs, but only if the model is used properly and the inputs are correct. | Revisit 5.3.2.3, 6.2, and Annex B for selecting the best models. Predict the confidence bounds when doing the predictions.
The model was used incorrectly | Yes | Yes | Unpredictable | Small errors in using the models can result in big errors in accuracy. | Revisit 5.3.2.3, 6.2, and Annex B to verify that the predictive models are used correctly. Revisit 5.4.4, 5.4.5, 6.3, and Annex C to verify that the reliability growth models are used properly.

Figure 100 is the checklist for comparing operational reliability to predicted reliability.


a) Compare the actual effective size of the fielded software to the predicted effective size and compute
the relative error. That relative error is directly related to any error in the reliability objective.
b) Compute the actual test coverage during testing. If it is less than 100% that could be why the
predicted reliability is optimistic.
c) Compare the actual reliability growth in terms of operational hours to the predicted reliability
growth in terms of hours. If the actual reliability growth is less than the predicted reliability growth
that will result in an optimistic prediction.
d) Compare the actual development practices that were employed on this software version to those
that were planned. If there are any differences that will usually cause the predictions to be
inaccurate either optimistically or pessimistically.
e) Review the prediction models and inputs. If the model has only a few inputs or those inputs are not
correct that could cause the predicted reliability to be different from the actual reliability.
f) If all of the preceding have been investigated and there are no discrepancies between the predicted
and actual assumptions then note the difference between the predicted and actual reliability and
proceed to 5.6.3.

Figure 100 —Checklist for comparing operational reliability to predicted reliability

5.6.3 Assess changes to previous characterizations or analysis

This is a typical task that is typically performed by the software quality assurance and test engineer. The
results of this task are supplied to the reliability engineers. In addition to making continuous improvements,
the actual reliability is measured so that previous characterizations and analyses can be changed
appropriately. Figure 101 is the checklist for assessing changes to previous characterizations or analyses.

Starting from 5.6.2:


a) The difference between the estimated and actual size should now be used to adjust future size
estimation efforts. Example: If the actual size is twice as big as predicted then future size estimates
should be calibrated by that amount.
b) The actual reliability growth should be used in future predictions unless management is willing and
able to provide extra test equipment and time to increase reliability growth.
c) Determine the source of inaccuracy using the table in 5.6.2. If the predicted models were used
improperly or with faulty inputs or assumptions, recalculate the figures of merit and compare those
to the actual figures of merit.
d) Determine any consistent relationship between predicted and actual values. If the actual reliability
figures of merit are consistently offset by some factor, then one can calibrate the predictive values
by that factor.
e) Review the root causes of the software defects found in operation. Compare them to the RCA
conducted in testing. Are the root causes similar? If so, then any improvement activities related to
those root causes are still applicable. If not, it is possible that the defects found in testing are not
similar to the defects found in operation. In that case, the improvement activities such as the
SFMEA should focus on the root causes found in operation as well as testing. If the types of
failures experienced in operation are not as expected then 5.2.1 and 5.2.2 may need to be updated
accordingly. Updates may also be needed to 5.4.1.
f) Review the most severe failures found in operation. Update the FDSC as needed per 5.1.2 (Define
failures and criticality) and update the risks identified in 5.1.3.


g) If it is clear that some of the tasks that were not included in the SRE plan should have been
included then update the plan as per step 5.1.6.
h) If there is a failure found in operation that is intermittent and likely caused by software or an
interaction between software and hardware, revisit the task to put software on the system FTA.
i) To improve the accuracy of future objectives and predictions, revisit 5.3.1 (Determine system
reliability objective), 5.3.3 (Sanity check the early predictions), and 5.4.7 (Validate the prediction and
estimation models), and revisit 5.3.2.3 Step 2 to forecast defect pileup.

Figure 101 —Checklist for assessing changes to previous characterizations or analyses

5.6.4 Archive operational data

This is a typical task that is typically performed by the software quality assurance and test engineer. The
results of this task are supplied to the reliability engineers. The reliability prediction process is improved
over time by incorporating actual data with predictive data. Figure 102 is the checklist for archiving
operational data. For an example, see F.6.

a) Once there is at least 3 years of data from a particular release, archive the actual defects, failure
rate, defect profile, reliability and/or availability, and use these to predict the figures of merit for
the next operational release. Note that it is fairly typical for software releases to be spaced a few
months or a year apart. However, a particular version can still result in fielded defects well beyond
the next release because the end users have not yet uncovered all defects. Hence, 3 or more years of
data is to be collected for defects that originate in this release. The software engineer who corrects
the defect should know which version the defect originated in. Count all defects for at least 3 years
that originate with this particular release.
b) Keep in mind that the next operational release may be bigger or smaller or may have different
development practices or different people developing the software. Hence, even when actual field
data is available, it will still need to be calibrated. Several of the prediction models in 6.2 can be
used for such a calibration.

Figure 102 —Checklist for archiving operational data

6. Software reliability models

6.1 Overview

There are two basic types of software reliability (SR) models. SR prediction models are used early in
development for assessment and risk analysis. SR growth models are used during testing to forecast the
future failure rate or number of defects. Table 46 is a comparison of the prediction models versus the SR
growth models:


Table 46 —Software predictor models versus software reliability growth models


Comparison attribute | Software reliability prediction models | Software reliability growth models
Used during this phase of development | Any phase as long as the scope of the software is defined | After software system testing commences
Inputs | Indicators such as development practices, personnel, process, inherent risks; size; duty cycle; expected reliability growth | Defects observed per time period or defects observed during some interval; testing hours per interval
Outputs | Predicted defects, failure rate, reliability, availability
Benefits | Allows for a sensitivity analysis before the software is developed; allows for determination of an allocation before the software is developed; identifies risks early; useful for assessing vendors | Identifies when to stop testing; identifies how many people are needed to reach an objective; validates the prediction models
Model framework | Uses empirical data from historical projects in which the development practices and the delivered defects are known | Uses various statistical models to forecast based on the current trend

The prediction models are discussed in 6.2 and Annex B and the reliability growth models are discussed in
6.3 and Annex C.

6.2 Models that can be used before testing

All of the following models predict defect density and can be used prior to testing because they use
empirical versus observed inputs. Refer to 5.3.2.3 and in particular Figure 41 and Table 19 for instructions
on the steps needed for prediction models prior to using them.

6.2.1 Models that predict defect density

6.2.1.1 Shortcut Defect Density Prediction Model

This model (Neufelder [B63]) assumes that the defect density is a function of the number of risks versus
strengths regarding this release of the software. The steps are shown in Figure 103. The survey is shown in
Table 47. 10

10 Shortcut Model Survey reprinted with permission Softrel, LLC. “A Practical Toolkit for Predicting Software Reliability,” A. M. Neufelder, presented at ARS Symposium, June 14, 2006, Orlando, Florida © 2006.


a) Answer yes or no to each of the questions in Table 47. For questions 5, 12, and 13 in the Strengths
and 4 and 5 in the Risks, an answer of “somewhat” is allowable.
b) Count the number of yes answers in the Strengths. Assign 1 point for each yes answer. Assign 0.5
point for each “somewhat” answer.
c) Count the number of yes answers in the Risks. Assign 1 point for each yes answer. Assign
0.5 point for each “somewhat” answer.
d) Subtract the result of step c) from the result of step b).
e) If the result of step d) is ≥ 4.0, the predicted defect density = 0.110; if the result of step d) is ≤ 0.5,
the predicted defect density = 0.647; otherwise the predicted defect density = 0.239 in terms of defects per
normalized EKSLOC.
f) The resulting defect density prediction is then multiplied by the normalized effective KSLOC
prediction, as illustrated in the sketch following Table 47.

Figure 103 —Predict defect density via the shortcut method

Table 47 —Shortcut model survey


Strengths
1 We protect older code that should not be modified.
2 The total schedule time in years is less than one.
3 The number of software people years for this release is less than seven.
4 Domain knowledge required to develop this software application can be acquired via public domain in short
period of time.
5 This software application has imminent legal risks.
6 Operators have been or will be trained on the software.
7 The software team members who are working on the same software system are geographically co-located.
8 Turnover rate of software engineers on this project is < 20% during course of project.
9 This will be a maintenance release (no major feature addition).
10 The software has been recently reconstructed (i.e., to update legacy design or code).
11 We have a small organization (<8 people) or there are team sizes that do not exceed 8 people per team.
12 We have a culture in which all software engineers value testing their own code (as opposed to waiting for
someone else to test it).
13 We manage subcontractors: outsource code that is not in our expertise, keep code that is our expertise in house.
14 There have been at least four fielded releases prior to this one.
15 The difference between the most and least educated end user is not more than one degree type (i.e.,
bachelors/masters, high school/associates, etc.).
Risks
1 This is brand new release (version 1), or development language, or OS, or technology (add one for each risk).
2 Target hardware/system is accessible within minutes (0.75 points), hours (0.5 points), days (0.25 points), or
weeks or months (0 points).
3 Short term contractors (< 1 year) are used for developing line of business code.
4 Code is not reused when it should be.
5 We wait until all code is completed before starting the next level of testing.
6 Target hardware is brand new or evolving (will not be finished until software is finished).
7 Age of oldest part of code >10 years.
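The following Python sketch illustrates the scoring and lookup in Figure 103. It is illustrative only; the survey answers and the size estimate are hypothetical.

strength_points = 9.5        # 1 point per "yes" and 0.5 point per "somewhat" among the Strengths (hypothetical)
risk_points = 3.0            # 1 point per "yes" and 0.5 point per "somewhat" among the Risks (hypothetical)
normalized_ekslocs = 85.0    # predicted normalized effective KSLOC (hypothetical)

score = strength_points - risk_points
if score >= 4.0:
    defect_density = 0.110
elif score <= 0.5:
    defect_density = 0.647
else:
    defect_density = 0.239   # defects per normalized EKSLOC

predicted_defects = defect_density * normalized_ekslocs
print(f"Score {score:.1f} -> defect density {defect_density} defects/normalized EKSLOC, "
      f"predicted defects = {predicted_defects:.1f}")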

6.2.1.2 Application type

This model assumes that software defect density is directly related to the application type or industry. The
checklist for using this model is shown in Figure 104.


a) If the size estimates are in terms of KSLOC, select the application type from Table 48, which is the
closest to the software under analysis.
b) Otherwise if the size estimates are in terms of function points, select the application type from
Table 49 that most closely fits.
c) Select the associated defect density from the table.
d) If parts of the software have a different application type, then compute a weighted average of the
application types based on the size of the software.
e) The resulting defect density prediction is then multiplied by the normalized effective KSLOC
prediction or the estimated number of function points.

Figure 104 —Predict defect density via industry/application type average lookup tables
Note that application type lookup tables have been developed in the past. Table 48 reflects the most current
data and technology. (See Lakey, Neufelder [B46], Neufelder [B68], and SAIC [B77].)

Table 48 —Average defect densities by application type (EKSLOC)


Application type | Operational defect density a | ± | Testing defect density b | ±
Defense 0.444 0.036 1.161 0.902
Space 0.229 0.046 0.646 0.129
Medical/healthcare 0.508 0.395 2.395 1.347
Commercial electronics 0.188 0.141 2.990 2.894
Commercial transportation 0.036 0.007
Commercial software 0.143 0.083 0.095 0.019
Energy 0.657 0.131
Semiconductor fabrication 0.737 0.267 5.064 5.172
Non-commercial vehicle 0.987 0.011 2.401 1.852
Satellite 0.102 0.056 0.654 0.481
Missiles 0.011 0.002 0.096 0.057
Software only 0.248 0.218 14.176 13.871
Equipment 0.704 0.248 1.451 0.496
Sensor or FW 0.229 0.046 0.461 0.092
Device 0.338 0.259 3.616 2.707
Aircraft 0.036 0.007
a (For 3+ years of fielded operation) in defects/normalized EKSLOC.
b Average testing defect density in defects/normalized EKSLOC does not include operational defects.

Example: Forty percent of the code is low-level software supporting a device. Sixty percent of the code is
performing a healthcare function. Predicted defect density = (0.4 × 0.338) + (0.6 × 0.508) = 0.44
defects/normalized EKSLOC. The confidence bounds on the device prediction are ±0.259 and the
confidence bounds on the medical function prediction are ±0.395. The weighted average of the bounds is
(0.4 × 0.259) + (0.6 × 0.395) = 0.1036 + 0.237 = 0.3406. The upper and lower bounds for the combined
prediction are therefore 0.44 defects/normalized EKSLOC ±0.3406 defects/normalized EKSLOC, which yields
a range of 0.0994 to 0.7806 defects per normalized EKSLOC. See B.1 for more information on normalized EKSLOC.
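The weighted average calculation in the preceding example can be scripted as follows. The Python sketch is illustrative only; the 40%/60% split and the densities and bounds come from the worked example and Table 48.

portions = [
    # (fraction of code, operational defect density, +/- bound) from Table 48
    (0.4, 0.338, 0.259),   # device
    (0.6, 0.508, 0.395),   # medical/healthcare
]

density = sum(fraction * dd for fraction, dd, _ in portions)
bound = sum(fraction * b for fraction, _, b in portions)

print(f"Predicted defect density: {density:.2f} defects/normalized EKSLOC")
print(f"Range: {density - bound:.4f} to {density + bound:.4f} defects/normalized EKSLOC")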

Table 49 shows average defect density by several different industries in terms of defects per function
point. 11 The first column is the average defects per function point at delivery while the last column is the
average defects per function point prior to delivery. The average defect removal efficiency is also shown.
Recall that this is the percentage of total defects that are removed prior to deployment. The typical removal
efficiencies are also shown. See Jones [B41].


Table 49 —Typical defect densities by application type in terms of defects per function points

U.S. industry quality ranges circa 2015
Industry | Delivered defects per function point | Defect removal efficiency | Defects per function point (prior to delivery)
Manufacturing—medical devices 0.02 99.5% 4.9
Manufacturing—aircraft 0.05 99.0% 5.0
Smartphone/tablet applications 0.07 98.0% 3.3
Government—intelligence 0.08 98.5% 5.5
Manufacturing—telecommunications 0.12 97.5% 4.8
Process control and embedded 0.12 97.5% 4.9
Telecommunications operations 0.13 97.5% 5.0
Transportation—airlines 0.13 97.5% 5.0
Manufacturing—pharmaceuticals 0.14 97.0% 4.55
Software (commercial) 0.14 96.0% 3.5
Professional support—medicine 0.14 97.0% 4.8
Manufacturing—electronics 0.15 97.0% 5.0
Entertainment—films 0.16 96.0% 4.0
Manufacturing—chemicals 0.17 96.5% 4.8
Manufacturing—defense 0.17 97.0% 5.6
Manufacturing—appliances 0.17 96.0% 4.3
Manufacturing—automotive 0.18 96.3% 4.9
Insurance—Life 0.18 96.0% 4.6
Banks—commercial 0.19 95.5% 4.2
Banks—investment 0.19 95.5% 4.3
Insurance—property and casualty 0.2 95.5% 4.5
Government—military 0.2 96.5% 5.8
Pharmacy chains 0.21 94.5% 3.75
Software (outsourcing) 0.21 95.2% 4.45
Government—police 0.22 95.5% 4.8
Insurance—medical 0.22 95.5% 4.8
Open source development 0.22 95.0% 4.4
Social networks 0.22 95.5% 4.9
Games—computer 0.23 94.0% 3.75
Entertainment—television 0.23 95.0% 4.6

11 Typical defect densities by application type in terms of defects per function points reprinted with permission of Capers Jones, “Software Industry Blindfolds: Invalid Metrics and Inaccurate Metrics,” Namcook Analytics © 2015.


Table 49—Typical defect densities by application type in terms of defects per function points (continued)

U.S. industry quality ranges circa 2015
Industry | Delivered defects per function point | Defect removal efficiency | Defects per function point (prior to delivery)
Transportation—trains 0.24 95.0% 4.7
Public utilities—electricity 0.24 95.0% 4.8
Public utilities—water 0.24 94.5% 4.4
Accounting/financial consultants 0.25 93.5% 3.9
Professional support—law 0.26 94.5% 4.75
Credit unions 0.27 94.0% 4.5
Manufacturing—nautical 0.28 94.0% 4.6
Sports (pro baseball, football, etc.) 0.28 93.0% 4.0
Publishing (books/journals) 0.29 93.5% 4.5
Manufacturing—apparel 0.3 90.0% 3.0
Transportation—bus 0.3 94.0% 5.0
Hospitals—administration 0.34 93.0% 4.8
Consulting 0.36 91.0% 4.0
Transportation—ship 0.39 92.0% 4.9
Entertainment—music 0.4 90.0% 4.0
Other industries 0.41 91.0% 4.5
Natural gas generation 0.41 91.5% 4.8
Automotive sales 0.43 91.0% 4.75
Games—traditional 0.44 89.0% 4.0
Wholesale 0.44 90.0% 4.4
Oil extraction 0.45 91.0% 5.0
Real estate—commercial 0.45 91.0% 5.0
Education—University 0.45 90.0% 4.5
Hotels 0.48 89.0% 4.4
Retail 0.53 89.5% 5.0
Stock/commodity brokerage 0.54 89.5% 5.15
Real estate—residential 0.55 88.5% 4.8
Education—primary 0.56 87.0% 4.3
Education—secondary 0.57 87.0% 4.35
Manufacturing—general 0.57 88.0% 4.75
Construction 0.61 87.0% 4.7
Mining—metals 0.61 87.5% 4.9
Waste management 0.64 86.0% 4.6
Transportation—truck 0.65 86.5% 4.8
Automotive repairs 0.65 87.0% 5.0


Table 49—Typical defect densities by application type in terms of defects per function points (continued)

U.S. industry quality ranges circa 2015
Industry | Delivered defects per function point | Defect removal efficiency | Defects per function point (prior to delivery)
Mining-coal 0.68 86.5% 5.0
ERP vendors 0.69 89.0% 6.25
Food—restaurants 0.70 85.5% 4.8
Agriculture 0.72 87.0% 5.5
Government—municipal 0.74 86.5% 5.5
Government—state 0.76 86.5% 5.65
Government—county 0.83 85.0% 5.55
Government—federal civilian 0.92 84.8% 6

Averages 0.36 92.5% 4.69

6.2.1.3 Capability Maturity Index

This model (Neufelder [B68]) assumes that software defect density is directly related to the SEI CMMI
level. The checklist for using this model is shown in Figure 105.

a) Select the CMMI level from Table 50 that is the closest to the software organization associated
with the software LRU under analysis.
b) Select the associated defect density from the table.
c) If parts of the software are developed by different organizations, then compute a weighted average
of the CMMI levels based on the relative size of the software.
d) Compute the upper and lower bounds by adding/subtracting the associated value in the ± column.
e) The resulting defect density prediction is then multiplied by the normalized effective KSLOC
prediction.

Figure 105 —Predict defect density via capability maturity lookup tables

Note that several models based on the CMMI have been developed over the years (Keene [B43]). However,
Table 50 has data associated with modern development and systems. Example: 40% of the code is being
developed by an organization that has been assessed at CMMI level 2. Sixty percent of the code is being
developed by an organization that has been assessed at CMMI level 3. Fielded defect density prediction =
(0.4 × 0.182) + (0.6 × 0.101) = 0.1334 defects/KSLOC. The weighted confidence bound is (0.4 × 0.086) +
(0.6 × 0.081) = 0.083 defects/KSLOC. The upper bound is therefore 0.1334 + 0.083 = 0.2164 defects/KSLOC
and the lower bound is 0.1334 – 0.083 = 0.0504 defects/KSLOC.
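The same weighted average approach applies to the CMMI-based example above. The Python sketch below is illustrative only and uses the values from the worked example and Table 50.

portions = [
    # (fraction of code, fielded defect density, +/- bound) from Table 50
    (0.4, 0.182, 0.086),   # organization assessed at CMMI level 2
    (0.6, 0.101, 0.081),   # organization assessed at CMMI level 3, 4, or 5
]

density = sum(fraction * dd for fraction, dd, _ in portions)
bound = sum(fraction * b for fraction, _, b in portions)

print(f"Predicted defect density: {density:.4f} defects/KSLOC")
print(f"Upper bound: {density + bound:.4f}, lower bound: {density - bound:.4f} defects/KSLOC")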


Table 50 —Average defect densities by capability maturity level


SEI CMMI level a | Defect density (for 3+ years of fielded operation) in defects/normalized EKSLOC | ± | Testing defect density in defects/normalized EKSLOC | ±
1 or unrated 0.548 0.208 3.563 3.140
2 0.182 0.086 3.554 2.755
3, 4, or 5 0.101 0.081 1.356 0.351
a Capability Maturity Model Integrated (CMMI) and Software Engineering Institute (SEI).

6.2.2 Models that can be used for planning the failure rate

6.2.2.1 Exponential Model

The following exponential model is used to determine when the predicted defects will become observed
faults.

Faults predicted per month (month i) = N (exp(–Q×(i–1)/TF) – exp(–Q×i/TF)) (16)

where

N= total predicted fielded defects (area under the shaded section)


Q= growth rate in operation (post deployment) = ln (slope of the cumulative faults versus the
cumulative defect rate)
TF = number of continually operating months that software typically grows before reaching a
plateau (default = 48 months)
i= iterates between 1 and TF

Notice that Equation (16) is an array of values starting from i = 1, which is the first month of operational
usage, and extending to i = n, which is the last month of growth. The fault rate is assumed to trend
downward because 1) faults will be circumvented and therefore not repeated, or
2) they will be corrected in a maintenance release or patch prior to the next major release.
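The following Python sketch illustrates Equation (16). It is illustrative only; N is hypothetical, and Q and TF would normally be selected from Table 51.

import math

N = 50.0    # total predicted fielded defects (hypothetical)
Q = 6.0     # growth rate, e.g., the "several installed sites" row of Table 51
TF = 48     # months of growth before the fault rate plateaus

# Equation (16): expected faults in each month i = 1..TF
profile = [N * (math.exp(-Q * (i - 1) / TF) - math.exp(-Q * i / TF))
           for i in range(1, TF + 1)]

print(f"Faults expected in month 1: {profile[0]:.2f}")
print(f"Faults expected in month 12: {profile[11]:.2f}")
print(f"Faults expected in the first year: {sum(profile[:12]):.1f}")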

The growth rate and growth period vary as a function of each other. The bigger the growth rate, the faster
the SR grows or stabilizes. So, the bigger the growth rate, the shorter the growth period and vice versa. For
experimental systems in which the hardware is often not available until deployment, the growth rate of the
software may be very high. For systems that have staggered deployment over a very long period of time,
the growth rate might be relatively flat. See Figure 106 for an illustration of the growth periods and growth
rates.


Figure 106 —Fault profile prediction over the release development via the Exponential Model

An example of how the growth rate and growth period work is shown in Figure 107.

Figure 107 —Comparison of different growth rates


The growth rate is dependent on how many systems are deployed and how many end users are using the
system as shown in Table 51. See Neufelder [B67].


Table 51 —Typical growth rates


Description | Growth rate | Defects removed in first year of continual operation (%) | Growth period in months (continually operating) | Q
Mass distributed software | Volatile | 97 | 32 | 9.0
Several installed sites but not mass distributed | Average | 78 | 48 | 6.0
One of a kind system (<=3 total installed sites) | Moderate | 57 | 64 | 4.5
Very slow deployment | Slow | 31 | 96 | 3.3

a) Predict the total defects N from Step 1 in 5.3.2.3.


b) Predict the growth rate Q and the amount of time that the software typically grows (TF) from
Table 51.
c) Predict the array of faults per month i using the following formula:
Faults predicted per month (month i) = N (exp(–Q×(i–1)/TF) – exp(–Q×i/TF))
d) The result is a profile of when the defects will manifest into faults over intervals of usage time such as a month (see the sketch following Figure 108).
e) To predict the defect pileup, repeat preceding steps a) and b) for several future releases. Arrange
the profiles in order of calendar time (i.e., when the releases are scheduled to be deployed).
Review the combined profile and determine if the predicted defects to be found in operation are
increasing in size from release to release.

Figure 108 —Step 2: Predict the distribution of faults
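As an illustration of steps a) through d), the following Python sketch evaluates Equation (16) month by month. The values of N, Q, and TF below are assumptions chosen for illustration only (Q and TF are taken from the "several installed sites" row of Table 51).

import math

N = 150.0   # assumed total predicted fielded defects
Q = 6.0     # growth rate (Table 51, "several installed sites" row)
TF = 48     # months of growth before the fault rate plateaus

faults_per_month = [N * (math.exp(-Q * (i - 1) / TF) - math.exp(-Q * i / TF))
                    for i in range(1, TF + 1)]
print(f"month 1: {faults_per_month[0]:.1f} faults, month {TF}: {faults_per_month[-1]:.1f} faults")
print(f"total over the growth period: {sum(faults_per_month):.1f} of {N:.0f} predicted defects")

Repeating the calculation for each planned release and arranging the monthly profiles in calendar order gives the combined defect pileup profile described in step e).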

6.2.2.2 AMSAA PM2 Model

The AMSAA PM2 utilizes planning parameters that are directly influenced by program management,
which include the following:
a) Mi , the planned initial system MTBF.
b) MS, the management strategy, which is the fraction of the initial failure rate addressable via corrective action; MS is defined as MS = λB / (λB + λA). In the definition of MS, λB and λA represent the portion of the initial system failure intensity that program management will and will not address via corrective action, respectively. The initial failure intensity is λi = λA + λB, and the failure modes comprising each part of the initial failure intensity are referred to as A-modes and B-modes, respectively. Note also that MS does not represent the fraction of corrected failure modes.
c) MG, the MTBF goal for the system to achieve at the conclusion of the reliability growth test.

d) µd, the planned average fix effectiveness factor (FEF) for corrective actions, which is defined as
the fractional reduction in the rate of occurrence for a failure mode after corrective action.
e) T, the duration of reliability growth testing.
f) The average lag time associated with corrective actions.


PM2 reliability growth planning curves primarily consist of two components—an idealized curve, and
MTBF targets for each test phase. The idealized curve may be interpreted as the expected system MTBF at
test time t that would be achieved if all corrective actions for B-modes surfaced by t were implemented
with the planned average FEF. The idealized curve extends from the initial MTBF, Mi, to the goal MTBF,
MG. The idealized curve is a monotonically increasing function whose rate of increase depends on the
levels of MS, µd, and the initial and goal MTBFs used to generate the curve.

The second component of the PM2 planning curve includes a sequence of MTBF steps. Since failure modes
are not found and corrected instantaneously during testing, PM2 uses a series of MTBF targets to represent
the actual (constant configuration) MTBF goals for the system during each test phase throughout the test
program. The rate of increase in the MTBF targets depends on the planning parameters used. The targets
are also conditioned explicitly on scheduled corrective action periods, which are defined as breaks in
testing during which corrective actions to observed B-modes can be implemented.

The model assumes the following:

1) Initial failure rates for failure modes that will be addressed with corrective actions (B-modes)
constitute realizations of independent random samples from a Gamma distribution with
density [see Equation (17)].

p(λ) = (λ^a / (a!β^(a+1))) × exp(−λ/β) (17)

2) This assumption models mode-to-mode variation with respect to the initial rates of occurrence
for the modes. As a rule of thumb, the potential number of failure modes in the system should
be at least five times the number of failure modes that are expected to be surfaced during the
planned test period.
3) The rate of occurrence for each failure mode is constant both before and after any corrective
action;
4) Each failure mode occurs independently and causes system failure.
5) Corrective actions do not create new failure modes.
Both the idealized curve and the MTBF targets are generated by the equation for the expected system
failure intensity at test time t given by Equation (18):

p(t) = λA + (1 − µd)[λB − h(t)] + h(t) (18)

In Equation (19) and Equation (20):

λA = (1 − MS) × λi = (1 − MS)/Mi (19)

λB = MS × λi = MS/Mi (20)

h(t) is the rate of occurrence of new B-modes and is given by Equation (21):

h(t) = λB / (1 + β × t) (21)


where β is a scale parameter that arises from the scale parameter for the Gamma distribution. This
parameter solely determines the fraction of the initial failure intensity that is due to B-modes surfaced by
test time t, and can be represented in terms of the planning parameters by Equation (22):

β = (1/T) × [ (1 − Mi/MG) / (MS × µd − (1 − Mi/MG)) ] (22)

A number of additional metrics can also be calculated from the chosen planning parameters. These include
the expected number of correctable failure modes, the expected rate of occurrence of new failure modes,
the fraction of the initial failure intensity associated with the observed failure modes, and the growth
potential MTBF. These metrics provide useful information that can aid in a number of decisions involving
the reliability growth effort, such as determining if a reliability target is feasible or planning for sufficient
engineering staff to develop corrective actions for the failure modes that are expected to be observed in a
given test period.
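The following Python sketch shows how Equation (18) through Equation (22) combine to produce the PM2 idealized curve. The planning parameter values are assumptions chosen for illustration only, not recommendations.

Mi = 20.0    # assumed planned initial MTBF (h)
MG = 65.0    # assumed goal MTBF (h)
MS = 0.95    # assumed management strategy
mu_d = 0.8   # assumed average fix effectiveness factor
T = 2000.0   # assumed reliability growth test duration (h)

lam_i = 1.0 / Mi
lam_A = (1.0 - MS) * lam_i                                          # Equation (19)
lam_B = MS * lam_i                                                  # Equation (20)
beta = (1.0 / T) * (1.0 - Mi / MG) / (MS * mu_d - (1.0 - Mi / MG))  # Equation (22)

def expected_intensity(t):
    h = lam_B / (1.0 + beta * t)                    # Equation (21): rate of new B-modes
    return lam_A + (1.0 - mu_d) * (lam_B - h) + h   # Equation (18)

for t in (0.0, T / 2, T):
    print(f"t = {t:6.0f} h: idealized MTBF = {1.0 / expected_intensity(t):5.1f} h")

With these assumed inputs the idealized curve rises from Mi at t = 0 and reaches MG at t = T, which is how Equation (22) defines β.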

6.3 Models that can be used during and after testing

Do not use any of the following models before executing the instructions in 5.4.4 and 5.4.5. The Shooman
Constant Defect Removal Model is used during integration, which is generally when the defect profile is
peaking as discussed in 5.4.4. The Musa Basic and Logarithmic models can be used when the defect profile
is decreasing.

6.3.1 Shooman Constant Defect Removal

This model can be used during integration or early software system testing. It was applied to Space Shuttle software and successfully predicted, prior to the flight, the number of software faults discovered during the mission (Shooman, Richeson [B80]).

6.3.1.1 Model assumptions

This model assumes the following:

a) No new defects are introduced during the defect correction process.


b) Constant defect removal rate.

The assumption of this model is that if the removal rate stays constant after sufficient debugging (and no
new defects are introduced), all defects that were originally in the program will be removed. Since it is not
possible to remove all defects with 100% confidence, the model is only useful early in the integration
process where there are still many defects and the defect removal process is mainly limited by the number
of testing personnel. Removal of all the defects also means that the MTBF becomes infinite—another
impossibility. Still, the mathematics are simple and the model is usually satisfactory for a few months of
testing. For the latest model based on Bohr and Mandel defects, see Shooman [B83].

6.3.1.2 Estimate failure rate, MTBF, remaining defects, and reliability

The remaining defects are estimated as shown in Equation (23):


N̂0 − ρ0τ (23)

where ρ0 is an observed constant defect removal rate and N0 is the estimated number of inherent defects.

The failure rate is estimated by Equation (24):

λ = k̂(N̂0 − ρ0τ) (24)

Where k is a shape parameter to be estimated in the next subclause and τ is the number of hours, weeks, or
months of development and testing.

The mean time to failure is estimated by Equation (25):

MTBF = 1 / (k̂(N̂0 − ρ0τ)) (25)

The reliability is estimated by Equation (26):

R(t) = exp(−k̂(N̂0 − ρ0τ) × t) (26)

Remember that t is operating time and τ is development time. Typically the defect removal rate decreases with τ. Two possibilities are a linearly decreasing defect removal rate and an exponentially decreasing defect removal rate. Also, if the fault discovery rate is proportional to the number of defects present, the defect removal rate becomes exponential (Shooman [B82]).

6.3.1.3 Estimate model parameters

The constant defect removal model has three parameters: k, N0, and ρ0.

The estimate of the constant defect removal rate, ρ0 , is simply the number of defects removed in the
interval divided by the length of the interval, τ. That leaves two remaining parameters, k and N0, which can be evaluated from the simulation test data by equating the MTBF function to the test data. Compute the actual MTBF1 in the first increment of time, such as a week or month, by dividing the total operational hours by the total defects found during that time interval. Set the actual MTBF1 equal to 1/(k[N0 − ρ0 × 1]). Compute
MTBF2 for the second increment of time in the same way. There are now two equations for MTBF in
which the actual MTBF1, actual MTBF2 and ρ0 are known. Dividing one MTBF equation by the other
allows one to solve for N0. Substitution of this value into the first equation yields k.

6.3.1.4 Estimate confidence bounds

The confidence of the model is based on how complete and granular the data is (see 5.4.4). The confidence of the estimates also depends on how much data is available. As with any statistical model, the more data that is available, the higher the confidence of the estimates. The confidence bounds can be computed by estimating the confidence of the two parameters k and N0. With this model, one only needs to estimate the confidence of N0, which is the estimated inherent defects. If there are at least 30 data points, Z charts can be used to compute the confidence of the Y intercept estimate.

Plot each estimate of N0 for each time interval in which a fault was observed. There should be as many estimates of N0 as there are data points on the graph. For each estimate:


 Establish the desired confidence, which is (1–α). If 95% confidence is desired then set α to 5%.
 Using normal charts determine Z(1–α)/2.
 Lower interval = N̂0 – Z(1–α)/2 × √(Var(N̂0)) (27)
 Upper interval = N̂0 + Z(1–α)/2 × √(Var(N̂0)) (28)

The estimates will have a higher confidence as the number of data points increases. So, the range on the
estimates used should become smaller during testing. It is recommended to use confidence intervals when
specifying or measuring reliability values.
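A minimal Python sketch of the interval calculation follows; the values of N̂0 and Var(N̂0) are assumptions for illustration only, and the standard normal quantile replaces a manual Z chart lookup.

from statistics import NormalDist

N0_hat = 330.0     # assumed estimate of inherent defects
var_N0 = 400.0     # assumed variance of that estimate
alpha = 0.05       # 95% confidence
z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_(1-alpha)/2, about 1.96
half_width = z * var_N0 ** 0.5
print(f"lower = {N0_hat - half_width:.1f}, upper = {N0_hat + half_width:.1f}")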

6.3.1.5 Example

Fault data is collected during testing and the defect removal rate is calculated as a constant ρ0 = 90 faults
per month. Thus, one parameter has been evaluated. The other two parameters can be evaluated from the
test data by equating the MTBF function to two different intervals of the test data.

After one month MTBF = 10/2 = 5 = 1/k[N0 − 90 × 1]

After two months MTBF = 8/1 = 8 = 1/k[N0 − 90 × 2]

Dividing one equation by the other cancels k and allows one to solve for N0, yielding N̂0 = 330. Substitution of this value into the first equation yields k̂ = 0.0008333 and 1/k̂ = 1200.

The resulting functions are as follows and can be computed for an array of values of τ:

λ̂(τ) = k̂[N̂0 − ρ̂0τ] = 0.0008333[330 − 90τ]

R̂(t) = exp(−k̂[N̂0 − ρ̂0τ] × t) = exp(−0.0008333[330 − 90τ] × t)

Estimated MTBF = 1/(k̂[N̂0 − ρ̂0τ]) = 1200/[330 − 90τ]
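The example above can be reproduced with the short Python sketch below; ρ0 and the two observed MTBF values are taken from the example, while the three-month evaluation point and 8 h mission time are assumptions added for illustration.

import math

rho0 = 90.0               # constant defect removal rate (faults per month)
mtbf1, mtbf2 = 5.0, 8.0   # observed MTBF after month 1 and month 2

# mtbf1 = 1/(k*(N0 - rho0*1)) and mtbf2 = 1/(k*(N0 - rho0*2)); dividing cancels k.
N0 = (2 * rho0 * mtbf2 - rho0 * mtbf1) / (mtbf2 - mtbf1)   # 330
k = 1.0 / (mtbf1 * (N0 - rho0))                            # 0.0008333

def failure_rate(tau):        # tau in months of development/testing
    return k * (N0 - rho0 * tau)

tau = 3.0                     # assumed evaluation point
lam = failure_rate(tau)
print(f"N0 = {N0:.0f}, k = {k:.7f}")
print(f"lambda({tau:.0f}) = {lam:.4f}/h, MTBF = {1 / lam:.1f} h, "
      f"R(8 h) = {math.exp(-lam * 8):.3f}")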

6.3.2 General exponential models

6.3.2.1 Assumptions

This model class contains one of the simplest and most popular models. It assumes the following:

 The software is being operated similarly to the end-user OP.


 Defects are removed once observed.
 Corrective actions to remove defects do not introduce new defects.
 The rate of fault detection is decreasing or stabilizing.
 Each defect is equally likely to be observed during testing. (There are no defects masking other defects.)
This model cannot be used if the fault rate is increasing.


6.3.2.2 Estimate failure rate, MTBF, remaining defects, and reliability

The estimates for failure rate, MTBF, and reliability are shown in Table 52.

Table 52 —Estimated failure rate, MTBF, remaining defects, and reliability using the general Exponential Models

Model | Estimated remaining defects | Estimated current failure rate | Estimated current MTBF | Estimated current reliability
Musa Basic (Musa et al. [B59]) | N0 – n | λ(n) = λ0(1 – (n/N0)) | The inverse of the estimated failure rate | e–(λ(n) × mission time)
Jelinski-Moranda (Jelinski, Moranda [B37]) | N0 – n | λ(n) = k(N0 – n) | The inverse of the estimated failure rate | e–(λ(n) × mission time)
Goel-Okumoto (Goel, Okumoto [B19]) | N0 – n | λ(t) = N0 × k × e–kt | The inverse of the estimated failure rate | e–(λ(t) × mission time)

Two of the models are “defect based,” which means the model results only change when another failure is
observed. One model is time based, which means that the model results change as a function of each testing
hour. If the software is failing fairly regularly the time-based and defect-based models will produce similar
results. However, if the software is not failing very often, or if a substantial amount of test hours have
passed since the last fault was observed, the time-based models will take that into account while the defect-
based models assume that the failure rate estimate is unchanged until the next fault occurs. Most of the
work involved in using the models is in collecting the data and estimating the parameters. If the same
parameter estimation technique (such as the following one) is used then the amount of work required to use
all three of the preceding models may not be significantly more than using only one model. It is a good idea
to have at least one defect-based model and one time-based model.

The estimated remaining defects = N̂0 – n, where N̂0 is the estimated inherent defects and n is the observed cumulative number of faults found so far. Each model estimates the failure rate as a function of two of the following three parameters:

 λ̂0—the estimated initial failure rate
 N̂0—the estimated inherent defects
 k̂—the rate of change of the fault rate (the estimated per-defect fault rate)

The estimated MTBF for all three models is the inverse of the estimated failure rate. The estimated reliability = e–(λ(n) × mission time), where mission time is the expected mission time of the software for one complete cycle.

6.3.2.3 Estimate model parameters

The model parameters for all of the General Exponential Models can be easily estimated by employing the
fault rate graph discussed in 5.4.4. As shown in Figure 109, the estimated inherent defects N̂0 correspond to the Y intercept of the graph. The estimated initial failure rate is the X intercept of that same graph.
MLE and LSE (Musa et al. [B59]) can also be used to estimate the parameters. The slope parameter k can
be estimated by the absolute value of the slope of the best straight line through the data points.


(Figure content: cumulative faults n plotted against fault rate n/t; the Y intercept is the estimated N0, the X intercept is the estimated initial failure rate λ0, k = abs(1/slope), and θ is the rate of decay.)
Figure 109 —Estimate parameters for the Musa Basic Model

6.3.2.4 Estimate confidence bounds

The confidence of the model is based on how complete and granular the data is (see 5.4.4). The confidence of the estimates also depends on how much data is available. As with any statistical model, the more data that is available, the higher the confidence of the estimates. The confidence bounds can be computed by estimating the confidence of the two parameters. Since the two parameters of this model are proportional to each other, one only needs to estimate the confidence of the Y intercept, which is the estimated inherent defects. If there are at least 30 data points, Z charts can be used to compute the confidence of the Y intercept estimate.

The confidence intervals of an estimate such as N0 are determined as follows:

Plot each estimate of N0 for each time interval in which a fault was observed. There should be as many estimates of N0 as there are data points on the graph.

 Establish the desired confidence, which is (1–α). If 95% confidence is desired then set α to 5%.
 Using normal charts determine Z(1–α)/2.
 Lower interval = N̂0 – Z(1–α)/2 × √(Var(N̂0))
 Upper interval = N̂0 + Z(1–α)/2 × √(Var(N̂0))

As shown in Figure 110, the estimates will have a higher confidence as the number of data points increases.
So, the range on the estimates used should become smaller during testing. Confidence intervals are
recommended when specifying or measuring reliability values.


(Figure content: two plots of n versus n/t with confidence bounds around the predicted N0. More data improves the confidence of the estimates; if the software is not failing often, insert a data point for the most recent interval. Less data means wider confidence bounds.)
Figure 110 —Estimate confidence bounds on parameters

6.3.2.5 Example

This example in Figure 111 uses the fault and usage data illustrated in 5.4.4, Figure 83.

(Figure content: plot of cumulative faults n versus fault rate n/t. The best straight line through the data is y = –857.97x + 117.77, giving Y intercept = 117.77, X intercept = 0.137226, and k = 0.137226/117.77.)


The parameters estimated as per 6.3.2.3 are as follows:

 Y intercept ( N̂ 0 ) from the plot is 117.77


 X intercept ( λ̂0 ) from the plot is 0.137226

The observed data is as follows:

 Faults to date (n) is 84.


 The total test hours to date (t) = 1628.


 The estimated reliability figures of merit are therefore:
Estimated remaining defects = 118 – 84 = 34

Note that the estimated inherent defects were rounded up to provide a result that is an integer. The
percentage of estimated removal is therefore 84/118 = 71%.

The estimated failure rate and MTBF of each model are shown in Table 53. One can see that the two defect-based models yielded virtually the same result, while the time-based model was more optimistic. The expected mission time for this software is 8 h; that mission time is used to compute the estimated current reliability.

Table 53 —Estimates from the General Exponential Models

Model | Estimated remaining defects | Estimated current failure rate in terms of failures per hour | Estimated current MTBF (h) | Estimated current reliability as a function of 8 h of mission time
Musa Basic | N̂0 – n = 117.77 – 84 = 34, so 71% of the estimated defects have been removed | λ̂(n) = λ̂0(1 – (n/N̂0)) = 0.137226 × (1 – 84/117.77) = 0.03935 | 25.41366 | e–(0.03935 × 8) = 0.772993
Jelinski-Moranda | (same) | λ̂(n) = k̂(N̂0 – n) = 0.001165 × (117.77 – 84) = 0.03934 | 25.4181 | e–(0.03934 × 8) = 0.772999
Goel-Okumoto | (same) | λ̂(t) = N̂0 × k̂ × e–k̂τ = 117.77 × 0.001165 × e(–0.001165 × 1628) = 0.02059 | 48.56585 | e–(0.02059 × 8) = 0.84813
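The Table 53 figures can be reproduced with the following Python sketch, using the parameter estimates from Figure 111 (N̂0 = 117.77, λ̂0 = 0.137226) and the observed n = 84 faults in t = 1628 test hours.

import math

N0, lam0 = 117.77, 0.137226   # Y and X intercepts from Figure 111
n, t = 84, 1628               # observed faults and test hours to date
k = lam0 / N0                 # per-defect fault rate, about 0.001165
mission = 8.0                 # mission time (h)

rates = {
    "Musa Basic":       lam0 * (1 - n / N0),
    "Jelinski-Moranda": k * (N0 - n),
    "Goel-Okumoto":     N0 * k * math.exp(-k * t),
}
print(f"estimated remaining defects = {math.ceil(N0) - n}")
for model, lam in rates.items():
    print(f"{model:17s} lambda = {lam:.5f}/h  MTBF = {1 / lam:8.2f} h  "
          f"R({mission:.0f} h) = {math.exp(-lam * mission):.4f}")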

6.3.3 Musa/Okumoto logarithmic Poisson execution time model

This model is applicable when the testing is done according to an OP that has variations in frequency of
application functions and when early defect corrections have a greater effect on the failure rate than later
ones. Thus, the failure rate has a decreasing slope.

6.3.3.1 Model assumptions

The assumptions for this model are as follows:

 The software is operated in a similar manner as the anticipated operational usage.


 Failures are independent of each other.
 The failure rate decreases exponentially with usage time.

This model cannot be used if the failure rate is increasing.

6.3.3.2 Estimate remaining defects, failure rate, MTBF, and reliability

This model assumes that the total estimated defects are infinite. Hence, there is no estimate of remaining defects. From the model assumptions, the failure rate and MTBF can be estimated as follows:

λ̂(n) = λ̂0 × e–(θ̂ × n) is the estimated failure rate at the point in time at which n faults have been observed.


Estimated MTBF = (1/θ̂) × ln((λ̂0 × θ̂ × t) + 1), where θ̂ is the estimated rate of change of the faults observed, t is the cumulative usage hours to date, and λ̂0 is the first observed fault rate.

Estimated reliability = e–(λ̂(n) × mission time), where mission time is the expected mission time of the software for one complete cycle.

6.3.3.3 Estimate model parameters

The parameter λ0 is the initial failure rate parameter and θ is the failure rate decay parameter with θ > 0.
The model parameters can be determined graphically by using the defect rate plot in 5.4.4. Alternatively,
the MLE method (Musa et al. [B59]) can be used. Recall that the Musa Basic model uses the estimated inherent defects N0 and the slope value k. The logarithmic model assumes that the inherent defects are infinite. The parameters are the rate of change θ and the observed initial failure rate λ0. λ̂0 is the first failure rate observed in the data, while θ̂ can be computed by plotting the natural logarithm of the fault rate versus the cumulative faults observed. The inverse of the slope of the best straight line through the data points is the rate of decay θ̂. Figure 112 compares the parameter estimation for this model with that for the Basic Model:

Figure 112 —Parameters estimation for the Musa Basic and Logarithmic models

6.3.3.4 Estimate confidence bounds

The confidence bounds for this model can be derived similarly to the confidence bounds for the Musa Basic
model. Instead of estimating the confidence of the inherent defects, one will estimate the confidence of the
failure rate decay parameter θ. Plot each estimate of θ for each time interval in which a fault was observed. There should be as many estimates of θ as there are data points on the graph. For each estimate:

 Establish the desired confidence, which is (1–α). If 95% confidence is desired, then set α to 5%.
 Using normal charts determine Z(1–α)/2.
 Lower interval = θ̂ – Z(1–α)/2 × √(Var(θ̂)) (29)
 Upper interval = θ̂ + Z(1–α)/2 × √(Var(θ̂)) (30)


The estimates will have a higher confidence as the number of data points increases. So, the range on the
estimates used should become smaller during testing. Confidence intervals should be used when specifying
or measuring reliability values.

6.3.3.5 Example

Using the same data as shown in 6.3.2.5, the plot of the natural log of the fault rate versus the cumulative
faults is shown in Figure 113:

(Figure content: plot of cumulative faults n versus the natural log of the fault rate, ln(n/t). The best straight line through the data is y = –80.907x – 157.86.)
Figure 113 —Natural log of fault rate versus cumulative faults


From the plot one can see that the magnitude of the slope of the best straight line through the data is 80.907. So θ̂ = 1/80.907 = 0.01236. The actual initial failure rate observed on the first day of testing was 0.125 (see
Figure 110).

λ̂ (84) = 0.125 × exp(–0.01236 × 84) = 0.04426 failures per hour at the point in time in which 84 faults
have been observed.

Estimated MTBF = (1/0.01236) × ln((0.125 × 0.01236 × 1628) + 1) = 101.7087 h

Estimated reliability = e–(0.04426 × 8) = 0.701818, which is the probability that the software will be successful over an 8 h mission.
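The following Python sketch reproduces this example from the estimated decay parameter and the first observed failure rate.

import math

theta = 1.0 / 80.907    # decay parameter from the slope magnitude in Figure 113
lam0 = 0.125            # first observed failure rate (failures/h)
n, t, mission = 84, 1628, 8.0

lam_n = lam0 * math.exp(-theta * n)                        # current failure rate
mtbf = (1.0 / theta) * math.log(lam0 * theta * t + 1.0)    # estimated MTBF
print(f"lambda(84) = {lam_n:.5f}/h, MTBF = {mtbf:.1f} h, "
      f"R(8 h) = {math.exp(-lam_n * mission):.3f}")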


Annex A

(informative)

Software failure modes effects analysis templates

A.1 Templates for preparing the software failure modes effects analysis (SFMEA) 12

Table A.1—SFMEA viewpoints


SFMEA viewpoint | When this viewpoint is relevant | Software artifacts applicable for this viewpoint | Typical failure modes
Functional | Any new system or any time there is a new or updated set of requirements. | Software requirements specification (SRS) or systems requirements specification (SyRS) | Faulty, contradictory, or missing requirements for both the nominal conditions and error handling
Interface | Anytime there are complex hardware and software interfaces or software-to-software interfaces or software-to-human interfaces. | Interface design documentation (IDD) | Faulty interface data, timing, or sequences between software and software, software and hardware, software and databases, software and OS.
Detailed | Almost any type of system is applicable. Most useful for mathematically intensive functions. | Detailed design or code | Faulty data, sequences, error handling, memory management, and algorithms within the software LRUs.
Maintenance | An older legacy system that is prone to errors whenever changes are made. | The code or design that has changed as a result of a corrective action | Faulty corrective actions that result in new faults.
Usability | Anytime user misuse can impact the overall system reliability. | Use cases, user's manuals, IDD | Confusing or inconsistent user interface or user documentation.
Serviceability | Any software that is mass distributed or installed in difficult to service locations. | Installation scripts, ReadMe files, release notes, service manuals | Software stops working after an update. Software does not operate properly when installed.
Vulnerability | The software is at risk from hacking or intentional abuse. | See Detailed and Usability | The system behaves erratically from deliberate hacking or sensitive information is leaked to the wrong people.
Software process | One very serious or costly failure has occurred because of the software; software is causing the system schedule to slip; or many software failures are being observed at a point in time in which the software should be stable. | Software schedule, software process documentation, Software Development Plan (SDP) | Faulty scheduling, faulty change control, and faulty development practices.

12 All tables in Annex A reprinted with permission from Ann Marie Neufelder, Softrel, LLC “Effective Application of Software Failure Modes Effects Analysis” © 2014 [B64].


Table A.2—Artifacts required

Artifacts | Functional | Interface | Detailed | Maintenance | Usability | Serviceability | Vulnerability

Systems requirements spec One of these is required. Required


Software requirements spec The SRS is preferred over the SyRS.
System architecture design Highly recommended
Interface design document Required Highly
(IDD) recommended
Software detailed design One of Recommended
Code these is Required
required
User interface (UI) design Required Required Required Required
document
User manuals or help files or Required Required
use cases.
Field reports, list of changes Recommended Required Required
Use cases Highly recommended Required
Software test plan/ procedures May be required for Required
corrective action
Installation scripts and guide Required

Table A.3—Personnel required for each SFMEA viewpoint


SFMEA viewpoint | SFMEA facilitator | SW/FW engineer | SW architect | SW Reqs Engineer | SW manager | Domain experts
Functional x x x
Interface x x x
Detailed x x x
Maintenance x x x
Vulnerability x x x x
Usability x x x
Serviceability x x x
Production x x x


Table A.4—Ground rules


Issue Extent the failure mode is propagated
Human error Decide whether or not to include human errors in the Functional SFMEAs. The usability SFMEA
focuses on the human error. However, it is possible to include the human aspect in the functional
SFMEA also.
Chain of In any system there will typically be a chain of interfaces. One item interfaces to another who
interfaces interfaces to another. A decision should be made up front as to how to consider chain events.
Typically, the interface SFMEA will focus only on failure modes initiated between two items. That
means that it is assumed that the failure mode does not involve other hardware or software LRUs.
Network Decide whether to assume that any network required for the system is available.
availability
Speed and Decide whether to assume that the system is performing at maximum, typical, or minimum speed
throughput and throughput.

A.2 Templates for analyzing the failure modes and root causes

The following are the SFMEA tables for each of the seven product related viewpoints as referenced by
5.2.2.

Table A.5—Functional SFMEA template


Failure mode and root cause section
SRS statement ID | SRS statement text | Related SRS statements | Description | Failure mode | Root cause
A reference The SRS List any and Natural Faulty functionalitya Create a row for each
ID as per the statement all related language Faulty timing (if applicable to applicable root cause related
SRS itself SRS description this SRS statement) to this failure mode
document statement of this Faulty sequencing (if the SRS
numbers statement statement pertains to events)
Faulty data (if the SRS statement
pertains to data)
Faulty error handlinga
a
Failure modes that apply to all software systems.


Table A.6—Interface SFMEA template


Failure mode and root cause section

Interface pair | Interface parameter | Type | Size | Unit of measure | Default value | Min value | Max value | Network layers | Failure mode | Root cause

Describe List each Retrieve this information from the Faulty communications Create a row for each
the from/to IDS for each interface. Faulty processing applicable root cause
direction of Faulty COTS interface related to this failure
interface This information is needed to analyze Faulty OS interface mode
pair the root causes. If the information is Faulty database interface
not available that, in itself, could be a Faulty timing
process-related failure mode. Faulty sequencing
Faulty error handling
Faulty data

Table A.7—Detailed SFMEA template

Failure mode and root cause section


Unit name | Unit has these items | Description | Failure mode | Root cause
Name of Functionalitya What is this function supposed to do? Faulty Create a row for each
the module, Which SRS statement(s) does it map functionality applicable root cause
function, to? What SDS statements does it map related to this failure
class, etc. to? mode.
Sequences Describe any sequential steps that are Faulty sequences
required to happen in a particular
order.
Dataa (variables) From detailed design describe the Faulty dataa
scope, type, size, default value,
minimum and maximum value, and
unit of measure.
Exception Describe the return value from this Faulty exception
handlinga function. List any code that requires handling
exception handling or makes
assumptions about input ranges.
Algorithms Describe the algorithm, its inputs and Faulty algorithm
outputs. For each input and output
describe the type, size, default value,
minimum and maximum value, and
unit of measure.
Logic Describe the complex logic. If Faulty logic
necessary, create a truth table to show
the possible values expected.
Memory Describe any code that allocates or Faulty memory
management de-allocates memory. management
Input/output Describe any code involved with I/O. Faulty I/O
a
Failure modes that apply to all software systems.


Table A.8—Maintenance SFMEA template


Failure mode and root cause section
Unit name | Change in this unit is related to… | Description | Failure mode | Root cause
ID and Functionalitya What is this function supposed to do? Which SRS Faulty Create a
description of statement(s) does it map to? What SDS statements does it functionality row for
corrective map to? Has the functionality changed as a result of this each
action or change? If so, how? applicable
change Sequences Describe any sequential steps that are required to happen in a Faulty root cause
particular order. Describe any changes in sequence or any sequences related to
new sequences introduced by this corrective action. this
Dataa From detailed design describe the scope, type, size, default Faulty dataa failure
(variables) value, minimum and maximum value, and unit of measure. mode.
Describe any changes made to any of the variables,
particularly global variables as well as any new variables.
Exception Describe the return value from this function. List any code Faulty
handlinga that requires exception handling or makes assumptions about exception
input ranges. Describe any changes to the exception handling. handling
Describe any new code that requires exception handling.
Algorithms Describe the algorithm, its inputs and outputs. For each input Faulty
and output describe the type, size, default value, minimum algorithm
and maximum value, and unit of measure. Describe any
changes to any existing algorithms or inputs to algorithms as
well as new algorithms.
Logic Describe the complex logic. If necessary, create a truth table Faulty logic
to show the possible values expected. Describe any changed
or new logic.
Memory Describe any code that allocates or de-allocates memory. Faulty
management Describe any changes made to memory management or any memory
new memory management code. manage-
ment
Input/output Describe any code that is involved with I/O. Describe any Faulty I/O
changes made to the I/O or any new code that has introduced
I/O that previously did not exist.
a
Failure modes that apply to all software systems.


Table A.9—Usability SFMEA Template


Failure mode and root cause section
Function name | Description | Failure mode | Root cause
User Describe the use case or Faulty supporting List every root cause related to faulty supporting
interface user action and the documentation documentation
with XYZ software feature
feature
Overly cumbersome software List every root cause related to overly
operations cumbersome software operations
Software is not robust for List every root cause related to robustness for
common human errors common human errors
Faulty assumptions about the List every root cause related to faulty
end user assumptions about the end users
Legal users use the software List every root cause related to misuse or abuse
for the wrong purpose
Legal users perform software List every root cause related to misuse or abuse
tasks that are not appropriate
for their access or security
level

Table A.10—Serviceability SFMEA template


Failure mode and root cause section
Function Description Failure mode Root cause
Installation Describe the software to be Insufficient personnel or List every root cause related to insufficient
package installed and the people who resources required for personnel or resources required for the install
XYZ will perform the installation. install or update or update
Installation package and List every root cause related to installation
scripts package and scripts

Table A.11—Vulnerability SFMEA template


Failure mode and root cause section
Unit Description Failure mode Root cause
Unit of List the Direct access to application memory is allowed via buffer overruns List CWE
code affected Direct access to application memory is allowed via numerical overflow and entries that
code here calculations pertain to
Uncontrolled format strings each failure
Unchecked inputs in web pages modea
Unwanted commands are injected
Inputs result in faulty security decisions
Overly broad error handling or faulty error handling
Too many security related error messages
Improper authentication
Information needed to attack the software is leaked by the software itself
Insufficient memory management
Global resources are modified without locking via timing and state issues
Generally poor coding practices
a
Common Weakness Enumeration at: http://cwe.mitre.org/.


Table A.12—Production SFMEA template


Failure mode and root cause section
Software project | Description | Failure mode | Root cause
Version X Describe the Insufficient scheduling and List every root cause related to insufficient scheduling
for Project particular sizing methods methods
XYZ for release Insufficient personnel staffing List every root cause related to insufficient personnel
Product staffing
ABC Insufficient requirement List every root cause related to insufficient requirements
analysis practices analysis practices
Insufficient design practices List every root cause related to insufficient design
practices
Insufficient implementation List every root cause related to insufficient implementation
practices practices
Insufficient test practices List every root cause related to insufficient test practices
Insufficient defect prevention List every root cause related to insufficient defect
practices prevention practices
Insufficient tools List every root cause related to insufficient tools
Insufficient change control List every root cause related to insufficient change control

A.3 Template for consequences

Table A.13—Consequences section of SFMEA template


Failure modes (see templates in section A.2): Description | Failure mode | Root cause
Consequences: Local effect | Effect on subsystem | Effect on system | Severity | Likelihood | RPN
Mitigation: Preventive Measures | Compensating Provisions | Corrective action | Revised RPN


A.4 Template for mitigation

Table A.14—Mitigation section of SFMEA template


Failure modes (see templates in section A.2): Description | Failure mode | Root cause
Consequences: Local effect | Effect on subsystem | Effect on system | Severity | Likelihood | RPN
Mitigation: Preventive Measures | Compensating Provisions | Corrective action | Revised RPN


Annex B

(informative)

Methods for predicting software reliability during development

The instructions for predicting SR before the software is in a testable state are discussed in 5.3.2.3. The
following procedures support the procedures in 5.3.2.

B.1 Methods for predicting code size

B.1.1 Predict effective size for in-house developed LRUs

If the defect density models discussed in 6.2.1, B.2.1, B.2.2, B.2.3, B.2.5, or B.2.6 are used, then the
effective size should be predicted to yield a prediction of total defects. The effective size is the amount of
software code that is subject to having undetected defects. If the software is being developed by a
contractor or subcontractor then this method is applicable. If the software LRU is either COTS or FOSS the
method in B.1.2 is applicable.

The checklist for predicting size for in-house developed LRUs is shown in Figure B.1.

a) Determine whether size will be measured in KSLOC or function points. Function points are a
preferred measure to KSLOC. See B.1.1.1 for more information.
b) Select a method to predict the size in either KSLOC or function points as per B.1.1.2.
c) If the size is predicted in KSLOC it should be normalized for both language type and effectiveness
as per B.1.1.3.
d) If the unit of measure is function points then use the model in Table 49. If the unit of measure is
normalized EKSLOC then the models in Table 48 and B.1.1.1 can be used.

Figure B.1—Checklist for predicting size for in-house developed LRUs

B.1.1.1 Units of measure

There are two methods of size estimation used in industry: function points and KSLOC. Function points are
a measure of software size that does not require normalization. So, a function point on one software
program is comparable to a function point on another program. Function points have an advantage over
KSLOC size predictions in that they do not require normalization by language type or effectiveness (Jones [B39]).

Function points—A unit of size measurement to express the amount of functionality an information
system (as a product) provides to a user (Cutting [B11]).

KSLOC is a unit of measure that is not implicitly normalized. So, in order for it to be useful, particularly
across different software programs and applications, it should be normalized by both the language type and
the effectiveness. A line of reused code (code that has not been modified but it reused from a previous
software project) will have a different exposure to defects than a line of code that is new and has not been
deployed or used operationally. Lines of code that are modified typically have a defect exposure that is less
than new code but more than reused code. Auto-generated code (code generated by a tool) typically has a


much lower defect exposure than code that is not. Hence, in order to have an accurate size prediction the
number of reused, modified, auto-generated, and new lines of code should be identified. Additionally a line
of code in C will perform a different amount of work than a line of code in C++ or C# or Java. Hence, in
order to compare KSLOC across different projects with different languages the language needs to be taken
into consideration.

KSLOC—1000 source lines of code. This is not normalized by language or effectiveness.

EKSLOC—Effective KSLOC. This is a weighted average of new, modified, reused, and auto-generated
code.

Normalized EKSLOC—EKSLOC that has been normalized for the language(s) so that it can be directly
multiplied by the defect density predictions that are in terms of defects/normalized EKSLOC.

The practitioner will need to determine which unit of measure is being employed on each of the software
LRUs. The LRUs that have size estimates in terms of KSLOC will need to be normalized for both language
type and effectiveness while LRUs that have size estimates in terms of function points will need no
normalization. Function points are hence a preferred measure of size.

The defect density prediction models in this document are presented in either defects/function points or
defects/normalized EKSLOC. The practitioner should be cautious of using any KSLOC-based defect
density prediction models that have not been normalized. These models will penalize software developed in
higher order languages such as C#, Java, etc. The practitioner should also verify that the KSLOC
predictions are normalized prior to multiplying them by the predicted defect density tables in this
document.

B.1.1.2 Methods for predicting size

There are several methods for predicting the size of the software in terms of either KSLOC or function
points or both, before the code exists. It is not the purpose of this recommended practice to cover all of
these or to make a recommendation. A summary is shown as follows: 13

 Software Risk Master (SRM) (Jones [B42])


 COCOMO® (Boehm et al. [B7])
 SEER for Software (Fischman et al. [B18])
 Price® (DeMarco [B12])
 SLIM-Estimate (Putnam [B71])
 ExcelerPlan

In addition to the preceding methods, an organization can develop their own size prediction model as
follows:

a) When the code is complete, measure its actual:


 Function points OR language type, new code size, modified code size, and auto-generated
code size, using a static code analyzer. Note that function point sizing is recommended.
 Total number of work hours, months, or years it took to develop that code

13 COCOMO is a registered trademark of Barry W. Boehm. Price is a registered trademark of Price Systems, L.L.C. This information is given for the convenience of users of this standard and does not constitute an endorsement by the IEEE of these products. Equivalent products may be used if they can be shown to lead to the same results.


 Total number of calendar months from design to deployment


 Type of software developed
 Maturity of the software developed (new product, legacy, etc.)
 Relative experience of the software engineers on the project as well as any unusual risks
b) Maintain a database of code sizes, which can then be used to predict the size of future systems via
comparison to similar staffing, application type, product maturity, and domain experience.

B.1.1.3 Computed effective normalized EKSLOC

If the practitioner is predicting size in terms of KSLOC, the size predictions will require normalization for
both effectiveness and language type. If the size is predicted in terms of function points the following steps
are not required.

EKSLOC = New KSLOC + (A × major modified KSLOC) + (B × moderate modified KSLOC) + (C × minor modified KSLOC) + (D × reused KSLOC without modification) + (E × auto-generated KSLOC)

 Reused code—Code that has been previously deployed on a previous or similar system without
modification.
 New code—Code that has not been used in operation.
 Modified code—Reused code that is being modified.
 Auto generated code—Code that is generated by an automated tool.

The multipliers shown in Table B.1 (Fischman et al. [B18], Neufelder [B68]) are applied to predict the effective
KSLOC. Effective code is code that has not been deployed operationally and therefore has not experienced
reliability growth.

Table B.1—Typical effectiveness multipliers


Multiplier Typical ranges Comments Function of
A—major ≥ 40% Requirements, design, and code are Magnitude of requirements change.
modification modified
B—moderate 20% to 40% Design and code are modified Magnitude of design change and
modification cohesiveness of design.
C—minor 5% to 20% Code is modified but not design Cohesiveness of code (ability to
modification change it without breaking an
unrelated function).
D—reused 0% to 30% This assumes that the reused code Cohesiveness of design.
has been recently deployed and is How long the reused code has been in
completely unchanged for this operation.
version
E—auto 0% to 10% This assumes that the auto- The ability to program the automated
generated generated code meets the tool to provide code as per the design
requirements and requires no and requirements.
modification

New code is 100% effective. Code that is reused without modification is fractionally effective as there may
still be a few latent defects in that code. Code that is reused with modifications will have an effectiveness
that is between these two extremes. Reused code that is subject to major modifications, for example, may
be almost as effective as new code. Reused code with minor cosmetic changes, for example, may be
slightly more effective than reused code with no modifications. Auto-generated code typically has an
effectiveness that is similar to reused and not modified code since it is generated by a tool that has been
fielded. Deleted lines of code are counted depending on whether the deletion is within a function of code or


whether an entire function has been deleted. If lines within a function are deleted, the deletion counts as a
modification. If entire functions, files, or classes are deleted, the deletion simply results in less KSLOC. A
variety of tools exist for size prediction as discussed in the next subclause.

Once the effective KSLOC is predicted the last step is to normalize the EKSLOC based on the language
used to develop the code. Some languages such as object oriented languages are denser than other
languages. This normalization allows for defect density to be applied across projects developed in different
languages. Multiply the corresponding conversion shown in Table B.2 to the predicted EKSLOC to yield
the normalized EKSLOC.

Table B.2—Typical ratios between languages and assembler


Language type Normalization conversiona
High order language such as C, Fortran, etc. 3.0
Object oriented language such as C++, C# 6.0
Hybrid 4.5
a
Lakey, Neufelder [B45].

Example: There are two organizations developing software for a system. Both organizations are developing
their code in C++. Weightings are determined as follows: A - 0.40, B - 0.30, C - 0.15, D - 0.05, E - 0.01.

The EKSLOC normalizations are shown in Table B.3.

Table B.3—Example of EKSLOC normalizations


LRU | New KSLOC | Major modified KSLOC | Moderate modified KSLOC | Minor modified KSLOC | Reused KSLOC | Auto-generated KSLOC | Total EKSLOC
Weight | 100% | 40% | 30% | 15% | 5% | 1% |
A | 50 | 0 | 0 | 0 | 0 | 0 | 50
B | 100 | 10 | 20 | 30 | 100 | 0 | 100 + 4 + 6 + 4.5 + 5 = 119.5

The total EKSLOC is then shown as follows with the language type and normalization. The normalized
EKSLOC is multiplied by the predicted defect density to yield the predicted defects as shown in Table B.4.
Note that even though defects are an integer value, the predictions retain the significant digits since the
predicted defects will be used in the MTBF calculations.

Table B.4—Final computation for example of EKSLOC normalizations


LRU | Total EKSLOC | Language | Normalization for this language | Normalized EKSLOC | Predicted operational defect density | Predicted defects = predicted defect density × normalized EKSLOC
LRU A | 50 | C++ | 6 | 300 | 0.239 | 71.70
LRU B | 119.5 | C++ | 6 | 717 | 0.110 | 78.87
Total | | | | | | 150.57
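The LRU B computation in Table B.3 and Table B.4 can be reproduced with the following Python sketch, using the example weightings (A = 0.40, B = 0.30, C = 0.15, D = 0.05, E = 0.01), the C++ normalization of 6 from Table B.2, and the predicted defect density of 0.110 from Table B.4.

weights = {"new": 1.00, "major": 0.40, "moderate": 0.30,
           "minor": 0.15, "reused": 0.05, "auto": 0.01}
lru_b = {"new": 100, "major": 10, "moderate": 20,
         "minor": 30, "reused": 100, "auto": 0}              # KSLOC by category

eksloc = sum(weights[cat] * lru_b[cat] for cat in weights)   # 119.5 EKSLOC
normalized = eksloc * 6                                      # C++ normalization (Table B.2)
defect_density = 0.110                                       # defects/normalized EKSLOC
print(f"EKSLOC = {eksloc}, normalized EKSLOC = {normalized}, "
      f"predicted defects = {normalized * defect_density:.2f}")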

B.1.2 Predict effective code size for third-party LRUs

Figure B.2 shows the steps for predicting the code size when the source code is not available, which is
typically the case for COTS LRUs.


a) Is this COTS component part of the operational environment? If not it should not be included in the
predictions. If the COTS is part of the operational environment then proceed to step b).
b) Has the COTS component ever been used for a previous system? If so, compute the actual number
of defects (even if it is zero) from that previous system and use that for the prediction. Otherwise
proceed to step c).
c) Install the COTS software exactly as it will be installed at deployment. Count up the number of
kilobytes (kB) of all executables and DLLs.
d) Multiply the result of step c) by 0.051. This yields the estimated size in terms of KSLOC of C code (Hatton [B26]). Since each line of C code expands to approximately 3 lines of assembler code, the expansion ratio from assembler to C code is about 3:1. The normalized KSLOC is therefore 3 × 0.051 × the number of kilobytes.
e) Multiply the result of step d) by 0.1 if the COTS software has been deployed for at least 3 years.
f) Estimate the total number of installed sites for the COTS component. Multiply this by the
appropriate value in Table B.5.

Figure B.2—Checklist for predicting size for third-party LRUs

Table B.5—Multipliers for COTS software


Number of deployed systems Multiplier
Mass deployed (thousands or more) 0.01
In between limited and mass deployed 0.10
Limited install base (a few installations) 1.00

Example: A COTS product that will be deployed with the system is installed. It has never been used before on
any past system, so there is no past history regarding actual fielded defect data. It is installed exactly as it
will be deployed, and the number of kB in all applicable executable-type files (files with a suffix of “.dll” or
“.exe”) is counted. The total count is 1000 kB. As per step d), the estimated KSLOC in C code is
therefore 1000 × 0.051 = 51 KSLOC, and the normalized KSLOC is therefore 51 × 3 = 153 KSLOC. The
COTS software has been deployed for several years, so as per step e) the normalized EKSLOC is
multiplied by 0.1 to yield 15.3 EKSLOC. The COTS software is mass deployed, so as per step f) the final
normalized EKSLOC = 15.3 × 0.01 = 0.153 EKSLOC.
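
The worked example can be scripted directly from steps c) through f) of Figure B.2. The sketch below assumes the 0.051 kB-to-KSLOC factor, the 3:1 assembler expansion, the 0.1 maturity factor, and the Table B.5 multipliers described above; the function name and argument names are illustrative.

    # Sketch of the COTS effective size estimate from Figure B.2, steps c) through f).
    DEPLOYMENT_MULTIPLIER = {"mass": 0.01, "in_between": 0.10, "limited": 1.00}  # Table B.5

    def cots_eksloc(executable_kb, deployed_3_years, deployment):
        ksloc_c = executable_kb * 0.051      # step d): estimated KSLOC of C code
        normalized = ksloc_c * 3             # step d): normalize to assembler-equivalent KSLOC
        if deployed_3_years:
            normalized *= 0.1                # step e): deployed for at least 3 years
        return normalized * DEPLOYMENT_MULTIPLIER[deployment]   # step f)

    print(cots_eksloc(1000, True, "mass"))   # 0.153 EKSLOC, as in the worked example above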

B.1.3 Predict effective code size for firmware

The easiest way to predict the effective size of the firmware is to predict the executable size first and then
convert that to EKSLOC as per B.1.2. Alternatively, if one can predict how many rungs of ladder logic code
correspond to a line of assembler or C code, then the size can be predicted from that.

B.2 Additional models for predicting defect density or defects

In 6.2 the three simplest methods for predicting defect density (which is necessary for predicting the
software failure rate) were presented. This subclause provides additional methods for predicting defect
density.


B.2.1 Full-scale detailed survey assessment

In 6.2.1.1 a method of predicting defect density based on 22 parameters was presented. All of the 22 input
parameters are relatively easy to collect or observe. The 22 parameters provide for some level of sensitivity
analysis. The full-scale model (Neufelder [B67], SAE [B86]) has three different forms ranging from
94 questions to 377 questions for those who are interested in a more detailed prediction as well as a detailed
sensitivity analysis. As shown in Table B.6, the full-scale model form A has 94 questions that are usually
described in the software development plan or via interviews with the software development team. The
full-scale model B has 132 additional questions that require knowledge of the software plans and schedule.
The full-scale model C has 151 additional questions that pertain to the requirements, design, and test
strategy. This form is useful for those who need to improve the effectiveness of their development
deliverables.

Table B.6—Summary of the full-scale model forms

Model form | Number of questions that can be answered via an interview and/or review of SDP | Questions answered by review of project plans and schedules | Questions answered by detailed review of requirements, design, and test plan artifacts | Total questions
A | 94 | 0 | 0 | 94
B | 94 | 132 | 0 | 226
C | 94 | 132 | 151 | 377

Each of the forms covers the factors shown in Table B.7, which have been shown to correlate quantitatively
to fielded defects. Each of the columns shows how many questions are in that model form that pertain to
the categories shown to the left. For example, there are 13 questions related to avoidance of big blobs in
model form B.

Table B.7—Summary of factors that have been correlated to fielded defects

Category of questions | Model form A | Model form B | Model form C
Avoiding big blobs—“Aim small miss small.” Small increments, more granular tasks, less time between reviews, smaller releases made more frequently. | 13 | 16 | 16
Change management—Ability to control and manage changes to the requirements, design, code, and product. | 5 | 15 | 15
Coding practices. | 5 | 11 | 17
Defect reduction techniques and tools. | 1 | 15 | 15
Defect tracking methods and tools. | 4 | 9 | 9
Design practices. | 2 | 10 | 40
Domain expertise—How many software engineers have been an end user, visited an end user, etc. | 3 | 4 | 4
Execution of project—Ability to manage the tasks and people. | 5 | 16 | 20
Inherent risks—These are things that cannot be changed such as the industry, the maturity of the product, etc. | 2 | 22 | 22
Personnel—How they are organized and their experience level. | 7 | 15 | 15
Planning—Ability and willingness to plan the software features and tasks in advance. | 4 | 11 | 11
Process—Ability to repeat the practices for developing the software. | 5 | 23 | 25
Requirements—Ability to clearly define the requirements for what the software is required to do as well as what the software should not do. | 11 | 7 | 33
System testing practices—Ability to test the OP, design, requirements, failure modes, etc. | 18 | 19 | 63
Unit testing practices—Ability to test the design from the developer’s point of view. | 8 | 7 | 38
Visualization—Use of pictorial representations whenever possible. | 2 | 3 | 13

B.2.2 Metric-based software reliability prediction system (RePS)

The metric-based software reliability prediction system (RePS), illustrated in Figure B.3, is an approach
that bridges the gap between a software metric and software reliability. RePS is a bottom-up approach
starting with a root, which is a user-selected software engineering Metric (M) and its associated
measurement Primitives (P). The selection of the root metric is normally based on measurement data that
are available to an analyst. Software Defects (D) can then be predicted through a Metric-Defect Model (M-
D Model). The Defect-Reliability Model (D-R Model) further derives SR predictions based on the software
defects and the operational profile. Detailed M-D models and D-R models are explained in the next
subclauses.

M-D Models
Software metrics can be directly or indirectly connected to defect information, e.g., the number of defects,
the exact locations of the defects, and the types of defects. The connection can be built based on the
measurements of primitives through rigorous measurement rules. For instance, defect information could be
obtained through software quality assurance activities such as formal inspections and peer reviews or
through empirical models. There are three cases of defect information that can be derived from the M-D
models based on the current IEEE study of software metrics (IEEE Std 982.1™):

a) Only the number of defects can be estimated for the current version of the software product;
b) The exact content (e.g., the number, location, and type) of the defects is known for the current
version of the software product;
c) The estimated number of defects in the current version of the software product and the exact content of
defects found in an earlier version of the software product are known.

Thirteen such metrics have been investigated and detailed measurement rules for obtaining the preceding
defect information can be found in Smidts [B85]. The RePS proposes three different D-R models, one for each
of the defect information cases, as follows.


Figure B.3—Metric-based RePS

B.2.2.1 D-R Model I: Reliability Prediction Model using only the number of defects

As shown in 5.3.2.3 Steps 1 through 5 and in 6.2.2, the General Exponential Model is a popular model for
predicting the failure rate based on an estimate of the number of defects remaining, which is given as Ne,MC.
See Equation (B.1).

\lambda(i) = N_{e,MC}\left(e^{-Q(i-1)/T_F} - e^{-Qi/T_F}\right)/T_i                (B.1)

Thus, the probability of success over the expected mission time τ is obtained using Equation (B.2):

R(\tau) = e^{-\lambda(i)\tau} = \exp\left[-\frac{N_{e,MC}\left(e^{-Q(i-1)/T_F} - e^{-Qi/T_F}\right)}{T_i}\,\tau\right]                (B.2)

where

i = time interval such as a week or a month
T_i = expected duty cycle in that interval of time
Q = expected growth rate
T_F = expected growth period
N_e,MC = number of defects estimated using specific software metrics [the root metric (M) and the
corresponding support metrics of a RePS] and the M-D model for this RePS (whose outcome is
N_e,MC)
τ = average mission time for the software to complete one cycle of operation

Since a priori knowledge of the defects’ location and their impact on failure probability is not known, the
average growth value given in 5.3.2.3 Step 3 can be used.
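
For illustration, Equation (B.1) and Equation (B.2) can be evaluated with a few lines of Python. In the sketch below, the values chosen for Q, TF, the duty cycle Ti, and the mission time are illustrative assumptions only; the predicted defect count of 150.57 is carried over from Table B.4.

    import math

    # Sketch of D-R Model I [Equations (B.1) and (B.2)].
    def failure_rate(i, n_defects, q, tf, duty_cycle_ti):
        """Failure rate in interval i from the General Exponential Model, Equation (B.1)."""
        return n_defects * (math.exp(-q * (i - 1) / tf) - math.exp(-q * i / tf)) / duty_cycle_ti

    def reliability(i, mission_time, **model):
        """Probability of surviving the mission time without failure, Equation (B.2)."""
        return math.exp(-failure_rate(i, **model) * mission_time)

    model = dict(n_defects=150.57, q=4.5, tf=48, duty_cycle_ti=730)   # illustrative values
    print(failure_rate(1, **model), reliability(1, mission_time=8, **model))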

B.2.2.2 D-R Model II: Reliability Prediction Model using the exact defect content

When the exact content of the defects is known to the analyst, the failure mechanism can be explicitly
modeled using the propagation, infection, and execution (PIE) theory (Voas [B94]). Per the PIE theory, the
failure mechanism involves three steps: first, the location of the defect needs to be executed (E), then the
execution of this defect should infect (I) the state of the execution, and finally the abnormal state change
should propagate (P) to the output of the software and manifest itself as an abnormal output, i.e., a failure.
Failure probability can therefore be estimated using the PIE model:

P_f = \Pr\left[\bigcup_{j=1}^{t/\tau}\bigcup_{i=1}^{N}\left(E_E(i,j) \cap I_E(i,j) \cap P_E(i,j)\right)\right]                (B.3)
where

EE(i,j), IE(i,j), and PE(i,j) are the events “execution of the location occupied by the ith defect during the jth
iteration,” “infection of the state that immediately follows the location of the ith defect during the
jth iteration,” “propagation of the infection created by the ith defect to the program output during
the jth iteration,” respectively
τ is the expected mission time
N is the number of defects found

An extended finite state machine model (EFSM) has been proposed and validated (Shi, Smidts [B79]) to
solve Equation (B.3). The EFSM starts with modeling the software system using states and transitions. Defects
and their effects on the software are then modeled explicitly as additional states of the EFSM. In addition,
the EFSM can incorporate the operational profile (OP), which is a quantitative characterization of the way in
which a system will be used (Musa [B60]). It associates a set of probabilities with the program input space
and therefore describes the behavior of the system. In D-R Model II, the software OP is mapped directly into
the EFSM as the transition probabilities.

B.2.2.3 D-R Model III: Reliability Prediction Model using estimate of number of defects in
current version and exact defect content found in earlier version

Model I alone overlooks the defect content information available from previous versions of the
software. Both Model I and Model II can be combined to make use of the information found in previous
versions. More specifically, since the defect locations in previous versions of the software are known, the PIE
model can be used first to obtain a software-specific growth rate Q through the propagation of known
defects in an early version of the software system, using the PIE theory and the inverse of the General
Exponential Model.

This newly calculated Q will be much more accurate than the average Q used in Model I. Once the new
growth rate is obtained, Model I is then used for reliability prediction knowing the number of defects
remaining in the software. This model is thus named the Combinational Model (Model III).

As part of the validation of the metric-based SR prediction system, twelve software engineering metrics were
investigated and the associated detailed M-D models and D-R models were successfully developed. The twelve
root metrics are: defects per line of code, cause and effect graphing, software capability maturity model,
completeness, cyclomatic complexity, coverage factor, defect density, defect days number, function point
analysis, requirement specification change request, requirements traceability, and test coverage. Further
RePS details can be found in Smidts [B85] and Shi et al. [B78].

B.2.3 Historical defect density data

An organization can define its own defect density prediction model/averages by collecting historical data
from similar projects. The historical data should be derived from software releases for similar
products that were deployed in the last decade. The more similar the historical release is to the release
under analysis, the more accurate the estimate. Historical data from similar application types can be
grouped. The historical data should then be calibrated based on the development practices employed for
that historical project. Any of the survey-based defect density prediction models can be used to calibrate the
historical data. The checklist for using historical data to predict defect density is shown in Figure B.4.

Predict testing defect density as follows:


a) Collect the total number of discovered defects during any post integration test. The historical
data should be from a release in which all testing is complete and therefore the number of
defects found during testing is an actual as opposed to an estimate.
b) Divide the result of step a) by the actual effective size of the code associated with those testing
defects. Since the code is already complete this will be an actual as opposed to an estimate.
Remember to factor in what part of the code was reused, modified, or new as well as the
language of the historical code. See 5.3.2.3 and B.1 for instructions on calculating the
normalized effective size.
c) Calibrate the result of step b) by analyzing the differences between the historical project and the
current project under analysis by using any of the assessment models in 6.2.1 or B.2.
Predict fielded defect density as follows:
The preceding steps are employed except that in step a) collect the total number of software defects
found in the field for a particular product release that has been fielded for at least 3 years so that the
number of historical defects found in operation is an actual as opposed to an estimate. Continue to
steps b) and c) from above.

Figure B.4—Predict defect density via historical data

Example: As shown in Table B.8, there is historical data from three previously fielded systems that have
the same application and industry type. Release X and release Y were deployed more than 3 years ago.
However, release Z was not. It is removed from the historical data. The CMMI level was the same on all
historical projects—level 2.

Table B.8—Example of historical defect density prediction

 | Historical release X | Historical release Y | Historical release Z
Total defects found after software release was deployed | 170 | 200 | 150
Total effective KSLOC | 100 | 125 | 160
Language | Hybrid | Hybrid | Hybrid
Normalized EKSLOC | 450 | 562.5 | 720
CMMI level | 2 | 2 | 2
Years since deployed | 4 | 3 | 2
Calculated fielded defect density | 0.378 | 0.356 | 0.208

The historical fielded defect density is 0.367 defects/EKSLOC, which is the average of the defect density
for release X and release Y since release Z has not been operational long enough to be included. It is known
that the current release will be operating at CMMI level 1. The historical defect density of 0.367 is
calibrated using the CMMI lookup tables in 6.2.1.3. Since the average CMMI defect density at level 1 is
0.548 and the average at level 2 is 0.182, one can use that to calibrate the current release to the historical
averages.

Predicted defect density for current release = calibrated historical average = (0.548/0.182) × 0.367 =
3.01 × 0.367 = 1.104 defects/EKSLOC
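
The calibration arithmetic above can also be scripted. The following minimal sketch assumes the release data from Table B.8 and the CMMI level 1 and level 2 average defect densities quoted from 6.2.1.3; all names are illustrative.

    # Sketch of the historical defect density calibration described in Figure B.4.
    releases = [  # (fielded defects, normalized EKSLOC, years deployed)
        (170, 450.0, 4),   # release X
        (200, 562.5, 3),   # release Y
        (150, 720.0, 2),   # release Z, excluded below because it has been fielded < 3 years
    ]
    usable = [(d, size) for d, size, years in releases if years >= 3]
    historical_dd = sum(d / size for d, size in usable) / len(usable)    # ~0.367 defects/EKSLOC

    cmmi_average = {1: 0.548, 2: 0.182}   # average defect densities from 6.2.1.3 (as quoted above)
    calibrated_dd = (cmmi_average[1] / cmmi_average[2]) * historical_dd  # ~1.10 defects/EKSLOC
    print(round(historical_dd, 3), round(calibrated_dd, 3))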


B.2.4 Rayleigh Model

It has been observed that the distribution of software defects over the life of a particular development
version resembles a bell-shaped (Rayleigh) curve for virtually all software projects (Putnam [B71]),
as illustrated in Figure B.5. Statistically, about 39% of the defects have been found by the time the curve
peaks. If one can predict the peak, one can predict the total defects by simply multiplying the defects found
at the peak by 2.5. The total defects can be predicted in advance by using industry available databases and tools.

Figure B.5—Illustrative Rayleigh defect discovery profile over the development stages

B.2.5 RADC TR-92-52

This model was developed in 1992 (SAIC [B77]). Several components of the model are outdated because
software engineering and software products have changed significantly since 1992. Interestingly,
the parts of the model that are not outdated are still relevant today, as shown in B.3.

B.2.6 Neufelder Prediction Model

The Neufelder Prediction Model is based on the Quanterion Solutions Incorporated 217Plus™:2015
Reliability Prediction Methodology, as implemented in the Quanterion 217Plus™:2015 Calculator
(Quanterion [B72]). 14 This model provides for a way to predict software defect density using the Process
Grade Factors defined in the 217Plus™:2015 Handbook and Calculator. Hence, the practitioner can predict
software reliability and hardware reliability using one method.

B.3 Factors that have been correlated to fielded defects

This subclause contains a summary of the factors that appear most often in industry available SR
assessments such as the AMSAA Software Reliability Scorecard, the Rome Laboratories prediction model,
the Shortcut model, and the Full-scale model. Also shown is the percentage of organizations that had
successful, mediocre, and distressed SR when the software was deployed. More than 500 software
characteristics have been correlated to the success (from a reliability standpoint) of the software (Neufelder
[B68]). Table B.10 lists the factors that are referenced in the greatest number of predictive models and are the
most sensitive to the outcome of the software reliability; the outcome categories are defined in Table B.9.
One can see that many of the factors pertain to white box unit testing (5.3.9.2), planning ahead, tracking the
progress, test metrics and test suites (5.3.9, 5.4.6, 5.4.1), subcontractor management (5.3.2.5), and identifying
and testing the exceptions and exception handling (5.4.1.5).

14
217Plus is a trademark of Quanterion Solutions Incorporated. This information is given for the convenience of users of this
standard and does not constitute an endorsement by the IEEE of these products. Equivalent products may be used if they can be
shown to lead to the same results.


Table B.9—Definition of successful, mediocre, and distressed

Actual outcome of project | Average 3-year fielded defect density (defects per normalized EKSLOC) | Average defect removal upon initial deployment to field
Successful | 0.0269 to 0.111 | 75% or more
Mediocre | 0.111 to 0.647 | ≥ 40% but < 75%
Distressed | 0.647 and up | < 40%

Table B.10 summarizes the top factors that are referenced in the most SR assessment models and that are the
most sensitive to the outcome of the release.

Table B.10—Summary of top factors referenced in most SR assessment models

The last three columns show the percentage of organizations with the given outcome for which this factor existed (see footnote a).

Indicator | Number of times referenced in a SR assessment | Successful | Mediocre | Distressed
Software testers start writing the test plan well before software testing begins | 6 | 1.00 | 0.82 | 0.25
One point for at least one domain expert, 2 points if all software personnel are domain experts, 0.5 points if a domain expert is available only part of the time. | 5 | 1.33 | 0.86 | 0.25
Detection and recovery for I/O faults, HW faults, computation faults is designed and unit tested from a white box testing perspective | 5 | 1.00 | 0.13 | 0.13
All exceptions during runtime are handled by the software | 5 | 0.83 | 0.13 | 0.13
Regular progress and status reporting during development and testing—performance against plan | 5 | 0.89 | 0.55 | 0.20
During unit testing, the software engineer explicitly tests the paths as well as the module exception handling | 5 | 0.50 | 0.00 | 0.00
The organization avoids reinventing the wheel—they use off the shelf whenever possible | 5 | 1.00 | 0.94 | 0.60
There is a Version Description Document (VDD) | 5 | 0.44 | 0.37 | 0.06
Detection and recovery for HW faults is captured by the software requirements | 5 | 0.33 | 0.20 | 0.13
Short term contractors are avoided for line of business code (contractors are used for code that does not require industry domain expertise) | 5 | 1.00 | 0.88 | 0.80
The software system test plan is formally reviewed | 4 | 1.00 | 0.71 | 0.00
Test metrics exist and are used | 4 | 1.00 | 0.71 | 0.00
Requirements coverage is measured | 4 | 1.00 | 0.53 | 0.00
Software development methods and tools are defined | 4 | 1.00 | 0.42 | 0.10
The best software LCM is executed with respect to the particular software project | 4 | 1.00 | 0.36 | 0.11
There is a coding standard | 4 | 0.88 | 0.70 | 0.00
There are regular status reviews between software system testers and SW management | 4 | 1.00 | 0.83 | 0.17
The requirements documents are kept up to date after development begins | 4 | 1.00 | 0.89 | 0.20
There is a formal means by which to filter, assign priority, and schedule customer requests | 4 | 0.89 | 0.58 | 0.10
There are software requirements for boundary conditions | 4 | 1.00 | 0.33 | 0.25
There is software subcontractor management | 4 | 0.67 | 0.53 | 0.00
The total schedule time for this software release is less than 1 year | 4 | 0.80 | 1.00 | 0.14
Test beds exist and are used | 4 | 0.83 | 0.72 | 0.33
Every phase of the software life cycle is scheduled and executed for every project: requirements translation, design, code, unit test, system test. | 4 | 0.78 | 0.53 | 0.30
This particular release is a small/incremental/spiral release (< 10% of existing code changed or added) | 4 | 0.44 | 0.09 | 0.00
Developer unit test plans and results are formally reviewed by a non-peer software subject matter expert. | 4 | 0.67 | 0.23 | 0.25
The customers have direct or indirect access to the defect tracking system (i.e., via helpdesk) | 4 | 0.63 | 0.57 | 0.22
There are formal software requirements reviews | 4 | 0.72 | 0.55 | 0.33
The customer requirements are partitioned from SW/FW requirements (i.e., code is not developed directly from the customer requirements) | 4 | 0.71 | 0.57 | 0.44
a See Neufelder [B68].


Annex C

(informative)

Additional information on software reliability models used during testing

Prior to using Annex C the reader should select the best reliability growth models as per 5.4.5. Since the
SRG models are dependent on the demonstrated failures per time period and since there are four
possibilities for what the failure trend can be at any given time, there is no “one” best model that works for
all applications. The practitioner needs to organize and analyze the failure data during testing as per 5.4.5
and then select the model(s) that are applicable or provide a good practical solution. The following models
are shown in order of ease of use from easiest to most difficult with regard to the data that needs to be
collected and the calculations that need to be performed.

C.1 Models that can be used when the fault rate is peaking

See 6.3.1.

C.2 Models that can be used when the fault rate is decreasing

C.2.1 Models that assume a linearly decreasing fault rate

All of the following models assume:

a) The software is tested similarly as to how it will be operated once deployed.


b) No new defects are introduced during the defect correction process.
c) The underlying defects are removed once the faults occur.
d) The fault detection rate is linearly decreasing.
e) The number of inherent defects is finite.
f) The faults are independent of each other.
The estimated defects, current failure rate, current MTBF, and current reliability are shown in Table C.1.


Table C.1—Estimations

Model | Estimated remaining defects | Estimated current failure rate | Estimated current MTBF | Estimated current reliability as function of required mission time | Parameter estimation reference
Musa Basic | N0 − n | λ(n) = λ0(1 − (n/N0)) | The inverse of the estimated failure rate | e^(−λ(n) × mission time) or e^(−λ(t) × mission time) | 6.3.2.3 and 6.3.2.4 (confidence bounds)
Jelinski-Moranda | N0 − n | λ(n) = k(N0 − n) | The inverse of the estimated failure rate | e^(−λ(n) × mission time) | 6.3.2.3 and 6.3.2.4 (confidence bounds)
Goel-Okumoto | N0 − n | λ(t) = N0ke^(−kt) | The inverse of the estimated failure rate | e^(−λ(t) × mission time) | 6.3.2.3 and 6.3.2.4 (confidence bounds)
Shooman linearly decreasing defect removal | N0 − Kt(1 − t/(2τ0)) | λ(t) = k[N0 − Kt(1 − t/(2τ0))] | The inverse of the estimated failure rate | e^(−λ(t) × mission time) | C.2.1.1
Duane | Infinite | λ(t) = bt^α | The inverse of the estimated failure rate | e^(−λ(t) × mission time) | C.2.1.2
Geometric | See NOTE. | λ(n) = λ0p^n | The inverse of the estimated failure rate | e^(−λ(n) × mission time) | C.2.1.3
Where: n = observed cumulative faults found so far; t = observed total test hours so far; mission time = desired or
required mission time of the software in operation.
NOTE—The Geometric Model assumes infinite defects. However, the removal level is estimated by the purification level, 1 − λ(n)/λ0.

The first three of the following parameters are estimated as per 6.3.2.3. The confidence of the estimates is
estimated as per 6.3.2.4.

k—rate at which the faults are decreasing
N0—the estimated number of inherent defects within the software
λ0—estimated initial failure rate of the software
τ0—slope parameter. When t = 2τ0 the decreasing error removal rate drops to zero. For large τ0 the slope is
gradual.
K—a proportionality constant in the expression relating MTTF to the remaining number of errors

The following parameter is specific to the Geometric Model. Refer to C.2.1.3 for instructions on how to
estimate this parameter.

p—The Geometric Model takes its name from the fact that the term p^i is a decreasing geometric sequence
when p is in the interval (0, 1). In the Geometric Model, the removal of the first few defects decreases the
failure rate more significantly, while the removal of later defects decreases the failure rate much less. These
trends often agree with software testing in practice because the defects discovered earlier are often those
that reside on more commonly executed paths and are therefore more likely to occur. Defects discovered
later may be more difficult to detect because they are only exposed when relatively rare combinations of
commands are executed, thereby resulting in a fault. Since p^i tends to zero only as i goes to infinity, the
Geometric Model assumes an infinite number of failures, and it is not meaningful to estimate the number of
defects remaining. Instead, the purification level is used to estimate the defect removal.

While each of the models has similar assumptions, Table C.2 shows that some of them have different
assumptions for the likelihood of each fault being observed:


Table C.2—Assumptions

Model | Likelihood of each fault being observed
Musa Basic | Equal
Jelinski-Moranda | Equal
Goel-Okumoto | No assumption
Shooman linearly decreasing defect removal (a) | Equal
Geometric | Each defect causes failures with a rate that does not change over time. However, this rate differs across defects, and the defects with a higher rate tend to be detected earlier during testing.
a See Shooman [B82], pp. 254–255.
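
As an illustration of the Table C.1 formulas, the following sketch evaluates the Musa Basic column once N0 and λ0 have been estimated as per 6.3.2.3; the input values are illustrative assumptions.

    import math

    # Sketch of the Table C.1 quantities for the Musa Basic model.
    def musa_basic(n0, lambda0, faults_observed, mission_time):
        remaining = n0 - faults_observed
        failure_rate = lambda0 * (1.0 - faults_observed / n0)   # lambda(n)
        mtbf = 1.0 / failure_rate                               # inverse of the estimated failure rate
        reliability = math.exp(-failure_rate * mission_time)
        return remaining, failure_rate, mtbf, reliability

    # Illustrative values: 120 inherent defects, initial failure rate 0.05 faults/h, 90 faults found.
    print(musa_basic(n0=120, lambda0=0.05, faults_observed=90, mission_time=24))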

C.2.1.1 Estimate model parameters for the Shooman Model

There are two parameters to estimate in the defect removal model: K and τ0. A simple way to estimate these
parameters is to compute the derivative of the defect removal model using Equation (C.1):

\frac{d(\text{Remaining defects}(\tau))}{d\tau} = K\left(1 - \frac{\tau}{2\tau_0}\right)                (C.1)

Compute the derivatives at the midpoints of the first two intervals, \tau_a/2 and \tau_a + \tau_b/2, using Equation (C.2)
and Equation (C.3):

\frac{\text{Corrected defects}(\Delta\tau_a)}{\Delta\tau_a} = K\left(1 - \frac{\tau_a}{4\tau_0}\right)                (C.2)

\frac{\text{Corrected defects}(\Delta\tau_b)}{\Delta\tau_b} = K\left(1 - \frac{\tau_a + \tau_b/2}{2\tau_0}\right)                (C.3)

Solving these two equations yields K and τ0.

Then solve for the parameters k and N0 using Equation (C.4) and Equation (C.5):

MTTF_a = \frac{H_a}{r_a} = \frac{1}{k\left[N_0 - K\tau_a\left(1 - \frac{\tau_a}{2\tau_0}\right)\right]}                (C.4)

MTTF_b = \frac{H_b}{r_b} = \frac{1}{k\left[N_0 - K\tau_b\left(1 - \frac{\tau_b}{2\tau_0}\right)\right]}                (C.5)
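
The four parameters can be solved in closed form. The sketch below assumes that the two data windows are [0, τa] and [τa, τa + τb], that the cumulative corrected defects at time t follow Kt(1 − t/(2τ0)) as in Table C.1, and that Ha/ra and Hb/rb are the observed MTTFs for the two windows; the interval data are illustrative.

    # Sketch of the Shooman linearly decreasing model estimation [Equations (C.1) through (C.5)].
    def estimate_shooman_linear(tau_a, tau_b, corrected_a, corrected_b,
                                hours_a, faults_a, hours_b, faults_b):
        # Equations (C.2) and (C.3): defect removal rates at the interval midpoints.
        rho_a = corrected_a / tau_a
        rho_b = corrected_b / tau_b
        u = (rho_a - rho_b) / (rho_a * (tau_a + tau_b / 2.0) - rho_b * tau_a / 2.0)  # u = 1/(2*tau0)
        tau0 = 1.0 / (2.0 * u)
        K = rho_a / (1.0 - u * tau_a / 2.0)
        # Equations (C.4) and (C.5): the observed MTTFs H/r give k and N0.
        removed_a = K * tau_a * (1.0 - tau_a / (2.0 * tau0))
        removed_b = K * (tau_a + tau_b) * (1.0 - (tau_a + tau_b) / (2.0 * tau0))
        k = (faults_a / hours_a - faults_b / hours_b) / (removed_b - removed_a)
        n0 = (faults_a / hours_a) / k + removed_a
        return K, tau0, k, n0

    # Illustrative data: 40 then 30 defects corrected in two 100 h windows; 20 and 12 failures observed.
    print(estimate_shooman_linear(100.0, 100.0, 40, 30, 100.0, 20, 100.0, 12))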


C.2.1.2 Estimate model parameters for Duane’s Model

Plot the observed time between failures versus cumulative test hours on log-log paper. Draw a best-fit straight line
through the data. Using the formula for a straight line, Y = mX + c:

 α = the slope m
 ln(b) = c = the Y intercept of this plot
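
The same straight-line fit can be performed numerically with a least-squares regression on the logarithms instead of log-log paper, as in the following sketch; the failure times are illustrative.

    import math

    # Sketch of the Duane fit: regress ln(time between failures) on ln(cumulative test hours).
    cumulative_hours = [20.0, 55.0, 110.0, 200.0, 340.0, 520.0]     # illustrative failure times
    tbf = [cumulative_hours[0]] + [b - a for a, b in zip(cumulative_hours, cumulative_hours[1:])]

    x = [math.log(t) for t in cumulative_hours]
    y = [math.log(t) for t in tbf]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x

    alpha = slope              # alpha = the slope m
    b = math.exp(intercept)    # ln(b) = c, the Y intercept
    print(alpha, b)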

C.2.1.3 Estimate parameters for Geometric Model

The parameters λ0 and p are unknown values that are estimated. Once an inter-failure time data set such as
the one given in 5.4.5.5 has been collected, one can estimate p with Equation (C.6).

\frac{\sum_{i=1}^{n}(i-1)}{\hat{p}} = \frac{n}{\sum_{i=1}^{n}\hat{p}^{\,i}x_i}\sum_{i=1}^{n}(i-1)\,\hat{p}^{\,(i-2)}x_i                (C.6)

Where x_i is the ith inter-failure time and n is the total number of faults observed. The parameter estimation
can be accomplished with an available SRG tool, a spreadsheet, graphical methods, or numerical
algorithms. Once p has been estimated, λ0 can be estimated by substituting the estimate for p into
Equation (C.7):

\hat{\lambda}_0 = \frac{n}{\sum_{i=1}^{n}\hat{p}^{\,i}x_i}                (C.7)
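
Equation (C.6) has a single root in the interval (0, 1) for a data set that exhibits reliability growth, so a simple bisection search is often sufficient. The sketch below implements Equations (C.6) and (C.7) as given above; the inter-failure times are illustrative.

    # Sketch of the Geometric Model parameter estimation [Equations (C.6) and (C.7)] using bisection.
    def c6_residual(p, x):
        n = len(x)
        lhs = sum(i - 1 for i in range(1, n + 1)) / p
        rhs = (n / sum(p ** i * xi for i, xi in enumerate(x, start=1))) * \
              sum((i - 1) * p ** (i - 2) * xi for i, xi in enumerate(x, start=1))
        return lhs - rhs

    def estimate_geometric(x, lo=1e-6, hi=1.0 - 1e-6, iterations=100):
        for _ in range(iterations):                     # bisection on the Equation (C.6) residual
            mid = (lo + hi) / 2.0
            if c6_residual(lo, x) * c6_residual(mid, x) <= 0:
                hi = mid
            else:
                lo = mid
        p = (lo + hi) / 2.0
        lam0 = len(x) / sum(p ** i * xi for i, xi in enumerate(x, start=1))   # Equation (C.7)
        return p, lam0

    inter_failure_times = [8, 12, 20, 25, 40, 55, 70]   # illustrative hours between successive faults
    print(estimate_geometric(inter_failure_times))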

The confidence bounds for this model can be derived similarly to the confidence bounds for the Musa Basic
Model. Instead of estimating the confidence of the inherent defects, one estimates the confidence of the
initial failure rate parameter λ. Calculate an estimate of λ for each time interval in which a fault was
observed. There should be as many estimates of λ as there are data points on the graph. For each estimate:

 Establish the desired confidence, which is (1 − α). If 95% confidence is desired, then set α to 5%.
 Using normal charts, determine Z_(1−α)/2.
 Lower interval = λ − Z_(1−α)/2 × √(Var(λ))
 Upper interval = λ + Z_(1−α)/2 × √(Var(λ))

The estimates will have a higher confidence as the number of data points increases. So, the range on the
estimates used should become smaller during testing. Confidence intervals are recommended when
specifying or measuring reliability values.

C.2.2 Models that assume a nonlinearly decreasing fault rate

All of the following models assume the following:


a) The software is tested similarly as to how it will be operated once deployed
b) The fault detection rate is decreasing but not necessarily at a linear rate
c) The faults do not have an equal probability of being found


Table C.3 shows the estimated remaining defects, current failure rate, current MTBF and current reliability
for each model.

Table C.3—Summary of models that assume a nonlinearly decreasing fault rate

Model | Estimated remaining defects | Estimated current failure rate | Estimated current MTBF | Estimated current reliability | Parameter estimation reference
Musa-Okumoto Logarithmic | Not applicable since assumed to be infinite | λ(n) = λ0e^(−θn) | (1/θ) × ln((λ0θt) + 1) | e^(−λ(n) × mission time) or e^(−λ(t) × mission time) | 6.3.3.3 and 6.3.3.4
Shooman exponentially decreasing defect removal | N0 − n | λ(t) = kN0e^(−αt) | 1/(kN0e^(−ατ)) | e^(−λ(t) × mission time) | C.2.2.1
Log Logistic | N0 − n | N0λκ(λt)^(κ−1)/[1 + (λt)^κ]^2 | See C.2.2.2. | See C.2.2.2. | C.2.2.2

Where: The observed values are: θ = the rate of change of the faults observed; n = observed cumulative faults found so far;
t = observed total test hours so far; mission time = desired or required mission time of the software in operation.
The estimated values are: N0 = the number of inherent defects within the software; λ0 = estimated initial failure rate
of the software.

For the Logarithmic Model, refer to 6.3.3.3 for instructions on estimating the parameters. For the Shooman
Model and the Log Logistic Model, the instructions are as follows.

C.2.2.1 Estimate model parameters for Shooman Model

Assume that there are two sets of fault data collected in intervals 0 – τa and 0 – τb . The resulting
estimator formulas are shown in Equation (C.8) and Equation (C.9):

\alpha = \frac{\ln[n(\tau_a)] - \ln[n(\tau_b)]}{\tau_b - \tau_a}                (C.8)

\ln[N_0] = \frac{\ln[n(\tau_a)] + \ln[n(\tau_b)] + \alpha(\tau_a + \tau_b)}{2}                (C.9)

Once N0 and α are determined, one set of integration test data can be used to determine k, as shown in
Equation (C.10):

MTTF_a = \frac{1}{kN_0e^{-\alpha\tau_a}}                (C.10)

That yields Equation (C.11):


k = \frac{1}{N_0e^{-\alpha\tau_a}} \times \frac{1}{MTTF_a}                (C.11)
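
The following sketch applies Equations (C.8) through (C.11); n_tau_a and n_tau_b are taken to be the n(τa) and n(τb) values used in those equations, and mttf_a is the observed MTTF from the first data set. The numeric values are illustrative.

    import math

    # Sketch of the Shooman exponentially decreasing model [Equations (C.8) through (C.11)].
    def estimate_shooman_exponential(tau_a, n_tau_a, tau_b, n_tau_b, mttf_a):
        alpha = (math.log(n_tau_a) - math.log(n_tau_b)) / (tau_b - tau_a)                     # (C.8)
        n0 = math.exp((math.log(n_tau_a) + math.log(n_tau_b) + alpha * (tau_a + tau_b)) / 2)  # (C.9)
        k = (1.0 / (n0 * math.exp(-alpha * tau_a))) * (1.0 / mttf_a)                          # (C.11)
        return alpha, n0, k

    # Illustrative data: n(100 h) = 60, n(250 h) = 35, observed MTTF of 4 h in the first window.
    print(estimate_shooman_exponential(100.0, 60, 250.0, 35, 4.0))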

C.2.2.2 Estimate model parameters for Log-Logistic Model

Solve the following system of two simultaneous equations for \hat{\lambda} and \hat{\kappa} [Equation (C.12) and Equation (C.13)]:

\frac{n\,(\hat{\lambda}t_n)^{\hat{\kappa}}}{2\left[1 + (\hat{\lambda}t_n)^{\hat{\kappa}}\right]} = \sum_{i=1}^{n}\frac{(\hat{\lambda}t_i)^{\hat{\kappa}}}{1 + (\hat{\lambda}t_i)^{\hat{\kappa}}}                (C.12)

\frac{n}{\hat{\kappa}} = \frac{n\ln(\hat{\lambda}t_n)}{1 + (\hat{\lambda}t_n)^{\hat{\kappa}}} - n\ln\hat{\lambda} - \sum_{i=1}^{n}\ln t_i + 2\sum_{i=1}^{n}\frac{(\hat{\lambda}t_i)^{\hat{\kappa}}\ln(\hat{\lambda}t_i)}{1 + (\hat{\lambda}t_i)^{\hat{\kappa}}}                (C.13)

Then, substitute \hat{\lambda} and \hat{\kappa} into Equation (C.14):

N_0 = \frac{n\left[1 + (\hat{\lambda}t_n)^{\hat{\kappa}}\right]}{(\hat{\lambda}t_n)^{\hat{\kappa}}}                (C.14)
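
Because Equations (C.12) and (C.13) have no closed-form solution, they are normally solved numerically or with an SRG tool. The sketch below uses a general-purpose root finder and assumes that t_i are the cumulative failure times; the data and the starting point are illustrative, and a different starting point may be needed if the solver does not converge.

    import numpy as np
    from scipy.optimize import fsolve

    # Sketch of the Log-Logistic estimation [Equations (C.12) through (C.14)].
    t = np.array([12.0, 30.0, 55.0, 90.0, 140.0, 210.0, 300.0])   # illustrative failure times
    n, tn = len(t), t[-1]

    def residuals(params):
        lam, kappa = np.exp(params)           # solve in log space to keep both parameters positive
        x = (lam * t) ** kappa
        xn = (lam * tn) ** kappa
        eq12 = n * xn / (2.0 * (1.0 + xn)) - np.sum(x / (1.0 + x))                      # (C.12)
        eq13 = (n / kappa - n * np.log(lam * tn) / (1.0 + xn)
                + n * np.log(lam) + np.sum(np.log(t))
                - 2.0 * np.sum(x * np.log(lam * t) / (1.0 + x)))                        # (C.13)
        return [eq12, eq13]

    lam_hat, kappa_hat = np.exp(fsolve(residuals, x0=np.log([1.0 / tn, 1.5])))
    n0_hat = n * (1.0 + (lam_hat * tn) ** kappa_hat) / (lam_hat * tn) ** kappa_hat      # (C.14)
    print(lam_hat, kappa_hat, n0_hat)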

C.3 Models that can be used with increasing and then decreasing fault rate

The following model can be used when the fault rate is increasing and then decreasing. However, it should
be pointed out that the models in C.2 can be used if one filters the fault data for only the most recent
segment of data that has a decreasing fault rate.

C.3.1 Yamada (Delayed) S-shaped Model

The Yamada (Delayed) S-shaped model is one of the earliest and simplest models that can fit fault
detection processes that exhibit an S shape. It has only two parameters, making it easier to apply than some
other S-shaped models. Many data sets that cannot be characterized by a model with an exponential mean
value function exhibit an S shape.

This model uses times of failure occurrences and was proposed by Yamada, Ohba, and Osaki [B97]. The
difference from the exponential models is that the mean value function of the Poisson process is S shaped.
At the beginning of the testing phase, the fault detection rate is relatively flat but then increases
exponentially as the testers become familiar with the program. Finally, it levels off near the end of testing as
faults become more difficult to uncover.

C.3.1.1 Assumptions

The assumptions are as follows:


 The software is operated during test in a manner similar to anticipated usage.
 The failure occurrences are independent and random.


 The initial defect content is a random variable.


 The time between failure i – 1 and failure i depends on the time to failure of failure i – 1.
 Each time a failure occurs, the defect that caused it is immediately removed, and no other defects
are introduced.

C.3.1.2 Estimate failure rate, MTBF, remaining defects, and reliability

The estimated remaining defects = a − n, where n = actual cumulative number of defects found so far in
testing and a = estimated total number of defects (see C.3.1.3).

The estimated failure rate = ab^2te^{-bt}, with both a, b > 0, where b is a shape parameter.

The estimated MTBF is a function of the parameter estimates. The practitioner should rely on an
automated tool to compute this.

The estimated reliability at time t is estimated as R(t) = e^{-\hat{a}e^{-\hat{b}t}}.

C.3.1.3 Estimate model parameters

The maximum likelihood estimate for the failure detection rate parameter is shown in Equation (C.15):

\hat{a}\,t_n^2e^{-\hat{b}t_n} = \sum_{i=1}^{n}\frac{(y_i - y_{i-1})\left(t_i^2e^{-\hat{b}t_i} - t_{i-1}^2e^{-\hat{b}t_{i-1}}\right)}{\left(1 + \hat{b}t_{i-1}\right)e^{-\hat{b}t_{i-1}} - \left(1 + \hat{b}t_i\right)e^{-\hat{b}t_i}}                (C.15)

Where n is the number of observation intervals, t_i the time at which the ith interval ended, k_i the number
of faults detected in the ith interval, and y_i = \sum_{j=1}^{i}k_j is the cumulative number of faults detected by the
end of the ith interval. After substituting Equation (C.16) for \hat{a}, the only unknown in this equation is b.
Thus, the estimate of b is the value that makes the two sides of the equation equal. The graphical method
can be applied here.

Given the estimate of parameter b, the maximum likelihood estimate for the number of faults is shown in
Equation (C.16):

\hat{a} = \frac{\sum_{i=1}^{n}k_i}{1 - \left(1 + \hat{b}t_n\right)e^{-\hat{b}t_n}}                (C.16)
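
After substituting Equation (C.16) into Equation (C.15), b is the only unknown and can be found with a one-dimensional search, as in the following sketch; the interval data are illustrative.

    import math

    # Sketch of the delayed S-shaped estimation [Equations (C.15) and (C.16)] using bisection on b.
    t = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]   # end time of each observation interval
    k = [3, 9, 12, 8, 5, 2]                    # faults detected in each interval

    def c15_residual(b):
        a_hat = sum(k) / (1.0 - (1.0 + b * t[-1]) * math.exp(-b * t[-1]))   # Equation (C.16)
        lhs = a_hat * t[-1] ** 2 * math.exp(-b * t[-1])
        rhs, prev_t = 0.0, 0.0
        for ti, ki in zip(t, k):
            num = ti ** 2 * math.exp(-b * ti) - prev_t ** 2 * math.exp(-b * prev_t)
            den = (1.0 + b * prev_t) * math.exp(-b * prev_t) - (1.0 + b * ti) * math.exp(-b * ti)
            rhs += ki * num / den
            prev_t = ti
        return lhs - rhs

    lo, hi = 1e-3, 1.0
    for _ in range(100):                        # bisection on b
        mid = (lo + hi) / 2.0
        if c15_residual(lo) * c15_residual(mid) <= 0:
            hi = mid
        else:
            lo = mid
    b_hat = (lo + hi) / 2.0
    a_hat = sum(k) / (1.0 - (1.0 + b_hat * t[-1]) * math.exp(-b_hat * t[-1]))
    print(b_hat, a_hat)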
C.3.1.4 Estimate confidence bounds

To estimate the confidence intervals for the model parameters, substitute the numerical parameter estimates
\hat{a} and \hat{b} into the following 2×2 matrix and compute the inverse:

\Sigma = \begin{bmatrix} x_{aa} & x_{ab} \\ x_{ab} & x_{bb} \end{bmatrix}^{-1}


where the entries of the matrix are shown in Equation (C.17), Equation (C.18), and Equation (C.19):

x_{aa} = 1 - \left(1 + \hat{b}t_n\right)e^{-\hat{b}t_n}                (C.17)

x_{ab} = \hat{b}t_n^2e^{-\hat{b}t_n}                (C.18)

x_{bb} = \hat{a}\hat{b}^2\sum_{i=1}^{n}\frac{\left[t_i^2e^{-\hat{b}t_i} - t_{i-1}^2e^{-\hat{b}t_{i-1}}\right]^2}{\left(1 + \hat{b}t_{i-1}\right)e^{-\hat{b}t_{i-1}} - \left(1 + \hat{b}t_i\right)e^{-\hat{b}t_i}}                (C.19)

The numerical value in \Sigma_{1,1} = \mathrm{Var}[\hat{a}], while the value in \Sigma_{2,2} = \mathrm{Var}[\hat{b}]. The upper and lower
confidence intervals for \hat{a} and \hat{b} can then be calculated by substituting the estimates for the parameters
and their variances into the following:

\hat{a} \pm Z_{1-\alpha/2}\sqrt{\mathrm{Var}[\hat{a}]}

\hat{b} \pm Z_{1-\alpha/2}\sqrt{\mathrm{Var}[\hat{b}]}

Where Z_{1-\alpha/2} is the (1 − α/2) critical value of a standard normal distribution.
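
The matrix inversion and interval calculation can be scripted as follows; a_hat and b_hat would normally come from the C.3.1.3 estimation, and the values shown here are illustrative assumptions.

    import math
    import numpy as np

    # Sketch of the C.3.1.4 confidence bounds for the delayed S-shaped model.
    t = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]     # end time of each observation interval
    k = [3, 9, 12, 8, 5, 2]                      # faults detected in each interval
    a_hat, b_hat, z = 56.4, 0.040, 1.96          # z = Z_{1-alpha/2} for a 95% interval

    tn = t[-1]
    x_aa = 1.0 - (1.0 + b_hat * tn) * math.exp(-b_hat * tn)                       # (C.17)
    x_ab = b_hat * tn ** 2 * math.exp(-b_hat * tn)                                # (C.18)
    x_bb, prev_t = 0.0, 0.0
    for ti in t:                                                                  # (C.19)
        num = (ti ** 2 * math.exp(-b_hat * ti) - prev_t ** 2 * math.exp(-b_hat * prev_t)) ** 2
        den = ((1.0 + b_hat * prev_t) * math.exp(-b_hat * prev_t)
               - (1.0 + b_hat * ti) * math.exp(-b_hat * ti))
        x_bb += num / den
        prev_t = ti
    x_bb *= a_hat * b_hat ** 2

    cov = np.linalg.inv(np.array([[x_aa, x_ab], [x_ab, x_bb]]))   # Sigma
    var_a, var_b = cov[0, 0], cov[1, 1]
    print(a_hat - z * math.sqrt(var_a), a_hat + z * math.sqrt(var_a))
    print(b_hat - z * math.sqrt(var_b), b_hat + z * math.sqrt(var_b))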

C.4 Models that can be used regardless of the fault rate trend

C.4.1 Weibull Model

C.4.1.1 Model assumptions

This model assumes the following:

a) The defects do not necessarily cause failures with the same rate (i.e., the detection rates of the
defects may differ).
b) Once a defect causes a failure, it is corrected without introducing any new faults.
c) The software is tested similarly to how it will be used in an operational environment.
d) Specifically, it is assumed that the failure rate function has the shape of a Weibull probability
density function. Thus, this model can accommodate cases where the failure rate increases at the
beginning of testing and then decreases.


C.4.1.2 Estimate failure rate, MTBF, remaining defects, and reliability

Estimated remaining defects = a – n

where

n is the observed cumulative number of defects found so far


a is an estimate of the total defects in the software
b and c are shaping parameters. Note that if c = 1, this model reduces to the Goel-Okumoto model.

Estimated failure rate = abce^{-bt^c}t^{c-1}, where b and c are the shaping parameters to be estimated and t = the
cumulative amount of operational testing so far.

The estimated mean time to failure is given by MTTF = \frac{\Gamma\left(\frac{1}{\hat{c}} + 1\right)}{\hat{b}^{1/\hat{c}}}, where \Gamma is the Gamma function.

Estimated reliability as a function of t hours is R(t) = e^{-\hat{a}e^{-\hat{b}t^{\hat{c}}}}.

C.4.1.3 Estimate model parameters

For failure count data, the maximum likelihood estimates for the parameters b and c are obtained by
solving the following system of two simultaneous equations [Equation (C.20) and Equation (C.21)]:

\sum_{i=1}^{n}\frac{k_i\left(t_i^{\hat{c}}e^{-\hat{b}t_i^{\hat{c}}} - t_{i-1}^{\hat{c}}e^{-\hat{b}t_{i-1}^{\hat{c}}}\right)}{e^{-\hat{b}t_{i-1}^{\hat{c}}} - e^{-\hat{b}t_i^{\hat{c}}}} - \frac{t_n^{\hat{c}}e^{-\hat{b}t_n^{\hat{c}}}\sum_{i=1}^{n}k_i}{1 - e^{-\hat{b}t_n^{\hat{c}}}} = 0                (C.20)

\sum_{i=1}^{n}\frac{k_i\hat{b}\left(t_i^{\hat{c}}\ln t_i\,e^{-\hat{b}t_i^{\hat{c}}} - t_{i-1}^{\hat{c}}\ln t_{i-1}\,e^{-\hat{b}t_{i-1}^{\hat{c}}}\right)}{e^{-\hat{b}t_{i-1}^{\hat{c}}} - e^{-\hat{b}t_i^{\hat{c}}}} - \frac{\hat{b}t_n^{\hat{c}}\ln t_n\,e^{-\hat{b}t_n^{\hat{c}}}\sum_{i=1}^{n}k_i}{1 - e^{-\hat{b}t_n^{\hat{c}}}} = 0                (C.21)

Where n is the number of observation intervals, ti the time at which the ith interval ended, and ki the
number of faults detected in the ith interval.

Given the estimates of the parameters b and c, the maximum likelihood estimate for the initial number of
defects is shown in Equation (C.22):


\hat{a} = \frac{\sum_{i=1}^{n}k_i}{1 - e^{-\hat{b}t_n^{\hat{c}}}}                (C.22)
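
Once b-hat and c-hat have been obtained from Equations (C.20) and (C.21), the C.4.1.2 quantities and Equation (C.22) follow directly, as in the sketch below; the parameter values and fault counts are illustrative.

    import math

    # Sketch of the Weibull model quantities from C.4.1.2 and Equation (C.22).
    def weibull_estimates(b_hat, c_hat, interval_faults, t_now):
        a_hat = sum(interval_faults) / (1.0 - math.exp(-b_hat * t_now ** c_hat))   # Equation (C.22)
        remaining = a_hat - sum(interval_faults)
        failure_rate = (a_hat * b_hat * c_hat
                        * math.exp(-b_hat * t_now ** c_hat) * t_now ** (c_hat - 1.0))
        mttf = math.gamma(1.0 / c_hat + 1.0) / b_hat ** (1.0 / c_hat)
        reliability = math.exp(-a_hat * math.exp(-b_hat * t_now ** c_hat))
        return a_hat, remaining, failure_rate, mttf, reliability

    # Illustrative values: b = 0.002, c = 1.3, 27 faults found in five intervals, 500 h of testing.
    print(weibull_estimates(b_hat=0.002, c_hat=1.3, interval_faults=[5, 9, 7, 4, 2], t_now=500.0))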

C.5 Models that estimate remaining defects

In the course of software development, techniques commonly used to uncover defects before software
testing are peer review, walkthrough, and inspection (IEEE Std 1012™-2012 [B33], IEEE Std 1028™, and
IEEE Std 12207-2008). Although peer review, walkthrough, and inspection differ in their
processes and formats (inspection has the most formal process, while peer review and walkthrough are less
formal), they are all intended to discover defects in software artifacts. The results of these activities are
normally defects discovered in the software artifacts and are often used to verify whether the entry criteria for
the next development phase are met.

The number of remaining defects that escape the peer review, walkthrough, or inspection should be
accounted for in order to better control the software development process. A statistical technique, capture/recapture
(CR), has been used to estimate the number of remaining defects after review and inspection.

The CR model was introduced in biology to estimate the size of an animal population in a closed area. Traps
are put in place, and animals are captured (trapped), marked, and released. The captured animals then have a
chance of being recaptured. The number of captured and recaptured animals correlates with the animal
population in such a way that the more animals that are recaptured, the smaller the population size, and vice
versa. The CR model was developed to statistically estimate the population size.

This CR concept can be transferred to software engineering: defects are analogous to animals, and
inspectors/reviewers are analogous to traps. Thus, defects discovered are analogous to animals trapped, and
the total defects in a software artifact are analogous to the animal population in a closed area.

CR models assume that inspectors (traps) are independent. Researchers in statistics have developed multiple CR
models to reflect different assumptions about detection probability (animal trapped, defect detected) and
detection capability (trap captures animal, inspector/reviewer discovers defect). Some CR software, such as
CAPTURE, is available in the public domain to provide estimates of the total number of defects. Multiple
estimators are provided by this software. Refer to Chao et al. [B9], Rexstad [B76], and White et al. [B96]
for more information.
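
As a simple illustration of the CR idea, the following sketch uses the basic two-inspector (Lincoln-Petersen style) estimator; the estimators implemented in tools such as CAPTURE are more sophisticated and reflect different assumptions, so this sketch is illustrative only.

    # Minimal sketch of a two-inspector capture/recapture estimate of total defects.
    def two_inspector_cr(found_by_1, found_by_2, found_by_both):
        total_estimate = (found_by_1 * found_by_2) / found_by_both   # Lincoln-Petersen style estimate
        found_so_far = found_by_1 + found_by_2 - found_by_both
        remaining = total_estimate - found_so_far
        return total_estimate, remaining

    # Inspector 1 found 20 defects, inspector 2 found 15, and 10 defects were found by both.
    print(two_inspector_cr(20, 15, 10))   # estimates 30 total defects, about 5 remaining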

C.6 Results of the IEEE survey

The following survey was completed by 15 members of the IEEE 1633 Working Group (WG). The results
were used to identify the most popular SR growth models, which were selected in 5.4.5. While the results
from the IEEE 1633 WG are provided, the determination of which model to use should be based on two
key factors: 1) the software model's assumptions are met, and 2) the plot of the actual software failure data
(test or operational data) aligns with the model’s data plot and successfully passes a goodness-of-fit test.

1. What is your primary role?

Reliability/RAM 8
Software engineering 8
Other 2

Note the numbers do not sum to the number of responses (n = 15) because some individuals
indicated both reliability/RAM and software.


2. Have you ever used software reliability growth models (SRGM) in your work?

Yes 12
No 3

80% of respondents have used SRGM previously.

3. Would you ever use a model that is not automated?

Yes 6
Prefer automation/Maybe 3
No 3

4. If you answered yes to question 2, please answer the following.

On a scale of 1 to 5, with 1 being the best and 5 the lowest, please rank the following models
according to your individual experience. For example, if a particular model has characterized a
data set you analyzed very well, you should assign it a score of 1. For models that rarely or never
achieved a good fit to your data, a score closer to 5 would be appropriate.

Model n Average Best Worst


1. Bayesian Jelinski-Moranda (BJM) 3 3.67 2 5
2. Brooks and Motley (BM) 3 4.33 4 5
3. Duane (DU) 5 3.20 1 5
4. Geometric (GM) 4 3.00 1 5
5. Goel-Okumoto (GO) 8 1.88 1 4
6. Jelinski Moranda (JM) 4 2.25 1 3
7. Keiller-Littlewood (KL) 3 4.33 3 5
8. Littlewood (LM) 4 2.25 1 4
9. Littlewood Non-Homogeneous Poisson Process
(LNNHPP) 6 1.83 1 3
10. Littlewood-Verrall (LV) 4 3.25 2 5
11. Lyu-Nikora combination (LNC) 4 3.75 2 5
12. Musa-Okumoto (MO) 7 2.00 1 3
13. Generalized Poisson (PM) 6 2.50 1 3
14. Schneidewind (SM) 8 3.50 2 5
15. Yamada Delayed S-Shaped (YM) 8 2.13 1 3
16. Musa Logarithmic (ML) 5 2.00 1 3
17. Log-logistic (LL) 1 1.00 1 1
18. Schick-Wolverton 1 3.00 3 3
19. Sweep/Raleigh 1 3.50 3.5 3.5

Not all WG members with SRGM experience answered this question (n = 12). Half said they would use a
model that is not automated. The percentage of users of this document that would use a model that is not
automated may be significantly lower.


Annex D

(informative)

Estimated relative cost of SRE tasks

Table D.1 provides an estimate of the typical relative effort and relative calendar time for each of the
tasks in this document. Some of the tasks require a culture change more than a cost, in that they mean doing
existing tasks differently as opposed to adding new tasks. Some require calendar time but not much effort.
Some require automation but not much effort. Some tasks may require effort, automation, and calendar time,
but those are typically not the essential tasks. Recall that the prediction and SR growth models presented
range from simple to detailed. The practitioner can reduce expense and calendar time by choosing the
simplest of the practices. As with any new engineering practice, it is most costly to execute the first time.
After the practices have been deployed and people become more familiar with them, the relative cost
generally diminishes.

Key:
L—Low
M—Medium
H—High
V—Varies depending on the scope selected
*The same tools used for hardware reliability can be used for SR.
B—Basic tools such as spreadsheets, etc.

Culture change—Many of the SRE tasks require more of a culture change and less of a cost. For example,
SFMEAs are typically difficult to implement because software engineers have not been trained to view the
failure space. Once they accept the failure space viewpoint the analyses are much easier to perform and
take less time.

Effort—This is in terms of work hours and not necessarily calendar time. Several tasks can be combined
with already existing software tasks. For example, the SFMEA can be combined with an existing
requirements, design or code review. See the last 2 columns to identify which tasks can be merged with
either software development tasks or reliability engineering tasks.

Calendar time—Some tasks do not require a significant amount of work time but do require calendar time.
For example, it takes time for people to get trained on something new.

Requires automation—Some of the tasks are difficult to do without some automated tool. Tasks that require
only basic tools are identified as well as those that can use the same tools that are used for hardware
reliability tasks.


Table D.1—Relative culture change, effort, calendar time, and automation required to
implement the SRE tasks

Column order for each task: Culture change; Effort; Calendar time; Automation; Merge with existing SW practices?; Merge with existing reliability practices?

Planning for SR
5.1.1 Characterize the software system: L, L, L, Yes
5.1.2 Define failures and criticality: L, L, L, Yes
5.1.3 Perform a reliability risk assessment: M, L, L, Yes
5.1.4 Assess the data collection system: L, L, L
5.1.5 SRP—Software Reliability Plan: L-M, L-M, L-M, Yes
Develop failure modes model
5.2.1 Perform software defect root cause analysis: M, M, L, B, Yes
5.2.2 Perform SFMEA: M-H, V, V, B, Yes
5.2.3 Include software in the system FTA: M-H, V, V, B, Yes
Apply SR during development
5.3.1 Identify/obtain the initial system reliability objective: L, L, L, Yes
5.3.2 Perform a software reliability assessment and prediction: M-H, M-H, M, L-M
5.3.3 Sanity check the early prediction: M, L, L
5.3.4 Merge the SR predictions into the overall system predictions: L-M, L-M, L, *, Yes
5.3.5 Determine an appropriate overall SR objective: L, L, L, Yes
5.3.6 Plan the reliability growth: M-H, L, L, L, Yes
5.3.7 Perform a sensitivity analysis: M, M, L, L
5.3.8 Allocate the required reliability to the software LRUs: M-H, L-M, L
5.3.9 Employ SR metrics for transition to testing: L-M, L-M, L, B
Apply SR during testing
5.4.1 Develop a reliability test suite: M-H, M-H, M, Y, Yes
5.4.2 Increase test effectiveness via software fault insertion: M-H, M-H, M, Y
5.4.3 Measure test coverage: M-H, M-H, M-H, Y, Yes
5.4.4 Collect fault and failure data: L, L, L, B
5.4.5 Select reliability growth models: M, L, L
5.4.6 Apply SR metrics: L-M, L, L, B
5.4.7 Determine the accuracy of the prediction and reliability growth models: L-M, L, L, B
5.4.8 Revisit the defect root cause analysis: M, L-M, L, B


Table D.1—Relative culture change, effort, calendar time, and automation required to
implement the SRE tasks (continued)

Support release decision
5.5.1 Determine release stability: M-H, L, L
5.5.2 Forecast additional test duration: M-H, L, L, L
5.5.3 Forecast remaining defects and effort required to correct them: M-H, L, L, L
5.5.4 Perform an RDT: M, M, H, L, Y
Apply SR in operation
5.6.1 Employ SRE metrics to monitor field SR: M-H, M, L
5.6.2 Compare operational reliability to predicted reliability: M-H, L, L
5.6.3 Assess changes to previous characterizations or analyses: M-H, M, L
5.6.4 Archive operational data: M, L, L


Annex E

(informative)

Software reliability engineering related tools

The automation capabilities associated with each of the SRE tasks in this document are discussed in 5.1.5.
Table E.1 shows some of the tool sets from academia and industry that can be used for the SRE tasks. The
following information is given for the convenience of users of this standard and does not constitute an
endorsement by the IEEE of these products. Equivalent products may be used if they can be shown to lead
to the same results.

Table E.1—SRE related tools


SRE Task Automation considerations
5.1.3 Perform a reliability risk assessment The Software Reliability Scorecard assesses
relevant development practices that yield a low,
medium or high risk result. This tool is
available to DoD contractors and US
Government employees.a
5.2 Develop failure modes model
5.2.1 Perform software defect root cause Several commercially available defect tracking
analysis tools exist.
5.2.2 Perform SFMEA SFMEA toolkit.
5.2.3 Include software in the system FTA The same tool used for the system FTA should
be used for the software FTA.
5.3 Apply software reliability during development
5.3.2 Perform a software reliability Software Reliability Toolkit, Frestimate
assessment and prediction
Size prediction SEER SEM, COCOMO (Constructive Cost
Model)
Defect density prediction Software Risk Master, Software Reliability
Toolkit, Hdbk—217Plus, Frestimate
5.3.3 Sanity check the early prediction Frestimate
5.3.4 Merge the predictions into the overall Several RBD software packages exist for
system predictions hardware. These can be used for software.
5.3.6 Plan the reliability growth AMSAA ACTM and RGTM.
5.3.7 Perform a sensitivity analysis Frestimate, Software Reliability Toolkit,
AMSAA Software Reliability Scorecard.
5.3.8 Allocate the required reliability The same tools that are used for the hardware
predictions can be used for software as long as
they allow software components to be added.


Table E.1—SRE related tools (continued)

SRE Task Automation considerations


5.4 Apply software reliability during testing
5.4.3 Measure test coverage Many test coverage tools exist. Code coverage
tools are usually language specific. Look for
code coverage tools for C, C++, C#, .Net, Java,
PHP.
5.4.4 Collect fault and failure data Several commercial tools exist.
5.4.6 Select reliability growth models Some tools include SRDAT, SWRGCalc v1.3,
Frestimate Manager’s Edition.
Reliability Growth Models, Synthesis ReliaSoft
Reliability Growth Analysis (RGA). b, c
Note that SMERFS and CASRE do not
operate on modern operating systems.
5.4.7 Determine the accuracy of the These tools perform the accuracy calculation—
prediction and reliability growth SWRGCalc V1.3 and Frestimate Manager’s
models Edition
5.5 Support release decision
5.5.1 Determine release stability
5.5.2 Forecast additional test duration SWRGCalc V1.3 and Frestimate Manager’s
Edition
5.5.3 Forecast remaining defects and effort SWRGCalc V1.3 and Frestimate Manager’s
required to correct them Edition, Reliability Growth Models, Synthesis
ReliaSoft Reliability Growth Analysis (RGA)
5.5.3 Forecast defect pile up Software Reliability Toolkit, Reliability
Growth Models, Synthesis ReliaSoft Reliability
Growth Analysis (RGA)
5.5.4 Perform an RDT
5.6 Apply software reliability in operation
5.6.2 Compare operational reliability to Reliability growth models, Synthesis ReliaSoft
predicted reliability Reliability Growth Analysis (RGA)
a The AMSAA Software Reliability Scorecard is available from http://www.amsaa.army.mil/CRG_Tools.html.
b The Software Reliability Data Analysis Tool (SRDAT) is available for download at http://srt.umassd.edu/.
c The Software Reliability Growth Tool (SWRGT) was developed by US ARMY ARDEC as an experiment to aid engineers in software reliability modeling. The tool can be distributed among the software reliability community to promote understanding of the significance and use of reliability models for software projects and their application. However, ARDEC is not liable to resolve or support any inherent issues found with the use of the tool set or the application of its results.


Annex F

(informative)

Examples

F.1 Examples from 5.1

F.1.1 Example of bill of materials (BOM)

The following is an example of a hardware and software BOM:

Figure F.1—Hardware bill of materials example


Software BOM classes: the figure depicts the software portion of the BOM, in which each software LRU is assigned a BOM class (software assembly, company owned, COTS, FOSS, or license based) and a BOM class number (210, 213, 221, 224, 231, or 232).

Figure F.2—Software bill of materials example


Key:

Software assembly—Consists of other software LRUs


Company owned—Software LRU is not outsourced
COTS—Commercial-off-the-shelf software
FOSS—Free open sourced software
License based—Requires a software license to use

The following attributes can be applied for each item on the BOM:

Table F.1—Example properties of each software LRU in BOM


Software
Attributes COTS Company owned FOSS
assembly
Version <Version number> <Version number> <Version <Version
number> number>
IP ownership (commercial Commercial In-house
or in-house)
Market type—Mass Mass distributed Several installed Mass distributed Several installed
distributed, single use, etc. sites world wide sites world wide
Candidate for reuse Yes Yes

F.1.2 Example of an operational profile

The example shown in Figure F.3 is a commercial multi-function printer that has printing, scanning, and faxing capabilities. The software determines what will be printed, faxed, or scanned and how. For example, it determines the pages, orientation, quality, color, etc. The software is also responsible for detecting errors in the hardware. The printer is sold only in the US.

The customer profile is determined by analyzing similar recent printers and determining groups of
customers and the percentage of total customers that fall into each group. A similar previous printer was
sold to small businesses 70% of the time and copy shops 30% of the time. Based on the registration data for
the printers, 40% of the small business customers are professionals such as lawyers or accountants, while
60% are high-tech professionals such as engineers and computer programmers. The copy shops have two
users—the customer who walks in the store and wants to use the printer on a page fee basis and the copy
shop employee who is performing a service for a walk-in customer. Based on surveys conducted of the
copy shops the printer is used by walk-in customers 25% of the time and copy shop employees the
remainder.

Customer profile: small businesses 70%, copy shops 30%.
User profile: within small businesses, professionals (lawyers, accountants, etc.) 40% and high-tech professionals (engineers, computer programmers) 60%; within copy shops, walk-in customers 25% and copy shop employees 75%.
System mode profile (printing/scanning/faxing): professionals 50%/25%/25%, high-tech professionals 60%/40%/0%, walk-in customers 95%/5%/0%, copy shop employees 10%/30%/60%.
Functional profile: printing in 8.5x11, legal, or 11x17; scanning via auto feed or manual; faxing via auto dial or manual dial, each with a percentage of use per user group.
Operational profile: the product of the probabilities along each branch (for example, 16.38% for a high-tech professional printing in 8.5x11 and 10.08% for a high-tech professional scanning with the auto feeder).

Figure F.3—Example of an operational profile


The printer has three basic modes of operation: printing, scanning, and faxing. Based on surveys and service logs, the professionals are printing 50% of the time, scanning 25% of the time, and faxing 25% of the time. The high-tech professionals, however, are printing 60% and scanning 40% of the time but not using the fax. The user groups have indicated that the high-tech professionals choose to have electronic fax machines or email. The walk-in customers at the copy shop are printing most of the time and occasionally scanning, while the copy shop employee is mostly faxing (60% of the time), scanning for customers (30% of the time), and printing for customers (10% of the time). This is because the walk-in customers can print without an assistant's help but cannot fax without one.

The functions are printing in legal size, 11x17, and 8.5x11. The auto feeder supports scanning of multiple-page documents while the manual scan is used for one-page documents or pictures. The fax can be initiated via auto dial or by manually dialing. Auto dial is suited to several fax jobs while manual dialing is suited to a one-time fax to one recipient. Based on interviews with end users and past service logs, the percentage of time that each of the end users at each of the customer sites performs a particular function has been identified as per Figure F.3.

Finally, when the profile is complete, the OP is computed for each customer type, end-user type, system mode, and function by multiplying the probabilities. Not surprisingly, printing in 8.5x11 is a high-probability function.
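The multiplication can be illustrated with a short Python sketch; the branch probabilities below are read from Figure F.3, and the variable names are illustrative only:

# Sketch: computing operational profile (OP) probabilities by multiplying the
# probabilities at each level of Figure F.3. A full profile would enumerate every branch.
branches = [
    # (customer,       p_cust, user,                     p_user, mode,       p_mode, function,    p_func)
    ("Small business", 0.70, "High-tech professional",   0.60, "Printing",   0.60, "8.5x11",     0.65),
    ("Small business", 0.70, "High-tech professional",   0.60, "Scanning",   0.40, "Auto feed",  0.60),
    ("Small business", 0.70, "Professional",             0.40, "Printing",   0.50, "Legal size", 0.80),
]

for cust, p1, user, p2, mode, p3, func, p4 in branches:
    op = p1 * p2 * p3 * p4  # probability of this end-to-end usage scenario
    print(f"{cust} / {user} / {mode} / {func}: OP = {op:.4f}")

# Expected values, matching Figure F.3: 0.1638, 0.1008, 0.1120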

F.1.3 Example of including software in the system design

Example 1:

It is required that a system provide water flow at 8 gal a minute upon command, and that the water stop flowing upon command and/or when the water reaches a specified level. Water level is critical: water has to be supplied when the level is too low, and the tank cannot be allowed to overflow. The valve is located 100 m from the tank receiving the water. As this is a critical function, for software it is a has-to-work function, and the system (with the human operator) needs to be of the highest reliability.

Solution 1:

 HW: Single setting valve (on/off) w. electronic actuator (highly reliable); two highly reliable sensors: a water detector at maximum height minus the allowable flow from the valve over the 100 m, and a water detector at minimum height.
 SW: Upon command from the operator (who monitors water level), activates the valve to open with three consecutive open commands; upon detection of the high water mark or a command from the operator, shuts off the valve (again with three consecutive close commands).
Some improvements to this solution are:

Solution 2:

 HW: Single setting valve (on/off) w. electronic actuator (highly reliable); six highly reliable
sensors: three water detectors at maximum height minus allowable flow from valve over the
100 m, three water detectors set 1 m to 2 m above minimum height.
 SW: Upon command from operator (who monitors water level) activates the valve to open with
three consecutive open commands OR upon detecting and reporting two or more low water level
sensors, turns the valve on and alerts the operator; upon detection of two or more high water mark
sensors, or at the command from operator, reports high water level achieved to operator and shuts
off valve (again with three consecutive close commands).
While both solutions increase the reliability of sensing and responding to a low or high water situation, the
complexity of the software has increased to a minor degree with the addition of the voting scheme and
algorithm for the extra sensors and the closed loop reporting. However, even the following third solution
may not be optimal if the system is safety critical such as would be the case if the cooling water was needed
for the rods in a nuclear reactor.
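A minimal Python sketch of the two-out-of-three voting logic added in Solution 2; the function names and the simple command priority are illustrative assumptions, not requirements:

# Sketch: two-out-of-three voting on the redundant water level sensors of Solution 2.
# A single faulty sensor can neither open nor close the valve on its own.
def vote(sensor_readings, required=2):
    """Return True when at least `required` sensors report the condition."""
    return sum(1 for tripped in sensor_readings if tripped) >= required

def control_valve(low_sensors, high_sensors, operator_command=None):
    """Decide the valve command from the sensor votes and an optional operator command."""
    if operator_command == "close" or vote(high_sensors):
        return "close"   # high water mark reached (or operator says stop)
    if operator_command == "open" or vote(low_sensors):
        return "open"    # low water level detected (or operator says fill)
    return "hold"        # no change

# Example: one high sensor has failed "tripped"; the vote prevents a spurious close.
print(control_valve(low_sensors=[True, True, False], high_sensors=[False, True, False]))  # prints "open"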

Solution 3:

– Hardware:
 Single setting valve (on/off) w. electronic actuator (highly reliable);
 CHANGE: Three reliable float tank level detectors with switches at the minimum and
maximum levels to continuously monitor and report on water level;
 Two separate water detectors at maximum height minus allowable flow from valve over the
100 m;
 Two separate water detectors set 1 m to 2 m above minimum height.
 ADD: A flow detector to assure valve opened and water is flowing at expected volume.
– Software:
 Monitor water level of tank and report.
 Upon command from operator or a signal from either the tank level detector or the low water
sensors, command the valve to open,
 Monitor for appropriate flow from the flow sensor upon an open command from any source.
 If flow is not detected within 10 s, then send an emergency alert to the operator and the plant.
 If flow is insufficient (below a specified level) within 10 s to 20 s, then send an emergency alert to the operator.
 Upon command OR upon detecting approach of the tank high-level mark OR upon two or more high water level sensors, turn the valve OFF.

 Monitor the flow sensor level and determine if the valve is shut.
 Alert the operator upon detection of the valve being commanded to closed while flow is still detected after 20 s.
In the preceding example both the software and the hardware design evolved. The software could not detect
and react to a low or high water situation without additional sensors. The problem of no flow could go
undetected without the closed loop feedback provided by the flow sensor and the tank level monitoring.

From a reliability aspect, the simpler the software is, the more reliable it is likely to be. However, when software is chosen as a means to detect, identify, and respond (recover, reduce functionality, report, restart, etc.) to hardware malfunctions, then the hardware impacts the software design, and the software may need to impose additional requirements on the hardware, such as additional and smarter instrumentation (sensors and actuators).

Because the software itself may fail, additional hardware may also be needed to improve the reliability of the system. While redundant software (multiple implementations of the same requirements) is usually considered to add so much complexity that it lowers the desired reliability, there are times and places where back-up software of reduced functionality may still be warranted, perhaps a failover to an FPGA or ASIC upon detection of a software failure. Watchdog timers are frequently used to monitor critical software and to restart or put the system into a hold state if the software does not keep resetting the timer with its heartbeat. Hardware and software engineers should work together to find the best system solution.
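A minimal Python sketch of the watchdog pattern just described; the timeout value and the recovery action are illustrative assumptions:

# Sketch: a watchdog that expects a periodic heartbeat from the monitored software
# and forces a recovery action when the heartbeat stops.
import time

class Watchdog:
    def __init__(self, timeout_s, on_expire):
        self.timeout_s = timeout_s        # allowed time between heartbeats
        self.on_expire = on_expire        # recovery action (restart, hold state, etc.)
        self.last_kick = time.monotonic()

    def kick(self):
        """Called by the monitored software each time it completes a healthy cycle."""
        self.last_kick = time.monotonic()

    def check(self):
        """Called periodically by an independent monitor task."""
        if time.monotonic() - self.last_kick > self.timeout_s:
            self.on_expire()

# Illustrative use: put the system into a hold state if no heartbeat for 2 s.
wd = Watchdog(timeout_s=2.0, on_expire=lambda: print("heartbeat lost: entering hold state"))
wd.kick()    # monitored software resets the timer
wd.check()   # monitor sees a recent heartbeat and takes no action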

F.2 Examples from 5.2

F.2.1 SFMEA Example

This example is from a software application that estimates the reliability of software using a variety of
software reliability growth models.

F.2.1.1 Prepare the SFMEA

The only available artifacts for the software are an overview and a set of high-level requirements.

F.2.1.1.1 Overview

The application will automate several software reliability growth models. Previously the CASRE tool had been in use, but it no longer works on modern operating systems. CASRE was also prone to crashing whenever the input data was not in the form the tool expected. For example, if the failure rate was increasing, or was increasing and then decreasing, CASRE often crashed without any notification to the user.

The software reliability tool runs on a Linux virtual machine hosted by the University of Massachusetts (UMass) Dartmouth Computing Information Technology Services (CITS). When ready, the tool will be accessible online at http://srt.umassd.edu/. However, at this time, it is restricted to individuals with access to the University's virtual private network (VPN) and beta testers who have shared the MAC address of their machine.

The application functionality has been developed in the R programming language and the graphical user
interface (GUI) exposes this functionality through Shiny, a web application framework for R. R is a
procedural programming language that enables modularization. These modules include sets of functions to:
1) manage various input failure data formats, 2) perform trend test calculations, 3) identify the maximum
likelihood estimates of software reliability models, 4) determine testing time requirements for fault
detection and reliability targets, and 5) compute goodness of fit measures.

The Shiny application framework defines GUI components such as layouts, including tabs, frames, text
boxes, combo boxes, radio buttons, input text boxes, sliders, and buttons, as well as graph objects. These
layouts are defined with scripts and the underlying R functionality is included in a manner similar to
general purpose programming language header file inclusion preprocessor directives. This enables the R
functions to be invoked by events triggered within the Shiny GUI.

Specification

a) The user will be presented with a GUI consisting of four tabs to 1) select, analyze, and filter data, 2) set-up and apply models, 3) query model results, and 4) evaluate models.
b) The first tab (select, analyze, and filter data) allows the user to:
1) Specify an input file with inter-failure, failure time, or failure count data in Excel® or CSV format.
2) Plot the data as time between failures, failure rate, or cumulative failures by selecting one of
these options from a combo box.
3) Execute tests such as the Laplace trend test and running arithmetic average to assess if the data set exhibits reliability growth.
4) Save plots in various image file formats by clicking a button and specifying a name within a
file dialog box.
c) The second tab (set-up and apply models) allows the user to:
1) Select a subset of the data to which models will be applied.
2) Indicate the prefix of the data that will be used to estimate parameters.
3) Select one or more models (Jelinski-Moranda, geometric, exponential, Yamada delayed
S-shaped, and Weibull) from a list and estimate their parameters by clicking a button.
4) Plot the data and model fits as time between failure, failure rate, or cumulative failures by
selecting one of these options from a combo box.
d) The third tab (query model results) allows the user to:
1) Estimate the time required to observe k additional failures.
2) Estimate the number of failures that would be observed given an additional amount of testing
time.
3) Estimate the additional testing time required to achieve a desired reliability given a fixed
mission time.
e) The fourth tab (evaluate models) allows the user to:
1) Apply goodness of fit measures such as the Akaike information criterion (AIC) and predictive
sum of squares error (PSSE).
2) Rank models in a table according to their performance on goodness of fit measures, while also
reporting raw numerical values of these measures.
The available personnel for this SFMEA are two software reliability subject matter experts. Both are
experts with SRG models and one is an expert in SFMEA. Since there is no design documentation and no
design engineers available for the analysis, the interface and detailed SFMEA viewpoints can be
eliminated. Since this is a new software system, the maintenance SFMEA can be eliminated from scope.
The serviceability is not applicable at this time since no installation scripts exist. Since there are no defined
use cases the usability viewpoint is eliminated. Since this is a brand new software version, there is no past
history for which to perform the process SFMEA. The only applicable viewpoint is therefore the functional
viewpoint.

There are no safety-related or safety-critical features. Since there is no existing code there is no assessment
of which parts of the software are high risk from a development point of view. The project manager is most
concerned with the stability of the numerical routines to fit models and the GUI logic that could lead to
mishandling of data. So, the risk assessment is based on this concern and on the features that are most
critical for the functionality required by the user.

The failure modes from CASRE are identified as follows and the FDSC is developed from this.

 The calculations overflow


 The calculations do not work for all data sets
 The user is not advised when a calculation is not possible
 The calculations are correct but are not accurate
 The results of the wrong model are displayed

The failure definition and scoring criteria (FDSC) is defined as follows. There are 3 levels of criticality:

a) The results are not accurate; the results cause an overflow; no results are generated when the
selected model should generate results; the software crashes prior to generating a result, results are
generated when the model should not be used; the results of the wrong model are displayed.
b) The software takes too long to generate a result or the user has to perform too much manual labor
to use the software.
c) Any other defects.

The likelihood is identified as follows:

d) Virtually certain to happen to at least one user


e) Very likely to happen to at least one user
f) Unlikely to happen to any one user

It is decided that the mitigation will be based on this RPN:

 1 or 2—Will mitigate
 3 or 4—Should mitigate
 5 or 6—Will mitigate if time allows
The specifications are analyzed to determine which are more critical than the others. The analysts determine that the flow of data that leads to the results starts with tab 1 and ends at tab 4. Hence, any serious defects in tab 1 will affect the results of every other tab. Since the results are the most critical output, and since past history on the CASRE tool indicates that most of the problems were due to a lack of filtering on the input data, it is determined that the following two functional requirements are, at this point in development, the most critical.

 Specify an input file with either inter-failure, failure time, or failure count data in Excel or CSV
format.


 Execute tests such as the Laplace trend test and running arithmetic average to assess if the data set exhibits reliability growth.

F.2.1.2 Analyze failure modes

The functional SFMEA template from Table A.5 is copied into the SFMEA worksheet. The two specification statements are copied into the first column. The applicable functional failure modes are also copied into the worksheet, as shown in Table F.2 (Step 1).

Table F.2—SFMEA Step 1


Failure mode and root cause section
Columns: SRS statement ID, SRS statement text, Related SRS statements, Description, Failure mode, Root cause
2a Specify an input file None The user selects a Faulty Create a row for
with either inter- CSV or Excel file functionality each applicable
failure, failure time, that has failure Faulty timing root cause
or failure count data data in one of Faulty sequencing related to this
in Excel or CSV three supported Faulty data failure mode
format. formats. Faulty error
handling

The five failure modes are analyzed with respect to this requirement and it is determined that “Faulty data”
and “Faulty error handling” are the most applicable. Three root causes of faulty error handling are
identified while brainstorming. The file may be valid but not have failure data in it. The file may also be in
use or the file may have more than one data format in it. Five possible root causes for faulty data are also
identified. Regardless of the format, the software expects the cumulative failure count to be increasing over time. If it is decreasing, that would indicate faulty data. Similarly, test time cannot be decreasing, and the time between failures cannot be non-positive. There also cannot be zero data points. At the other end of the spectrum, there have to be enough data points for the software to analyze. At this point it is assumed that 2 or 3 data points are needed as a minimum, but this will be revisited in the mitigation subclause. The results of this step are shown in Table F.3 (Step 2).


Table F.3—SFMEA Step 2


Failure mode and root cause section
Columns: SRS statement ID, SRS statement text, Related SRS statements, Description, Failure mode, Root cause
2a Specify an None The user Faulty error handling The user is allowed to
input file selects a select file that does not
with either CSV or have any valid format. It is
inter-failure, Excel file a CSV or Excel file but it
failure time, that has does not have failure data
or failure failure data in it.
count data in in one of Faulty error handling The input file is already in
Excel or three use.
CSV format. supported Faulty error handling The end user has more
formats. than one format in the
same file.
Faulty data Failure count is not
increasing.
Faulty data Time is not increasing.
Faulty data Inter-failure time is not
positive.
Faulty data There are 0 data points.
Faulty data There are fewer than
(minimum required) data
points.

Next, the faulty sequencing failure mode is analyzed. The overall results will be erroneous if the code that performs the Laplace test happens to be executed before the code that checks the data format. Since there are no design or data flow diagrams, it is assumed that the code can be developed with the wrong sequence of operations. Next, the faulty data failure mode is analyzed. From past history on CASRE, the experts know that if the input data has decreasing and then increasing reliability growth, that is, a U shape, the requirement can fail. It can generate both a false positive and a false negative. Similarly, the data may be N-shaped, in that it may have increasing and then decreasing reliability growth. It is also known that the reliability growth trend may be S shaped, which is both U and N shaped; the S shape can be either increasing or decreasing.

The results from this step are shown in Table F.4.


Table F.4—SFMEA Step 3


Failure mode and root cause section
Columns: SRS statement ID, SRS statement text, Related SRS statements, Description, Failure mode, Root cause
2a Execute None The Faulty error False positive result.
test such as software handling
the Laplace checks the Faulty error False negative result.
trend test reliability handling
and growth Faulty error Reliability growth is neither
running using the handling positive nor negative.
arithmetic Laplace test Faulty data No result is generated at all.
average to to provide Faulty timing It takes too long to generate a
assess if the for positive result (too many data points).
data set reliability Faulty sequencing Software erroneously runs
exhibits growth. Laplace test before data format
reliability is checked.
growth. Faulty data U-shaped data has both + and –,
which generates false positive.
Faulty data U-shaped data has both + and –,
which generates false negative.
Faulty data N-shaped data causes false
positive.
Faulty data N shaped data causes false
negative.
Faulty data S shaped (U and N)
Faulty data Decreasing S shaped.
Faulty data Increasing S shaped.

F.2.1.3 Identify consequences

The analysts proceed to identifying the consequences of requirement 2a. The local effect is the effect on the software itself: either a software crash or the Laplace test not working. At the system level, the effects are either that the user needs to select another file or that the results will be unpredictable. As defined in the FDSC, unpredictable results are a severity 1 defect while an inconvenience is a severity 3 defect. The likelihood is analyzed for each, based on expert knowledge. The RPN is then computed. There are four rows that would require mitigation and two rows that should be mitigated. The results of this step are shown in Table F.5.
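A short Python sketch of this step, assuming that the RPN is the product of the severity and likelihood values (which matches the values tabulated in Table F.5) and applying the mitigation thresholds defined in F.2.1.1:

# Sketch: RPN = severity x likelihood for a few Table F.5 rows, with the mitigation rule.
def mitigation(rpn):
    if rpn <= 2:
        return "will mitigate"
    if rpn <= 4:
        return "should mitigate"
    return "will mitigate if time allows"   # RPN of 5 or 6

rows = [  # (root cause, severity, likelihood) from Table F.5
    ("File has no failure data in it",        3, 2),
    ("Input file is already in use",          3, 2),
    ("More than one format in the same file", 1, 3),
    ("Failure count is not increasing",       1, 2),
]

for cause, severity, likelihood in rows:
    rpn = severity * likelihood
    print(f"{cause}: RPN = {rpn} -> {mitigation(rpn)}")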


Table F.5—SFMEA Step 4

Columns: Potential failure mode, Potential root cause, Local effect, System effect, Preventive measures, Severity, Likelihood, RPN

Faulty The user selects a file that does Software The user has an Inconvenience 3 2 6
error not have any valid format. It is a crashes opportunity to
handling CSV or Excel file but it does select a valid
not have failure data in it. file.
Faulty The input file is already in use. Software The user may not Inconvenience 3 2 6
error crashes know that the file
handling is in use or why
the software
crashed.

Faulty The end user has more than one Laplace Unpredictable Confidence 1 3 3
error format in the same file. might not outcome on level on data
handling work Laplace test.
Faulty data Failure count is not increasing. Laplace Unpredictable Confidence 1 2 2
might not outcome on level on data
work Laplace test.
Faulty data Time is not increasing. Laplace Unpredictable Confidence 1 2 2
might not outcome on level on data
work Laplace test.
Faulty data Interfailure time is not positive. Laplace Unpredictable Confidence 1 2 2
might not outcome on level on data
work Laplace test.
Faulty data There are 0 data points. Laplace Unpredictable Confidence 1 3 3
might not outcome on level on data
work Laplace test.
Faulty data There are fewer than (minimum Laplace Unpredictable Confidence 1 2 2
required) data points. might not outcome on level on data
work Laplace test.

Requirement 2c is analyzed for consequences. For each failure mode and root cause the effects on the
software are either a wrong or missing result. The effect on the user is that they are either allowed to use a
model that they should not use or they are not allowed to use a model that they can use. The likelihood is
assessed based on expert knowledge of the input data. There is one failure mode and root cause that has an
RPN of 1 and all are required for mitigation.

The results of this step are shown in Table F.6.


Table F.6—SFMEA Step 5

Columns: Potential failure mode, Potential root cause, Local effect, System effect, Preventive measures, Severity, Likelihood, RPN

Faulty error False positive result Result is The user will be allowed to Confidence 1 2 2
handling wrong use the models when they level on
should not data

Faulty error False negative result Result is The user is not allowed to use Confidence 1 2 2
handling wrong the model and should be level on
data
Faulty error Reliability growth is Result is The user will be allowed to Confidence 1 2 2
handling neither positive nor wrong use the models when they level on
negative should not data

Faulty data No result is generated at No result The user will be allowed to Confidence 1 2 2
all use the models when they level on
should not data

Faulty It takes too long to No result The user will be allowed to Confidence 1 2 2
timing generate a result (too use the models when they level on
many data points) should not data

Faulty Software runs Laplace No result The user will be allowed to Confidence 1 2 2
sequencing test before data format use the models when they level on
is checked should not data

Faulty data U-shaped data has both Result is The user will be allowed to Confidence 1 2 2
+ and –, which wrong use the models when they level on
generates false positive should not data

Faulty data U-shaped data has both Result is The user is not allowed to use Confidence 1 2 2
+ and –, which wrong the model and should be level on
generates false negative data
Faulty data N-shaped data causes Result is The user will be allowed to Confidence 1 1 1
false positive wrong use the models when they level on
should not data

Faulty data N-shaped data causes Result is The user is not allowed to use Confidence 2 1 2
false negative wrong the model and should be level on
data
Faulty data S shaped (U and N) Result is Unpredictable outcome on None 1 2 2
wrong Laplace test
Faulty data Decreasing S shaped Result is Unpredictable outcome on None 1 2 2
wrong Laplace test
Faulty data Increasing S shaped Result is Unpredictable outcome on None 1 2 2
wrong Laplace test

F.2.1.4 Identify mitigation

The corrective action for each of the failure modes and root causes is analyzed one at a time. The RPN is
adjusted for any item that can be mitigated. Several columns have been removed to fit page width. Most of
the failure modes and root causes can be fixed by either a change to the specification or by testing that
scenario and then modifying the specification, design and code appropriately. One of the root causes cannot
be corrected so it will be clearly identified in the user’s manual to reduce the risk of it happening. There are
several corrective actions that are similar. In the next step these will be consolidated. The results for this
step are shown in Table F.7.

Table F.7—SFMEA Step 6

Columns: Potential root cause, System effect, Corrective actions, Compensating provisions, Severity, Likelihood, RPN

The end user has Unpredictable It is difficult for the software The user can fix 1 3 3
more than one format outcome on to detect if the user mixed the file but only if
in the same file Laplace test formats so make it clear in they know about
the user manual not to do it.
this.
Failure count is not Unpredictable Modify the spec to define The user can fix 1 4 4
increasing outcome on what the software should do the file but only if
Laplace test if the fault count is irregular. they know about
it.
Time is not Unpredictable Modify the spec to define The user can fix 1 4 4
increasing outcome on what the software should do the file but only if
Laplace test if the time count is irregular. they know about
it.
Interfailure time is Unpredictable Modify the spec to identify The user can fix 1 4 4
not positive outcome on what the software should do the file but only if
Laplace test in this case. they know about
it.
There are 0 data Unpredictable Modify the spec to identify The user can fix 1 4 4
points outcome on what the software should do the file but only if
Laplace test in this case. they know about
it.
There are fewer than Unpredictable Identify the fewest number The user can fix 1 4 4
(minimum required) outcome on of data points that can be the file but only if
data points Laplace test used for a trend and then they know about
write code to advise user that it.
they do not have enough data
points.
False positive result The user will be Run many sets of different None. 1 4 4
allowed to use data and verify the output of
the models when the Laplace independently of
they should not the other results.
False negative result The user is not Run many sets of different None 1 4 4
allowed to use data and verify the output of
the model and the Laplace independently of
should be. the other results.
Reliability growth is The user will be Modify the code to handle None 1 4 4
neither positive nor allowed to use this case.
negative the models when
they should not.
No result is The user will be Test many data sets to see if None 1 4 4
generated at all allowed to use this ever happens
the models when
they should not.
It takes too long to The user will be Test very large data sets to None 1 4 4
generate a result (too allowed to use see how long it takes to get
many data points) the models when an answer
they should not


Software runs The user will be Define a transaction or data None 1 4 4


Laplace test before allowed to use flow diagram to make the
data format is the models when design more clear
checked they should not
U-shaped data has The user will be Define the set of possible None 1 4 4
both + and –, which allowed to use data trends instead of
generates false the models when assuming that the data is
positive they should not perfect
U-shaped data has The user is not Define the set of possible None 1 4 4
both + and –, which allowed to use data trends instead of
generates false the model and assuming that the data is
negative should be perfect
N-shaped data causes The user will be Define the set of possible None 1 4 4
false positive allowed to use data trends instead of
the models when assuming that the data is
they should not perfect
N-shaped data causes The user is not Define the set of possible None 1 4 4
false negative allowed to use data trends instead of
the model and assuming that the data is
should be perfect
S-shaped (U and N) Unpredictable Define the set of possible None 1 4 4
outcome on data trends instead of
Laplace test assuming that the data is
perfect
Decreasing S shaped Unpredictable Define the set of possible None 1 4 4
outcome on data trends instead of
Laplace test assuming that the data is
perfect
Increasing S shaped Unpredictable Define the set of possible None 1 4 4
outcome on data trends instead of
Laplace test assuming that the data is
perfect
The user selects a The user has an Fix the code to test for this The user can 3 4 12
file that does not opportunity to select a valid file
have any valid select a valid
format. It is a CSV file.
or Excel file but it
does not have failure
data in it.
The input file is The user may Fix the code to test for this The user can 3 4 12
already in use not know that close the file and
the file is in use reopen
or why the
software crashed

F.2.1.5 Generate CIL

The SFMEA is now sorted in order of RPN. The corrective actions that are similar are consolidated. There
are now a total of 12 corrective actions of which 10 will be mitigated. The final SFMEA is shown in
Table F.8.


Table F.8—SFMEA Step 7

Columns: Potential root cause, System effect, Corrective actions, Compensating provisions, Severity, Likelihood, RPN

The end user has more Unpredictable It is difficult for the The user can fix 1 3 3
than one format in the outcome on software to detect if the the file but only if
same file Laplace test user mixed formats so they know about it.
make it clear in the user
manual not to do this.
Failure count is not Unpredictable Modify the spec to define The user can fix 1 4 4
increasing outcome on what the software should the file but only if
Laplace test do if the fault count is the user knows
Time is not increasing
irregular. about it.

Interfailure time is not


positive
There are 0 data
points
There are fewer than Unpredictable Identify the fewest The user can fix 1 4 4
(minimum required) outcome on number of data points the file but only if
data points Laplace test that can be used for a the user knows
trend and then write code about it.
to advise user that they
do not have enough data
points.
False positive result The user will be Run many sets of None 1 4 4
allowed to use the different data and verify
models when the the output of the Laplace
user should not independently of the
other results.
False negative result The user is not Run many sets of None 1 4 4
allowed to use the different data and verify
model and should the output of the Laplace
be independently of the
other results.
Reliability growth is The user will be Modify the code to None 1 4 4
neither positive nor allowed to use the handle this case.
negative models when the
user should not
No result is generated The user will be Test many data sets to None 1 4 4
at all allowed to use the see if this ever happens.
models when the
user should not
It takes too long to The user will be Test very large data sets None 1 4 4
generate a result (too allowed to use the to see how long it takes
many data points) models when the to get an answer.
user should not


Software runs Laplace The user will be Define a transaction or None 1 4 4


test before data format allowed to use the data flow diagram to
is checked models when the make the design more
user should not clear.
U-shaped data has The user will be Define the set of possible None 1 4 4
both + and –, which allowed to use the data trends instead of
generates false models when the assuming that the data is
positive user should not perfect.
N-shaped data causes
false positive
U-shaped data has The user is not
both + and –, which allowed to use the
generates false model and should
negative be
N-shaped data causes
false negative
S-shaped (U and N) Unpredictable
Decreasing S shaped outcome on
Increasing S shaped Laplace test
The user selects a file that The user has an Fix the code to test for The user can select 3 4 12
does not have any opportunity to this. a valid file.
valid format. It is a select a valid file.
CSV or Excel file but
it does not have
failure data in it.
The input file is The user may not Fix the code to test for The user can close 3 4 12
already in use know that the file this. the file and reopen.
is in use or why
the software
crashed.

F.3 Examples from 5.3

F.3.1 Reliability specification

Problem: A system is being conceived that will be a successor to an existing system. The existing system
was first deployed 7 years ago. The average MTBF was 300 h for the entire system. Software failures were
40% of the total failures for the existing system. The goal is to derive a system MTBF objective for the new
system.

Predicted result: On the existing system one can compute the software and hardware MTBF by applying the 40% and 60% to the known 300 h system MTBF. Therefore the existing system’s software MTBF averages 750 h while the hardware MTBF averages 500 h. Since the software was developed 10 years ago, one will first compute its relative size when compared to the existing system. If the software size grows 10% a year, it will be about 2 times larger after 7 years. Size is inversely proportional to MTBF. Hence that means that the software MTBF will likely be 375 h if the development practices and all other parameters remain the same other than the size. If one assumes that the hardware MTBF will be approximately the same as the existing system, then the new specification is approximately equal to: 1/((1/375) + (1/500)) = 214 h.
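A minimal Python sketch of the arithmetic above:

# Sketch of the F.3.1 derivation of the successor system's MTBF objective.
system_mtbf = 300.0                    # existing system MTBF (h)
sw_share, hw_share = 0.40, 0.60        # share of failures attributed to software and hardware

sw_mtbf_old = system_mtbf / sw_share   # 750 h
hw_mtbf_old = system_mtbf / hw_share   # 500 h

growth = round(1.10 ** 7)              # about 2x size growth after 7 years at 10% per year
sw_mtbf_new = sw_mtbf_old / growth     # size is inversely proportional to MTBF -> 375 h

new_system_mtbf = 1.0 / (1.0 / sw_mtbf_new + 1.0 / hw_mtbf_old)
print(round(new_system_mtbf, 1))       # approximately 214.3 h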

F.3.2 SRE predictions

These examples are for 5.3.2.3, Steps 1 through 5, and for 6.2.

Step 1. Predict the defect density. The Shortcut model was selected to predict defect density because it
has more parameters than the lookup tables, which allows for sensitivity analysis. This example shows the
reliability figures of merit for software that is the very first version for a particular product. Both the
hardware and software are brand new. There is no reused code. There are 12 software engineers who all
report to one software lead engineer. Some of the software engineers are in California and some are in
Texas. All 12 software engineers have been with the organization for several years and no turnover is
expected. There are not any short-term contractors or subcontractors. There is one COTS vendor. The
software is a military system that will be operated by persons who typically have a high school degree or
Bachelor’s degree. The system will be used almost anywhere in the world, and it can take days to reach the
equipment to perform a software upgrade. The time between the start of requirements and deployment is
expected to be 2 years. According to the software development plan, there will be a waterfall type software
life cycle. Software engineers are required to test their own code prior to delivery to software and systems
testing and the software testing starts when all code is complete. Table F.9 shows an example set of
answers from filling out the Shortcut Model Survey.

Table F.9—Example of the Shortcut Model Survey


Strengths
1 We protect older code that shouldn't be modified. NA
2 The total schedule time in years is less than one. No
3 The number of software people years for this release is less than seven. No
4 Domain knowledge required to develop this software application can be acquired via public domain in No
short period of time.
5 This software application has imminent legal risks. No
6 Operators have been or will be trained on the software. Yes
7 The software team members who are working on the same software system are geographically co- No
located.
8 Turnover rate of software engineers on this project is < 20% during course of project. Yes
9 This will be a maintenance release (no major feature addition). No
10 The software has been recently reconstructed (i.e., to update legacy design or code). No
11 We have a small organization (<8 people) or there are team sizes that do not exceed 8 people per team. No
12 We have a culture in which all software engineers value testing their own code (as opposed to waiting for Yes
someone else to test it).
13 We manage subcontractors—Outsource code that is not in our expertise, keep code that is our expertise Yes
in house.
14 There have been at least four fielded releases prior to this one. No
15 The difference between the most and least educated end user is not more than one degree type (i.e., No
bachelors/masters, high school/associates).
Risks
1 This is brand new release (version 1), or development language, or OS, or technology Yes
(add one for each risk).
2 Target hardware/system is accessible within 0.75 points for minutes, 0.5 points for hours, 0.25 points for 0.25
days, and 0 points for weeks or months.
3 Short term contractors (< 1 year) are used for developing line of business code No
4 Code is not reused when it should be NA
5 We wait until all code is completed before starting the next level of testing Yes
6 Target hardware is brand new or evolving (will not be finished until software is finished) Yes
7 Age of oldest part of code >10 years No


Number of strengths = 4.0 and number of risks = 3.25

Net Result = 4 – 3.25 = 0.75


Predicted defect density is therefore 0.239 defects/normalized EKSLOC

Next, the size is predicted so that the total number of predicted defects can be computed.

All of the code is new and is predicted to be 102 KSLOC of object oriented code. The EKSLOC is
therefore 102 since all code is new. The normalized EKSLOC = KSLOC × language type factor =
102 × 6 = 612. There is one COTS component, which is 13 529 kB installed. It has been mass deployed for
2 years. As per B.1.2, the effective normalized EKSLOC is therefore = kB (13 529) × mass produced factor
(0.01) × 0.051 (conversion from kB to KSLOC) × 3 (language factor) = 20.699 EKSLOC. The total
normalized EKSLOC is therefore 632.699. The COTS vendor has an unusually good relationship with the
development team and has also answered the shortcut survey and had the same result of
0.239 defects/KSLOC.

The total predicted defects = 632.699 × 0.239 = 151.011 defects. Obviously, the number of defects cannot be fractional, but the fractional value is retained so that the defects per month can be accurately estimated, as shown next.
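Before moving to Step 2, a minimal Python sketch of the Step 1 arithmetic, using the size factors stated above (the language type factors and the COTS conversion factors are the ones given in the example and Annex B):

# Sketch of the Step 1 size normalization and total defect prediction.
defect_density = 0.239                       # defects per normalized EKSLOC (from the survey net score)

new_norm_eksloc = 102 * 6                    # 102 KSLOC of new OO code x language type factor 6 = 612
cots_norm_eksloc = 13529 * 0.01 * 0.051 * 3  # kB x mass-produced factor x kB-to-KSLOC x language factor
                                             # = about 20.7 normalized EKSLOC

total_norm_eksloc = new_norm_eksloc + cots_norm_eksloc   # about 632.7
total_defects = total_norm_eksloc * defect_density       # about 151 predicted operational defects
print(round(total_norm_eksloc, 3), round(total_defects))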

Step 2. The fault profile is predicted. The exponential model discussed in 6.2.2.1 is employed because typical growth rates for the Duane Model or the AMSAA Crow Model are unknown. The growth rate (Q) is estimated to be 6 and the growth period is estimated to be 48 months since there will be several installed sites but the software will not be mass deployed. The next feature drop is scheduled one year after the deployment of this version. Hence, the reliability growth is limited to only one year.

The predicted fault profile over the 12 month growth period for each month i = 151.111 × (e^(–6×(i–1)/48) – e^(–6×i/48))

Table F.10 shows the total faults predicted for each month of growth.

Table F.10—Predicted fault profile

Month after delivery | Total faults predicted for this month | Month after delivery | Total faults predicted for this month
1 | 17.8 | 7 | 8.4
2 | 15.7 | 8 | 7.4
3 | 13.8 | 9 | 6.5
4 | 12.2 | 10 | 5.8
5 | 10.8 | 11 | 5.1
6 | 9.5 | 12 | 4.5
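A minimal Python sketch of the Step 2 calculation, which reproduces Table F.10 to within rounding (151.111 is the total used in the formula above):

# Sketch of the exponential fault profile used in Step 2.
import math

total_defects = 151.111   # total predicted operational defects carried forward from Step 1
Q = 6.0                   # growth rate
T = 48.0                  # growth period in months

for i in range(1, 13):    # growth is limited to the 12 months before the next feature drop
    faults = total_defects * (math.exp(-Q * (i - 1) / T) - math.exp(-Q * i / T))
    print(f"month {i}: {faults:.1f} faults")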

Step 3. The failure rate, MTBF, and MTBCF are predicted. The duty cycle for the first 12 months of this software release is expected to be continuous. Based on past history, 10% of all operational faults were of a critical severity. There are no estimates of the percentage of faults that historically resulted in system aborts or essential function failures. The predicted faults to be observed for each month are divided by the predicted duty cycle for each month to yield the predicted failure rate, as shown in Table F.11.


Table F.11—Example of failure rate prediction

Month after delivery | Total faults predicted for this month | Predicted duty cycle (h) | Predicted failure rate per hour | Predicted MTBF (h) | Predicted MTBCF (h)
1 17.8 730 0.02439 41.011 410.011
2 15.7 730 0.021507 46.497 464.97
3 13.8 730 0.018904 52.899 528.99
4 12.2 730 0.016712 59.836 598.36
5 10.8 730 0.014794 67.593 675.93
6 9.5 730 0.013014 76.842 768.42
7 8.4 730 0.011507 86.905 869.05
8 7.4 730 0.010137 98.649 986.49
9 6.5 730 0.008904 112.308 1123.08
10 5.8 730 0.007945 125.862 1258.62
11 5.1 730 0.006986 143.137 1431.37
12 4.5 730 0.006164 162.222 1622.22
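A minimal Python sketch of the Step 3 conversion; because 10% of faults are assumed to be critical, the MTBCF is taken as ten times the MTBF, which reproduces Table F.11 to within rounding:

# Sketch of the Step 3 failure rate, MTBF, and MTBCF calculation.
duty_cycle_h = 730.0        # continuous operation: hours per month
critical_fraction = 0.10    # share of operational faults that are critical

faults_per_month = [17.8, 15.7, 13.8, 12.2, 10.8, 9.5, 8.4, 7.4, 6.5, 5.8, 5.1, 4.5]

for month, faults in enumerate(faults_per_month, start=1):
    failure_rate = faults / duty_cycle_h        # failures per hour
    mtbf = duty_cycle_h / faults                # = 1 / failure_rate
    mtbcf = mtbf / critical_fraction            # critical failures are one tenth of all failures
    print(f"month {month}: rate {failure_rate:.6f}/h, MTBF {mtbf:.1f} h, MTBCF {mtbcf:.1f} h")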

Step 4. Predict reliability

Example: Continuing from the previous example, the system will have a mission time of 8 h. When solving
for reliability (8 h mission) for the array of predicted failure rates the results are shown in Table F.12.

The predicted reliability at month 1 for an 8 h mission = e^(–8/410.011) = 0.980677
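A minimal Python sketch of the Step 4 reliability calculation, reproducing Table F.12:

# Sketch of R(t) = exp(-t / MTBCF) for an 8 h mission.
import math

mission_h = 8.0
mtbcf_by_month = [410.011, 464.97, 528.99, 598.36, 675.93, 768.42,
                  869.05, 986.49, 1123.08, 1258.62, 1431.37, 1622.22]

for month, mtbcf in enumerate(mtbcf_by_month, start=1):
    print(f"month {month}: R(8 h) = {math.exp(-mission_h / mtbcf):.6f}")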

Table F.12—Example of a reliability prediction


Month after delivery | Total faults predicted this month | Predicted duty cycle (h) | Predicted MTBCF (h) | Predicted reliability as a function of an 8 h mission
1 17.8 730 410.011 0.980677
2 15.7 730 464.97 0.982942
3 13.8 730 528.99 0.984991
4 12.2 730 598.36 0.986719
5 10.8 730 675.93 0.988234
6 9.5 730 768.42 0.989643
7 8.4 730 869.05 0.990837
8 7.4 730 986.49 0.991923
9 6.5 730 1123.08 0.992902
10 5.8 730 1258.62 0.993664
11 5.1 730 1431.37 0.994427
12 4.5 730 1622.22 0.995081

Step 5. Predict availability

Example: Continuing from the previous example, the MTSWR is predicted to be 1 h as calculated in
Table F.13.


Table F.13—Example MTSWR prediction


A. Restore activity | B. Average time required for this restore activity (min) | C. Estimated percentage of time that this restore activity is executed once operationally deployed | # min = column B × column C
Restart | 4 | 0.279 | 1.116
Reboot | 11 | 0.6 | 6.6
Downgrade | 500 | 0.001 | 0.5
Reinstall | 180 | 0.01 | 1.8
Workaround | 20 | 0.1 | 2.0
Corrective action | 4800 | 0.01 | 48
Total restore time | | | 1 h

When solving for availability for the array of predicted MTBF values the results are shown in Table F.14.
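A minimal Python sketch of the Step 5 calculation; the availability form MTBCF/(MTBCF + MTSWR) is inferred from the tabulated values rather than stated explicitly above:

# Sketch of the MTSWR weighted average (Table F.13) and the availability prediction.
restore_activities = [  # (activity, minutes, fraction of restores using this activity)
    ("Restart", 4, 0.279), ("Reboot", 11, 0.6), ("Downgrade", 500, 0.001),
    ("Reinstall", 180, 0.01), ("Workaround", 20, 0.1), ("Corrective action", 4800, 0.01),
]
mtswr_h = sum(minutes * fraction for _, minutes, fraction in restore_activities) / 60.0
print(f"MTSWR = {mtswr_h:.2f} h")                 # about 1 h

for month, mtbcf in enumerate([410.011, 464.97, 528.99], start=1):   # first three months shown
    availability = mtbcf / (mtbcf + 1.0)          # use the rounded 1 h MTSWR, as in the example
    print(f"month {month}: availability = {availability:.6f}")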

Table F.14—Example of software availability predictions


Month after delivery | Total discovered faults predicted this month | Predicted duty cycle (h) | Predicted MTBCF (h) | Predicted reliability with a mission time of 8 h | Predicted availability with a restore time of 1 h
1 17.8 730 410.011 0.980677 0.997567
2 15.7 730 464.97 0.982942 0.997854
3 13.8 730 528.99 0.984991 0.998113
4 12.2 730 598.36 0.986719 0.998332
5 10.8 730 675.93 0.988234 0.998523
6 9.5 730 768.42 0.989643 0.99870
7 8.4 730 869.05 0.990837 0.998851
8 7.4 730 986.49 0.991923 0.998987
9 6.5 730 1123.08 0.992902 0.99911
10 5.8 730 1258.62 0.993664 0.999206
11 5.1 730 1431.37 0.994427 0.999302
12 4.5 730 1622.22 0.995081 0.999384

F.3.3 Example of incremental development predictions

Example: Assume that there are three incremental releases of the same effective size. This is a very
realistic scenario if an organization has the same number of people on each increment and the calendar time
is the same for each increment and the difficulty of the code to be implemented is similar. In each
increment it is predicted that there will be 50 EKSLOC of code that is a hybrid of object oriented and
second generation language. The defect density is predicted to be 0.09 defects/EKSLOC of normalized
code. The system is predicted to be deployed to several sites but not mass distributed so the growth rate is
predicted to be 6 and the growth period is predicted to be 48 months. The software is expected to operate
continually once deployed, which means that the duty cycle per month will be 730 h. The first increment is
scheduled for completion on January 1 of 2016, the second increment on July 1 of 2016, and the third and
final increment on January 1 of 2017. On January 1 of 2018 the next major release is scheduled. Hence the
predictions will only extend to January 1 of 2018.

Scenario #1: The requirements are defined up front and each increment is a design/implementation
increment. As per the preceding steps the defects predicted in each increment are summed and equal 60.75.
The operational MTBF is then computed based on the total defects from all increments as shown in
Table F.15.


Table F.15—Example of scenario 1


Increment 1 | Increment 2 | Increment 3 | Total of all increments
a. Predicted EKSLOC of Hybrid language type 50 50 50 50
b. Predicted EKSLOC normalized for language 225 225 225 675
(see Annex B)
c. Predicted defect density (see 5.3.2.3 Step 1 0.09 0.09 0.09 0.09
and 6.2.1)
d. Predicted operational defects (b × c) 20.25 20.25 20.25 60.75
Predict growth rate (see 5.3.2.3 Step 3) 6
Predict growth period (see 5.3.2.3 Step 3) 48
Predict duty cycle each month of operation (see 730
5.3.2.3 Step 3)

The predicted MTBF and fault profile for the 12 months of operational deployment are shown in
Table F.16. The predicted MTBF prior to the next major release is about 404 h.

Table F.16—Prediction results for scenario 1


Month of operation | Predicted faults per month (see 5.3.2.3 Step 2) | Predicted MTBF = 730/predicted faults
January 1, 2017 7.138 102.265
February 1, 2017 6.300 115.881
March 1, 2017 5.559 131.311
April 1, 2017 4.906 148.795
May 1, 2017 4.330 168.607
June 1, 2017 3.821 191.056
July 1, 2017 3.372 216.495
August 1, 2017 2.976 245.321
September 1, 2017 2.626 277.985
October 1, 2017 2.317 314.999
November 1, 2017 2.045 356.940
December 1, 2017 1.805 404.466

Scenario #2: In the following scenario, Table F.17, the requirements are developed incrementally as well
as the design and code. Hence, the fault profile is computed for each increment independently starting from
the deployment of the first increment on January 1, 2016.


Table F.17—Example of scenario 2


Increment 1 Increment 2 Increment 3
Predicted EKSLOC in Hybrid language type 225 225 225
Predicted defect density 0.09 0.09 0.09
Predicted operational defects 20.25 20.25 20.25
Predict growth rate 6 6 6
Predict growth period 48 48 48
Predict duty cycle each month of operation 730 730 730

Table F.18 shows the predicted MTBF and fault profile for each of the increments layered. Note that the
predicted MTBF for December 1 of 2017 is higher than the predicted MTBF for scenario 1. That is because
the increments are based on independent software requirements, which can be tested prior to transitioning
to the next increment.
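A minimal Python sketch of the layering for scenario 2, which reproduces the totals and the MTBF column of Table F.18 to within rounding:

# Sketch: each increment contributes 20.25 defects with reliability growth starting
# at its own deployment date; the monthly totals give the predicted MTBF.
import math

def fault_profile(total_defects, months, Q=6.0, T=48.0):
    return [total_defects * (math.exp(-Q * (i - 1) / T) - math.exp(-Q * i / T))
            for i in range(1, months + 1)]

months = 24                      # January 2016 through December 2017
deploy_offsets = [0, 6, 12]      # months after January 2016 at which each increment deploys
totals = [0.0] * months
for offset in deploy_offsets:
    for i, faults in enumerate(fault_profile(20.25, months - offset)):
        totals[offset + i] += faults

for m, faults in enumerate(totals, start=1):
    print(f"month {m}: {faults:.3f} faults, MTBF = {730.0 / faults:.1f} h")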

Table F.18—Prediction results for scenario 2


Month | Total faults predicted this month from increment 1 | Total faults predicted this month from increment 2 | Total faults predicted this month from increment 3 | Total from all increments | Predicted MTBF (h)
1-Jan-16 2.379 2.379 306.795
1-Feb-16 2.100 2.100 347.644
1-Mar-16 1.853 1.853 393.933
1-Apr-16 1.635 1.635 446.384
1-May-16 1.443 1.443 505.820
1-Jun-16 1.274 1.274 573.169
1-Jul-16 1.124 2.379 3.503 208.369
1-Aug-16 0.992 2.100 3.092 236.113
1-Sep-16 0.875 1.853 2.728 267.551
1-Oct-16 0.772 1.635 2.408 303.175
1-Nov-16 0.682 1.443 2.125 343.542
1-Dec-16 0.602 1.274 1.875 389.284
1-Jan-17 0.531 1.124 2.379 4.034 180.947
1-Feb-17 0.469 0.992 2.100 3.560 205.040
1-Mar-17 0.413 0.875 1.853 3.142 232.341
1-Apr-17 0.365 0.772 1.635 2.773 263.276
1-May-17 0.322 0.682 1.443 2.447 298.331
1-Jun-17 0.284 0.602 1.274 2.159 338.054
1-Jul-17 0.251 0.531 1.124 1.906 383.065
1-Aug-17 0.221 0.469 0.992 1.682 434.069
1-Sep-17 0.195 0.413 0.875 1.484 491.865
1-Oct-17 0.172 0.365 0.772 1.310 557.356
1-Nov-17 0.152 0.322 0.682 1.156 631.567
1-Dec-17 0.134 0.284 0.602 1.020 715.660
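
The layering in Table F.18 can be sketched the same way. In the illustration below (under the same assumed exponential profile), each increment contributes an independent 20.25-defect profile that starts in its own deployment month (offsets of 0, 6, and 12 months after the first deployment), and the monthly totals are summed before computing MTBF.

import math

# Illustration only: each increment has its own 20.25-defect profile (growth
# rate 6, growth period 48 months) offset by its deployment month.
PER_INCREMENT_DEFECTS = 20.25
Q, T, DUTY_CYCLE = 6.0, 48.0, 730.0
DEPLOY_OFFSETS = [0, 6, 12]                     # Jan 2016, Jul 2016, Jan 2017

def increment_faults(age_months):
    """Faults surfaced by one increment in its age_months-th month of service."""
    if age_months < 1:
        return 0.0
    return PER_INCREMENT_DEFECTS * (
        math.exp(-Q * (age_months - 1) / T) - math.exp(-Q * age_months / T))

for month in range(1, 25):                      # 1-Jan-16 is month 1, 1-Dec-17 is month 24
    total = sum(increment_faults(month - off) for off in DEPLOY_OFFSETS)
    print(f"Month {month:2d}: total faults = {total:.3f}, MTBF = {DUTY_CYCLE / total:.3f} h")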

F.3.4 Allocations to LRUs

Continuing from F.3.1, the system reliability objective is 214 h. That system requirement needs to be
allocated to each of the software and hardware LRUs. There are 5 software CSCIs and 5 hardware
configuration items. The first step is to perform a bottom-up analysis. The MTBF and failure rate for each
of the 10 configuration items are calculated using the best available hardware and software models. The
predicted results are shown in Table F.19:


Table F.19—Predicted MTBF and failure rate for each configuration item using
bottom-up analysis
Configuration item    Predicted MTBF (h)    Predicted failure rate    Allocation for each configuration item
SW 1                  1940                  0.000515                  0.000474
SW 2                  1650                  0.000606                  0.000558
SW 3                  1595                  0.000627                  0.000577
SW 4                  1723                  0.00058                   0.000534
SW 5                  1489                  0.000672                  0.000618
HW 1                  2700                  0.00037                   0.000341
HW 2                  2500                  0.0004                    0.000368
HW 3                  2300                  0.000435                  0.0004
HW 4                  2540                  0.000394                  0.000362
HW 5                  2120                  0.000472                  0.000434
Total                 197.1994              0.00507101                0.004666667
Requirement           214.2857143           0.004666667               0.004666667
Difference between predicted and required    8%

The allocations down to each of the LRUs are computed as a function of the top-level requirement of
214 h. Each prediction is offset by 8% to yield the allocation for each configuration item. This method
allocates to each LRU proportionally to that LRU's contribution to the system.
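
A minimal sketch of this proportional allocation follows; the failure rates are those of Table F.19, and each allocation is simply the predicted rate scaled by the ratio of the required system failure rate to the total predicted rate.

predicted_rates = {                     # predicted failure rates (failures/h) from Table F.19
    "SW 1": 0.000515, "SW 2": 0.000606, "SW 3": 0.000627, "SW 4": 0.00058, "SW 5": 0.000672,
    "HW 1": 0.00037, "HW 2": 0.0004, "HW 3": 0.000435, "HW 4": 0.000394, "HW 5": 0.000472,
}
required_system_rate = 1.0 / 214.2857143        # 0.004666667 failures/h

total_predicted = sum(predicted_rates.values())          # 0.00507101 failures/h
scale = required_system_rate / total_predicted           # about 0.92, i.e., an 8% offset

for lru, rate in predicted_rates.items():
    print(f"{lru}: predicted {rate:.6f} f/h -> allocated {rate * scale:.6f} f/h")
print(f"Allocations sum to {sum(r * scale for r in predicted_rates.values()):.6f} f/h")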

The second method is to allocate from the hardware allocation down to the hardware LRUs and from the
software allocation down to the software LRUs. This is shown in Table F.20 and Table F.21. The major
problem with this method is that if the top-level allocations were not based on an achievable prediction,
the resulting allocations for the software or hardware components may be out of proportion with what is
relatively achievable. In other words, some components have more cushion, relatively speaking, than others.

Table F.20—Predicted MTBF and failure rate for each SW configuration item—
second method
Configuration item    Predicted MTBF (h)    Predicted failure rate    Allocation for each configuration item
SW 1                  1940                  0.000515                  0.000458
SW 2                  1650                  0.000606                  0.000539
SW 3                  1595                  0.000627                  0.000557
SW 4                  1723                  0.00058                   0.000516
SW 5                  1489                  0.000672                  0.000597
Total                 333.33                0.003                     0.002667
Requirement           375                   0.002667                  0.002667
Difference between predicted and required    11.111%


Table F.21—Predicted MTBF and failure rate for each HW configuration item—
second method
Configuration item    Predicted MTBF (h)    Predicted failure rate    Allocation for each configuration item
HW 1                  2700                  0.00037                   0.000357
HW 2                  2500                  0.0004                    0.000386
HW 3                  2300                  0.000435                  0.00042
HW 4                  2540                  0.000394                  0.00038
HW 5                  2120                  0.000472                  0.000456
Total                 482.86                0.002071                  0.002
Requirement           500                   0.002                     0.002
Difference between predicted and required    3.4283%

F.3.5 Example of sensitivity analysis

This example continues from F.3.2. The predicted MTBF is determined to be insufficient to meet the system
allocation. The software group first revisits the defect density prediction model to see if there are any
applicable trade-offs. As per the model, a net score of 4 is needed for the prediction to be upgraded from
medium to low risk. Currently the number of strengths = 4 while the number of risks = 3.25. So, the net
score, which is currently 4 – 3.25 = 0.75, needs to be increased by 3.25 to reach the low risk designation,
which is needed to reduce the predicted defect density.

Some of the items cannot be changed, such as #3, #9, #10, #14, and #15 in the strengths section. In the
risks section, #1, #2, #4, and #6 cannot be changed. The items shaded green are optimized. That leaves
items #2, #7, and #11 in the strengths section and #5 in the risks section.

The group realized immediately that mitigating #5 from the risks section does not require additional
people or calendar time; it requires only a change to the "throw over the wall" culture. They also realized
that they can mitigate #11. Their software lead would like to promote one of the senior developers so that
each lead has five direct reports. This would allow for smaller group sizes, which have been shown to
correlate positively with fewer defects due to less complex communication paths.

They need only one more mitigation to reduce the predicted risk from medium to low. They know that
there is nothing that they can do in the immediate timeframe to resolve the fact that they have developers in
two different parts of the country. So, they focus on #2 in the strengths section. They realize that by
employing an incremental development model instead of a waterfall model they can complete the required
features in a shorter period of time and reduce the overall risk. They decide to mitigate three development
practices.

The adjusted score is now 4, assuming that these mitigations are made as planned. The new predicted
defect density is 0.1108 defects/normalized EKSLOC. This is less than half of the original prediction of
0.239 defects/normalized EKSLOC.

The shortcut survey results are shown in Table F.22.


Table F.22—Shortcut survey results


Strengths
1  We protect older code that should not be modified.    NA
2  The total schedule time in years is less than one.    No
3  The number of software people years for this release is less than seven.    No
4  Domain knowledge required to develop this software application can be acquired via public domain in a short period of time.    No
5  This software application has imminent legal risks.    No
6  Operators have been or will be trained on the software.    Yes
7  The software team members who are working on the same software system are geographically co-located.    No
8  Turnover rate of software engineers on this project is < 20% during course of project.    Yes
9  This will be a maintenance release (no major feature addition).    No
10 The software has been recently reconstructed (i.e., to update legacy design or code).    No
11 We have a small organization (<8 people) or the team sizes do not exceed 8 people per team.    No
12 We have a culture in which all software engineers value testing their own code (as opposed to waiting for someone else to test it).    Yes
13 We manage subcontractors: outsource code that is not in our expertise, keep code that is our expertise in house.    Yes
14 There have been at least four fielded releases prior to this one.    No
15 The difference between the most and least educated end user is not more than 1 degree type (i.e., bachelors/masters, high school/associates).    No
Risks
1  This is a brand new release (version 1), or development language, or OS, or technology (add one for each risk).    Yes
2  Target hardware/system is accessible within: minutes (0.75), hours (0.5), days (0.25), weeks or months (0).    0.25
3  Short term contractors (< 1 year) are used for developing line of business code.    No
4  Code is not reused when it should be.    NA
5  We wait until all code is completed before starting the next level of testing.    Yes
6  Target hardware is brand new or evolving (will not be finished until software is finished).    Yes
7  Age of oldest part of code >10 years.    No
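
A small sketch of how these survey answers roll up into the net score used above follows. The scoring convention (Yes = 1, No or NA = 0, graded items contribute their fractional value) is an assumption that reproduces the strengths = 4, risks = 3.25, and net score = 0.75 figures in the text.

# Assumed scoring: Yes = 1, No/NA = 0, graded items (such as risk #2) contribute
# their fractional value directly.
strength_answers = ["NA", "No", "No", "No", "No", "Yes", "No", "Yes", "No", "No",
                    "No", "Yes", "Yes", "No", "No"]           # strengths 1 through 15
risk_answers = ["Yes", 0.25, "No", "NA", "Yes", "Yes", "No"]  # risks 1 through 7

def score(answer):
    if isinstance(answer, (int, float)):
        return float(answer)
    return 1.0 if answer == "Yes" else 0.0

strengths = sum(score(a) for a in strength_answers)   # 4.0
risks = sum(score(a) for a in risk_answers)           # 3.25
print(strengths, risks, strengths - risks)            # net score 0.75; 4 is needed for low risk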

They now plan on a predicted MTBCF of 820 h at initial delivery. Next they review the system reliability
block diagram (RBD). They notice that their one CSCI is supporting three hardware interfaces. As shown in
Table F.23, the reliability prediction for one of the subsystems is below the required 90%.

Table F.23—Prediction results

HW predictions         Software prediction    Subsystem reliability prediction
HWCI A = .001 f/h      CSCI = .00122 f/h      0.948115
HWCI B = .0035 f/h     CSCI = .00122 f/h      0.892901
HWCI C = .003 f/h      CSCI = .00122 f/h      0.90368

They revise the system prediction to assume that the one software CSCI is split into three independent
CSCIs. The updated RBD shows that with the cohesive design, the reliability for each component is now
predicted to be at least 90%. The cohesive architecture also supports their software team, which is spread
across two time zones. Now that the developers are not writing code for the same CSCI, they are able to
work independently with less risk. The updated results are shown in Table F.24.


Table F.24—Prediction results—updated


HW predictions         Software prediction         Subsystem reliability prediction
HWCI A = .001 f/h      CSCI = 0.000406667 f/h      0.966804
HWCI B = .0035 f/h     CSCI = 0.000406667 f/h      0.910501
HWCI C = .003 f/h      CSCI = 0.000406667 f/h      0.921493
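
The subsystem reliabilities in Table F.23 and Table F.24 follow from treating each hardware configuration item and its CSCI as a series pair with constant failure rates. The sketch below reproduces the tabulated values under an assumed 24 h mission time (the mission time is not restated here, so it is an assumption chosen to match the tables).

import math

MISSION_HOURS = 24.0                      # assumed mission time that matches the tables

def subsystem_reliability(hw_rate, sw_rate, hours=MISSION_HOURS):
    """Series HW + SW subsystem with constant failure rates (failures per hour)."""
    return math.exp(-(hw_rate + sw_rate) * hours)

for hw in (0.001, 0.0035, 0.003):         # Table F.23: one shared CSCI at 0.00122 f/h
    print("shared CSCI:", round(subsystem_reliability(hw, 0.00122), 6))

for hw in (0.001, 0.0035, 0.003):         # Table F.24: the CSCI split three ways
    print("split CSCI :", round(subsystem_reliability(hw, 0.00122 / 3), 6))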

F.4 Examples from 5.4

F.4.1 Example of an operational profile test

Revisit the printer example in F.1.2. At a minimum, the OP can be used to increase the testing focus in the
areas that are most likely to be exercised by the end users. The faxing, for example, is about 20% of the
profile, while almost half of the profile is printing. Therefore, the amount of test focus should be
relatively in line with those percentages. The same applies to the customer modes and user modes. The test
suite should exercise the major functions with roughly the following percentages, which are all derived and
computed from the OP shown in 5.1.1.3. Keep in mind that some modes may take longer to execute or be a
higher development risk, so the word "roughly" is emphasized.

Printing 48.58%
Scanning 30.93%
Faxing 20.50%

When printing is tested the paper used should be roughly as follows. When scanning is tested, the test effort
should be approximately 55% for auto feed and 45% for manual. When the fax mode is tested the test effort
should be roughly 7% for autodial and 93% for manual dial.

Legal size 28.54%
11x17 18.46%
8.5x11 52.99%

As far as what the end users are actually printing, scanning, and faxing, this should be thoroughly
investigated. The actual media tested should be roughly as shown in the list that follows. For example, what
high-tech small business people are printing, scanning, and faxing should be approximately 42% of the
media tested. How big are the documents? How many pages? Are there any images, or are the documents
exclusively text? Similarly, the high-tech professionals can be surveyed to determine what kinds of
documents they are printing in 11×17. The faxing accounted for 19% of the OP. The professionals and
copy shop employees can be queried to determine the typical length of the fax as well as how many faxes
they send per day. It is almost certainly not sufficient to simply test a 1-page document in the printing
mode, scanning mode, or faxing mode.

Professionals 28.0%
High-tech small businesses 42.0%
Walk-in copy shop 7.5%
Copy shop employee 22.5%
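
As an illustration of turning these percentages into a test plan, the sketch below distributes a hypothetical budget of 200 test cases across the major functions and, within printing, across the paper sizes; the budget figure is invented for the example.

# Hypothetical budget of 200 test cases, spread according to the OP percentages.
functions = {"Printing": 0.4858, "Scanning": 0.3093, "Faxing": 0.2050}
paper_sizes = {"Legal size": 0.2854, "11x17": 0.1846, "8.5x11": 0.5299}
TEST_BUDGET = 200

for function, share in functions.items():
    print(f"{function}: roughly {round(share * TEST_BUDGET)} test cases")

printing_budget = functions["Printing"] * TEST_BUDGET
for paper, share in paper_sizes.items():
    print(f"  Printing on {paper}: roughly {round(share * printing_budget)} test cases")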

F.4.2 Example of model testing

Example: The system under test is an elevator simulator developed in LabVIEW. See Lakey, Neufelder [B45],
[B46], Lakey [B47].


Identify All Interfaces


Input Interfaces
• Floor Selector in Cabin
• Floor Controls on Wall
• Start and Stop Buttons
• Time (abstract interface)

Output Interface
• Floor Position indicator

Figure F.4—Elevator control example


For simplicity, the specification will cover only Floors 0 through 3.
Identify Inputs
Floor Selection In Cabin
Press Cabin Floor 0
Press Cabin Floor 1
Press Cabin Floor 2
Press Cabin Floor 3
Floor Controls on Wall
Press Floor 0 Up
Press Floor 1 Up
Press Floor 1 Down
Press Floor 2 Up
Press Floor 2 Down
Press Floor 3 Down
Application Buttons
Press Start Button
Press Stop Button
Timers
Travel Timer Expires
Floor Timer Expires


Identify Outputs
Floor Position
Arrive at Floor 0
Arrive at Floor 1 Up
Arrive at Floor 1 Down
Arrive at Floor 2 Up
Arrive at Floor 2 Down
Arrive at Floor 3
Stop at Floor 0
Stop at Floor 1
Stop at Floor 2
Stop at Floor 3
Application Responses
Application Started
Application Stopped
Identify Operating Variables
Current Position [0, 0-1, 1, 1-2, 2, 2-3, 3]
Current Direction [Stationary, Up, Down]
Motion Status [not moving, moving]
Travel Timer [inactive, active]
Floor Timer [inactive, active]
High Floor Selected [0, 1, 2, 3, null]
Low Floor Selected [0, 1, 2, 3, null]
Cabin Floor 0 [not selected, selected]
Cabin Floor 1 [not selected, selected]
Cabin Floor 2 [not selected, selected]
Cabin Floor 3 [not selected, selected]
Floor 0 Up [not selected, selected]
Floor 1 Up [not selected, selected]
Floor 1 Down [not selected, selected]
Floor 2 Up [not selected, selected]
Floor 2 Down [not selected, selected]
Floor 3 Down [not selected, selected]
Specify Behaviors
Input: Press Cabin Floor 0
Operating Variable Constraints:
Cabin Floor 0 = “not selected”
Current Position <> “0”
Behavior:
IF (Cabin Floor 0 Pressed) System WILL Set Cabin Floor 0 = Selected
AND System WILL set Low Floor Selected = 0
IF (High Floor Selected = Null) System WILL Set High Floor Selected = 0
AND Set Current Direction = Down AND Set Motion Status = Moving
AND Set Travel Timer = active
Input: Press Cabin Floor 1
Operating Variable Constraints:
Cabin Floor 1 = “not selected”
Current Position <> “1”
Behavior:
IF (Cabin Floor 1 Pressed) System WILL Set Cabin Floor 1 = Selected
IF (Low Floor Selected <> 0) System WILL Set Low Floor Selected = 1
IF (High Floor Selected <> 2 or 3) System WILL Set High Floor Selected = 1
IF (Current Direction = Stationary) THEN
[IF (Current Location < 1) System WILL Set Current Direction = Up
ELSE System WILL Set Current Direction = Down]
AND Set Motion Status = Moving and Travel Timer = Active


Input: Press Cabin Floor 2


Operating Variable Constraints:
Cabin Floor 2 = “not selected”
Current Position <> “2”
Behavior:
IF (Cabin Floor 2 Pressed) System WILL Set Cabin Floor 2 = Selected
IF (Low Floor Selected <> 0 or 1) System WILL Set Low Floor Selected = 2
IF (High Floor Selected <> 3) System WILL Set High Floor Selected = 2
IF (Current Direction = Stationary) THEN
[IF (Current Location < 2) System WILL Set Current Direction = Up
ELSE System WILL Set Current Direction = Down]
AND Set Motion Status = Moving and Travel Timer = Active

Input: Press Cabin Floor 3


Operating Variable Constraints:
Cabin Floor 3 = “not selected”
Current Position <> “3”
Behavior:
IF (Cabin Floor 3 Pressed) System WILL Set Cabin Floor 3 = Selected
AND System WILL set High Floor Selected = 3
IF (Low Floor Selected = Null) System WILL Set Low Floor Selected = 3
AND Set Current Direction = Up AND Set Motion Status = Moving
AND Set Travel Timer = active
Input: Press Floor 0 Up
Operating Variable Constraints:
Floor 0 Up = “not selected”
Current Position <> “0”
Behavior:
IF (Floor 0 Up Pressed) System WILL Set Floor 0 Up = Selected
AND System WILL set Low Floor Selected = 0
IF (High Floor Selected = Null) System WILL Set High Floor Selected = 0
AND Set Current Direction = Down AND Set Motion Status = Moving
AND Set Travel Timer = active
Input: Press Floor 1 Up
Operating Variable Constraints:
Floor 1 Up = “not selected”
Current Position <> “1”
Behavior:
IF (Floor 1 Up Pressed) System WILL Set Floor 1 Up = Selected
IF (Low Floor Selected <> 0) System WILL Set Low Floor Selected = 1
IF (High Floor Selected <> 2 or 3) System WILL Set High Floor Selected = 1
IF (Current Direction = Stationary) THEN
[IF (Current Location < 1) System WILL Set Current Direction = Up
ELSE System WILL Set Current Direction = Down]
AND Set Motion Status = Moving and Travel Timer = Active

Input: Press Floor 1 Down


Operating Variable Constraints:
Floor 1 Down = “not selected”
Current Position <> “1”
Behavior:
IF (Floor 1 Down Pressed) System WILL Set Floor 1 Down = Selected
IF (Low Floor Selected <> 0) System WILL Set Low Floor Selected = 1
IF (High Floor Selected <> 2 or 3) System WILL Set High Floor Selected = 1
IF (Current Direction = Stationary) THEN
[IF (Current Location < 1) System WILL Set Current Direction = Up
ELSE System WILL Set Current Direction = Down]
AND Set Motion Status = Moving and Travel Timer = Active

Input: Press Floor 2 Up


Operating Variable Constraints:
Floor 2 Up = “not selected”
Current Position <> “2”
Behavior:
IF (Floor 2 Up Pressed) System WILL Set Floor 2 Up = Selected
IF (Low Floor Selected <> 0 or 1) System WILL Set Low Floor Selected = 2
IF (High Floor Selected <> 3) System WILL Set High Floor Selected = 2
IF (Current Direction = Stationary) THEN
[IF (Current Location < 2) System WILL Set Current Direction = Up
ELSE System WILL Set Current Direction = Down]
AND Set Motion Status = Moving and Travel Timer = Active

Input: Press Floor 2 Down


Operating Variable Constraints:
Floor 2 Down = “not selected”
Current Position <> “2”
Behavior:
IF (Floor 2 Down Pressed) System WILL Set Floor 2 Down = Selected
IF (Low Floor Selected <> 0 or 1) System WILL Set Low Floor Selected = 2
IF (High Floor Selected <> 3) System WILL Set High Floor Selected = 2
IF (Current Direction = Stationary) THEN
[IF (Current Location < 2) System WILL Set Current Direction = Up
ELSE System WILL Set Current Direction = Down]
AND Set Motion Status = Moving and Travel Timer = Active

Input: Press Floor 3 Down


Operating Variable Constraints:
Floor 3 Down = “not selected”
Current Position <> “3”
Behavior:
IF (Floor 3 Down Pressed) System WILL Set Floor 3 Down = Selected
AND System WILL set High Floor Selected = 3
IF (Low Floor Selected = Null) System WILL Set Low Floor Selected = 3
AND Set Current Direction = Up AND Set Motion Status = Moving
AND Set Travel Timer = active

Input: Travel Timer Expires – Floor 0


Operating Variable Constraints:
Travel Timer = Active
Current Direction = Down
Motion Status = Moving
Current Position = 0-1
Behavior:
IF (Travel Timer Expires at Floor 0) System WILL do the following:
Set Current Position = 0
Set Motion Status = not moving
Set Travel Timer = not active
Set Floor Timer = active
Set Cabin Floor 0 = not selected
Set Floor 0 Up = not selected
Set Low Floor Selected based on status of other buttons
Set High Floor Selected based on status of other buttons


Input: Floor Timer Expires – Floor 0


Operating Variable Constraints:
Floor Timer = Active
Motion Status = Not Moving
Current Position = 0
Behavior:
IF (Floor Timer Expires at Floor 0) System WILL :
Set Floor Timer = not active AND
IF (High Floor Selected > 0) System WILL do the following:
Set Current Direction = Up
Set Current Location = 0-1
Set Motion Status = Moving
Set Travel Timer = Active
ELSE System WILL do nothing else

Input: Travel Timer Expires – Floor 1 Up


Operating Variable Constraints:
Travel Timer = Active
Current Direction = Up
Motion Status = Moving
Current Position = 0-1
Behavior:
IF (Travel Timer Expires at Floor 1 Moving UP) System MUST:
IF ((Floor 1 UP OR Cabin Floor 1 = selected )
OR (Floor 1 Down = Selected AND High Floor Selected = 1)) System MUST
Set Current Position = 1
Set Motion Status = not moving
Set Travel Timer = not active
Set Floor Timer = active
Set Cabin Floor 1 = not selected
Set Floor 1 Up = not selected
Set Low Floor Selected based on status of other buttons
Set High Floor Selected based on status of other buttons
ELSE System WILL continue traveling Up and Set Current Position = 1-2

Input: Travel Timer Expires – Floor 1 Down


Operating Variable Constraints:
Travel Timer = Active
Current Direction = Down
Motion Status = Moving
Current Position = 1-2
Behavior:
IF (Travel Timer Expires at Floor 1 Moving Down) System MUST:
IF ((Floor 1 Down OR Cabin Floor 1 = selected)
OR (Floor 1 Up = Selected AND High Floor Selected = 1)) System MUST
Set Current Position = 1
Set Motion Status = not moving
Set Travel Timer = not active
Set Floor Timer = active
Set Cabin Floor 1 = not selected
Set Floor 1 Down = not selected
Set Low Floor Selected based on status of other buttons
Set High Floor Selected based on status of other buttons
ELSE System WILL continue traveling Down and Set Current Position = 0-1


Input: Floor Timer Expires – Floor 1 Up


Operating Variable Constraints:
Floor Timer = Active
Motion Status = Not Moving
Current Position = 1
Current Direction = Up
Behavior:
IF (Floor Timer Expires at Floor 1) System WILL :
Set Floor Timer = not active
IF (High Floor Selected > 1) System WILL do the following:
Set Travel Timer = Active
Set Motion Status = Moving
Set Current Location = 1-2
ELSE IF (Low Floor Selected < 1) System WILL do following:
Set Floor 1 Down = not selected
Set Travel Timer = Active
Set Motion Status = Moving
Set Current Direction = Down
Set Current Location = 0-1
ELSE System WILL do nothing else

Input: Floor Timer Expires – Floor 1 Down


Operating Variable Constraints:
Floor Timer = Active
Motion Status = Not Moving
Current Position = 1
Current Direction = Down
Behavior:
IF (Floor Timer Expires at Floor 1) System WILL :
Set Floor Timer = not active
IF (Low Floor Selected < 1) System WILL do the following:
Set Travel Timer = Active
Set Motion Status = Moving
Set Current Location = 0-1
ELSE IF (High Floor Selected > 1) System WILL do following:
Set Floor 1 Up = not selected
Set Travel Timer = Active
Set Motion Status = Moving
Set Current Direction = Up
Set Current Location = 1-2
ELSE System WILL do nothing else

Input: Travel Timer Expires – Floor 2 Up


Operating Variable Constraints:
Travel Timer = Active
Current Direction = Up
Motion Status = Moving
Current Position = 1-2
Behavior:
IF (Travel Timer Expires at Floor 2 Moving UP) System MUST:
IF ((Floor 2 UP OR Cabin Floor 2 = selected)
OR (Floor 2 Down = Selected AND High Floor Selected = 2)) System MUST
Set Current Position = 2
Set Motion Status = not moving
Set Travel Timer = not active
Set Floor Timer = active
Set Cabin Floor 2 = not selected
Set Floor 2 Up = not selected
Set Low Floor Selected based on status of other buttons
Set High Floor Selected based on status of other buttons
ELSE System WILL continue traveling Up and Set Current Position = 2-3

Input: Travel Timer Expires – Floor 2 Down


Operating Variable Constraints:
Travel Timer = Active
Current Direction = Down
Motion Status = Moving
Current Position = 2-3
Behavior:
IF (Travel Timer Expires at Floor 2 Moving Down) System MUST:
IF ((Floor 2 Down OR Cabin Floor 2 = selected )
OR (Floor 2 Up = Selected AND High Floor Selected = 2)) System MUST
Set Current Position = 2
Set Motion Status = not moving
Set Travel Timer = not active
Set Floor Timer = active
Set Cabin Floor 2 = not selected
Set Floor 2 Down = not selected
Set Low Floor Selected based on status of other buttons
Set High Floor Selected based on status of other buttons
ELSE System WILL continue traveling Down and Set Current Position = 1-2

Input: Floor Timer Expires – Floor 2 Up


Operating Variable Constraints:
Floor Timer = Active
Motion Status = Not Moving
Current Position = 2
Current Direction = Up
Behavior:
IF (Floor Timer Expires at Floor 2) System WILL :
Set Floor Timer = not active
IF (High Floor Selected > 2) System WILL do the following:
Set Travel Timer = Active
Set Motion Status = Moving
Set Current Location = 2-3
ELSE IF (Low Floor Selected < 2) System WILL do following:
Set Floor 2 Down = not selected
Set Travel Timer = Active
Set Motion Status = Moving
Set Current Direction = Down
Set Current Location = 1-2
ELSE System WILL do nothing else

Input: Floor Timer Expires – Floor 2 Down


Operating Variable Constraints:
Floor Timer = Active
Motion Status = Not Moving
Current Position = 2
Current Direction = Down
Behavior:
IF (Floor Timer Expires at Floor 2) System WILL :
Set Floor Timer = not active
IF (Low Floor Selected < 2) System WILL do the following:
Set Travel Timer = Active
Set Motion Status = Moving
Set Current Location = 1-2
ELSE IF (High Floor Selected > 2) System WILL do following:
Set Floor 2 Up = not selected
Set Travel Timer = Active
Set Motion Status = Moving
Set Current Direction = Up
Set Current Location = 2-3
ELSE System WILL do nothing else

Input: Travel Timer Expires – Floor 3


Operating Variable Constraints:
Travel Timer = Active
Current Direction = Up
Motion Status = Moving
Current Position = 2-3
Behavior:
IF (Travel Timer Expires at Floor 3) System WILL do the following:
Set Current Position = 3
Set Motion Status = not moving
Set Travel Timer = not active
Set Floor Timer = active
Set Cabin Floor 3 = not selected
Set Floor 3 Down = not selected
Set Low Floor Selected based on status of other buttons
Set High Floor Selected based on status of other buttons

Input: Floor Timer Expires – Floor 3


Operating Variable Constraints:
Floor Timer = Active
Motion Status = Not Moving
Current Position = 3
Behavior:
IF (Floor Timer Expires at Floor 3) System WILL :
Set Floor Timer = not active AND
IF (Low Floor Selected < 3) System WILL do the following:
Set Current Direction = Down
Set Current Location = 2-3
Set Motion Status = Moving
Set Travel Timer = Active
ELSE System WILL do nothing else

This concludes the specification of the elevator simulator. The resulting specification is a state machine. It
specifies all possible state combinations, all possible transitions between states, and the required response
to all inputs (transitions). It also represents the structure of the OP. To complete the OP specification for
reliability estimation purposes, relative likelihood values should be assigned to each of the inputs in the
specification. This is addressed later in the subclause.
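
As an illustration of how such a specification can be encoded for tooling, the sketch below captures the operating variables and a single behavior ("Press Cabin Floor 0") in Python. It is a minimal, partial encoding for illustration only; it is not the full simulator or any tool's internal representation.

from dataclasses import dataclass, field

@dataclass
class ElevatorState:
    current_position: str = "0"            # one of 0, 0-1, 1, 1-2, 2, 2-3, 3
    current_direction: str = "Stationary"
    motion_status: str = "not moving"
    travel_timer: str = "inactive"
    high_floor_selected: object = None      # null until a floor is selected
    low_floor_selected: object = None
    cabin_floor_selected: set = field(default_factory=set)

def press_cabin_floor_0(state: ElevatorState) -> None:
    # Operating variable constraints from the specification.
    if 0 in state.cabin_floor_selected or state.current_position == "0":
        return
    # Specified behavior for the input "Press Cabin Floor 0".
    state.cabin_floor_selected.add(0)
    state.low_floor_selected = 0
    if state.high_floor_selected is None:
        state.high_floor_selected = 0
        state.current_direction = "Down"
        state.motion_status = "moving"
        state.travel_timer = "active"

state = ElevatorState(current_position="2")
press_cabin_floor_0(state)
print(state.current_direction, state.motion_status, state.travel_timer)  # Down moving active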

The OP testing process is illustrated as follows. The two main elements of the process are test generation
and test execution. In Figure F.5, tests are generated from a behavior model. These tests are converted to
executable scripts that exercise the system under test in its target environment.


Figure F.5—Model-based process for example (usage model editor, test generator producing test cases and
test scripts, adaptor, and a test execution platform exercising the system under test and feeding test
results analysis and reliability/mean time to failure estimation)

F.4.2.1 Behavior model

The next step in the reliability testing process is to convert the software specification into the form of a
behavior model that can be operated on to produce test cases. This is best accomplished with a software
tool. The practitioner may select a commercial tool or develop their own. In either case, the objective is to
represent the state machine specification as a model to automatically generate test cases.

Continuing with the elevator example, a graphical model has been constructed using a commercial tool.17
Figure F.6 shows the main operating modes of the elevator simulator. The elevator may be stationary, going
up, or going down. The second figure, Figure F.7, shows state transitions while the elevator is in
stationary mode. Using the modeling tool, the entire specification defined previously is precisely
replicated so that the Markov Chain Usage Model (MCUM) contains all of the conditions and stimuli that the
elevator control system may be subjected to during operation.

17 The tool illustrated here is for example purposes only; this recommended practice does not promote or sponsor a specific tool.


Figure F.6—Main operating modes of the elevator simulator


Figure F.7 shows states and transitions while the elevator is stationary.

Figure F.7—Stationary states and transitions of the elevator simulator


F.4.2.2 Test generation

The next step in the OP testing process is to generate test cases from the MCUM. An illustration of this step
using a commercial tool is provided in Figure F.8. A number of tests are selected.

Figure F.8—Generation of test cases using the tool


The results of random test generation are illustrated in Figure F.9. Notice the coverage numbers.

Figure F.9—Results of random test generation


A key part of software reliability testing is considering how large a test sample is sufficient to obtain
reliability estimates that have a high (or at least reasonable) degree of confidence. In the preceding example
with 10 test cases, the elevator project could achieve roughly 60% state and 40% transition coverage. This
level is too low to assess reliability with confidence.

Fortunately, with a tool that automatically generates random tests based on the OP, one can experiment
with varying numbers of test cases to determine the level of structural model coverage the sample would
achieve. A sample of 100 tests was generated from the elevator model, and the statistical results shown in
Figure F.10 were obtained.


Figure F.10—Sample of 100 tests generated for elevator model with statistical results
Observe that nearly 97% of states and more than 90% of transitions are covered by the 100 tests. If all of
these tests are executed, then one could be confident that the failure rate obtained from executing those
tests would be fairly representative of the population. A larger sample would increase confidence.

The simplified elevator simulator is a relatively basic application. In more complex applications the state
space in a behavioral model will be much larger than this example. Some applications consist of hundreds
of thousands or millions of states. For those systems, achieving 90% structural coverage may require thousands
or tens of thousands of test cases. Automation is a necessity for complex systems.
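
The sketch below shows the general idea of generating random tests from a usage model by walking a Markov chain and tracking structural coverage. The three-state model is a toy abstraction invented for illustration; it is not the tool's model of the elevator.

import random

# Toy usage model: each state maps to (input, next state, relative weight) arcs.
usage_model = {
    "Stationary": [("Press Cabin Floor 1", "Going Up", 1.0),
                   ("Press Floor 0 Up", "Going Down", 1.0)],
    "Going Up": [("Travel Timer Expires", "Stationary", 1.0)],
    "Going Down": [("Travel Timer Expires", "Stationary", 1.0)],
}

def generate_test(max_steps=6, start="Stationary"):
    """Random walk over the usage model; returns the visited (state, input) arcs."""
    state, arcs = start, []
    for _ in range(max_steps):
        choices = usage_model[state]
        weights = [w for _, _, w in choices]
        event, next_state, _ = random.choices(choices, weights=weights, k=1)[0]
        arcs.append((state, event))
        state = next_state
    return arcs

tests = [generate_test() for _ in range(100)]
covered = {arc for test in tests for arc in test}
total_arcs = sum(len(arcs) for arcs in usage_model.values())
print(f"Transition coverage: {len(covered)} of {total_arcs}")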

F.4.2.3 Test adapter development

Test cases generated from a modeling tool are sequences of abstract actions. The test cases are not
executable. The abstract actions for the elevator example follow. They include Input and Response actions.
Many tools and methods exist for creating executable test scripts for this purpose.


Input actions

Floor Selection In Cabin
• Press Cabin Floor 0
• Press Cabin Floor 1
• Press Cabin Floor 2
• Press Cabin Floor 3
Floor Controls on Wall
• Press Floor 0 Up
• Press Floor 1 Up
• Press Floor 1 Down
• Press Floor 2 Up
• Press Floor 2 Down
• Press Floor 3 Down
Application Buttons
• Press Start Button
• Press Stop Button
Timers
• Travel Timer Expires
• Floor Timer Expires

Response actions

Floor Position
• Arrive at Floor 0
• Arrive at Floor 1 Up
• Arrive at Floor 1 Down
• Arrive at Floor 2 Up
• Arrive at Floor 2 Down
• Arrive at Floor 3
• Stop at Floor 0
• Stop at Floor 1
• Stop at Floor 2
• Stop at Floor 3
Application Responses
• Application Started
• Application Stopped

A tool for test adapter creation is not illustrated here; refer to the literature for test execution tools.

A recommended methodology for constructing test scripts associated with OP model actions is described in
this subclause. The approach is simply this: all discrete events in an auto-generated test sequence should
have an associated adapter that is a self-contained executable test function. This concept is very basic, but it
is also powerful and enables a project to achieve software reliability testing.

As stated previously, finite state machines for complex systems can be very large. Test cases generated
from those models can be long and highly variable, and the sequence of events for almost every test will be
different. As a result, it is not practical to develop and deploy test functions that have dependencies on
other test functions. With such dependencies, the order and sequence of events would affect the logic in a
given test function; in other words, each test action would need to know the history of previous events.
Maintenance of test cases could become unwieldy. This is not a recommended practice.

Test function independence means that a test function can execute correctly every time based upon the
current state of usage of the system when the event occurs. When a test function is called, it first searches
the global variable space of the SUT and determines current usage, then invokes the action. It also verifies
that the system responds correctly to the event given the current (known) usage state.

To illustrate the idea of self-contained test functions we’ll refer to the elevator example again. Consider the
input “Press Cabin Floor 1.” This may occur anywhere in a randomly generated test case. When called, the
associated test function should first check the current values of certain operating variables, specifically
current location, current direction, high floor selected and low floor selected. Then it should trigger the
action “Press Cabin Floor 1” and implement the logic specified for this action, as follows.

Behavior:
IF (Cabin Floor 1 Pressed) System WILL Set Cabin Floor 1 = Selected
IF (Low Floor Selected <> 0) System WILL Set Low Floor Selected = 1
IF (High Floor Selected <> 2 or 3) System WILL Set High Floor Selected = 1
IF (Current Direction = Stationary) THEN
[IF (Current Location < 1) System WILL Set Current Direction = Up
ELSE System WILL Set Current Direction = Down]
AND Set Motion Status = Moving and Travel Timer = Active

No matter what elevator actions have been previously executed in a test sequence, whenever the Press
Cabin Floor 1 Event occurs the preceding logic is executed and the system response is confirmed.
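
A minimal sketch of such a self-contained test function follows. The read_state and press parameters stand in for whatever hooks the chosen test execution platform provides for reading the SUT's operating variables and injecting an input; they are hypothetical names, not a real API. The function reads the current state, triggers the action, and verifies the specified response.

# Sketch only.  read_state() returns a dict of the SUT's operating variables and
# press() injects an input; both are hypothetical hooks into the test platform.
def test_press_cabin_floor_1(read_state, press):
    before = read_state()                  # 1) obtain the current usage state
    press("Cabin Floor 1")                 # 2) trigger the abstract action
    after = read_state()                   # 3) verify the specified behavior

    failures = []
    if after["Cabin Floor 1"] != "selected":
        failures.append("Cabin Floor 1 was not selected")
    if before["Low Floor Selected"] != 0 and after["Low Floor Selected"] != 1:
        failures.append("Low Floor Selected was not set to 1")
    if before["High Floor Selected"] not in (2, 3) and after["High Floor Selected"] != 1:
        failures.append("High Floor Selected was not set to 1")
    if before["Current Direction"] == "Stationary":
        expected = "Up" if before["Current Location"] < 1 else "Down"
        if (after["Current Direction"] != expected
                or after["Motion Status"] != "moving"
                or after["Travel Timer"] != "active"):
            failures.append("movement response was incorrect")
    return failures                        # an empty list means the event passed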


Let us take one more action, "Travel Timer Expires – Floor 2 Up." When this test function is called, it will
first query the global variable space for the values of the Floor 2 Up, Cabin Floor 2, Floor 2 Down, and
High Floor Selected variables. Once this information is obtained, it triggers the abstract event (a timer
expiring is tracked internally; there is no concrete trigger from an external interface to the SUT). Then the
test function implements the specified behavior as follows.

Behavior:
IF (Travel Timer Expires at Floor 2 Moving UP) System MUST:
IF ((Floor 2 UP OR Cabin Floor 2 = selected )
OR (Floor 2 Down = Selected AND High Floor Selected = 2)) System MUST
Set Current Position = 2
Set Motion Status = not moving
Set Travel Timer = not active
Set Floor Timer = active
Set Cabin Floor 2 = not selected
Set Floor 2 Up = not selected
Set Low Floor Selected based on status of other buttons
Set High Floor Selected based on status of other buttons
ELSE System WILL continue traveling Up and Set Current Position = 2-3

With this self-contained concept, each of the test functions implements, precisely, the specification of the
action associated with that function. All of the functions perform three generic tasks: 1) obtain the current
global state, 2) trigger the test action, and 3) verify the correct response based on the behavior specification.

The test functions are implemented in this manner regardless of the test tool used, the test language, or the
communication channels implemented to interface with the system under test. The test adapter for each
function implements the specified behavior requirements for the action. Each test function is executable
against the SUT.

F.4.2.4 Test case translation

Before any auto-generated test case can be executed, it should be converted to an executable test script. By
employing appropriate tools, this can be achieved fairly easily. A general method that is supported by some
test automation tools involves assigning a unique test function to each transition in the MCUM associated
with an input to the SUT. The test function development environment is natively supported by the
automatic test generation tool. In this way, an auto-generated test is a sequence of test function calls that is
directly executable in the test function execution environment. Auto-generated tests simply need to be
saved in the format that is compatible with the test execution environment. The literature contains case
studies of projects that have successfully utilized commercial test tools for this purpose.

Illustrated as follows, a test model integrated with a set of test function adapters can be utilized to
automatically generate executable test scripts. The basic test building blocks are constructed separately.
The test model contains all possible test sequences. A test execution tool contains a library of all test
functions for a SUT. With these basic elements, an unlimited number of executable tests can be created and
executed, enabling software system reliability to be estimated with a high degree of confidence. Just to
emphasize, the Markov Chain Usage Model (test model) and the self-contained test functions represent the
essential elements needed to perform OP testing. This formula is generic, robust, and repeatable. Projects
that follow this formula can produce high quality reliability estimates.


Figure F.11—Sample of tool outputs


By automatically translating the test cases into executable scripts a large test sample can be created very
quickly and can be executed for the purpose of determining the reliability of the SUT.

F.4.2.5 Test execution

The final step in the OP testing process is to execute the auto-generated, executable test sequences. For a
given sample, every test in the sample should be executed in the selected test execution environment.
Again, there are many tools to support test execution and test case management. Refer to the literature for
available tools and their applicability.

To support effective and efficient OP testing, the test execution environment should have the capability to
report pass/fail status on every event in a test sequence, and every test in a sample. Offline analysis of test
case results should be avoided to the extent possible so that the maximum information on test result success
can be obtained with minimum effort.

The outcome of executing a sample of tests generated from the test model is quantitative information on
test case successes versus test case failures. This data can be evaluated and manipulated in a number of
ways in order to estimate software reliability. This topic is covered in 6.3 and Annex C.

F.4.2.6 Profile variation

This subclause has purposely deferred a discussion on variations in an OP. The approach to OP testing is
the same regardless of the assignment of probability values to transitions in the MCUM. Getting the test
model structure correct is paramount to successful OP testing. The distribution of likelihood values across
the model is secondary, though still necessary for software reliability estimation.


In the elevator example the usage model was created with no special assignment of probability values;
every event in the model was assumed to be equally likely. The software tool that was used for model
development assigns a default value of "normal" to every transition in the Markov chain, as shown in
Figure F.12.

Figure F.12—Sample of outputs from tool for elevator model


The tool allows a relative frequency value to be assigned to each arc in the model. For instance, the event
"Select Higher" in Figure F.13 is assigned a relative frequency of "very often" (×10). This means that the
event will be 10 times more likely than an event with a normal frequency. The tool handles these relative
frequencies quantitatively by ensuring that all transitions emanating from a state sum to 1.0.

Figure F.13—Sample output showing selection of frequency of activity
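
A one-line sketch of that normalization follows; the sibling arcs and their weights are invented for illustration (a "normal" arc has weight 1 and "very often" has weight 10).

# Hypothetical arcs leaving one state, with their relative frequency weights.
outgoing = {"Select Higher": 10.0, "Select Lower": 1.0, "Press Stop Button": 1.0}

total = sum(outgoing.values())
probabilities = {event: weight / total for event, weight in outgoing.items()}
print(probabilities)     # Select Higher becomes 10/12, the others 1/12 each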


F.4.3 Example of failure modes testing

Refer to the SFMEA example in F.2.1. The results of the SFMEA show some potential root causes for
failure modes. When testing for failure modes, one needs to instigate the particular failure mode and record
the results. Some potential root causes and how to test them are shown in Table F.25.

Table F.25—Using the results of a SFMEA to develop a failure modes test


Potential root cause                                             How to test
The end user has more than one format in the same file           Develop a file that has both time and fault based data.
Failure count is not increasing                                  Develop a set of test data in which the number of cumulative faults is (erroneously) not increasing.
Time is not increasing                                           Develop a set of test data in which the amount of cumulative test hours is (erroneously) not increasing.
Interfailure time is not positive                                Develop a test file that has either 0 or negative numbers as interfailure time values.
There are 0 data points                                          Develop a test file that has 0 data points.
There are fewer than (minimum required) data points              Develop 5 test files that have 1, 2, 3, 4, and 5 data points.
False positive result or false negative result                   Develop many different sets of data in which the allowable models are known in advance. Run many different sets of data and verify that all valid models are available and no invalid models are available.
Reliability growth is neither positive nor negative              Develop a test file that has the same number of faults detected for each day of testing.
No result is generated at all                                    Develop a set of data in which none of the models will generate an answer. Verify that the software detects it and advises the user that there are no model results.
It takes too long to generate a result (too many data points)    Develop a test file with 1000, 5000, 10 000, 20 000 data points and record the time it takes for the software to produce an answer.
Software runs Laplace test before data format is checked         The user will be allowed to use the models when they should not.
U-shaped data has both + and –, which generates a false positive or a false negative    Develop a test file that has increasing and then decreasing fault rate.
N-shaped data causes a false positive or a false negative        Develop a test file that has decreasing and then increasing fault rate.
S shaped (U and N)                                               Develop a test file that has increasing, then decreasing, then increasing, then decreasing fault rate.
Decreasing S shaped                                              Develop a test file that has decreasing, then increasing, then decreasing fault rate.
Increasing S shaped                                              Develop a test file that has increasing, then decreasing, then increasing fault rate.
The user selects a file that does not have any valid format (a CSV or Excel file that does not have failure data in it)    Create a test file that has junk in it but no test data.
The input file is already in use                                 Open a test file. Press ctrl-alt-del to crash the application. Open the same test file.

F.4.4 Example of incremental development software reliability growth estimations

Increment 1 is developed and tested. The non-cumulative defects found per testing day are plotted. Then
increment 2 is developed and tested. Its non-cumulative defects are plotted per day as well. Based on
Figure F.14 there are two possible options for estimating the software reliability growth. The first is to
combine the defects from both increments and estimate reliability growth. The second is to apply the SRG
models to each increment independently and then merge the estimated defects.


The exponential model is applied to increment 1, and the estimated inherent defects are computed as
133, of which 66 have been found so far. For increment 2, the defect rate has only recently started to
decrease, so the only model that can estimate the inherent defects is the Rayleigh model. The peak occurred
at week 27, when a total of 105 defects had been found: 105 × 2.5 ≈ 263, of which 171 have been discovered
so far.

Figure F.14—Example of using software reliability growth models in incremental development

The data from both increments is combined into one data set as shown in Figure F.15.

Figure F.15—Combined data


If the data is combined into one data set, the estimated number of inherent defects is 402.5 using the
Rayleigh model. The peak was at week 27, and the total cumulative defect count of both increments up to that
time was 161. The exponential model cannot be used because the defect rate only recently started to
decrease. When the models are applied separately, the estimated inherent defects = 133 + 263 = 396. In this
case either approach estimates approximately the same number of defects.
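
The arithmetic behind the two options can be summarized in a short sketch. The 2.5 multiplier is the Rayleigh rule of thumb used above (total inherent defects are roughly 2.5 times the cumulative defects observed at the week of peak discovery).

def rayleigh_inherent(defects_at_peak):
    # Rule of thumb used in the text: inherent defects ~ 2.5 x defects at the peak week.
    return 2.5 * defects_at_peak

# Option 1: model each increment separately, then merge the estimates.
increment_1 = 133                        # exponential model estimate (66 found so far)
increment_2 = rayleigh_inherent(105)     # Rayleigh estimate; peak at week 27 with 105 defects
separate_total = increment_1 + increment_2        # about 133 + 263 = 396

# Option 2: combine both increments into one data set and fit once.
combined_total = rayleigh_inherent(161)           # 161 cumulative defects at the peak week

print(round(separate_total), combined_total)      # roughly 396 versus 402.5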


F.4.5 Accuracy verification

F.4.5.1 Accuracy of the Predictive Model

Refer to example F.3.3. Following is the prediction of defects for each increment. The predicted defect
density is 0.09 defects/normalized EKSLOC, and the predicted size for each increment is 50 EKSLOC
(225 EKSLOC after normalizing for language). There are two variables that should be monitored: first, the
prediction for size, and second, the prediction for defect density. See Table F.26.

Table F.26—Example of incremental predictions inputs

Predicted values Increment X


a. Predicted EKSLOC of Hybrid language type 50
b. Predicted EKSLOC normalized for language =50 × 4.5 225
(See Annex B)
c. Predicted defect density 0.09
d. Predicted operational defects (225 × 0.09) 20.25

The predicted defect density, however, is for operational defects and not testing defects. Hence, to verify
the accuracy of the operational defect density during the testing process, the analyst needs to determine the
typical ratio between testing and operational defects. As shown in Table F.27, the average system testing
defect density in terms of defects per normalized EKSLOC is between 0.056 and 3.062 for software systems
that are predicted to have between 0.0269 and 0.111 defects/normalized EKSLOC. The example system is
predicted to have a defect density of 0.09 defects/normalized EKSLOC; hence, if the prediction is accurate,
the testing defect density should be in the range shown in Table F.27.

Table F.27—Example of incremental predictions outputs


Predicted outcome    Average 3-year fielded defect density    Average system testing defect density
of project           (defects per normalized EKSLOC)          (defects per normalized EKSLOC)
Successful           0.0269 to 0.111                          0.056 to 3.062
Mediocre             0.111 to 0.647                           3.062 to 7.551
Distressed           0.647 and up                             7.551 and up

The first increment of software system testing is complete and the following actual values are measured in
Table F.28.

Table F.28—Example of incremental predictions—final prediction results


Actual values                                                    Increment X
Actual EKSLOC of hybrid language from static analysis tool       65
Actual normalized EKSLOC = 65 × 4.5 as per B.1                   292.5
Actual defects found during testing of increment 1               941
Actual testing defect density                                    3.217

The actual size is 30% higher than predicted. The predictions for future increments should be revised,
since each increment was assumed to have the same amount of effective KSLOC. The actual size is used to
compute the actual testing defect density = 941/292.5 = 3.217 defects/normalized EKSLOC. The testing
defect density in increment X is outside of the expected range. This means that either the testing
organization was exceptionally aggressive at discovering faults in the code, or there are more defects in the
code than predicted. If fewer than 17 defects (0.056 × 292.5) had been found during testing, that would
have been an indication that the testing effort is possibly insufficient, as it would be below the typical
expected testing defect density.
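
The checks in this subclause reduce to a few lines of arithmetic, sketched below using the Table F.27 and Table F.28 values.

predicted_normalized_eksloc = 50 * 4.5            # 225, from the prediction
actual_normalized_eksloc = 65 * 4.5               # 292.5, from the static analysis tool
size_growth = actual_normalized_eksloc / predicted_normalized_eksloc - 1    # 0.30

testing_defect_density = 941 / actual_normalized_eksloc                     # about 3.217
expected_range = (0.056, 3.062)                   # Table F.27 "successful" testing range
in_range = expected_range[0] <= testing_defect_density <= expected_range[1]

print(f"size growth {size_growth:.0%}, testing defect density "
      f"{testing_defect_density:.3f}, within expected range: {in_range}")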


F.4.5.2 Accuracy of the SRG model estimate

Refer to the example in 6.3.2.5. The following estimations have been made. So the question is which model
is trending the closest?

Table F.29—Example of SRG model estimate accuracy


Model Estimated current MTBF (h)
Defect based 25.41366
Time based 48.56585

In the last week of testing there were 142 h of usage time and 5 failures. Hence the most recent actual
MTBF = 28 h. At this point in time, the defect-based model is trending the closest to the actual MTBF. The
relative error = (28 – 25.4)/28 = 9.3%.

F.5 Examples from 5.5

A software version is being considered for deployment. The software is the very same software illustrated
in 6.3.2.5. The known or estimated criteria for acceptance are shown in Table F.30:
Table F.30—Criteria for acceptance
Criteria                                                                          Determination
Adequate defect removal
  The current fault rate of the software is not increasing.                       Rate is decreasing as per 6.3.2.5.
  If a selected task, the results of any SFMEA indicate that there are no unresolved critical items.    Not selected
  The estimated remaining defects do not preclude meeting the reliability goal and/or do not require excessive resources for maintenance.    71% estimated removal
  The estimated remaining escaped defects are not going to result in defect pileup.    Not predicted
Reliability estimation confidence
  The relative accuracy of the estimations from 5.4.7 indicates confidence in the software reliability growth measurements.    Yes, model is tracking
  Release stability: reliability goal has been met or exceeded.                   No specific requirement; however, the estimated MTBF of 25 h is concerning.
  If a selected task, the RDT indicates "accept."                                 Not selected
Adequate code coverage
  Recommended: 100% branch/decision coverage with minimum and maximum termination of loops.    Not selected
Adequate black box coverage
  An OP is developed and validated.                                               Yes
  Requirements are covered with 100% coverage.                                    Yes
  Every modeled state and transition has been executed at least once.             Yes
Adequate stress case coverage                                                     Not selected

The risks are therefore as shown in Table F.31. The risk of acceptance is high because of inadequate defect
removal, no measured code coverage and no stress case coverage.


Table F.31—Acceptance decision factors and confidence


Adequate    Reliability    Adequate    Adequate     Adequate
defect      estimation     code        black box    stress case    Risk of
removal     confidence     coverage    coverage     coverage       acceptance
No          None           No          No           NA             Very high
Yes         None           No          No           NA             High
No          None           Yes         No           NA             High
No          None           No          Yes          NA             High
Yes         None           Yes         No           NA             High
No          Low            No          Yes          No             High
Yes         None           Yes         No           NA             Moderate
No          Low            No          Yes          Yes            Moderate
Yes         Moderate       No          Yes          No             Moderate
No          Moderate       Yes         Yes          No             Moderate
Yes         High           No          Yes          Yes            Low
No          High           Yes         Yes          Yes            Low
Yes         High           Yes         Yes          No             Low
Yes         Very high      Yes         Yes          Yes            Very low

F.6 Examples from 5.6

An organization has deployed a defense-related software release, Version 1.0. Over the next 4 years, they
collect field trouble reports related to the software. They release versions 2.0, 3.0, and 4.0 after 12, 24, and
36 months, respectively. When trouble reports are received from the field, the software engineers investigate
them and determine the release in which the defect was originally introduced. They record the dates of every
trouble report as well. Using the graphical techniques in 5.4.4, they can estimate the defect removal to date
for each version. They also have static code analysis tools that can determine the actual effective size of
each version deployed. The data that they have collected is shown in Table F.32:

Table F.32—Computation of actual defect density


(The first six columns are observed data; the last two columns are computed data.)

Version | Months since deployment | Shortcut survey result | Number of trouble reports due to a software defect in this version | Actual effective size normalized by language | Date of last trouble report | Computed defect density to date | Defect removal via use of technique discussed in 5.4.4 (%)
1.0 | 48 | Medium risk | 200 | 1000 EKSLOC | 12 months ago | 0.2 | 99
2.0 | 36 | Low risk | 120 | 800 EKSLOC | 6 months ago | 0.15 | 95
3.0 | 24 | Medium risk | 220 | 1200 EKSLOC | 1 month ago | 0.184 | 87
4.0 | 12 | Low risk | 44 | 600 EKSLOC | Last week | 0.072 | 76
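
The “computed defect density to date” column is essentially the number of trouble reports attributed to a version divided by its effective size. The sketch below is illustrative only; the figures in Table F.32 are the organization’s own computations and include their rounding.

    # Illustrative sketch: defect density to date from the observed columns
    # of Table F.32 (trouble reports per EKSLOC).
    releases = [
        # (version, trouble reports to date, effective size in EKSLOC)
        ("1.0", 200, 1000),
        ("2.0", 120, 800),
        ("3.0", 220, 1200),
        ("4.0", 44, 600),
    ]

    for version, reports, eksloc in releases:
        density = reports / eksloc   # defects observed to date per EKSLOC
        print(f"Version {version}: defect density to date = {density:.3f}")
    # Versions 1.0 and 2.0 match the 0.2 and 0.15 shown in the table; the
    # 0.184 and 0.072 entries differ slightly from 220/1200 and 44/600,
    # reflecting the organization's own computation.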

Version 1 has been deployed for several years and has not experienced a fault in a year. Since its estimated defect removal is very high, one can use this data as historical data for predicting future software releases. This was a medium risk project, so the average defect density of 0.2 is now recorded as an historical data point for a medium risk project.


It is noted that the Shortcut model prediction for medium risk defect density is 0.239.

Version 2 has been deployed for 3 years, has not experienced a fault in several months, and has a very high defect removal percentage, so it is also applicable as historical data for a low risk project. There are now historical data points for both low and medium risk projects. It is noted that the Shortcut model prediction for low risk defect density is 0.1109, compared to 0.15 for the historical data. It is decided to continue using the Shortcut model to predict low, medium, and high risk, but to use the historical data for the actual predicted defect density.

Versions 3 and 4 have not been deployed long enough to be used as historical data. The data will be revisited next year.
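
A minimal sketch of the screening logic used above, deciding whether a release is mature enough to serve as a historical data point, is shown below. The thresholds (at least 90% estimated defect removal and no field fault for six months) are assumptions chosen for illustration, not values taken from this recommended practice.

    # Illustrative sketch: screening releases for use as historical data
    # points, per the reasoning in F.6. Thresholds are assumed values.
    def usable_as_historical(defect_removal_pct, months_since_last_fault,
                             min_removal=90.0, min_quiet_months=6):
        """Return True if the release can reasonably serve as history."""
        return (defect_removal_pct >= min_removal
                and months_since_last_fault >= min_quiet_months)

    print(usable_as_historical(99, 12))  # Version 1.0 -> True
    print(usable_as_historical(95, 6))   # Version 2.0 -> True
    print(usable_as_historical(76, 0))   # Version 4.0 -> False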


Annex G

(informative)

Bibliography

Bibliographical references are resources that provide additional or helpful material but do not need to be
understood or used to implement this standard. Reference to these resources is made for informational use
only.

[B1] “A Comparative Study of Test Coverage-Based Software Reliability Growth Models,” Proceedings
of the 2014 11th International Conference on Information Technology: New Generations (ITNG ‘14),
IEEE Computer Society, Washington, DC.
[B2] Ambler, Scott, and Mark Lines, Disciplined Agile Delivery: A Practitioner’s Guide to Agile
Software Delivery in the Enterprise. IBM Press, 2012.
[B3] AMSAA Technical Report No. TR-652, AMSAA Reliability Growth Guide, US Army Materiel Systems Analysis Activity, Aberdeen Proving Ground, MD, 2000.
[B4] Beizer, Boris, Software Testing Techniques. Van Nostrand Reinhold, 2nd Edition, June, 1990.
[B5] Binder, Robert V., Beware of Greeks bearing data. Copyright Robert V. Binder, 2014.
[B6] Binder, Robert V., Testing Object Oriented Systems—Models, Patterns, and Tools. Addison-Wesley,
1999.
[B7] Boehm, Barry, et al., Software Cost Estimation with COCOMO II (with CD-ROM). Englewood
Cliffs: Prentice-Hall, 2000.
[B8] Buglione, Luigi, and Christof Ebert, “Estimation Tools and Techniques,” IEEE Software, May/June
2011.
[B9] Chao, A., S. M. Lee, and S. L. Jeng, “Estimating Population Size for Capture-Recapture Data When Capture Probabilities Vary by Time and Individual Animal,” Biometrics, vol. 48, pp. 201–216, 1992. Available at http://warnercnr.colostate.edu/~gwhite/software.html.
[B10] Common Weakness Enumeration, A community developed dictionary of software weakness types. CWE Version 2.6, edited by Steven Christey, Ryan P. Coley, Janis F. Glenn Kenderdine, and Mazella; Project Lead: Robert B. Martin. Copyright Mitre Corporation, 2014. Available at http://cwe.mitre.org/.
[B11] Cutting, Thomas, “Estimating Lessons Learned in Project Management—Traditional,” January 9,
2009.
[B12] DeMarco, Anthony, White Paper: The PRICE TruePlanning Estimating Suite, 2007.
[B13] Department of the Air Force, Software Technology Support Center, Guidelines for Successful Acquisition and Management of Software-Intensive Systems: Weapon Systems, Command and Control Systems, Management Information Systems, Version 3.0, May 2000.
[B14] Duane, J. T., “Learning curve approach to reliability monitoring,” IEEE Transactions on Aerospace,
vol. 2, no. 2, pp. 563–566, April 1964.
[B15] Engineering Design Handbook: Design for Reliability, AMCP 706-196, ADA 027370, 1976.
[B16] Erickson, Ken, “Asynchronous FPGA risks,” California Institute of Technology, Jet Propulsion Laboratory, Pasadena, CA 91109, 2000 MAPLD International Conference, September 26–28, 2000.
[B17] Farr, Dr. William, “A Survey of Software Reliability Modeling and Estimation,” NSWC TR 82-171,
Naval Surface Weapons Center, Dahlgren, VA, Sept. 1983.
[B18] Fischman, Lee, Karen McRitchie, and Daniel D. Golorath, “Inside SEER-SEM,” CrossTalk, The
Journal of Defense Software Engineering, April 2005.


[B19] Goel, A. L., and Okumoto, K., “Time-dependent error-detection rate model for software reliability and other performance measures,” IEEE Transactions on Reliability, vol. R-28, no. 3, pp. 206–211, 1979.
[B20] Gokhale, S., and K. Trivedi, “Log-logistic software reliability growth model,” Proceedings IEEE
High-Assurance Systems Engineering Symposium, pp. 34–41, 1998.
[B21] Grottke, Michael, Allen Nikora, and Kishor Trivedi, “An empirical investigation of fault types in
space mission system software,” Proceedings 40th Annual IEEE/IFIP International Conference on
Dependable Systems and Networks, pp. 447–456, 2010.
[B22] Grottke, Michael, and Benjamin Schleich, “How does testing affect the availability of aging software
systems?” Performance Evaluation 70(3):179–196, 2013.
[B23] Grottke, Michael, and Kishor Trivedi, “Fighting bugs: Remove, retry, replicate, and rejuvenate,”
IEEE Computer 40(2):107–109, 2007.
[B24] Grottke, Michael, et al., “Recovery from software failures caused by Mandelbugs,” IEEE
Transactions on Reliability, 2016 (in press)
[B25] Grottke, Michael, Rivalino Matias Jr., and Kishor Trivedi. “The fundamentals of software aging,”
Proceedings First International Workshop on Software Aging and Rejuvenation/19th IEEE International
Symposium on Software Reliability Engineering, 2008.
[B26] Hatton, Les, “Estimating source lines of code from object code: Windows and embedded control
systems,” CISM University of Kingston, August 3, 2005. Available at http://www.leshatton.org/
Documents/LOC2005.pdf.
[B27] Hayhurst, Kelly J., et al., A Practical Tutorial on Modified Condition/Decision Coverage. TM-2001-
210876, National Aeronautics and Space Administration, Langley Research Center, Hampton, Virginia.
2001. Available at http://shemesh.larc.nasa.gov/fm/papers/Hayhurst-2001-tm210876-MCDC.pdf.
[B28] Huang, Yennun, et al., “Software rejuvenation: Analysis, module and applications,” Proceedings
25th International Symposium on Fault-Tolerant Computing, 1995, pp. 381–390.
[B29] IEC 61014:2003 (2nd Edition), Programmes for Reliability Growth. 18
[B30] IEEE P24748-5 (D3 July 2015), IEEE Draft International Standard—Systems and Software
Engineering—Life Cycle Management—Part 5: Software Development Planning. 19
[B31] IEEE Std 610™-1990, IEEE Standard Computer Dictionary: A Compilation of IEEE Standard
Computer Glossaries (withdrawn). 20
[B32] IEEE Std 730™-2014, IEEE Standard for Software Quality Assurance. 21, 22

[B33] IEEE Std 1012™-2012, IEEE Standard for System and Software Verification and Validation.
[B34] IEEE Std 15026-3™-2013, IEEE Standard Adoption of ISO/IEC 15026-3—Systems and Software
Engineering—Systems and Software Assurance—Part 3: System Integrity Levels.
[B35] ISO/IEC 19761:2011, Software engineering—COSMIC: A functional size measurement method. 23
[B36] Jacobson, Ivar, Grady Booch, and James Rumbaugh, The Unified Software Development Process. Addison-Wesley, 1999.

18 IEC publications are available from the International Electrotechnical Commission (http://www.iec.ch/). IEC publications are also available in the United States from the American National Standards Institute (http://www.ansi.org/).
19 Numbers preceded by P are IEEE authorized standards projects that were not approved by the IEEE-SA Standards Board at the time this publication went to press. For information about obtaining drafts, contact the IEEE.
20 IEEE Std 610-1991 has been withdrawn; however, copies can be obtained from The Institute of Electrical and Electronics Engineers (http://standards.ieee.org/).
21 The IEEE standards or products referred to in this clause are trademarks of The Institute of Electrical and Electronics Engineers, Inc.
22 IEEE publications are available from The Institute of Electrical and Electronics Engineers (http://standards.ieee.org/).
23 ISO/IEC publications are available from the ISO Central Secretariat (http://www.iso.org/). ISO publications are also available in the United States from the American National Standards Institute (http://www.ansi.org/).


[B37] Jelinski, Z., and Moranda, P., “Software Reliability Research,” Statistical Computer Performance
Evaluation, Freiberger, W., ed., New York: Academic Press, 1972, pp. 465–484.
[B38] Joint Capabilities Integration and Development System (JCIDS) 12 February 2015.
[B39] Jones, Capers, Applied Software Measurement: Assuring Productivity and Quality. McGraw-Hill,
June 1996.
[B40] Jones, Capers, “Methods Needed to Achieve >99% Defect Removal Efficiency (DRE) for
Software,” Draft 2.0, August 10, 2015, Namcook Analytics LLC, Copyright © 2015 by Capers Jones.
[B41] Jones, Capers, “Software Industry Blindfolds: Invalid Metrics and Inaccurate Metrics”; Namcook
Analytics, November 2005.
[B42] Jones, Capers, “Software Risk Master (SRM) Sizing and Estimating Examples,” Namcook Analytics
LLC, Version 10.0, April 29, 2015.
[B43] Keene, S. J., “Modeling software R&M characteristics,” Parts I and II, Reliability Review, June and
September 1997. [The Keene-Cole model was developed in 1987 based on 14 data sets. It has not been
updated since that time.]
[B44] Kenny, G., “Estimating defects in commercial software during operational use,” IEEE Transactions on Reliability, vol. 42, no. 1, pp. 107–115, March 1993.
[B45] Lakey, Peter, and A. M. Neufelder, System Software Reliability Assurance Guidebook, Table 7-9,
1995, produced for Rome Laboratories.
[B46] Lakey, Peter, and A. M. Neufelder, System and Software Reliability Assurance Notebook, Rome
Laboratory, 1997.
[B47] Lakey, Peter, “Operational Profile Development,” 2015. Available at https://www.scribd.com/
doc/279880170/Operational-Profile-Development
[B48] Lakey, Peter, “Operational Profile Testing,” 2015. Available at https://www.scribd.com/
doc/279880252/Operational-Profile-Testing.
[B49] Lakey, Peter, “Software Reliability Assurance through Automated Operational Profile Testing,”
November 6, 2013.
[B50] Laplante, Phillip A., Real-Time Systems Design and Analysis: An Engineer’s Handbook, pp. 208–209, IEEE Press, Piscataway, NJ, 1992.
[B51] Larman, Craig, Agile and Iterative Development. Addison-Wesley Professional, 2004.
[B52] Mars rover; see http://www.cs.berkeley.edu/~brewer/cs262/PriorityInversion.html.
[B53] McCabe, Thomas, Structured System Testing, 12th edition, McCabe & Associates, Columbia, MD,
1985.
[B54] MIL-HDBK-338B, Military Handbook: Electronic Reliability Design Handbook, October 1, 1998. 24
[B55] MIL-HDBK-781A, Military Handbook: Reliability Test Methods, Plans, and Environments for
Engineering, Development Qualification, and Production (01 APR 1996).
[B56] MIL-STD 1629A, Procedures for Performing a Failure Mode, Effects and Criticality Analysis,
November 24, 1980.
[B57] Moranda, P., “Event-altered rate models for general reliability analysis,” IEEE Transactions on
Reliability, vol. 28, no. 5, pp. 376–381, Dec. 1979.
[B58] Musa, J. D., and Okumoto, K., “A logarithmic Poisson execution time model for software reliability
measurement,” Proceedings of the Seventh International Conference on Software Engineering, Orlando,
FL, pp. 230–238, Mar. 1984.

24 MIL publications are available from the U.S. Department of Defense (http://quicksearch.dla.mil/).


[B59] Musa, J. D., A. Iannino, and K. Okumoto, Software Reliability: Measurement, Prediction, Application. New York: McGraw-Hill, pp. 156–158, 1987.
[B60] Musa, J. D., “Operational Profiles in Software Reliability Engineering,” AT&T Bell Laboratories.
IEEE Software, March 1993, and “The operational profile in software reliability engineering: an overview,”
in Third International Symposium on Software Reliability Engineering, 1992.
[B61] NASA-GB-8719.13, NASA Software Safety Guidebook, 6.6.5 and 7.5.14, March 31, 2004. 25
[B62] NASA/SP-2007-6105 Rev1, NASA Systems Engineering Handbook, Section 4.0.
[B63] Neufelder, A. M., “A Practical Toolkit for Predicting Software Reliability,” ARS Symposium, June
14, 2006, Orlando, Florida. Copyright Softrel, LLC 2006.
[B64] Neufelder, A. M., “Effective Application of Software Failure Modes Effects Analysis,” A CSIAC
State-of-the-Art Report, CSIAC Report Number 519193, Contract FA8075-12-D-0001, Prepared for the
Defense Technical Information Center, 2014.
[B65] Neufelder, A. M., “Four things that are almost guaranteed to reduce the reliability of a software
intensive system,” Huntsville Society of Reliability Engineers RAMS VII Conference, November 4, 2014.
Copyright 2014.
[B66] Neufelder, A. M., “Software Reliability for Practitioners,” Technical Report by Softrel, LLC,
November, 2015.
[B67] Neufelder, A. M, “Software Reliability Toolkit for Predicting and Managing Software Defects,”
November 2010.
[B68] Neufelder, A. M., “The Cold Hard Truth about Reliable Software,” edition 6e, originally published
in 1993 and updated to version 6e in 2015. [This document describes the lookup tables for defect density as
well as the data that was used to compute the average defect densities in the table.]
[B69] Neumann, Peter G., and Donn B. Parker; “A summary of computer misuse techniques,” Proceedings
of the 12th National Computer Security Conference, pp. 396–407.
[B70] Pohland, Timothy, and David Bernreuther, Scorecard Reviews for Improved Software Reliability,
Defense AT&L, Jan-Feb 2014.
[B71] Putnam, Lawrence H., Measures for Excellence. Yourdon Press, 1992.
[B72] Quanterion Solutions Inc., “Handbook of 217Plus™:2015 Reliability Prediction Models” (Dec. 15,
2014) and “217Plus™:2015 Calculator.”
[B73] Radio Technical Commission for Aeronautics (RTCA), Software Considerations in Airborne
Systems and Equipment Certification, DO-178C, 12/13/11.
[B74] Rational Unified Process available at http://en.wikipedia.org/wiki/Unified_Process.
[B75] Bril, Reinder J., “Real-Time Architectures 2006/2007, Scheduling policies—III Resource access protocols” (courtesy of Johan J. Lukkien). Available at http://www.win.tue.nl/~rbril/education/2IN20/RTA.B4-Policies-3.pdf.
[B76] Rexstad, F., and K. P. Burnham, User’s Guide for Interactive Program CAPTURE, Colorado
Cooperative Fish & Wildlife Research Unit, Colorado State University, Fort Collins, Colorado, 1991.
[B77] Science Applications International Corporation & Research Triangle Institute, Software Reliability
Measurement and Testing Guidebook, Final Technical Report, Rome Air Development Center, Griffiss Air
Force Base, New York, January 1992. Available at http://www.softrel.com/publications/RL.
[B78] Shi, Y., et al., “Metric-based Software Reliability Prediction Approach and its Application,”
Empirical Software Engineering Journal, 2015.

25 NASA publications are available from the National Aeronautics and Space Administration (http://www.nasa.gov/).


[B79] Shi, Y., M. Li, and C. Smidts, “On the Use of Extended Finite State Machine Models for Software
Fault Propagation and Software Reliability Estimation,” 6th American Nuclear Society International
Topical Meeting on Nuclear Plant Instrumentation, Controls, and Human Machine Interface Technology,
Knoxville, Tennessee, 2009.
[B80] Shooman, M. L., and G. Richeson, “Reliability of Shuttle Mission Control Center Software,”
Proceedings of the Annual Reliability and Maintainability Symposium, 1983, pp. 125–135 [best conference
paper].
[B81] Shooman, M. L., Probabilistic Reliability: An Engineering Approach. New York: McGraw-Hill
Book Co., 1968 (2nd edition, Melbourne, FL: Krieger, 1990).
[B82] Shooman, M. L., Reliability of Computer Systems and Networks, Fault Tolerance, Analysis, and
Design. New York: McGraw-Hill, 2002. p. 234. [Dr. Shooman denotes the inherent defects as ET, which is
equivalent to N0.]
[B83] Shooman, M. L., “Software Reliability Growth Model Based on Bohr and Mandel Bugs,”
Proceedings of International Symposium on Software Reliability Engineering, Washington DC, Nov. 2–5,
2015.
[B84] Smidts, C., and M. Li, “Software Engineering Measures for Predicting Software Reliability in Safety
Critical Digital Systems,” NRC, Office of Nuclear Regulatory Research, Washington DC, NUREG/GR-
0019, 2000.
[B85] Smidts, C., et al., “A Large Scale Validation of a Methodology for Assessing Software Quality,”
NUREG report for the US Nuclear Regulatory Commission, NUREG/CR-7042, July 2011.
[B86] Society of Automotive Engineers, Recommended Practice, Software Reliability Program
Implementation Guide, Standard by SAE International, 05/07/2012. [The full-scale model is presented in
this document.] 26
[B87] Society of Automotive Engineers, SAE ARP 5580 Recommended Failure Modes and Effects
Analysis (FMEA) Practices for Non-Automobile Applications, July 2001.
[B88] Musa, J. D., Software Reliability Engineering: More Reliable Software Faster and Cheaper, Chapter 6: Guiding Test, 2nd edition, AuthorHouse, 2004.
[B89] The Handbook of Software Reliability Engineering, edited by Michael R. Lyu, published by IEEE Computer Society Press and McGraw-Hill Book Company, ISBN 0-07-039400-8. Available at http://www.cse.cuhk.edu.hk/~lyu/book/reliability/.
[B90] Tian, J., “Integrating time domain and input domain analyses of software reliability using tree-based
models,” IEEE Transactions on Software Engineering, vol. 21, issue 12, pp. 945–958.
[B91] Tian, J., Software Quality Engineering: Testing, QA, and Quantifiable Improvement. Hoboken: John
Wiley & Sons, Inc., 2005, ISBN 0-471-71345-7. Purification is discussed briefly on p. 384. Available at
http://ff.tu-sofia.bg/~bogi/France/SoftEng/books/software_quality_engineering_testing_quality_assurance_
and_quantifiable_improvement_wiley.pdf.
[B92] US General Accounting Office, GAO-10-706T, “Defense acquisitions: observations on weapon
program performance and acquisition reforms,” May 19, 2010.
[B93] Vesely, W. E., et al., “Fault Tree Handbook NUREG 0492,” US Nuclear Regulatory Commission,
1981.
[B94] Voas, J. M., “PIE: A Dynamic Failure-Based Technique,” IEEE Transactions on Software
Engineering, vol. 18, pp. 717–727, 1992.
[B95] Von Alven, W. H., Reliability Engineering. Englewood Cliffs: Prentice Hall, 1964.

26 SAE publications are available from the Society of Automotive Engineers (http://www.sae.org/).


[B96] White, G. C., et al., User’s Manual for Program CAPTURE. Logan: Utah State University Press,
1978.
[B97] Yamada, S., M. Ohba, and S. Osaki, “S-shaped reliability growth modeling for software error
detection,” IEEE Transactions on Reliability, vol. R-32, no. 5, pp. 475–478, Dec. 1983.
