Tutorial 6:
5th International Conference on Reliable Software Technologies
(Ada-Europe 2000) June 26-30, 2000, Potsdam (Berlin), Germany
Tree-Based Reliability Models (TBRMs) for
Early Reliability Measurement and Improvement
Prof. Jeff Tian, SMU, Dallas, Texas, USA
Time: Monday, June 26, 2000
Duration: Half day (afternoon)
1. General Information
This tutorial surveys recent developments in software reliability engineering,
particularly recent work in using tree-based reliability models (TBRMs)
in analyzing product reliability and identifying high risk areas for focused
reliability improvement for large software systems.
Environmental constraints and existing analysis techniques
are carefully examined to select appropriate existing techniques
and develop new ones to build our integrated approach,
implementation strategies, and support tools suitable for
large software systems.
Specific activities in our integrated approach include:
- measure definition,
- data gathering,
- graphical test activity and defect tracking,
- overall quality assessment using software reliability growth models
(SRGMs),
- identification of problematic areas and risk management using TBRMs,
- followup actions and strategy validation.
Various existing software tools have been adapted
and integrated to support these analysis and modeling activities.
This approach has been used in the testing phase of several large
software products developed in the IBM Software Solutions Toronto Laboratory
and was demonstrated to be effective and efficient.
More recently,
this approach has also been applied to improve the reliability
of telecommunication software systems developed at Nortel Networks,
with promising initial results.
Various practical problems and solutions in implementing this strategy
are also discussed.
1.1. Keywords
- large software systems,
- reliability,
- testing,
- risk identification and management,
- tools and environments.
1.2. Audience
The tutorial is designed for a general technical audience,
such as attendees of any related technical conference.
Familiarity with general software development activities and processes,
and with the concept of quality, is assumed, but no specific knowledge
of software reliability engineering or risk identification
techniques is required.
1.3. Reading Material
Each participant of the tutorial will be provided with a tutorial
notes packet, including the following material:
- General tutorial information (this document).
- Copy of the presentation slides.
- Copy of the following papers to be discussed in the tutorial:
- J. Tian,
"Reliability Measurement, Analysis, and Improvement
for Large Software Systems",
in Advances in Computers, Vol. 46, pp. 159-235, Academic Press, 1998.
- J. Tian,
"Measurement and Continuous Improvement of Software Reliability
throughout Software Life-cycle",
Journal of Systems and Software,
Vol. 47, Nos. 2-3, pp. 189-195, July 1999.
- J. Tian,
"Techniques for Risk Identification and Quality Improvement",
Software Quality Professional,
Vol. 2, No. 2, pp. 32-41, March 2000.
1.4. Project Background and Acknowledgment
The work described in this tutorial is supported by the following
organizations and/or grants:
- IBM Software Solutions Toronto Laboratory,
where the tutorial presenter (Dr. Jeff Tian) worked between 1992 and 1995,
with continued support and collaboration provided to Dr. Tian since 1995.
- NSF/CAREER award CCR-9733588, 6/1/1998 -- 5/31/2002.
- Nortel Networks,
with collaboration and project sponsorship since 1996.
- Texas THECB/ATP award to Dr. Tian, 1/1/2000 -- 12/31/2001.
2. Topics to Be Covered
2.1. Techniques and Models for Analyzing Software Reliability
A survey of existing reliability analysis
techniques and commonly used software reliability models,
including both the time domain software reliability growth models (SRGMs)
and the input domain reliability models (IDRMs),
and discussions about their common assumptions and applicability,
are presented.
Specific topics include:
- Basic definitions and concepts about software
quality, reliability, and related analyses.
- A brief discussion about
testing techniques, operational profiles (OP)
and their relation to reliability.
- Definitions and techniques for
defining and measuring reliability in the time domain,
and a brief survey of various SRGMs used for this purpose.
- A brief survey of
input domain reliability analysis techniques
and some specific IDRMs.
- Discussions about
general assumptions common to many SRGMs and IDRMs
and their implications.
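To make the two model families concrete, the following is a minimal sketch in Python. It assumes two well-known textbook models as representatives: the Nelson model for the input domain and the Goel-Okumoto model for the time domain. The outline above does not name the specific models surveyed in the tutorial, so both choices, and all function and parameter names, are illustrative assumptions.

```python
import math


def nelson_reliability(total_runs: int, failed_runs: int) -> float:
    """Nelson input domain estimate: R = 1 - f/n.

    n is the number of runs sampled from the input domain and
    f is the number of those runs that failed.
    """
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    if not 0 <= failed_runs <= total_runs:
        raise ValueError("failed_runs must be between 0 and total_runs")
    return 1.0 - failed_runs / total_runs


def goel_okumoto_expected_failures(t: float, a: float, b: float) -> float:
    """Goel-Okumoto mean value function m(t) = a * (1 - exp(-b * t)).

    a is the expected total number of failures, b the per-failure
    detection rate, and t the (calendar or execution) time so far.
    """
    return a * (1.0 - math.exp(-b * t))


def goel_okumoto_failure_intensity(t: float, a: float, b: float) -> float:
    """Failure intensity, the derivative of m(t): a * b * exp(-b * t)."""
    return a * b * math.exp(-b * t)
```

The input domain estimate uses only run counts, while the time domain model needs a time measurement and fitted parameters; this difference in required data is one reason the applicability of each family depends on the testing environment.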
2.2. Applying Existing Approaches in Large Software Systems
We first
describe the testing environment for large software systems and
the specific needs for quality assessment and improvement under
such an environment.
Specific topics in this area include:
- Examining the testing process and workload characteristics,
and characterizing scenario-based testing commonly used
in testing large software systems.
- Specifying the testing environment, measurements, and constraints.
- Discussing the appropriateness
of reliability analysis in scenario-based testing
by matching model assumptions with the application environment.
Secondly,
we discuss the test activities and workload measurements
and some recent results applying various SRGMs
in assessing and predicting reliability for large software systems.
Specific topics in this area include:
- Test workload measurement and reliability growth visualization
to examine the overall trend and pattern in failure arrivals.
- Using calendar time, run count, and execution time failure data in SRGMs,
and examining the modeling results.
- General conclusions and recommendations for effective usage of
SRGMs in large software systems.
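As a rough illustration of how the same test log yields different failure data depending on the time measurement chosen, the sketch below computes cumulative failure counts indexed by run count and by calendar day. The log layout (one day number and one failed flag per run) and the function names are assumptions made for this example, not the format used in the projects described here.

```python
from collections import Counter
from itertools import accumulate


def cumulative_failures_by_run(failed_flags):
    """Cumulative failure count after each run, in execution order.

    failed_flags: sequence of booleans, True meaning the run failed.
    """
    return list(accumulate(int(failed) for failed in failed_flags))


def cumulative_failures_by_day(days, failed_flags):
    """Cumulative failure count at the end of each calendar day.

    days[i] is the (hypothetical) day number on which run i executed.
    Returns a list of (day, cumulative_failures) pairs sorted by day.
    """
    per_day = Counter()
    for day, failed in zip(days, failed_flags):
        per_day[day] += int(failed)
    ordered_days = sorted(per_day)
    totals = accumulate(per_day[day] for day in ordered_days)
    return list(zip(ordered_days, totals))
```

Plotting either sequence gives a simple reliability growth curve; a flattening curve suggests that failures are arriving more slowly as testing proceeds.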
2.3. Tree-Based Reliability Models (TBRMs)
We provide a thorough description of the tree-based reliability models (TBRMs)
and their application in identifying high risk (low reliability) areas
for focused reliability improvement.
Specific topics include:
- An assessment of SRGMs and IDRMs
for applications in large software systems,
and possibilities and motivations for integrated analysis.
- Integrated analysis and tree-based modeling,
and the resultant tree-based reliability models (TBRMs).
- Analysis integration and TBRM applications.
- TBRMs' impact on reliability improvement: A cross validation
study based on purification level comparisons of several IBM products.
- An extension of TBRMs:
A new type of SRGMs based on data clustering (SRGM-DC) analysis
and its applications,
including discussion about both the
direct usage of SRGM-DC and dual model based grouped data.
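The general idea of partitioning test runs into subsets with different success rates can be sketched as below. This is a simplified illustration of recursive binary partitioning on numeric attributes, not the tutorial's actual TBRM construction or tooling: the Run record, the attribute names, the squared-deviation split criterion, and the min_size stopping rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Run:
    attrs: dict      # hypothetical numeric predictors, e.g. {"day": 3}
    success: bool    # True if the run completed without failure


def success_rate(runs):
    """Fraction of successful runs in a non-empty subset."""
    return sum(r.success for r in runs) / len(runs)


def deviance(runs):
    """Sum of squared deviations of 0/1 outcomes from the subset mean."""
    if not runs:
        return 0.0
    p = success_rate(runs)
    return sum((float(r.success) - p) ** 2 for r in runs)


def best_split(runs, attr_names, min_size=2):
    """Find the binary split that most reduces deviance, or return None."""
    best, best_score = None, deviance(runs)
    for name in attr_names:
        values = sorted({r.attrs[name] for r in runs})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [r for r in runs if r.attrs[name] < cut]
            right = [r for r in runs if r.attrs[name] >= cut]
            if len(left) < min_size or len(right) < min_size:
                continue
            score = deviance(left) + deviance(right)
            if score < best_score - 1e-12:
                best, best_score = (name, cut, left, right), score
    return best


def partition(runs, attr_names, min_size=2, path=()):
    """Recursively split runs; return leaves as (path, size, success_rate)."""
    split = best_split(runs, attr_names, min_size)
    if split is None:
        return [(path, len(runs), success_rate(runs))]
    name, cut, left, right = split
    return (partition(left, attr_names, min_size, path + (f"{name} < {cut}",))
            + partition(right, attr_names, min_size, path + (f"{name} >= {cut}",)))
```

Sorting the returned leaves by success rate puts the lowest-reliability subsets first, and the path of each leaf describes the conditions that characterize that subset, which is the kind of information needed to focus improvement effort.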
2.4. Integration, Implementation, and Tool Support
We describe implementation issues and software tool support for
various reliability analyses covered in this tutorial.
Specific topics include:
- General implementation and process linkage,
covering both the specific modifications to the existing testing
process and overall integration with the software development process.
- Tool support for data collection, analyses, and presentation,
and our existing implementation.
- Integration and future development.
2.5. Followup Studies: New Techniques and Complete Lifecycle Approach
Some followup studies cover the comparison of tree-based modeling
(TBM) with other risk identification techniques,
and extension of our TBRMs to support reliability measurement and
improvement over other development phases.
The techniques examined include:
- Traditional statistical techniques, including
correlation analysis, linear regression models,
logistic analysis, etc.
- New statistical techniques, including tree-based modeling,
principal component analysis, and discriminant analysis.
- AI-based techniques,
including artificial neural networks and optimal set reduction
(a pattern matching approach).
The comparative results and conclusions are discussed;
they point to the appropriateness of the tree-based modeling
technique for our integrated approach.
We are also conducting studies to extend the integrated approach
based on TBRMs to cover
other development phases.
In addition to software reliability engineering and
recent developments in the areas being examined,
work in software measurement, inspection,
and overall process management is also studied
to derive our complete lifecycle approach.
Discussions of the future directions in this on-going research
and preliminary results are also presented.
Prepared by Jeff Tian
(tian@seas.smu.edu).
Last update March 16, 2000.