Tutorial 6: 5th International Conference on Reliable Software Technologies
(Ada-Europe 2000) June 26-30, 2000, Potsdam (Berlin), Germany

Tree-Based Reliability Models (TBRMs) for Early Reliability Measurement and Improvement

Prof. Jeff Tian, SMU, Dallas, Texas, USA

Time: Monday, June 26, 2000

Duration: Half day (afternoon)

1. General Information

This tutorial surveys recent developments in software reliability engineering, particularly the use of tree-based reliability models (TBRMs) to analyze product reliability and to identify high-risk areas for focused reliability improvement in large software systems. We carefully examine environmental constraints and existing analysis techniques in order to select appropriate existing techniques and develop new ones. From these we build our integrated approach, implementation strategies, and support tools for large software systems. Specific activities in our integrated approach include:

measure definition,
data gathering,
graphical test activity and defect tracking,
overall quality assessment using software reliability growth models (SRGMs),
identification of problematic areas and risk management using TBRMs,
follow-up actions and strategy validation.

Various existing software tools have been adapted and integrated to support these analysis and modeling activities. This approach has been used in the testing phase of several large software products developed at the IBM Software Solutions Toronto Laboratory, where it was demonstrated to be effective and efficient. More recently, it has also been applied to improve the reliability of telecommunication software systems developed at Nortel Networks, with promising initial results. The tutorial also discusses practical problems encountered in implementing this strategy, together with their solutions.

1.1. Keywords

large software systems,
reliability,
testing,
risk identification and management,
tools and environments.

1.2. Audience

The tutorial is designed for a general technical audience, such as the attendees of any related technical conference.

Familiarity with general software development activities and processes, and with the concept of quality, is assumed. No specific knowledge of software reliability engineering or risk identification techniques is assumed.

1.3. Reading Material

Each tutorial participant will be provided with a packet of tutorial notes, including the following material:

General tutorial information (this document).
Copy of the presentation slides.
Copy of the following papers to be discussed in the tutorial:
1. J. Tian, "Reliability Measurement, Analysis, and Improvement for Large Software Systems", in Advances in Computers, Vol. 46, pp. 159-235, Academic Press, 1998.
2. J. Tian, "Measurement and Continuous Improvement of Software Reliability throughout Software Life-cycle", Journal of Systems and Software, Vol. 47, Nos. 2-3, pp. 189-195, July 1999.
3. J. Tian, "Techniques for Risk Identification and Quality Improvement", Software Quality Professional, Vol. 2, No. 2, pp. 32-41, March 2000.

1.4. Project Background and Acknowledgment

The work described in this tutorial is supported by the following organizations and/or grants:

IBM Software Solutions Toronto Laboratory, where the tutorial presenter (Dr. Jeff Tian) worked from 1992 to 1995, and which has continued to support and collaborate with Dr. Tian since 1995.
NSF/CAREER award CCR-9733588, 6/1/1998 -- 5/31/2002.
Nortel Networks, with collaboration and project sponsorship since 1996.
Texas THECB/ATP award to Dr. Tian, 1/1/2000 -- 12/31/2001.

2. Topics to Be Covered

2.1. Techniques and Models for Analyzing Software Reliability

We survey existing reliability analysis techniques and commonly used software reliability models, including both time-domain software reliability growth models (SRGMs) and input-domain reliability models (IDRMs), and discuss their common assumptions and applicability. Specific topics include:

Basic definitions and concepts about software quality, reliability, and related analyses:
A brief discussion about testing techniques, operational profiles (OP) and their relation to reliability.
Definitions and techniques for defining and measuring reliability in the time domain, and a brief survey of various SRGMs used for this purpose.
A brief survey of input domain reliability analysis techniques and some specific IDRMs.
Discussions of general assumptions common to many SRGMs and IDRMs, and their implications.
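As a concrete reference point for the two model families above, the following minimal sketch evaluates one widely used SRGM, the Goel-Okumoto model, whose expected cumulative failures follow mu(t) = a(1 - exp(-b t)), and the Nelson model, a common IDRM that estimates reliability as the fraction of sampled runs that succeed. These two models are chosen here only for illustration and are not necessarily the specific models covered in the tutorial; the function names and parameter values are assumptions.

```python
import math


def go_mean_failures(t: float, a: float, b: float) -> float:
    """Goel-Okumoto expected cumulative failures by time t: a * (1 - exp(-b * t)).

    a: expected total number of failures eventually observed (assumed known here).
    b: per-failure detection rate.
    """
    return a * (1.0 - math.exp(-b * t))


def go_failure_intensity(t: float, a: float, b: float) -> float:
    """Goel-Okumoto failure intensity (derivative of the mean): a * b * exp(-b * t)."""
    return a * b * math.exp(-b * t)


def nelson_reliability(failed_runs: int, total_runs: int) -> float:
    """Nelson input-domain estimate: fraction of sampled runs that succeeded, 1 - f / n."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return 1.0 - failed_runs / total_runs
```

With hypothetical values a = 100 and b = 0.05, the expected cumulative failures approach 100 as t grows while the failure intensity falls from a * b = 5 at t = 0, which is the reliability growth pattern SRGMs assume; the Nelson estimate for 12 failures in 400 runs is 0.97.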

2.2. Applying Existing Approaches in Large Software Systems

We first describe the testing environment for large software systems and the specific needs for quality assessment and improvement in such an environment. Specific topics in this area include:

Examining the testing process and workload characteristics, and characterizing scenario-based testing commonly used in testing large software systems.
Specifying the testing environment, measurements, and constraints.
Discussing the appropriateness of reliability analysis in scenario-based testing by matching model assumptions with the application environment.

Second, we discuss test activity and workload measurements, and recent results of applying various SRGMs to assess and predict reliability for large software systems. Specific topics in this area include:

Test workload measurement and reliability growth visualization to examine the overall trend and pattern in failure arrivals.
Using calendar-time, run-count, and execution-time failure data in SRGMs, and examining the modeling results.
General conclusions and recommendations for effective usage of SRGMs in large software systems.
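The fitting step can be sketched as follows, assuming the Goel-Okumoto model mu(t) = a(1 - exp(-b t)) and cumulative failure counts recorded against a usage measure t such as run count. This minimal sketch uses plain least squares with a grid search over b (for a fixed b the model is linear in a, so the best a has a closed form); it is a simple stand-in for the maximum likelihood fitting usually used with SRGMs. The function name, the b grid, and the synthetic data in the usage note are assumptions for illustration, not the tutorial's tools or data.

```python
import math
from typing import Sequence, Tuple


def fit_goel_okumoto(times: Sequence[float], cum_failures: Sequence[float],
                     b_grid: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of mu(t) = a * (1 - exp(-b * t)) to cumulative failures.

    times may be calendar time, run count, or execution time.
    For each candidate b, the SSE-minimizing a is sum(y*g) / sum(g*g),
    where g = 1 - exp(-b * t). Returns (a, b, sse) for the best candidate.
    """
    best = None
    for b in b_grid:
        g = [1.0 - math.exp(-b * t) for t in times]
        denom = sum(gi * gi for gi in g)
        if denom == 0.0:
            continue  # this b carries no information (e.g. all t == 0)
        a = sum(yi * gi for yi, gi in zip(cum_failures, g)) / denom
        sse = sum((yi - a * gi) ** 2 for yi, gi in zip(cum_failures, g))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    if best is None:
        raise ValueError("no usable b in b_grid")
    return best
```

On noise-free synthetic data generated with a = 120 and b = 0.002 at run counts 0, 100, ..., 2000, searching b over 0.0001, 0.0002, ..., 0.01 recovers those values. Because t can be calendar time, run count, or execution time, the same fit can be repeated on each measure and the resulting models compared.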

2.3. Tree-Based Reliability Models (TBRMs)

We provide a thorough description of tree-based reliability models (TBRMs) and their application in identifying high-risk (low-reliability) areas for focused reliability improvement. Specific topics include:

An assessment of SRGMs and IDRMs for applications in large software systems, and possibilities and motivations for integrated analysis.
Integrated analysis and tree-based modeling, and the resultant tree-based reliability models (TBRMs).
Analysis integration and TBRM applications.
TBRMs' impact on reliability improvement: A cross validation study based on purification level comparisons of several IBM products.
An extension of TBRMs: a new type of SRGM based on data clustering analysis (SRGM-DC) and its applications, including discussion of both the direct use of SRGM-DC and dual-model-based grouped data.
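The core idea of tree-based modeling can be illustrated with a generic recursive-partitioning sketch: each test run is described by attributes and has a 0/1 failure outcome, and the runs are repeatedly split on the attribute value that most reduces the within-group sum of squared deviations. Leaves with high failure rates (low reliability, estimated as 1 minus the failure rate in the spirit of the Nelson model) point to candidate high-risk areas. This is an illustrative regression-tree sketch under stated assumptions, not the TBRM implementation used in the work described here; the attribute names (component, scenario), the min_size stopping rule, and the example data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical run record: attribute values (strings) plus "failed" (0 or 1).
Run = Dict[str, object]


@dataclass
class Node:
    runs: List[Run]
    split: Optional[Tuple[str, object]] = None  # (attribute, value) if this node was split
    left: Optional["Node"] = None   # runs where attribute == value
    right: Optional["Node"] = None  # all remaining runs

    @property
    def failure_rate(self) -> float:
        if not self.runs:
            return 0.0
        return sum(r["failed"] for r in self.runs) / len(self.runs)


def sse(runs: List[Run]) -> float:
    """Sum of squared deviations of the 0/1 failure indicator from the group mean."""
    if not runs:
        return 0.0
    mean = sum(r["failed"] for r in runs) / len(runs)
    return sum((r["failed"] - mean) ** 2 for r in runs)


def grow(runs: List[Run], attrs: List[str], min_size: int = 5) -> Node:
    """Recursively split on the (attribute == value) test with the largest SSE reduction."""
    node = Node(runs)
    parent = sse(runs)
    best_gain, best = 0.0, None
    for attr in attrs:
        for value in sorted({r[attr] for r in runs}):  # sorted for deterministic ties
            left = [r for r in runs if r[attr] == value]
            right = [r for r in runs if r[attr] != value]
            if len(left) < min_size or len(right) < min_size:
                continue  # hypothetical stopping rule: keep groups reasonably large
            gain = parent - sse(left) - sse(right)
            if gain > 1e-9 and gain > best_gain:
                best_gain, best = gain, (attr, value, left, right)
    if best is not None:
        attr, value, left, right = best
        node.split = (attr, value)
        node.left = grow(left, attrs, min_size)
        node.right = grow(right, attrs, min_size)
    return node


def leaves(node: Node, path: Tuple[str, ...] = ()) -> List[Tuple[Tuple[str, ...], int, float]]:
    """List each leaf as (path of split conditions, number of runs, failure rate)."""
    if node.split is None:
        return [(path, len(node.runs), node.failure_rate)]
    attr, value = node.split
    return (leaves(node.left, path + (f"{attr}={value}",))
            + leaves(node.right, path + (f"{attr}!={value}",)))


def example_runs() -> List[Run]:
    """Hypothetical data: component B fails on every other run, A and C never fail."""
    runs = []
    for comp in ("A", "B", "C"):
        for i in range(20):
            runs.append({
                "component": comp,
                "scenario": "x" if i % 4 < 2 else "y",
                "failed": 1 if comp == "B" and i % 2 == 0 else 0,
            })
    return runs
```

Growing a tree with grow(example_runs(), ["component", "scenario"]) isolates component B (20 runs, failure rate 0.5, so an estimated reliability of 0.5) from the remaining runs (40 runs, no failures), which is the kind of high-risk subset that focused reliability improvement would target.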

2.4. Integration, Implementation, and Tool Support

We describe implementation issues and software tool support for various reliability analyses covered in this tutorial. Specific topics include:

General implementation and process linkage, covering both the specific modifications to the existing testing process and overall integration with the software development process.
Tool support for data collection, analyses, and presentation, and our existing implementation.
Integration and future development.

2.5. Follow-up Studies: New Techniques and Complete Lifecycle Approach

Follow-up studies compare tree-based modeling (TBM) with other risk identification techniques and extend our TBRMs to support reliability measurement and improvement in other development phases. The techniques examined include:

Traditional statistical techniques, including correlation analysis, linear regression models, logistic analysis, etc.
New statistical techniques, including tree-based modeling, principal component analysis, and discriminant analysis.
AI-based techniques, including artificial neural networks and optimal set reduction (a pattern matching approach).

We discuss the comparative results and conclusions, which point to the appropriateness of the tree-based modeling technique for our integrated approach.

We are also conducting studies to extend the integrated approach based on TBRMs to other development phases. In addition to software reliability engineering and recent developments in the areas being examined, we are studying work in software measurement, inspection, and overall process management to derive our complete lifecycle approach. Future directions for this ongoing research, along with preliminary results, are also presented.

Prepared by Jeff Tian (tian@seas.smu.edu). Last update March 16, 2000.
