
DARE

A Domain Analysis and Reuse Environment

Phase I Final Report


AUGUST 20, 1992

Sponsored by
Defense Advanced Research Projects Agency
Defense Small Business Innovation Research Program
DARPA Order No. 5916
Issued by U. S. Army Missile Command Under
Contract # DAAH01-92-C-R040

AUTHORS:
Dr. Rubén Prieto-Díaz
Principal Investigator
Dr. Bill Frakes
Consultant
Mr. B.K. Gogia
Project Manager

Reuse, Inc.
12365 Washington Brice Rd.
Fairfax, VA 22033
703-620-5385
FAX: 703-620-5385


FOREWORD
This Phase I Final Report was prepared by Reuse, Incorporated, 12365 Washington Brice
Rd., Fairfax, Virginia 22033, under DARPA Phase I SBIR Contract No. DAAH01-92-C-R040, issued by U.S. Army Missile Command.
The Reuse, Inc. personnel involved in this program and in writing this report are Dr.
Rubén Prieto-Díaz, the Principal Investigator, Dr. Bill Frakes, an external consultant, and
Mr. B.K. Gogia, the Project Manager. The final report covers the period of performance
from February 20, 1992 through August 20, 1992. The final report was submitted by
Reuse, Inc. August 20, 1992.
Although this report is unclassified, its distribution is limited to U.S. Government
agencies only; pages containing confidential proprietary information have been marked
as such. Other requests for this document must be referred to Director, Defense
Advanced Research Projects Agency, 3701 North Fairfax Drive, Arlington, VA 22203-1714.
THE VIEWS AND CONCLUSIONS CONTAINED IN THIS DOCUMENT ARE THOSE OF THE
AUTHORS AND SHOULD NOT BE INTERPRETED AS REPRESENTING THE OFFICIAL
POLICIES, EITHER EXPRESS OR IMPLIED, OF THE DEFENSE ADVANCED RESEARCH
PROJECTS AGENCY OR THE U.S. GOVERNMENT.


Table of Contents
Executive Summary..................................................................................................................... 1
Status of phase I research ............................................................................................... 1
Research results.............................................................................................................. 1
Report overview ............................................................................................................. 3
1- Introduction............................................................................................................................. 5
1.1- The software reuse problem .................................................................................... 5
1.2- Domain analysis...................................................................................................... 6
2- Survey of Domain Analysis Methods ..................................................................................... 6
2.1- Historical perspective.............................................................................................. 7
2.2- Domain analysis methods ....................................................................................... 9
Prieto-Díaz Approach........................................ 9
FODA ................................................................................................................ 10
IDeA .................................................................................................................. 11
2.3- Summary of main features ...................................................................................... 11
2.4- Discussion of key activities..................................................................................... 12
2.5- Potential for automation.......................................................................................... 12
3- Domain Analysis in the STARS Reuse Library Process Model (SRLPM) ........................... 13
3.1- Primitive operations ................................................................................................ 15
Prepare domain information .............................................................................. 15
Classify domain entities .................................................................................... 15
Derive domain models....................................................................................... 16
Expand and verify models and classification .................................................... 17
3.2- Overview of domain analysis activities .................................................................. 18
3.3- Selected activities in the SRLPM which can be automated .................................... 18
4- Underlying Technologies for Automating Domain Analysis ................................................. 19
4.1- Information retrieval systems.................................................................................. 19
4.2- Artificial intelligence .............................................................................................. 21
4.3- Code static and dynamic analysis tools................................................................... 23
4.4- Interface environments............................................................................................ 25
4.5- CASE tools.............................................................................................................. 25
5- A Domain Analysis and Reuse Environment (DARE) ........................................................... 26
5.1- DARE architecture .................................................................................................. 26
5.2- DARE supported domain analysis process ............................................................. 28
Acquire domain knowledge (A2) ...................................................................... 32
Structure domain knowledge (A3) .................................................................... 32
Identify commonalities (A4) ............................................................................. 36
Generate Domain Models (A5) ......................................................................... 36
5.3- DARE functional model.......................................................................................... 36
5.4- Architecture components ........................................................................................ 38
Document analysis tools.................................................................................... 38
Domain expert knowledge extraction tools....................................................... 38
Code analysis tools............................................................................................ 39
Reuse library tools............................................................................................. 39
Domain analysis interaction tools ..................................................................... 40
5.5- Architecture integration .......................................................................................... 40
6- Conclusion .............................................................................................................................. 41
References ................................................................................................................................... 41


DARE
A Domain Analysis and Reuse Environment

Executive Summary
Status of phase I research
Domain analysis (DA) holds the key for the systematic, formal, and effective practice of
software reuse. Proposed approaches and methods for DA assume that domain
information exists and is readily usable. Experience indicates, however, that acquiring
and structuring domain information is the bottleneck of DA. This Phase I research report
presents the architecture and functional analysis of a support environment to automate
parts of the acquisition and structuring activities of DA.
This Phase I study assesses the potential for automation of DA. Existing techniques and
tools, in particular those from information retrieval and expert systems development,
provide support for activities in the DA process. Many of these tools can be used
immediately while certain DA activities may require the creation of new tools. There is,
therefore, a definite potential for automating parts of DA provided a basic framework to
conduct DA exists.
The framework for conducting DA is provided by a modified RLPM. The RLPM, or
Reuse Library Process Model, is a methodology developed by Reuse Inc. for the STARS
Program. It emphasizes early analysis for acquiring and structuring domain
information. The RLPM converts the ad-hoc nature of DA into a repeatable procedure
with well defined, tangible outputs. The modified RLPM presented here organizes the
key activities of acquisition and structuring of domain information in a way that can be
supported by independent but coordinated sets of tools.
The study proposes the Domain Analysis and Reuse Environment (DARE) as a practical
and viable support environment for partially automating the early activities of DA. The
research reported demonstrates that, although DA is a difficult and complex process,
several of its activities deal with small independent steps that can be automated, thus
reducing the complexity of DA to an interactive activity of grouping and organizing the
outputs of these small steps.
This research report shows the main components of a DARE architecture and how these
components interact through data and control flows. It also describes the specific tools
required to implement DARE.
Research results
The technical objectives of this study have been two-fold: to determine if and to what
extent current domain analysis technology is supportive of a reuse based domain specific
software development paradigm, and to determine the potential for automating domain
analysis activities. Meeting both objectives will accelerate a paradigm shift towards domain
specific reuse based development.
The research objectives focus on providing answers to the following key questions:

1- Is it Possible and Feasible to Automate Domain Analysis?


An assessment of the domain analysis activities proposed in the STARS Reuse Library
Process Model [Prie91a] was conducted. The study analyzed and selected activities from
the perspective of potential automation and confirmed that it is possible and feasible to
automate DA. Existing software tools and techniques were identified and associated with
selected activities in a modified domain analysis model, including techniques from areas
outside the software engineering domain, such as information retrieval and artificial intelligence.
2- What Are the Physical, Human, and Technological Limitations to DA?
It was found that certain activities in the domain analysis process are human intensive
and difficult to automate with current technology. These activities were assessed to
determine their limitations regarding automation and human as well as technical
requirements. While some activities may be automated only partially, others may be
impossible to automate. Knowledge abstraction, concept association, and concept
unification are currently in the impossible to automate category. Model formation,
architecture development, and knowledge organization (i.e., classification) can be
partially automated. Text processing, vocabulary development, and several intermediate
activities can be automated.
3- What Existing Tools and Techniques Can Be Adapted for Domain Analysis?
Several tools were surveyed and analyzed against the activities defined by a modified
domain analysis process based on the RLPM. The study selected, among the identified tools
and techniques, those requiring the least adaptation effort to perform DA activities.
In the IR realm, PLS, a commercial IR vendor, offers a library of reusable components
that we might use to construct the text processing parts of DARE. The code components
in the book Information Retrieval: Data Structures and Algorithms [FB92] could also be
used. The tools in the Unix/C environment [FFN91], described below, could be used for
the reverse engineering part of DARE. Other tools, such as those for plagiarism
detection, might also be used. AI methods for knowledge acquisition and representation
were also evaluated.
4- What New Tools and Techniques Are Required to Do DA?
While some activities in domain analysis can be related to existing tools, others are
unique and require tools or approaches yet to be created. One of the objectives of the
study was to identify what DA activities require special tools or techniques not currently
available. DA tasks requiring such tools include: bounding and defining the domain,
designing interviews and questionnaires for extracting domain knowledge from experts,
identifying and abstracting objects and operations from processed text, and some of the
activities conducted during domain model development. It was found that, in spite of the
absence of such tools, the availability of tools that support several of the other activities
in DA, when properly integrated, facilitates the DA process.
5- Can Tools and Techniques for DA Be Integrated in a Support Environment?


The answer is yes. This study presents a process model and an architecture to support
DA. The DARE environment is an integrated collection of tools that support domain
knowledge acquisition and structuring, as well as commonality analysis, model
development, and reuse. DARE is a highly interactive environment designed to facilitate
the intelligence intensive activities typical of DA.
DARE supports all parts of the domain analysis process: knowledge acquisition, concept
abstraction, classification, library population, and specification of reusable software.
Other features include library functions for search and retrieval, capture and analysis of
reuse metrics, and interfaces to other software development environments.
Report overview
The Phase I research effort shows that it is possible to partially automate DA and that it
can be done through a well orchestrated collection of tools operating under a well defined
process model. The following tasks were undertaken to demonstrate that DARE can be
developed successfully.
Task 1- Survey and analysis of current domain analysis methods
Existing domain analysis methods and approaches were surveyed. The methods surveyed
include SEI's FODA [Kang90], MCC's IDeA heuristics [Luba88], SPC's domain analysis
process [Jawo90], IBM's product-oriented paradigm [McCa85], and Arango's learning
system approach [Ara88].
The survey includes a listing of main features and a discussion of similarities and
differences.
Task 2- Detailed analysis of the STARS Reuse Library Process Model
The activities of domain analysis were analyzed to determine their suitability for
automation. The analysis included each of the four activities proposed in the SRLPM
approach and their derived subactivities. Each activity is decomposed into several levels
of detail. The analysis determined to what extent these fine grained activities can be
automated. The outcomes of this analysis include:
Description of each low level activity of domain analysis in the STARS Reuse
Library Process Model.
Discussion of the potential for automation: how these activities relate to existing
tool capabilities and how feasible it is to adapt tools for these tasks.
Task 3- Selection of domain analysis activities with potential for automation
The purpose of this task was to select the DA activities that could be automated and to
develop a revised DA process model to integrate them into a coherent and rational
process that could be implemented by a coordinated collection of existing tools.
Task 4- Evaluation of tools and techniques that meet domain analysis requirements
The objective of this evaluation was to assess the availability of the technology
supporting DA activities. We surveyed automated reuse library systems and technology,
CASE technology provided by the Unix/C environment, and IR systems such as PLS
which we might use to construct the text analysis portion of DARE. We evaluated the
utility of each of these kinds of tools to support the domain analysis processes in our
model.
Task 5- Propose and specify a domain analysis and reuse support environment
Tasks one through four prepared the ground and built the basis for
specifying a domain analysis and reuse environment (DARE). Although DARE could
have been proposed without the effort of going through the first four tasks, a careful
assessment of the state of the art and existing technology was necessary if a realistic and
practical environment was to be proposed. Tasks one through four comprise a structured
research plan to determine the feasibility of DARE, and to provide the information
necessary to decide whether to pursue such an environment and what level of automation
to expect.
The DARE architecture (see figure 3) consists of a user interface, a domain analysis
support environment, and a software reuse library. Selected COTS (commercial
off-the-shelf) tools and tools specially designed to support reuse based domain specific
development are shared among these components.
Summarizing the contents of this report: Section one defines the software reuse problem
and domain analysis. Section two gives an historical overview of domain analysis,
including a summary of the major domain analysis approaches and their similarities and
differences. Section three presents the role of domain analysis in the STARS reuse
library process model. Section four surveys the underlying technologies for
automating domain analysis, including information retrieval (IR), artificial intelligence
(AI), static and dynamic analysis of software, interface environments, and CASE tools.
Section five presents the DARE architecture, providing a model and explanation of the
processes and tools in the model. Section six presents our conclusions.


1- Introduction
Domain analysis has become a topic of significant interest in the reuse community.
Domain analysis holds the key for the systematic, formal, and effective practice of
software reuse. Unfortunately, domain analysis is still an emerging technology and
practiced informally. There is a definite opportunity, however, for automating parts of the
domain analysis process. The domain analysis methodology proposed in the STARS
Reuse Library Process Model [Prie91a] can be used as a framework to identify parts of
the process that can be automated by adapting existing tools and techniques. The
opportunity for automation presented in this study is in the form of a Domain Analysis
and Reuse Environment (DARE). This section presents the reuse problem and its relation
to domain analysis.
1.1- The software reuse problem
One of the reasons software reuse has remained an elusive goal is the recurrent emphasis
on reusing code. Software reuse is still far from realizing the ideas of a software industry
based on interchangeable standard parts first proposed by Doug McIlroy over 20 years
ago [McIl69]. Reuse involves more than just code. It involves organizing and
encapsulating knowledge and experience, and setting the mechanisms and organizational
structures to make them available for reuse.
Software reuse is sensitive to several factors that make the simple hardware analogy of
software ICs [Cox86] difficult to apply. The context of the application domain, for
example, plays a critical role in the "reusability" of software. Software can not be
successfully reused in all domains. The reality is that narrow, well understood application
domains based on stable technologies and standardized architectures, such as compilers
(e.g., Lex and YACC) and database systems [Bato88], have demonstrated the significant
leverage that can be achieved with high level reuse. It is not simply a matter of going out
into the field and gathering up components to populate a repository. Casually assembled
libraries are seldom the basis of a high payoff reuse system. A reuse library offers
considerably more value when its collections consist of integrated packages of reusable
knowledge from a particular domain than if they consist of isolated and relatively
independent code components. A domain model in the form of a high level architecture,
for example, offers the potential reuser a basic structure to start building a new system.
Each element of the architecture can be implemented from library components specially
designed to meet the architecture requirements.
There is a need, therefore, to focus reuse on all the products of the software development
process such as requirements, specifications, designs, code, and test cases and plans. The
highest payoff is achieved by reusing high level representations of software products like
requirements and designs [Gilr89]. If we are able to reuse an existing software design
then we should be able to reuse its code implementation. We should, therefore, focus on
the process of capturing, organizing, and encapsulating such requirements and designs
for reuse.


1.2- Domain analysis


Domain analysis is a process by which information used in developing software systems
is identified, captured, and organized with the purpose of making it reusable [Prie90]. In
simpler terms, domain analysis is similar to systems analysis but instead of being
performed on a single system it is done for multiple related systems. During software
development, information of several kinds is generated, from requirements analysis to
specific designs to source code. One of the objectives of domain analysis is to make this
information readily available. In making a reusability decision, that is, in trying to decide
whether or not to reuse a component, a software engineer has to understand the context
which prompted the original designer to build the component the way it is. By making
this development information available, a reuser gains leverage in making reuse more
effective.
A definite improvement in the reuse process results when we succeed, through domain
analysis, in deriving common architectures, generic models or specialized languages that
substantially leverage the software development process in a specific problem area. How
do we find these architectures or languages? It is by identifying features common to a
domain of applications, selecting and abstracting the objects and operations that
characterize those features, and creating procedures that automate these operations. These
intelligence-intensive activities typically take place after several systems of the "same
kind" have been constructed. It is then decided to isolate, encapsulate, and standardize
certain recurring operations. This is the very process of domain analysis: identifying and
structuring information for reusability.
A formal methodology and environment for domain analysis is needed. Unfortunately,
domain analysis is often conducted in an ad-hoc manner, and success stories are more the
exception than the rule. Typically, knowledge of a domain evolves naturally over time
until enough experience has been accumulated and several systems have been
implemented. Only then can generic abstractions be isolated and reused. This natural
domain evolution takes a long time while the demand for software applications increases
at a faster rate. There is a need, therefore, to accelerate the domain maturation process.
2- Survey of Domain Analysis Methods
Figure 1 below provides an historical perspective on the main domain analysis
developments. It starts in 1980 when Neighbors introduced the concept of domain
analysis as a key activity to enable reuse practice. The figure shows that several efforts
spawned from his original ideas. These efforts range from the highly practical CAMP
experience [Cam87] to the more theoretical work of Arango [Ara88]. The rectangle
labeled Raytheon represents the reuse program established by the Raytheon Missile
Systems Division [Lane79]. The Raytheon experience is an often quoted success story of
institutionalized reuse and its success is attributed to a thorough analysis of their
application domain, business applications. They observed that 60% of all business
application designs and code were redundant and could be standardized and reused. A
reuse program was then established to analyze their existing software and to exploit
reuse.


Over 5000 production COBOL source programs were examined and classified. Three
major module classes were identified: edit, update, and report. They also discovered that
most business applications fall into one of three logic structures or design templates (i.e.,
domain architectures). These logic structures were standardized and a library was created
to make all classified components available for reuse. Several modules were also
redesigned to fit the standard logic structures. New applications became slight variations
of the standard logic structures and were built by assembling modules from the library.
Programmers were trained to use the library and to recognize when a logic structure
could be reused. The report quotes an average of 60% reused code in their new systems
and a net 50% increase in productivity over a period of six years.
The remaining efforts represented by bubbles in Figure 1 are discussed in detail below.

Figure 1- Domain Analysis Partial Time Chart

2.1- Historical perspective


The term domain analysis was first introduced by Neighbors [Neig80] as "the activity of
identifying the objects and operations of a class of similar systems in a particular problem
domain." During his research with Draco, a code generator system that works by
integrating reusable components, he pointed out that "the key to reusable software is
captured in domain analysis in that it stresses the reusability of analysis and design, not
code." Neighbors later introduced the concept of "domain analyst" [Neig84] as the person
responsible for conducting domain analysis. The domain analyst plays a central role in
developing reusable components. The Draco system was the first successful
demonstration of the feasibility of domain-specific reuse-based software development.


The Common Ada Missile Packages (CAMP) Project [Cam87] extended Neighbors'
ideas into larger systems. The CAMP Project is the first explicitly reported domain
analysis experience, and they acknowledge that "[domain analysis] is the most difficult
part of establishing a software reusability program". Neither Neighbors nor the CAMP
project address the issue of how to do domain analysis. Both focus on the outcome, not
on the process.
McCain [McCa85], from IBM Federal Systems Division, Houston, TX, made an initial
attempt at addressing this issue by integrating the concept of domain analysis into the
software development process. He proposed a "conventional product development
model" as the basis for a methodology to construct reusable components. The main
concern in this approach is how to identify, a priori, the areas of maximum reuse in a
software application. McCain developed his model into a standard practice within IBM.
Drawing in part from the above experiences, Prieto-Díaz [Prie87] proposed a more
cohesive procedural model for domain analysis. This model is based on a methodology
for deriving specialized classification schemes in library science [Prie91b]. In deriving a
faceted classification scheme, the objective is to create and structure a controlled
vocabulary that is standard not only for classifying but also for describing titles in a
domain specific collection. This method was successfully applied at GTE Government
Systems. The Prieto-Díaz method was later updated and revised for the STARS Reuse
Library Process Model [Prie91a, Prie91c]. This method is a substantial modification of
the earlier approach. The emphasis is on the analysis aspect, especially on knowledge
acquisition and knowledge structuring. This newer version of the Prieto-Díaz method is
presented as a SADT model with potential for partial automation.
Synthesis is a software development method and support environment developed by the
Software Productivity Consortium (SPC). Synthesis is based on the concept of program
families [Parn76] and proposes the engineering of domains to enable application
generators. The Synthesis domain analysis process was first proposed in a report by
Jaworski [Jawo90]. It is based mainly on object oriented concepts [Coad89] with
emphasis on domain design and implementation. The report includes an example of
domain analysis on the SOCC (Satellite Operations Control Center) domain. The
example shows the products of domain analysis such as the SOCC domain definition, the
SOCC taxonomy, and the SOCC stabilities and variations, but falls short of explaining
the process used to obtain those products.
More recently, the SEI has proposed the FODA (Feature Oriented Domain Analysis)
methodology [Kang90]. FODA adopts several concepts and recommendations from the
SPS report [Gilr89], and presents a comprehensive approach based on feature analysis.
The method is illustrated by a domain analysis of window management systems and
explains what the outputs of domain analysis are, but remains vague about the process
used to obtain them.
In the SPS (Software Productivity Solutions, Inc.) report, concepts from Prieto-Díaz'
model were integrated with object oriented analysis techniques into a more complete
approach to domain analysis. The SPS report adds object orientation to the process of
creating a domain architecture, and to the creation of reusable components. The
suggested method remains very general about the analysis aspect, but very specific about
the creation of reusable Ada components. They conclude that knowledge acquisition,
knowledge-based guidance, data storage, retrieval, and environment integration are the
key factors for automating domain analysis.
IDeA, Intelligent Design Aid, is an experimental reuse based design environment
developed by MCC [Luba88]. It supports reuse of abstract software designs. IDeA
provides mechanisms that help users select and adapt design abstractions to solve their
software problems. IDeA and its successor ROSE-1, were created as proof-of-concept
tools to demonstrate reuse of high level software workproducts other than source code.
Arango [Ara88] focuses on the theoretical and methodological aspects of domain
analysis. He argues for explicit definitions of objectives, mechanisms, and performance
of a reuse system as a context for comparing and evaluating domain analysis. His view of
software reusability is that of a learning system where domain analysis is an ongoing
process of knowledge acquisition, concept formation, and concept validation. The
changing requirements syndrome in software development is seen as a natural learning
process, and resolved through an evolving infrastructure that receives its input from
domain analysis.
Domain analysis and domain modeling have become topics of significant interest in the
software engineering community. A recommendation from the 1987 Minnowbrook
Workshop on Software Reuse [AM88] suggested "concentrating on specific application
domains (as opposed to developing a general reusability environment)." Soon thereafter,
the Rocky Mountain Workshop on Software Reuse [RM87] acknowledged the lack of a
theoretical or methodological framework for domain analysis. More recent workshops
have addressed domain analysis and domain modeling directly [DA88, RP89, DM89].
The most recent was the Domain Modeling Workshop at the 13th ICSE, Austin, TX
[DM91] where several approaches to domain modeling and domain analysis were
presented.
Other related work includes tools that were originally designed for other purposes and
turned out to be supportive of domain analysis. In this category are Batory's Genesis
system for constructing database management systems [Bato88], CTA's KAPTUR
(Knowledge Acquisition for the Preservation of Tradeoffs and Understanding Rationales)
system for analyzing software systems [Bail91], AT&T's LaSSIE software information
system [Deva91], and MCC's DESIRE (Design Recovery) tool [Bigg89]. These tools
present a broad spectrum of techniques and approaches for automating certain aspects of
domain analysis.
2.2- Domain analysis methods
Several approaches to domain analysis have emerged in the last few years. Three have
been selected to illustrate the differences in objectives, methods, styles, and products.
Prieto-Díaz Approach
The Prieto-Díaz approach was developed for the STARS S increment as part of a model
for reuse libraries [Prie91a]. It is based on methods for deriving classification schemes in
library science and on methods for systems analysis. The process is a "sandwich"
approach where bottom-up activities are supported by the classification process and top-down activities by systems analysis.
The objective is to produce a domain model in the form of a generic architecture or
standard design for all systems or their instantiations in the domain. Such models provide
a common basis for writing requirements for new systems in the domain. In other words,
requirements for new systems are based on, or derived from, the domain model, thus
ensuring reuse at the design level. To guarantee such reuse, low level components must
act as building blocks for composing a skeleton design or architecture. This is
accomplished by the bottom-up identification and classification of low level common
functions and by standardizing their interfaces.
During the top-down stage, high level designs and requirements of current and new
systems are analyzed for commonality. The outcome includes a canonical structure
common to all systems in the domain, identification of stable and variable characteristics,
a generic functional model, and information on the interrelationships among the structure
elements. During the bottom-up stage, low level requirements, source code, and documentation
from existing systems are analyzed to produce a preliminary vocabulary, a taxonomy, a
classification structure, and standard descriptors.
The outcomes of both approaches are then integrated into reusable structures. This
integration process consists of associating the products of the bottom-up analysis with the
structures derived by the top-down analysis. Standard descriptors, for example, represent
elemental components, either available or specified, by using a standard language and
vocabulary. Low level components for the generic architecture are defined with these
standard descriptors. The result is a natural match between high level generic models and
low level components where the domain models can be used as skeleton guides in the
construction of new applications.
FODA
Feature Oriented Domain Analysis (FODA) is a domain analysis methodology developed
by the Software Engineering Institute [Kang90]. The FODA method is based on
identifying features common to a class of systems. It is the product of studying and
evaluating several DA approaches. Although based mainly on Object Oriented
techniques, it borrows significantly from other approaches such as Prieto-Díaz' faceted
approach, SPS' Ada based approach, and MCC's DESIRE design recovery tool.
The FODA method defines three basic activities: context analysis, domain modeling, and
architecture modeling. During context analysis, domain analysts interact with users and
domain experts to bound and scope the domain and to select sources of information.
Domain modeling produces a domain model in multiple views. The domain analyst
proposes the domain model to domain experts, users, and requirements analysts for
review. The resulting model includes four views: features model, entity-relationship
model, dataflow diagrams model, and state-transition model. A standard vocabulary is
also produced during domain modeling.
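To make the features-model view concrete, the fragment below sketches, in Python, how a FODA-style model of mandatory, optional, and alternative features might be recorded and checked. The feature names are hypothetical; FODA itself prescribes no particular notation or tooling for this.

    # Tiny sketch of a FODA-style features model: a tree whose features are
    # mandatory, optional, or alternative. The feature names are assumptions.
    FEATURES = {
        "window_system": {
            "mandatory": ["window", "cursor"],
            "optional": ["icon"],                 # optional features need no check
            "alternative": [["tiled_layout", "overlapped_layout"]],
        }
    }

    def valid(selection, model=FEATURES["window_system"]):
        """Check a product's feature selection against the model."""
        has_mandatory = all(f in selection for f in model["mandatory"])
        one_of_each = all(len(set(group) & set(selection)) == 1
                          for group in model["alternative"])
        return has_mandatory and one_of_each

    print(valid({"window", "cursor", "tiled_layout"}))             # True
    print(valid({"window", "tiled_layout", "overlapped_layout"}))  # False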
During architecture modeling, the domain analyst produces an architectural model that
consists of a process interaction model and a module structure chart. The objective of the
architectural model is to guide developers in the construction of applications and to
provide mappings from the domain models to the architecture.
The FODA report [Kang90] illustrates the process by using the window management
systems domain. The example shows in detail how each of the products of domain
analysis is derived. It includes: textual descriptions of domain definition and scope,
structure and context diagrams, an E-R model, feature models, functional models, and
domain terminology dictionary. The example also explains how to use functional models
for system simulation. The example uses the Statemate tool for the functional model.
IDeA
IDeA, MCC's Intelligent Design Aid introduced above [Luba88], is an experimental
environment that supports reuse of abstract software designs by helping users select and
adapt design abstractions to solve their software problems.
A domain analysis methodology was developed to reduce the effort
required to identify, select, and characterize designs for the IDeA library. The process is
divided into domain analysis and domain engineering. Domain analysis, as in Prieto-Díaz
and FODA, deals with identification of operations, data objects, properties, and
abstractions but focuses on their application for designing solutions to problems in the
domain. A "problem solution" is then used to generate specific software designs by
applying domain engineering techniques.
The IDeA method consists of three major steps: analysis of similar problem solutions,
analysis of solutions in an application domain, and analysis of an abstract application
domain. There are specific heuristics for conducting each step. The objective is to
characterize generic solutions to common problems in a domain and to provide a
reasonable mapping between problems and solutions to make reuse practical. It uses
design schemes as mechanisms for mapping problems to solutions, and identifies
activities in other domains that are common or similar to the ones in the domain of
interest. This approach covers vertical (within a domain) and horizontal (across domains)
reuse. The first two steps are aimed at identifying vertical reusable components while the
goal of the third step is to find horizontal components.
2.3- Summary of main features
The essential features of the domain analysis methods discussed can be summarized in
four basic activities: acquire domain information, analyze and classify domain entities,
structure domain knowledge, and generate reusable structures. We analyzed the activities
of each method and grouped them into these four activities.
1- Acquire domain information. Information acquisition activities include: study the
domain, describe and define the domain, prepare domain information for analysis,
and perform a preliminary high-level functional analysis of common domain
activities. One objective of the acquisition activities is to bound and scope the domain
and to provide specific information for estimating the cost and level of effort required
for performing domain analysis.


2- Analyze and classify domain entities. The focus of this activity is to identify
specific low level functions and objects, or common features derived from legacy
systems, existing documentation, and future requirements. The objective is to
classify these entities into a standard framework. The framework may take the form
of a taxonomy, a semantic net, or a features model.
3- Structure domain knowledge. The purpose of this activity is to associate common
functions to system components. A preliminary domain architecture is proposed to
define high level system components. This high level architecture is refined by
decomposing system components into more specific functions. The decomposition
(i.e., refinement) process is carried on by selecting common functions from the
classification or features framework.
4- Generate reusable structures. Generating reusable structures is the process of
grouping common functions, attaching them to specific architectural components, and
generalizing these specific architectural components. The outcomes are generic
reusable structures consisting of standard functions and standard interfaces. These
generic structures form a domain architecture where different implementations of
domain features are plug-compatible reusable components.
2.4- Discussion of key activities
Acquiring domain information is the central activity for domain analysis. Success of the
remaining activities depends on the quantity, relevance, and quality of the information
acquired. Any discussion of automating domain analysis must start with information
acquisition.
Analysis and classification of domain entities is usually a bottom-up process of
identifying and extracting information about specific functions mainly from current
applications. Classification includes abstraction and clustering to generate classes of
functions with common attributes. A top-down approach can also be used. When
conducted as a top-down process, systems specifications and future requirements are
analyzed to identify features common to all systems in the domain. Both bottom-up and
top-down approaches result in identification of common basic (i.e., primitive) functions.
Structuring domain knowledge into a domain architecture allows for a mapping of
common functions to system components and provides the basis for defining and
specifying reusable components. Generating reusable structures is a process of
encapsulating architecture components.
For the purpose of automating domain analysis, acquiring domain information and
analyzing and classifying that information are essential for developing domain
architectures. Current SEE (Software Engineering Environment) technology supports, to
some degree, the implementation of an architecture (i.e., requirements) into reusable
components (i.e., code), but support for acquiring and structuring domain information is
not yet available.
2.5- Potential for automation
There is a definite potential for automating parts of the domain analysis process. An
essential prerequisite to automation is a framework of properly structured activities. Such
a framework is provided by the STARS Reuse Library Process Model (SRLPM) method
for domain analysis. A key activity in the SRLPM is to prepare domain information and
one of the essential tasks for preparing domain information is knowledge acquisition. It
requires "reading" information from several inputs such as technical literature, existing
implementations, and current and future requirements.
Existing techniques in information retrieval can be used to automatically extract
information from these sources. In fact, experience in practicing domain analysis has
shown that knowledge extraction is a definite bottleneck in the process. Other proposed
domain analysis methods make the unrealistic assumption that knowledge and experience
are available and readily usable, giving the impression of a smooth and simple process.
Once we get through the knowledge acquisition step, domain analysis is a more tractable
problem. Our experience has been, however, that the initial stage in domain analysis
(acquiring and structuring knowledge) is the most difficult and time consuming.
To classify domain entities, for example, the SRLPM methodology prescribes keyword
extraction, concept grouping, and class definition. Existing tools and techniques from
information retrieval and object oriented design can be adopted and integrated to support
these steps. There are other very specific subactivities in the methodology, like thesaurus
construction, for which automated tools already exist. Extracting knowledge from experts
is much more complex, and requires human interaction such as interviews and group
meetings. There are, however, techniques and support tools for building expert systems
that can be adapted for this purpose.
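As a small illustration of the first of these steps, the following Python sketch performs the kind of keyword extraction an off-the-shelf IR tool would automate. The stop list and frequency threshold are illustrative assumptions, not part of the SRLPM.

    # Minimal sketch of automated keyword extraction from domain documents.
    # The stop list and the frequency threshold are illustrative assumptions.
    import re
    from collections import Counter

    STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "that"}

    def extract_keywords(text, min_count=2):
        """Return candidate domain terms: frequent words not in the stop list."""
        words = re.findall(r"[a-z]+", text.lower())
        counts = Counter(w for w in words if w not in STOP_WORDS)
        return [term for term, n in counts.most_common() if n >= min_count]

    # Hypothetical requirement statements from a tools domain.
    text = ("Locate line identifiers in data table. "
            "Locate identifiers in symbol table.")
    print(extract_keywords(text))   # ['locate', 'identifiers', 'table']

The resulting term list would then feed the concept grouping and class definition steps.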
In summary, this study explores and analyzes the feasibility of automating parts of the
domain analysis process under the framework of the STARS Reuse Library Process
Model, and proposes and specifies a domain analysis support environment that automates
parts of the domain analysis process.
3- Domain Analysis in the STARS Reuse Library Process Model (SRLPM)
A Domain Analysis Process Model was developed as part of the SRLPM [Prie91a]. It is
based on methods for deriving classification schemes in library science and on methods
for systems analysis. The process is a "sandwich" approach where bottom-up activities
are supported by the classification process and top-down activities by systems analysis.
The domain analysis process is divided into four activities:
1- Prepare domain information (A51)
2- Classify domain entities (A52)
3- Derive domain models (A53)
4- Expand and verify models and classification (A54)
Figure 2 shows a detailed SADT model of how these activities are related, their inputs,
controls, and outputs as well as their respective enabling mechanisms. The domain
models produced consist of several partial products including domain definition, domain
architecture, domain classification scheme, vocabulary, functional model, and reusable
structures. Inputs are information on recommended and related domains, and existing
(i.e., legacy) systems.

Figure 2- Domain Analysis Process Model (SADT Diagram A5)

3.1- Primitive operations


Prepare domain information
The tasks in A51 (see figure 2) are to prepare the information needed for domain analysis
and to do a preliminary or first-cut analysis. The objectives are to define the domain, to
acquire the relevant domain knowledge, and to perform a preliminary high-level
functional analysis. The outputs are a definition of the domain, a basic domain
architecture, and specific domain knowledge as it relates to building software systems for
the domain.
The inputs to A51 include available knowledge from recommended and related domains,
and information from existing systems. Domain information includes concepts and theory
of domain specific systems normally available in textbooks, research articles, and
company reports. In the domain of flight simulators, for example, concepts and theory
include stability and control equations, performance equations, linear algebra
transformations, feed-back theory and equations, numerical analysis algorithms, and any
company specific techniques.
Related domains for flight simulators may include, for example, video interfaces, signal
processing, and flight control systems. Information from existing flight simulation
systems includes requirements documents from previous and current systems, designs,
source code, and documentation.
The control inputs for A51 are domain analysis guidelines, company needs, and a
statement of purpose.
Company needs are stated in an assessment report addressing specific production,
budgetary, and market requirements. The statement of purpose states the scope and
objective of the domain analysis. A purpose statement should answer whether the
purpose is limited:
To help in domain understanding?
To include development of generic architectures?
To support building reusable components?
To support populating a reuse library?
To support the development of an integrated, reuse-based environment?
The statement of purpose determines the breadth and depth of the domain analysis activity.
It guides the domain analysis team in discriminating domain information and in placing
the domain in its proper context and relevance.
Classify domain entities
Activity A52, classify domain entities, focuses on the bottom-up analysis. It produces a
standard vocabulary, a classification scheme, and a taxonomy of classes. The process is
similar to the one used in library science for deriving faceted classification schemes for
special collections. Keywords are extracted from input documents, requirement
statements, and source code. Classes and facets are identified through a conceptual
clustering exercise where common terms are grouped and abstracted. A basic scheme is
postulated and then expanded and verified. The final step is the construction of thesauri
for vocabulary control. Vocabulary control is achieved by grouping synonyms around a
single concept.
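Operationally, vocabulary control reduces to a mapping from synonyms to a single preferred concept. A minimal Python sketch follows; the synonym table is a hypothetical example, not drawn from an actual domain analysis.

    # Vocabulary control: group synonyms around a single preferred concept.
    # The synonym table is a hypothetical example for a tools domain.
    THESAURUS = {
        "find": "locate", "search": "locate", "lookup": "locate",
        "id": "identifier", "label": "identifier",
        "array": "table", "list": "table",
    }

    def control(term):
        """Map a raw extracted term to its controlled-vocabulary concept."""
        return THESAURUS.get(term, term)

    print([control(t) for t in ["find", "id", "array", "buffer"]])
    # ['locate', 'identifier', 'table', 'buffer']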
The inputs to A52 include specific domain knowledge in the form of functional
requirements, documentation, source code from existing systems, and feedback
information regarding unclassified entries. Unclassified entries are components that can
not be classified with the current classification scheme. This information is used to
update and expand the classification scheme.
The outputs are a faceted classification scheme and a basic taxonomy (a taxonomy can
also be seen as an inheritance structure with some entity-relationship model
characteristics like aggregation and generalization). The classification scheme includes a
controlled vocabulary and facet definitions. Together, taxonomic classes and facets form
a classification structure.
The classification scheme generates classification templates in the form of standard
descriptors. These standard descriptors are the basic conceptual units that form the
interface between domain architectures and reusable components. Standard descriptors
are high level mini-specs for a class of components. In the UNIX tools domain, for
example, "locate/identifier/table" is a standard descriptor for a component identified by
the statement "Locate line identifiers in data table", The terms "locate", "identifier", and
"table" represent concepts in a controlled vocabulary. Standard descriptors can also be
represented graphically as E-R models or semantic nets, thus facilitating component
encapsulation and parameterization.
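As an illustration of how such descriptors might be represented and validated, consider the following Python sketch. The facet names and vocabulary sets are assumptions patterned on the locate/identifier/table example above.

    # A standard descriptor as an ordered tuple of facet values, after the
    # "locate/identifier/table" example. Facet names and vocabularies assumed.
    FACETS = ("function", "object", "medium")
    VOCAB = {
        "function": {"locate", "compare", "format"},
        "object":   {"identifier", "line", "expression"},
        "medium":   {"table", "file", "buffer"},
    }

    def make_descriptor(*values):
        """Validate facet values against the controlled vocabulary, then join them."""
        for facet, value in zip(FACETS, values):
            if value not in VOCAB[facet]:
                raise ValueError(f"unclassified term {value!r} in facet {facet!r}")
        return "/".join(values)

    print(make_descriptor("locate", "identifier", "table"))  # locate/identifier/table

A term that fails validation corresponds to an unclassified entry which, as noted above, is fed back to update and expand the classification scheme.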
The control inputs to A52 are domain definition and domain architecture. Both support
conceptual analysis. The domain definition, for example, includes global requirements
statements used to select keywords for the controlled vocabulary. The mechanism is the
domain analysis team. In its minimal form, it consists of a domain analyst, a domain
expert, and a librarian.
Derive domain models
Activity A53, derive domain models, consolidates the top-down analysis with the
bottom-up approach. The objective here is to produce a generic functional architecture or
model using functional decomposition as practiced in software systems design. The top
level in this decomposition is the preliminary architecture derived in A51 above. The
resulting functional model serves as a structure to consolidate the standard descriptors
from A52. The idea is to describe or specify low level functions using standard
descriptors from the controlled vocabulary, and to associate them with architectural
components.
The results are layers of functional clusters associated with architecture elements. The
core activity in A53, then, is to assign these functional clusters to architecture units and to
define their relationships. What results is a model that supports design and development
of new systems by composing reusable components. The output is the generic functional
model.
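A minimal Python sketch of this assignment step follows. The routing rule and unit names are hypothetical; in practice the domain analysis team makes these assignments interactively.

    # Assign classified standard descriptors to architecture units, producing
    # functional clusters. The routing rule and unit names are assumptions.
    from collections import defaultdict

    def assign(descriptors, unit_of):
        """Group descriptors under the architecture unit selected by unit_of."""
        clusters = defaultdict(list)
        for d in descriptors:
            clusters[unit_of(d)].append(d)
        return dict(clusters)

    descriptors = ["locate/identifier/table", "format/line/file",
                   "compare/expression/buffer"]
    # Hypothetical rule: route by the function facet (first field).
    routing = {"locate": "search", "format": "report", "compare": "analysis"}
    print(assign(descriptors, lambda d: routing[d.split("/")[0]]))
    # {'search': ['locate/identifier/table'], 'report': ['format/line/file'],
    #  'analysis': ['compare/expression/buffer']}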
The inputs to A53 include:
1) The classification structure from A52 including vocabulary, classes, and standard
descriptors,
2) Specific domain knowledge in the form of global requirements and system
commonality information, (from A51),
3) Requirements from existing systems, and
4) Feedback information to update and refine the model. This last input is in the form of
earlier versions of the model (labeled incomplete models in diagram A5).
The control inputs are the domain definition and architecture produced by A51. The basic
architecture is used as a reference for the top-down decomposition. The mechanism is the
domain analysis team. In this case the analyst and an expert are the minimum required.
Expand and verify models and classification
Activity A54 expands and verifies domain models and the classification structure. The
objective in A54 is to update the products of domain analysis as new information from
current and future systems becomes available.
Activity A54 illustrates the continuing nature of the domain analysis process. All
products of domain analysis are reviewed continuously and remain in a permanent state
of evolution. The question of when a domain analysis is complete is still a research
question and is not discussed here. For the sake of practicality, any outcome of domain
analysis, as discussed in the SRLPM document, is considered usable. The library process
model assumes an implicit feedback loop for all its activities and a reviewing process for
all its outputs.
New requirements, vocabulary, functional components, and limitations and constraints
are extracted from existing and new systems. The classification structure and the
functional model are updated to accommodate them. The models are then verified against
existing systems. Specific designs and requirements from existing and future systems are
checked to see if they are represented by the generic model, that is, whether the model
includes all expected instances of systems in the domain.
The outputs of this activity are reusable structures. Reusable structures are parts of the
generic functional model or parts of the classification structure that have been verified
and are complete enough to be reusable. These subsets of the domain models are
encapsulated and included in the customized library system to drive the construction
process. A reusable structure can be as simple as a standard descriptor (i.e., requirement
statement) for a class of functions or as elaborate as an architecture for a class of systems.
An example of the latter is the architecture for a general compiler, which includes a
scanner (lexical analyzer), syntax analyzer, semantic analyzer, code generator, and
symbol table handler.
The inputs to A54 are the generic functional model from A53 and any new information
from current and future systems. The control inputs include domain definition and
architecture from A51, the classification structure from A52, and abstractions of the
generic functional model from A53. These abstractions are used to help identify reusable
structures. The mechanism is the domain analysis team.
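The verification step can be pictured as a simple coverage check over sets of standard descriptors, as in the Python sketch below; the descriptor data is illustrative only.

    # Check whether a system's requirement descriptors are covered by the
    # generic functional model. Uncovered descriptors signal that the model
    # and classification must be expanded. The data below is illustrative.
    def uncovered(system_descriptors, model_descriptors):
        """Return descriptors of an existing system missing from the model."""
        return sorted(set(system_descriptors) - set(model_descriptors))

    model = {"locate/identifier/table", "format/line/file"}
    system = {"locate/identifier/table", "compare/expression/buffer"}
    print(uncovered(system, model))  # ['compare/expression/buffer']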


3.2- Overview of domain analysis activities


A detailed overview of domain analysis activities is shown in Table 1 below. Each
indentation level corresponds to a decomposition diagram in the SRLPM document. The
four activities described in figure 2 above are decomposed to their lowest levels to
identify specific tasks.
Table 1- Specific Domain Analysis Tasks in the SRLPM

Analyze domain

Prepare domain information (A51)
  Define domain
  - Select relevant information
  - Bound domain
  - Establish global requirements
  - Verify and validate definition
  Obtain domain knowledge
  - Select sources of information
  - Extract domain knowledge
  -- Read
  -- Consult
  -- Study
  -- Learn
  - Review acquired domain information
  -- Discuss
  -- Evaluate
  -- Integrate
  -- Consolidate
  Do high level functional analysis (top-down)
  - Identify major functional units
  - Find interrelationships
  - Specify generic subsystems
  - Classify subsystems
  -- Analyze common features
  -- Group and classify
  - Select graphic representation method

Classify domain entities (bottom-up) (A52)
  Identify objects and operations
  - Analyze concepts
  - Analyze requirements
  - Extract component descriptors
  - Inspect documentation
  - Decompose statements by keywords
  Abstract and classify
  - Group terms
  - Give names to clusters
  - Arrange by facets
  - Arrange by hierarchy
  - Define standard classification templates
  -- Consult STARS standards
  -- Check conflicts/duplication with other libraries
  Expand basic classification
  - Refine meanings
  - Integrate new classes and terms
  - Group unclassified terms
  - Give names to new clusters
  - Define new templates
  Construct thesauri
  - Find internal synonyms
  - Add external synonyms
  - Form thesaurus entries
  - Verify entries

Derive domain models (consolidate top-down & bottom-up) (A53)
  - Group descriptors/classes under functional units
  - Review domain models (refine initial functional decomposition)
  - Discover/define new functional units
  - Rearrange structure (result: generic functional model)
  - Select model representations

Expand models and classification (A54)
  - Apply models to application
  - Identify inconsistencies
  - Update models and classification
  - Define reusable structures

3.3- Selected activities in the SRLPM which can be automated


There are several activities in the SRLPM model that can be automated. Most of them
fall in the category of basic and indispensable tasks for domain analysis, as follows.


Extract domain knowledge: Knowledge extraction from text documents can be done
automatically using off-the-shelf information retrieval tools. Knowledge extraction
from experts requires interviewing and questioning, but their written responses can be
processed automatically.

Identify major functional units: Reverse engineering tools, specifically code
restructuring and requirements analysis tools, can be used to identify major functional
units from legacy systems.

Find interrelationships: Relationships among components and major functional
units can also be identified by using reverse engineering tools. Tools that produce call
structures and cross referencing information are useful for this task.

Specify generic subsystems: The process of identifying generic subsystems within
specific system structures or designs can be assisted with program similarity analysis
tools.

Classify subsystems: Subsystem classification can be assisted with the same kind of
tools used to find interrelationships.

Identify objects and operations: This task can be automated with information
retrieval tools.

Abstract and classify: Conceptual clustering tools and AI knowledge representation
techniques can be used to assist in this task.

Construct thesauri: There are off-the-shelf tools to help construct thesauri.

Group descriptors/classes under functional units: Conceptual clustering tools can
also be used to assist in this task.

Rearrange structure: Architecture revision can be done semiautomatically with
reverse engineering tools.

Define reusable structures: Reusable structures are refinements of previous domain
analysis models. A combination of reverse engineering, information retrieval, AI, and
conceptual clustering tools can be used to assist in this task.

4- Underlying Technologies for Automating Domain Analysis


The automation of domain analysis will rely on several basic technologies: information
storage and retrieval (IR), artificial intelligence (AI), primarily the subfields of AI
concerned with knowledge acquisition and representation, and static and dynamic
analysis tools for code. In this section, we review these basic technologies.

4.1- Information retrieval systems


IR systems are used to automatically index and manage large amounts of documentation.
An IR system (see [FB92]) matches user queries (formal statements of information
needs) to documents stored in a database. A document is a data object, usually textual,
though it may also contain other types of data such as photographs, graphs, etc. An IR
system must support certain basic operations. There must be a way to enter documents
into a database, change the documents, and delete them. There must also be some way to
search for documents, and present them to a user. IR systems vary greatly in the ways
they accomplish these tasks.
Table 2 is a faceted classification of IR systems, containing important IR concepts and
vocabulary. The first row of the table specifies the facets, that is, the attributes that IR
systems share. Facets represent the parts of IR systems that tend to be constant from
system to system. For example, all IR systems must have a database structure, but they
vary in the structures they use: some have inverted file structures, some have flat file
structures, and so on. Full explanations of these terms can be found in [FB92].
Terms within a facet are not mutually exclusive, and more than one term from a facet can
be used for a given system. Some decisions constrain others: if one chooses a Boolean
conceptual model, for example, then one must choose a parse method for queries.

Table 2: Faceted Classification of IR Systems

Conceptual Model:     Boolean, Extended Boolean, Probabilistic, String Search, Vector Space
File Structure:       Flat File, Inverted File, Signature, Pat Trees, Graphs, Hashing
Query Operations:     Feedback, Parse, Boolean, Cluster
Term Operations:      Stem, Weight, Thesaurus, Stoplist, Truncation
Document Operations:  Parse, Display, Cluster, Rank, Sort, Field Mask, Assign ID's
Hardware:             VonNeumann, Parallel, IR-Specific, Optical Disk, Mag. Disk

Viewed another way, each facet is a design decision point in developing the architecture
for an IR system. The system designer must choose, for each facet, from the alternative
terms for that facet. A given IR system can be classified by the facets and facet values,
called terms, that it has. For example, the CATALOG system [Frak84] can be classified
as shown in Table 3:
Table 3: Facets and Terms for CATALOG IR System

Facets                 Terms
File Structure         Inverted file
Query Operations       Parse, Boolean
Term Operations        Stem, Stoplist, Truncation
Hardware               VonNeumann, Mag. Disk
Document Operations    Parse, Display, Sort, Field Mask, Assign ID's
Conceptual Model       Boolean


IR systems are capable of automatically extracting important vocabulary from text and
using it to index documents, in this case reusable software components. Frakes and
Nejmeh [Frak88] first proposed using IR systems to classify and store reusable software
components. They discussed the use of CATALOG for this purpose, and defined the types of
indexing fields that might be useful. Since then, several other uses of IR systems as reuse
libraries have been reported (see [Frak90] for a review). One such system of special
interest is GURU [Maar91]. GURU uses simple phrase extraction techniques to
automatically derive two-word phrases from text. Both individual keywords and phrases
composed of those keywords may be useful for identifying domain vocabulary and
concepts in DARE.
In terms of Table 3, the key operations that will be needed for automatic vocabulary and
concept identification are text parsing, stoplist operations, stemming, and truncation.
Text parsing involves breaking the text into its component keywords. Stemming is a
process of removing prefixes and suffixes from words so that related words can be
grouped together. Stemming, for example, is capable of conflating variants such as
domain and domains into a single concept. Truncation is manual stemming. Truncation
will be a useful feature in the search portion of DARE, since it will help users search
using related keywords.
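
To make these operations concrete, the following C fragment sketches the kind of filtering pipeline just described. It is a minimal illustration only, not a production indexer: the stoplist is a small hard-coded array, and the stemmer strips just a few common English suffixes, where a real IR system would use a full stemming algorithm and a much larger stoplist.

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* A tiny stoplist; real systems use hundreds of entries. */
static const char *stoplist[] = { "the", "a", "of", "and", "to", "in", 0 };

static int in_stoplist(const char *word)
{
    int i;
    for (i = 0; stoplist[i] != 0; i++)
        if (strcmp(word, stoplist[i]) == 0)
            return 1;
    return 0;
}

/* Crude stemmer: strip a few suffixes so that, e.g., "domains"
   and "domain" conflate to the same stem. */
static void stem(char *word)
{
    size_t len = strlen(word);
    if (len > 4 && strcmp(word + len - 3, "ing") == 0)
        word[len - 3] = '\0';
    else if (len > 3 && strcmp(word + len - 2, "es") == 0)
        word[len - 2] = '\0';
    else if (len > 2 && word[len - 1] == 's')
        word[len - 1] = '\0';
}

/* Parse stdin into lowercase words, drop stopwords, print stems. */
int main(void)
{
    char word[128];
    int c, n = 0;

    while ((c = getchar()) != EOF) {
        if (isalpha(c)) {
            if (n < (int)sizeof(word) - 1)
                word[n++] = (char)tolower(c);
        } else if (n > 0) {
            word[n] = '\0';
            n = 0;
            if (!in_stoplist(word)) {
                stem(word);
                printf("%s\n", word);
            }
        }
    }
    return 0;
}

Feeding a document through such a filter and counting the resulting stems yields the raw frequency data that the indexing tools described in section 5.4 operate on.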
4.2- Artificial intelligence
Artificial intelligence (AI), the use of computers to do tasks that previously required
human intelligence, is a broad field with an immense literature. Of special interest to
reuse and domain analysis are the AI subfields of knowledge extraction/acquisition and
knowledge representation.
All AI systems are constrained by the amount and quality of the knowledge they contain.
Builders of AI systems have found that the so-called knowledge acquisition barrier is
usually the most difficult problem they must solve in building successful AI systems.
Most knowledge acquisition techniques are manual and rely on various interviewing
techniques. There are also some automatic techniques based on machine learning.
[Hart86] and [Kidd87] provide good summaries of knowledge acquisition techniques.
One technique for eliciting knowledge, for example, is to ask the same question in
different ways. Say, for example, that an expert is asked to identify important
subdomains, but is unable to do so. The interviewer might then ask how the
organization is structured, recognizing that organizations are often structured along
domain specific lines.
Once knowledge has been acquired, it must be represented in a form that the machine can
use to do useful work. Many knowledge representation techniques have been proposed.
Some of the more popular are production rules, frames, and semantic nets. All of these
techniques have been used to represent reusable software components (see [Frak90] for a
review).
A semantic net is a directed graph whose nodes correspond to conceptual objects and
whose arcs correspond to relationships between those objects. Production rules are
perhaps the best known of knowledge representation formalisms because of their use in


many expert system shells. Production rules might be used to classify reusable
components based on attribute value pairs as follows.
IF algorithm needed IS a sort
AND sort speed required IS fastest
AND implementation language IS C
THEN sort to use IS quicksort.c
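
Encoded as data, such attribute-value rules can be matched mechanically. The following sketch is our own illustration of the idea, not the code of any real expert system shell: it tests the rule's IF-part against a table of known facts and prints the THEN-part only if every condition is satisfied.

#include <stdio.h>
#include <string.h>

struct pair { const char *attr; const char *value; };

/* Facts gathered about the user's need. */
static const struct pair facts[] = {
    { "algorithm needed",        "sort"    },
    { "sort speed required",     "fastest" },
    { "implementation language", "C"       },
    { "operating system",        "UNIX"    },
};

/* The rule's conditions (its IF-part). */
static const struct pair conditions[] = {
    { "algorithm needed",        "sort"    },
    { "sort speed required",     "fastest" },
    { "implementation language", "C"       },
};

/* Return 1 if a condition is supported by some known fact. */
static int holds(const struct pair *c)
{
    size_t i;
    for (i = 0; i < sizeof(facts) / sizeof(facts[0]); i++)
        if (strcmp(facts[i].attr, c->attr) == 0 &&
            strcmp(facts[i].value, c->value) == 0)
            return 1;
    return 0;
}

int main(void)
{
    size_t i, n = sizeof(conditions) / sizeof(conditions[0]);

    for (i = 0; i < n; i++)
        if (!holds(&conditions[i]))
            return 0;                       /* rule does not fire */
    printf("sort to use IS quicksort.c\n"); /* rule fires         */
    return 0;
}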
Frames are data structures, composed of slots and fillers, used for knowledge
representation. For example,
Sort
    AKO:        algorithm
    operation:  ordering
    operands:   data objects
The slots here are in the left hand column, and the fillers in the right, following the
colons. Sort is a special slot which names the frame. AKO, which stands for "a kind of,"
is commonly used in frame representations. While the knowledge in frames can be
accessed and used in many kinds of inferencing, the inferencing technique usually
associated with frames is inheritance. In inheritance, one frame inherits slots, and
optionally fillers, from another.
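
The inheritance mechanism can be sketched in C as follows. This is an illustrative data structure only (the type and slot names are ours, not those of any particular frame system): slot lookup first searches a frame's own slots and then follows the AKO link to its parent, which is exactly the inheritance described above.

#include <stdio.h>
#include <string.h>

#define MAX_SLOTS 8

struct frame {
    const char *name;               /* frame name, e.g. "Sort"  */
    const struct frame *ako;        /* "a kind of" parent frame */
    const char *slots[MAX_SLOTS];   /* slot names               */
    const char *fillers[MAX_SLOTS]; /* corresponding fillers    */
    int nslots;
};

/* Look up a slot; if absent, inherit it through the AKO chain. */
static const char *get_slot(const struct frame *f, const char *slot)
{
    for (; f != 0; f = f->ako) {
        int i;
        for (i = 0; i < f->nslots; i++)
            if (strcmp(f->slots[i], slot) == 0)
                return f->fillers[i];
    }
    return 0; /* slot not found anywhere in the chain */
}

int main(void)
{
    struct frame sort = { "Sort", 0,
        { "AKO", "operation", "operands" },
        { "algorithm", "ordering", "data objects" }, 3 };
    struct frame quicksort = { "Quicksort", &sort,
        { "language" }, { "C" }, 1 };

    /* "operands" is not stored in quicksort; it is inherited. */
    printf("operands of %s: %s\n", quicksort.name,
           get_slot(&quicksort, "operands"));
    return 0;
}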
Two useful factors to consider when evaluating knowledge representations are
representational adequacy and heuristic power. Representational adequacy refers to how
much one can express with the representation. A simple list of keywords, for example,
has poor representational adequacy because the syntactic and semantic relationships
between the keywords are missing. Heuristic power refers to the kinds of inferencing one
can do with the representation. Logical inference, for example, is a powerful type of
processing only possible with some representations. One appeal of the knowledge based
approach to reuse representation is that the representations offer a powerful way of
expressing the relationships between system components. This is probably extremely
important for helping a user understand the function of components. It may be, for
example, that information of the form "component transforms input A to output B under
condition X" will be important for expressing knowledge about a code domain.
Devanbu et al. [Deva91] used frames to represent software components from System 75,
a large switching system consisting of about 1 million lines of C code. Their reuse
system, called LaSSIE, attempts to support multiple views of System 75: a functional
view which describes what components are doing, an architectural view of the hardware
and software configuration, a feature view that relates basic system functions to features
such as call forwarding, and a code view which captures how code components relate to
each other. Their taxonomy is based on four categories: object, action, doer, and state.
For example, a frame using this taxonomy might describe an object called a
user-connect-action which is both a network-action and a call-control-action, having a
generic process as an actor, that attempts to move from a call-state to a talking-state by
using a bus-controller. One interesting aspect of such a scheme is the way it allows the
relationships among the various conceptual parts of a system to be made explicit. In
addition to the domain specific information about System 75, LaSSIE also stores
information about the


environment (UNIX and C) used to develop the system. LaSSIE also uses a natural
language interface as part of its query facility.
Systems such as LaSSIE demonstrated that AI can be used to support reuse and domain
analysis, but they also showed again the problems associated with knowledge acquisition
for such systems. LaSSIE's authors managed to represent only a small part of System 75,
and most of that had to be done manually. Getting enough of the System 75 engineers'
time to capture the knowledge and validate the results was also a practical problem.
4.3- Code static and dynamic analysis tools
One important kind of knowledge about software systems is derived by static and
dynamic analysis of code. Static analysis tools analyze code before execution, and
provide information about program structure and attributes. Dynamic analysis tools are
used to monitor the runtime behavior and performance of code. There are many such
tools available for various languages and programming environments. We will use the
UNIX/C environment for purposes of our discussion. See [FFN91] for a fuller discussion
of this topic and of the tools that follow.
Cf and cflow produce C system function hierarchies. Such information takes the form

function1
    function2
        function3
            ...
                function-n

which says that function1 calls function2, which calls function3, and so on. This
information can be used for a variety of purposes, including identifying potentially
reusable components and calculating reuse metrics [Frak92].
Another important class of static analysis tools computes software metrics, i.e.,
quantitative indicators of software attributes. Many such metrics have been reported in
the literature [CDS86]. Many of these measure software complexity. ccount, for example,
computes simple metrics such as NCSL and CSL and their ratios.
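
As a rough illustration of what a ccount-like tool measures, the sketch below counts commentary (CSL) and non-commentary (NCSL) source lines in C code read from standard input. It is simplified by design: it recognizes only block comments and ignores comment markers inside string literals.

#include <stdio.h>

int main(void)
{
    int c, prev = 0;
    int in_comment = 0;            /* inside a block comment?    */
    int has_code = 0, has_cmt = 0; /* what the current line holds */
    long ncsl = 0, csl = 0;

    while ((c = getchar()) != EOF) {
        if (c == '\n') {           /* classify the finished line */
            if (has_code) ncsl++;
            else if (has_cmt) csl++;
            has_code = has_cmt = 0;
            prev = 0;
            continue;
        }
        if (in_comment) {
            has_cmt = 1;
            if (prev == '*' && c == '/') in_comment = 0;
        } else if (prev == '/' && c == '*') {
            in_comment = 1;
            has_cmt = 1;
        } else if (c != ' ' && c != '\t' && c != '/') {
            has_code = 1;
        }
        prev = c;
    }
    printf("NCSL = %ld, CSL = %ld, NCSL/CSL = %.2f\n",
           ncsl, csl, csl ? (double)ncsl / csl : 0.0);
    return 0;
}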
Another important source of static information is make. Information from the make utility
can be used to determine structure at the file level.
Cscope parses C code and builds a cross reference table that allows the following kinds
of information to be reported.
List references to this C symbol:
List functions called by this function:
List functions calling this function:
List lines containing this text string:
List file names containing this text string:
List files #including this file:


An even more powerful static analysis tool is CIA [CNR90], a tool that extracts from C
source code information about functions, files, data types, macros, and global variables,
and stores this information in a relational database. This information can then be accessed
via CIA's reporting tools, by awk, or by other database tools. One type of information
that CIA captures, for example, is the calling relationship among functions. CIA's
reporting tools can then be used, with other graphics tools, to generate a graphical
representation of the information in the database. Such graphical representations can be
used as preliminary domain architectures during domain analysis.
Some of the types of information CIA output might be used to derive are:

Software metrics - CIA can be used to compute many software metrics. Since C program
objects are explicitly stored, it is obviously possible to count them. More sophisticated
metrics can also be generated. A measure of the coupling between two functions can be
calculated, for example, by counting the number of program objects jointly referenced by
the two functions (see the sketch following this list).

Program version comparisons - Two versions of a program can be compared by looking
at differences in the CIA databases for the versions. This comparison can reveal
declarations created, deleted, or modified, and changes in relationships among program
objects. This is different from the UNIX diff command, which only compares lines.

Reuse - The information CIA produces about which functions are most used by other
functions could be used to identify reusable components.
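
As an illustration of the coupling measure mentioned above, the following sketch reads reference records of the form "function object", one per line, and counts the objects jointly referenced by two named functions. The input format is an assumption for this example, not CIA's actual schema; in practice CIA's own reporting tools or awk would extract such records from the database.

#include <stdio.h>
#include <string.h>

#define MAX_OBJS 1024

int main(int argc, char *argv[])
{
    static char obj[2][MAX_OBJS][64]; /* objects each function references */
    int n[2] = { 0, 0 };
    char fn[64], ob[64];
    int i, j, coupling = 0;

    if (argc != 3) {
        fprintf(stderr, "usage: %s func1 func2 < refs\n", argv[0]);
        return 1;
    }
    while (scanf("%63s %63s", fn, ob) == 2)
        for (i = 0; i < 2; i++)
            if (strcmp(fn, argv[i + 1]) == 0 && n[i] < MAX_OBJS)
                strcpy(obj[i][n[i]++], ob);

    /* Coupling = number of objects referenced by both functions.
       (Duplicate references are not removed in this sketch.) */
    for (i = 0; i < n[0]; i++)
        for (j = 0; j < n[1]; j++)
            if (strcmp(obj[0][i], obj[1][j]) == 0) {
                coupling++;
                break;
            }
    printf("coupling(%s, %s) = %d\n", argv[1], argv[2], coupling);
    return 0;
}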
Dynamic analysis tools are used to investigate the runtime behavior of software. They are
used to find and remove bugs, and to measure the execution speed of programs and program
components. Such measurement is used to see if the components meet requirements,
and/or if the components need to be optimized. Several tools are available in the UNIX/C
environment for measuring execution speed.
The time tool reports on program CPU usage. For example, the command

time who

will produce output such as:

real    0m2.91s
user    0m0.25s
sys     0m0.30s

This information shows that 2.91 seconds of elapsed clock time passed during execution
of the who command, 0.25 seconds were spent in the who program itself, and 0.30
seconds were spent in the kernel executing the command.
The prof utility can be used to measure the time each function in the system takes to
execute. When code is compiled for profiling (with the -p option), a file called mon.out
is generated during


execution. This file contains data correlated with the object file and readable by prof to
produce a report of the time consumed by individual functions in the program.
When run on an example program, prof produced the following report:

%Time   Seconds   Cumsecs   #Calls   msec/call   Name
 50.0      0.02      0.02        1         17.   _read
 50.0      0.02      0.03        8          2.   _write
  0.0      0.00      0.03                   0.   _monitor
  0.0      0.00      0.03        1          0.   _creat

This report shows that the function read was called once, and this call took 17
milliseconds, about 50% of the total execution time for the program. The functions
monitor and creat show zero execution times because they used amounts of time too
small for prof to measure. Such information might be used in the analysis of real time
domains where execution efficiency is critical. Designers of reusable components have
reported that component efficiency is a primary factor in their acceptance by users. Thus,
dynamic analysis tools will play a key role in a reuse and domain analysis environment.
4.4- Interface environments
Environments for building high quality bit mapped interfaces have proliferated in the past
few years. NeWS and X based environments are the most common, with X based
environments becoming the standard. The X environment is, in fact, a good example of
successful horizontal reuse. Many powerful tools have been written on top of X,
including higher level toolsets such as Motif, and interface generators such as TAE from
NASA. These tools make it relatively easy to develop high quality window based
environments with graphics.
A high quality interface will be very important for DARE since so much data input and
manipulation will be required. One of the key challenges in developing DARE is to
identify good interface strategies and data representations. We will probably use an X
based interface development environment because of its power and portability across
platforms.
4.5- CASE tools
A CASE (computer aided software engineering) toolset supports the activities of
software engineering through analysis, software views, and repositories. The UNIX
programmer's workbench (PWB), containing tools such as cflow and CIA, is an example
of a set of coordinated tools of this type. There are also many commercial CASE tools
on the market. Some that we may consider for DARE are Cadre's Teamwork, Interactive
Development Environments' STP (Software through Pictures), and SoftBench from
Hewlett-Packard. These tools, like UNIX PWB, provide support for reverse
engineering software, which will provide a good source of knowledge about a domain.


5- A Domain Analysis and Reuse Environment (DARE)


A domain analysis and reuse environment is presented in this section. The objective of
DARE is to support partial automation of the domain analysis process. The focus is on
knowledge acquisition and knowledge structuring activities, and the environment is based
on currently available tools from IR, AI, and reverse engineering.
5.1- DARE architecture
A high level complete DARE architecture is shown in Figure 3. It consists of a user
interface, a domain analysis support environment, a software reuse library, and an
environment to support software construction by composition. This Phase I study has
focused on the domain analysis support component of the architecture and its interface to
the reuse library and other support tools.
One of the most difficult components is the user interface. The user interface must
provide support to a diversity of users: domain analysts, domain experts, systems
designers, software engineers, and librarians. Each user should be able to interact with
their specific tools through a uniform and standard look and feel. The reuse library is a
common repository of domain specific software assets. Assets are produced by the
domain analysis support side and consumed by the software construction support side.
Software assets include domain specific architectures, generic designs, requirements,
code components, test scripts, and any other software work product with potential for
reusability. Special effort will be made to use existing STARS reuse libraries as part of
the DARE library component.

[Figure 3- High Level DARE Architecture. The diagram shows the domain analysis
support, reuse library, software construction support, and COTS & special tools
components, connected through common interfaces to the users of the environment:
domain analyst, domain expert, systems designer, software engineer, and librarian.]

The domain analysis support part of DARE supports specific domain analysis activities.
A common interface is proposed to integrate existing and new tools. The outputs of
these tools are tangible domain analysis products such as domain taxonomies, domain
vocabularies, systems architectures, standard designs, software specifications, reusable
code components, and specifications for new components.
The software construction support would enable users to select library assets for building
new systems. A domain architecture, for example, which serves as a framework to search
the library, could be used to select components explicitly designed to fit parts of the
architecture. Similarly, standard designs could be used for selecting the appropriate
reusable code components. The interface could also allow other environments to use
DARE's facilities.
DARE is not intended to support tasks already covered by existing software development
environments. Such tasks include code development (compiling, editing, debugging),
project management support, system maintenance, etc. DARE will provide support to
systems and software designers in selecting reusable components. The actual reuse-based
construction and development of new systems would be conducted in their respective
environments.


5.2- DARE supported domain analysis process


We have tailored the domain analysis process of section three (SRLPM) to meet the
DARE requirements. This customized process pays special attention to the enabling tools
and focuses on the early stages of domain analysis. The objective in this section is to
describe through a set of SADT diagrams the key activities that DARE will support and
to identify the specific tools that can be used to implement these activities. Figure 4
shows the context SADT model. The viewpoint is that of the DARE architect or
developer (i.e., those interested in developing a DARE environment).
There are five inputs to the process. The first two, domain knowledge and domain
experience, include general, sometimes informal and undocumented, knowledge
about the domain. This knowledge is used primarily for defining and scoping the domain.
Although the DARE environment is not intended to support definition and scoping
of a domain directly, it will indirectly assist analysts and experts in refining their initial
definition. The other three inputs, existing systems, domain related documents, and
expert knowledge, are the core inputs to DARE. DARE will rely on written
documentation as the source for knowledge acquisition and use text analysis techniques
for abstracting and structuring domain knowledge.
The domain analysis process is guided (controlled in SADT terminology) by organization
objectives and a reuse strategy. Organization objectives may include understanding the
domain, developing a generic domain architecture, creating reusable components, or even
developing an application generator for the domain. A reuse strategy determines the road
map to follow to accomplish the organizational objectives. One reuse strategy, for
example, may be to do domain analysis incrementally and to assess its benefits before
advancing to the next step.
The outputs include tangible products: a domain definition, recorded domain knowledge,
domain structures, and domain models. A domain definition is a written document
specifying the domain. Recorded domain knowledge consists of domain knowledge that
has been captured and registered in a database and made available for analysis. Domain
structures are the products of structuring recorded domain knowledge through a process
of recurrent abstraction, classification, and clustering. Domain models are common
domain structures made reusable through a commonality analysis process.
Enabling agents (mechanisms, in SADT terminology) are the domain analysts, the
domain experts, and the DARE support environment. DARE will consist of an integrated
collection of tools providing automated, semi-automated, and interactive support for
domain analysis activities.

[Figure 4- Context SADT Model for DARE Supported Domain Analysis. Inputs: domain
knowledge, domain experience, existing systems, domain related documents, and expert
knowledge. Controls: reuse strategy and organization objectives. Outputs: domain
definition, recorded domain knowledge, domain structures, and domain models.
Mechanisms: domain experts, domain analysts, and DARE. PURPOSE: To illustrate a
practical domain analysis process with potential for automation and to identify the kinds
of tools required for each process stage. VIEWPOINT: DARE Architect/Developer.]

Figure 5 shows the SADT level A0 decomposition. It consists of five main activities:
A1- Define Domain
A2- Acquire Domain Knowledge
A3- Structure Domain Knowledge
A4- Identify Commonalities
A5- Generate Domain Models
A2 through A5 are the activities that will be supported by DARE. A1, although a
necessary step, is assumed to be conducted outside the context of DARE. The output of
Define Domain is a domain definition and it controls, together with a reuse strategy,
the knowledge acquisition process.
Acquire Domain Knowledge will be supported by knowledge acquisition tools. Some
of these tools are fully automatic such as scanners, compilers, reuse libraries, and editors
or text processors, while others are interactive and semi-automatic like questionnaire
templates and interview guidelines. Knowledge is acquired from the three main inputs:

existing systems (i.e., source code), domain related documents, and experts. Knowledge
from documents will be extracted automatically while knowledge from experts will be
converted semiautomatically to text form first, using interviews and questionnaires.
Source code from existing systems will be selected manually based on quality,
documentation, and relevance, and then re-structured using reengineering tools. Not all
source code will be selected. Domain analysts and domain experts are essential support
agents. The output is recorded domain knowledge which includes: scanned documents,
answered questionnaires, recorded interviews, surveys, and processable source code.
The objective in structuring domain knowledge is to create domain structures suitable for
commonality analysis. Such structures include: faceted classification, domain
vocabulary, high level functional descriptions, design rationale charts, SADT diagrams,
systems code structures, data dictionaries, survey reports, and knowledge structures.
Tools that support this process include: lexical analyzers, keyword filters, indexing
support, numeric and conceptual clustering, thesaurus construction, reverse engineering,
statistical analysis, and tools that support semantic net construction and production rule
development. Most structuring activities will be done automatically. Structuring expert
knowledge, however, will be conducted semiautomatically.
Domain structures and recorded domain knowledge are used to identify commonalities
using conceptual clustering techniques, reverse engineering, and code similarity detection
tools. Commonality analysis is a highly interactive activity that requires easy and
effective access to all recorded domain knowledge and all domain structures. Both
activities, identify commonalities (A4) and generate domain models (A5), must be
conducted concurrently, and they are connected with a continuous feedback link. Interactive
support through a common interface is essential for conducting these activities.
The final outputs are domain models. Domain models provide the basis for designing and
implementing reusable components, for providing requirements standards, and for
supporting reuse-based development. Domain models are in a continuous evolution
process, and are fed back to A2 for refinement. The domain models produced through
DARE will be concise, well defined, and practical. These include: a common vocabulary,
a common architecture, a classification scheme, and functional specifications for reusable
components.


Acquire domain knowledge (A2)


Acquire domain knowledge is decomposed into four activities (see figure 6):
A21- Extract Information from Documents
A22- Extract Information from Experts
A23- Extract Information from Existing Systems
A24- Enter into Domain Database
The first three activities focus on extracting information from three different sources of
domain knowledge. Extracting information from documents will be conducted
automatically using scanners and text processors. The output will be scanned machine
readable text. Extracting information from experts involves interviewing experts and
answering questionnaires. The process can be automated significantly by designing
standard interview protocols and questionnaire templates and presenting them to the
experts through a common interface [Fick88]. Answered questionnaires and interviews
will be recorded automatically. Extracting information from existing systems will be
done through a combination of scanning, browsing, compiling, and restructuring selected
source code. System documentation will also be scanned as in A21. All extracted
information will be entered into a domain database and made available for structuring
and analysis.
Structure domain knowledge (A3)
Structure domain knowledge is decomposed into three activities as shown in figure 7:
A31- Structure Scanned Documents
A32- Reengineer Selected Systems
A33- Structure Knowledge from Experts
Again, each activity corresponds to each of the three main inputs: scanned text from
domain related documents, source code and functional specifications from existing
systems, and expert knowledge extracted from questionnaires and interviews.


The outputs of structuring scanned documents are conceptual structures. The objective is
to extract concepts from text and organize them into structures that relate those concepts
to each other and to domain entities. These conceptual structures convey domain
understanding and support the creation of domain models. The conceptual structures
produced by this activity will include: a preliminary faceted classification, a domain
vocabulary, functional descriptions of domain components, and a domain taxonomy. The
production of these structures will be automatic. The tools to be used include: lexical
analyzers, filters (i.e., stop lists), indexing tools, clustering tools, and thesaurus
construction tools. Most of these tools are readily available from commercial information
retrieval systems.
The focus in reengineering selected systems (A32) is on recovering the original structures
of these systems through reverse engineering techniques. Source code is structured and
reverse engineered to produce designs. Functional specifications are used to verify and
validate the resulting designs and to support the creation of systems architectures. The
outputs of this activity are implementation structures that include: code structures (i.e.,
structured code), systems designs, specific architectures, data dictionaries, ER diagrams,
and data structures. These outputs will be generated automatically with off-the-shelf
reengineering and reverse engineering tools.
Structuring knowledge from experts is an interactive process aimed at producing
knowledge structures such as semantic nets, ER diagrams, behavior models, design
rationales, and production rules. Textual input from answered questionnaires and expert
interviews is processed interactively by domain analysts using knowledge structuring
tools. Knowledge structuring tools include semantic net construction tools, conceptual
clustering support tools, statistical analysis tools, and integrated tools such as AVIEN
(see section 5.4).
Coordinated control and direction to structuring knowledge from experts is provided by
conceptual and implementation structures and by the domain specification. All outputs
from these three activities are used by A4 to identify commonalities.


Identify commonalities (A4)


Identify commonalities is also an interactive process. The domain analyst compares all
domain structures from A3 to look for similarities and common features. The process will
be supported by some of the structuring tools plus commonality analysis tools.
Commonality is explored within each class of structures first. Reverse engineered
designs, for example, are compared with each other first, then they are compared with
specific architectures. Similarly, faceted classification terms are analyzed for similarity
first, and then compared to the taxonomy and to the vocabulary. After commonalities
within each group are identified, all structures from all groups are compared. The
objective is to identify all possible commonalities at the conceptual structure level. The
key to this activity is to have all information available through a single interface and
linked by a common database. This process will be conducted concurrently with the
generation of domain models.
Generate Domain Models (A5)
The process of generating domain models consists of grouping the commonalities
identified in A4 into revised domain structures. Domain models have a format and
presentation similar to the domain structures of A4, but are made generic and include
domain variabilities. The domain models that will be produced with DARE support
include a common domain vocabulary, a common domain architecture, a classification
scheme, and functional specifications for reusable components. One of the purposes of the
domain models is to provide a common base for standardization, not only of common
domain elements, but also of reuse-based software development processes. A common
vocabulary, for example, provides a standard terminology that can be used for specifying
new systems. A common architecture, likewise, may provide a standard skeleton for
building new applications. Domain models are the final outputs of domain analysis.
5.3- DARE functional model
The DARE functional model in figure 8 shows the major activities and data used in the
DARE process. The model shows that there are three major types of information inputs
to DARE: document sources, expert sources, and code sources. Document sources will
be automatically analyzed into keywords and phrases using automated techniques from
IR. These keywords and phrases will be conceptually clustered into a faceted
classification of the domain. Expert sources will be dealt with manually, using various
knowledge extraction techniques. Code sources will be automatically analyzed using
reverse engineering and metrics tools. Outputs from these processes will be input to a
process of commonality analysis, that is, identifying what is constant across systems in
the domain. The commonality analysis process may be partially automated through the
use of plagiarism detection tools. The major outputs are a domain model and a set of
reusable components in a reuse library.


5.4- Architecture components


Document analysis tools
The document analysis tools in DARE will automatically extract semantically important
words and phrases from documents in the domain as described in 4.1. Automated text
scanners can be used to input the text into machine readable form. The first tool will be a
text lexical analyzer that will break text into its constituent words. This tool must work
in conjunction with a text context tool that will keep track of document numbers, text
field identifiers, and sentence and paragraph context information. The next tools will be
text filtering tools. One of these will be a stoplist comparison tool capable of comparing
words from a document with words in a stoplist, and excluding from the indexing list any
words found in the stoplist. A stemming tool will be used to relate morphologically
similar words by reducing similar words to a common stem. The frequency analysis tool
will count the occurrences of words and stems in the document and in the database.
Indexing tools will take information extracted in the first phases and build indexes. One
type of index will be an inverted file which relates indexing information to the
documents from which the information was extracted. An inverted file is a kind of
indexed file. The structure of an inverted file entry is usually keyword, document-ID,
field-ID. A keyword is an indexing term that describes the document, document-ID is a
unique identifier for a document, and field-ID is a unique name that indicates from which
field in the document the keyword came. Some systems also include information about
the paragraph and sentence location where the term occurs.
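
A minimal C rendering of such an entry might look like the following; the field names are ours, chosen to mirror the description above.

/* One posting in an inverted file: it maps a keyword to the
   document and field in which the keyword was found.  The
   paragraph and sentence numbers are the optional location
   information some systems also record. */
struct inverted_entry {
    char keyword[32];  /* indexing term describing the document */
    long document_id;  /* unique identifier for the document    */
    int  field_id;     /* which field the keyword came from     */
    int  paragraph;    /* optional: paragraph number            */
    int  sentence;     /* optional: sentence number             */
};

Entries are kept sorted (or hashed) by keyword, so all postings for a query term can be retrieved with a single lookup.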
Lexical affinities, i.e., the relationships between words, will be automatically derived
using phrase extraction techniques. Lexical affinities are based, in this case, on the
co-occurrence of words in a text. For a given word w in a text, the words in its
neighborhood (plus or minus 5 words) are examined, and two-word affinities (w, w') are
constructed together with information about the frequency of their occurrence. The
resolving power of an affinity is calculated by the formula p_d(w,w') = P_d x INFO(w,w'),
where P_d is the frequency of the lexical affinity in a given document d, and INFO is the
quantity of information of the two words in the entire document collection. Z scores,
statistical measures of variance around a mean, are then computed on the affinities, and
the most representative affinities are selected.
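
The windowing step of this computation can be sketched as follows. The fragment is our illustration of the general technique, not GURU's code: it slides a five-word window over a token stream and accumulates co-occurrence counts for word pairs, from which the frequencies used in the resolving power formula are obtained.

#include <stdio.h>
#include <string.h>

#define WINDOW 5
#define MAX_PAIRS 4096

struct affinity { char w1[32], w2[32]; int count; };
static struct affinity pairs[MAX_PAIRS];
static int npairs = 0;

/* Record one co-occurrence of (a, b), normalized by order. */
static void note_pair(const char *a, const char *b)
{
    int i;
    if (strcmp(a, b) > 0) { const char *t = a; a = b; b = t; }
    for (i = 0; i < npairs; i++)
        if (strcmp(pairs[i].w1, a) == 0 && strcmp(pairs[i].w2, b) == 0) {
            pairs[i].count++;
            return;
        }
    if (npairs < MAX_PAIRS) {
        strcpy(pairs[npairs].w1, a);
        strcpy(pairs[npairs].w2, b);
        pairs[npairs++].count = 1;
    }
}

int main(void)
{
    char window[WINDOW][32], word[32];
    int i, seen = 0;

    while (scanf("%31s", word) == 1) {
        /* pair the new word with the previous WINDOW words */
        for (i = 0; i < (seen < WINDOW ? seen : WINDOW); i++)
            note_pair(window[i], word);
        strcpy(window[seen % WINDOW], word);
        seen++;
    }
    for (i = 0; i < npairs; i++)
        if (pairs[i].count > 1) /* keep affinities seen more than once */
            printf("%s %s %d\n", pairs[i].w1, pairs[i].w2, pairs[i].count);
    return 0;
}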
Term clustering tools can be used to automatically group semantically related terms
based on their co-occurrences in documents and databases. This information can then be
used to produce initial groupings for faceted classification and domain analysis.
Domain expert knowledge extraction tools
Knowledge extraction from experts is a human intensive process (see section 4.2). The
difficulty of knowledge extraction has been one of the primary impediments to the
implementation of large scale AI systems. To be useful, DARE will need to incorporate
expert knowledge about domains. This will be extracted using questionnaires, and from
analysis of software systems in the domain. Such questionnaires will ask about domain
suitability, size, structure, and key concepts. Knowledge derived from the questionnaires

will be represented using knowledge representation methods such as semantic nets and
rules. This area will need to be explored further before we can be sure of what
techniques to use. In AVIEN [FF88], an expert system building tool, for example, we
developed a technique based on having experts draw decision charts which laid out the
inferencing structure to use in a given expert system. This technique proved successful in
several domains including mineral identification and solder fault analysis. We
eventually developed a tool called GROK which allowed an expert to input knowledge as
a decision tree. GROK then automatically transformed the tree into rules.
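
The tree-to-rule transformation is mechanical: every root-to-leaf path becomes one rule whose conditions are the branch tests along the path. The sketch below illustrates the general idea with a made-up mineral identification fragment; it is our reconstruction of the technique, not GROK's actual code.

#include <stdio.h>

/* A binary decision tree node: internal nodes carry a test,
   leaves carry a conclusion. */
struct node {
    const char *test;       /* NULL at a leaf          */
    const char *conclusion; /* non-NULL only at a leaf */
    const struct node *yes, *no;
};

/* Walk every root-to-leaf path, printing it as a rule. */
static void emit_rules(const struct node *n, const char *conds[], int depth)
{
    char negated[128];
    int i;

    if (n->test == 0) {                 /* leaf: print the rule  */
        printf("IF %s", conds[0]);
        for (i = 1; i < depth; i++)
            printf(" AND %s", conds[i]);
        printf(" THEN %s\n", n->conclusion);
        return;
    }
    conds[depth] = n->test;             /* yes branch: keep test */
    emit_rules(n->yes, conds, depth + 1);
    sprintf(negated, "NOT (%s)", n->test);
    conds[depth] = negated;             /* no branch: negate it  */
    emit_rules(n->no, conds, depth + 1);
}

int main(void)
{
    const struct node magnetite = { 0, "mineral IS magnetite", 0, 0 };
    const struct node hematite  = { 0, "mineral IS hematite",  0, 0 };
    const struct node unknown   = { 0, "mineral IS unknown",   0, 0 };
    const struct node streak = { "streak IS black", 0, &magnetite, &hematite };
    const struct node root   = { "luster IS metallic", 0, &streak, &unknown };
    const char *conds[16];

    emit_rules(&root, conds, 0);
    return 0;
}

Run on this tree, the program prints three rules, one per leaf, of the same IF/AND/THEN form shown in section 4.2.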
Code analysis tools
The code analysis tools for DARE will be primarily static analysis tools (see section 4.3).
Such tools are used to get information about the structure and properties of source code
for systems considered in the domain analysis process. The tools we discuss are
representative of the static code analysis tools available today, ordered from less
powerful to more powerful.
Cflow shows the call hierarchy of functions in a system, an important type of system
architectural information. It also reports total function calls, the number of functions
declared, levels of nesting, and how many times calls are made in functions.
Cscope parses C code and builds a cross reference table that allows various kinds of
information to be reported.
CIA is a tool that extracts from C source code information about functions, files, data
types, macros, and global variables, and stores this information in a relational database.
This information can then be accessed via CIA's reporting tools, by awk, or by other
database tools.
DARE will use complexity metrics such as NCSL (non-commentary source lines),
McCabe's cyclomatic complexity, and Halstead's metric to help quantify information
about the components and architectures in a domain, and about the potential of those
components and architectures for re-engineering. Reuse level metrics can be used to find
out how much of the software in a domain is already reused from external sources
(external reuse) and domain internal sources (internal reuse).
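
As a simple illustration of a reuse level computation (a simplified reading of the metrics in [Frak92]; the input format is our own), the sketch below reads one component per line, tagged with its origin, and reports the external and internal reuse levels as proportions.

#include <stdio.h>
#include <string.h>

int main(void)
{
    char name[64], origin[16];
    long total = 0, external = 0, internal = 0;

    /* Each input line: component-name origin
       where origin is one of: external | internal | new */
    while (scanf("%63s %15s", name, origin) == 2) {
        total++;
        if (strcmp(origin, "external") == 0)
            external++;
        else if (strcmp(origin, "internal") == 0)
            internal++;
    }
    if (total > 0) {
        printf("external reuse level = %.2f\n", (double)external / total);
        printf("internal reuse level = %.2f\n", (double)internal / total);
    }
    return 0;
}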
Automated commonality analysis tools
Finding the common parts of systems in a domain is central to the process of domain
analysis. While this activity is still primarily manual, some tools can be used to help.
One problem in finding common parts in a domain is that similar architectures often have
different names for their components and variables. Tools such as those used for
plagiarism detection in universities ignore differences among programs due to surface
features such as variable names, and will be used to detect system commonalities
automatically.
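
One simple way such tools ignore surface differences is to normalize identifiers before comparison. The following sketch, an illustration of the idea rather than a real plagiarism detector, replaces every identifier in a C token stream with the generic token ID and every number with NUM, so that two fragments differing only in naming normalize to identical streams.

#include <stdio.h>
#include <ctype.h>

int main(void)
{
    int c = getchar();

    while (c != EOF) {
        if (isalpha(c) || c == '_') {      /* identifier (keywords are  */
            while (isalnum(c) || c == '_') /* not treated specially in  */
                c = getchar();             /* this sketch)              */
            fputs("ID", stdout);
        } else if (isdigit(c)) {           /* numeric literal */
            while (isalnum(c) || c == '.')
                c = getchar();
            fputs("NUM", stdout);
        } else {                           /* punctuation passes through */
            putchar(c);
            c = getchar();
        }
    }
    return 0;
}

The normalized streams of two components can then be compared with diff or a similarity measure to estimate how much structure they share.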
Reuse library tools
Many reuse library tools have been reported. A recent review and summary [Frak90]
listed many of these systems, and found that they used many different indexing
strategies and implementation platforms. The indexing strategies fell into three major
categories: library and information science, AI, and hypertext. Nearly all systems in

industrial use are based on library science methods. The three library science methods
used most often are enumerated, faceted, and free text keyword. Recent experiments
[FP92] indicate that these methods are equally effective, but that enumerated supports
faster searching. Free text keyword is least expensive. Faceted has the advantage of
supporting the domain analysis method discussed above. Platforms used include IR
systems, DBMS, and AI systems. Though many commercial DBMS, IR, and AI systems
are available, there is currently no commercial reuse library system on the market.
Domain analysis interaction tools
The design of DARE's interaction tools for domain analysis is critical to its utility. While
the interface has not been worked out in detail, we have some ideas for making the
domain analysis interface friendly and useful. The text extraction and analysis tools will
provide lists of keywords and phrases. These will be grouped via a graphical interface
into concepts, and finally into facets. This grouping process will primarily be manual, but
term clustering tools will provide the user with initial concepts. These facets will be the
primary conceptual model for the domain.
In the same way, information from questionnaires and interviews will be textually
analyzed to identify concepts and facets. These will be combined with manual analysis to
yield knowledge for representation in the system. The reverse engineering and metrics
analysis tools will provide standard reports that can be analyzed manually or transformed
with tools. Commonality analysis tools will provide initial estimates of which portions of
systems will be common and therefore likely reusable components.
5.5- Architecture integration
In summary, the architecture in Figure 8 works as follows. There are three main types of
information that the system uses: document sources, expert sources, and code sources.
The initial processing of document sources consists of scanning (automatically inputting
documents with a scanner) and lexical analysis (breaking the text into words). Three
types of operation then take place: filtering (including stoplist and stemming activities),
the calculation of lexical affinities, and the creation of indexes. Document source
information is used in term clustering, the organizing of terms into semantic clusters.
Some term clustering can be done automatically and some manually. Term clustering is
the first phase in faceted classification of the vocabulary in the domain.
Expert information is derived from questionnaires and manual knowledge acquisition.
This information will be captured using knowledge representations such as semantic nets
and rules. Simple rule based systems may then be constructed from these knowledge
representations. Such systems can be used to support domain architecture navigation.
Information from code sources is derived with static analysis tools of various
complexities. Some of these tools produce simple call graphs that reveal the functional
hierarchical structure of systems in a domain. Other tools can be used to derive and
graphically represent more general relationships among program objects. Metrics tools
will provide information about the complexity and reusability of components. This
program structure information is used to perform commonality analyses, that is, the
identification of common parts and structures of the various systems in the domain.


Tools similar to those used for plagiarism detection can also be used for finding
commonalities.
All domain information will be stored in a central library database. Users will interact
with the library via a windowed interface to extract and analyze data about domains.
DARE will provide tools to set up and administer such databases.
6- Conclusion
In this report, we defined the software reuse problem and domain analysis, provided a
survey of basic approaches to domain analysis, and discussed basic technology needed to
support automation of domain analysis including information retrieval (IR), artificial
intelligence (AI), static and dynamic analysis of software, interface environments, and
CASE tools. Our conclusion, based on this work, is that it is feasible to build an
environment (DARE) that can support many of the activities needed for domain analysis.
Some of the processes, primarily those regarding knowledge extraction and
representation, will not be able to be completely automated and will continue to need
manual support.
We have provided a high level functional architecture of DARE, and plan to use this
architecture to guide the future implementation of DARE. We believe that DARE, when
implemented, will be of significant value to software engineers engaged in the complex
task of domain analysis.
References
[AM88] Agresty, W. and F. McGarry. The Minnowbrook Workshop on Software Reuse:
A Summary Report. Contract NAS 5-31500, Computer Sciences Corporation, Systems
Sciences Division, 4600 Powder Mill Rd., Beltsville, MD 20705, March, 1988.

[Ara88] Arango, G. Domain Engineering for Software Reuse. Ph.D. Thesis, Department
of Information and Computer Science, University of California, Irvine, 1988.

[Bail91] Moore, J.M. and S.C. Bailin. "Domain Analysis: Framework for Reuse." In
Domain Analysis and Software Systems Modeling, R. Prieto-Díaz and G. Arango (Eds.),
pp. 179-203, IEEE Computer Society Press, Los Alamitos, CA, 1991.

[Bato88] Batory, D.S. "Concepts for a Database System Compiler." In Proceedings of the
ACM Principles of Database Systems Conference, 1988. Also in Domain Analysis and
Software Systems Modeling, R. Prieto-Díaz and G. Arango (Eds.), pp. 250-257, IEEE
Computer Society Press, Los Alamitos, CA, 1991.

[Berg84] Berghel, H.L. and D.L. Sallach. "Measurements of Program Similarity in
Identical Tasking Environments." SIGPLAN Notices, 19(8), 1984.

[Bigg89] Biggerstaff, T.J. "Design Recovery for Maintenance and Reuse." IEEE
Computer, 22(7):36-49, July, 1989.

[Cam87] CAMP, Common Ada Missile Packages, Final Technical Report, Vols. 1, 2,
and 3. AD-B-102 654, 655, 656. Air Force Armament Laboratory, AFATL/FXG, Eglin
AFB, FL, 1987.

[CDS86] Conte, S., H. Dunsmore, and V. Shen. Software Engineering Metrics and
Models. Menlo Park, CA: Benjamin/Cummings, 1986.

[CNR90] Chen, Y., M.Y. Nishimoto, and C.V. Ramamoorthy. "The C Information
Abstraction System." IEEE Transactions on Software Engineering, 16(3):325-334,
March, 1990.

[Coad89] Coad, P. OOA: Object-Oriented Analysis. Object International, Inc., Austin,
TX, 1989.

[Cox86] Cox, B.J. "Object-Oriented Programming, Software-ICs and System Building."
In Proceedings of the National Conference on Software Reusability and Maintainability,
National Institute of Software Quality and Productivity, Washington, D.C., September
10-11, 1986.

[Deva91] Devanbu, P., R.J. Brachman, P.G. Selfridge, and B.W. Ballard. "LaSSIE: A
Knowledge-Based Software Information System." Communications of the ACM,
34(5):34-49, May, 1991.

[Dona81] Donaldson, J.L., et al. "A Plagiarism Detection System." SIGCSE Bulletin,
13(1), 1981.

[DA88] Attendees' proceedings, Domain Analysis: Object-Oriented Methodologies and
Techniques Workshop, OOPSLA, San Diego, CA, November, 1988.

[DM89] Attendees' proceedings, Domain Modeling for Software Engineering Workshop,
OOPSLA, New Orleans, LA, 1989.

[DM91] Proceedings of the Domain Modeling Workshop, 13th International Conference
on Software Engineering, Austin, TX, May 13, 1991.

[FB92] Frakes, W.B. and R. Baeza-Yates (Eds.). Information Retrieval: Data Structures
and Algorithms. Prentice-Hall, 1992.

[FF88] Frakes, W.B. and C.J. Fox. "An Expert System Subroutine Library for the
UNIX/C Environment." The AT&T Technical Journal, May/June, 1988.

[FFN91] Frakes, W.B., C.J. Fox, and B.A. Nejmeh. Software Engineering in the UNIX/C
Environment. Prentice-Hall, 1991.

[FP92] Frakes, W.B. and T. Pole. "An Empirical Study of Representation Methods for
Reusable Software Components." Submitted to IEEE Transactions on Software
Engineering, June, 1992.

[Fick88] Fickas, S. and P. Nagarajan. "Critiquing Software Specifications." IEEE
Software, 5(6):37-47, November, 1988.

[Frak92] Frakes, W. "Software Reuse, Quality, and Productivity." In Proceedings of the
International Software Quality Exchange Forum, Wilton, CT: Juran Institute, March,
1992.

[Frak90] Frakes, W.B. and P.B. Gandel. "Representing Reusable Software." Information
and Software Technology, 32(10), December, 1990.

[Frak88] Frakes, W.B. and B.A. Nejmeh. "An Information System for Software Reuse."
In Tracz, W. (Ed.), IEEE Tutorial: Software Reuse: Emerging Technology, IEEE
Computer Society, 1988.

[Frak84] Frakes, W.B. "Term Conflation for Information Retrieval." In van Rijsbergen,
C.J. (Ed.), Research and Development in Information Retrieval, Cambridge: Cambridge
University Press, 1984.

[Gilr89] Gilroy, K.A., E.R. Comer, J.K. Grau, and P.A. Merlet. Impact of Domain
Analysis on Reuse Methods. Final Report C04-087LD-0001-00, U.S. Army
Communications-Electronics Command, Ft. Monmouth, NJ, November, 1989.

[Hart86] Hart, A. Knowledge Acquisition for Expert Systems. New York: McGraw-Hill,
1986.

[Jawo90] Jaworski, A., F. Hills, T.A. Durek, S. Faulk, and J. Gaffney. A Domain
Analysis Process. Interim Report 90001-N, Software Productivity Consortium, Herndon,
VA, January, 1990.

[Kang90] Kang, K., S. Cohen, J. Hess, W. Novak, and S. Peterson. Feature-Oriented
Domain Analysis (FODA) Feasibility Study. CMU/SEI-90-TR-21, Software Engineering
Institute, Carnegie Mellon University, Pittsburgh, PA, November, 1990.

[Kidd87] Kidd, A. (Ed.). Knowledge Acquisition for Expert Systems: A Practical
Handbook. New York: Plenum Press, 1987.

[Lane79] Lanergan, R.G. and B.A. Poynton. "Reusable Code: The Application
Development Technique of the Future." In Proceedings of the IBM SHARE/GUIDE
Software Symposium, IBM, Monterey, CA, October, 1979.

[Luba88] Lubars, M.D. Domain Analysis and Domain Engineering in IDeA. Technical
Report STP-295-88, MCC, Austin, TX, September, 1988.

[Maar91] Maarek, Y., D. Berry, and G. Kaiser. "An Information Retrieval Approach for
Automatically Constructing Software Libraries." IEEE Transactions on Software
Engineering, 17(8), August, 1991.

[McCa85] McCain, R. "Reusable Software Component Construction: A Product-Oriented
Paradigm." In Proceedings of the 5th AIAA/ACM/NASA/IEEE Computers in Aerospace
Conference, Long Beach, CA, pp. 125-135, October 21-23, 1985.

[McIl69] McIlroy, M.D. "Mass-Produced Software Components." In Buxton, J.M., P.
Naur, and B. Randell (Eds.), Software Engineering Concepts and Techniques: 1968
NATO Conference on Software Engineering, pp. 88-98, 1976.

[Neig80] Neighbors, J. Software Construction Using Components. Ph.D. dissertation,
Department of Information and Computer Science, University of California, Irvine,
1980.

[Neig84] Neighbors, J. "The Draco Approach to Constructing Software from Reusable
Components." IEEE Transactions on Software Engineering, SE-10:564-573, September,
1984.

[Otte76] Ottenstein, K.J. "An Algorithmic Approach to the Detection and Prevention of
Plagiarism." SIGCSE Bulletin, 8(5), 1976.

[Parn76] Parnas, D.L. "On the Design and Development of Program Families." IEEE
Transactions on Software Engineering, SE-2(1):1-9, March, 1976.

[Prie87] Prieto-Díaz, R. "Domain Analysis for Reusability." In Proceedings of
COMPSAC 87, pp. 23-29, Tokyo, Japan, October 7-9, 1987.

[Prie90] Prieto-Díaz, R. "Domain Analysis: An Introduction." ACM SIGSOFT Software
Engineering Notes, 15(2):47-54, April, 1990.

[Prie91a] Prieto-Díaz, R. Reuse Library Process Model. Final Report, STARS Reuse
Library Program, Contract F19628-88-D-0032, Task IS40, Electronic Systems Division,
Air Force Systems Command, USAF, Hanscom AFB, MA 01731, March, 1991.

[Prie91b] Prieto-Díaz, R. "Implementing Faceted Classification for Software Reuse."
Communications of the ACM, 34(5):88-97, May, 1991.

[Prie91c] Prieto-Díaz, R. "A Domain Analysis Methodology." In Proceedings of the
Workshop on Domain Modeling, pp. 138-140, 13th International Conference on
Software Engineering, Austin, TX, May 13, 1991.

[RM87] Proceedings of the Workshop on Software Reuse, Rocky Mountain Institute of
Software Engineering, Boulder, CO, October 14-16, 1987.

[RP89] Attendees' proceedings, Reuse in Practice Workshop, Software Engineering
Institute, Pittsburgh, PA, 1989.
