Sponsored by
Defense Advanced Research Projects Agency
Defense Small Business Innovation Research Program
DARPA Order No. 5916
Issued by U. S. Army Missile Command Under
Contract # DAAH01-92-C-R040
AUTHORS:
Dr. Rubén Prieto-Díaz
Principal Investigator
Dr. Bill Frakes
Consultant
Mr. B.K. Gogia
Project Manager
Reuse, Inc.
12365 Washington Brice Rd.
Fairfax, VA 22033
703-620-5385
FAX: 703-620-5385
reuse inc.
FOREWORD
This Phase I Final Report was prepared by Reuse, Incorporated, 12365 Washington Brice
Rd., Fairfax, Virginia, 22033, under DARPA Phase I SBIR Contract No. DAAH01-92-C-R040, issued by U.S. Army Missile Command.
The Reuse, Inc. personnel involved in this program and in writing this report are Dr.
Rubén Prieto-Díaz, the Principal Investigator, Dr. Bill Frakes, an external consultant, and
Mr. B.K. Gogia, the Project Manager. The final report covers the period of performance
from February 20, 1992 through August 20, 1992. The final report was submitted by
Reuse, Inc. August 20, 1992.
Although this report is unclassified, its distribution is limited to U.S. Government
agencies only; pages containing confidential proprietary information have been marked
as such. Other requests for this document must be referred to Director, Defense
Advanced Research Projects Agency, 3701 North Fairfax Drive, Arlington, VA 22203-1714.
THE VIEWS AND CONCLUSIONS CONTAINED IN THIS DOCUMENT ARE THOSE OF THE
AUTHORS AND SHOULD NOT BE INTERPRETED AS REPRESENTING THE OFFICIAL
POLICIES, EITHER EXPRESS OR IMPLIED, OF THE DEFENSE ADVANCED RESEARCH
PROJECTS AGENCY OR THE U.S. GOVERNMENT.
Table of Contents
Executive Summary..................................................................................................................... 1
Status of phase I research ............................................................................................... 1
Research results.............................................................................................................. 1
Report overview ............................................................................................................. 3
1- Introduction............................................................................................................................. 5
1.1- The software reuse problem .................................................................................... 5
1.2- Domain analysis...................................................................................................... 6
2- Survey of Domain Analysis Methods ..................................................................................... 6
2.1- Historical perspective.............................................................................................. 7
2.2- Domain analysis methods ....................................................................................... 9
Prieto-Díaz Approach........................................................................ 9
FODA ................................................................................................................ 10
IDeA .................................................................................................................. 11
2.3- Summary of main features ...................................................................................... 11
2.4- Discussion of key activities..................................................................................... 12
2.5- Potential for automation.......................................................................................... 12
3- Domain Analysis in the STARS Reuse Library Process Model (SRLPM) ........................... 13
3.1- Primitive operations ................................................................................................ 15
Prepare domain information .............................................................................. 15
Classify domain entities .................................................................................... 15
Derive domain models....................................................................................... 16
Expand and verify models and classification .................................................... 17
3.2- Overview of domain analysis activities .................................................................. 18
3.3- Selected activities in the SRLPM which can be automated .................................... 18
4- Underlying Technologies for Automating Domain Analysis ................................................. 19
4.1- Information retrieval systems.................................................................................. 19
4.2- Artificial intelligence .............................................................................................. 21
4.3- Code static and dynamic analysis tools................................................................... 23
4.4- Interface environments............................................................................................ 25
4.5- CASE tools.............................................................................................................. 25
5- A Domain Analysis and Reuse Environment (DARE) ........................................................... 26
5.1- DARE architecture .................................................................................................. 26
5.2- DARE supported domain analysis process ............................................................. 28
Acquire domain knowledge (A2) ...................................................................... 32
Structure domain knowledge (A3) .................................................................... 32
Identify commonalities (A4) ............................................................................. 36
Generate Domain Models (A5) ......................................................................... 36
5.3- DARE functional model.......................................................................................... 36
5.4- Architecture components ........................................................................................ 38
Document analysis tools.................................................................................... 38
Domain expert knowledge extraction tools....................................................... 38
Code analysis tools............................................................................................ 39
Reuse library tools............................................................................................. 39
Domain analysis interaction tools ..................................................................... 40
5.5- Architecture integration .......................................................................................... 40
6- Conclusion .............................................................................................................................. 41
References ................................................................................................................................... 41
DARE
A Domain Analysis and Reuse Environment
Executive Summary
Status of phase I research
Domain analysis (DA) holds the key for the systematic, formal, and effective practice of
software reuse. Proposed approaches and methods for DA assume that domain
information exists and is readily usable. Experience indicates, however, that acquiring
and structuring domain information is the bottleneck of DA. This Phase I research report
presents the architecture and functional analysis of a support environment to automate
parts of the acquisition and structuring activities of DA.
This Phase I study assesses the potential for automation of DA. Existing techniques and
tools, in particular those from information retrieval and expert systems development,
provide support for activities in the DA process. Many of these tools can be used
immediately while certain DA activities may require the creation of new tools. There is,
therefore, a definite potential for automating parts of DA provided a basic framework to
conduct DA exists.
The framework for conducting DA is provided by a modified RLPM. The RLPM, or
Reuse Library Process Model, is a methodology developed by Reuse Inc. for the STARS
Program. It emphasizes early analysis for acquiring and structuring domain
information. The RLPM converts the ad-hoc nature of DA into a repeatable procedure
with well defined, tangible outputs. The modified RLPM presented here organizes the
key activities of acquisition and structuring of domain information in a way that can be
supported by independent but coordinated sets of tools.
The study proposes the Domain Analysis and Reuse Environment (DARE) as a practical
and viable support environment for partially automating the early activities of DA. The
research reported demonstrates that, although DA is a difficult and complex process,
several of its activities deal with small independent steps that can be automated, thus
reducing the complexity of DA to an interactive activity of grouping and organizing the
outputs of these small steps.
This research report shows the main components of a DARE architecture and how these
components interact through data and control flows. It also describes the specific tools
required to implement DARE.
Research results
The technical objectives of this study have been two-fold: to determine if and to what
extent current domain analysis technology is supportive of a reuse based domain specific
software development paradigm, and to determine the potential for automating domain
analysis activities. Meeting both objectives will accelerate a paradigm shift towards
domain-specific, reuse-based development.
The research objectives focus on providing answers to the following key questions:
The answer is yes. This study presents a process model and an architecture to support
DA. The DARE environment is an integrated collection of tools that support domain
knowledge acquisition and structuring, as well as commonality analysis, model
development, and reuse. DARE is a highly interactive environment designed to facilitate
the intelligence intensive activities typical of DA.
DARE supports all parts of the domain analysis process: knowledge acquisition, concept
abstraction, classification, library population, and specification of reusable software.
Other features include library functions for search and retrieval, capture and analysis of
reuse metrics, and interfaces to other software development environments.
Report overview
The Phase I research effort shows that it is possible to partially automate DA and that it
can be done through a well orchestrated collection of tools operating under a well defined
process model. The following tasks were undertaken to demonstrate that DARE can be
developed successfully.
Task 1: Survey and analysis of current domain analysis methods.
Existing domain analysis methods and approaches were surveyed. The methods surveyed
include SEI's FODA [Kang90], MCC's IDeA heuristics [Luba88], SPC's domain analysis
process [Jawo90], IBM's product-oriented paradigm [McCa85], and Arango's learning
system approach [Ara88].
The survey includes a listing of main features and a discussion of their similarities and
differences.
Task 2: Detailed analysis of the STARS Reuse Library Process Model
The activities of domain analysis were analyzed to determine their suitability for
automation. The analysis included each of the four activities proposed in the SRLPM
approach and their derived subactivities. Each activity is decomposed into several levels
of detail. The analysis determined to what extent these fine grained activities can be
automated. The outcomes of this analysis include:
Description of each low level activity of domain analysis in the STARS Reuse
Library Process Model.
Discussion of the potential for automation: how these activities relate to existing
tool capabilities and how feasible it is to adapt tools for these tasks.
Task 3: Selection of domain analysis activities with potential for automation
The purpose of this task was to select the DA activities that could be automated and to
develop a revised DA process model to integrate them into a coherent and rational
process that could be implemented by a coordinated collection of existing tools.
Task 4: Evaluation of tools and techniques that meet domain analysis requirements
The objective of this evaluation was to assess the availability of the technology
supporting DA activities. We surveyed automated reuse library systems and technology,
CASE technology provided by the Unix/C environment, and IR systems such as PLS
which we might use to construct the text analysis portion of DARE. We evaluated the
utility of each of these kinds of tools to support the domain analysis processes in our
model.
Task 5: Propose and specify a domain analysis and reuse support environment
Tasks one through four prepared the ground and built the basis for
specifying a domain analysis and reuse environment (DARE). Although DARE could
have been proposed without the effort of going through the first four tasks, a careful
assessment of the state of the art and existing technology was necessary if a realistic and
practical environment was to be proposed. Tasks one through four comprise a structured
research plan to determine the feasibility of DARE, and to provide the information
necessary to decide whether to pursue such an environment and what level of automation
to expect.
The DARE architecture (see figure 3) consists of a user interface, a domain analysis
support environment, and a software reuse library. Selected COTS (commercial
off-the-shelf) tools and tools specially designed to support reuse-based domain-specific
development are shared.
Summarizing the contents of this report, section one defines the software reuse problem
and domain analysis. Section two gives an historical overview of domain analysis
including a summary of the major domain analysis approaches, and their similarities and
differences. Section three presents the role of domain analysis in the STARS reuse
library process model. In section four we survey the underlying technologies for
automating domain analysis, including information retrieval (IR), artificial intelligence
(AI), static and dynamic analysis of software, interface environments, and CASE tools.
Section five presents the DARE architecture, providing a model and explanation of the
processes and tools in the model. Section six presents our conclusions.
1- Introduction
Domain analysis has become a topic of significant interest in the reuse community.
Domain analysis holds the key for the systematic, formal, and effective practice of
software reuse. Unfortunately, domain analysis is still an emerging technology and is
practiced informally. There is a definite opportunity, however, for automating parts of the
domain analysis process. The domain analysis methodology proposed in the STARS
Reuse Library Process Model [Prie91a] can be used as a framework to identify parts of
the process that can be automated by adapting existing tools and techniques. The
opportunity for automation presented in this study is in the form of a Domain Analysis
and Reuse Environment (DARE). This section presents the reuse problem and its relation
to domain analysis.
1.1- The software reuse problem
One of the reasons software reuse has remained an elusive goal is the recurrent emphasis
on reusing code. Software reuse is still far from realizing the ideas of a software industry
based on interchangeable standard parts first proposed by Doug McIlroy over 20 years
ago [McIl69]. Reuse involves more than just code. It involves organizing and
encapsulating knowledge and experience, and setting the mechanisms and organizational
structures to make them available for reuse.
Software reuse is sensitive to several factors that make the simple hardware analogy of
software ICs [Cox86] difficult to apply. The context of the application domain, for
example, plays a critical role in the "reusability" of software. Software cannot be
successfully reused in all domains. The reality is that narrow, well understood application
domains based on stable technologies and standardized architectures, such as compilers
(e.g., Lex and YACC) and database systems [Bato88], have demonstrated the significant
leverage that can be achieved with high level reuse. It is not simply a matter of going out
into the field and gathering up components to populate a repository. Casually assembled
libraries seldom are the basis of a high payoff reuse system. A reuse library offers
considerably more value when its collections consist of integrated packages of reusable
knowledge from a particular domain than if they consist of isolated and relatively
independent code components. A domain model in the form of a high level architecture,
for example, offers the potential reuser a basic structure to start building a new system.
Each element of the architecture can be implemented from library components specially
designed to meet the architecture requirements.
There is a need, therefore, to focus reuse on all the products of the software development
process such as requirements, specifications, designs, code, and test cases and plans. The
highest payoff is achieved by reusing high level representations of software products like
requirements and designs [Gilr89]. If we are able to reuse an existing software design
then we should be able to reuse its code implementation. We should, therefore, focus on
the process of capturing, organizing, and encapsulating such requirements and designs
for reuse.
Over 5000 production COBOL source programs were examined and classified. Three
major module classes were identified: edit, update, and report. They also discovered that
most business applications fall into one of three logic structures or design templates (i.e.,
domain architectures). These logic structures were standardized and a library was created
to make all classified components available for reuse. Several modules were also
redesigned to fit the standard logic structures. New applications became slight variations
of the standard logic structures and were built by assembling modules from the library.
Programmers were trained to use the library and to recognize when a logic structure
could be reused. The report quotes an average of 60% reused code in their new systems
and a net 50% increase in productivity over a period of six years.
The remaining efforts represented by bubbles in Figure 1 are discussed in detail below.
The Common Ada Missile Packages (CAMP) Project [Cam87] extended Neighbors'
ideas into larger systems. The CAMP Project is the first explicitly reported domain
analysis experience, and they acknowledge that "[domain analysis] is the most difficult
part of establishing a software reusability program". Neither Neighbors nor the CAMP
project address the issue of how to do domain analysis. Both focus on the outcome, not
on the process.
McCain [McCa85], from IBM Federal Systems Division, Houston, TX, made an initial
attempt at addressing this issue by integrating the concept of domain analysis into the
software development process. He proposed a "conventional product development
model" as the basis for a methodology to construct reusable components. The main
concern in this approach is how to identify, a priori, the areas of maximum reuse in a
software application. McCain developed his model into a standard practice within IBM.
Drawing in part from the above experiences, Prieto-Díaz [Prie87] proposed a more
cohesive procedural model for domain analysis. This model is based on a methodology
for deriving specialized classification schemes in library science [Prie91b]. In deriving a
faceted classification scheme, the objective is to create and structure a controlled
vocabulary that is standard not only for classifying but also for describing titles in a
domain specific collection. This method was successfully applied at GTE Government
Systems. The Prieto-Díaz method was later updated and revised for the STARS Reuse
Library Process Model [Prie91a, Prie91c]. This method is a substantial modification of
the earlier approach. The emphasis is on the analysis aspect, especially on knowledge
acquisition and knowledge structuring. This newer version of the Prieto-Díaz method is
presented as a SADT model with potential for partial automation.
Synthesis is a software development method and support environment developed by the
Software Productivity Consortium (SPC). Synthesis is based on the concept of program
families [Parn76] and proposes the engineering of domains to enable application
generators. The Synthesis domain analysis process was first proposed in a report by
Jaworski [Jawo90]. It is based mainly on object oriented concepts [Coad89] with
emphasis on domain design and implementation. The report includes an example of
domain analysis on the SOCC (Satellite Operations Control Center) domain. The
example shows the products of domain analysis such as the SOCC domain definition, the
SOCC taxonomy, and the SOCC stabilities and variations, but falls short of explaining
the process used to obtain those products.
More recently, the SEI has proposed the FODA (Feature Oriented Domain Analysis)
methodology [Kang90]. FODA adopts several concepts and recommendations from the
SPS report [Gilr89], and presents a comprehensive approach based on feature analysis.
The method is illustrated by a domain analysis of window management systems and
explains what the outputs of domain analysis are, but remains vague about the process to
obtain them.
In the SPS (Software Productivity Solutions, Inc.) report, concepts from Prieto-Díaz'
model were integrated with object oriented analysis techniques into a more complete
approach to domain analysis. The SPS report adds object orientation to the process of
creating a domain architecture, and to the creation of reusable components. The
suggested method remains very general about the analysis aspect, but very specific about
the creation of reusable Ada components. They conclude that knowledge acquisition,
knowledge-based guidance, data storage, retrieval, and environment integration are the
key factors for automating domain analysis.
IDeA, Intelligent Design Aid, is an experimental reuse based design environment
developed by MCC [Luba88]. It supports reuse of abstract software designs. IDeA
provides mechanisms that help users select and adapt design abstractions to solve their
software problems. IDeA and its successor ROSE-1, were created as proof-of-concept
tools to demonstrate reuse of high level software workproducts other than source code.
Arango [Ara88] focuses on the theoretical and methodological aspects of domain
analysis. He argues for explicit definitions of objectives, mechanisms, and performance
of a reuse system as a context for comparing and evaluating domain analysis. His view of
software reusability is that of a learning system where domain analysis is an ongoing
process of knowledge acquisition, concept formation, and concept validation. The
changing requirements syndrome in software development is seen as a natural learning
process, and resolved through an evolving infrastructure that receives its input from
domain analysis.
Domain analysis and domain modeling have become topics of significant interest in the
software engineering community. A recommendation from the 1987 Minnowbrook
Workshop on Software Reuse [AM88] suggested "concentrating on specific application
domains (as opposed to developing a general reusability environment)." Soon thereafter,
the Rocky Mountain Workshop on Software Reuse [RM87] acknowledged the lack of a
theoretical or methodological framework for domain analysis. More recent workshops
have addressed domain analysis and domain modeling directly [DA88, RP89, DM89].
The most recent was the Domain Modeling Workshop at the 13th ICSE, Austin, TX
[DM91] where several approaches to domain modeling and domain analysis were
presented.
Other related work includes tools that were originally designed for other purposes and
turned out to be supportive of domain analysis. In this category are Batory's Genesis
system for constructing database management systems [Bato88], CTA's KAPTUR
(Knowledge Acquisition for the Preservation of Tradeoffs and Understanding Rationales)
system for analyzing software systems [Bail91], AT&T's LaSSIE software information
system [Deva91], and MCC's DESIRE (Design Recovery) tool [Bigg89]. These tools
present a broad spectrum of techniques and approaches for automating certain aspects of
domain analysis.
2.2- Domain analysis methods
Several approaches to domain analysis have emerged in the last few years. Three have
been selected to illustrate the differences of objectives, methods, styles, and products.
Prieto-Díaz Approach
The Prieto-Díaz approach was developed for the STARS S increment as part of a model
for reuse libraries [Prie91a]. It is based on methods for deriving classification schemes in
library science and on methods for systems analysis. The process is a "sandwich"
approach where bottom-up activities are supported by the classification process and top-down activities by systems analysis.
The objective is to produce a domain model in the form of a generic architecture or
standard design for all systems or their instantiations in the domain. Such models provide
a common basis for writing requirements for new systems in the domain. In other words,
requirements for new systems are based on, or derived from, the domain model, thus
ensuring reuse at the design level. To guarantee such reuse, low level components must
act as building blocks for composing a skeleton design or architecture. This is
accomplished by the bottom-up identification and classification of low level common
functions and by standardizing their interfaces.
During the top-down stage, high level designs and requirements of current and new
systems are analyzed for commonality. The outcome includes a canonical structure
common to all systems in the domain, identification of stable and variable characteristics,
a generic functional model, and information on the interrelationships among the structure
elements. During the bottom-up stage, low level requirements, source code, and documentation
from existing systems are analyzed to produce a preliminary vocabulary, a taxonomy, a
classification structure, and standard descriptors.
The outcomes of both approaches are then integrated into reusable structures. This
integration process consists of associating the products of the bottom-up analysis with the
structures derived by the top-down analysis. Standard descriptors, for example, represent
elemental components, either available or specified, by using a standard language and
vocabulary. Low level components for the generic architecture are defined with these
standard descriptors. The result is a natural match between high level generic models and
low level components where the domain models can be used as skeleton guides in the
construction of new applications.
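To make the integration step concrete, the following sketch (an illustration added here, not part of the original study) models how standard descriptors from a faceted scheme might be matched against an architecture slot. The facet names and terms are invented for the example.

```python
from dataclasses import dataclass

# Facets of a hypothetical controlled vocabulary; one term per facet
# describes a component. These names are illustrative assumptions.
FACETS = ("function", "object", "medium")

@dataclass(frozen=True)
class Descriptor:
    function: str
    object: str
    medium: str

    def matches(self, other: "Descriptor") -> int:
        """Count the facets on which two descriptors agree."""
        return sum(getattr(self, f) == getattr(other, f) for f in FACETS)

# An architecture slot is specified in the same vocabulary, so library
# components can be ranked by facet-wise agreement with the slot.
slot = Descriptor("parse", "expression", "buffer")
candidates = [
    Descriptor("parse", "expression", "file"),
    Descriptor("format", "report", "file"),
]
best = max(candidates, key=slot.matches)
```

Because slot and component descriptions share one vocabulary, the match is a simple facet comparison rather than free-text search, which is the point of standardizing descriptors.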
FODA
Feature Oriented Domain Analysis (FODA) is a domain analysis methodology developed
by the Software Engineering Institute [Kang90]. The FODA method is based on
identifying features common to a class of systems. It is the product of studying and
evaluating several DA approaches. Although based mainly on Object Oriented
techniques, it borrows significantly from other approaches such as Prieto-Díaz' faceted
approach, SPS' Ada-based approach, and MCC's DESIRE design recovery tool.
The FODA method defines three basic activities: context analysis, domain modeling, and
architecture modeling. During context analysis, domain analysts interact with users and
domain experts to bound and scope the domain and to select sources of information.
Domain modeling produces a domain model in multiple views. The domain analyst
proposes the domain model to domain experts, users, and requirements analysts for
review. The resulting model includes four views: features model, entity-relationship
model, dataflow diagrams model, and state-transition model. A standard vocabulary is
also produced during domain modeling.
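One of the views, the features model, can be sketched as a small tree of mandatory, optional, and alternative features. The fragment below is an illustrative assumption in modern notation, not FODA's own, using a toy window-manager domain in the spirit of FODA's example.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    """A node in a toy features model."""
    name: str
    kind: str = "mandatory"          # "mandatory" | "optional" | "alternative"
    children: list = field(default_factory=list)

    def walk(self):
        # Pre-order traversal of the feature tree.
        yield self
        for child in self.children:
            yield from child.walk()

wm = Feature("window_manager", children=[
    Feature("window_ops"),                       # common to all systems
    Feature("icons", kind="optional"),           # present only in some systems
    Feature("overlapping", kind="alternative"),  # pick one layout policy...
    Feature("tiled", kind="alternative"),        # ...or the other
])

optional = [f.name for f in wm.walk() if f.kind == "optional"]
```

Queries over the tree (here, listing optional features) correspond to the commonality and variability questions the domain analyst asks during modeling.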
During architecture modeling, the domain analyst produces an architectural model that
consists of a process interaction model and a module structure chart.
2- Analyze and classify domain entities The focus of this activity is to identify
specific low level functions and objects, or common features derived from legacy
systems, existing documentation, and future requirements. The objective is to
classify these entities into a standard framework. The framework may take the form
of a taxonomy, a semantic net, or a features model.
3- Structure domain knowledge The purpose of this activity is to associate common
functions to system components. A preliminary domain architecture is proposed to
define high level system components. This high level architecture is refined by
decomposing system components into more specific functions. The decomposition
(i.e., refinement) process is carried on by selecting common functions from the
classification or features framework.
4- Generate reusable structures Generating reusable structures is the process of
grouping common functions, attaching them to specific architectural components, and
generalizing these specific architectural components. The outcome is a set of generic
reusable structures consisting of standard functions and standard interfaces. These
generic structures form a domain architecture where different implementations of
domain features are plug-compatible reusable components.
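Plug-compatibility can be illustrated with a minimal sketch: the generic architecture fixes a standard interface, and different implementations of a domain feature fill the slot interchangeably. The Sorter interface and implementations below are invented stand-ins, not components from any analyzed domain.

```python
from abc import ABC, abstractmethod

class Sorter(ABC):
    """Standard interface fixed by the generic architecture."""
    @abstractmethod
    def sort(self, items: list) -> list: ...

class BuiltinSorter(Sorter):
    # One implementation of the feature: delegate to the library sort.
    def sort(self, items):
        return sorted(items)

class InsertionSorter(Sorter):
    # A second, plug-compatible implementation of the same feature.
    def sort(self, items):
        out = []
        for x in items:
            i = 0
            while i < len(out) and out[i] <= x:
                i += 1
            out.insert(i, x)
        return out

def run_pipeline(sorter: Sorter, data: list) -> list:
    # The architecture component depends only on the interface,
    # so either implementation can be plugged in unchanged.
    return sorter.sort(data)
```

Standardizing the interface, not the implementation, is what lets the domain architecture treat alternative components as interchangeable parts.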
2.4- Discussion of key activities
Acquiring domain information is the central activity for domain analysis. Success of the
remaining activities depends on the quantity, relevance, and quality of the information
acquired. Any discussion of automating domain analysis must start with information
acquisition.
Analysis and classification of domain entities is usually a bottom-up process of
identifying and extracting information about specific functions mainly from current
applications. Classification includes abstraction and clustering to generate classes of
functions with common attributes. A top-down approach can also be used. When
conducted as a top-down process, systems specifications and future requirements are
analyzed to identify features common to all systems in the domain. Both bottom-up and
top-down approaches result in identification of common basic (i.e., primitive) functions.
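A toy version of the clustering step might group low level functions by overlap in the keyword sets extracted for them. The similarity measure, threshold, and data below are illustrative assumptions, not the report's prescribed procedure.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two keyword sets (intersection over union)."""
    return len(a & b) / len(a | b)

def cluster(functions: dict, threshold: float = 0.5):
    """Greedy single-pass clustering of {name: keyword set}."""
    clusters = []  # each cluster: (representative keyword set, [names])
    for name, words in functions.items():
        for rep, members in clusters:
            if jaccard(words, rep) >= threshold:
                members.append(name)
                rep |= words  # grow the cluster's representative set
                break
        else:
            clusters.append((set(words), [name]))
    return [members for _, members in clusters]

funcs = {
    "read_record":  {"read", "record", "file"},
    "load_record":  {"read", "record", "buffer"},
    "print_report": {"print", "report", "page"},
}
groups = cluster(funcs)
```

The resulting groups approximate classes of functions with common attributes; a real analysis would refine them interactively with the domain analyst.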
Structuring domain knowledge into a domain architecture allows for a mapping of
common functions to system components and provides the basis for defining and
specifying reusable components. Generating reusable structures is a process of
encapsulating architecture components.
For the purpose of automating domain analysis, acquiring domain information and
analyzing and classifying that information are essential for developing domain
architectures. Current SEE (Software Engineering Environments) technology supports,
to some degree, the implementation of an architecture (i.e., requirements) into reusable
components (i.e., code), but support for acquiring and structuring domain information is
not yet available.
2.5- Potential for automation
There is a definite potential for automating parts of the domain analysis process. An
essential prerequisite to automation is a framework of properly structured activities. Such
12
_______________________________________________________________________
reuse inc.
a framework is provided by the STARS Reuse Library Process Model (SRLPM) method
for domain analysis. A key activity in the SRLPM is to prepare domain information and
one of the essential tasks for preparing domain information is knowledge acquisition. It
requires "reading" information from several inputs such as technical literature, existing
implementations, and current and future requirements.
Existing techniques in information retrieval can be used to automatically extract
information from these sources. In fact, experience in practicing domain analysis has
shown that knowledge extraction is a definite bottleneck in the process. Other proposed
domain analysis methods make the unrealistic assumption that knowledge and experience
are available and readily usable, giving the impression of a smooth and simple process.
Once we get through the knowledge acquisition step, domain analysis is a more tractable
problem. Our experience has been, however, that the initial stage in domain analysis
(acquiring and structuring knowledge) is the most difficult and time consuming.
To classify domain entities, for example, the SRLPM methodology prescribes keyword
extraction, concept grouping, and class definition. Existing tools and techniques from
information retrieval and object-oriented design can be adopted and integrated to support
these steps. There are other very specific subactivities in the methodology, like thesaurus
construction, for which automated tools already exist. Extracting knowledge from experts
is much more complex and requires human interaction such as interviews and group
meetings. There are, however, techniques and support tools for building expert systems
that can be adapted for this purpose.
In summary, this study explores and analyzes the feasibility of automating parts of the
domain analysis process under the framework of the STARS Reuse Library Process
Model, and proposes and specifies a domain analysis support environment that automates
parts of the domain analysis process.
3- Domain Analysis in the STARS Reuse Library Process Model (SRLPM)
A Domain Analysis Process Model was developed as part of the SRLPM [Prie91a]. It is
based on methods for deriving classification schemes in library science and on methods
for systems analysis. The process is a "sandwich" approach where bottom-up activities
are supported by the classification process and top-down activities by systems analysis.
The domain analysis process is divided into four activities:
1- Prepare domain information (A51)
2- Classify domain entities (A52)
3- Derive domain models (A53)
4- Expand and verify models and classification (A54)
Figure 2 shows a detailed SADT model of how these activities are related, their inputs,
controls, and outputs as well as their respective enabling mechanisms. The domain
models produced consist of several partial products including domain definition, domain
architecture, domain classification scheme, vocabulary, functional model, and reusable
structures. Inputs are information on recommended and related domains, and existing
(i.e., legacy) systems.
clustering exercise where common terms are grouped and abstracted. A basic scheme is
postulated and then expanded and verified. The final step is the construction of thesauri
for vocabulary control. Vocabulary control is achieved by grouping synonyms around a
single concept.
The inputs to A52 include specific domain knowledge in the form of functional
requirements, documentation, source code from existing systems, and feedback
information regarding unclassified entries. Unclassified entries are components that
cannot be classified with the current classification scheme. This information is used to
update and expand the classification scheme.
The outputs are a faceted classification scheme and a basic taxonomy (a taxonomy can
also be seen as an inheritance structure with some entity-relationship model
characteristics like aggregation and generalization). The classification scheme includes a
controlled vocabulary and facet definitions. Together, taxonomic classes and facets form
a classification structure.
The classification scheme generates classification templates in the form of standard
descriptors. These standard descriptors are the basic conceptual units that form the
interface between domain architectures and reusable components. Standard descriptors
are high-level mini-specs for a class of components. In the UNIX tools domain, for
example, "locate/identifier/table" is a standard descriptor for a component identified by
the statement "Locate line identifiers in data table." The terms "locate", "identifier", and
"table" represent concepts in a controlled vocabulary. Standard descriptors can also be
represented graphically as E-R models or semantic nets, thus facilitating component
encapsulation and parameterization.
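A standard descriptor of this form can be sketched as a small data structure. The facet names below (function, object, medium) are an assumption for illustration; the report does not fix the facet set here:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of a standard descriptor as a triple of controlled-vocabulary
   terms. The facet names (function/object/medium) are assumed for
   illustration, not prescribed by the SRLPM. */

struct descriptor {
    const char *function;  /* e.g., "locate" */
    const char *object;    /* e.g., "identifier" */
    const char *medium;    /* e.g., "table" */
};

/* Render a descriptor in the slash-separated form used in the text. */
void format_descriptor(const struct descriptor *d, char *buf, size_t n) {
    snprintf(buf, n, "%s/%s/%s", d->function, d->object, d->medium);
}
```

A descriptor built from the terms in the example above would render as "locate/identifier/table".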
The control inputs to A52 are domain definition and domain architecture. Both support
conceptual analysis. The domain definition, for example, includes global requirements
statements used to select keywords for the controlled vocabulary. The mechanism is the
domain analysis team. In its minimal form, it consists of a domain analyst, a domain
expert, and a librarian.
Derive domain models
Activity A53, derive domain models, consolidates the top-down analysis with the
bottom-up approach. The objective here is to produce a generic functional architecture or
model using functional decomposition as practiced in software systems design. The top
level in this decomposition is the preliminary architecture derived in A51 above. The
resulting functional model serves as a structure to consolidate the standard descriptors
from A52. The idea is to describe or specify low level functions using standard
descriptors from the controlled vocabulary, and to associate them with architectural
components.
The results are layers of functional clusters associated with architecture elements. The
core activity in A53, thus, is to assign these functional clusters to architecture units and to
define their relationships. What results is a model that supports design and development
of new systems by composing reusable components. The output is the generic functional
model.
The inputs to A53 include:
1) The classification structure from A52 including vocabulary, classes, and standard
descriptors,
2) Specific domain knowledge in the form of global requirements and system
commonality information, (from A51),
3) Requirements from existing systems, and
4) Feedback information to update and refine the model. This last input is in the form of
earlier versions of the model (labeled incomplete models in diagram A5).
The control inputs are the domain definition and architecture produced by A51. The basic
architecture is used as a reference for the top-down decomposition. The mechanism is the
domain analysis team. In this case the analyst and an expert are the minimum required.
Expand and verify models and classification
Activity A54 expands and verifies domain models and the classification structure. The
objective in A54 is to update the products of domain analysis as new information from
current and future systems becomes available.
Activity A54 illustrates the continuing nature of the domain analysis process. All
products of domain analysis are reviewed continuously and remain in a permanent state
of evolution. The question of when a domain analysis is complete is still a research
question and is not discussed here. For the sake of practicality, any outcome of domain
analysis, as discussed in the SRLPM document, is considered usable. The library process
model assumes an implicit feedback loop for all its activities and a reviewing process for
all its outputs.
New requirements, vocabulary, functional components, and limitations and constraints
are extracted from existing and new systems. The classification structure and the
functional model are updated to accommodate them. The models are then verified against
existing systems. Specific designs and requirements from existing and future systems are
checked to see if they are represented by the generic model, that is, whether the model
includes all expected instances of systems in the domain.
The outputs of this activity are reusable structures. Reusable structures are parts of the
generic functional model or parts of the classification structure that have been verified
and are complete enough to be reusable. These subsets of the domain models are
encapsulated and included in the customized library system to drive the construction
process. A reusable structure can be as simple as a standard descriptor (i.e., requirement
statement) for a class of functions or as elaborate as an architecture for a class of systems.
An example of the latter is the architecture for a general compiler, which includes a
lexical analyzer (scanner), a syntax analyzer, a semantic analyzer, a code generator, and
a symbol table handler.
The inputs to A54 are the generic functional model from A53 and any new information
from current and future systems. The control inputs include domain definition and
architecture from A51, the classification structure from A52, and abstractions of the
generic functional model from A53. These abstractions are used to help identify reusable
structures. The mechanism is the domain analysis team.
Analyze domain
Extract domain knowledge Knowledge extraction from text documents can be done
automatically using off-the-shelf information retrieval tools. Knowledge extraction
from experts requires interviewing and questioning, but their written responses can be
processed automatically.
Identify major functional units Reverse engineering tools, specifically code
restructuring and requirements analysis tools, can be used to identify major functional
units from legacy systems.
Find interrelationships Relationships among components and major functional
units can also be identified by using reverse engineering tools. Tools that produce call
structures and cross-referencing information are useful for this task.
Specify generic subsystems The process of identifying generic subsystems within
specific system structures or designs can be assisted with program similarity analysis
tools.
Classify subsystems Subsystem classification can be assisted with the same kind of
tools used to find interrelationships.
Identify objects and operations This task can be automated with information
retrieval tools.
Abstract and classify Conceptual clustering tools and AI knowledge representation
techniques can be used to assist in this task.
Construct thesauri There are off-the-shelf tools to help construct thesauri.
Group descriptors/classes under functional units Conceptual clustering tools can
also be used to assist in this task.
Rearrange structure Architecture revision can be done semiautomatically with
reverse engineering tools.
Define reusable structures Reusable structures are refinements of previous domain
analysis models. A combination of reverse engineering, information retrieval, AI, and
conceptual clustering tools can be used to assist in this task.
Facets and Terms for IR Systems

Conceptual Model    File Structure    Query Operations    Term Operations    Document Operations    Hardware
----------------    --------------    ----------------    ---------------    -------------------    ------------
Boolean             Flat File         Feedback            Stem               Parse                  VonNeumann
Extended Boolean    Inverted File     Parse               Weight             Display                Parallel
Probabilistic       Signature         Boolean             Thesaurus          Cluster                IR-Specific
String Search       Pat Trees         Cluster             Stoplist           Rank                   Optical Disk
Vector Space        Graphs                                Truncation         Sort                   Mag. Disk
                    Hashing                               Parse              Field Mask
                                                          Cluster            Assign ID's
Viewed another way, each facet is a design decision point in developing the architecture
for an IR system. The system designer must choose, for each facet, from the alternative
terms for that facet. A given IR system can be classified by the facets and facet values,
called terms, that it has. For example, the CATALOG system [Frak84] can be classified
as shown in Table 3:
Table 3: Facets and Terms for CATALOG IR System

Facets                  Terms
File Structure          Inverted file
Query Operations        Parse, Boolean
Term Operations         Stem, Stoplist, Truncation
Hardware                VonNeumann, Mag. Disk
Document Operations     parse, display, sort, field mask, assign ID's
Conceptual Model        Boolean
IR systems are capable of automatically extracting important vocabulary from text and
using it to index documents, in this case reusable software components. Frakes and
Nejmeh [Frak88] first proposed using IR systems to classify and store reusable software
components. They discussed the use of CATALOG for this purpose and defined the types of
indexing fields that might be useful. Since then, several other uses of IR systems as reuse
libraries have been reported (see [Frak90] for a review). One such system of special
interest is GURU [Maar91]. GURU uses simple phrase extraction techniques to
automatically derive two-word phrases from text. Both individual keywords and phrases
composed of those keywords may be useful for identifying domain vocabulary and
concepts in DARE.
In terms of Table 3, the key operations that will be needed for automatic vocabulary and
concept identification are text parsing, stoplist operations, stemming, and truncation.
Text parsing involves breaking the text into its component keywords. Stemming is a
process of removing prefixes and suffixes from words so that related words can be
grouped together. Stemming, for example, is capable of conflating variants such as
domain and domains into a single concept. Truncation is manual stemming. Truncation
will be a useful feature in the search portion of DARE, since it will help users search
using related keywords.
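As a rough illustration of these term operations, the sketch below implements a stoplist check, a crude plural-stripping stemmer (real IR systems use stronger stemmers, such as Porter's algorithm), and truncation matching. All names are ours, not taken from an existing tool:

```c
#include <ctype.h>
#include <string.h>

/* Illustrative term operations: stoplist, stemming, truncation.
   The stoplist contents and the one-rule stemmer are assumptions
   made for this sketch. */

static const char *STOPLIST[] = { "the", "a", "of", "and", "in", 0 };

/* Is the word on the stoplist (and therefore not a useful index term)? */
int is_stopword(const char *w) {
    for (int i = 0; STOPLIST[i]; i++)
        if (strcmp(w, STOPLIST[i]) == 0) return 1;
    return 0;
}

/* Strip a trailing plural "s" so variants such as "domains" and
   "domain" conflate into a single concept. */
void stem(char *w) {
    size_t n = strlen(w);
    if (n > 3 && w[n-1] == 's' && w[n-2] != 's') w[n-1] = '\0';
}

/* Truncation: does the stored word begin with the user's prefix? */
int truncation_match(const char *word, const char *prefix) {
    return strncmp(word, prefix, strlen(prefix)) == 0;
}
```

With this sketch, a search for the truncated term "class" would match "classification", and stemming maps "domains" to "domain".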
4.2- Artificial intelligence
Artificial intelligence (AI), the use of computers to do tasks that previously required
human intelligence, is a broad field with an immense literature. Of special interest to
reuse and domain analysis are the AI subfields of knowledge extraction/acquisition and
knowledge representation.
All AI systems are constrained by the amount and quality of the knowledge they contain.
Builders of AI systems have found that the so-called knowledge acquisition barrier is
usually the most difficult problem they must solve in building successful AI systems.
Most knowledge acquisition techniques are manual and rely on various interviewing
techniques. There are also some automatic techniques based on machine learning.
[Hart86] and [Kidd87] provide a good summary of knowledge acquisition techniques.
One technique for eliciting knowledge, for example, is to ask the same question in
different ways. Say, for example, that an expert is asked to identify important sub
domains, but is unable to do so. The interviewer might then ask him how the
organization is structured, recognizing that organizations are often structured along
domain specific lines.
Once knowledge has been acquired, it must be represented in a form that the machine can
use to do useful work. Many knowledge representation techniques have been proposed.
Some of the more popular are production rules, frames, and semantic nets. All of these
techniques have been used to represent reusable software components (see [Frak90] for a
review).
A semantic net is a directed graph whose nodes correspond to conceptual objects and
whose arcs correspond to relationships between those objects. Production rules are
perhaps the best known of knowledge representation formalisms because of their use in
many expert system shells. Production rules might be used to classify reusable
components based on attribute value pairs as follows.
IF algorithm needed IS a sort
AND sort speed required IS fastest
AND implementation language IS C
THEN sort to use IS quicksort.c
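The rule above could be matched against attribute-value facts along these lines; the rule table and matching routine are a minimal sketch, not the mechanism of any particular expert system shell:

```c
#include <string.h>

/* Minimal forward matching of one attribute-value production rule.
   The attribute names and the rule itself mirror the example in the
   text; the data layout is an assumption of this sketch. */

struct cond { const char *attr, *value; };
struct rule { struct cond conds[3]; const char *conclusion; };

static const struct rule SORT_RULE = {
    { { "algorithm needed",        "sort"    },
      { "sort speed required",     "fastest" },
      { "implementation language", "C"       } },
    "quicksort.c"
};

/* Return the rule's conclusion if every condition is satisfied by
   the given facts; return NULL otherwise. */
const char *fire(const struct rule *r,
                 const struct cond *facts, int nfacts) {
    for (int i = 0; i < 3; i++) {
        int met = 0;
        for (int j = 0; j < nfacts; j++)
            if (strcmp(r->conds[i].attr,  facts[j].attr)  == 0 &&
                strcmp(r->conds[i].value, facts[j].value) == 0)
                met = 1;
        if (!met) return 0;
    }
    return r->conclusion;
}
```

Given facts matching all three conditions, the rule recommends the quicksort.c component; with any condition missing, it does not fire.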
Frames are data structures, composed of slots and fillers, used for knowledge
representation. For example,

Sort
    AKO        :algorithm
    operation  :ordering
    operands   :data objects

The slots here are in the left-hand column, and the fillers in the right, following the
colons. Sort is a special slot which names the frame. AKO, which stands for "a kind of",
is commonly used in frame representations. While the knowledge in frames can be
accessed and used in many kinds of inferencing, the inferencing technique usually
associated with frames is inheritance. In inheritance, one frame inherits slots, and
optionally fillers, from another.
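A minimal sketch of frame slot lookup with AKO inheritance might look as follows; the algorithm parent frame and its implemented-in slot are invented for illustration:

```c
#include <string.h>

/* Slot lookup with AKO inheritance: if a frame lacks a slot, the
   search continues in its parent frame. Frame contents here are
   illustrative assumptions. */

#define MAX_SLOTS 4

struct frame {
    const char *name;
    const struct frame *ako;          /* "a kind of" parent frame */
    const char *slots[MAX_SLOTS][2];  /* {slot, filler} pairs */
};

/* Walk the AKO chain until the slot is found or the chain ends. */
const char *get_slot(const struct frame *f, const char *slot) {
    for (; f; f = f->ako)
        for (int i = 0; i < MAX_SLOTS && f->slots[i][0]; i++)
            if (strcmp(f->slots[i][0], slot) == 0)
                return f->slots[i][1];
    return 0;
}

static const struct frame ALGORITHM = {
    "algorithm", 0, { { "implemented-in", "C" } }
};
static const struct frame SORT = {
    "sort", &ALGORITHM,
    { { "operation", "ordering" }, { "operands", "data objects" } }
};
```

Looking up the operation slot of the sort frame finds it directly, while the implemented-in slot is inherited from the algorithm frame through the AKO link.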
Two useful factors to consider when evaluating knowledge representations are
representational adequacy and heuristic power. Representational adequacy refers to how
much one can express with the representation. A simple list of keywords, for example,
has poor representational adequacy because the syntactic and semantic relationships
between the keywords are missing. Heuristic power refers to the kinds of inferencing one
can do with the representation. Logical inference, for example, is a powerful type of
processing only possible with some representations. One appeal of the knowledge-based
approach to reuse representation is that the representations offer a powerful way of
expressing the relationships between system components. This is probably extremely
important for helping a user understand the function of components. It may be, for
example, that information of the form "component transforms input A to output B under
condition X" will be important for expressing knowledge about a code domain.
[Deva91] have used frames to represent software components from System 75, a large
switching system consisting of about 1 million lines of C code. Their reuse system, called
Lassie, attempts to support multiple views of System 75: a functional view which
describes what components are doing, an architectural view of the hardware and software
configuration, a feature view that relates basic system functions to features such as call
forwarding, and a code view which captures how code components relate to each other.
Their taxonomy is based on four categories: object, action, doer, and state. For example,
a frame using this taxonomy might describe an object called a user-connect-action which
is both a network-action and a call-control-action having a generic process as an actor,
that attempts to move from a call-state to a talking-state by using a bus-controller. One
interesting aspect of such a scheme is the way it allows the relationships among the
various conceptual parts of a system to be made explicit. In addition to the domain
specific information about System 75, Lassie also stores information about the
environment (UNIX and C) used to develop the system. Lassie also uses a natural
language interface as part of its query facility.
Systems such as Lassie demonstrated that AI can be used to support reuse and domain
analysis, but they also showed, once again, the problems associated with knowledge
acquisition for such systems. Lassie's authors managed to represent only a small part of
System 75, and most of that had to be done manually. Getting enough of the System 75
engineers' time to elicit the knowledge and validate the results was also a practical
problem.
4.3- Code static and dynamic analysis tools
One important kind of knowledge about software systems is derived by static and
dynamic analysis of code. Static analysis tools analyze code before execution, and
provide information about program structure and attributes. Dynamic analysis tools are
used to monitor the runtime behavior and performance of code. There are many such
tools available for various languages and programming environments. We will use the
Unix/C environment for purposes of our discussion. See [FFN91] for a fuller discussion
of this topic, and the tools that follow.
Cf and cflow produce C system function hierarchies. Such information takes the form

function1
    function2
        function3
        ....
            function-n

which says that function1 calls function2, which calls function3, and so on. This
information can be used for a variety of purposes, including identifying potentially
reusable components and calculating reuse metrics [Frak92].
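As a simple illustration of how such call information might be mined, the sketch below counts how often each function appears as a callee in caller/callee pairs; functions called from many places are candidates for reuse. The data layout is an assumption of this sketch, not cflow's actual output format:

```c
#include <string.h>

/* Count calls to a named function in a table of caller/callee pairs.
   Such counts can feed simple reuse metrics: a function called from
   several distinct places is already being reused internally. */

struct call { const char *caller, *callee; };

int times_called(const char *name, const struct call *calls, int n) {
    int count = 0;
    for (int i = 0; i < n; i++)
        if (strcmp(calls[i].callee, name) == 0)
            count++;
    return count;
}
```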
Another important class of static analysis tools computes software metrics, i.e.,
quantitative indicators of software attributes. Many such metrics have been reported in
the literature [CDS86]. Many of these measure software complexity. ccount, for example,
computes simple metrics such as NCSL (non-commentary source lines) and CSL
(commentary source lines) and their ratios.
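A toy version of such a metric computation might count commentary and non-commentary lines as follows; this naive sketch recognizes only whole-line /* ... */ comments and is not ccount itself:

```c
#include <string.h>

/* Naive NCSL/CSL counter over a source buffer. A line whose first
   non-blank characters open a comment counts as CSL; any other
   non-blank line counts as NCSL. Blank lines count as neither. */

struct counts { int ncsl, csl; };

struct counts count_lines(const char *src) {
    struct counts c = { 0, 0 };
    while (*src) {
        const char *eol = strchr(src, '\n');
        size_t len = eol ? (size_t)(eol - src) : strlen(src);
        /* skip leading blanks on the line */
        while (len && (*src == ' ' || *src == '\t')) { src++; len--; }
        if (len) {
            if (len >= 2 && src[0] == '/' && src[1] == '*') c.csl++;
            else c.ncsl++;
        }
        src = eol ? eol + 1 : src + len;
    }
    return c;
}
```

A real tool would also handle trailing and multi-line comments, which this sketch deliberately omits.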
Another important source of static information is make. Information from the make utility
can be used to determine structure at the file level.
Cscope parses C code and builds a cross reference table that allows the following kinds
of information to be reported.
List references to this C symbol:
List functions called by this function:
List functions calling this function:
List lines containing this text string:
List file names containing this text string:
List files #including this file:
An even more powerful static analysis tool is CIA [CNR90], a tool that extracts from C
source code information about functions, files, data types, macros, and global variables,
and stores this information in a relational database. This information can then be accessed
via CIA's reporting tools, by awk, or by other database tools. One type of information,
for example, that CIA captures is the calling relationship among functions. CIA's
reporting tools can then be used, with other graphics tools, to generate a graphical
representation of the information in the database. Such graphical representations can be
used as preliminary domain architectures during domain analysis.
Some of the types of information CIA output might be used to derive are software
metrics and call-graph reports.

The time command reports how long a command takes to run. Timing the who
command, for example, might produce:

real    0m2.91s
user    0m0.25s
sys     0m0.30s

This information shows that 2.91 seconds of elapsed clock time took place during
execution of the who command, 0.25 seconds of time was spent in the who program
itself, and 0.30 seconds were spent in the kernel executing the command.
The prof utility can be used to measure the time each function in the system takes to
execute. When code is compiled in this way, a file called mon.out is generated during
execution. This file contains data correlated with the object file and readable by prof to
produce a report of the time consumed by individual functions in the program.
When run on an example program, prof produced the following report:
%Time   Seconds   Cumsecs   #Calls   msec/call   Name
 50.0      0.02      0.02        1         17.   _read
 50.0      0.02      0.03        8          2.   _write
  0.0      0.00      0.03                   0.   _monitor
  0.0      0.00      0.03        1          0.   _creat
This report shows that the function read was called once, and this call took 17
milliseconds, about 50% of the total execution time for the program. The functions
monitor and creat show zero execution times because they used amounts of time too
small for prof to measure. Such information might be used in the analysis of real time
domains where execution efficiency is critical. Designers of reusable components have
reported that component efficiency is a primary factor in their acceptance by users. Thus,
dynamic analysis tools will play a key role in a reuse and domain analysis environment.
4.4- Interface environments
Environments for building high-quality bit-mapped interfaces have proliferated in the past
few years. NeWS and X-based environments are the most common, with X-based
environments becoming the standard. The X environment is, in fact, a good example of
successful horizontal reuse. Many powerful tools have been written on top of X,
including higher-level toolsets such as Motif, and interface generators such as TAE from
NASA. These tools make it relatively easy to develop high-quality window-based
environments with graphics.
A high quality interface will be very important for DARE since so much data input and
manipulation will be required. One of the key challenges in developing DARE is to
identify good interface strategies and data representations. We will probably use an X
based interface development environment because of its power and portability across
platforms.
4.5- CASE tools
A CASE (computer aided software engineering) toolset supports the activities of
software engineering through analysis, software views, and repositories. The UNIX
programmers workbench (PWB), containing tools such as cflow and CIA, is an example
of a set of coordinated tools of this type. There are also many commercial CASE tools
on the market. Some that we may consider for DARE are Cadre's Teamwork, Interactive
Development Environments STP (software through pictures), and Softbench from
Hewlett-Packard. These tools, as does UNIX PWB, provide support for reverse
engineering software, which will provide a good source of knowledge about a domain.
[Figure 3- High Level DARE Architecture: the domain analysis support and software
construction support components are integrated, through common interfaces, with a
reuse library and COTS & special tools; the users served are the domain analyst, domain
expert, systems designer, software engineer, and librarian]
The domain analysis support part of DARE supports specific domain analysis activities.
A common interface is proposed to integrate existing and new tools. The outcomes of
these tools are tangible domain analysis products such as domain taxonomies, domain
vocabularies, systems architectures, standard designs, software specifications, reusable
code components, and specifications for new components.
The software construction support would enable users to select library assets for building
new systems. A domain architecture, for example, which serves as a framework to search
the library, could be used to select components explicitly designed to fit parts of the
architecture. Similarly, standard designs could be used for selecting the appropriate
reusable code components. The interface could also allow other environments to use
DARE's facilities.
DARE is not intended to support tasks already covered by existing software development
environments. Such tasks include code development (compiling, editing, debugging),
project management support, system maintenance, etc. DARE will provide support to
systems and software designers in selecting reusable components. The actual reuse-based
construction and development of new systems would be conducted in their respective
environments.
[Figure 4- DARE-supported domain analysis (SADT context diagram): inputs are domain
knowledge, domain experience, existing systems, and expert knowledge; controls are the
reuse strategy and organization objectives; mechanisms are the domain experts, domain
analysts, and DARE; outputs are the domain definition, recorded domain knowledge,
domain structures, and domain models]
PURPOSE: To illustrate a practical domain analysis process with potential for automation
and to identify the kinds of tools required for each process stage.
VIEWPOINT: DARE Architect/Developer
Figure 5 shows the SADT level A0 decomposition. It consists of five main activities:
A1- Define Domain
A2- Acquire Domain Knowledge
A3- Structure Domain Knowledge
A4- Identify Commonalities
A5- Generate Domain Models
A2 through A5 are the activities that will be supported by DARE. A1, although a
necessary step, is assumed to be conducted outside the context of DARE. The output of
Define Domain is a domain definition and it controls, together with a reuse strategy,
the knowledge acquisition process.
Acquire Domain Knowledge will be supported by knowledge acquisition tools. Some
of these tools are fully automatic such as scanners, compilers, reuse libraries, and editors
or text processors, while others are interactive and semi-automatic like questionnaire
templates and interview guidelines. Knowledge is acquired from the three main inputs:
existing systems (i.e., source code), domain related documents, and experts. Knowledge
from documents will be extracted automatically while knowledge from experts will be
converted semiautomatically to text form first, using interviews and questionnaires.
Source code from existing systems will be selected manually based on quality,
documentation, and relevance, and then re-structured using reengineering tools. Not all
source code will be selected. Domain analysts and domain experts are essential support
agents. The output is recorded domain knowledge which includes: scanned documents,
answered questionnaires, recorded interviews, surveys, and processable source code.
The objective in structuring domain knowledge is to create domain structures suitable for
commonality analysis. Such structures include: faceted classification, domain
vocabulary, high level functional descriptions, design rationale charts, SADT diagrams,
systems code structures, data dictionaries, survey reports, and knowledge structures.
Tools that support this process include: lexical analyzers, keyword filters, indexing
support, numeric and conceptual clustering, thesaurus construction, reverse engineering,
statistical analysis, and tools that support semantic net construction and production rule
development. Most structuring activities will be done automatically. Structuring expert
knowledge, however, will be conducted semiautomatically.
Domain structures and recorded domain knowledge are used to identify commonalities
using conceptual clustering techniques, reverse engineering, and code similarity detection
tools. Commonality analysis is a highly interactive activity that requires easy and
effective access to all recorded domain knowledge and all domain structures. Both
activities, identify commonalities (A4) and generate domain models (A5), must be
conducted concurrently and are connected with a continuous feedback link. Interactive
support through a common interface is essential for conducting these activities.
The final outputs are domain models. Domain models provide the basis for designing and
implementing reusable components, for providing requirements standards, and for
supporting reuse-based development. Domain models are in a continuous evolution
process, and are fed back to A2 for refinement. The domain models produced through
DARE will be concise, well defined, and practical. These include: a common vocabulary,
a common architecture, a classification scheme, and functional specifications for reusable
components.
The outputs of structuring scanned documents are conceptual structures. The objective is
to extract concepts from text and organize them into structures that relate those concepts
to each other and to domain entities. These conceptual structures convey domain
understanding and support the creation of domain models. The conceptual structures
produced by this activity will include: a preliminary faceted classification, a domain
vocabulary, functional descriptions of domain components, and a domain taxonomy. The
production of these structures will be automatic. The tools to be used include: lexical
analyzers, filters (i.e., stop lists), indexing tools, clustering tools, and thesaurus
construction tools. Most of these tools are readily available from commercial information
retrieval systems.
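The automatic portion of this pipeline (lexical analysis, stop-list filtering, and index construction) can be sketched in a few lines; the stop list and sample documents below are illustrative assumptions, not DARE artifacts:

```python
import re
from collections import defaultdict

STOP_LIST = {"the", "a", "of", "to", "and", "is", "in"}  # illustrative stop list

def lexical_analysis(text):
    """Lexical analyzer: break text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def filter_tokens(tokens):
    """Stop-list filter: drop common words, keep candidate domain terms."""
    return [t for t in tokens if t not in STOP_LIST]

def build_index(documents):
    """Indexing: map each surviving term to the documents that use it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in filter_tokens(lexical_analysis(text)):
            index[term].add(doc_id)
    return index

docs = {
    "spec1": "The tracker computes the target position.",
    "spec2": "Position estimates feed the guidance loop.",
}
index = build_index(docs)
# index["position"] == {"spec1", "spec2"}; stop words never appear in the index
```

Clustering and thesaurus-construction tools would operate on such an index; commercial information retrieval systems provide production-quality versions of each step.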
The focus in reengineering selected systems (A32) is on recovering the original structures
of these systems through reverse engineering techniques. Source code is structured and
reverse engineered to produce designs. Functional specifications are used to verify and
validate the resulting designs and to support the creation of systems architectures. The
outputs of this activity are implementation structures that include: code structures (i.e.,
structured code), systems designs, specific architectures, data dictionaries, ER diagrams,
and data structures. These outputs will be generated automatically with off-the-shelf
reengineering and reverse engineering tools.
Structuring knowledge from experts is an interactive process aimed at producing
knowledge structures such as semantic nets, ER diagrams, behavior models, design
rationales, and production rules. Textual input from answered questionnaires and expert
interviews is processed interactively by domain analysts using knowledge structuring
tools. Knowledge structuring tools include semantic net construction tools, conceptual
clustering support tools, statistical analysis tools, and integrated tools such as AVIEN
(see section 5.4).
Coordinated control and direction for structuring knowledge from experts are provided by
conceptual and implementation structures and by the domain specification. All outputs
from these three activities are used by A4 to identify commonalities.
will be represented using knowledge representation methods such as semantic nets and
rules. This area will need to be explored further before we can be sure of what
techniques to use. In AVIEN [FF88], an expert system building tool, for example, we
developed a technique based on having experts draw decision charts that laid out the
inferencing structure to use in a given expert system. This technique proved successful in
several domains, including mineral identification and solder fault analysis. We
eventually developed a tool called GROK which allowed an expert to input knowledge as
a decision tree. GROK then automatically transformed the tree into rules.
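The tree-to-rules transformation GROK performs can be sketched as follows; the tree encoding, rule syntax, and the solder-fault example here are illustrative assumptions, not GROK's actual formats:

```python
# Sketch of a GROK-style transformation: an expert's decision tree is
# mechanically rewritten as production rules, one rule per leaf conclusion.
def tree_to_rules(node, conditions=()):
    """Walk a decision tree; emit one IF-THEN rule per leaf."""
    if isinstance(node, str):                      # leaf: a conclusion
        return [f"IF {' AND '.join(conditions)} THEN {node}"]
    question, branches = node
    rules = []
    for answer, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + (f"{question} = {answer}",))
    return rules

# A tiny hypothetical solder-fault decision tree
tree = ("joint_dull", {
    "yes": ("surface_cracked", {
        "yes": "cold joint",
        "no": "disturbed joint",
    }),
    "no": "joint acceptable",
})
rules = tree_to_rules(tree)
# e.g. "IF joint_dull = yes AND surface_cracked = yes THEN cold joint"
```

The appeal of the approach is that the expert only ever draws trees; the rule base that drives the inference engine is derived mechanically.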
Code analysis tools
The code analysis tools for DARE will be primarily static analysis tools (see section 4.3).
Such tools are used to get information about the structure and properties of source code
for systems considered in the domain analysis process. The tools we discuss are
representative of the static code analysis tools available today. The tools here are ordered
from less powerful to more powerful.
Cflow shows the call hierarchy of functions in a system, an important type of system
architectural information. It also reports the total number of function calls, the number of
functions declared, levels of nesting, and how many times calls are made in functions.
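The call-hierarchy information such a tool recovers can be illustrated with a deliberately crude sketch that scans C source with regular expressions; a real tool parses the code properly, and this approximation assumes well-formatted, non-nested function bodies:

```python
import re

def call_graph(c_source, functions):
    """Map each known function to the known functions its body calls."""
    graph = {}
    for name in functions:
        # Locate the function's body: name(...) { ... } ending at a '}' in column 0.
        body = re.search(name + r"\s*\([^)]*\)\s*\{(.*?)\n\}", c_source, re.S)
        calls = set(re.findall(r"(\w+)\s*\(", body.group(1))) if body else set()
        graph[name] = sorted(calls & set(functions))
    return graph

SRC = """
int helper(int x) {
    return x + 1;
}
int main(void) {
    return helper(41);
}
"""
graph = call_graph(SRC, {"main", "helper"})
# graph["main"] == ["helper"]; graph["helper"] == []
```

Even this rough call hierarchy exposes the layered structure of a system, which is the architectural information a domain analyst needs.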
Cscope parses C code and builds a cross-reference table that allows various kinds of
information to be reported.
CIA is a tool that extracts from C source code information about functions, files, data
types, macros, and global variables, and stores this information in a relational database.
This information can then be accessed via CIA's reporting tools, by awk, or by other
database tools.
DARE will use complexity metrics such as NCSL (non-commentary source lines),
McCabe's cyclomatic complexity, and Halstead's metric to help quantify information
about the components and architectures in a domain, and about the potential of those
components and architectures for re-engineering. Reuse level metrics can be used to find
out how much of the software in a domain is already reused from external sources
(external reuse) and domain internal sources (internal reuse).
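Two of these metrics can be sketched directly; the line-counting convention and the keyword-counting approximation of McCabe's complexity below are simplifications of the full definitions:

```python
import re

def ncsl(source):
    """NCSL: count lines that are neither blank nor pure // comment lines."""
    return sum(1 for line in source.splitlines()
               if line.strip() and not line.strip().startswith("//"))

def cyclomatic(source):
    """Approximate McCabe complexity: 1 + the number of decision points."""
    return 1 + len(re.findall(r"\b(?:if|for|while|case)\b|&&|\|\|", source))

SRC = """// clamp x into [lo, hi]
int clamp(int x, int lo, int hi) {
    if (x < lo)
        return lo;
    if (x > hi)
        return hi;
    return x;
}
"""
# ncsl(SRC) == 7 (the comment line is excluded); cyclomatic(SRC) == 3
```

Applied across the systems in a domain, such measures flag which components are simple enough to be worth re-engineering into reusable form.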
Automated commonality analysis tools
Finding the common parts of systems in a domain is central to the process of domain
analysis. While this activity is still primarily manual, some tools can be used to help. One
problem in finding common parts in a domain is that similar architectures often have
different names for their components and variables. Tools such as those used for
plagiarism detection in universities ignore differences among programs due to surface
features such as variable names, and will be used to detect system commonalities
automatically.
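The identifier-normalization idea behind such detectors can be sketched as follows; the keyword list and token grammar are illustrative simplifications:

```python
import re

KEYWORDS = {"int", "return", "if", "while", "for"}  # illustrative subset of C keywords

def normalized_tokens(source):
    """Tokenize C-like source, mapping every identifier to one placeholder."""
    tokens = re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_]", source)
    return ["ID" if re.match(r"[A-Za-z_]", t) and t not in KEYWORDS else t
            for t in tokens]

a = "int add(int x, int y) { return x + y; }"
b = "int sum(int first, int second) { return first + second; }"
same = normalized_tokens(a) == normalized_tokens(b)
# same is True: the two routines differ only in their identifiers
```

Because renaming is invisible after normalization, structurally identical routines from different systems compare as equal token streams, exactly the surface-feature blindness a commonality detector needs.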
Reuse library tools
Many reuse library tools have been reported. A recent review and summary [Frak90]
listed many of these systems, and found that the systems used many different indexing
strategies and implementation platforms. The indexing strategies fell into three major
categories: library and information science, AI, and hypertext. Nearly all systems in
industrial use are based on library science methods. The three library science methods
used most often are enumerated, faceted, and free text keyword. Recent experiments
[FP92] indicate that these methods are equally effective, but that enumerated supports
faster searching. Free text keyword is least expensive. Faceted has the advantage of
supporting the domain analysis method discussed above. Platforms used include IR
systems, DBMS, and AI systems. Though many commercial DBMS, IR, and AI systems
are available, there is currently no commercial reuse library system.
Domain analysis interaction tools
The design of DARE's interaction tools for domain analysis is critical to its utility. While
the interface has not been worked out in detail, we have some ideas for making the
domain analysis interface friendly and useful. The text extraction and analysis tools will
provide lists of keywords and phrases. These will be grouped via a graphical interface
into concepts, and finally into facets. This grouping process will primarily be manual, but
term clustering tools will provide the user with initial concepts. These facets will be the
primary conceptual model for the domain.
In the same way, information from questionnaires and interviews will be textually
analyzed to identify concepts and facets. These will be combined with manual analysis to
yield knowledge for representation in the system. The reverse engineering and metrics
analysis tools will provide standard reports that can be analyzed manually or transformed
with tools. Commonality analysis tools will provide initial estimates of which portions of
systems will be common and therefore likely reusable components.
5.5- Architecture integration
In summary, the architecture in Figure 8 works as follows. There are three main types of
information that the system uses: document sources, expert sources, and code sources.
The initial processing of document sources consists of scanning (automatically inputting
documents with a scanner) and lexical analysis (breaking the text into words). Three
types of operation then take place: filtering (including stoplist and stemming activities),
the calculation of lexical affinities, and the creation of indexes. Document source
information is used in term clustering, or the organizing of terms into semantic clusters.
Some term clustering can be done automatically and some manually. Term clustering is
the first phase in faceted classification of the vocabulary in the domain.
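The automatic part of this clustering step might be sketched as a simple co-occurrence merge, in which terms appearing in the same document are grouped; the merging criterion and sample term sets are illustrative assumptions:

```python
def cluster_terms(docs):
    """Greedily merge the term sets of documents that share any term."""
    clusters = []
    for terms in docs:
        merged, keep = set(terms), []
        for c in clusters:
            if c & merged:
                merged |= c          # overlapping clusters coalesce
            else:
                keep.append(c)
        keep.append(merged)
        clusters = keep
    return clusters

docs = [
    {"track", "radar", "target"},
    {"target", "intercept"},
    {"display", "console"},
]
clusters = cluster_terms(docs)
# Two clusters: {track, radar, target, intercept} and {display, console}
```

Machine-produced clusters of this kind would then be reviewed manually as candidate concepts and facets.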
Expert information is derived from questionnaires and manual knowledge acquisition.
This information will be captured using knowledge representations such as semantic nets
and rules. Simple rule based systems may then be constructed from these knowledge
representations. Such systems can be used to support domain architecture navigation.
Information from code sources is derived with static analysis tools of various
complexities. Some of these tools produce simple call graphs that reveal the functional
hierarchical structure of systems in a domain. Other tools can be used to derive and
graphically represent more general relationships among program objects. Metrics tools
will provide information about the complexity and reusability of components. This
program structure information is used to perform commonality analyses: the
identification of common parts and structures of the various systems in the domain.
Tools similar to those used for plagiarism detection can also be used for finding
commonalities.
All domain information will be stored in a central library database. Users will interact
with the library via a windowed interface to extract and analyze data about domains.
DARE will provide tools to set up and administer such databases.
6- Conclusion
In this report, we defined the software reuse problem and domain analysis, provided a
survey of basic approaches to domain analysis, and discussed basic technology needed to
support automation of domain analysis including information retrieval (IR), artificial
intelligence (AI), static and dynamic analysis of software, interface environments, and
CASE tools. Our conclusion based on this work is that it is feasible to build an
environment (DARE) that can support many of the activities needed for domain analysis.
Some of the processes, primarily those regarding knowledge extraction and
representation, cannot be completely automated and will continue to need
manual support.
We have provided a high level functional architecture of DARE, and plan to use this
architecture to guide the future implementation of DARE. We believe that DARE, when
implemented, will be of significant value to software engineers engaged in the complex
task of domain analysis.
References
[AM88]
[Ara88]
[Bail91] Moore, J.M. and S.C. Bailin. "Domain Analysis: Framework for Reuse." In
Domain Analysis and Software Systems Modeling, R. Prieto-Díaz & G.
Arango (Eds.), pp. 179-203, IEEE Computer Society Press, Los Alamitos,
CA, 1991.
[Bigg89] T.J. Biggerstaff. Design Recovery for Maintenance and Reuse. IEEE
Computer 22(7):36-49, July, 1989.
[Cam87] CAMP, Common Ada Missile Packages, Final Technical Report, Vols. 1, 2,
and 3. AD-B-102 654, 655, 656. Air Force Armament Laboratory,
AFATL/FXG, Elgin AFB, FL, 1987.
[CDS86] Conte, S., H. Dunsmore, V. Shen, Software Engineering Models and Metrics,
Menlo Park, CA: Benjamin Cummings, 1986.
[CNR90] Chen, Y., M.Y. Nishimoto, and C.V. Ramamoorthy. "The C Information
Abstraction System." IEEE Transactions on Software Engineering,
16(3):325-334, March, 1990.
[Coad89] Coad, P. OOA- Object-Oriented Analysis, Object International, Inc., Austin,
TX, 1989.
[Cox86]
[Deva91] Devanbu, P., R.J. Brachman, P.G. Selfridge, and B.W. Ballard. "LaSSIE: A
Knowledge-Based Software Information System." Communications of the
ACM, 34(5):34-49, May, 1991.
[Dona81] Donaldson, J.L., et al. A Plagiarism Detection System. SIGCSE Bulletin,
13(1), 1981.
[DA88]
[DM89]
[DM91]
[FB92]
[FF88] Frakes, W.B. and Fox, C.J., "An Expert System Subroutine Library for the
UNIX/C Environment", The AT&T Technical Journal, May/June, 1988.
[FFN91] Frakes, W.B., Fox, C.J., Nejmeh, B.A., Software Engineering in the UNIX/C
Environment, Prentice-Hall, 1991.
[FP92] Frakes, W.B. and Pole, T., "An Empirical Study of Representation Methods
for Reusable Software Components", submitted to IEEE Transactions on
Software Engineering, June, 1992.
[Fick88]
[Frak92]
[Frak90]
[Frak88]
[Frak84]
[Gilr89]
Gilroy, K.A., E.R. Comer, J.K. Grau and P.A. Merlet. Impact of Domain
Analysis on Reuse Methods. Final Report C04-087LD-0001-00, U.S. Army
Communications-Electronics Command, Ft. Monmouth, NJ, November,
1989.
[Hart86] Hart, A., Knowledge Acquisition for Expert Systems, New York: McGraw-Hill, 1986.
[Jawo90] Jaworski, A., F. Hills, T.A. Durek, S. Faulk and J. Gaffney. A Domain
Analysis Process. Interim Report 90001-N, Software Productivity
Consortium, Herndon, VA, January, 1990.
[Kang90] Kang, K., S. Cohen, J. Hess, W. Novak, and S. Peterson. Feature-Oriented
Domain Analysis (FODA) Feasibility Study. CMU/SEI-90-TR-21. Software
Engineering Institute, Carnegie-Mellon University, Pittsburgh, PA,
November, 1990.
[Kidd87] Kidd, A. (Ed.), Knowledge Acquisition for Expert Systems: A Practical
Handbook, New York: Plenum Press, 1987.
[Lane79] Lanergan, R.G. and B.A. Poynton "Reusable Code: The Application
Development Technique of the Future." In Proceedings of the IBM
SHARE/GUIDE Software Symposium, IBM, Monterey, CA, October, 1979.
[Luba88] Lubars, M.D. Domain Analysis and Domain Engineering in IDeA. Technical
Report STP-295-88, MCC, Austin, TX, September, 1988.
[Maar91] Maarek, Y., Berry, D. , Kaiser, G. "An Information Retrieval Approach for
Automatically Constructing Software Libraries", IEEE Transactions on
Software Engineering, 17(8), August, 1991
[McCa85] McCain, R. "Reusable Software Component Construction: A Product-Oriented Paradigm." In Proceedings of the 5th AIAA/ACM/NASA/IEEE
Computers in Aerospace Conference, Long Beach, CA, pp. 125-135, October
21-23, 1985.
[McIl69] McIlroy, M.D. "Mass-produced Software Components". In Software Eng.
Concepts and Techniques, 1968 NATO Conf. Software Eng., ed. J.M. Buxton,
P. Naur, and B. Randell, pp 88-98, 1976.
[Neig80] Neighbors, J. Software Construction Using Components, Ph.D. dissertation,
Department of Information and Computer Science, University of California,
Irvine, 1980.
[Neig84] Neighbors, J. "The Draco Approach to Constructing Software from Reusable
Components." IEEE Transactions on Software Engineering, SE-10:564-573,
September 1984.
[Otte76]
[Parn76]
[Prie87]
[Prie90]
[Prie91a] Prieto-Díaz, R. Reuse Library Process Model. Final Report, STARS Reuse
Library Program, Contract F19628-88-D-0032, Task IS40, Electronic
Systems Division, Air Force Systems Command, USAF, Hanscom AFB, MA
01731, March, 1991.
[Prie91b] Prieto-Díaz, R. "Implementing Faceted Classification for Software Reuse".
Communications of the ACM, 34(5):88-97, May, 1991.
[Prie91c] Prieto-Díaz, R. "A Domain Analysis Methodology". In Proceedings of the
Workshop on Domain Modeling, pp 138-140, 13th International Conference
on Software Engineering, Austin, TX, May 13, 1991.
[RM87]
[RP89]